Data Publication Guidelines

Guidance and best practices for publishing data

This document provides data curation guidance and best practices for researchers who will use the NHERI DesignSafe cyberinfrastructure (CI) to share and publish natural hazards engineering data. Specific guidelines are provided for researchers using the NHERI Experimental Facilities (EF), for researchers using the NHERI RAPID facility, and for researchers performing simulation-based research.

Data curation is made up of all the activities undertaken to generate organized and documented data that is easy to re-use. Using data management tools in DesignSafe, researchers are empowered to curate their own data progressively as their research advances. On-demand assistance from a curator is available to provide training and to guide users through their data curation and publication needs. When curation is complete, researchers can publish the dataset with a permanent digital object identifier (DOI) that allows the data to be easily located on the web and cited. Features are in place to ensure the authenticity, integrity, security and persistence of the datasets for open access. DesignSafe is committed to the continuity of data preservation beyond the conclusion of the DesignSafe project.

To cite the use of DesignSafe in your research, please reference the following paper:

Rathje, E., Dawson, C., Padgett, J.E., Pinelli, J.-P., Stanzione, D., Adair, A., Arduino, P., Brandenberg, S.J., Cockerill, T., Dey, C., Esteva, M., Haan, Jr., F.L., Hanlon, M., Kareem, A., Lowes, L., Mock, S., and Mosqueda, G. 2017. “DesignSafe: A New Cyberinfrastructure for Natural Hazards Engineering,” ASCE Natural Hazards Review, doi: 10.1061/(ASCE)NH.1527-6996.0000246

Data Sharing and Publishing

DesignSafe provides an end-to-end data management, analysis and publication platform for both experimental and simulation-based research. Within the DesignSafe Data Depot, researchers have access to a private “My Data” space, a collaborative “My Projects” space, and a “Published” space for published datasets.

All research data collected as part of a research project, as well as processing scripts, data analysis products, and simulation models/results generated, can be deposited in the Data Depot from the inception of the project. These data are kept private within a Project until published by the research team. Using a Project to share data with your team members during the course of a project facilitates the progressive curation of data and eventual publishing.

Each research team is responsible for curating its data using the data management tools provided by DesignSafe. These tools help researchers organize, categorize, and describe their data within the DesignSafe Data Depot. Assistance from a curator is available to provide training and to guide users through the data curation process. After data is curated and ready to be published, it will be vetted against the research community’s minimum metadata requirements (www.designsafe-ci.org/rw/support/data-publication) before it receives a DOI for persistent identification and ease of data sharing and citation. Researchers using published data from the DesignSafe Data Depot must cite it using the DOI, which relies on the DataCite schema for accurate citation (http://schema.datacite.org/).

Guidelines Regarding the Storage and Publication of Protected Data in DesignSafe-CI

Researchers should always comply with the requirements, norms and procedures approved by the Institutional Review Board (IRB) or equivalent body, regarding human subjects’ data storage and publication.

Protected data includes human subjects data with Personally Identifiable Information (PII), data that is protected under HIPAA, FERPA and FISMA regulations, as well as data that involves vulnerable populations or contains sensitive information.

Storing Protected Data

DesignSafe My Data and My Projects are secure spaces to store raw protected data as long as it is not under HIPAA, FERPA or FISMA regulations. If data needs to comply with these regulations, researchers must contact DesignSafe through a help ticket to evaluate the case and use TACC’s Protected Data Service. Researchers with doubts are welcome to send a ticket or join curation office hours.

Publishing Protected Data

To publish protected data, researchers should adhere to the following procedures:

  1. Do not publish data protected under HIPAA, FERPA or FISMA, PII, or other sensitive information in DesignSafe.
  2. Protected data and any related documentation (reports, planning documents, field notes, etc.) must be properly anonymized before publication. No direct identifiers and up to three indirect identifiers are allowed. Direct identifiers include items such as participant names, participant initials, facial photographs (unless expressly authorized by participants), home addresses, social security numbers and dates of birth. Indirect identifiers are identifiers that, taken together, could be used to deduce someone’s identity. Examples of indirect identifiers include gender, household and family compositions, occupation, places of birth, or year of birth/age.
  3. If a researcher needs to restrict public access to data because it includes data protected under HIPAA or FERPA, PII, or other sensitive information, consider publishing metadata and other documentation about the data.
  4. Users of DesignSafe interested in the data will be directed to contact the project PI or designated point of contact through a published email address to request access to the data and to discuss the conditions for its reuse.
  5. Please contact DesignSafe through a help ticket or join curation office hours prior to preparing this type of data publication.
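
As an illustration of the anonymization rule in item 2, the sketch below screens a list of column names before publication. The identifier vocabularies and the function name are hypothetical and simply mirror the examples given above; treating "year of birth" and "age" as two separate columns is also an assumption. A real review would still require human judgment and curator input.

```python
# Hypothetical vocabularies based on the examples in the guidance above.
DIRECT_IDENTIFIERS = {
    "name", "initials", "facial_photo", "home_address", "ssn", "date_of_birth",
}
INDIRECT_IDENTIFIERS = {
    "gender", "household_composition", "family_composition",
    "occupation", "place_of_birth", "year_of_birth", "age",
}
MAX_INDIRECT = 3  # "up to three indirect identifiers are allowed"


def check_anonymization(columns):
    """Return (ok, problems) for a list of column names."""
    cols = {c.strip().lower() for c in columns}
    direct = sorted(cols & DIRECT_IDENTIFIERS)
    indirect = sorted(cols & INDIRECT_IDENTIFIERS)
    problems = []
    if direct:
        problems.append("direct identifiers present: " + ", ".join(direct))
    if len(indirect) > MAX_INDIRECT:
        problems.append(
            f"{len(indirect)} indirect identifiers present "
            f"(limit {MAX_INDIRECT}): " + ", ".join(indirect)
        )
    return (not problems, problems)


if __name__ == "__main__":
    ok, problems = check_anonymization(
        ["Name", "age", "gender", "occupation", "place_of_birth"]
    )
    print(ok)
    for problem in problems:
        print(problem)
```

A check like this only flags column names; it does not inspect free-text fields, photographs, or documents, which still need to be reviewed manually.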

Citing datasets in papers

Researchers using published data from the DesignSafe Data Depot must cite it using the DOI, which relies on the DataCite schema for accurate citation. For convenience, users can retrieve a formatted citation from the published data landing page. It is always recommended to insert the citations in the reference section of your paper.
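
For readers who assemble citations themselves, the following sketch builds a citation string from the general pattern DataCite recommends (Creator (PublicationYear). Title. Publisher. Identifier). The function name and all example values, including the DOI, are hypothetical placeholders; the formatted citation on the published data landing page remains the authoritative version.

```python
def format_citation(creators, year, title, publisher, doi):
    """Build 'Creator (PublicationYear). Title. Publisher. Identifier'."""
    # Accept a bare DOI or one already written with a common prefix.
    bare_doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if bare_doi.lower().startswith(prefix):
            bare_doi = bare_doi[len(prefix):].strip()
    authors = ", ".join(creators)
    clean_title = title.rstrip(".")  # avoid a doubled period after the title
    return f"{authors} ({year}). {clean_title}. {publisher}. https://doi.org/{bare_doi}"


if __name__ == "__main__":
    # Placeholder values for illustration only; this is not a real dataset.
    print(format_citation(
        ["Doe, J.", "Roe, R."], 2020, "Example Shake Table Tests",
        "DesignSafe-CI", "10.0000/example",
    ))
```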

Reusing data from other sources

You will frequently use data from other sources in your research, and sometimes you may even want to re-publish it. It is always good practice to give credit to the data creators and to make sure you are permitted to re-publish the data if you need to. Please be aware of the following:

  1. If you cite the data, make sure the citation includes a DOI, or otherwise a permanent URL, so that users can get directly to the cited data. Use the Related Work box in Edit Project to include the citation(s) and corresponding links.
  2. If you use external data in your analyses, you can point to it from the Referenced Data Title box as you create your analyses category.
  3. Be aware of the original license and conditions of use of the reused data. The license may specify if and how you can modify, distribute, and cite the reused data.
  4. If you have reused images from other sources (online, databases, publications, etc.), be aware that they may have copyrights. We recommend using the following instructions for how to cite them: http://guides.library.ubc.ca/c.php?g=698822&p=4965735

Responsibilities and Timelines

Researchers working at a NHERI EF will receive their bulk data files via the Data Depot. NHERI EF staff will deposit the data files into an existing Project created for the research project. For all other types of research (e.g., simulation, experimental work performed at a non-NHERI lab), it will be the responsibility of the research team to upload their data to the Data Depot. As noted previously, the research team is responsible for data curation and publishing. Although no firm timeline requirements are specified for data publishing, researchers are expected to publish in a timely manner. Recommended timelines for publishing different types of research data (i.e., Experimental, Simulation, and Reconnaissance) are listed in Table 1.

Guidelines specific to RAPID reconnaissance data can be found at rapid.designsafe-ci.org/media/filer_public/b3/82/b38231fb-21c9-41f8-b658-f516dfee87c8/rapid-designsafe_curation_guidelines_v3.pdf

Table 1. Recommended Publishing Timeline for Different Data Types

Project/Data Type                          Recommended Publishing Timeline
Experimental                               12 months from completion of experiment
Simulation                                 12 months from completion of simulations
Reconnaissance: Immediate Post-Disaster    3 months from returning from the field
Reconnaissance: Follow-up Research         6 months from returning from the field
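
The recommended timelines in Table 1 can be turned into target dates with a few lines of code. The sketch below is only a planning aid: the dictionary keys and function names are hypothetical, and the month counts are copied from the table.

```python
import calendar
from datetime import date

# Months recommended in Table 1, keyed by hypothetical short names.
RECOMMENDED_MONTHS = {
    "experimental": 12,              # from completion of experiment
    "simulation": 12,                # from completion of simulations
    "reconnaissance_immediate": 3,   # from returning from the field
    "reconnaissance_follow_up": 6,   # from returning from the field
}


def add_months(start, months):
    """Add calendar months, clamping the day to the end of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def recommended_publish_by(data_type, reference_date):
    """Return the recommended publication date for a data type."""
    return add_months(reference_date, RECOMMENDED_MONTHS[data_type])


if __name__ == "__main__":
    print(recommended_publish_by("experimental", date(2023, 5, 15)))
```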

Data Embargo

In DesignSafe-CI, a data embargo is the period during which a project is kept private while awaiting the review and publication of a corresponding paper. Please submit a help ticket to Data Curation & Publication and we will work with you to accomplish the following requests:

  • to provide access to reviewers before publishing your data;
  • to publish a dataset and obtain a DOI at the same time that you publish the corresponding paper in a journal.

Metadata Requirements

Overview of metadata best practices implementation in DesignSafe

Metadata is information that describes data. Metadata schemas provide a structured way for users to share metadata within and across domains. Because there is no standard schema to describe natural hazards engineering research and data, DesignSafe offers metadata sets to describe key components of datasets. These were developed in close consultation with researchers in the natural hazards community. The terms are evolving, and they are and will continue to be expanded, updated, and corrected as we gather feedback and observe how researchers use them in their publications.

To further help describe data, DesignSafe offers the ability to add predefined and custom file tags during the curation process. The predefined file tags are agreed-upon terms provided by the natural hazards community. File tags are optional, but highly recommended. Users can add multiple file tags to one file and add tags to folders.

DesignSafe’s metadata approach maps community terms to elements of widely-used, standardized schemas so that metadata can be exchanged with other platforms. The schemas to which terms have been mapped are: Dublin Core for description of the research project and the data publication, DDI (Data Documentation Initiative) for social science data description, and DataCite for DOI assignment and citation.
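
To make the idea of mapping community terms to standard schemas concrete, here is a minimal sketch of a crosswalk. The pairings below are illustrative assumptions that use real Dublin Core element names and DataCite property names; they are not DesignSafe's actual mapping, and the function name is hypothetical.

```python
# Illustrative crosswalk only; the pairings are assumptions, not the
# mapping DesignSafe actually uses.
CROSSWALK = {
    "Project Title": {"dublin_core": "title", "datacite": "Title"},
    "Author": {"dublin_core": "creator", "datacite": "Creator"},
    "Description": {"dublin_core": "description", "datacite": "Description"},
    "Publisher": {"dublin_core": "publisher", "datacite": "Publisher"},
    "Date of Publication": {"dublin_core": "date", "datacite": "PublicationYear"},
    "Licenses": {"dublin_core": "rights", "datacite": "Rights"},
    "Keywords": {"dublin_core": "subject", "datacite": "Subject"},
}


def to_schema(record, schema):
    """Rename the keys of a metadata record to the chosen schema's names.

    Terms without an entry in the crosswalk are dropped. Values are passed
    through unchanged; a real export would also reformat them where needed.
    """
    out = {}
    for term, value in record.items():
        target = CROSSWALK.get(term, {}).get(schema)
        if target is not None:
            out[target] = value
    return out


if __name__ == "__main__":
    record = {"Project Title": "Example Project", "Keywords": ["wind", "surge"]}
    print(to_schema(record, "dublin_core"))
    print(to_schema(record, "datacite"))
```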

Due to variations in research domains and their methods, users may not need to use all of the elements available to describe their research. However, we identified a set of metadata terms that represent the structure of the data, are useful for discovery, and will allow proper citation of data. To ensure the quality of published data in DesignSafe, when users request to publish data the system checks for completeness of these core terms and/or whether data are associated with them. The element set is shown below.

KEY (to help understand usage of the metadata below)

Metadata not marked with $ or † is required.

(*) The metadata is repeatable, with multiple entries allowed.

($) Recommended if exists.

(†) System-generated.
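
The key above can be read mechanically. The sketch below parses the trailing markers on a field label and reports which required fields are empty in a metadata record. It is only an illustration of the key, not the check DesignSafe itself runs; the function names and the sample record are hypothetical.

```python
MARKERS = "*$†"


def parse_field(label):
    """Split a label such as 'Related Works*$' into its name and usage flags."""
    name = label.rstrip(MARKERS)
    markers = label[len(name):]
    return {
        "name": name.strip(),
        "repeatable": "*" in markers,
        # Per the key: anything not marked $ or † is required.
        "required": "$" not in markers and "†" not in markers,
        "system_generated": "†" in markers,
    }


def missing_required(labels, record):
    """List required field names that are absent or empty in the record."""
    missing = []
    for label in labels:
        field = parse_field(label)
        value = record.get(field["name"])
        if field["required"] and value in (None, "", []):
            missing.append(field["name"])
    return missing


if __name__ == "__main__":
    labels = ["Project Title", "Author(s)*", "Description", "Related Works*$", "Keywords"]
    record = {"Project Title": "Example", "Author(s)": ["Doe, J."], "Description": ""}
    print(missing_required(labels, record))
```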

 

Experimental Research Project

  • DOI
  • Project Title
  • Author (PIs/Team Members)*
  • Participant Institution*
  • Project Type*
  • Description
  • Publisher
  • Date of Publication
  • Licenses
  • Related Works*$
  • Award*
  • Keywords
  • Experiment*
    • Report
    • DOI
    • Experiment Title
    • Author (PIs/Team Members)*
    • Experiment Description
    • Date of Publication
    • Dates of Experiment
    • Experimental Facility
    • Experiment Type
    • Equipment Type*
    • Model Configuration*
    • Sensor Information*
    • Event*
    • Experiment Report$
  • Analysis*$
    • Analysis Title
    • Description
    • Referenced Data*

 

Simulation Research Project

  • DOI
  • Project Title
  • Author (PIs/Team Members)*
  • Participant Institution*
  • Project Type*
  • Description
  • Publisher
  • Date of Publication
  • Licenses
  • Related Works*$
  • Award*
  • Keywords
  • Simulation*
    • Report
    • Simulation Title
    • Author (PIs/Team Members)*
    • Description
    • Simulation Type
    • Simulation Model
    • Simulation Input*
    • Simulation Output*
    • Referenced Data*
    • Simulation Report$
  • Analysis*$
    • Analysis Title
    • Description
    • Referenced Data*

 

Hybrid Simulation Research Project

  • DOI
  • Project Title
  • Author (PIs/Team Members)*
  • Participant Institution*
  • Project Type*
  • Description
  • Publisher
  • Date of Publication
  • Licenses
  • Related Works*$
  • Award*
  • Keywords
  • Hybrid Simulation*
    • Report
    • Global Model
      • Global Model Title
      • Description
    • Master Simulation Coordinator
      • Master Simulation Coordinator Title
      • Application and Version
      • Substructure Middleware
    • Simulation Substructure*
      • Simulation Substructure Title
      • Application and Version
      • Description
    • Experiment Substructure*
      • Experiment Substructure Title
      • Description

 

Field Research Project

  • Project Title
  • PI/Co-PI(s)*
  • Project Type
  • Description
  • Related Work(s)*$
  • Award(s)*$
  • Keywords
  • Natural Hazard Event
  • Natural Hazard Date
  • Documents Collection*$
    • Author(s)*
    • Date of Publication
    • DOI
    • Publisher
    • License(s)*
    • Referenced Data*$
    • Description
  • Mission*
    • Mission Title
    • Author(s)*
    • Date(s) of Mission
    • Mission Site Location
    • Date of Publication
    • DOI
    • Publisher
    • License(s)*
    • Mission Description
    • Research Planning Collection*$
      • Collection Title
      • Data Collector(s)*
      • Referenced Data*$
      • Collection Description
    • Social Sciences Collection*
      • Collection Title
      • Unit of Analysis$
      • Mode(s) of Collection*$
      • Sampling Approach(es)*$
      • Sample Size$
      • Date(s) of Collection
      • Data Collector(s)*
      • Collection Site Location
      • Equipment*
      • Restriction$
      • Referenced Data*$
      • Collection Description
    • Engineering/Geosciences Collection*
      • Collection Title
      • Observation Type*
      • Date(s) of Collection
      • Data Collector(s)*
      • Collection Site Location
      • Equipment*
      • Referenced Data*$
      • Collection Description

 

Other

  • DOI
  • Project Title
  • Author(s)*
  • Data Type
  • Description
  • Publisher
  • Date of Publication
  • License(s)
  • Related Works*$
  • Award*
  • Keywords

Licensing

Within DesignSafe, you will choose a license for your material. Because the DesignSafe Data Depot is an open repository, the following licenses will be offered:

  • For datasets: ODC-PDDL and ODC-BY
  • For copyrightable materials (for example, documents, workflows, designs, etc.): CC0 and CC-BY
  • For code: any open, non-commercial license (for example, GPL)

You should select appropriate licenses for your data after identifying which license best fits your needs and institutional standards. Note that datasets are not copyrightable materials.
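
The license options listed above amount to a small lookup table. The following sketch encodes them so that a chosen license can be checked against the material type; the category keys and function names are hypothetical, and the identifiers are copied from the list above.

```python
# License identifiers as abbreviated in the list above.
OFFERED_LICENSES = {
    "dataset": ("ODC-PDDL", "ODC-BY"),
    "work": ("CC0", "CC-BY"),
    # The text names GPL only as an example of an acceptable code license.
    "code": ("GPL",),
}


def license_options(material_type):
    """Return the licenses listed for a material type."""
    try:
        return OFFERED_LICENSES[material_type]
    except KeyError:
        raise ValueError(f"unknown material type: {material_type!r}") from None


def is_offered(material_type, license_id):
    """Check whether a license is listed for the given material type."""
    return license_id in license_options(material_type)


if __name__ == "__main__":
    print(license_options("dataset"))
    print(is_offered("dataset", "CC-BY"))
```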


Available Licenses for Publishing Datasets in DesignSafe

DATASETS

If you are publishing data, such as simulation or experimental data, choose between:

Open Data Commons Attribution
Recommended

 

  • You allow others to freely share, reuse, and adapt your data/database.
  • You expect to be attributed for any public use of the data/database.
  • Derived work from the reuse of this data/database will carry the same terms and conditions of this license.
  • You do not give all of your rights away.

 

Please read the License Website

Open Data Commons Public Domain Dedication
Consider and read carefully

 

  • You allow others to freely share, modify, and use this data/database for any purpose without any restrictions.
  • You do not expect to be attributed for it.
  • You give all of your rights away.

 

Please read the License Website

WORKS

If you are publishing papers, presentations, learning objects, workflows, designs, etc., choose between:

Creative Commons Attribution Share Alike
Recommended

 

  • You allow others to freely share, reuse, and adapt your work.
  • You expect to be attributed for any public use of your work.
  • Derived work that is published will carry the same terms and conditions of this license.
  • You retain your copyright.

 

Please read the License Website

Creative Commons Public Domain Dedication
Consider and read carefully

 

  • You allow others to freely share, modify, and use this work for any purpose without any restrictions.
  • You do not expect to be attributed for it.
  • You give all of your rights away.

 

Please read the License Website

SOFTWARE

If you are publishing community software, scripts, libraries, applications, etc., choose the following:

GNU General Public License

 

  • You give permission to modify, copy, and redistribute the work or any derivative version.
  • The licensee is free to choose whether or not to charge a fee for services that use this work.
  • The licensee cannot impose further restrictions on the rights granted by this license.

 

Please read the License Website
 

Data Archiving and Preservation

Depositing your data and associated research project materials in the DesignSafe Data Depot will meet NSF requirements for data management. DesignSafe will persistently maintain all uploaded data on storage resources at the Texas Advanced Computing Center, and these resources are redundant and geographically replicated. DesignSafe operates a dedicated Fedora repository platform to ensure the authenticity, integrity, security and persistence of published datasets for open access.