This document provides data curation guidance and best practices for researchers who will use the NHERI DesignSafe cyberinfrastructure (CI) to share and publish natural hazards engineering data. Specific guidelines are provided for researchers using the NHERI Experimental Facilities (EF), for researchers using the NHERI RAPID facility, and for researchers performing simulation-based research.
Data curation comprises all the activities undertaken to produce organized, documented data that is easy to reuse. Using the data management tools in DesignSafe, researchers can curate their own data progressively as their research advances. On-demand assistance from a curator is available to provide training and to guide users through their data curation and publication needs. When curation is complete, researchers can publish the dataset with a permanent digital object identifier (DOI) that allows the data to be easily located on the web and cited. Features are in place to ensure the authenticity, integrity, security, and persistence of the datasets for open access. DesignSafe is committed to the continuity of data preservation beyond the conclusion of the DesignSafe project.
To cite the use of DesignSafe in your research, please reference the following paper:
Rathje, E., Dawson, C., Padgett, J.E., Pinelli, J.-P., Stanzione, D., Adair, A., Arduino, P., Brandenberg, S.J., Cockerill, T., Dey, C., Esteva, M., Haan, Jr., F.L., Hanlon, M., Kareem, A., Lowes, L., Mock, S., and Mosqueda, G. 2017. “DesignSafe: A New Cyberinfrastructure for Natural Hazards Engineering,” ASCE Natural Hazards Review, doi: 10.1061/(ASCE)NH.1527-6996.0000246
DesignSafe provides an end-to-end data management, analysis and publication platform for both experimental and simulation-based research. Within the DesignSafe Data Depot, researchers have access to a private “My Data” space, a collaborative “My Projects” space, and a “Published” space for published datasets.
All research data collected as part of a research project, as well as processing scripts, data analysis products, and simulation models/results generated, can be deposited in the Data Depot from the inception of the project. These data are kept private within a Project until published by the research team. Using a Project to share data with your team members during the course of a project facilitates the progressive curation of data and eventual publishing.
Each research team is responsible for curating its data using the data management tools provided by DesignSafe. These tools help researchers organize, categorize, and describe their data within the DesignSafe Data Depot. Assistance from a curator is available to provide training and to guide users through the data curation process. Once data is curated and ready to be published, it is vetted against the research community’s minimum metadata requirements (www.designsafe-ci.org/rw/support/data-publication) and then receives a DOI for persistent identification and ease of data sharing and citation. Researchers using published data from the DesignSafe Data Depot must cite it using the DOI, which relies on the DataCite schema for accurate citation (http://schema.datacite.org/).
Guidelines Regarding the Storage and Publication of Protected Data in DesignSafe-CI
Researchers should always comply with the requirements, norms and procedures approved by the Institutional Review Board (IRB) or equivalent body, regarding human subjects’ data storage and publication.
Protected data includes human subjects data with Personally Identifiable Information (PII), data that is protected under HIPAA, FERPA and FISMA regulations, as well as data that involves vulnerable populations or contains sensitive information.
Storing Protected Data
DesignSafe My Data and My Projects are secure spaces to store raw protected data as long as it is not subject to HIPAA, FERPA or FISMA regulations. If data needs to comply with these regulations, researchers must contact DesignSafe through a help ticket so that the case can be evaluated and, where appropriate, use TACC's Protected Data Service. Researchers with questions are welcome to submit a ticket or join curation office hours.
Publishing Protected Data
To publish protected data, researchers should adhere to the following procedures:
Researchers using published data from the DesignSafe Data Depot must cite it using the DOI, which relies on the DataCite schema for accurate citation. For convenience, users can retrieve a formatted citation from the published data landing page. It is always recommended to insert the citations in the reference section of your paper.
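As a rough illustration of what a DOI-based data citation contains, the sketch below assembles a citation string in the general "Creators (Year). Title. Publisher. DOI link" pattern. The `DatasetRecord` class, its field names, and the example values are hypothetical and are not part of any DesignSafe interface; the formatted citation shown on the published data landing page remains the one to use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Minimal citation metadata. Field names are illustrative assumptions."""
    creators: List[str]  # each in "Family, G." form
    year: int            # publication year
    title: str
    publisher: str
    doi: str             # bare DOI without the resolver prefix


def format_citation(rec: DatasetRecord) -> str:
    """Build 'Creators (Year). Title. Publisher. https://doi.org/<doi>'."""
    authors = "; ".join(rec.creators)
    return (
        f"{authors} ({rec.year}). {rec.title}. "
        f"{rec.publisher}. https://doi.org/{rec.doi}"
    )


# Every value below is made up for the example.
example = DatasetRecord(
    creators=["Doe, J.", "Roe, R."],
    year=2020,
    title="Hypothetical Shake Table Test Dataset",
    publisher="DesignSafe-CI",
    doi="10.0000/hypothetical-doi",
)
print(format_citation(example))
```

Writing the DOI as a full resolver link keeps the reference clickable in a paper's reference list.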
You will frequently use data from other sources in your research, and sometimes you may even want to re-publish it. It is always good practice to give credit to the data creators and to confirm that you are permitted to re-publish the data if you need to. Please be aware of the following:
Researchers working at a NHERI EF will receive their bulk data files via the Data Depot. NHERI EF staff will deposit the data files into an existing Project created for the research project. For all other types of research (e.g., simulation, experimental work performed at a non-NHERI lab), it will be the responsibility of the research team to upload their data to the Data Depot. As noted previously, the research team is responsible for data curation and publishing. Although no firm timeline requirements are specified for data publishing, researchers are expected to publish in a timely manner. Recommended timelines for publishing different types of research data (i.e., Experimental, Simulation, and Reconnaissance) are listed in Table 1.
Guidelines specific to RAPID reconnaissance data can be found at rapid.designsafe-ci.org/media/filer_public/b3/82/b38231fb-21c9-41f8-b658-f516dfee87c8/rapid-designsafe_curation_guidelines_v3.pdf
| Project/Data Type | Recommended Publishing Timeline |
|---|---|
| Experimental | 12 months from completion of experiment |
| Simulation | 12 months from completion of simulations |
| Reconnaissance: Immediate Post-Disaster | 3 months from returning from the field |
| Reconnaissance: Follow-up Research | 6 months from returning from the field |
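The recommended timelines in Table 1 can be turned into a target date with simple calendar arithmetic. The following is a minimal sketch; the dictionary keys and function names are illustrative choices, and clamping to the last day of a shorter month is an assumption about how to handle month-end dates.

```python
import calendar
from datetime import date

# Months recommended in Table 1. The keys are illustrative labels only.
RECOMMENDED_MONTHS = {
    "experimental": 12,
    "simulation": 12,
    "reconnaissance_immediate": 3,
    "reconnaissance_followup": 6,
}


def add_months(start: date, months: int) -> date:
    """Return the date `months` later, clamping the day to the month's length."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def recommended_publish_by(project_type: str, milestone: date) -> date:
    """Milestone is the experiment/simulation completion or field-return date."""
    return add_months(milestone, RECOMMENDED_MONTHS[project_type])


# Returning from the field on 30 November 2023 gives a target of 29 February 2024.
print(recommended_publish_by("reconnaissance_immediate", date(2023, 11, 30)))
```

These dates are recommendations only; as noted above, no firm timeline is required.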
In DesignSafe-CI this refers to the time during which a project is not made public while awaiting the review and publication of a corresponding paper. Please submit a help ticket to Data Curation & Publication and we will work with you to accomplish the following requests:
Overview of metadata best practices implementation in DesignSafe
Metadata is information that describes data. Metadata schemas provide a structured way for users to share metadata within and across domains. Because there is no standard schema to describe natural hazards engineering research and data, DesignSafe offers metadata sets to describe key components of datasets. These were developed in close consultation with researchers in the natural hazards community. The terms are evolving, and they are and will continue to be expanded, updated, and corrected as we gather feedback and observe how researchers use them in their publications.
To further help describe data, DesignSafe offers the ability to add predefined and custom file tags during the curation process. The predefined file tags are agreed-upon terms provided by the natural hazards community. Tags are optional but highly recommended. Users can add multiple file tags to one file and can also tag folders.
DesignSafe’s metadata approach maps community terms to elements of widely-used, standardized schemas so that metadata can be exchanged with other platforms. The schemas to which terms have been mapped are: Dublin Core for description of the research project and the data publication, DDI (Data Documentation Initiative) for social science data description, and DataCite for DOI assignment and citation.
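The idea of mapping community terms to standardized schemas can be pictured as a small crosswalk table. The sketch below is illustrative only: the local term names on the left are hypothetical, and the pairing shown is not DesignSafe's actual mapping. The Dublin Core element names (title, creator, description, subject) and DataCite property names (titles, creators, descriptions, subjects) do exist in those schemas.

```python
from typing import Dict

# Hypothetical crosswalk: local term -> {target schema: element name}.
# This is an assumption for illustration, not the platform's real mapping.
CROSSWALK: Dict[str, Dict[str, str]] = {
    "project_title": {"dublin_core": "title", "datacite": "titles"},
    "authors": {"dublin_core": "creator", "datacite": "creators"},
    "description": {"dublin_core": "description", "datacite": "descriptions"},
    "keywords": {"dublin_core": "subject", "datacite": "subjects"},
}


def export_record(record: Dict[str, object], schema: str) -> Dict[str, object]:
    """Rename local terms to the target schema's names; drop unmapped terms."""
    exported: Dict[str, object] = {}
    for term, value in record.items():
        target = CROSSWALK.get(term, {}).get(schema)
        if target is not None:
            exported[target] = value
    return exported


local = {"project_title": "Hypothetical project", "keywords": ["wind"], "internal_note": "x"}
print(export_record(local, "dublin_core"))
```

Keeping the mapping in data, separate from the export code, means new community terms can be added without touching the export logic.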
Due to variations in research domains and their methods, users may not need to use all of the elements available to describe their research. However, we identified a set of metadata terms that represent the structure of the data, are useful for discovery, and will allow proper citation of data. To ensure the quality of published data in DesignSafe, when users request to publish data the system checks that these core terms are complete and/or that data are associated with them. The element set is shown below.
KEY (to help understand usage of the metadata below)
Metadata not marked with $ or † is required.
(*) The metadata is repeatable, with multiple entries allowed.
($) Recommended if exists.
(†) System-generated.
Experimental Research Project
View Metadata Dictionary
Simulation Research Project
View Metadata Dictionary
Hybrid Simulation Research Project
View Metadata Dictionary
Field Research Project
View Metadata Dictionary
Other
View Metadata Dictionary
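The key above (required by default, repeatable, recommended, system-generated) suggests how a pre-publication completeness check might work. The following is a minimal sketch under stated assumptions: the `Field` class and the element names are hypothetical and do not reproduce any of the metadata dictionaries, and the real check performed by DesignSafe may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Field:
    name: str
    required: bool = True           # unmarked fields are required
    repeatable: bool = False        # (*) multiple entries allowed
    system_generated: bool = False  # (†) filled in by the system


# Hypothetical element set for illustration only.
ELEMENTS = [
    Field("title"),
    Field("authors", repeatable=True),
    Field("description"),
    Field("keywords", repeatable=True),
    Field("awards", required=False, repeatable=True),     # ($) recommended
    Field("doi", required=False, system_generated=True),  # (†)
]


def missing_required(record: Dict[str, Any], elements: List[Field]) -> List[str]:
    """List required, user-supplied fields that are absent or empty."""
    missing = []
    for field in elements:
        if not field.required or field.system_generated:
            continue
        value = record.get(field.name)
        if field.repeatable:
            ok = isinstance(value, list) and any(str(v).strip() for v in value)
        else:
            ok = isinstance(value, str) and bool(value.strip())
        if not ok:
            missing.append(field.name)
    return missing


draft = {
    "title": "Hypothetical wave flume tests",
    "authors": ["Doe, J."],
    "description": " ",
    "keywords": [],
}
print(missing_required(draft, ELEMENTS))  # ['description', 'keywords']
```

Recommended and system-generated fields are skipped, so an empty `awards` list or a not-yet-assigned `doi` does not block the check.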
Within DesignSafe, you will choose a license for your material. Because the DesignSafe Data Depot is an open repository, the following licenses will be offered:
You should select appropriate licenses for your data after identifying which license best fits your needs and institutional standards. Note that datasets are not copyrightable materials.
Available Licenses for Publishing Datasets in DesignSafe
If you are publishing data, such as simulation or experimental data, choose between:
Please read the License Website
Please read the License Website
If you are publishing papers, presentations, learning objects, workflows, designs, etc., choose between:
Please read the License Website
Please read the License Website
If you are publishing community software, scripts, libraries, applications, etc., choose the following:
GNU General Public License
Depositing your data and associated research project materials in the DesignSafe Data Depot will meet NSF requirements for data management. DesignSafe will persistently maintain all uploaded data on storage resources at the Texas Advanced Computing Center, and these resources are redundant and geographically replicated. DesignSafe operates a dedicated Fedora repository platform to ensure the authenticity, integrity, security and persistence of published datasets for open access.