Data Publication Guidelines

Guidance and best practices for publishing data

This document provides data curation guidance and best practices for researchers who will use the NHERI DesignSafe cyberinfrastructure (CI) to share and publish natural hazards engineering data. Specific guidelines are provided for researchers using the NHERI Experimental Facilities (EF), for researchers using the NHERI RAPID facility, and for researchers performing simulation-based research.

Data curation comprises all the activities undertaken to generate organized and documented data that is easy to re-use. Using data management tools in DesignSafe, researchers are empowered to progressively curate their own data as their research progresses. On-demand assistance from a curator is available to provide training and to guide users through their data curation and publication needs. When curation is complete, researchers can publish the dataset with a permanent digital object identifier (DOI) that allows the data to be easily located on the web and cited. Features are in place to ensure the authenticity, integrity, security and persistence of the datasets for open access. DesignSafe is committed to the continuity of data preservation beyond the conclusion of the DesignSafe project.

To cite the use of DesignSafe in your research, please reference the following paper:

Rathje, E., Dawson, C., Padgett, J.E., Pinelli, J.-P., Stanzione, D., Adair, A., Arduino, P., Brandenberg, S.J., Cockerill, T., Dey, C., Esteva, M., Haan, Jr., F.L., Hanlon, M., Kareem, A., Lowes, L., Mock, S., and Mosqueda, G. 2017. “DesignSafe: A New Cyberinfrastructure for Natural Hazards Engineering,” ASCE Natural Hazards Review, doi: 10.1061/(ASCE)NH.1527-6996.0000246

Data Sharing and Publishing

DesignSafe provides an end-to-end data management, analysis and publication platform for both experimental and simulation-based research. Within the DesignSafe Data Depot, researchers have access to a private “My Data” space, a collaborative “My Projects” space, and a “Published” space for published datasets.

All research data collected as part of a research project, as well as processing scripts, data analysis products, and simulation models/results generated, can be deposited in the Data Depot from the inception of the project. These data are kept private within a Project until published by the research team. Using a Project to share data with your team members during the course of a project facilitates the progressive curation of data and eventual publishing.

Each research team is responsible for curating its data using the data management tools provided by DesignSafe. These tools help researchers organize, categorize, and describe their data within the DesignSafe Data Depot. Assistance from a curator is available to provide training and to guide users through the data curation process. Once data is curated and ready to be published, it is vetted against the research community’s minimum metadata requirements (www.designsafe-ci.org/rw/support/data-publication) and then receives a DOI for persistent identification and ease of data sharing and citation. Researchers using published data from the DesignSafe Data Depot must cite it using the DOI, which relies on the DataCite schema for accurate citation (http://schema.datacite.org/).

Citing datasets in papers

Researchers using published data from the DesignSafe Data Depot must cite it using the DOI, which relies on the DataCite schema for accurate citation. For convenience, users can retrieve a formatted citation from the published data landing page. We recommend always including these citations in the reference section of your paper.
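As a rough illustration of what a DOI-based dataset citation contains, the sketch below assembles one from a few DataCite-style fields (creators, publication year, title, publisher, DOI). The function name, the field layout, and the sample record are hypothetical, and the DOI is a placeholder. The formatted citation on the published data landing page remains the authoritative version.

```python
def format_dataset_citation(creators, year, title, publisher, doi):
    """Build a citation of the form:
    Creators (Year). Title. Publisher. https://doi.org/<DOI>
    """
    author_text = "; ".join(creators)
    # https://doi.org/ is the standard resolver prefix for DOIs.
    return f"{author_text} ({year}). {title}. {publisher}. https://doi.org/{doi}"


# Hypothetical record: the DOI below is a placeholder, not a real dataset.
citation = format_dataset_citation(
    creators=["Doe, J.", "Roe, R."],
    year=2020,
    title="Example Shake Table Test Dataset",
    publisher="DesignSafe-CI",
    doi="10.0000/example-doi",
)
print(citation)
```

Writing the DOI as a resolvable https://doi.org/ link lets readers go directly to the cited data.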

Reusing data from other sources

You will frequently use data from other sources in your research, and sometimes you may want to re-publish it. It is always good practice to give credit to the data creators and to make sure you are permitted to re-publish the data if you need to. Please be aware of the following:

  1. If you cite the data, make sure the citation includes a DOI (preferred) or a permanent URL so that users can go directly to the cited data. Use the Related Work box in Edit Project to include the citation(s) and corresponding links.
  2. If you use external data in your analyses, you can point to it from the Referenced Data Title box as you create your analysis category.
  3. Be aware of the original license conditions of the reused data. The license may specify if and how you can modify, distribute, and cite the reused data.
  4. If you have reused images from other sources (online, databases, publications, etc.), be aware that they may have copyrights. We recommend using the following instructions for how to cite them: http://guides.library.ubc.ca/c.php?g=698822&p=4965735

Responsibilities and Timelines

Researchers working at a NHERI EF will receive their bulk data files via the Data Depot. NHERI EF staff will deposit the data files into an existing Project created for the research project. For all other types of research (e.g., simulation, experimental work performed at a non-NHERI lab), it will be the responsibility of the research team to upload their data to the Data Depot. As noted previously, the research team is responsible for data curation and publishing. Although no firm timeline requirements are specified for data publishing, researchers are expected to publish in a timely manner. Recommended timelines for publishing different types of research data (i.e., Experimental, Simulation, and Reconnaissance) are listed in Table 1.

Guidelines specific to RAPID reconnaissance data can be found at rapid.designsafe-ci.org/media/filer_public/b3/82/b38231fb-21c9-41f8-b658-f516dfee87c8/rapid-designsafe_curation_guidelines_v3.pdf

Table 1. Recommended Publishing Timeline for Different Data Types

Project/Data Type                          Recommended Publishing Timeline
Experimental                               12 months from completion of experiment
Simulation                                 12 months from completion of simulations
Reconnaissance: Immediate Post-Disaster    3 months from returning from the field
Reconnaissance: Follow-up Research         6 months from returning from the field
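
For planning purposes, the recommended timelines in Table 1 can be turned into a target date. The following is a minimal sketch, not a DesignSafe tool: the function names are hypothetical, and clamping to the last day of a shorter month is an assumption made here to keep the arithmetic well defined.

```python
import calendar
from datetime import date

# Recommended timelines from Table 1, in months.
RECOMMENDED_MONTHS = {
    "Experimental": 12,
    "Simulation": 12,
    "Reconnaissance: Immediate Post-Disaster": 3,
    "Reconnaissance: Follow-up Research": 6,
}


def add_months(start: date, months: int) -> date:
    """Shift a date forward by whole months, clamping to the month's last day."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def recommended_publish_by(data_type: str, reference_date: date) -> date:
    """reference_date is the completion date or the return-from-field date."""
    return add_months(reference_date, RECOMMENDED_MONTHS[data_type])


print(recommended_publish_by("Experimental", date(2023, 5, 15)))
```

Because these timelines are recommendations rather than firm requirements, the result is best read as a planning target.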

Licensing

Within DesignSafe, you will choose a license for your material. Because the DesignSafe Data Depot is an open repository, the following licenses will be offered:

  • For datasets: ODC-PDDL and ODC-BY
  • For copyrightable materials (for example, documents, workflows, designs, etc.): CC0 and CC-BY
  • For code: any open, non-commercial license (for example, GPL)

You should select appropriate licenses for your data after identifying which license best fits your needs and institutional standards. Note that datasets are not copyrightable materials.

Data Archiving and Preservation

Depositing your data and associated research project materials in the DesignSafe Data Depot will meet NSF requirements for data management. DesignSafe will persistently maintain all uploaded data on storage resources at the Texas Advanced Computing Center, and these resources are redundant and geographically replicated. DesignSafe operates a dedicated Fedora repository platform to ensure the authenticity, integrity, security and persistence of published datasets for open access.

Data Embargo

In DesignSafe-CI, a data embargo is the period during which a project is not made public while awaiting the review and publication of a corresponding paper. Please submit a help ticket to Data Curation & Publication and we will work with you to accomplish the following requests:

  • to provide access to reviewers before publishing your data;
  • to reserve a DOI for your data before making it public;
  • to publish a dataset at the same time that you publish the corresponding paper in a journal.

Metadata Requirements

Overview of metadata best practices implementation in DesignSafe

Metadata is information that describes data. Metadata schemas provide a structured way for users to share metadata within and across domains. Because there is no standard schema to describe natural hazards engineering research and data, DesignSafe offers metadata sets to describe key components of datasets. These were developed in close consultation with researchers in the natural hazards community. The terms are evolving and will continue to be expanded, updated, and corrected as we gather feedback and observe how researchers use them in their publications.

DesignSafe’s metadata approach maps community terms to elements of widely-used, standardized schemas so that metadata can be exchanged with other platforms. The schemas to which terms have been mapped are: Dublin Core for description of the research project and the data publication, PROV to display provenance relationships between data and the processes from which it derives, and DataCite for DOI assignment and citation.
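
To make the idea of mapping community terms onto a standard schema concrete, here is a small sketch that renames a few project-level terms to Dublin Core elements. The Dublin Core element names are real, but the specific pairings, the function name, and the sample record are assumptions for illustration only and do not represent DesignSafe's actual crosswalk.

```python
# Illustrative mapping from project-level terms to Dublin Core elements.
# These pairings are assumptions for this sketch, not the official mapping.
TERM_TO_DUBLIN_CORE = {
    "Project Title": "dc:title",
    "Author": "dc:creator",
    "Description": "dc:description",
    "Publisher": "dc:publisher",
    "Date of Publication": "dc:date",
    "Licenses": "dc:rights",
    "Keywords": "dc:subject",
}


def to_dublin_core(project_metadata):
    """Return (mapped, unmapped): fields keyed by Dublin Core element,
    plus the list of terms that have no entry in the mapping."""
    mapped, unmapped = {}, []
    for term, value in project_metadata.items():
        element = TERM_TO_DUBLIN_CORE.get(term)
        if element is None:
            unmapped.append(term)
        else:
            mapped[element] = value
    return mapped, unmapped


# Hypothetical record used only to exercise the function.
mapped, unmapped = to_dublin_core({
    "Project Title": "Example Project",
    "Keywords": ["earthquake", "shake table"],
    "Award": "hypothetical award number",
})
print(mapped)
print(unmapped)
```

Terms with no counterpart in the mapping are reported rather than dropped, so gaps in a crosswalk stay visible.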

Due to variations in research domains and their methods, users may not need to use all of the elements available to describe their research. However, we identified a set of metadata terms that represent the structure of the data, are useful for discovery, and will allow proper citation of data. To ensure the quality of published data in DesignSafe, when users request to publish data, the system checks these core terms for completeness and/or checks whether data are associated with them. The element set is shown below.

KEY (to help understand usage of the terms below)

(bold) Denotes the structure of the data. For example, an experimental project may have more than one experiment and more than one corresponding analysis.

(*) The metadata is repeatable, with multiple entries allowed.

($) Recommended if exists. For example, not every project will include an analysis.

(†) System-generated.

 

Experimental Research Project

  • DOI
  • Project Title
  • Author (PIs/Team Members)*
  • Participant Institution*
  • Project Type*
  • Description
  • Publisher
  • Date of Publication
  • Licenses
  • Related Works*$
  • Award*
  • Keywords
  • Experiment*
    • Report
    • DOI
    • Experiment Title
    • Author (PIs/Team Members)*
    • Experiment Description
    • Date of Publication
    • Dates of Experiment
    • Experimental Facility
    • Experiment Type
    • Equipment Type*
    • Model Configuration*
    • Sensor Information*
    • Event*
    • Experiment Report$
  • Analysis*$
    • Analysis Title
    • Description
    • Referenced Data*
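
The nesting in the list above (a project containing experiments and analyses) can be pictured as a small data model. The sketch below is only one possible reading: the class and field names are hypothetical, a subset of the terms is shown, repeatable (*) terms are modeled as lists, and recommended-if-exists ($) terms are optional.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Analysis:
    title: str
    description: str
    referenced_data: List[str] = field(default_factory=list)  # repeatable (*)


@dataclass
class Experiment:
    title: str
    description: str
    experimental_facility: str
    experiment_type: str
    equipment_types: List[str] = field(default_factory=list)       # repeatable (*)
    model_configurations: List[str] = field(default_factory=list)  # repeatable (*)
    sensor_information: List[str] = field(default_factory=list)    # repeatable (*)
    events: List[str] = field(default_factory=list)                # repeatable (*)
    report: Optional[str] = None                                   # recommended if exists ($)


@dataclass
class ExperimentalProject:
    title: str
    description: str
    authors: List[str]                                            # repeatable (*)
    experiments: List[Experiment] = field(default_factory=list)   # repeatable (*)
    analyses: List[Analysis] = field(default_factory=list)        # repeatable, if exists (*$)


# Hypothetical project with one experiment and no analysis.
project = ExperimentalProject(
    title="Example Project",
    description="Illustrative record only.",
    authors=["Doe, J."],
    experiments=[
        Experiment(
            title="Test 1",
            description="Illustrative experiment.",
            experimental_facility="Example facility",
            experiment_type="Shake table",
            events=["Event A", "Event B"],
        )
    ],
)
print(len(project.experiments), len(project.analyses))
```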

 

Simulation Research Project

  • DOI
  • Project Title
  • Author (PIs/Team Members)*
  • Participant Institution*
  • Project Type*
  • Description
  • Publisher
  • Date of Publication
  • Licenses
  • Related Works*$
  • Award*
  • Keywords
  • Simulation*
    • Report
    • Simulation Title
    • Author (PIs/Team Members)*
    • Description
    • Simulation Type
    • Simulation Model
    • Simulation Input*
    • Simulation Output*
    • Referenced Data*
    • Simulation Report$
  • Analysis*$
    • Analysis Title
    • Description
    • Referenced Data*

 

Hybrid Simulation Research Project

  • DOI
  • Project Title
  • Author (PIs/Team Members)*
  • Participant Institution*
  • Project Type*
  • Description
  • Publisher
  • Date of Publication
  • Licenses
  • Related Works*$
  • Award*
  • Keywords
  • Hybrid Simulation*
    • Report
    • Global Model
      • Global Model Title
      • Description
    • Master Simulation Coordinator
      • Master Simulation Coordinator Title
      • Application and Version
      • Substructure Middleware
    • Simulation Substructure*
      • Simulation Substructure Title
      • Application and Version
      • Description
    • Experiment Substructure*
      • Experiment Substructure Title
      • Description

 

Field Research Project

  • DOI
  • Project Title
  • Author (PIs/Team Members)
  • Participant Institution
  • Project Type
  • Description
  • Publisher
  • Date of Publication
  • Licenses
  • Related Works
  • Award
  • Keywords
  • Natural Hazard Event
  • Natural Hazard Date
  • Mission
    • Mission Title
    • Date(s) of Mission
    • Mission Site Location
    • Mission Description
    • Collection
      • Collection Title
      • Observation Type
      • Date(s) of Collection
      • Data Collector(s)
      • Referenced Data
      • Collection Site Location
      • Instrument
      • Instrument Manufacturer and Model
      • Collection Description
      • Asset
        • Description$
        • Damage type$
        • Data type$
        • Data format$
        • Asset Site Information$
        • Observation Type$
  • Report
    • Title
    • Type
    • Data Collector
    • Date
    • Description
    • Referenced Data

 

Other

  • DOI
  • Project Title
  • Author (PIs/Team Members)*
  • Participant Institution*
  • Project Type*
  • Description
  • Publisher
  • Date of Publication
  • Licenses
  • Related Works*$
  • Award*
  • Keywords
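
The completeness check on core terms can be sketched as follows. This is an illustration of the idea only, not the check DesignSafe actually runs: the function name and sample record are hypothetical, DOI is omitted on the assumption that it is assigned only after vetting, and Related Works is treated as recommended ($) rather than required.

```python
# Project-level terms shared by every project type in the lists above.
# DOI is left out on the assumption that it is assigned after vetting.
CORE_PROJECT_TERMS = [
    "Project Title",
    "Author",
    "Participant Institution",
    "Project Type",
    "Description",
    "Publisher",
    "Date of Publication",
    "Licenses",
    "Award",
    "Keywords",
]
RECOMMENDED_TERMS = ["Related Works"]  # marked ($): recommended if exists


def check_project_metadata(metadata):
    """Return (missing, suggested): core terms that are absent or empty,
    and recommended terms that could still be added."""
    missing = [t for t in CORE_PROJECT_TERMS if not metadata.get(t)]
    suggested = [t for t in RECOMMENDED_TERMS if not metadata.get(t)]
    return missing, suggested


# Hypothetical draft record with several terms not yet filled in.
draft = {
    "Project Title": "Example Project",
    "Author": ["Doe, J."],
    "Description": "",
    "Keywords": ["wind"],
}
missing, suggested = check_project_metadata(draft)
print("Missing core terms:", missing)
print("Recommended terms to consider:", suggested)
```

Running a check like this while curating, rather than only at publication time, fits the progressive curation approach described earlier.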

Available Licenses for Publishing Datasets in DesignSafe

DATASETS

If you are publishing data, such as simulation or experimental data, choose between:

Open Data Commons Attribution (Recommended)

  • You allow others to freely share, reuse, and adapt your data/database.
  • You expect to be attributed for any public use of the data/database.
  • Derived work from the reuse of this data/database will carry the same terms and conditions of this license.
  • You do not give all of your rights away.

Please read the License Website

Open Data Commons Public Domain Dedication (Consider and read carefully)

  • You allow others to freely share, modify, and use this data/database for any purpose without any restrictions.
  • You do not expect to be attributed for it.
  • You give all of your rights away.

Please read the License Website

WORKS

If you are publishing papers, presentations, learning objects, workflows, designs, etc., choose between:

Creative Commons Attribution Share Alike (Recommended)

  • You allow others to freely share, reuse, and adapt your work.
  • You expect to be attributed for any public use of your work.
  • Derived work that is published will carry the same terms and conditions of this license.
  • You retain your copyright.

Please read the License Website

Creative Commons Public Domain Dedication (Consider and read carefully)

  • You allow others to freely share, modify, and use this work for any purpose without any restrictions.
  • You do not expect to be attributed for it.
  • You give all of your rights away.

Please read the License Website

SOFTWARE

If you are publishing community software, scripts, libraries, applications, etc., choose the following:

GNU General Public License

  • You give permission to modify, copy, and redistribute the work or any derivative version.
  • The licensee is free to choose whether or not to charge a fee for services that use this work.
  • Licensees cannot impose further restrictions on the rights granted by this license.

Please read the License Website