CURATION & PUBLICATION POLICIES

Data Publication and Usage


Protected Data

Protected data are information subject to regulation under relevant privacy and data protection laws, such as HIPAA, FERPA and FISMA, as well as human subjects data containing Personally Identifiable Information (PII) and data involving vulnerable populations and/or containing sensitive information.

Publishing protected data in the DDR involves complying with the requirements, norms, and procedures approved by the data producers' Institutional Review Board (IRB) or equivalent body regarding human subjects data storage and publication, and managing direct and indirect identifiers in accordance with accepted means of data de-identification. In the DDR, protected data issues are considered at the onset of the curation and publication process and before storing data. Researchers working with protected data can communicate this to the curation team when they select a project type in the DDR; a curator then gets in touch with them to discuss options and procedures.

Unless approved by an IRB, most forms of protected data cannot be published in DesignSafe. No direct identifiers and only up to three indirect identifiers are allowed in published datasets. However, data containing PII can be published in the DDR with proper consent from the subject(s) and documentation of that consent in the project's IRB paperwork. In all publications involving human subjects, researchers should include and publish their IRB documentation showing the agreement.
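The identifier limits above can be expressed as a simple pre-publication check. The following is a minimal sketch, not a DDR tool: the `Field` class, the `"direct"`/`"indirect"`/`"none"` labels, and the `identifier_problems` function are hypothetical names introduced for illustration, and the sketch does not model the consent-based exception for PII described above.

```python
from dataclasses import dataclass

MAX_INDIRECT_IDENTIFIERS = 3  # limit stated in the policy text


@dataclass(frozen=True)
class Field:
    name: str
    kind: str  # hypothetical labels: "direct", "indirect", or "none"


def identifier_problems(fields):
    """Return a list of problems; an empty list means the check passed."""
    direct = [f.name for f in fields if f.kind == "direct"]
    indirect = [f.name for f in fields if f.kind == "indirect"]
    problems = []
    if direct:
        problems.append("direct identifiers present: " + ", ".join(direct))
    if len(indirect) > MAX_INDIRECT_IDENTIFIERS:
        problems.append(
            f"{len(indirect)} indirect identifiers found; "
            f"at most {MAX_INDIRECT_IDENTIFIERS} allowed"
        )
    return problems
```

A researcher could label each column of a survey table by hand and run the check before contacting the curation team; classifying a field as a direct or indirect identifier remains a judgment made under the project's IRB agreements.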

If the data loses meaning as a consequence of de-identification, it is possible to publish a description of the data, the corresponding IRB documents, and the data instruments if applicable, and to obtain a DOI and a citation for the dataset. In this case, the dataset will be shown as having Restricted Access. In addition, authors should include information on how to reach them in order to gain access to or discuss the dataset. The responsibility for maintaining the protected dataset in compliance with the IRB commitments over the long term lies with the authors, who can use TACC's Protected Data Services if they need to. For more information on how to manage this case see our Protected Data Best Practices.

It is the user’s responsibility to adhere to these policies and to the procedures and standards of their IRB or other equivalent institution, and DesignSafe will not be held liable for any violations of these terms regarding improper publication of protected data. User uploads reported to us as violating this policy may be removed from the DDR with or without notice, and the user may be asked to suspend their use of the DDR and other DesignSafe resources. We may also contact the user’s IRB and/or other respective institution about any violation, which could result in an active audit (See 24) of the research project, so users should review their institution’s policies regarding publishing protected data before using DesignSafe and the DDR.

For any data that are not subject to IRB oversight but may still contain PII, such as Google Earth images showing people not studied in the scope of the research project, we recommend blocking out or blurring any information that could be considered PII before publishing the data in the DDR. Researchers interested in seeing the raw data are invited to contact the PI of the research project to request access. See our Protected Data Best Practices for information on how to manage protected data in the DDR.

Subsequent Publishing

In response to needs expressed by the community, we allow users to publish data and other products subsequently within a project, each with a DOI. This need arises from the longitudinal and/or tiered structure of some research projects, such as experiments and field research missions, which happen at different time periods, may involve multiple distinct teams, and may need to publish different types of materials or to release information promptly after a natural hazards event and publish related products later. Subsequent publishing is enabled in the My Project interface, where users and teams manage and curate their active data throughout their projects' lifecycle.

Timely Data Publication  

Although no firm deadline requirements are specified for data publishing, as an NSF-funded platform we expect researchers to publish in a timely manner, so we provide recommended timelines for publishing different types of research data in our Timely Data Publication Best Practices.

Peer Review

Users who need to submit their data for review prior to publishing and assigning a DOI can do so by: a) adding reviewers to their My Project, when there is no need for anonymous review, or b) contacting the DesignSafe data curator through a Help ticket to obtain a Public Accessibility Delay (see below). Note that the data must be fully curated prior to requesting a Public Accessibility Delay.

Public Accessibility Delay

Many researchers request a DOI for their data before it is made publicly available, so that they can include it in papers submitted to journals for review. In order to assign a DOI in the DDR, the data has to be curated and ready to be published. Once the DOI is in place, we provide services to researchers with such commitments to delay the public accessibility of their data publication in the DDR, i.e. to make the user’s data publication, via its assigned DOI, not web indexable through DataCite and/or not publicly available in the DDR's data browser until the corresponding paper is published in a journal, or for up to one year after the data is deposited. The logic behind this policy is that once a DOI has been assigned, the data will inevitably be published, so this delay can be used to give reviewers access to a data publication before it is broadly distributed. Note that the data should be fully curated and that, although it is not broadly distributed during the delay, it will eventually be indexed by search engines. Users who need to amend or correct their publications will be able to do so via version control. See our Data Delay Best Practices for more information on obtaining a public accessibility delay.
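The timing rule for the delay can be illustrated with a short date calculation. This is only a sketch under one reading of the policy, namely that the publication becomes public on whichever comes first: the paper's publication date or one year after deposit. The function names are hypothetical, and the handling of a February 29 deposit date is an assumption made so the example runs.

```python
from datetime import date
from typing import Optional


def one_year_after(deposited: date) -> date:
    """Return the same calendar day one year later (Feb 29 maps to Feb 28)."""
    try:
        return deposited.replace(year=deposited.year + 1)
    except ValueError:
        # Only reached for Feb 29 when the following year is not a leap year.
        return deposited.replace(year=deposited.year + 1, day=28)


def public_release_date(deposited: date, paper_published: Optional[date] = None) -> date:
    """Latest date on which the publication becomes public under the assumed rule."""
    limit = one_year_after(deposited)
    if paper_published is None:
        return limit
    # Never earlier than the deposit itself, never later than the one-year limit.
    return max(deposited, min(paper_published, limit))
```

For example, `public_release_date(date(2022, 3, 1), date(2022, 9, 15))` returns `date(2022, 9, 15)`, while omitting the paper date returns `date(2023, 3, 1)`.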

Data Licenses

DDR provides users with five licensing options to accommodate the variety of research outputs generated and the ways researchers in this community want to be attributed. The following licenses were selected after discussions within our community. In general, DDR users are keen to share their data openly but expect attribution. In addition to data, our community issues reports, survey instruments, presentations, learning materials, and code. The licenses are: Creative Commons Attribution (CC-BY), Creative Commons Public Domain Dedication (CC-0), Open Data Commons Attribution (ODC-BY), Open Data Commons Public Domain Dedication (ODC-PDDL), and GNU General Public License (GNU-GPL). During the publication process users select one license per publication with a DOI. More details on these license options and the works they can be applied to can be found in Licensing Best Practices.

DDR also requires that users reusing data from others in their projects do so in compliance with the terms of the data's original license.

The expectations of DDR and the responsibilities of users in relation to the application of and compliance with licenses are included in the DesignSafe Terms of Use, the Data Usage Agreement, and the Data Publication Agreement. As clearly stated in those documents, in the event that we note or are notified that the licensing policies and best practices are not followed, we will notify the user of the infringement and may cancel their DesignSafe account.

Data Citation

DDR abides by and promotes the Joint Declaration of Data Citation Principles amongst its users. 

We encourage and help researchers using data from the DDR to cite it using the DOI and citation language available on the dataset's landing page. The DOI relies on the DataCite schema for citation and accurate access.

For users publishing data in DDR, we enable referencing works and/or data reused in their projects. For this we provide two fields, Related Work and Referenced Data, for citing data and works on their data publication landing page.

The expectations of DDR and the responsibilities of users in relation to the application of and compliance with data citation are included in the DesignSafe Terms of Use, the Data Usage Agreement, and the Data Publication Agreement. As clearly stated in those documents, in the event that we note or are notified that citation policies and best practices are not followed, we will notify the user of the infringement and may cancel their DesignSafe account.

However, given that it is not feasible to know with certainty whether users comply with data citation, our approach is to educate our community by reinforcing citation in a positive way. To this end we implement outreach strategies to stimulate data citation. Through diverse documentation, FAQs, webinars, and emails, we regularly train our users on data citation best practices. By tracking and publishing information about the impact and science contributions of the works they publish citing the data that they use, we also demonstrate the value of data reuse and further stimulate publishing and citing data.

Data Publication Agreement

Users must read and accept this agreement prior to publishing a dataset.

This submission represents my original work and meets the policies and requirements established by the DesignSafe Policies and Best Practices. I grant the Data Depot Repository (DDR) all required permissions and licenses to make the work I publish in the DDR available for archiving and continued access.  These permissions include allowing DesignSafe to:

  1. Disseminate the content in a variety of distribution formats according to the DDR Policies and Best Practices.

  2. Promote and advertise the content publicly in DesignSafe.

  3. Store, translate, copy, or re-format files in any way to ensure their future preservation and accessibility.

  4. Improve usability and/or protect respondent confidentiality.

  5. Exchange and/or incorporate metadata or documentation in the content into public access catalogues.

  6. Transfer data and metadata with their respective DOI to another institution for long-term accessibility if needed for continuous access.

I understand the type of license I choose to distribute my data, and I guarantee that I am entitled to grant the rights contained in it. I agree that when this submission is made public with a unique digital object identifier (DOI), this will result in a publication that cannot be changed. If the dataset requires revision, a new version of the data publication will be published under the same DOI.

I warrant that I am lawfully entitled and have full authority to license the content submitted, as described in this agreement. None of the above supersedes any prior contractual obligations with third parties that require any information to be kept confidential. 

If applicable, I warrant that I am following the IRB agreements in place for my research and following Protected Data Best Practices.

I understand that the DDR does not approve data publications before they are posted; therefore, I am solely responsible for the submission, publication, and all possible confidentiality/privacy issues that may arise from the publication.

Data Usage Agreement

Users who access, preview, download or reuse data and metadata from the DesignSafe Data Depot Repository (DDR) agree to the following policies. If these policies are not followed, we will notify the user of the infringement and may cancel their DesignSafe account.

  • Use of the data includes, but is not limited to, viewing parts or the whole of the content; comparing with data or content in other datasets; verifying research results and using any part of the content in other projects, publications, or other related work products.
  • Users will not use the data in any way prohibited by applicable laws, distribution licenses, and permissions explicit in the data publication landing pages.
  • The data are provided “as is,” and their use is at the users' risk. While the DDR promotes data and metadata quality, the data authors and publishers do not guarantee that:
    1. the materials are accurate, complete, reliable or correct;
    2. any defects or errors will be corrected;
    3. the materials and accompanying files are free of viruses or other harmful components; or
    4. the results of using the data will meet the user’s requirements.
  • Use of data in the DDR abides by the DesignSafe Privacy Policy.
  • Users are responsible for abiding by the restrictions outlined by the data author in their publications' landing pages and by the DDR in this agreement, but they are not responsible for any restrictions not otherwise explicitly described here or in the landing pages.
  • Users will not obtain personal information associated with DDR data that results in directly or indirectly identifying research subjects, individuals, or organizations with the aid of other information acquired elsewhere.
  • Users will not in any event hold the DDR or the data authors liable for any and all losses, costs, expenses, or damages arising from use of DDR data or any other violation of this agreement, including infringement of licenses, intellectual property rights, and other rights of people or entities contained in the data.
  • We do not gather the IP addresses of public users who preview or download files from the DDR.
  • Our system logs file actions completed by registered users in the DDR including previewing, downloading or copying published data to My Data or My Projects. We only use this information in aggregate for metrics purposes and do not link it to the user’s identity.

Amends and Version Control

Users can amend and version their data publications. Since the DDR came online, we have helped users correct and/or improve the metadata applied to their datasets after publication. Most requests involve improving the text of the descriptions, changing the order of the authors, and adding references to papers published using the data in the project; users also asked for the ability to version their datasets. Our amends and version control policy derives from meeting our users' needs.

Changes allowed during amends are:

  • Adding Related Works, such as a paper published after the data.
  • Correcting typos and/or improving the abstract and the keyword list.
  • Correcting or adding an award.
  • Changing the order of the authors.

If users need to add or delete files or change the content of the files, they can version their data publication. The following rules apply to versions:

  • Versions will have the same DOI, and the title will indicate the version number. The decision to maintain the same DOI was agreed upon by our community to simplify DOI management for data publishers and users.
  • Users will be able to view all existing versions on the publication's landing page.
  • The DOI will always resolve to the latest version of the publication.
  • Versions are documented by data publishers so other users understand what changed and why. The documentation is publicly displayed.

Documentation of versions requires including the name of the file(s) changed, removed or added, and identifying the category in which they are located. We include guidance on how to document versions within the curation and publication onboarding instructions.
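The versioning rules above (one DOI shared by all versions, a version number in the title, the DOI resolving to the latest version, and documented file changes) can be sketched as a small data model. This is an illustration only and does not describe how the DDR or its Fedora repository is implemented; the class names, the title format, and the placeholder DOI used in the usage note are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class FileChange:
    file_name: str
    action: str    # "changed", "removed", or "added"
    category: str  # category within the project where the file is located


@dataclass(frozen=True)
class Version:
    number: int
    title: str
    changes: Tuple[FileChange, ...]


@dataclass
class Publication:
    doi: str
    base_title: str
    versions: List[Version] = field(default_factory=list, init=False)

    def __post_init__(self):
        # The original publication is treated as version 1 with no change log.
        self.versions.append(Version(1, self.base_title, ()))

    def publish_version(self, changes):
        """Add a new version under the same DOI; file changes must be documented."""
        if not changes:
            raise ValueError("a new version must document at least one file change")
        number = len(self.versions) + 1
        title = f"{self.base_title} (Version {number})"  # hypothetical title format
        version = Version(number, title, tuple(changes))
        self.versions.append(version)
        return version

    def resolve(self):
        """Model of DOI resolution: always return the latest version."""
        return self.versions[-1]
```

Creating `Publication("10.0000/example", "Shake Table Test")` and calling `publish_version([FileChange("run3.csv", "added", "Sensor Data")])` leaves the `doi` attribute unchanged, keeps both versions listed in `versions`, and makes `resolve()` return the new version.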

The Fedora repository manages all amends and versions so there is a record of all changes. Version number is passed to DataCite as metadata.

More information about the reasons for amends and versioning is in Publication Best Practices.

Leave Data Feedback

Users can click a “Leave Feedback” button on the projects’ landing pages to provide comments on any publication. This feedback is forwarded to the curation team for any needed actions, including contacting the authors. In addition, users can message the authors directly, as their contact information is available via the authors field on the publication landing pages. We encourage users to provide constructive feedback, and we suggest themes they may want to discuss about the publication in our Leave Data Feedback Best Practices.

Data Impact

We understand data impact as a strategy that includes complementary efforts at the crossroads of data discoverability, usage metrics, and scholarly communications. 

Search Engine Optimization (SEO)

We have in place SEO methods to enhance the web visibility of the data publications. To increase discoverability and indexing of our publications  we follow guidance from Google Search Console and Google Data Search.

Data Usage Metrics

Our metrics follow the Make Data Count COUNTER Code of Practice for Research Data.

Below are the definitions for each metric:

File Preview: Examining data in the portal, for example by clicking on a file name to bring up a modal window that allows previewing the file. Not all document types can be previewed. Those that can include text, spreadsheets, graphics, and code files (example extensions: .txt, .doc, .docx, .csv, .xlsx, .pdf, .jpg, .m, .ipynb). Those that cannot include binary executables, MATLAB containers, compressed files, and video (e.g. .bin, .mat, .zip, .tar, .mp4, .mov).

File Download: Copying a file to the machine the user is running on, or to a storage device that machine has access to. This can be done by ticking the checkbox next to a document and selecting "Download" at the top of the project page. With documents that can be previewed, clicking "Download" at the top of the preview modal window has the same effect. Downloads are counted per project and per individual file. We also count copying a file from the published project to the user's My Data, My Projects, or Tools and Applications in DesignSafe, or to one of the connected spaces (Box, Dropbox, Google Drive). To copy a file, tick the checkbox next to a document and select "Copy" at the top of the project page.

File Requests: Total file downloads + total file previews. 

Project Downloads: Total downloads of a compressed entire project to a user's machine. 
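The relationship between the metrics defined above can be shown with a short aggregation over a list of logged events. This is a sketch and not the DDR's reporting code: the event labels (`"preview"`, `"download"`, `"copy"`, `"project_download"`) and the function name are hypothetical, and counting copies together with downloads is an assumption based on the File Download definition.

```python
from collections import Counter


def usage_metrics(events):
    """Aggregate (action, file_path) tuples into the four reported metrics."""
    counts = Counter(action for action, _ in events)
    file_previews = counts["preview"]
    # Assumption: copies to My Data, My Projects, etc. count as file downloads.
    file_downloads = counts["download"] + counts["copy"]
    return {
        "file_previews": file_previews,
        "file_downloads": file_downloads,
        "file_requests": file_downloads + file_previews,
        "project_downloads": counts["project_download"],
    }
```

Given two previews, one download, one copy, and one whole-project download, the function reports two file previews, two file downloads, four file requests, and one project download.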

We report the metrics on the publications' landing pages. To provide context for the metrics, we indicate the total number of files in each publication.

We started counting on May 17, 2021. We update the reports on a monthly basis and we report data metrics to NSF every quarter. Currently we are in the process of formatting the reports to participate in the Make Data Count initiative.

Data Vignettes

Since 2020 we have conducted Data Reuse Vignettes. For this, we identify published papers and interview researchers who have reused data published in DDR. In this context, reuse means that researchers are using data published by others for purposes different from those intended by the data creators. During the interviews we use a semi-structured questionnaire to discuss the academic relevance of the research, the ease of access to the data in DDR, and the understandability of the data publication in relation to metadata and documentation clarity and completeness. We feature the data stories on the DesignSafe website and use the feedback to make changes and to design new reuse strategies. The methodology used in this project was presented at the International Qualitative and Quantitative Methods in Libraries 2020 International Conference. See Perspectives on Data Reuse from the Field of Natural Hazards Engineering.

Data Awards

In 2021 we launched the first Data Publishing Award to encourage excellence in data publication and to stimulate reuse. Data publications are nominated by our user community based on contribution to scientific advancement and curation.