DesignSafe-ci.org will provide a comprehensive environment for experimental, theoretical, and computational engineering and science, providing a place not only to steward data from its creation through archive, but also the workspace in which to understand, analyze, collaborate and publish that data.
Our vision is that DesignSafe will be an integral part of research and discovery, providing researchers access to cloud-based tools that support their work to analyze, visualize, and integrate diverse data types. As a result, researchers will want to store and share their data in the DesignSafe data repository, even if not required to do so, because of the access to these capabilities. To achieve this vision, DesignSafe will provide a flexible data repository with straightforward mechanisms for data/metadata upload and will enable the next generation of research discovery through a cloud-based interface that allows data analysis and visualization tools to work directly on data stored in the data repository. These functionalities will allow researchers to use the CI to interact with their data in the cloud, bypassing time-consuming downloads/uploads. Not only will the cloud-based interface allow researchers to analyze, visualize, and integrate data, but they will also be able to share analysis scripts and link tasks to support workflows that facilitate research discovery.
DesignSafe will comprise the following services and components (Figure 1):
The portal will be the primary point of entry for users of the DesignSafe capabilities and the NHERI community. The portal will provide NHERI-wide information on experimental facilities, the Facility Scheduling Dashboard, and Education and Community Outreach (ECO) activities. To ensure maximum interoperability with diverse software architectures and modes of access, the portal will be developed according to current web standards for accessibility and performance, ensuring a consistent and responsive experience on any modern web browser or mobile device. Furthermore, the portal will be powered by an extensive set of flexible and reusable Application Programming Interfaces (APIs), enabling full programmatic access to all aspects of the center’s infrastructure.
At the heart of the cyberinfrastructure, the Data Depot is the central shared data repository that supports the full research lifecycle, from data creation to analysis to curation and publication. One of its fundamental characteristics is stability, which assures both the reliability of the processes conducted on it and the sustainability of the data. The Data Depot is built upon a multi-petabyte repository that features mirroring between two physical sites and the option of a third copy in a tape archive, so that data is continuously safeguarded and can be recovered in case of disaster. The system is monitored continuously (24/7) by staff on site and is protected by both firewalls and an intrusion detection system (IDS). Long-term preservation requires sound system administration practices, a stable infrastructure, and policies that are properly implemented and monitored. For DesignSafe, these aspects are addressed in collaboration with the University of Texas Libraries, which provide critical expertise to assure long-term access to sustainable data.
The Data Depot will support the connection of metadata to all data objects through tools that facilitate data organization and description and that allow metadata tagging to occur progressively during the research phase, as the data is being created, used, and curated. This will enhance data publication, as well as data discovery and understandability for reuse. The Data Depot must also provide an intuitive interface that facilitates users’ interaction with the data. Upload and download of data are streamlined through a range of interactive and automated options for both single-file and bulk transfer, including drag-and-drop file upload, federation with existing cloud data services (e.g., Box.com, Dropbox, S3, or Google Drive), command line interfaces that can be automated by power users, and interactive web tools that lead the user through the steps to input data and create the minimum necessary metadata. The Data Depot will accept any data the user wishes to supply into a local workspace, even if the data type is unknown or only partial metadata is provided.
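The idea of accepting data with partial metadata and completing the tagging progressively can be illustrated with a minimal sketch. The field names below are hypothetical placeholders chosen for illustration; they are not the Data Depot's actual metadata schema, and the functions are not part of any DesignSafe API.

```python
from typing import Dict, List

# Hypothetical minimum metadata fields (an assumption; the real schema is not given here).
MINIMUM_FIELDS = ("title", "creator", "description")


def missing_fields(metadata: Dict[str, str]) -> List[str]:
    """Return the minimum fields that are absent or blank, in schema order."""
    return [name for name in MINIMUM_FIELDS if not metadata.get(name, "").strip()]


def ready_to_publish(metadata: Dict[str, str]) -> bool:
    """A record is publishable once every minimum field has a value."""
    return not missing_fields(metadata)


# An upload is accepted even though only one field is filled in.
record = {"title": "Shake table run 12"}
print(missing_fields(record))    # ['creator', 'description']

# Tagging continues during the research phase until the record is complete.
record.update(creator="A. Researcher", description="Hypothetical example entry")
print(ready_to_publish(record))  # True
```

The point of the sketch is that completeness is checked at publication time rather than at upload time, so incomplete records are never rejected on ingest.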
The Data Depot provides direct support for data sharing and collaboration. DesignSafe supports the sharing of all objects in the CI: with a simple click, data from a user’s private directory can be shared with a peer, with a research team, or with the entire public through the web. The shared data may be a file, a set of notes from the Discovery Workspace, an image, a movie, or a link to a saved workspace that allows a collaborator to perform the same analysis. In addition, users will be able to use an access control list to set permissions on the data. It will also be possible to assign a unique public URL to a dataset and to create a DOI (Digital Object Identifier) for it.
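A minimal sketch of how an access control list could express the three sharing levels described above (a peer, a team, the public) is shown here. All class, method, and user names are hypothetical and serve only to illustrate the permission model; they do not describe the actual DesignSafe implementation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class Permission(Enum):
    READ = "read"
    WRITE = "write"


PUBLIC = "*"  # hypothetical marker meaning "anyone on the web"


@dataclass
class DataObject:
    path: str
    owner: str
    acl: Dict[str, Set[Permission]] = field(default_factory=dict)

    def share(self, user: str, *perms: Permission) -> None:
        """Grant one or more permissions to a single user."""
        self.acl.setdefault(user, set()).update(perms)

    def share_with_team(self, members, *perms: Permission) -> None:
        """Grant the same permissions to every member of a team."""
        for member in members:
            self.share(member, *perms)

    def publish(self) -> None:
        """Make the object readable by the public."""
        self.share(PUBLIC, Permission.READ)

    def can(self, user: str, perm: Permission) -> bool:
        """The owner can do anything; others need an explicit or public grant."""
        if user == self.owner:
            return True
        granted = self.acl.get(user, set()) | self.acl.get(PUBLIC, set())
        return perm in granted
```

For example, after `obj.share("bob", Permission.READ)`, the call `obj.can("bob", Permission.WRITE)` still returns `False`, and after `obj.publish()` any user can read the object but only the owner and explicitly granted users can write to it.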
The Discovery Workspace will be a web-based environment that provides researchers with access to data analysis tools, computational simulation tools, visualization tools, educational tools, and user-contributed tools within the cloud to support research workflows, learning, and discovery. The portal will provide a desktop metaphor, with a data window to give the user access to the contents of the Data Depot (which includes experimental, simulation, and reconnaissance data, as well as others) and a tools window giving the user access to a list of available tools, scripts, etc. For example, Figure 2 shows an embedded R Workspace that is currently available from the TACC Analytics Portal and allows researchers to use the program R to analyze their data. This type of interface within the Discovery Workspace will allow users to take advantage of powerful analysis capabilities to fully investigate and explore their data, all within the cloud.
The software tools made available within the Discovery Workspace will be identified through discussions with the NHERI research community and will also include those developed by the SimCenter awardee. Our initial discussions with a subset of the community have identified a range of new software tools that are of interest to the community. These tools encompass both data analytics and visualization tools (e.g., MATLAB, ParaView) and computational simulation tools (e.g., OpenSees, ABAQUS, ADCIRC, OpenFOAM). Additionally, the tools span all of the technical domains included in NHERI. In particular, the wind community has unique computational simulation and data requirements through its use of Database Assisted Design (DAD). We will facilitate and promote DAD through the availability of wind data from multiple sources and a suite of DAD simulation tools (e.g., windPRESSURE from NIST) within the Discovery Workspace. DesignSafe will make commercial codes available through a “Bring-Your-Own-License” approach, which allows the CI to confirm that a user has an active license for the software at their home institution. This functionality has been used at TACC for widely used software packages, such as MATLAB. We will expand the “Bring-Your-Own-License” functionality to the commercial software packages required by the NHERI community.
The Discovery Workspace will be implemented using TACC’s highly scalable and extensible Agave science-as-a-service platform, which is the evolution of the successful iPlant Foundation API (Dooley et al. 2012). Agave has generalized the core functionality of the Foundation API to provide a science-as-a-service platform for gateway development that works seamlessly in High Performance Computing (HPC), campus, commercial, and cloud environments alike. Using Agave as a platform to develop the Discovery Workspace will provide several advantages:
The Reconnaissance Integration Portal will be the main access point to data collected during the reconnaissance of windstorm and earthquake events. These data may be collected by the RAPID experimental facility, its users, or other researchers participating in reconnaissance. The reconnaissance data may include infrastructure performance data (e.g., damage estimates, ground movements, subsurface information), remotely sensed data (e.g., photos, video, LIDAR point clouds, satellite imagery data), or human experiential data (e.g., social media data, societal impact data). These data represent diverse data types with different metadata requirements, but their use hinges on information regarding the location from which the data were collected. Therefore, a geospatial framework (Google Earth and GIS) will be used to interface with much of the data to provide the contextual location of the data with respect to the windstorm or earthquake event. The reconnaissance data will be physically located in the Data Depot and accessible by analytics and visualization tools, but the Reconnaissance Integration Portal will provide the initial interface to the data. TACC has developed geospatial interfaces for other projects and will take advantage of this experience to develop the Reconnaissance Integration Portal in coordination with the RAPID facility awardee. Our collaboration with the RAPID facility awardee will ensure that we meet the needs of this community.
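As an illustration of locating reconnaissance records with respect to an event, the sketch below selects records within a given great-circle distance of an event location using the haversine formula. This is one common way to make such a query, offered under stated assumptions; it is not the portal's actual geospatial method, and the record layout (dictionaries with "lat" and "lon" keys in decimal degrees) is hypothetical.

```python
import math
from typing import Dict, List

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def records_near(records: List[Dict], event_lat: float, event_lon: float,
                 radius_km: float) -> List[Dict]:
    """Keep the records collected within radius_km of the event location."""
    return [
        r for r in records
        if haversine_km(r["lat"], r["lon"], event_lat, event_lon) <= radius_km
    ]
```

A spherical model is adequate for coarse filtering at regional scale; a production GIS would typically use an ellipsoidal model and a spatial index instead of a linear scan.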
The Learning Center will be the central repository for self-paced, on-demand materials to teach users (e.g., undergraduate students, graduate students, researchers, and faculty) to take advantage of the CI capabilities of DesignSafe. The availability of on-demand instructional materials at DesignSafe will ensure that the NHERI community has access to training when and where they need it. Online materials in the Learning Center will be built on the principle that online content requires attention to format and content unique to the interactive online metaphor; simply posting slide decks and recorded lectures is insufficient. Learning Center modules will be interactive, will include exercises, and will provide navigation that allows users to mark and save progress and to jump quickly to needed content. The Learning Center will be extensible and will support publication of modules developed by all NHERI awardees.
The Developer’s Portal will be the central place for users and developers wishing to extend the capabilities of the DesignSafe infrastructure. Through the portal, users can access a tool builder that will support the deployment of new Apps (ranging from simple data conversion scripts to complex simulation applications) to the Discovery Workspace, or access complete information on the DesignSafe Application Programming Interfaces (APIs). All capabilities of DesignSafe will be exposed through the API layer. While most users will simply use the Discovery Workspace, Data Depot, or Reconnaissance Integration Portal, all of the capabilities in these tools will be exposed to programmers through the API. API functions will include the ability to ingest or download data, run analysis jobs, translate data types, and create public identifiers for data. Through this interface, users can embed DesignSafe capabilities in other applications. The Developer’s Portal transforms DesignSafe from a static web application built by the design team into a user-extensible “App store” that can grow with changes in the community and the creativity of individual research teams.
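To make the shape of such an API layer concrete, the following is a minimal in-memory sketch of the functions listed above: ingesting and downloading data, registering an App, running a job, and creating an identifier. Every name here (the class, its methods, the paths, and the "local-id-N" identifier format) is a hypothetical stand-in; it does not call any real service and does not reflect the actual DesignSafe or Agave API signatures.

```python
from typing import Callable, Dict


class InMemoryDepot:
    """Hypothetical stand-in for an API layer; nothing here contacts a real service."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}
        self._apps: Dict[str, Callable[[bytes], bytes]] = {}
        self._ids: Dict[str, str] = {}

    def ingest(self, path: str, data: bytes) -> None:
        """Store data under a path."""
        self._files[path] = data

    def download(self, path: str) -> bytes:
        """Return the data stored under a path (KeyError if absent)."""
        return self._files[path]

    def register_app(self, name: str, func: Callable[[bytes], bytes]) -> None:
        """Deploy a new App, e.g. a simple data conversion script."""
        self._apps[name] = func

    def run_job(self, app: str, input_path: str, output_path: str) -> str:
        """Run an App on stored data in place, without a download/upload cycle."""
        self._files[output_path] = self._apps[app](self._files[input_path])
        return output_path

    def create_identifier(self, path: str) -> str:
        """Return a stable placeholder identifier for existing data."""
        if path not in self._files:
            raise KeyError(path)
        if path not in self._ids:
            self._ids[path] = f"local-id-{len(self._ids) + 1}"
        return self._ids[path]


# Usage: deploy a conversion App, run it on ingested data, and identify the result.
depot = InMemoryDepot()
depot.ingest("raw/accel.csv", b"0.1,0.4,0.2\n")
depot.register_app("csv-to-tsv", lambda raw: raw.replace(b",", b"\t"))
result_path = depot.run_job("csv-to-tsv", "raw/accel.csv", "derived/accel.tsv")
print(depot.download(result_path))
print(depot.create_identifier(result_path))
```

The design point mirrored here is that user-contributed Apps and built-in capabilities sit behind the same small set of calls, which is what allows other applications to embed them.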