A Case for Jupyter and DesignSafe

Published on May 15, 2017

Scott Brandenberg, associate professor of Civil and Environmental Engineering at the University of California, Los Angeles, is a new user of Jupyter notebooks. In a recent interview with the Texas Advanced Computing Center, which hosts the NHERI DesignSafe-CI, Brandenberg shared his experiences with the tool.

Brandenberg's research specialties are geotechnical earthquake engineering, water distribution, and seismic effects on underground structures. He conducts experiments at the NHERI Centrifuge Experimental Facility at the UC Davis Center for Geotechnical Modeling.

What is the nature of the research for which you're using the Jupyter notebooks?

Scott Brandenberg: The topic of the research project for which we've developed a Jupyter notebook is seismic effects on underground structures. The project looks at the development of earth pressures during earthquakes. One of the key problems that engineers have to solve when designing underground structures is how much pressure the soil exerts on the structure when there's an earthquake. As an earthquake is occurring, the ground shakes and there can be increases in pressure on an underground structure caused by the earthquake waves interacting with the structure, such as a culvert or tunnel.

One of the big areas that building owners are struggling with is that the approach engineers are currently using to compute these earth pressures is not particularly realistic, and it tends to overpredict these pressures. What we're doing is developing new analytical methods that are more realistic and are based on more fundamental theory than the current methods. We ran centrifuge experiments on underground structures to generate seismic earth pressure data that is being used to evaluate our new theory as well as the existing methods. We hope that the new method will do a better job of explaining the observations during these experiments.

What do you find most appealing about DesignSafe?

Scott Brandenberg: I'm going to compare DesignSafe with NEES because I'm focused on the improvements that have happened. NEES developed a data repository. That meant that when we completed an experiment we'd archive the data and metadata (that describes the data) like Excel tables and sensor lists. Then, the data would be in NEES for other people to download and use. However, we wouldn't use the data within the NEES data repository; we'd go to the repository to download the data and then be operating on our own local instance of the data. Uploading the data to NEES was therefore an extra step that had to be undertaken by the project team, and did not directly facilitate discovery of new findings from the data.

What I really like about DesignSafe is that it's changing that paradigm. Now we're able to analyze the data within the DesignSafe cyberinfrastructure itself. We upload the data, we can operate on the data there, and we can share the tools for analyzing data. One thing that tended to happen with NEES is that I would supervise a PhD student who ran the experiments. They would write their own scripts to process the data. We'd write papers, submit them, and then the student would graduate and take all of their scripts with them. I'd have access to the data but not all of the processing files. DesignSafe is going to solve that problem. We're going to work on scripts in the cloud and everyone will have access to them. When a student is finished with a project, all of the work that they've done will be archived in DesignSafe. Their whole workflow will be documented and available for further reuse. That's a really attractive feature to me as a PI for a lot of these projects.

What is your own description of a Jupyter notebook?


Scott Brandenberg: It's a powerful tool. I think it's going to change the way that people in the hazard community work with data. The Jupyter notebook is a program that allows you to integrate a variety of different coding languages (Python or R, for example) within an active document that runs on the web. The Jupyter notebooks enable us to have code blocks that are actually operating on data. And those are combined with markdown cells that provide explanations of what's happening. It's a good way of synthesizing the calculations with the explanation. It makes it easier to go back and figure out what was happening and what workflow was used. The Jupyter notebooks can run in the cloud in DesignSafe, which means we can be operating on the data that we've collected and put into DesignSafe without having to download it onto our own local computer first. That's really nice because that means that a student can work on a script and let me know that they made some changes. Then I can log in and take a look at their Jupyter notebook and add new changes or comments myself. I've been using other calculation tools for about 20 years, so the notebooks are fairly new to me. It was the first time that I'd ever used Python, for example.

What are some of the ways your research was done prior to using the Jupyter notebooks?

Scott Brandenberg: I think that the most common approach used by researchers was that students would have data on their own computer and would use tools like MATLAB or Mathcad to process that data and publish those results. The data might be shared but the scripts were not shared, or if they were shared they were linked to the particular directories in which the students stored the data and therefore were not easily transferable. Experimental researchers would also write "data reports", written documentation about the dataset. The data report was critical for other researchers to understand how to use the dataset (which column corresponds to which sensor, for example). Usually those data reports would be .pdf files that users would download, and they would include hundreds of pages of data plots. What we've done with our culvert project is to make an interactive digital data report in DesignSafe using a Jupyter notebook. Because the Jupyter notebook uses markdown language, we were able to take all of the text that would usually appear in the .pdf file, format it as HTML, and put it in the notebook.
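To illustrate the kind of code cell a digital data report might contain, here is a minimal Python sketch that ties each data column to its sensor description and summarizes the recording. It is not the project's actual notebook: the sensor names, units, sample values, and the helper functions `load_columns` and `peak_summary` are all hypothetical, and the data is inlined so the cell does not depend on any particular directory.

```python
import csv
import io

# Hypothetical sensor list: column name -> (sensor type, units).
# A real data report would take this from the project's metadata.
SENSOR_LIST = {
    "A1": ("accelerometer", "g"),
    "P1": ("pressure cell", "kPa"),
}

# Hypothetical sample recording, inlined so the cell runs anywhere.
SAMPLE_CSV = """time,A1,P1
0.00,0.00,10.0
0.01,0.12,10.4
0.02,-0.25,11.1
0.03,0.08,10.2
"""


def load_columns(text):
    """Parse CSV text into a dict of column name -> list of floats."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return columns


def peak_summary(columns, sensor_list):
    """Return {column: (sensor type, peak absolute value, units)}."""
    summary = {}
    for name, (kind, units) in sensor_list.items():
        peak = max(abs(v) for v in columns[name])
        summary[name] = (kind, peak, units)
    return summary


columns = load_columns(SAMPLE_CSV)
summary = peak_summary(columns, SENSOR_LIST)
for name, (kind, peak, units) in summary.items():
    print(f"{name} ({kind}): peak = {peak} {units}")
```

In a notebook, a markdown cell above this one would explain what each sensor measures, so the explanation and the calculation stay together.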

How is the data shared and distributed?

Scott Brandenberg: Users can interact with the data using the Jupyter notebook, but they can also download the data outside of Jupyter. We upload data files and we decide when we want to make the data public; once it's public anyone can log into DesignSafe and access those data files. So they could go to Jupyter and look at the data using that tool or they could directly download the data for their own use. Or they could even develop their own Jupyter notebook in DesignSafe and process the data there. Also, they could simulate the experiment using a finite element program like OpenSEES, and write a Jupyter notebook that compares the results from the computer simulation with the experimental data.
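A comparison between simulated and measured results could take many forms. One simple possibility, sketched below in Python, is a root-mean-square error between the two time series after interpolating the simulation onto the measured time stamps. This is an assumption about how such a comparison might be done, not a description of any existing notebook; the functions `interpolate` and `rmse` and all numeric values are hypothetical.

```python
import math


def interpolate(times, values, t):
    """Linearly interpolate a sampled series at time t (times ascending)."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    for i in range(1, len(times)):
        if t <= times[i]:
            frac = (t - times[i - 1]) / (times[i] - times[i - 1])
            return values[i - 1] + frac * (values[i] - values[i - 1])


def rmse(measured_t, measured, simulated_t, simulated):
    """Root-mean-square error, evaluated at the measured time stamps."""
    errors = [
        interpolate(simulated_t, simulated, t) - m
        for t, m in zip(measured_t, measured)
    ]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical earth pressure values (kPa), for illustration only.
measured_t = [0.0, 0.1, 0.2, 0.3]
measured = [10.0, 12.0, 11.0, 10.5]
simulated_t = [0.0, 0.2, 0.4]
simulated = [10.0, 12.0, 10.0]

misfit = rmse(measured_t, measured, simulated_t, simulated)
print(f"RMSE between simulation and experiment: {misfit:.3f} kPa")
```

Interpolating onto the measured time stamps keeps the experimental record untouched and lets simulations with a different time step be compared against it.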

How do you access the Jupyter notebooks?

Scott Brandenberg: In general, you access the Jupyter notebooks through DesignSafe. If you log into DesignSafe, you'll see the Research Workbench area which includes the Data Depot. That's where you go to access published data, as well as your own private project data. DesignSafe also includes the Discovery Workspace, which includes tools, such as Jupyter, that can access the data in the Data Depot. To open Jupyter in DesignSafe you just click a button within the Discovery Workspace, it opens right in your web browser, and then you can open a Jupyter notebook from a directory in the Data Depot. In addition, the Jupyter notebooks are open source so you can download the client and run it locally on your computer.

My Jupyter notebook is available to everyone in the Community Data directory.

What are the primary benefits of Jupyter notebooks?

Scott Brandenberg: The main benefit I can see for other PIs is that all of your students' scripts are available for everyone in the project. Everything that the students are working on is all there to share, which is better than having the students work locally on their own computers. There will always be users who are reluctant to adopt a new technology. The way Python works is similar to MATLAB, so the leap is not that big, and our research team has adapted pretty easily.

Finally, how do you think the notebooks are helping to advance your research?

Scott Brandenberg: The Jupyter notebooks are not directly providing new computational methods or scripts that we didn't have before. The real benefit is having the same processes together in the same workflow, so the data is there and the processing scripts are there with it in the cloud. I think that's the real innovation. It's more about the quality of the workflow and having everything well documented in one place. I think it does have capabilities to fundamentally change how we're doing our work.

For example, I'm part of another effort right now that's using DesignSafe to build a large database of field case histories from liquefaction events that have happened all over the world. There's going to be quite a bit of data, more data than any single user would want to download and try to process on their own computer. The Jupyter notebooks are providing us with the ability to operate on all of that data within DesignSafe so we don't have to download it. It has a big impact when we're analyzing a lot of data at the same time. We'll be able to use these cloud resources to do things that we weren't able to do before.

Originally published on April 28, 2017 by Faith Singer-Villalobos at the Texas Advanced Computing Center (TACC).