We are a mentor community across NASA Earth science data centers (Distributed Active Archive Centers - DAACs). We are co-creating and teaching common tutorials to support researchers as they migrate analytical workflows to the Cloud.
We presented at several workshops and events this fall, including a 25-minute talk at the NASA EOSDIS Systems Engineering Technical Interchange Meeting (SE TIM) that we reused at NASA’s Open Source Science Data Repositories (OSSDR) Workshop. These presentations shared perspectives that the NASA Openscapes DAAC mentors have developed through collaborating across DAACs, developing teaching resources, teaching, and listening to feedback from learners, which we iterate on after each event.
The purpose of these talks was to share and learn from each other, and to grow and improve DAAC support for science and applications users of cloud-archived data, while following open source, open science best practices. We will be presenting these ideas further at the AGU Fall Meeting in December 2022. This post shares brief highlights from these talks.
DAAC Internal Training - Knowledge From Within
Throughout 2022 the GES DISC DAAC led internal trainings with our DAAC colleagues to give DAAC staff hands-on experience in the cloud. These trainings are being shared with other DAACs, who will explore the same model and reuse these materials in the future. The goals were to “walk a mile in the user’s shoes”:
- Teach cloud basics and definitions
- Interact with one cloud data access method: direct S3 access
- Grant access to cloud workspace
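For readers who have not seen direct S3 access, here is a minimal Python sketch of what that one access method involves. It is not the workshop material: the credentials URL, the helper names, and the choice of the s3fs library are assumptions made for illustration. Direct access generally works only from compute running in the same AWS region as the data (us-west-2 for Earthdata Cloud), and the temporary credentials are issued by a DAAC endpoint after Earthdata Login.

```python
from urllib.parse import urlparse

# Assumption: the GES DISC endpoint that issues temporary S3 credentials
# to logged-in Earthdata users. Other DAACs publish their own endpoint.
S3_CREDENTIALS_URL = "https://data.gesdisc.earthdata.nasa.gov/s3credentials"


def s3fs_kwargs(creds: dict) -> dict:
    """Map the credentials JSON onto s3fs.S3FileSystem keyword arguments."""
    return {
        "key": creds["accessKeyId"],
        "secret": creds["secretAccessKey"],
        "token": creds["sessionToken"],
    }


def split_s3_url(url: str) -> tuple:
    """Split 's3://bucket/path/file.nc' into (bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URL: {url}")
    return parsed.netloc, parsed.path.lstrip("/")


def open_granule(s3_url: str, creds: dict):
    """Open one granule in place (needs s3fs, and compute in us-west-2)."""
    import s3fs  # third-party; imported lazily so the helpers above need no install

    fs = s3fs.S3FileSystem(**s3fs_kwargs(creds))
    return fs.open(s3_url, mode="rb")
```

The file-like object returned by open_granule can be handed to a reader such as xarray, so the data is read from the bucket rather than downloaded first.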
How did we do it? Step 1 was “Educate” - we presented and showcased cloud user resources at weekly meetings ahead of workshops, inviting people to join our workshop to learn hands-on. Step 2 was “Prepare” - we split participants by skill and experience level with Python and cloud computing. Step 3 was “Interact” - we taught with bite-sized lessons using Jupyter Notebooks, and encouraged folks to follow hands-on through our teaching style of live-coding and type-along during this interactive workshop.
How did it go? We taught 33 colleagues this way. By skill and comfort level going in, there were 12 Beginners, 14 Intermediate, and 7 Advanced.
Positive feedback included: “More simple & straightforward than I thought”; breaking up code pieces helped; pre-filled notebooks were useful to refer to; easy to follow; GitHub clone was easy; breakout rooms for troubleshooting.
Challenges included: window/tab management; too much material; live-coding typing was too fast; and a request to continue defining jargon/terminology throughout the workshop.
We also asked what prerequisite knowledge colleagues would have benefited from. They listed familiarity with Earthdata tools; basic Python; basic cloud (AWS) understanding and terminology; and account setup for cloud access.
Overall, we had four main lessons learned:
- Learning curve is steep! No one left an expert and we continue to help staff who are experimenting in the cloud
- Provide resources that are easy to revisit! Website, slides, instructions, recordings are all useful for staff who spend more time with the material
- Continued support and education are critical - Necessary to host refresher workshops to exercise the knowledge and introduce new tools and methods
- Lay a foundation with cloud basics and terminologies - Introduce terms and concepts early, perhaps in a separate meeting or clinic, and continue defining them throughout the workshop
End-user Training Events
Throughout 2022 we also led many training events with end-users - largely academic faculty and students using NASA Earthdata.
| Event | Date | Focus Area / Goals |
|---|---|---|
| Cloud Hackathon | November 2021 | Five-day collaborative open science learning experience aimed at exploring, creating, and promoting effective cloud-based science and applications workflows using NASA Earthdata Cloud data, tools, and services (among others). |
| AGU Workshop | December 2021 | Half-day workshop focused on enabling analysis in the cloud using NASA Earth science data. |
| SWOT Oceanography Workshop | March 2022 | Preparing for the Surface Water and Ocean Topography (SWOT) mission and enabling the oceanography science team to process and handle the large volumes of SWOT SSH data in the cloud. |
| ECOSTRESS Workshop | April 2022 | Exposing ECOSTRESS data users to ECOSTRESS version 2 (v2) data products in the cloud. Learning objectives focus on how to find and access ECOSTRESS v2 data from Earthdata Cloud, either by downloading or by accessing the data in the cloud. |
| LP DAAC University Events | April, May 2022 | A series of Jupyter Notebooks, written in Python, demonstrating how to get started with NASA Earthdata in the cloud. Topics include: Cloud Data Access in AWS, Cloud Optimized Data, Data Discovery using STAC via NASA’s CMR-STAC API, Working with Cloud Data. |
The main outcome was that these events markedly raised learners’ comfort level with the cloud.
Understanding the “why” was a big part of this - why move workflows to the Cloud. Learners’ takeaways predominantly centered around improved conceptual understanding of why and when to use, or not use, the cloud… while also recognizing that there is a significant learning curve and time investment required for adoption.
We also saw common pain points: inconsistent data availability and service offerings lead to difficulties reusing a given workflow; common and robust learning resources are lacking; and the Earthdata Cloud ecosystem is complex and overwhelming, which makes it hard to know when to use a given workflow, tool, or service.
Moving forward, we have ideas for supporting users in the Cloud:
- Recognizing easy cloud access as a core service - our 2i2c Jupyter Hub has been critical for reducing barriers to cloud entry and having a shared environment to meet users where they are (see this blog by Luis Lopez for more).
- Continuing to close the loop between the users we work with and our engineers to build solutions together - leveraging the 2i2c environment as we understand cost and funding mechanisms.
Earthdata Cloud Cookbook: Workflow and Vocab Cheatsheets
We have been creating Workflow and Vocab Cheatsheets to make data and resources more accessible, given the many tools and the amount of (new) jargon involved. They are conceptual, practical, or reference guides that help users find the paths and tools most useful for a given need. Because users are at different points in the learning process, we offer a range of guides and cheatsheets. They have been developed with NASA Openscapes and other DAAC mentors, with the aim of consistency across DAACs in messaging, information, and user experience. They are currently available digitally:
- Tools & Services Roadmap - a practical guide for learning and selecting the right tool or service for a given use case
- Workflow Cheatsheet - a practical reference guide as users begin taking the conceptual pieces to explore and implement in their own workflows
  - Guide to selecting from available tools to enable and implement the Access Pathway(s)
  - Links concepts and tool resources (tutorials)
The Earthdata Cloud Cookbook itself is a curated collection of tutorials we’ve iterated on and adapted following versions and feedback from live (virtual) training events. It focuses on the common steps across DAACs/users, and is available for self-paced learning. It is under active, open development, and links back to the underlying GitHub repo. It is a place to learn, share, and experiment with NASA Earthdata on the Cloud. We know this has a lot of moving parts, and we are iterating as we go, and welcome feedback and contributions.
earthaccess python library
earthaccess supports NASA Data Search and Access in Python. (It was previously named earthdata, and we are updating documentation).
Reproducible workflows are extremely important in the age of cloud data access, cloud computing, and open science. In this context, we are developing earthaccess, a Python library that aims to simplify data discovery and access for those using the PyData ecosystem (xarray, dask, numpy). Using this library eliminates the need to know the intricacies of NASA’s Application Programming Interfaces (APIs) and cloud data storage systems.
Currently, in order to programmatically access NASA datasets, users must be familiar with many APIs (we call this “API fragmentation”). These include EDL, Earthdata Login (how to use it with OAuth, curl, wget, etc., and .netrc); CMR, the Common Metadata Repository (how to query for what we want, and how to read the metadata that CMR returns); and the cloud itself (AWS, S3 buckets, S3 credentials).
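To make the fragmentation concrete, here is a rough sketch of the manual path using only the Python standard library. The function names and any example search values are hypothetical, and nothing is sent over the network; the point is only that login, search, and link extraction are separate pieces a user has to assemble before reading any data.

```python
import netrc
from urllib.parse import urlencode

EDL_HOST = "urs.earthdata.nasa.gov"
CMR_GRANULES = "https://cmr.earthdata.nasa.gov/search/granules.umm_json"


def edl_login_from_netrc(path=None):
    """Step 1 (EDL): read the Earthdata Login username and password from .netrc."""
    entry = netrc.netrc(path).authenticators(EDL_HOST)
    if entry is None:
        raise LookupError(f"no {EDL_HOST} entry in .netrc")
    username, _account, password = entry
    return username, password


def cmr_granule_url(short_name, start, end, bbox, page_size=10):
    """Step 2 (CMR): build a granule search URL for one collection."""
    params = {
        "short_name": short_name,
        "temporal": f"{start},{end}",
        # CMR expects west,south,east,north
        "bounding_box": ",".join(str(v) for v in bbox),
        "page_size": page_size,
    }
    return f"{CMR_GRANULES}?{urlencode(params)}"


def s3_links(umm_response):
    """Step 3 (Cloud): pull s3:// links out of a CMR UMM-JSON granule response."""
    links = []
    for item in umm_response.get("items", []):
        for related in item.get("umm", {}).get("RelatedUrls", []):
            if related.get("URL", "").startswith("s3://"):
                links.append(related["URL"])
    return links
```

Even after these three steps, the user still has to request temporary S3 credentials from the right DAAC and open each file.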
earthaccess will abstract this complexity for the user so that they can run a few lines of Python code and have the output return their query as Python objects, and they can carry on with their important science!
Note that with the renaming of the library to earthaccess, code written for the earlier earthdata package would now start with import earthaccess rather than an import from earthdata.
We are developing this Python library with several use cases in mind:
- We work at the granule, or file, level.
  - If we already use OPeNDAP/Harmony in an effective way, note that earthaccess does not support subsetting or GIS operations (yet).
- We don’t require granule pre-processing.
  - (see above)
- We are not cloud experts and just want to access the data.
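With those use cases in mind, a few lines of earthaccess might look like the sketch below. It assumes a recent earthaccess release that provides login, search_data, and open. The helper names build_query and open_granules are hypothetical, and any dataset short name passed in is only an example.

```python
def build_query(short_name, start, end, bbox=None, count=10):
    """Collect search parameters as keyword arguments for earthaccess.search_data."""
    query = {"short_name": short_name, "temporal": (start, end), "count": count}
    if bbox is not None:
        # (west, south, east, north)
        query["bounding_box"] = tuple(bbox)
    return query


def open_granules(query):
    """Log in, search at the granule level, and open the matches as file-like objects."""
    import earthaccess  # third-party: pip install earthaccess

    earthaccess.login()  # uses saved Earthdata Login credentials if available, else prompts
    granules = earthaccess.search_data(**query)
    return earthaccess.open(granules)
```

The list returned by open_granules can be passed to tools from the PyData ecosystem, such as xarray, without handling EDL, CMR, or S3 credentials by hand.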
See it in action!! Analyzing Sea Level Rise Using Earth Data in the Cloud
These are highlights from our work as a NASA Openscapes Mentor community. We’ve focused on teaching adult learners across experiences and career stages, visualizing concepts and complex workflows, and creating a Python package that abstracts many of these backend components so that cloud access truly takes only a few lines of code. We have more to share, and will continue to do so this year at the AGU Fall Meeting, Tuesday morning in the IN22C - Environmental Data User Support with Cloud-Based User Services II Poster session. If you’re there, please find us. We’d love to connect more about supporting researchers with NASA Earthdata in the cloud and open science!