NASA Openscapes: efforts to support end users in the journey to the cloud

We are a mentor community across NASA Earth science data centers (Distributed Active Archive Centers - DAACs). We are co-creating and teaching common tutorials to support researchers as they migrate analytical workflows to the Cloud.

We presented at several workshops and events this fall, including a 25-minute talk at NASA EOSDIS Systems Engineering Technical Interchange Meeting (SE TIM) that we reused at NASA’s Open Source Science Data Repositories (OSSDR) Workshop. These presentations shared perspectives from the NASA Openscapes DAAC mentors that we have developed through collaborating across DAACs, developing teaching resources, teaching, and listening to feedback from learners, which we iterate from after each time.

The purpose of these talks was to share and learn from each other, grow and improve DAAC support of cloud-archived science and applications users, while following open source, open science best practices. We will be presenting these ideas further at the AGU Fall Meeting in December 2022. This post shares brief highlights from these talks.

Slides

SE TIM Workshop, August 2022
Open Source Science Data Repositories Workshop, September 2022

Recording

DAAC Internal Training - Knowledge From Within

Throughout 2022 the GES DISC DAAC led internal trainings with our DAAC colleagues to give DAAC staff hands-on experience in the cloud. These kinds of internal trainings are being shared and leveraged by other DAACs, who will be exploring this same exact model and leveraging these materials in the future. The goals were to “walk a mile in the user’s shoes”:

Teach cloud basics and definitions
Interact with one cloud data access method: direct S3 access
Grant access to cloud workspace

How did we do it? Step 1 was “Educate” - we presented and showcased cloud user resources at weekly meetings ahead of workshops, inviting people to join our workshop to learn hands-on. Step 2 was “Prepare” - we split participants by skill and experience level with Python and cloud computing. Step 3 was “Interact” - we taught with bite-sized lessons using Jupyter Notebooks, and encouraged folks to follow hands-on through our teaching style of live-coding and type-along during this interactive workshop.

How did it go? We taught 33 colleagues this way. In terms of skill and comfort level in advance, there were 12 Beginners, 14 Intermediate, and 7 Advanced. Positive feedback included: “More simple & straightforward than I thought”; Breaking up code pieces helped; Pre-filled notebooks useful to refer to; Easy to follow; GitHub clone was easy!; Breakout rooms for troubleshooting. Challenges included: Window/tab management; Too much material; Live-coding typing was too fast; Continue to define jargon/terminology throughout the workshop. We also asked about prerequisite knowledge colleagues would have benefited from, and they listed familiarity with Earthdata tools; Basic Python; Basic cloud (AWS) understanding & terminology; Account set up for cloud access.

diagram titled What did we learn? with four blue squares in a 2 x 2 layout. Text in the blue squares is written out in the blog post text below.

What did we learn from internal training: four key elements.

Overall, we had four main lessons learned:

Learning curve is steep! No one left an expert and we continue to help staff who are experimenting in the cloud
Provide resources that are easy to revisit! Website, slides, instructions, recordings are all useful for staff who spend more time with the material
Continued support and education are critical - Necessary to host refresher workshops to exercise the knowledge and introduce new tools and methods
Lay a foundation with cloud basics and terminologies - Introduce terms and concepts early, perhaps in a separate meeting or clinic, and continue defining them throughout the workshop

End-user Training Events

Throughout 2022 we also led many training events with end-users - largely academic faculty and students using NASA Earthdata.

Event	Date	Focus Area / Goals
Cloud Hackathon	November 2021	Five day collaborative open science learning experience aimed at exploring, creating, and promoting effective cloud-based science and applications workflows using NASA Earthdata Cloud data, tools, and services (among others).
AGU Workshop	December 2021	Half-day workshop focused on enabling Analysis in the Cloud using NASA Earth Science Data.
SWOT Oceanography Workshop	March 2022	Preparing for Surface Water and Ocean Topography (SWOT) and enable the (oceanography) science team to be ready for processing and handling the large volumes of SWOT SSH data in the cloud.
ECOSTRESS Workshop	April 2022	Exposing ECOSTRESS data users to ECOSTRESS version 2 (v2) data products in the cloud. Learning objectives focus on how to find and access ECOSTRESS v2 data from Earthdata Cloud either by downloading or accessing the data on the cloud.
LP DAAC University Events	April, May 2022	A series of Jupyter Notebooks, written in Python, demonstrating how to get started with NASA Earthdata in the cloud. Topics include: Cloud Data Access in AWS, Cloud Optimized Data, Data Discovery using STAC via NASA’s CMR-STAC API, Working with Cloud Data.

The main outcome was that these events markedly raise cloud comfort level.

two pie charts labeled Before, After, with legend to the right. Before pie has 3 slices with 61.9% 'Little to no experience using the cloud - not comfortable'; 33.3% 'Beginner cloud experience - somewhat comfortable'; remainder 'Intermediate cloud experience - comfortable'. After pie has 3 slices with 66.7% 'Intermediate cloud experience - comfortable'; 19% 'Advanced cloud experience - very comfortable'; 14.3% 'Beginner cloud experience - somewhat comfortable'. Quoted text below pies says [We need] Better documentation/tutorials for how to access data over the cloud. It would have been **extremely difficult** to do any of this **without the help of the hackathon**.

Users’ comfort levels before and after training

Understanding the “why” was a big part of this - why move workflows to the Cloud. Learners’ takeaways predominantly centered around improved conceptual understanding of why and when to use, or not use, the cloud… while also recognizing that there is a significant learning curve and time investment required for adoption.

We also saw common pain points: inconsistent data availability and service offerings leads to difficulties reusing a given workflow; lack of common and robust learning resources; the Earthdata Cloud ecosystem is complex and overwhelming, leading to difficulties knowing when to use a given workflow or tool/service.

Moving forward, we have ideas for supporting users in the Cloud:

Recognizing easy cloud access as a core service - our 2i2c Jupyter Hub has been critical for reducing barriers to cloud entry and having a shared environment to meet users where they are (see this blog by Luis Lopez for more).
Continuing to close the loop between the users we work with and our engineers to build solutions together - leveraging the 2i2c environment as we understand cost and funding mechanisms.

diagram with center zigzag line with 4 circles at the angles, labeled Continuing to support Openscapes 2i2C Hub; Spinning up a permanent cloud environment; Advanced Cloud Processing; Open Science Enablement

Ideas for supporting users in the cloud

Earthdata Cloud Cookbook: Workflow and Vocab Cheatsheets

We have been creating Workflow and Vocab Cheatsheets to increase accessibility to data & resources, as well as many tools, and lots of (new) jargon. They are conceptual, practical, or reference guides to help users find the paths and tools most useful for a given need; we recognize there is a range of where in the learning process the users find themselves → a range of guides & cheatsheets. These cheatsheets and guides have been developed with NASA Openscapes and other DAAC mentors - consistency across DAACs, in messaging, information, and user experience. They are currently available digitally:

NASA Earthdata Cloud Cookbook
On DAAC websites e.g. PO.DAAC - Cloud Data Page

Examples:

Tools & Services Roadmap - a practical guide for learning and selecting the right tool or service for a given use case

screenshot of Workflow Cheatsheet that is accessible from link in the figure caption

A Workflow Cheatsheet for NASA Earthdata Cloud, outlining steps for: find data, transform, download, authentication, access, and tech set up. See the Cheatsheet for clickable icons that lead to tutorials!

Workflow Cheatsheet - Practical reference guide as user begins taking the conceptual pieces to explore and implement in their own workflows

Guide to selecting from available tools to enable and implement the Access Pathway(s)
Links concepts and tool resources (tutorials)

The Earthdata Cloud Cookbook itself is a curated collection of tutorials we’ve iterated on and adapted following versions and feedback from live (virtual) training events. It focuses on the common steps across DAACs/users, and is available for self-paced learning. It is under active, open development, and links back to the underlying GitHub repo. It is a place to learn, share, and experiment with NASA Earthdata on the Cloud. We know this has a lot of moving parts, and we are iterating as we go, and welcome feedback and contributions.

`earthaccess` python library

earthaccess supports NASA Data Search and Access in Python. (It was previously named earthdata, and we are updating documentation).

Reproducible workflows are extremely important in the age of cloud data access, cloud computing, and open science. In this context, we are developing earthaccess, a python library that aims to simplify data discovery and access for those using the PyData ecosystem (xarray, dask, numpy). Using this library eliminates the need to know the intricacies of NASA’s Application Programming Interfaces (APIs) and cloud data storage systems.

Currently, in order to programmatically access NASA datasets, users must be familiar with many APIs (we call this “API fragmentation”), including EDL (How to use it with OAuth, CURL, WGET etc, and .netrc); CMR (How to query for what we want, How to read the metadata that CMR returns); Cloud (AWS, S3 buckets, S3 credentials), etc.

earthdata will abstract this complexity for the user so that they can run a few lines of Python code and have the output return their query as Python objects, and they can carry on with their important science!

Example code using earthaccess. Note that with the renaming of the library, this code would now run following import earthaccess rather than import earthdata.

We are developing this Python library with several use cases in mind:

We work at the granule, or file, level
- If we already use OPeNDAP/Harmony in an effective way, earthaccess does not support subsetting, or GIS operations (yet)
We don’t require granule pre-processing.
- (see above)
We are not cloud experts and just want to access the data.

See it in action!! Analyzing Sea Level Rise Using Earth Data in the Cloud

Onward!

These are highlights from our work as a NASA Openscapes Mentor community. We’ve focused on teaching adult learners across experiences and career stages, visualizing concepts and complex workflows, and creating a python package that abstracts many of these backend components to make cloud access truly only a few lines of code. We have more to share, which we will continue to do this year at the AGU Fall Meeting, Tuesday morning in the IN22C - Environmental Data User Support with Cloud-Based User Services II Poster session. If you’re there, please find us, we’d love to connect more about supporting researchers with NASA Earthdata in the cloud and open science!

DAAC Internal Training - Knowledge From Within

End-user Training Events

Earthdata Cloud Cookbook: Workflow and Vocab Cheatsheets

earthaccess python library

Onward!

`earthaccess` python library