Earthdata Search and Discovery

Author

Eli Holmes adapted from work by Luis Lopez and Carl Boettiger

Learning Objectives
  1. How to authenticate with earthdatalogin
  2. How to use earthdatalogin to search for data using spatial and temporal filters
  3. How to make a plot without downloading the data

Summary

In this example we will use the earthdatalogin R package to search for data collections from NASA Earthdata. earthdatalogin is an R package that simplifies data discovery and access through NASA’s Common Metadata Repository (CMR) Search API. Despite the name, NASA Earthdata also holds NOAA data (which we will use today).

For more on earthdatalogin visit the earthdatalogin GitHub page and/or the earthdatalogin documentation site. Be aware that earthdatalogin is under active development.

Terminology

  • NetCDF files: NetCDF (Network Common Data Form) is a file format for storing multidimensional scientific data (variables) such as temperature, humidity, pressure, wind speed, and direction. Each of these variables can be displayed through a dimension (such as time) in ArcGIS by making a layer or table view from the netCDF file. Learn more here.
  • tif, tiff, or GeoTIFF file: a format used as an interchange format for georeferenced raster imagery. GeoTIFF is in wide use in NASA Earth science data systems. Learn more here.
  • raster: a matrix of cells (or pixels) organized into rows and columns (or a grid) where each cell contains a value representing information, such as temperature. Examples of rasters include digital aerial photographs, imagery from satellites, digital pictures, and scanned maps. Learn more here.
  • GDAL: is a translator library for raster and vector geospatial data formats. As a library, it presents a single raster abstract data model and single vector abstract data model to the calling application for all supported formats. It also comes with a variety of useful command line utilities for data translation and processing. Learn more here.

Prerequisites

An Earthdata Login account is required to access data from NASA Earthdata. Please visit https://urs.earthdata.nasa.gov to register as a new user and manage your Earthdata Login account. This account is free to create and only takes a moment to set up.

You will need the R package earthdatalogin with version 0.0.2.99 (dev version) or later.

Load Required Packages

We are using the JupyterHub and all necessary packages are already installed for you.

Note: See the set-up tab (in left nav bar) for instructions on getting set up on your own computer, but be aware that it is common to run into trouble getting GDAL set up properly to handle netCDF files. Using a Docker image (and Python) is often less aggravating.

# devtools::install_github("boettiger-lab/earthdatalogin")
library(earthdatalogin)
library(terra)
terra 1.7.71

Authentication for NASA Earthdata

We will start by authenticating using our Earthdata Login credentials. Authentication is not necessarily needed to search for publicly available data collections in Earthdata, but is always needed to download or access data from the NASA Earthdata archives. We can use edl_netrc() from the earthdatalogin package to create a .netrc file that will store our credentials.

The first time you run authentication use:

earthdatalogin::edl_netrc(
  username = "user", # add your user name
  password = "password" # add your password
)

This will put your login info in a netrc file located at:

earthdatalogin:::edl_netrc_path()
[1] "/home/jovyan/.local/share/R/earthdatalogin/netrc"

You can open a terminal and run cat /home/jovyan/.local/share/R/earthdatalogin/netrc to see that it has your username and password.
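To make it clearer what to expect inside that file, here is a minimal sketch of the conventional netrc layout: one line naming the host, the login, and the password. The helper name netrc_entry() is made up for this illustration and is not part of earthdatalogin, and the exact text that edl_netrc() writes may differ slightly. The example writes to a temporary file so your real netrc is not touched.

```r
# Hypothetical helper (not part of earthdatalogin): build one netrc entry
# in the conventional single-line layout.
# The default host is the Earthdata Login server named in Prerequisites.
netrc_entry <- function(username, password,
                        host = "urs.earthdata.nasa.gov") {
  stopifnot(nzchar(username), nzchar(password))
  paste("machine", host, "login", username, "password", password)
}

# Write a throwaway example to a temporary file (not your real netrc).
demo_path <- tempfile("netrc_demo_")
writeLines(netrc_entry("user", "password"), demo_path)
readLines(demo_path)

# Credential files should be readable only by you.
# (Sys.chmod has no effect on Windows.)
Sys.chmod(demo_path, mode = "0600")
```

Because the password is stored as plain text, it is worth keeping the file permissions restrictive and never committing a netrc file to a repository.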

Once your netrc file is saved, you can use earthdatalogin::edl_netrc() to authenticate.

earthdatalogin::edl_netrc()

For the purposes of this workshop, edl_netrc() will work by using a default public account login. Feel free to log in with your own NASA Earthdata account (https://urs.earthdata.nasa.gov/home).

Search for data

There are multiple keywords we can use to discover data from collections. The table below contains the short_name, concept_id, and doi for some collections we are interested in for the tutorials today. Each of these can be used to search for data or information related to the collection we are interested in.

Short name                   Collection concept ID   DOI
MUR-JPL-L4-GLOB-v4.1         C1996881146-POCLOUD     10.5067/GHGMR-4FJ04
AVHRR_OI-NCEI-L4-GLOB-v2.1   C2036881712-POCLOUD     10.5067/GHAAO-4BC21

How can we find the shortname, concept_id, and doi for collections not in the table above? Let’s take a quick detour:

  1. Navigate to https://search.earthdata.nasa.gov/search
  2. Search for “GHRSST Level 4 MUR Global Foundation Sea Surface Temperature Analysis” or click this link (screenshot below).

Hover over the top result box, then find and click the more information button (an i with a circle around it). On this page, you will see the DOI. Now click “View More Info” to get to https://cmr.earthdata.nasa.gov/search/concepts/C1996881146-POCLOUD.html.

On that page you will see the “short name” MUR-JPL-L4-GLOB-v4.1. Note the short name was also on the first search page (though it wasn’t labeled as the short name, there).

Search by short name

short_name <- 'MUR-JPL-L4-GLOB-v4.1'

Let’s set some time bounds.

tbox <- c("2020-01-16", "2020-12-16")

And now we search!

results <- earthdatalogin::edl_search(
    short_name = short_name,
    version = "4.1",
    temporal = tbox
)
length(results) # how many links were returned
[1] 336
results[1:3] # let's see the first 3 of these links
[1] "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20200116090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
[2] "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20200117090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
[3] "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20200118090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"

In this example we used the short_name parameter to search for our desired data set. However, there are multiple ways to specify the collection(s) we are interested in. Alternative parameters include:

  • doi: request collection by digital object identifier (e.g., doi = '10.5067/GHAAO-4BC21')

NOTE: Each Earthdata collection has a unique concept_id and doi. This is not the case with short_name, which can be associated with multiple versions of a collection. If multiple versions of a collection are publicly available, using the short_name parameter will return all versions available. It is advised to use the version parameter in conjunction with the short_name parameter when searching.

We can refine our search by passing more parameters that describe the spatiotemporal domain of our use case. Here, we use the temporal parameter to request a date range and the bounding_box parameter to request granules that intersect with a bounding box.

bbox <- c(xmin=-73.5, ymin=33.5, xmax=-43.5, ymax=43.5) 
bbox
 xmin  ymin  xmax  ymax 
-73.5  33.5 -43.5  43.5 
results <- earthdatalogin::edl_search(
    short_name = short_name,
    version = "4.1",
    temporal = tbox,
    bounding_box = paste(bbox,collapse=",")
)
length(results)
[1] 336
results[1:3]
[1] "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20200116090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
[2] "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20200117090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
[3] "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20200118090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
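The bounding box in the search above is passed as a single comma-separated string in the order xmin, ymin, xmax, ymax (west, south, east, north). A small helper can guard against boxes that are given in the wrong order or out of range before they reach the search. This is only a sketch: bbox_to_string() is a hypothetical name, not an earthdatalogin function. Note also that the count stayed at 336 after adding the box. That is what you would expect if each granule covers the whole globe (the short name contains GLOB), since a global granule intersects any box.

```r
# Hypothetical helper: validate a bounding box and format it as
# "xmin,ymin,xmax,ymax", the order used in the search above.
bbox_to_string <- function(bbox) {
  needed <- c("xmin", "ymin", "xmax", "ymax")
  stopifnot(is.numeric(bbox), all(needed %in% names(bbox)))
  bbox <- bbox[needed]  # enforce the expected order
  # xmin < xmax is deliberately not required, so that boxes crossing
  # the antimeridian are not rejected.
  stopifnot(
    bbox[["xmin"]] >= -180, bbox[["xmax"]] <= 180,
    bbox[["ymin"]] >= -90,  bbox[["ymax"]] <= 90,
    bbox[["ymin"]] < bbox[["ymax"]]
  )
  paste(bbox, collapse = ",")
}

bbox <- c(xmin = -73.5, ymin = 33.5, xmax = -43.5, ymax = 43.5)
bbox_to_string(bbox)
```

The result can be passed as bounding_box = bbox_to_string(bbox) in place of the paste() call used above.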

Working with earthdatalogin returns

Following the search for data, you’ll likely take one of two pathways with those results. You may choose to download the assets that have been returned to you or you may choose to continue working with the search results within the R environment.
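As one small example of working with the results inside R, the links are plain character strings, so base R is enough to pull information out of them. The sketch below assumes the file names start with a YYYYMMDD timestamp, as they do in the links printed above. granule_date() is a hypothetical helper written for this illustration.

```r
# Two of the links returned by the MUR search above.
urls <- c(
  "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20200116090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc",
  "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20200117090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
)

# Hypothetical helper: read the leading YYYYMMDD from each file name.
granule_date <- function(urls) {
  stamp <- substr(basename(urls), 1, 8)
  as.Date(stamp, format = "%Y%m%d")
}

dates <- granule_date(urls)
dates

# Pick the link for one day of interest.
urls[dates == as.Date("2020-01-17")]

# One file per day over the search window gives this many links.
length(seq(as.Date("2020-01-16"), as.Date("2020-12-16"), by = "day"))
```

The last line is a quick sanity check on the search: the window from 2020-01-16 to 2020-12-16 contains 336 days, which matches the number of links returned.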

Download earthdatalogin results

In some cases you may want to download your assets. The earthdatalogin::edl_download() function makes downloading the data from the search results very easy. We won’t download the MUR SST file for this tutorial because it is 673 Gb, but you could with the code below, if inclined.

earthdatalogin::edl_download(
    results[1],
    dest = here::here("test.nc")
)
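If you do want several files, a loop that skips files already on disk avoids repeating large downloads. The following is a sketch under stated assumptions: download_missing() is a hypothetical helper, and it only relies on the downloader accepting a link plus a dest argument, as edl_download() does in the call above. The demonstration uses a stand-in downloader and made-up example.org links, so nothing is fetched from the archive.

```r
# Hypothetical helper: download each link into `dir`, skipping files that
# are already there. `downloader` is any function taking (url, dest = ...).
download_missing <- function(urls, dir,
                             downloader = earthdatalogin::edl_download) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  dests <- file.path(dir, basename(urls))
  for (i in seq_along(urls)) {
    if (!file.exists(dests[i])) {
      downloader(urls[i], dest = dests[i])
    }
  }
  invisible(dests)
}

# Dry run with a stand-in downloader that only creates empty files.
fake_downloader <- function(url, dest) file.create(dest)
demo_dir <- file.path(tempdir(), "edl_demo")
paths <- download_missing(
  c("https://example.org/a.nc", "https://example.org/b.nc"),
  dir = demo_dir,
  downloader = fake_downloader
)
basename(paths)
```

With real search results, and after authenticating, a call such as download_missing(results[1:2], dir = "data") would use the default downloader. Check the file sizes first, since these files are large.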

Work in the cloud

We do not have to download the data to work with it or at least not until we need to compute with it or plot it. Let’s look at a smaller dataset.

oi <- earthdatalogin::edl_search(
    short_name = "AVHRR_OI-NCEI-L4-GLOB-v2.1",
    version = "2.1",
    temporal = c("2020-01-16", "2020-01-17")
)
oi
[1] "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/20200115120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc"
[2] "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/20200116120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc"
[3] "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/20200117120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc"

Let’s try plotting this. I am going to authenticate again just to make sure my token did not expire. To search, we don’t need to authenticate, but to plot or download, we do.

# Re-authenticate (just in case)
earthdatalogin::edl_netrc()
library(terra)
ras <- terra::rast(x = oi[1], vsi=TRUE)
plot(ras)

Troubleshooting

If you get the following error:

Error: [rast] file does not exist: /vsicurl/https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/20200115120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc

The following are possible problems:

  • you are not logged in using earthdatalogin::edl_netrc(). Try running the code again.
  • you inadvertently installed terra from source. Try deleting the package via the ‘x’ in the package tab and reinstalling.
  • you do not have the End User Licence Agreement (EULA)/permissions to use that data set. Go to https://urs.earthdata.nasa.gov/home, log in and then go to the EULA tab.
  • GDAL installation is not properly handling netCDF files. That is hard. You may be out of luck.
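For the last item in the list, one way to narrow things down is to check whether the GDAL build that terra uses lists a netCDF driver at all. This is a rough sketch: has_driver() is a hypothetical helper, and the code assumes that terra::gdal(drivers = TRUE) returns a data frame of drivers with a name column. Check ?terra::gdal if your version behaves differently.

```r
# Hypothetical helper: is a driver listed in a table of GDAL drivers?
# `drivers` is a data frame with a `name` column.
has_driver <- function(drivers, driver = "netCDF") {
  stopifnot(is.data.frame(drivers), "name" %in% names(drivers))
  any(tolower(drivers$name) == tolower(driver))
}

if (requireNamespace("terra", quietly = TRUE)) {
  # Assumption: terra::gdal(drivers = TRUE) returns such a table.
  drivers <- terra::gdal(drivers = TRUE)
  message("netCDF driver available: ", has_driver(drivers, "netCDF"))
  message("GeoTIFF driver available: ", has_driver(drivers, "GTiff"))
} else {
  message("terra is not installed, so GDAL drivers cannot be listed.")
}
```

If the GeoTIFF driver is reported but the netCDF driver is not, the problem is likely in the GDAL build rather than in your login.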

Also try this example script from the ?earthdatalogin::edl_netrc documentation that uses a .tif file instead of a netCDF (.nc) file.

url <- earthdatalogin::lpdacc_example_url()
ras <- terra::rast(url, vsi=TRUE)
plot(ras)

How do you accept EULAs? Go to https://urs.earthdata.nasa.gov/profile. Look for the EULA tab and accept the one that looks likely. Unfortunately, it is hard to find out which one you need. You can just accept all of them (or all that look possible). Then try your code again.

Conclusion

This concludes tutorial 1. You have worked with remote-sensing data in the cloud and plotted it. Way to go!

Next we will learn to subset the data so we can work with bigger datasets in the cloud without downloading the whole dataset.