# devtools::install_github("boettiger-lab/earthdatalogin")
library(earthdatalogin)
library(terra)
terra 1.7.71
Eli Holmes adapted from work by Luis Lopez and Carl Boettiger
earthdatalogin

Use earthdatalogin to search for data using spatial and temporal filters.

In this example we will use the earthdatalogin R package to search for data collections from NASA Earthdata. earthdatalogin is an R package that simplifies data discovery and access to NASA’s Common Metadata Repository (CMR) Search API for NASA Earthdata. Despite the name, NASA Earthdata also holds NOAA data (which we will use today).
For more on earthdatalogin, visit the earthdatalogin GitHub page and/or the earthdatalogin documentation site. Be aware that earthdatalogin is under active development.
NetCDF files: Network Common Data Form (netCDF) is a file format for storing multidimensional scientific data (variables) such as temperature, humidity, pressure, wind speed, and direction. Each of these variables can be displayed through a dimension (such as time) in ArcGIS by making a layer or table view from the netCDF file.

.tif, .tiff, or GeoTIFF files: GeoTIFF is used as an interchange format for georeferenced raster imagery and is in wide use in NASA Earth science data systems.

An Earthdata Login account is required to access data from NASA Earthdata. Please visit https://urs.earthdata.nasa.gov to register as a new user and manage your Earthdata Login account. This account is free to create and only takes a moment to set up.
You will need the R package earthdatalogin with version 0.0.2.99 (dev version) or later.
We are using JupyterHub, where all necessary packages are already installed for you.
Note: See the set-up tab (in the left nav bar) for instructions on getting set up on your own computer, but be aware that it is common to run into trouble getting GDAL set up properly to handle netCDF files. Using a Docker image (and Python) is often less aggravating.
We will start by authenticating using our Earthdata Login credentials. Authentication is not necessarily needed to search for publicly available data collections in Earthdata, but is always needed to download or access data from the NASA Earthdata archives. We can use edl_netrc() from the earthdatalogin package to create a .netrc file that will store our credentials.
The first time you run authentication use:
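A minimal sketch of that first-time call. The username and password values are hypothetical placeholders; replace them with your own Earthdata Login credentials.

```r
library(earthdatalogin)

# First-time authentication: store Earthdata Login credentials in a netrc
# file that earthdatalogin (and GDAL) use for later requests.
# "your_username" and "your_password" are placeholders, not real credentials.
earthdatalogin::edl_netrc(
  username = "your_username",
  password = "your_password"
)
```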
This will put your login info in a netrc file. On the JupyterHub, the file is located at /home/jovyan/.local/share/R/earthdatalogin/netrc. You can open a terminal and run cat /home/jovyan/.local/share/R/earthdatalogin/netrc to see that it has your username and password.
Once your netrc file is saved, you can use earthdatalogin::edl_netrc() to authenticate.
For the purposes of this workshop, edl_netrc() will work by using a default public account login. Feel free to log in with your own [NASA Earthdata account](https://urs.earthdata.nasa.gov/home).
There are multiple keywords we can use to discover data from collections. The table below contains the short_name, concept_id, and doi for some collections we are interested in for the tutorials today. Each of these can be used to search for data or information related to the collection we are interested in.
Shortname | Collection Concept ID | DOI |
---|---|---|
MUR-JPL-L4-GLOB-v4.1 | C1996881146-POCLOUD | 10.5067/GHGMR-4FJ04 |
AVHRR_OI-NCEI-L4-GLOB-v2.1 | C2036881712-POCLOUD | 10.5067/GHAAO-4BC21 |
How can we find the short_name, concept_id, and doi for collections not in the table above? Let’s take a quick detour:
If we hover over the top box, find and click on the more information button (an i with a circle around it). On this page, you will see the DOI. Now click “View More Info” to get to https://cmr.earthdata.nasa.gov/search/concepts/C1996881146-POCLOUD.html.
On that page you will see the “short name” MUR-JPL-L4-GLOB-v4.1. Note the short name was also on the first search page (though it wasn’t labeled as the short name there).
Let’s set some time bounds.
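The search below uses two objects, short_name and tbox. A minimal sketch of defining them, assuming the MUR short name from the table above and a date range of 2020-01-16 to 2020-12-16. That range is an assumption, chosen because it is consistent with the first granule date and the 336 results shown below. Because MUR is a daily product, counting the days in the range gives a quick sanity check on how many granules to expect.

```r
# Collection to search (short name from the table above)
short_name <- "MUR-JPL-L4-GLOB-v4.1"

# Temporal bounds as c(start, end); these dates are an assumption
tbox <- c("2020-01-16", "2020-12-16")

# MUR has one granule per day, so count the days in the range (both ends)
n_days <- as.integer(as.Date(tbox[2]) - as.Date(tbox[1])) + 1
n_days
```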
And now we search!
results <- earthdatalogin::edl_search(
short_name = short_name,
version = "4.1",
temporal = tbox
)
length(results) # how many links were returned
[1] 336
[1] "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20200116090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
[2] "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20200117090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
[3] "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20200118090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
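To confirm the temporal filter worked, you can pull the date out of each granule’s file name. A minimal sketch using the three URLs printed above; in a live session you would set urls <- results instead. It assumes the file names keep the YYYYMMDDhhmmss prefix seen here.

```r
# Granule URLs copied from the output above (in a session: urls <- results)
urls <- c(
  "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20200116090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc",
  "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20200117090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc",
  "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20200118090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
)

# Each file name starts with a YYYYMMDDhhmmss time stamp;
# the first 8 characters are the granule date
dates <- as.Date(substr(basename(urls), 1, 8), format = "%Y%m%d")
range(dates)
```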
In this example we used the short_name parameter to search for our desired data set. However, there are multiple ways to specify the collection(s) we are interested in. Alternative parameters include:
- doi: request a collection by digital object identifier (e.g., doi = '10.5067/GHAAO-4BC21')

NOTE: Each Earthdata collection has a unique concept_id and doi. This is not the case with short_name, which can be associated with multiple versions of a collection. If multiple versions of a collection are publicly available, using the short_name parameter will return all versions available. It is advised to use the version parameter in conjunction with the short_name parameter when searching.
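A sketch of a search by DOI, assuming edl_search() accepts the doi parameter as described above. The DOI is the AVHRR OI entry from the table, and the one-day date range is only illustrative. Because a DOI identifies a single collection, no version argument is passed.

```r
library(earthdatalogin)

# Search by DOI instead of short_name + version
oi_doi <- earthdatalogin::edl_search(
  doi = "10.5067/GHAAO-4BC21",
  temporal = c("2020-01-16", "2020-01-17")
)
length(oi_doi)  # number of granule links returned
```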
We can refine our search by passing more parameters that describe the spatiotemporal domain of our use case. Here, we use the temporal parameter to request a date range and the bounding_box parameter to request granules that intersect with a bounding box.
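The bounding box is a numeric vector in decimal degrees ordered xmin, ymin, xmax, ymax. A minimal sketch of defining it with the values printed below (a region of the western North Atlantic). The names are only for readability; the search call collapses the vector into the comma-separated string passed to bounding_box.

```r
# Bounding box in decimal degrees: west, south, east, north
bbox <- c(xmin = -73.5, ymin = 33.5, xmax = -43.5, ymax = 43.5)
bbox
```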
xmin ymin xmax ymax
-73.5 33.5 -43.5 43.5
results <- earthdatalogin::edl_search(
short_name = short_name,
version = "4.1",
temporal = tbox,
bounding_box = paste(bbox, collapse = ",")
)
length(results)
[1] 336
[1] "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20200116090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
[2] "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20200117090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
[3] "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20200118090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
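Adding the bounding box does not change the number of results (still 336) because each MUR granule is a single global daily file, so every granule intersects the box.

earthdatalogin returns

Following the search for data, you’ll likely take one of two pathways with those results. You may choose to download the assets that have been returned to you or you may choose to continue working with the search results within the R environment.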
earthdatalogin results

In some cases you may want to download your assets. The earthdatalogin::edl_download() function makes downloading the data from the search results very easy. We won’t download the MUR SST data for this tutorial because it is 673 GB, but you could with edl_download(), if inclined.
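A minimal sketch of downloading a single granule with edl_download(), assuming it takes the URL as its first argument and a destination path as dest. Instead of a MUR file, it uses the much smaller AVHRR OI granule that appears in the search further down, and it writes to a temporary directory; both choices are for illustration only. For the MUR search you would pass results[1] (or results) instead.

```r
library(earthdatalogin)

# Make sure credentials are available (downloads require authentication)
earthdatalogin::edl_netrc()

# One small AVHRR OI granule, used here only as an example
url <- "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/20200116120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc"

# Save it under its original file name in a temporary directory
dest <- file.path(tempdir(), basename(url))
earthdatalogin::edl_download(url, dest = dest)
file.exists(dest)
```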
We do not have to download the data to work with it, at least not until we need to compute with it or plot it. Let’s look at a smaller dataset.
oi <- earthdatalogin::edl_search(
short_name = "AVHRR_OI-NCEI-L4-GLOB-v2.1",
version = "2.1",
temporal = c("2020-01-16", "2020-01-17")
)
oi
[1] "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/20200115120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc"
[2] "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/20200116120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc"
[3] "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/20200117120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc"
Let’s try plotting this. I am going to authenticate again just to make sure my token did not expire. To search, we don’t need to authenticate, but to plot or download, we do.
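A sketch of the plotting step, assuming terra can open the netCDF over HTTP through GDAL. Setting vsi = TRUE tells terra::rast() to prefix the URL with /vsicurl/ (the same prefix that appears in the error message below), so GDAL reads the file remotely instead of requiring a download. The file holds several variables, so plot() may draw more than one panel.

```r
library(earthdatalogin)
library(terra)

# Re-authenticate in case the earlier token has expired
earthdatalogin::edl_netrc()

# Same AVHRR OI search as above
oi <- earthdatalogin::edl_search(
  short_name = "AVHRR_OI-NCEI-L4-GLOB-v2.1",
  version = "2.1",
  temporal = c("2020-01-16", "2020-01-17")
)

# Open the first granule remotely via GDAL's /vsicurl/ and plot it
ras <- terra::rast(oi[1], vsi = TRUE)
ras
plot(ras)
```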
If you get the following error:
Error: [rast] file does not exist: /vsicurl/https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/20200115120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc
The following are possible problems:
- Your token may have expired. Rerun earthdatalogin::edl_netrc() and try running the code again.
- terra may need to be reinstalled (possibly from source). Try deleting the package via the ‘x’ in the Packages tab and reinstalling.
- Your GDAL installation may not be set up to read netCDF files. That is hard to fix, and you may be out of luck.

Also try the example script from the ?earthdatalogin::edl_netrc documentation, which uses a .tif file instead of a netCDF file.
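A sketch of that .tif check, assuming the package’s lpdacc_example_url() helper (spelled that way in earthdatalogin) returns the URL of a GeoTIFF hosted by NASA LP DAAC. Because it avoids netCDF entirely, it helps separate authentication problems from GDAL netCDF problems: if this works but the netCDF example does not, the netCDF driver is the likely culprit.

```r
library(earthdatalogin)
library(terra)

earthdatalogin::edl_netrc()

# Example GeoTIFF URL provided by earthdatalogin
url <- earthdatalogin::lpdacc_example_url()

# Open remotely via /vsicurl/ and plot
ras <- terra::rast(url, vsi = TRUE)
plot(ras)
```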
How do you accept EULAs? Go to https://urs.earthdata.nasa.gov/profile. Look for the EULA tab and accept the one that looks likely. Unfortunately, it is hard to find out which one you need, so you can just accept all of them (or all that look possible). Then try your code again.
This concludes tutorial 1. You have worked with remote-sensing data in the cloud and plotted it. Way to go!
Next we will learn to subset the data so we can work with bigger datasets in the cloud without downloading the whole dataset.