We will authenticate our earthaccess session and then open the results as we did in the Search & Discovery section.
import earthaccess

auth = earthaccess.login()
# are we authenticated?
if not auth.authenticated:
    # ask for credentials and persist them in a .netrc file
    auth.login(strategy="interactive", persist=True)
Get a list of URLs to our netCDF files
short_name = 'MUR-JPL-L4-GLOB-v4.1'
version = "4.1"
date_start = "2020-01-01"
date_end = "2020-04-01"
date_range = (date_start, date_end)
# min lon, min lat, max lon, max lat
bbox = (-75.5, 33.5, -73.5, 35.5)

results = earthaccess.search_data(
    short_name=short_name,
    version=version,
    cloud_hosted=True,
    temporal=date_range,
    bounding_box=bbox,
)
Granules found: 92
Crop and plot one netCDF file
Each MUR SST netCDF file is large, so we do not want to download it. Instead, we will subset the data remotely and read only the part we need. We will start with one file.
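One way to open a single granule remotely can be sketched as follows. This is a minimal sketch, not necessarily the workshop's exact code: it assumes `results` is the list returned by `earthaccess.search_data()`, that the session is already authenticated, and that an HDF5-capable xarray engine (for example h5netcdf) is installed. The helper name `open_first_granule` is hypothetical.

```python
def open_first_granule(results):
    """Open the first granule in `results` lazily, without downloading it.

    Assumes an authenticated earthaccess session (see the login step above).
    """
    # Imported inside the helper so that defining it does not require the packages.
    import earthaccess
    import xarray as xr

    # earthaccess.open() returns a list of file-like objects, one per granule.
    fileset = earthaccess.open(results[0:1])
    # xarray reads the metadata now and defers reading the data arrays.
    return xr.open_dataset(fileset[0])
```

A call such as `ds = open_first_granule(results)` would then give the dataset that the following cells inspect.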
Note that xarray works with “lazy” computation whenever possible. In this case, the metadata are loaded into JupyterHub memory, but the data arrays and their values are not — until there is a need for them.
Let’s print out all the variable names.
for v in ds.variables:
    print(v)
time
lat
lon
analysed_sst
analysis_error
mask
sea_ice_fraction
dt_1km_data
sst_anomaly
Of the variables listed above, we are interested in analysed_sst.
In addition to directly accessing the files archived and distributed by each of the NASA DAACs, many datasets also support services that allow us to customize the data via subsetting, reformatting, reprojection/regridding, and file aggregation. What does subsetting mean? To subset means to extract only the portions of a dataset that are needed for a given purpose.
There are three primary types of subsetting that we will walk through:
1. Temporal
2. Spatial
3. Variable
In each case, we will be excluding parts of the dataset that are not wanted using xarray. Note that “subsetting” is also called a data “transformation”.
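The three kinds of subsetting map onto a small number of xarray operations. The sketch below is illustrative only: it assumes a dataset with `time`, `lat` and `lon` coordinates stored in ascending order, like the MUR file, and the helper `subset_dataset` and its parameter names are hypothetical.

```python
def subset_dataset(ds, time_range=None, lat_range=None, lon_range=None,
                   variables=None):
    """Return a temporal, spatial and/or variable subset of an xarray Dataset."""
    indexers = {}
    if time_range is not None:
        # Temporal: keep only time steps between the two dates.
        indexers["time"] = slice(*time_range)
    if lat_range is not None:
        # Spatial: keep only grid cells inside the latitude range ...
        indexers["lat"] = slice(*lat_range)
    if lon_range is not None:
        # ... and inside the longitude range.
        indexers["lon"] = slice(*lon_range)
    if indexers:
        # Label-based selection; nothing is loaded until values are needed.
        ds = ds.sel(**indexers)
    if variables is not None:
        # Variable: keep only the named data variables.
        ds = ds[list(variables)]
    return ds
```

For example, `subset_dataset(ds, lat_range=(33.5, 35.5), lon_range=(-75.5, -73.5), variables=["analysed_sst"])` would keep only the sea surface temperature inside the bounding box.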
# Display the full dataset's metadata
ds
<xarray.Dataset> Size: 29GB
Dimensions: (time: 1, lat: 17999, lon: 36000)
Coordinates:
* time (time) datetime64[ns] 8B 2020-01-16T09:00:00
* lat (lat) float32 72kB -89.99 -89.98 -89.97 ... 89.98 89.99
* lon (lon) float32 144kB -180.0 -180.0 -180.0 ... 180.0 180.0
Data variables:
analysed_sst (time, lat, lon) float64 5GB ...
analysis_error (time, lat, lon) float64 5GB ...
mask (time, lat, lon) float32 3GB ...
sea_ice_fraction (time, lat, lon) float64 5GB ...
dt_1km_data (time, lat, lon) timedelta64[ns] 5GB ...
sst_anomaly (time, lat, lon) float64 5GB ...
Attributes: (47)
Now we will prepare a subset. We’re using essentially the same spatial bounds as above; however, as opposed to the earthaccess inputs above, here we must provide inputs in the formats expected by xarray. Instead of a single, four-element, bounding box, we use Python slice objects, which are defined by starting and ending numbers.
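To make the change of input format concrete, the sketch below converts the four-element bounding box into two slice objects and estimates how many grid cells the subset should contain. The helper names are hypothetical, and the estimate assumes the regular 0.01 degree grid reported in the file metadata and that label-based slicing keeps both end points.

```python
def bbox_to_slices(bbox):
    """Convert (min lon, min lat, max lon, max lat) into lat/lon slice objects."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return {"lat": slice(min_lat, max_lat), "lon": slice(min_lon, max_lon)}


def expected_cells(bbox, resolution=0.01):
    """Estimate the number of grid cells inside the box on a regular grid."""
    min_lon, min_lat, max_lon, max_lat = bbox
    # +1 because both the first and the last coordinate fall inside the slice.
    n_lat = round((max_lat - min_lat) / resolution) + 1
    n_lon = round((max_lon - min_lon) / resolution) + 1
    return n_lat * n_lon


bbox = (-75.5, 33.5, -73.5, 35.5)
indexers = bbox_to_slices(bbox)
print(indexers["lat"])       # slice(33.5, 35.5, None)
print(expected_cells(bbox))  # 40401
```

With the dataset opened earlier, the slices could be applied as `ds.sel(**indexers)`. The estimate of 201 × 201 = 40401 cells gives a quick sanity check on the size of the result.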
The subset keeps the same variables, but each now holds 40401 values instead of the 647964000 (17999 × 36000) in the full file.
Create a data cube by combining multiple netCDF files
When we open multiple files, we use open_mfdataset(). Once again, we are doing lazy loading. Note that this method works best if you are in the same Amazon Web Services (AWS) region as the data (us-west-2) and can use an S3 connection. For the EDM workshop, we are on an Azure JupyterHub and are using an HTTPS connection, so this is much slower. If we had spun up this JupyterHub on AWS us-west-2, where the NASA data are hosted, we could load a whole year of data instantly. We will load just a few days so that it doesn’t take so long.
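A minimal sketch of combining a few granules into one data cube, assuming `results` from the search above and an authenticated session. The helper name `open_data_cube` and the default of five files are illustrative choices, not the workshop's exact code. `open_mfdataset()` also relies on dask being installed.

```python
def open_data_cube(results, n_files=5):
    """Lazily open the first `n_files` granules as a single xarray Dataset."""
    # Imported inside the helper so that defining it does not require the packages.
    import earthaccess
    import xarray as xr

    # One file-like object per granule; nothing is downloaded to disk.
    fileset = earthaccess.open(results[:n_files])
    # open_mfdataset() combines the files along their shared coordinates
    # (here, the time dimension) and keeps the data arrays lazy.
    return xr.open_mfdataset(fileset)
```

Keeping `n_files` small limits the number of remote files whose metadata must be read, which matters most on a slow HTTPS connection.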
We learned how to subset xarray data cubes by time and space using sel() and slice(). Next we will show how to select via a shapefile. If you want to jump instead to creating monthly and seasonal means from a data cube, you can look at the 4-data-cube.ipynb tutorial or explore the gallery of xarray examples.