library(terra)
library(earthdatalogin)
library(sf)
library(ggplot2)
Extract data within a boundary
- How to access and download sea surface temperature from NASA Earthdata
- How to apply shapefiles as masks to satellite data
- How to compute monthly average sea surface temperature
Summary
In this example, we will utilize the earthdatalogin
R package to retrieve sea surface temperature data from NASA Earthdata search. The earthdatalogin
R package simplifies the process of discovering and accessing NASA Earth science data.
This example is adapted from the NOAA CoastWatch Satellite Data Tutorials. To explore the full range of tutorials on accessing and utilizing oceanographic satellite data, visit the NOAA CoastWatch Tutorial Github repository.
For more on earthdatalogin
visit the earthdatalogin
GitHub page and/or the earthdatalogin
documentation site. Be aware that earthdatalogin
is under active development and that we are using the development version on GitHub.
Terminology
shapefiles
: is a simple, nontopological format for storing the geometric location and attribute information of geographic features. Geographic features in a shapefile can be represented by points, lines, or polygons (areas). Learn more here.
Prerequisites
The tutorials today can be run with the guest Earthdata Login that is in earthdatalogin
. However, if you will be using the NASA Earthdata portal more regularly, please register for an Earthdata Login account. Please https://urs.earthdata.nasa.gov to register and manage your Earthdata Login account. This account is free to create and only takes a moment to set up.
Import Required Packages
Note: See the set-up tab (in left nav bar) for instructions on getting set up on your own computer, but be aware that getting it is common to run into trouble getting GDAL set up properly to handle netCDF files. Using a Docker image (and Python) is often less aggravating.
Datasets used
GHRSST Level 4 AVHRR_OI Global Blended Sea Surface Temperature Analysis (GDS2) from NCEI
This NOAA blended SST is a moderate resolution satellite-based gap-free sea surface temperature (SST) product. We will use the daily data. https://cmr.earthdata.nasa.gov/search/concepts/C2036881712-POCLOUD.html
Longhurst Marine Provinces
The dataset represents the division of the world oceans into provinces as defined by Longhurst (1995; 1998; 2006). This division has been based on the prevailing role of physical forcing as a regulator of phytoplankton distribution.
The Longhurst Marine Provinces dataset is available online (https://www.marineregions.org/downloads.php) and within the shapes folder associated with this repository. For this exercise we will use the Gulf Stream province (ProvCode: GFST
)
Load packages
Load boundary coordinates
The shapefile for the Longhurst marine provinces includes a list of regions.
For this exercise, we will only use the boundary of one province, the Gulf Stream region (“GFST”).
# Set directory path for shapefile
<- '../resources/longhurst_v4_2010/'
dir_path
# Import shape files (Longhurst coordinates)
<- sf::read_sf(dsn = dir_path, layer = "Longhurst_world_v4_2010")
shapes
# Example List of all the province names
$ProvCode shapes
[1] "BPLR" "ARCT" "SARC" "NADR" "GFST" "NASW" "NATR" "WTRA" "ETRA" "SATL"
[11] "NECS" "CNRY" "GUIN" "GUIA" "NWCS" "MEDI" "CARB" "NASE" "BRAZ" "FKLD"
[21] "BENG" "MONS" "ISSG" "EAFR" "REDS" "ARAB" "INDE" "INDW" "AUSW" "BERS"
[31] "PSAE" "PSAW" "KURO" "NPPF" "NPSW" "TASM" "SPSG" "NPTG" "PNEC" "PEQD"
[41] "WARM" "ARCH" "ALSK" "CCAL" "CAMR" "CHIL" "CHIN" "SUND" "AUSE" "NEWZ"
[51] "SSTC" "SANT" "ANTA" "APLR"
# Get boundary coordinates for Gulf Stream region (GFST)
<- shapes[shapes$ProvCode == "GFST",]
GFST
<- sf::st_coordinates(GFST)[,1]
xcoord <- sf::st_coordinates(GFST)[,2] ycoord
Search data from NASA Earthdata with the dataset unique name and coordinates/dates
First, connect to NASA Earthdata with no credentials
::edl_netrc() earthdatalogin
Then, define your search and cropping criteria
# Dataset unique name
<- 'AVHRR_OI-NCEI-L4-GLOB-v2.1'
short_name
# Set boundaries based on the shapefile
<- c(xmin=min(xcoord), ymin=min(ycoord), xmax=max(xcoord), ymax=max(ycoord))
bbox
# Set time range
<- c("2020-01-01", "2020-04-01")
tbox
# Search data that match the boundaries and time range
<- earthdatalogin::edl_search(
results short_name = short_name,
version = "2.1",
temporal = tbox,
bounding_box = paste(bbox, collapse = ",")
)
# Check number of files
length(results)
[1] 93
There are 93 files.
Apply shapefiles as mask to satellite data
# Select the first result
<- terra::rast(results[1], vsi = TRUE)
ras
# Extract SST from the multi-layer raster data
<- ras[["analysed_sst"]] ras_sst
Convert shape to SpatVector for terra
.
# Vectorize shapes
<- terra::vect(shapes)
shp
# Get boundaries for GFST
<- shp[shp$ProvCode == "GFST",] GFST
Plot the SST data.
plot(ras_sst)
Plot GFST boundaries from shapefile.
plot(GFST,col='red')
Mask SST with the GFST boundaries and plot.
<- terra::mask(ras_sst, GFST)
masked_rc
# Visualize the SST in GFST Province and crop to the GFST extent
plot(masked_rc, ext = GFST)
Compute monthly average of SST
We will construct a data cube to compute monthly average for sea surface temperature data within the boundary. To minimize data loading times, the first 10 results, which correspond to approximately two months of data, will be used for this exercise.
Select the first 10 SST results (end of January and beginning of February).
<- terra::rast(results[c(25:35)], vsi = TRUE) ras_all
Select SST data. The trim and mask operations are memory intensive and we want to select only the layer we will be working with.
<- ras_all["analysed_sst",] rc_sst
Crop to the GFST boundaries.
<- terra::crop(rc_sst, GFST) rc_sst
Trim the SST data to the boundaries of GFST.
<- terra::mask(rc_sst, GFST) rc_sst
Calculate mean SST over the entire time series (10 days) and map it.
<- terra::mean(rc_sst, na.rm=TRUE) raster_mean
Plot the mean SST.
plot(raster_mean)
Calculate monthly mean SST means across rasters
First create a function to convert times to year-month format. This will create a vector that shows which datas are in which month-year.
<- function(x) {
year_month format(as.Date(time(x), format="%Y-%m-%d"), "%Y-%m")
}<- year_month(rc_sst)
ym ym
[1] "2020-01" "2020-01" "2020-01" "2020-01" "2020-01" "2020-01" "2020-01"
[8] "2020-01" "2020-02" "2020-02" "2020-02"
Compute raster mean grouped by Year-month. tapp
is the terra
equivalent of tapply
and allow you to apply a function to groups of raster time layers. This allows you to do temporal aggregation. Use ?tapp
to learn about this function.
<- terra::tapp(rc_sst, ym, fun = mean)
monthly_mean_rast monthly_mean_rast
class : SpatRaster
dimensions : 40, 120, 2 (nrow, ncol, nlyr)
resolution : 0.25, 0.25 (x, y)
extent : -73.5, -43.5, 33.5, 43.5 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84
source(s) : memory
names : X2020.01, X2020.02
min values : 277.0425, 276.9367
max values : 297.1500, 296.9033
Compute mean across raster grouped by Year-month. global()
applies a function to the entire raster. Use ?global
to learn about this function.
<- terra::global(monthly_mean_rast, fun = mean, na.rm=TRUE)
monthly_means monthly_means
mean
X2020.01 289.3776
X2020.02 289.0583
Convert to data frame
We convert to a data frame to plot time series.
# Convert raster into data.frame
<- as.data.frame(monthly_means)
monthly_means_df
# Create a year_month column
$year_month <- sub("X", "", rownames(monthly_means_df)) monthly_means_df
Plot monthly mean of sea surface temperature within GFST province
ggplot(data = monthly_means_df, aes(x = year_month, y = mean, group = 1)) +
geom_line() +
geom_point() +
xlab("Year.Month") +
ylab("Mean SST (F)")
Troubleshooting
If you get the following error:
Warning: Opening a /vsi file with the netCDF driver requires Linux userfaultfd to be available. Or you may set the GDAL_SKIP=netCDF configuration option to force the use of the HDF5 driver. (GDAL error 1)Error: [rast] file does not exist: /vsicurl/https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/20191231120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc
Then go back and run the authentication code at the top.