On this page, we will explore the process of extracting a subset of model simulation data produced by the regional MOM6 model. The model output supports the Climate, Ecosystems, and Fisheries Initiative (CEFI). We will show how to use OPeNDAP to access the data and visualize it on an interactive map. The currently available data can be viewed here. This folder contains historical simulations from the regional MOM6 model spanning the years 1993 to 2019.
Import Python packages
Thanks to the netCDF4/Pydap libraries, accessing data through OPeNDAP directly from Xarray is seamless. For detailed usage guidance, the Xarray documentation offers excellent examples and explanations. For example, if the OPeNDAP server requires authentication, backends.PydapDataStore can hold the login session.
import xarray as xr
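As a minimal sketch of that authenticated case (the username, password, and URL below are placeholders; the PSL server itself does not require a login):

```python
import requests

# placeholder credentials -- only needed when the server requires a login
session = requests.Session()
session.auth = ("my_username", "my_password")

# reuse the authenticated session for every OPeNDAP request, e.g.:
# store = xr.backends.PydapDataStore.open("https://example.com/dodsC/some_dataset", session=session)
# ds = xr.open_dataset(store)
print(session.auth)
```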
Access data (regular grid product)
From the previous page explaining the OPeNDAP interface, we know that the OPeNDAP URL can be obtained from the form. In our case, we will get the URL from this form.
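A sketch of what that looks like in code (the URL below is a placeholder; paste the full address generated by the form):

```python
import xarray as xr

# placeholder OPeNDAP URL from the PSL form -- replace with the real one
url = "http://psl.noaa.gov/thredds/dodsC/..."

# lazy open: only metadata and coordinates are transferred at this point
# ds_ssh = xr.open_dataset(url)
```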
history :
Derived and written at NOAA Physical Science Laboratory
NCO :
netCDF Operators version 5.0.1 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)
contact :
chia-wei.hsu@noaa.gov
dataset :
regional mom6 regrid
_NCProperties :
version=2,netcdf=4.9.2,hdf5=1.14.3
Tip
When we use xr.open_dataset() to load data over OPeNDAP, we actually load only the metadata and coordinate information. This provides a great way to peek at the data’s dimensions and availability on our local machine in the xarray object format. The actual gridded values at each grid point are only downloaded from the PSL server when we call .load() on the dataset.
Since most OPeNDAP servers set a limit on a single data transfer (the PSL server has a 500 MB limit), we cannot ds_ssh.load() the whole dataset. However, we can load the sea level map at a single time step, which is well under 500 MB.
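As a rough sanity check before calling .load(), the request size can be estimated from the dimensions shown in the dataset’s metadata (assuming float32, i.e. 4 bytes per value):

```python
# estimate the download size of one monthly SSH snapshot
nlon, nlat, ntime = 774, 844, 1   # dimensions reported by ds_ssh_subset
bytes_per_value = 4               # ssh is stored as float32
size_mb = nlon * nlat * ntime * bytes_per_value / 1e6
print(f"{size_mb:.1f} MB")        # ~2.6 MB, well under the 500 MB PSL limit
```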
ds_ssh_subset = ds_ssh.sel(time='2012-12').load()
ds_ssh_subset
<xarray.Dataset> Size: 3MB
Dimensions: (lon: 774, lat: 844, time: 1)
Coordinates:
* lon (lon) float64 6kB 261.6 261.6 261.7 261.8 ... 323.8 323.8 323.9
* lat (lat) float64 7kB 5.273 5.335 5.398 5.461 ... 58.04 58.1 58.16
* time (time) datetime64[ns] 8B 2012-12-16T12:00:00
Data variables:
ssh (time, lat, lon) float32 3MB nan nan nan nan ... nan nan nan nan
Attributes:
NumFilesInSet: 1
title: NWA12_COBALT_2023_04_kpo4-coastatten-physics
associated_files: areacello: 19930101.ocean_static.nc
grid_type: regular
grid_tile: N/A
external_variables: areacello
history: Derived and written at NOAA Physical Science Laboratory
NCO: netCDF Operators version 5.0.1 (Homepage = http://nc...
contact: chia-wei.hsu@noaa.gov
dataset: regional mom6 regrid
_NCProperties: version=2,netcdf=4.9.2,hdf5=1.14.3
From the xarray Dataset output above, notice that the ssh values in ds_ssh_subset are now available and printed out, while the ssh values in ds_ssh are not. This means the ssh values have been downloaded from the OPeNDAP server into local machine memory.
Quick view of the variable
To check if the data makes sense and fits the chosen time and region, you can use Xarray’s .plot() method. This way, you see a graph instead of just numbers, making it easier to understand.
ds_ssh_subset.ssh.plot()
Science info
The map illustrates a clear distinction between high sea levels over the warm, less dense waters of the tropics and low sea levels over the cold, dense waters of the Labrador Sea and other subpolar regions. The meandering band of sharp sea level gradient, where sea level changes rapidly over a short distance, marks the location of the Gulf Stream. This distinctive sea level feature results from the western boundary currents commonly observed on the western edges of major ocean basins, such as the Kuroshio in the Pacific basin.
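The link between the sharp sea level gradient and the current can be made quantitative through the geostrophic balance, u ≈ (g/f) Δη/Δy. A back-of-the-envelope sketch (the 1 m drop over 100 km is an assumed, map-scale estimate):

```python
# geostrophic surface speed implied by the cross-stream sea level drop
g = 9.81        # gravitational acceleration, m s^-2
f = 1.0e-4      # Coriolis parameter near 40N, s^-1
d_eta = 1.0     # assumed ~1 m sea level drop across the Gulf Stream
dist = 100e3    # over roughly 100 km
u = (g / f) * d_eta / dist
print(f"{u:.2f} m/s")   # order 1 m/s, typical of the Gulf Stream core
```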
Plotting the data on an interactive leaflet map
This section covers the more detailed figure manipulation needed to produce the map shown above and to zoom in and out on an interactive map. It is not necessary for data downloading and preprocessing; however, it shows how the map on the CEFI portal is generated.
Load package for the interactive map
matplotlib to generate the color-shaded map above
folium the Python interface for generating a Leaflet interactive map
branca the colorbar module installed alongside the folium package
import matplotlib.pyplot as plt
import folium
import branca.colormap as cm
Specify figure setup
# figure setting
colormap_name = 'RdBu_r'        # matplotlib colormap name, blue to red
n_increments = 20               # number of increments in the colormap
varmin = -1                     # minimum value on the colorbar
varmax = 1                      # maximum value on the colorbar
varname = 'Sea Surface Height'  # legend shown on the map
da_regrid_data = ds_ssh_subset['ssh']  # xarray DataArray used to plot the map
Create RGBA code for each grid point
This part uses matplotlib to create a special 2D map that assigns an RGBA code (an array of four numbers) to each grid point. This produces an array of shape [nlat, nlon, 4].
# set up the colormap from matplotlib
picked_cm = plt.get_cmap(colormap_name, n_increments)
# normalize the data to (0, 1) so the colormap can be applied
normed_data = (da_regrid_data - varmin) / (varmax - varmin)
colored_data = picked_cm(normed_data.data[0, :, :])
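A minimal, standalone check of this colormap-to-RGBA step, using random numbers in place of the normalized SSH field:

```python
import numpy as np
import matplotlib.pyplot as plt

picked_cm = plt.get_cmap('RdBu_r', 20)   # same discrete colormap as above
normed = np.random.rand(5, 7)            # stand-in for the normalized (0, 1) field
rgba = picked_cm(normed)
print(rgba.shape)                        # (5, 7, 4): one RGBA 4-tuple per grid point
```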
# create the folium map the overlay will be added to
# (the center and zoom below are rough choices for this domain)
fm = folium.Map(location=[35, -70], zoom_start=3)

folium.raster_layers.ImageOverlay(
    image=colored_data,
    bounds=[[float(da_regrid_data.lat.min().data), float(da_regrid_data.lon.min().data)],
            [float(da_regrid_data.lat.max().data), float(da_regrid_data.lon.max().data)]],
    mercator_project=True,  # reproject the data to web mercator (essential)
    origin="lower",         # plot the data from the lower bound (essential)
    opacity=1,
    zindex=1,
).add_to(fm)
<folium.raster_layers.ImageOverlay at 0x7f44dc05cb50>
Create the colorbar for the interactive map
import numpy as np

tick_inc = np.abs(varmax - varmin) / 10.

# construct the branca colormap to put on the folium map
cmap_list = picked_cm(range(n_increments)).tolist()
cmap_folium = cm.LinearColormap(
    colors=cmap_list,
    vmin=varmin,
    vmax=varmax,
    max_labels=n_increments + 1,
    tick_labels=list(np.arange(varmin, varmax + tick_inc * 0.000001, tick_inc)),
).to_step(n_increments)

# add the colormap to the folium map
cmap_folium.caption = varname
cmap_folium.add_to(fm)
fm
Make this Notebook Trusted to load map: File -> Trust Notebook
Access data (raw grid product)
The raw grid product differs from the regular grid in that it represents the original output of the model. The model grid uses the Arakawa C grid. The MOM6 documentation provides an excellent visualization of the Arakawa C grid. We can concentrate on the h point, which is where scalar values like sea level height are stored. The grid does not have uniform spacing and is, in fact, a curvilinear grid.
Info
The raw model output is beneficial when calculating the energy budget, which includes factors like heat and momentum. This output preserves each term’s original values, avoiding any potential distortions caused by interpolation. By using the raw grid product, we can ensure the energy budget balances within a closed system.
To use the raw grid data, one needs to obtain two files: 1. the variable file, 2. the static file.
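A sketch of opening the two files (both URLs are placeholders for the raw-grid addresses generated by the PSL form):

```python
import xarray as xr

# placeholder OPeNDAP URLs -- replace with the real raw-grid addresses
url_var = "http://psl.noaa.gov/thredds/dodsC/..."      # 1. the variable file
url_static = "http://psl.noaa.gov/thredds/dodsC/..."   # 2. the static file

# lazy opens: metadata and coordinates only
# ds_ssh = xr.open_dataset(url_var)
# ds_static = xr.open_dataset(url_static)
```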
Tip
Again, only the metadata and coordinate information are loaded into local memory, not the actual data. The gridded data values at each grid point are only downloaded from the PSL server when we append .load() to the dataset.
Warning
In the ds_static dataset, you can observe the one-dimensional xh, yh grid. The numbers on this grid may resemble longitude and latitude values, but they do not represent actual geographical locations. For accurate longitude and latitude data, you should refer to the geolon, geolat variables in the list. These variables, which are dimensioned by xh, yh, provide the correct geographical coordinates.
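A synthetic illustration of the difference (the values below are made up; only the dimension structure mirrors the real static file):

```python
import numpy as np
import xarray as xr

ny, nx = 4, 5
xh = np.linspace(-98.0, -36.0, nx)   # nominal 1-D axis, not true longitude
yh = np.linspace(5.0, 58.0, ny)      # nominal 1-D axis, not true latitude
# the true geographic positions are 2-D fields dimensioned by (yh, xh)
geolon, geolat = np.meshgrid(xh, yh)
ds_demo = xr.Dataset(
    {"geolon": (("yh", "xh"), geolon), "geolat": (("yh", "xh"), geolat)},
    coords={"xh": xh, "yh": yh},
)
print(ds_demo.geolon.dims)   # ('yh', 'xh') -- 2-D, unlike the 1-D nominal axes
```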
Now let’s take a look at the grid-area distribution, which gives a good sense of how it differs from a regularly spaced grid.
We can see that the most significant differences between the two maps are at the higher latitudes. Since the grid information is important for the raw grid, the easiest approach is to merge the two datasets so that the accurate lon/lat values travel with the data we are going to work on.
ds_ssh_subset = ds_ssh.sel(time='2012-12').load()
ds = xr.merge([ds_ssh_subset, ds_static])
ds = ds.isel(time=slice(1, None))  # exclude the 1980 empty field created by the merge
From the xarray Dataset output above, notice that the ssh values in ds are now available and the coordinate information is also included in a single Dataset object.
Quick view of the variable
To check if the data makes sense and fits the chosen time and region, you can use Xarray’s .plot() method. This way, you see a graph instead of just numbers, making it easier to understand.