Delaware Bay Operational Forecast System (DBOFS)
📘 Learning Objectives
- Get more practice making data cubes from netCDF files on THREDDS servers using OPeNDAP
- Do some basic data aggregation and plotting
Overview
This example is very similar to the first tutorial using NCEP-NCAR Reanalysis 1, but the netCDF files are slightly different, so you will get more practice. This tutorial uses an example where the server doesn't require authentication (a username and password).
Delaware Bay Forecast
We will create a data cube for data from the Delaware Bay Operational Forecast System (DBOFS). The approach is the same: we go to the THREDDS server for NOS and navigate until we find the OPeNDAP page for a file, then we work out the URL format for each file. Here is an example for one day. Note that they only keep recent data, so this URL will break after March 2025.
https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/DBOFS/MODELS/2025/03/14/dbofs.t18z.20250314.regulargrid.n001.nc
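Since these URLs expire, it can save some head-scratching to check that an endpoint is still live before trying to open it. One cheap way (a sketch, using the requests package) is to fetch the OPeNDAP DDS document, which describes the dataset structure without transferring any data:
import requests

# Append ".dds" to an OPeNDAP URL to fetch just the dataset descriptor;
# a 200 status means the file is still being served
url = "https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/DBOFS/MODELS/2025/03/14/dbofs.t18z.20250314.regulargrid.n001.nc"
resp = requests.get(url + ".dds", timeout=30)
print(resp.status_code)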
The first step is to create some file URLs. I am going to get the files for yesterday, since they only keep the gridded data for a couple of days.
import xarray as xr
from datetime import datetime, timedelta, timezone

# Get yesterday's date in UTC
yesterday = datetime.now(timezone.utc) - timedelta(days=1)
year, month, day = yesterday.strftime('%Y'), yesterday.strftime('%m'), yesterday.strftime('%d')

# Base URL; %2.2d is an old-style placeholder for the zero-padded model cycle hour
base = f'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/DBOFS/MODELS/{year}/{month}/{day}/dbofs.t%2.2dz.{year}{month}{day}.regulargrid.n001.nc'

# Generate URLs for the four daily model cycles (00z, 06z, 12z, 18z)
urls = [base % d for d in range(0, 24, 6)]
urls
['https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/DBOFS/MODELS/2025/03/17/dbofs.t00z.20250317.regulargrid.n001.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/DBOFS/MODELS/2025/03/17/dbofs.t06z.20250317.regulargrid.n001.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/DBOFS/MODELS/2025/03/17/dbofs.t12z.20250317.regulargrid.n001.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/DBOFS/MODELS/2025/03/17/dbofs.t18z.20250317.regulargrid.n001.nc']
Then we can open these as usual with xarray.
%%time
ds = xr.open_mfdataset(urls, parallel=True)
ds
CPU times: user 1.48 s, sys: 358 ms, total: 1.84 s
Wall time: 5.97 s
<xarray.Dataset> Size: 392MB
Dimensions:      (time: 4, ny: 487, nx: 529, Depth: 22)
Coordinates:
  * Depth        (Depth) float64 176B 0.0 1.0 2.0 4.0 ... 80.0 90.0 100.0 125.0
    Latitude     (ny, nx) float64 2MB dask.array<chunksize=(487, 529), meta=np.ndarray>
    Longitude    (ny, nx) float64 2MB dask.array<chunksize=(487, 529), meta=np.ndarray>
  * time         (time) datetime64[ns] 32B 2025-03-16T19:00:00 ... 2025-03-17...
Dimensions without coordinates: ny, nx
Data variables:
    h            (time, ny, nx) float64 8MB dask.array<chunksize=(1, 487, 529), meta=np.ndarray>
    mask         (time, ny, nx) float64 8MB dask.array<chunksize=(1, 487, 529), meta=np.ndarray>
    zeta         (time, ny, nx) float32 4MB dask.array<chunksize=(1, 487, 529), meta=np.ndarray>
    zetatomllw   (time, ny, nx) float32 4MB dask.array<chunksize=(1, 487, 529), meta=np.ndarray>
    u_eastward   (time, Depth, ny, nx) float32 91MB dask.array<chunksize=(1, 22, 487, 529), meta=np.ndarray>
    v_northward  (time, Depth, ny, nx) float32 91MB dask.array<chunksize=(1, 22, 487, 529), meta=np.ndarray>
    temp         (time, Depth, ny, nx) float32 91MB dask.array<chunksize=(1, 22, 487, 529), meta=np.ndarray>
    salt         (time, Depth, ny, nx) float32 91MB dask.array<chunksize=(1, 22, 487, 529), meta=np.ndarray>
Attributes: (12/36)
    file:                            dbofs.t00z.20250317.fields.nowcast_0002.nc
    format:                          netCDF-4/HDF5 file
    Conventions:                     CF-1.4, SGRID-0.3
    type:                            ROMS/TOMS history file
    title:                           dbofs nowcast RUN in operational mode
    var_info:                        varinfo.yaml
    ...                              ...
    compiler_flags:                  -fp-model precise -ip -O3
    tiling:                          008x016
    history:                         ROMS/TOMS, Version 4.2, Monday - March 1...
    ana_file:                        ROMS/Functionals/ana_btflux.h, ROMS/Func...
    CPP_options:                     mode, ADD_FSOBC, ADD_M2OBC, ANA_BSFLUX, ...
    DODS_EXTRA.Unlimited_Dimension:  time
# each file is about 100Mb
print(f"{ds.isel(time=1).nbytes / 1e6} Mb")
100.9884 Mb
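The full four-file cube is about four times that. Since nbytes is computed from the lazy dask arrays, we can check the total without downloading anything; it matches the "Size: 392MB" line in the repr above.
# Total (lazy) size of the data cube
print(f"{ds.nbytes / 1e6} Mb")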
Next we can plot a map of the temperature at one time point.
ds.temp.isel(Depth=1, time=1).plot(x='Longitude', y='Latitude');
And we can get the mean temperature across the grid at each of the 4 time points.
%%time
ds_mean = ds["temp"].isel(Depth=1).mean(dim=['ny', 'nx'])
ds_mean.plot();
CPU times: user 58.2 ms, sys: 3.28 ms, total: 61.5 ms
Wall time: 1.19 s
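Note that ds_mean is still a lazy dask array; the actual OPeNDAP transfer happens inside the plot call. If you want to time the download separately from the plotting, you could force the computation first (a minimal sketch; ds_mean_values is just a new name for the loaded result):
%%time
# Force the dask computation so the OPeNDAP transfer happens here, not at plot time
ds_mean_values = ds_mean.compute()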
A plot of temperature by depth
Here I will make a plot of temperature by depth in the middle of the bay.
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Plot the full dataset
fig, ax = plt.subplots(figsize=(8, 6))
ds.temp.isel(Depth=1, time=1).plot(x="Longitude", y="Latitude", ax=ax)

# Define the slice box coordinates
lon_min, lon_max = -74.5, -74
lat_min, lat_max = 38, 38.5

# Create a rectangular patch (bounding box)
box = patches.Rectangle(
    (lon_min, lat_min),   # Bottom-left corner (lon, lat)
    lon_max - lon_min,    # Width (longitude range)
    lat_max - lat_min,    # Height (latitude range)
    linewidth=2, edgecolor='red', facecolor='none', linestyle="--"
)

# Add the box to the plot
ax.add_patch(box)

# Customize the plot
ax.set_title("Temperature with Slice Region Highlighted")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")

plt.show()
First, I am going to fix the indices to use lat/lon.
# Because I want to slice with actual lat/lon values, grab one column of
# latitudes and one row of longitudes from the 2-D coordinate arrays
lat_values = ds.isel(time=1, Depth=1, nx=1).Latitude.values
lon_values = ds.isel(time=1, Depth=1, ny=1).Longitude.values
ds = ds.assign_coords({"ny": lat_values, "nx": lon_values})
ds = ds.rename({"ny": "lat", "nx": "lon"})
ds
Create a mean by depth for each time period.
= ds["temp"].sel(lon=slice(-74.5,-74), lat=slice(38, 38.5)) depth_slice
=['lat', 'lon']).plot.line(x="Depth"); depth_slice.mean(dim
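Instead of averaging over a box, you could also pull the profile at a single grid cell. A sketch using nearest-neighbor selection (the point here is an arbitrary mid-bay location):
# Temperature profile at the grid cell nearest an arbitrary mid-bay point
profile = ds["temp"].sel(lon=-74.25, lat=38.25, method="nearest")
profile.isel(time=0).plot(y="Depth", yincrease=False);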
Data on AWS
The data are also available on AWS in an S3 bucket: https://noaa-nos-ofs-pds.s3.amazonaws.com/index.html. Let's compare data access from there. The data are in the bucket, but you need to know how to construct the S3 URLs. This https URL
https://noaa-nos-ofs-pds.s3.amazonaws.com/dbofs/netcdf/2025/03/14/dbofs.t00z.20250314.regulargrid.n001.nc
becomes this
s3://noaa-nos-ofs-pds/dbofs/netcdf/2025/03/14/dbofs.t00z.20250314.regulargrid.n001.nc
To open netCDF files on S3, we need to create a "fileset"; we cannot just use the list of URLs like we can for the OPeNDAP links.
# create the file urls to s3 bucket by processing our original files list
s3_urls = [
    url.replace(
        "https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/DBOFS/MODELS",
        "s3://noaa-nos-ofs-pds/dbofs/netcdf"
    )
    for url in urls
]
s3_urls
['s3://noaa-nos-ofs-pds/dbofs/netcdf/2025/03/16/dbofs.t00z.20250316.regulargrid.n001.nc',
's3://noaa-nos-ofs-pds/dbofs/netcdf/2025/03/16/dbofs.t06z.20250316.regulargrid.n001.nc',
's3://noaa-nos-ofs-pds/dbofs/netcdf/2025/03/16/dbofs.t12z.20250316.regulargrid.n001.nc',
's3://noaa-nos-ofs-pds/dbofs/netcdf/2025/03/16/dbofs.t18z.20250316.regulargrid.n001.nc']
# Run this code once to set up anonymous s3 access
import s3fs
fs = s3fs.S3FileSystem(anon=True)

# Create a fileset (a list of open file-like objects)
fileset = [fs.open(url) for url in s3_urls]
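As an aside, fsspec (the library s3fs is built on) can do the same thing in one step. This is an equivalent sketch rather than a required approach; fileset_alt is just a new name:
import fsspec

# fsspec.open returns an OpenFile; calling .open() on it gives the
# file-like object that xarray needs
fileset_alt = [fsspec.open(url, anon=True).open() for url in s3_urls]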
# each file is about 100Mb
fs.size(s3_urls[1]) / 1e6  # MB
101.415959
We open up the fileset.
%%time
ds2 = xr.open_mfdataset(fileset)
CPU times: user 2.18 s, sys: 1.2 s, total: 3.38 s
Wall time: 20.5 s
We now have a data cube that we can work with just as we did with our data cube from the OPeNDAP server. The data are only loaded when we need them (to plot or compute). Data access is considerably slower than from the OPeNDAP server; I don't know why that is.
%%time
ds2_mean = ds2["temp"].isel(Depth=1).mean(dim=['ny', 'nx'])
ds2_mean.plot();
CPU times: user 1.89 s, sys: 877 ms, total: 2.77 s
Wall time: 22.7 s
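One knob worth experimenting with for the S3 path is the s3fs read-ahead block size (the default is 5 MB); larger blocks sometimes help when scanning large files sequentially. This is a sketch for experimentation, not a guaranteed fix, and fs_big, fileset_big, and ds3 are hypothetical names:
# Hypothetical tuning experiment: larger read-ahead blocks for S3 reads
fs_big = s3fs.S3FileSystem(anon=True, default_block_size=50 * 1024 * 1024)
fileset_big = [fs_big.open(url) for url in s3_urls]
ds3 = xr.open_mfdataset(fileset_big)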
Conclusion
We worked through another example of getting data from an OPeNDAP server and compared it to getting the same data from an S3 bucket.
References
- Open files in an S3 bucket: https://nbviewer.org/gist/rsignell-usgs/111222351c4fee9e99827844351ab952
- xarray and OPeNDAP: https://www.jamstec.go.jp/ridinfo/xarray-and-opendap/
- More OPeNDAP + xarray debugging: https://github.com/stuckyb/gcdl/issues/24
- Another OPeNDAP example: https://github.com/ornldaac/daymet-python-opendap-xarray/blob/master/1_daymetv4_discovery_access_subsetting.ipynb
- Nice example: https://rabernat.github.io/research_computing_2018/xarray-tips-and-tricks.html
"metadata.zarr", consolidated=True, compute=False) ds.to_zarr(
Delayed('_finalize_store-304e974e-7cd0-41d4-b610-fe79bd608544')
= xr.open_zarr("metadata.zarr", consolidated=True) ds_lazy
=1, time=1).plot(x='Longitude', y='Latitude'); ds_lazy.temp.isel(Depth
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[18], line 1 ----> 1 ds_lazy.temp.isel(Depth=1, time=1).plot(x='Longitude', y='Latitude'); File /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/plot/accessor.py:48, in DataArrayPlotAccessor.__call__(self, **kwargs) 46 @functools.wraps(dataarray_plot.plot, assigned=("__doc__", "__annotations__")) 47 def __call__(self, **kwargs) -> Any: ---> 48 return dataarray_plot.plot(self._da, **kwargs) File /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/plot/dataarray_plot.py:310, in plot(darray, row, col, col_wrap, ax, hue, subplot_kws, **kwargs) 306 plotfunc = hist 308 kwargs["ax"] = ax --> 310 return plotfunc(darray, **kwargs) File /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/plot/dataarray_plot.py:1607, in _plot2d.<locals>.newplotfunc(***failed resolving arguments***) 1603 raise ValueError("plt.imshow's `aspect` kwarg is not available in xarray") 1605 ax = get_axis(figsize, size, aspect, ax, **subplot_kws) -> 1607 primitive = plotfunc( 1608 xplt, 1609 yplt, 1610 zval, 1611 ax=ax, 1612 cmap=cmap_params["cmap"], 1613 vmin=cmap_params["vmin"], 1614 vmax=cmap_params["vmax"], 1615 norm=cmap_params["norm"], 1616 **kwargs, 1617 ) 1619 # Label the plot with metadata 1620 if add_labels: File /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/plot/dataarray_plot.py:2316, in pcolormesh(x, y, z, ax, xscale, yscale, infer_intervals, **kwargs) 2313 y = _infer_interval_breaks(y, axis=0, scale=yscale) 2315 ax.grid(False) -> 2316 primitive = ax.pcolormesh(x, y, z, **kwargs) 2318 # by default, pcolormesh picks "round" values for bounds 2319 # this results in ugly looking plots with lots of surrounding whitespace 2320 if not hasattr(ax, "projection") and x.ndim == 1 and y.ndim == 1: 2321 # not a cartopy geoaxis File /srv/conda/envs/notebook/lib/python3.12/site-packages/matplotlib/__init__.py:1521, in _preprocess_data.<locals>.inner(ax, data, *args, **kwargs) 1518 @functools.wraps(func) 1519 def inner(ax, *args, data=None, **kwargs): 1520 if data is None: -> 1521 return func( 1522 ax, 1523 *map(cbook.sanitize_sequence, args), 1524 **{k: cbook.sanitize_sequence(v) for k, v in kwargs.items()}) 1526 bound = new_sig.bind(ax, *args, **kwargs) 1527 auto_label = (bound.arguments.get(label_namer) 1528 or bound.kwargs.get(label_namer)) File /srv/conda/envs/notebook/lib/python3.12/site-packages/matplotlib/axes/_axes.py:6522, in Axes.pcolormesh(self, alpha, norm, cmap, vmin, vmax, colorizer, shading, antialiased, *args, **kwargs) 6519 shading = shading.lower() 6520 kwargs.setdefault('edgecolors', 'none') -> 6522 X, Y, C, shading = self._pcolorargs('pcolormesh', *args, 6523 shading=shading, kwargs=kwargs) 6524 coords = np.stack([X, Y], axis=-1) 6526 kwargs.setdefault('snap', mpl.rcParams['pcolormesh.snap']) File /srv/conda/envs/notebook/lib/python3.12/site-packages/matplotlib/axes/_axes.py:6026, in Axes._pcolorargs(self, funcname, shading, *args, **kwargs) 6024 if funcname == 'pcolormesh': 6025 if np.ma.is_masked(X) or np.ma.is_masked(Y): -> 6026 raise ValueError( 6027 'x and y arguments to pcolormesh cannot have ' 6028 'non-finite values or be of type ' 6029 'numpy.ma.MaskedArray with masked values') 6030 nrows, ncols = C.shape[:2] 6031 else: ValueError: x and y arguments to pcolormesh cannot have non-finite values or be of type numpy.ma.MaskedArray with masked values
"metadata.nc", compute=False) ds.to_netcdf(
HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 1:
#000: H5F.c line 660 in H5Fcreate(): unable to synchronously create file
major: File accessibility
minor: Unable to create file
#001: H5F.c line 614 in H5F__create_api_common(): unable to create file
major: File accessibility
minor: Unable to open file
#002: H5VLcallback.c line 3605 in H5VL_file_create(): file create failed
major: Virtual Object Layer
minor: Unable to create file
#003: H5VLcallback.c line 3571 in H5VL__file_create(): file create failed
major: Virtual Object Layer
minor: Unable to create file
#004: H5VLnative_file.c line 94 in H5VL__native_file_create(): unable to create file
major: File accessibility
minor: Unable to open file
#005: H5Fint.c line 1870 in H5F_open(): unable to truncate a file which is already open
major: File accessibility
minor: Unable to open file
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) File /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/backends/file_manager.py:211, in CachingFileManager._acquire_with_cache_info(self, needs_lock) 210 try: --> 211 file = self._cache[self._key] 212 except KeyError: File /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/backends/lru_cache.py:56, in LRUCache.__getitem__(self, key) 55 with self._lock: ---> 56 value = self._cache[key] 57 self._cache.move_to_end(key) KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('/home/jovyan/nmfshackdays-2025/topics-2025/2025-opendap/metadata.nc',), 'a', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False)), '5982d31b-d457-4ea0-a129-4860759a8ef3'] During handling of the above exception, another exception occurred: PermissionError Traceback (most recent call last) Cell In[16], line 1 ----> 1 ds.to_netcdf("metadata.nc", compute=False) File /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/core/dataset.py:2380, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf, auto_complex) 2377 encoding = {} 2378 from xarray.backends.api import to_netcdf -> 2380 return to_netcdf( # type: ignore[return-value] # mypy cannot resolve the overloads:( 2381 self, 2382 path, 2383 mode=mode, 2384 format=format, 2385 group=group, 2386 engine=engine, 2387 encoding=encoding, 2388 unlimited_dims=unlimited_dims, 2389 compute=compute, 2390 multifile=False, 2391 invalid_netcdf=invalid_netcdf, 2392 auto_complex=auto_complex, 2393 ) File /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/backends/api.py:1911, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf, auto_complex) 1908 if auto_complex is not None: 1909 kwargs["auto_complex"] = auto_complex -> 1911 store = store_open(target, mode, format, group, **kwargs) 1913 if unlimited_dims is None: 1914 unlimited_dims = dataset.encoding.get("unlimited_dims", None) File /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:452, in NetCDF4DataStore.open(cls, filename, mode, format, group, clobber, diskless, persist, auto_complex, lock, lock_maker, autoclose) 448 kwargs["auto_complex"] = auto_complex 449 manager = CachingFileManager( 450 netCDF4.Dataset, filename, mode=mode, kwargs=kwargs 451 ) --> 452 return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose) File /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:393, in NetCDF4DataStore.__init__(self, manager, group, mode, lock, autoclose) 391 self._group = group 392 self._mode = mode --> 393 self.format = self.ds.data_model 394 self._filename = self.ds.filepath() 395 self.is_remote = is_remote_uri(self._filename) File /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:461, in NetCDF4DataStore.ds(self) 459 @property 460 def ds(self): --> 461 return self._acquire() File /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:455, in NetCDF4DataStore._acquire(self, needs_lock) 454 def _acquire(self, needs_lock=True): --> 455 with self._manager.acquire_context(needs_lock) as root: 456 ds = _nc4_require_group(root, self._group, self._mode) 457 return ds File /srv/conda/envs/notebook/lib/python3.12/contextlib.py:137, in _GeneratorContextManager.__enter__(self) 135 del self.args, self.kwds, 
self.func 136 try: --> 137 return next(self.gen) 138 except StopIteration: 139 raise RuntimeError("generator didn't yield") from None File /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/backends/file_manager.py:199, in CachingFileManager.acquire_context(self, needs_lock) 196 @contextlib.contextmanager 197 def acquire_context(self, needs_lock=True): 198 """Context manager for acquiring a file.""" --> 199 file, cached = self._acquire_with_cache_info(needs_lock) 200 try: 201 yield file File /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/backends/file_manager.py:217, in CachingFileManager._acquire_with_cache_info(self, needs_lock) 215 kwargs = kwargs.copy() 216 kwargs["mode"] = self._mode --> 217 file = self._opener(*self._args, **kwargs) 218 if self._mode == "w": 219 # ensure file doesn't get overridden when opened again 220 self._mode = "a" File src/netCDF4/_netCDF4.pyx:2521, in netCDF4._netCDF4.Dataset.__init__() File src/netCDF4/_netCDF4.pyx:2158, in netCDF4._netCDF4._ensure_nc_success() PermissionError: [Errno 13] Permission denied: '/home/jovyan/nmfshackdays-2025/topics-2025/2025-opendap/metadata.nc'
= xr.open_dataset("metadata.nc") ds_lazy