This notebook applies the VirtualiZarr workflow to the NOAA Climate Data Record (CDR) of AVHRR NDVI, a 0.05° global daily vegetation index dataset publicly available on AWS S3.
Data: NOAA CDR NDVI v5 (AVHRR)
Bucket: s3://noaa-cdr-ndvi-pds (anonymous access, us-east-1)
# Colab users: uncomment and run this
# !pip install -q icechunk virtualizarr xarray obspec_utils obstore hvplot

import warnings
import shutil
from pathlib import Path

import xarray as xr
import icechunk
from obstore.store import from_url
from virtualizarr import open_virtual_dataset, open_virtual_mfdataset
from virtualizarr.parsers import HDFParser
from obspec_utils.registry import ObjectStoreRegistry

warnings.filterwarnings(
    "ignore",
    message="Numcodecs codecs are not in the Zarr version 3 specification*",
    category=UserWarning,
)
Setup: S3 store and registry
Point obstore at the public NDVI bucket and register it so VirtualiZarr can resolve chunk references.
bucket = "s3://noaa-cdr-ndvi-pds"
base = "data/2000"

# 5 consecutive daily files, January 2000
filenames = [
    "AVHRR-Land_v005_AVH13C1_NOAA-14_20000101_c20170623095628.nc",
    "AVHRR-Land_v005_AVH13C1_NOAA-14_20000102_c20170623101557.nc",
    "AVHRR-Land_v005_AVH13C1_NOAA-14_20000103_c20170623103338.nc",
    "AVHRR-Land_v005_AVH13C1_NOAA-14_20000104_c20170623105028.nc",
    "AVHRR-Land_v005_AVH13C1_NOAA-14_20000105_c20170623110559.nc",
]
urls = [f"{bucket}/{base}/{f}" for f in filenames]

# Anonymous (unsigned) access to the public bucket
store = from_url(bucket, region="us-east-1", skip_signature=True)
registry = ObjectStoreRegistry({bucket: store})
parser = HDFParser()

urls[0]
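The filenames above carry the observation date as an eight-digit YYYYMMDD field just before the `_c...` processing stamp. As a minimal sketch, the dates can be pulled out and checked for gaps before any file is opened. The helper name `date_from_filename` is hypothetical, and the pattern is inferred only from the five names listed here, so it may not hold for every file in the bucket.

```python
import re
from datetime import date, timedelta

filenames = [
    "AVHRR-Land_v005_AVH13C1_NOAA-14_20000101_c20170623095628.nc",
    "AVHRR-Land_v005_AVH13C1_NOAA-14_20000102_c20170623101557.nc",
    "AVHRR-Land_v005_AVH13C1_NOAA-14_20000103_c20170623103338.nc",
    "AVHRR-Land_v005_AVH13C1_NOAA-14_20000104_c20170623105028.nc",
    "AVHRR-Land_v005_AVH13C1_NOAA-14_20000105_c20170623110559.nc",
]


def date_from_filename(name: str) -> date:
    """Return the observation date embedded in a filename (hypothetical helper)."""
    # Assumed layout: ..._YYYYMMDD_c<processing timestamp>.nc
    match = re.search(r"_(\d{4})(\d{2})(\d{2})_c\d+\.nc$", name)
    if match is None:
        raise ValueError(f"no date field found in {name!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


dates = [date_from_filename(f) for f in filenames]

# True when every file is exactly one day after the previous one
is_consecutive = all(b - a == timedelta(days=1) for a, b in zip(dates, dates[1:]))
print(dates[0], dates[-1], is_consecutive)
```

A check like this is cheap because it never touches S3, and it catches a missing or misordered day before the files are concatenated along time.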
open_virtual_dataset reads only the file metadata — no chunk data is downloaded. The result looks like an xarray dataset but chunk references point back to the original S3 file.
We load time, latitude, and longitude as concrete arrays (they are small and needed for coordinates).
Normalized Difference Vegetation Index parameters derived from NOAA-14 GAC data for day 2000/001
institution: NASA/GSFC/SED/ESD/HBSL/TIS/MODIS-LAND > MODIS Land Science Team, Terrestrial Information Systems, Hydrospheric and Biospheric Science Laboratory, Earth Sciences Division, Science and Exploration Directorate, Goddard Space Flight Center, NASA
Conventions: CF-1.6, ACDD-1.3
standard_name_vocabulary: CF Standard Name Table (v25, 05 July 2013)
naming_authority: gov.noaa.ncei
license: See the Use Agreement for this CDR available from the NOAA CDR webpage
cdm_data_type: Grid
time_coverage_start: 2000-01-01T00:00:00Z
time_coverage_end: 2000-01-01T23:59:59Z
product_version: v5r0
platform: NOAA-3 > National Oceanic & Atmospheric Administration-3
sensor: AVHRR > Advanced Very High Resolution Radiometer
keywords_vocabulary: NASA Global Change Master Directory (GCMD) Science Keywords
platform_vocabulary: Global Change Master Directory (GCMD) Platform Keywords
instrument_vocabulary: Global Change Master Directory (GCMD) Instrument Keywords
keywords: EARTH SCIENCE > BIOSPHERE > VEGETATION > VEGETATION INDEX
AVHRR GAC data from NOAA-14 for 2000, days 1 to 1, processed by the Long-Term Land Data Record (LTDR) project (v3.5.45) into normalized difference vegetation index (NDVI) and quality-control flags.
These files have a dimensionless ncrs coordinate used for the CRS definition. Because ncrs has no index, coordinate-based combining can fail. Instead of using combine="by_coords", we open each file individually and concatenate explicitly along time with xr.concat(...). We then drop nv, ncrs, and crs because they are file-level metadata variables that interfere with concatenation and are not needed for this virtualized NDVI data product.
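The explicit concatenation described above can be illustrated without touching S3. The sketch below uses tiny in-memory datasets as stand-ins for the virtual datasets; the `make_day` helper, the 2×3 grid, and the exact layout of `crs` and `ncrs` are assumptions made for illustration, not the structure of the real files. It drops the file-level variables before concatenating, which is one possible ordering.

```python
import numpy as np
import xarray as xr


def make_day(day: int) -> xr.Dataset:
    """Build a tiny in-memory stand-in for one daily file (hypothetical layout)."""
    return xr.Dataset(
        data_vars={
            "NDVI": (("time", "latitude", "longitude"), np.full((1, 2, 3), float(day))),
            "crs": ((), np.int16(0)),  # scalar grid-mapping variable
        },
        coords={
            "time": np.array([f"2000-01-0{day}"], dtype="datetime64[ns]"),
            "latitude": [10.0, 20.0],
            "longitude": [0.0, 1.0, 2.0],
            "ncrs": ((), 0),  # scalar coordinate with no index
        },
    )


# errors="ignore" skips names that are absent from a given dataset (here "nv")
drop = ["nv", "ncrs", "crs"]
days = [make_day(d).drop_vars(drop, errors="ignore") for d in range(1, 6)]

# Concatenate only variables that already have a time dimension
combined = xr.concat(
    days,
    dim="time",
    data_vars="minimal",
    coords="minimal",
    compat="override",
)
print(combined.sizes["time"], list(combined.data_vars))
```

Passing `data_vars="minimal"` and `coords="minimal"` keeps variables without a time dimension from being stacked along time, and `compat="override"` takes them from the first dataset instead of comparing values across files.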
institution: NASA/GSFC/SED/ESD/HBSL/TIS/MODIS-LAND > MODIS Land Science Team, Terrestrial Information Systems, Hydrospheric and Biospheric Science Laboratory, Earth Sciences Division, Science and Exploration Directorate, Goddard Space Flight Center, NASA
Conventions: CF-1.6, ACDD-1.3
standard_name_vocabulary: CF Standard Name Table (v25, 05 July 2013)
naming_authority: gov.noaa.ncei
license: See the Use Agreement for this CDR available from the NOAA CDR webpage
cdm_data_type: Grid
product_version: v5r0
platform: NOAA-3 > National Oceanic & Atmospheric Administration-3
sensor: AVHRR > Advanced Very High Resolution Radiometer
keywords_vocabulary: NASA Global Change Master Directory (GCMD) Science Keywords
platform_vocabulary: Global Change Master Directory (GCMD) Platform Keywords
instrument_vocabulary: Global Change Master Directory (GCMD) Instrument Keywords
keywords: EARTH SCIENCE > BIOSPHERE > VEGETATION > VEGETATION INDEX
spatial_resolution: 0.050000 degrees per pixel
geospatial_lat_min: -90.0
geospatial_lat_max: 90.0
geospatial_lon_min: -180.0
geospatial_lon_max: 180.0
metadata_link: https://doi.org/10.7289/V5ZG6QH9
program: NOAA Climate Data Record Program for satellites
cdr_variable: NDVI
Process: LTDR_GAPS
PostProcessingVersion: 2.9
PFIIVersion: 3.5.45
Satellite: NOAA-14
Instrument: AVHRR
InputDataType: GAC
ESDT: AVH13C1
RangeBeginningTime: 00:00:00.0000
RangeEndingTime: 23:59:59.9999
PercentValidClearDaytimeWater: 0.00
PercentValidDaytimeWaterInCloudShadow: 0.00
3a. Export to kerchunk JSON (optional)
Write the combined virtual references to a single kerchunk-compatible JSON file that any fsspec-aware reader can consume.
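For orientation, a kerchunk version 1 reference file is a JSON object with a `refs` mapping: metadata keys such as `.zgroup` hold inline JSON strings, while chunk keys map to a `[url, offset, length]` triple. The sketch below builds and reads such a structure with only the standard library. The chunk key, byte offset, and length are made-up placeholders, not values from the real file, and `byte_range` is a hypothetical helper.

```python
import json

url = (
    "s3://noaa-cdr-ndvi-pds/data/2000/"
    "AVHRR-Land_v005_AVH13C1_NOAA-14_20000101_c20170623095628.nc"
)

# Hypothetical reference set: the offset (40000) and length (123456) are placeholders
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),  # inline metadata
        "NDVI/0.0.0": [url, 40000, 123456],  # chunk stored as a byte range
    },
}

# Round-trip through JSON text, as a reader of the file would see it
loaded = json.loads(json.dumps(refs))


def byte_range(ref_file: dict, key: str) -> tuple[str, int, int]:
    """Return (url, first_byte, last_byte) for a chunk key that points into a file."""
    entry = ref_file["refs"][key]
    if isinstance(entry, str):
        raise ValueError(f"{key!r} is stored inline, not as a byte range")
    target, offset, length = entry
    return target, offset, offset + length - 1


print(byte_range(loaded, "NDVI/0.0.0"))
```

A reader that understands this layout only needs a ranged GET against the original NetCDF file for each chunk, which is why the references file stays small compared with the data it describes.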
out = "combined_refs.json"
combined_vds.vz.to_kerchunk(
    filepath=out,
    format="json",
)
3b. Write to Icechunk
Create a local Icechunk repository and write the virtual references into it. The VirtualChunkContainer tells Icechunk where to fetch the actual chunk bytes at read time — the original S3 files are never copied.
# Run the imports cell at the top first: this cell needs Path and shutil.
repo_path = Path("ndvi-icechunk-concat")
if repo_path.exists():
    shutil.rmtree(repo_path)
4. Read back and plot
Open the Icechunk store with xarray — all 5 days appear as a single continuous dataset. Chunk data is fetched lazily from S3 on demand.
cloudy cloud_shadow water sunglint dense_dark_vegetation night ch1_to_5_valid ch1_invalid ch2_invalid ch3_invalid ch4_invalid ch5_invalid rho3_invalid BRDF_corr_problem polar_flag
long_name: Quality Assurance
grid_mapping: crs
[129600000 values with dtype=int16]
lon_bnds (longitude, nv) float32
[14400 values with dtype=float32]
lat_bnds (latitude, nv) float32
[7200 values with dtype=float32]
TIMEOFDAY (time, latitude, longitude) float64
long_name: Time since Start of Data Day
valid_range: [0, 2399]
grid_mapping: crs
[129600000 values with dtype=float64]
import hvplot.xarray  # noqa

# Global NDVI map for the first day
ds["NDVI"].isel(time=0).hvplot(
    rasterize=True,
    geo=True,
    global_extent=True,
    x="longitude",
    y="latitude",
    tiles="OSM",
    cmap="YlGn",
    clim=(-0.1, 1.0),
    title="AVHRR NDVI - 2000-01-01",
    width=800,
    height=400,
)
ds['crs'].values
# Global mean NDVI over the 5-day period
ds["NDVI"].mean(["latitude", "longitude"]).hvplot(
    title="Global mean NDVI - Jan 1–5, 2000",
    ylabel="NDVI",
)