{
"cells": [
{
"cell_type": "raw",
"id": "d65bf4ac-e115-4963-be73-cb5985647d20",
"metadata": {},
"source": [
"---\n",
"title: \"Creating virtual data sets\"\n",
"author: Eli Holmes (NOAA)\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "f66db920-c1b4-4b04-a937-0ef89df602a5",
"metadata": {},
"source": [
"[Open in Colab][colab-link]\n",
"\n",
"[Download the notebook][download-link]\n",
"\n",
"\n",
"[download-link]: https://nmfs-opensci.github.io/NMFSHackDays-2025/topics-2025/2025-02-14-earthdata/5-virtual-dataset.ipynb\n",
"[colab-link]: https://colab.research.google.com/github/nmfs-opensci/nmfshackdays-2025/blob/main/topics-2025/2025-02-14-earthdata/5-virtual-dataset.ipynb\n",
"[jupyter-link]: https://nmfs-openscapes.2i2c.cloud/hub/user-redirect/lab?fromURL=https://raw.githubusercontent.com/nmfs-opensci/nmfshackdays-2025/main/topics-2025/2025-02-14-earthdata/5-virtual-dataset.ipynb"
]
},
{
"cell_type": "markdown",
"id": "de649ab7-2153-4e72-94d5-de93e575b213",
"metadata": {},
"source": [
"> 📘 Learning Objectives\n",
"> 1. Create a large data cube quickly, without first creating a file set with `earthaccess.open()`"
]
},
{
"cell_type": "markdown",
"id": "b10f08fb-381b-4d6c-83da-fe2385039917",
"metadata": {},
"source": [
"## Overview\n",
"\n",
"If we have a large number of files (granules), running `earthaccess.open(results)` will be very slow. Instead, let's use `earthaccess.open_virtual_mfdataset()` to create the metadata that `xarray` needs."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "03dde1aa-c5ca-4078-81b5-e6cfd96e7f34",
"metadata": {},
"outputs": [],
"source": [
"import earthaccess\n",
"import xarray"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "361b64fd-f9b9-4e15-a992-01253317793e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"100"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"results = earthaccess.search_data(count=100, short_name=\"MUR-JPL-L4-GLOB-v4.1\")\n",
"len(results)"
]
},
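{
"cell_type": "markdown",
"id": "size-estimate-note",
"metadata": {},
"source": [
"To see why opening every granule is costly, it helps to estimate how large the cube formed by these 100 granules is. The following is a rough worked estimate, not a measurement. It takes the grid dimensions and data types from the `xarray` summary shown further down (100 time steps, 17999 latitudes, 36000 longitudes, with `analysed_sst` reported as 8-byte `float64`), and the cell id used for this note is an arbitrary placeholder.\n",
"\n",
"$$\n",
"100 \\times 17999 \\times 36000 \\times 8\\ \\text{bytes} = 518{,}371{,}200{,}000\\ \\text{bytes} \\approx 518\\ \\text{GB}\n",
"$$\n",
"\n",
"The same arithmetic with 4 bytes per value gives about 259 GB for the `float32` variable `mask`. Three `float64` variables plus `mask` come to roughly $3 \\times 518 + 259 \\approx 1814$ GB, or about 1.8 TB, which `xarray` rounds to the 2TB shown in the summary. The variables in that summary are lazy `dask` arrays, so this volume is not read into memory when the data set is created."
]
},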
{
"cell_type": "markdown",
"id": "e590a847-ad66-4d15-87a9-8c045c22e75b",
"metadata": {},
"source": [
"Create a virtual representation of the data."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "11521d06-c147-4aee-a47c-59ce4e3680bb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 7.24 s, sys: 502 ms, total: 7.74 s\n",
"Wall time: 13.4 s\n"
]
},
{
"data": {
"text/html": [
"
<xarray.Dataset> Size: 2TB\n", "Dimensions: (time: 100, lat: 17999, lon: 36000)\n", "Coordinates:\n", " * lat (lat) float32 72kB -89.99 -89.98 -89.97 ... 89.98 89.99\n", " * lon (lon) float32 144kB -180.0 -180.0 -180.0 ... 180.0 180.0\n", " * time (time) datetime64[ns] 800B 2002-06-01T09:00:00 ... 2002...\n", "Data variables:\n", " analysed_sst (time, lat, lon) float64 518GB dask.array<chunksize=(1, 1023, 2047), meta=np.ndarray>\n", " analysis_error (time, lat, lon) float64 518GB dask.array<chunksize=(1, 1023, 2047), meta=np.ndarray>\n", " mask (time, lat, lon) float32 259GB dask.array<chunksize=(1, 1447, 2895), meta=np.ndarray>\n", " sea_ice_fraction (time, lat, lon) float64 518GB dask.array<chunksize=(1, 1447, 2895), meta=np.ndarray>\n", "Attributes: (12/41)\n", " Conventions: CF-1.5\n", " title: Daily MUR SST, Final product\n", " summary: A merged, multi-sensor L4 Foundation SST anal...\n", " references: http://podaac.jpl.nasa.gov/Multi-scale_Ultra-...\n", " institution: Jet Propulsion Laboratory\n", " history: created at nominal 4-day latency; replaced nr...\n", " ... ...\n", " project: NASA Making Earth Science Data Records for Us...\n", " publisher_name: GHRSST Project Office\n", " publisher_url: http://www.ghrsst.org\n", " publisher_email: ghrsst-po@nceo.ac.uk\n", " processing_level: L4\n", " cdm_data_type: grid