Why Cloud(-Native) Data?

FTP

flowchart LR

%% Server-Based Access
    S_C1[Client A] -- connect --> S_SRV[Central Server<br/>I/O limited by its capacity<br />No extra services]
    S_SRV -- download file --> S_C1
    S_SRV === DISK[NetCDF Files]

ERDDAP / OPeNDAP

flowchart LR

    S_C1[Client] -- query --> S_SRV[Central Server\nI/O limited by its capacity\nextra services]
    S_SRV -- subset of file --> S_C1
    S_SRV === S_DISK[NetCDF Files]
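
To make the "subset of file" arrow concrete: with ERDDAP's griddap protocol, the client encodes the subset it wants in the URL, and the server does the slicing before sending anything back. The sketch below only builds such a URL. The server address, dataset ID, and variable name are hypothetical placeholders, and the dimension order (time, latitude, longitude) is an assumption that depends on the dataset.

```python
def griddap_url(base, dataset_id, variable, time, lat, lon, fmt="nc"):
    """Build an ERDDAP griddap URL for one variable over a time/lat/lon box.

    time, lat, and lon are (start, stop) pairs. Values wrapped in parentheses
    in the URL are coordinate values rather than array indices.
    """
    def dim(bounds):
        start, stop = bounds
        return f"[({start}):1:({stop})]"  # the middle number is the stride

    query = variable + dim(time) + dim(lat) + dim(lon)
    return f"{base}/griddap/{dataset_id}.{fmt}?{query}"


url = griddap_url(
    base="https://example.org/erddap",  # hypothetical server
    dataset_id="my_sst_dataset",        # hypothetical dataset ID
    variable="sst",                     # hypothetical variable name
    time=("2020-01-01T00:00:00Z", "2020-01-31T00:00:00Z"),
    lat=(30, 40),
    lon=(-130, -120),
)
print(url)
```

Every such request is resolved by the central server, which is why its I/O capacity sets the ceiling on concurrent users.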

Cloud object storage

flowchart LR

    S_D1[Client] -- read data chunk --- S_SRV[Cloud Object Storage\nNo client limits\nChunked NetCDF Files\nNo extra services]
    S_D2[Client] -- read data chunk --- S_SRV
    S_D3[Client] -- read data chunk --- S_SRV
    S_D4[Client] -- read data chunk --- S_SRV
    S_D5[Client] -- read data chunk --- S_SRV
    S_D6[Client] -- read data chunk --- S_SRV

Server versus Object Storage

Let’s use the metaphor of customers wanting to get sandwiches. A server system (ERDDAP/OPeNDAP) is like a restaurant, while cloud-native data in object storage buckets (S3, GCS, etc.) is like a food court with pre-prepared sandwiches.

Model	Metaphor	How It Works
ERDDAP / OPeNDAP	Restaurant with multiple waiters but one kitchen that prepares the sandwiches	Each client request is handled by a thread (waiter), but all data is read from the same disk (kitchen). Concurrent access is limited by server I/O.
Cloud-Native in Object Storage (S3/GCS)	Food court with many self-serve stations and pre-prepared sandwiches	Clients fetch just the data chunks they need directly from cloud storage. No central bottleneck — reads happen in parallel and scale with demand.

Key difference

Cloud-native formats and object storage buckets remove the kitchen bottleneck by letting each client serve themselves from pre-prepared, independently accessible data chunks.
Cloud-native formats are ‘pre-packaged’ into chunks that are ready to grab and go. Cloud-native can also be thought of as ‘read-optimized’.

Examples of cloud-native formats: Zarr, cloud-optimized GeoTIFF (COG), and legacy NetCDF files paired with a sidecar reference file (kerchunk, VirtualiZarr) that lets you grab individual chunks.
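
The self-serve idea can be sketched without any cloud account. Below, a plain dictionary stands in for a bucket, each chunk is its own object under its own key, and a client fetches only the chunks it needs in parallel. The key naming (`temp/<row>.<col>`) loosely mimics a Zarr-style layout and is illustrative only. With real object storage, `get_object` would be an HTTP GET against the bucket.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a bucket: nine independent chunk objects in a 3 x 3 grid.
bucket = {f"temp/{i}.{j}": bytes([i, j]) * 4 for i in range(3) for j in range(3)}


def get_object(key):
    """Fetch one chunk by key (a dictionary lookup here, a GET in practice)."""
    return bucket[key]


def fetch_chunks(keys, max_workers=8):
    """Fetch several chunks concurrently and return {key: bytes}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input keys
        return dict(zip(keys, pool.map(get_object, keys)))


needed = ["temp/0.1", "temp/1.1", "temp/2.1"]  # one column of the grid
chunks = fetch_chunks(needed)
print(sorted(chunks))
```

No server-side process sits between the client and the chunks, so adding more clients does not queue them behind a single kitchen.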

Cloud-Native is Read-Optimized

This means (among other things) chunked data.
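
A short sketch of why chunking makes reads cheap: given a chunk shape, a client can work out which chunks a selection overlaps and skip the rest. The array shape, chunk shape, and selection below are made-up numbers for illustration, not taken from any particular dataset.

```python
from itertools import product


def chunks_for_selection(selection, chunk_shape):
    """Return the grid index of every chunk that overlaps a selection.

    selection: one (start, stop) pair per dimension, with stop exclusive.
    chunk_shape: the chunk length along each dimension.
    """
    per_dim = []
    for (start, stop), size in zip(selection, chunk_shape):
        first = start // size        # chunk holding the first selected element
        last = (stop - 1) // size    # chunk holding the last selected element
        per_dim.append(range(first, last + 1))
    return list(product(*per_dim))


# Hypothetical (time, lat, lon) array of shape (365, 720, 1440) stored in
# chunks of (30, 180, 360), which gives 13 x 4 x 4 = 208 chunks in total.
touched = chunks_for_selection(
    selection=[(0, 365), (400, 420), (1000, 1010)],  # a full year, small box
    chunk_shape=(30, 180, 360),
)
print(len(touched))  # 13
```

In this example a year-long time series over a small box needs 13 of the 208 chunks, so most of the dataset is never transferred.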

Why not just download the data?

How will you work with massive data files and data sets? How will you get them?

Tackle it in chunks using infrastructure that allows you to work next to the data in the cloud.
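
As a minimal sketch of tackling data in chunks: the mean of a large series can be computed one chunk at a time, so only a single chunk is ever held in memory. The generator below fabricates values locally as a stand-in for chunks read from object storage. Its name and parameters are hypothetical.

```python
def iter_chunks(n_values, chunk_size):
    """Yield one chunk of values at a time (stand-in for remote chunk reads)."""
    for start in range(0, n_values, chunk_size):
        stop = min(start + chunk_size, n_values)
        yield [float(v) for v in range(start, stop)]


def chunked_mean(chunks):
    """Compute a mean from running totals, never holding more than one chunk."""
    total, count = 0.0, 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    return total / count


mean = chunked_mean(iter_chunks(n_values=1_000_000, chunk_size=10_000))
print(mean)  # 499999.5
```

The same pattern is what chunk-aware tools apply at scale, typically spreading the chunks across many workers running next to the data.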