Geoscience in the Cloud

Goals
  • Get an introduction to some of the jargon
  • Understand the difference between cloud and on-premises computing
  • Understand the difference between working on data natively in the cloud and downloading it first
  • Learn about some of the major community groups in “big data Geoscience”
  • Learn some of the major tools (in 2023)

Open infrastructure for geoscience

Let’s start by watching a short video by James Colliander, founder of 2i2c, which supports many community JupyterHubs to increase open access to cloud computing.

Why are we using a cloud environment?

“Anyone working with large-scale Earth System data today faces the same general problems:

  • The data we want to work with are huge (typical analyses involve several TB at least)
  • The data we need are produced and distributed by many different organizations (NASA, NOAA, ESGF, Copernicus, etc.)
  • We want to apply a wide range of different analysis methodologies to the data, from simple statistics to signal processing to machine learning.

The community is waking up to the idea that we can’t simply expect scientists to download all this data to their personal computers for processing.”

Ryan Abernathey, Pangeo Project.

Figure: Download-based workflow. From Abernathey, Ryan (2020), Data Access Modes in Science.
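
To get a feel for why the download-based workflow does not scale, here is a rough back-of-the-envelope sketch in Python. The 5 TB dataset size and the 100 Mbit/s connection speed are illustrative assumptions (the quote above only says “several TB”), and the function name is made up for this example.

```python
def download_time_hours(size_tb: float, bandwidth_mbit_s: float) -> float:
    """Hours needed to transfer `size_tb` terabytes at `bandwidth_mbit_s` megabits per second.

    Uses decimal units (1 TB = 1e12 bytes, 1 Mbit = 1e6 bits) and ignores
    protocol overhead, so real transfers will usually take longer.
    """
    size_bits = size_tb * 1e12 * 8                    # terabytes -> bits
    seconds = size_bits / (bandwidth_mbit_s * 1e6)    # bits / (bits per second)
    return seconds / 3600                             # seconds -> hours


if __name__ == "__main__":
    # Assumed values for illustration only: 5 TB over a 100 Mbit/s link.
    hours = download_time_hours(5, 100)
    print(f"{hours:.0f} hours (~{hours / 24:.1f} days)")  # prints "111 hours (~4.6 days)"
```

Under these assumptions a single analysis would need more than four days of uninterrupted transfer before any computation starts, and that is before counting the local storage required. Working on the data where it already lives avoids this step.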

Pangeo

Link to tutorial on the Pangeo ecosystem

Cloud optimized geospatial data

References