# It is bad to install via pip into a conda environment. I know this
# Remove the commenting and run
# ! pip install odc-stac stac
# ! pip install geopolars
Introduction to Geospatial Analyses in Python in the Cloud
This is based on the “Examining Environmental Justice through Open Source, Cloud-Native Tools” notebook from Carl Boettiger. Please follow Carl’s notebook for the background. This tutorial just focuses on the code.
Set up
You install a few things that are not on this JHub.
import odc.stac
import rioxarray
from pystac_client import Client
import geopandas as gpd
from rasterstats import zonal_stats
import dask.distributed
= dask.distributed.Client()
client =True, client=client) odc.stac.configure_rio(cloud_defaults
Data discovery
Here we use a STAC Catalog API to recover a list of candidate data. This example searches for images in a lon-lat bounding box from a collection of Cloud-Optimized-GeoTIFF (COG) images taken by Sentinel2 satellite mission. This function will not download any imagery, it merely gives us a list of metadata about available images.
= [-122.51, 37.71, -122.36, 37.81]
box = (
items
Client.open("https://earth-search.aws.element84.com/v1").
search(= ['sentinel-2-l2a'],
collections = box,
bbox = "2022-06-01/2022-08-01",
datetime ={"eo:cloud_cover": {"lt": 20}}).
query
item_collection() )
We pass this list of images to a high-level utilty (gdalcubes
) that will do all of the heavy lifting:
- subsetting by date
- subsetting by bounding box
- aggregating by time
P1D
- reproject into the desired coordinate system
- resampling to a desired spatial resolution
= odc.stac.load(
data
items,="EPSG:4326",
crs=["nir08", "red"],
bands=0.0001,
resolution=box
bbox )
Calculate NDVI, a widely used measure of greenness that can be used to determine tree cover.
= data.red
red = data.nir08
nir
= (((nir - red) / (red + nir)).
ndvi ="MS").
resample(time"time", keep_attrs=True).
median(
compute()
)
# mask out bad pixels
= ndvi.where(ndvi <= 1) ndvi
Plot the result. The long rectangle of Golden Gate Park is clearly visible in the North-West.
import matplotlib as plt
= plt.colormaps.get_cmap('viridis') # viridis is the default colormap for imshow
cmap ='black')
cmap.set_bad(color
="time", cmap=cmap, add_colorbar=False, size=5) ndvi.plot.imshow(row
Add the 1937 “red-lining” zones from the Mapping Inequality project. The red-lined zones are spatial vectors.
="ndvi.tif", driver="COG")
ndvi.rio.to_raster(raster_path= "/vsicurl/https://dsl.richmond.edu/panorama/redlining/static/citiesData/CASanFrancisco1937/geojson.json"
sf_url = zonal_stats(sf_url, "ndvi.tif", stats="mean") mean_ndvi
Now we can plot
= gpd.read_file(sf_url)
sf "ndvi"] = [x["mean"] for x in mean_ndvi ]
sf[="ndvi", legend=True) sf.plot(column
Compute the mean current greenness by 1937 zone.
import geopolars as gpl
import polars as pl
(gpl.
from_geopandas(sf)."grade").
group_by("ndvi").mean()).
agg(pl.col("grade")
sort( )
grade | ndvi |
---|---|
str | f64 |
null | 0.157821 |
"A" | 0.338723 |
"B" | 0.247344 |
"C" | 0.231182 |
"D" | 0.225696 |