# ---- Load Libraries ----
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
import cartopy
import cartopy.crs as ccrs
import cartopy.feature as cfeature
# ---- Load Data ----
= "https://cwcgom.aoml.noaa.gov/erddap/griddap/noaa_aoml_a113_ab04_933f"
url = xr.open_dataset(url)
ds
# --- Get first week of Sept 2024
= ds.sel(time="2024-09-01") ds
Tutorial 3: Make way for MOANA - Multi Ordination ANAlysis
📘 Learning Objectives
- Create pretty maps of the three phytoplankton classes in MOANA
- Plot how all three of these phytoplankton classes change in relation to latitude
- Plot how all three of these phytoplankton classes change over the course of 1 year
- Create a ternary plot and map that shows where each phytoplankton class dominates
You will need 3.7Gb for this tutorial
Go to File > Hub Control Panel > Stop my server
Then restart after chosing 3.7 Gb in the drop-down.
Interacting with multiple phytoplankton groups simultaneously
MOANA is the first phytoplankton community composition algorithm to be released by PACE. This product returns near-surface concentrations (cells mL-1) of three different picophytoplankton (i.e., phytoplankton <2 μm in size): Prochlorococcus, Synechococcus, and autotrophic picoeukaryotes (Figure 15). The algorithm uses empirical relationships between measured cell concentrations, in situ hyperspectral remote sensing reflectances, and sea surface temperatures. Details of this algorithm can be found in Lange et al. (2020).
Picophytoplankton are composed of the cyanobacteria Prochlorococcus (∼0.8 µm) and Synechococcus (∼1 µm), as well as picoeukaryotes, which combined are responsible for 50 to 90% of all primary production in open ocean ecosystems and contribute up to 30% of carbon export in these regions. Geographically, Prochlorococcus tends to inhabit warmer and mostly oligotrophic waters surrounded by Synechococcus patches along frontal boundaries. These fronts often reside at boundaries where phytoplankton communities start to transition to higher concentrations of larger eukaryotic cells, such as picoeukaryotes and nanoeukaryotic flagellates. Thus, identification of Prochlorococcus and Synechococcus distributions may be useful in identifying trophic boundaries in oceanic ecosystems, in addition to providing insight into productivity, food web regimes, and carbon export.
Note, for now, the MOANA product is only avaialble for the Atlantic Ocean.
Are you ready?
Lock and load the data
print(f"Size in GB: {ds.nbytes / 1e9:.2f} GB")
ds.sizes
Size in GB: 0.11 GB
Frozen({'latitude': 3360, 'longitude': 2640})
ds
<xarray.Dataset> Size: 106MB Dimensions: (latitude: 3360, longitude: 2640) Coordinates: time datetime64[ns] 8B 2024-09-01 * latitude (latitude) float32 13kB 69.98 69.94 69.9 ... -69.94 -69.98 * longitude (longitude) float32 11kB -84.98 -84.94 ... 24.94 24.98 Data variables: prococcus_moana (latitude, longitude) float32 35MB ... syncoccus_moana (latitude, longitude) float32 35MB ... picoeuk_moana (latitude, longitude) float32 35MB ... Attributes: (12/71) _lastModified: 2025-02-28T17:56:21.000Z cdm_data_type: Grid comment: This dataset is redistributed by Atlan... Conventions: CF-1.10, ACDD-1.3, COARDS creator_email: data@oceancolor.gsfc.nasa.gov creator_name: NASA/GSFC/OBPG ... ... temporal_range: 7-day time_coverage_end: 2025-03-02T00:00:00Z time_coverage_start: 2024-04-18T00:00:00Z title: OCI L3 SMI, PACE MOANA 8DAY; via Atlan... Westernmost_Easting: -84.97916 westernmost_longitude: -85.0
# let's load the data into memory since it is not so big
= ds.load() ds
It’s map time again
We’ve got three different phytoplankton classes in this MOANA data set. Let’s create some pretty maps and get a sense of their relative distributions.
# ---- Define metadata for plots ----
= {
phyto_info "Prochlorococcus": {
"data": ds["prococcus_moana"],
"cmap": plt.cm.Blues,
"label": "Prochlorococcus conc. (cells mL⁻¹)"
},"Synechococcus": {
"data": ds["syncoccus_moana"],
"cmap": plt.cm.Reds,
"label": "Synechococcus conc. (cells mL⁻¹)"
},"Picoeukaryotes": {
"data": ds["picoeuk_moana"],
"cmap": plt.cm.Greens,
"label": "Picoeukaryote conc. (cells mL⁻¹)"
} }
# ---- Set up figure and axes ----
= plt.subplots(1, 3, figsize=(18, 6),
fig, axs ={'projection': ccrs.PlateCarree()})
subplot_kw
# ---- Plot each phytoplankton group ----
for ax, (title, info) in zip(axs, phyto_info.items()):
ax.set_title(title)
ax.coastlines()=0.3)
ax.add_feature(cfeature.BORDERS, linewidth= ax.gridlines(draw_labels=True)
gl = gl.right_labels = False
gl.top_labels
= info["data"].plot(
img =ax,
ax=info["cmap"],
cmap=True,
robust=False
add_colorbar
)
= plt.colorbar(img, ax=ax, orientation='horizontal', pad=0.05, shrink=0.9)
cbar "label"])
cbar.set_label(info[
plt.tight_layout() plt.show()
Looking good! Here you can see that little ole Prochlorococcus really runs amok in the offshore waters, where both Synechococcus and Picoeukaryotes tend to dominate at higher latitudes, and Picoeukaryotes assert their dominance in along the coastal margins… you know, to the degree that a free-floating cell can “assert” anything.
Let’s get a bit more quantitative
Let’s get a little better sense of how these phytoplakton classes are geographically distributed. What we’re going to do next is plot how all three of these phytoplankton classes change in relation to latitude. To do this, we’ll be taking “slices” of data and averaging them. Conceptually, think of a horizontal line across 50 degrees North - we’ll extract all that data, take an average, and then move on to the pixel below (e.g. 49.98 degrees North) and do the same until we hit 50 degrees South. Doing this for each phytoplankton class, we can now distill this information into something that can be put onto a line plot.
= ds["syncoccus_moana"].sel(latitude = slice(50, -50)).mean(dim=["longitude"])
syn = ds["prococcus_moana"].sel(latitude = slice(50, -50)).mean(dim=["longitude"])
pro = ds["picoeuk_moana"].sel(latitude = slice(50, -50)).mean(dim=["longitude"])
pico
= {
custom_colors "Prochlorococcus": plt.cm.Blues(0.6),
"Synechococcus": plt.cm.Reds(0.6),
"Picoeukaryotes": plt.cm.Greens(0.6)
}
= plt.subplots(figsize=(6, 8)) # Taller figure for vertical layout
fig, ax1
# Primary axis: Syn and Pico
"latitude"], label="Synechococcus", color=custom_colors["Synechococcus"])
ax1.plot(syn, syn["latitude"], label="Picoeukaryotes", color=custom_colors["Picoeukaryotes"])
ax1.plot(pico, pico["Latitude")
ax1.set_ylabel("Mean Abundance (Syn & Pico, cell ml$^{-1}$)", color="black")
ax1.set_xlabel(='x', labelcolor="black")
ax1.tick_params(axis
# Secondary axis: Prochlorococcus
= ax1.twiny()
ax2 "latitude"], label="Prochlorococcus", color=custom_colors["Prochlorococcus"], linestyle="--")
ax2.plot(pro, pro["Mean Abundance (Prochlorococcus, cell ml$^{-1}$)", color=custom_colors["Prochlorococcus"])
ax2.set_xlabel(='x', labelcolor=custom_colors["Prochlorococcus"])
ax2.tick_params(axis
# Combine legends
= ax1.get_legend_handles_labels()
lines_1, labels_1 = ax2.get_legend_handles_labels()
lines_2, labels_2 + lines_2, labels_1 + labels_2, loc="lower right")
ax1.legend(lines_1
"Latitudinal Means of Phytoplankton Groups", pad=10)
plt.title(True)
plt.grid(
plt.tight_layout() plt.show()
Note that Prochlorococcus is on a separate axis here because the cells are so abundant. That doesn’t necessarily mean they represent a larger amount of biomass across the board though. Prochlorococcus are teeny tiny little cyanobacteria cells, and Synechococcus and picoeukaryotes are larger, so in the latter case, fewer cells can represent more overall biomass. We make this distinction here becuase when we later consider which phytoplankton class is “dominant” in an area, it is relative to their respective range of concentrations, not absolute cell counts.
The times, they are a changin’
For this exercise, we’ll download one year’s worth of PACE data to see how things are evolving over time. Using this time series, we’re going to use the “slice” function again, but this time we’ll extract a 10 degree x 10 degree box in the south Atlantic ocean. It can be smaller, or located elsewhere if you like, but either way, we’re going to average the data within that box for each individual time step.
Next, we’ll plot how all three of these phytoplankton classes change over the course of 1 year, allowing you to see the cycles of growth and decay relative to one another. It is useful to add this context because not all chlorophyll-a is equivalent, e.g., a bloom of Prochlorococcus may represent a different amount of trophic energy potential relative to blooms of other phytoplankton classes, or it may havce come about from a different oceanographic process.
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from matplotlib.patches import Rectangle
# Bounding box
= -54, -51
lon_min, lon_max = -42, -39
lat_min, lat_max
# Choose one phytoplankton group
= "Prochlorococcus"
title = ds["prococcus_moana"].sel(latitude=slice(20, -60))
data = plt.cm.Blues
cmap = "Prochlorococcus conc. (cells mL⁻¹)"
label
# ---- Plot ----
= plt.subplots(figsize=(8, 6), subplot_kw={'projection': ccrs.PlateCarree()})
fig, ax
# Base map
ax.set_title(title)
ax.coastlines()=0.3)
ax.add_feature(cfeature.BORDERS, linewidth= ax.gridlines(draw_labels=True)
gl = gl.right_labels = False
gl.top_labels
# Plot the data
= data.plot(ax=ax, cmap=cmap, robust=True, add_colorbar=False)
img
# Add bounding box
ax.add_patch(Rectangle((lon_min, lat_min),- lon_min,
lon_max - lat_min,
lat_max ='red',
edgecolor='none',
facecolor=2,
linewidth=ccrs.PlateCarree()))
transform
# Add colorbar
= plt.colorbar(img, ax=ax, orientation='horizontal', pad=0.05, shrink=0.9)
cbar
cbar.set_label(label)
plt.tight_layout() plt.show()
Load data and select our box.
# ---- Load Data ----
= "https://cwcgom.aoml.noaa.gov/erddap/griddap/noaa_aoml_a113_ab04_933f"
url = xr.open_dataset(url)
dataset =dataset.sel(latitude=slice(-39,-42), longitude=slice(-54,-51))
da
# Get the mean over the box
= da["syncoccus_moana"].mean(dim=["latitude", "longitude"])
syn_ts = da["picoeuk_moana"].mean(dim=["latitude", "longitude"])
pico_ts = da["prococcus_moana"].mean(dim=["latitude", "longitude"])
pro_ts
pro_ts
<xarray.DataArray 'prococcus_moana' (time: 41)> Size: 164B array([ 76012.04 , 139162.08 , 177361.06 , 161316.4 , 222840.39 , 137385. , 142513.7 , 114853.05 , 29329.648, 222419.66 , 159956.17 , 99857.34 , 92749.43 , 93915.49 , 89002.47 , 78405.47 , 113439.734, 132991.88 , 105341.13 , 68365.78 , 85625. , 117346.81 , 87780.3 , 124836.06 , 76297.945, 111245.4 , 130434.305, 147058. , 114212.92 , 197231.17 , 199400.97 , 268445.97 , 258324.56 , 255930.8 , 271919.44 , 263647.38 , 260969.44 , 230369.28 , 278427.03 , 246284.44 , 308866.66 ], dtype=float32) Coordinates: * time (time) datetime64[ns] 328B 2024-04-18 2024-04-26 ... 2025-03-02
# Create the figure and axis
= plt.subplots(figsize=(12, 5))
fig, ax1
= {
custom_colors "Prochlorococcus": plt.cm.Blues(0.6),
"Synechococcus": plt.cm.Reds(0.6),
"Picoeukaryotes": plt.cm.Greens(0.6)
}
# Plot syn and pico on the left y-axis
"time"], syn_ts, 'o-', label="Synechococcus", color=custom_colors["Synechococcus"])
ax1.plot(syn_ts["time"], pico_ts, 'o-', label="Picoeukaryotes", color=custom_colors["Picoeukaryotes"])
ax1.plot(pico_ts["Time")
ax1.set_xlabel("Mean Abundance (Syn & Pico)", color="black")
ax1.set_ylabel(='y', labelcolor="black")
ax1.tick_params(axis
# Create a secondary y-axis for prococcus_moana
= ax1.twinx()
ax2 "time"], pro_ts, 'o--', label="Prochlorococcus", color=custom_colors["Prochlorococcus"])
ax2.plot(pro_ts["Mean Abundance (Prochlorococcus)", color=custom_colors["Prochlorococcus"])
ax2.set_ylabel(='y', labelcolor=custom_colors["Prochlorococcus"])
ax2.tick_params(axis
# Combine all legends
= ax1.get_legend_handles_labels()
lines_1, labels_1 = ax2.get_legend_handles_labels()
lines_2, labels_2 + lines_2, labels_1 + labels_2, loc="upper left")
ax1.legend(lines_1
# Add title and grid
"8-day Phytoplankton Abundance in Selected Box")
plt.title(True)
plt.grid(=45)
plt.xticks(rotation
plt.tight_layout() plt.show()
Bringing it all together
A central theme of working with PACE data revolves around wrangling multiple dimensions of data into something you can conceptualize. Viewing three separate maps makes it a bit difficult to see the interactions between the different phytoplankton classes. This is important - we want to know where these boundaries are colliding, because this indicates that something is changing with the underlying oceanography.
Let’s try something a little different here. Instead of making three separate plots, let’s stack them all and make a false “true color” image by substituting the usual red, green, and blue radiance channels with Synechococcus (red), Picoeukaryotes (green), and Proclorococcus (blue). Can it be done?
Yes, but with a tiny bit of nuance here. We talked earlier about how higher cell counts does not unequivocally mean that this is a “dominant” class. We can do something called histogram stretching, which is a little trick some folks use to make their true color images pop a little more. Basically, for each phytoplankton class, let’s take all the data between the 2% and 98% percentile range, and normalize those data to the respective median. This ensure all phytoplankton classes are now on a level playing field when we stack them up. Higher or lower values no longer represent absolute cell counts, they now represent higher or lower weights relative to their central tendency. The same trick is often done for making “true color” images to make them pop a little more. Let’s give it a try:
# ---- Lazy Load Data ----
import xarray as xr
= "https://cwcgom.aoml.noaa.gov/erddap/griddap/noaa_aoml_a113_ab04_933f"
url = xr.open_dataset(url)
ds
# When working with big data sets we want to subset before we start
# working with them
# --- Get first week of Sept 2024
= ds.sel(time="2024-09-01")
ds
# ---- Subset to North Atlantic off US Coast ----
= [-84, -20, 15, 60]
extent = ds.sel(latitude=slice(60,15), longitude=slice(-84,-20)) ds
Create a “red, green, blue” stack
import numpy as np
# ---- Normalize using robust percentile-based scaling ----
def robust_normalize(arr):
= np.nanpercentile(arr, [2, 98])
vmin, vmax return np.clip((arr - vmin) / (vmax - vmin), 0, 1)
# ---- Define our colors ----
= {
phyto_rgb "R": ds["syncoccus_moana"],
"G": ds["picoeuk_moana"],
"B": ds["prococcus_moana"]
}
# ---- Create a stack for implot ----
= np.stack([robust_normalize(da.values) for da in phyto_rgb.values()], axis=-1)
rgb_image = 0
rgb_image[np.isnan(rgb_image)] = np.flipud(rgb_image) rgb_image
# ---- Plot ----
= plt.subplots(figsize=(10, 6), subplot_kw={'projection': ccrs.PlateCarree()})
fig, ax "Relative Dominance of Phytoplankton Groups (RGB Blending)")
ax.set_title(
ax.coastlines()=0.3)
ax.add_feature(cfeature.BORDERS, linewidth
# Grid labels only
= ax.gridlines(draw_labels=True, linewidth=0, color='none')
gl = gl.top_labels = False
gl.right_labels
# extent defined when data set up above
='lower', extent=extent, transform=ccrs.PlateCarree())
ax.imshow(rgb_image, origin=ccrs.PlateCarree())
ax.set_extent(extent, crs
plt.tight_layout() plt.show()
Now you can really see where each phytoplankton class dominates, and through the gradational mixing of colors, we can see where, and to what degree they overlap. Neat, right?
Create a legend
We still could use a bit of help contextualizing the above image, so let’s just make a little legend for it. Since we’re in tri-color space, let’s use the very useful, and in my opinion, underutilized ternary plot. These are fun little plots that show the relative contribution of three variables relative to one another. It helps us determine how to interpret the purple, teal, pink, and orange areas you see on the map.
from matplotlib.patches import Polygon
# Define triangle vertices (RGB space: R=Syn, G=Pico, B=Pro)
= np.array([
triangle 0.5, np.sqrt(3)/2], # top = Pico (Green)
[0, 0], # bottom left = Syn (Red)
[1, 0] # bottom right = Pro (Blue)
[
])
# Create barycentric coordinates for triangle
= 150
res = np.array([
bary_coords / res, j / res, 1 - i / res - j / res]
[i for i in range(res + 1)
for j in range(res + 1 - i)
])= bary_coords[:, [1, 0, 2]] # R=Syn, G=Pico, B=Pro
rgb_vals = np.clip(rgb_vals, 0, 1)
rgb_vals
# Convert to triangle (x, y) coordinates
= (bary_coords[:, 0:1] * triangle[0] +
xy_coords 1:2] * triangle[1] +
bary_coords[:, 2:3] * triangle[2])
bary_coords[:,
# Plot the ternary color key
= plt.subplots(figsize=(4, 4))
fig, ax 'equal')
ax.set_aspect('off')
ax.axis(
0], xy_coords[:, 1], c=rgb_vals, s=2)
ax.scatter(xy_coords[:, =True, edgecolor='black', fill=False))
ax.add_patch(Polygon(triangle, closed
# Labels
0.5, np.sqrt(3)/2 + 0.05, "Picoeukaryotes\n(Green)", ha='center', fontsize=9)
ax.text(-0.05, -0.05, "Synechococcus\n(Red)", ha='right', fontsize=9)
ax.text(1.05, -0.05, "Prochlorococcus\n(Blue)", ha='left', fontsize=9)
ax.text(
"Ternary Color Legend - (Phytoplankton Dominance)", fontsize=10, pad=30)
plt.title(
plt.tight_layout() plt.show()
Summary
There you have it! PACE’s first phytoplankton class algorithm. Note, even if these specific critters are not of particular interest to you, they are excellent indicators of underlying oceanographic processes (MOANA uses SST as an input as well). If you are a highly migratory species traversing the open seas, these boundaries may very well be influencial.