Tutorial 3: Make way for MOANA - Multi Ordination ANAlysis

Author

Ryan Vandermeulen (NOAA) and edits by Eli Holmes (NOAA)

Colab Badge JupyterHub Badge Download Badge

📘 Learning Objectives

  1. Create pretty maps of the three phytoplankton classes in MOANA
  2. Plot how all three of these phytoplankton classes change in relation to latitude
  3. Plot how all three of these phytoplankton classes change over the course of 1 year
  4. Create a ternary plot and map that shows where each phytoplankton class dominates

You will need 3.7Gb for this tutorial

Go to File > Hub Control Panel > Stop my server

Then restart after chosing 3.7 Gb in the drop-down.

Interacting with multiple phytoplankton groups simultaneously

MOANA is the first phytoplankton community composition algorithm to be released by PACE. This product returns near-surface concentrations (cells mL-1) of three different picophytoplankton (i.e., phytoplankton <2 μm in size): Prochlorococcus, Synechococcus, and autotrophic picoeukaryotes (Figure 15). The algorithm uses empirical relationships between measured cell concentrations, in situ hyperspectral remote sensing reflectances, and sea surface temperatures. Details of this algorithm can be found in Lange et al. (2020).

Picophytoplankton are composed of the cyanobacteria Prochlorococcus (∼0.8 µm) and Synechococcus (∼1 µm), as well as picoeukaryotes, which combined are responsible for 50 to 90% of all primary production in open ocean ecosystems and contribute up to 30% of carbon export in these regions. Geographically, Prochlorococcus tends to inhabit warmer and mostly oligotrophic waters surrounded by Synechococcus patches along frontal boundaries. These fronts often reside at boundaries where phytoplankton communities start to transition to higher concentrations of larger eukaryotic cells, such as picoeukaryotes and nanoeukaryotic flagellates. Thus, identification of Prochlorococcus and Synechococcus distributions may be useful in identifying trophic boundaries in oceanic ecosystems, in addition to providing insight into productivity, food web regimes, and carbon export.

Note, for now, the MOANA product is only avaialble for the Atlantic Ocean.

Are you ready?

Lock and load the data

# ---- Load Libraries ----
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
import cartopy
import cartopy.crs as ccrs
import cartopy.feature as cfeature

# ---- Load Data ----
url = "https://cwcgom.aoml.noaa.gov/erddap/griddap/noaa_aoml_a113_ab04_933f"
ds = xr.open_dataset(url)

# --- Get first week of Sept 2024
ds = ds.sel(time="2024-09-01")
print(f"Size in GB: {ds.nbytes / 1e9:.2f} GB")
ds.sizes
Size in GB: 0.11 GB
Frozen({'latitude': 3360, 'longitude': 2640})
ds
<xarray.Dataset> Size: 106MB
Dimensions:          (latitude: 3360, longitude: 2640)
Coordinates:
    time             datetime64[ns] 8B 2024-09-01
  * latitude         (latitude) float32 13kB 69.98 69.94 69.9 ... -69.94 -69.98
  * longitude        (longitude) float32 11kB -84.98 -84.94 ... 24.94 24.98
Data variables:
    prococcus_moana  (latitude, longitude) float32 35MB ...
    syncoccus_moana  (latitude, longitude) float32 35MB ...
    picoeuk_moana    (latitude, longitude) float32 35MB ...
Attributes: (12/71)
    _lastModified:                     2025-02-28T17:56:21.000Z
    cdm_data_type:                     Grid
    comment:                           This dataset is redistributed by Atlan...
    Conventions:                       CF-1.10, ACDD-1.3, COARDS
    creator_email:                     data@oceancolor.gsfc.nasa.gov
    creator_name:                      NASA/GSFC/OBPG
    ...                                ...
    temporal_range:                    7-day
    time_coverage_end:                 2025-03-02T00:00:00Z
    time_coverage_start:               2024-04-18T00:00:00Z
    title:                             OCI L3 SMI, PACE MOANA 8DAY; via Atlan...
    Westernmost_Easting:               -84.97916
    westernmost_longitude:             -85.0
# let's load the data into memory since it is not so big
ds = ds.load()

It’s map time again

We’ve got three different phytoplankton classes in this MOANA data set. Let’s create some pretty maps and get a sense of their relative distributions.

# ---- Define metadata for plots ----
phyto_info = {
    "Prochlorococcus": {
        "data": ds["prococcus_moana"],
        "cmap": plt.cm.Blues,
        "label": "Prochlorococcus conc. (cells mL⁻¹)"
    },
    "Synechococcus": {
        "data": ds["syncoccus_moana"],
        "cmap": plt.cm.Reds,
        "label": "Synechococcus conc. (cells mL⁻¹)"
    },
    "Picoeukaryotes": {
        "data": ds["picoeuk_moana"],
        "cmap": plt.cm.Greens,
        "label": "Picoeukaryote conc. (cells mL⁻¹)"
    }
}
# ---- Set up figure and axes ----
fig, axs = plt.subplots(1, 3, figsize=(18, 6),
                        subplot_kw={'projection': ccrs.PlateCarree()})

# ---- Plot each phytoplankton group ----
for ax, (title, info) in zip(axs, phyto_info.items()):
    ax.set_title(title)
    ax.coastlines()
    ax.add_feature(cfeature.BORDERS, linewidth=0.3)
    gl = ax.gridlines(draw_labels=True)
    gl.top_labels = gl.right_labels = False

    img = info["data"].plot(
        ax=ax,
        cmap=info["cmap"],
        robust=True,
        add_colorbar=False
    )

    cbar = plt.colorbar(img, ax=ax, orientation='horizontal', pad=0.05, shrink=0.9)
    cbar.set_label(info["label"])

plt.tight_layout()
plt.show()

Looking good! Here you can see that little ole Prochlorococcus really runs amok in the offshore waters, where both Synechococcus and Picoeukaryotes tend to dominate at higher latitudes, and Picoeukaryotes assert their dominance in along the coastal margins… you know, to the degree that a free-floating cell can “assert” anything.

Let’s get a bit more quantitative

Let’s get a little better sense of how these phytoplakton classes are geographically distributed. What we’re going to do next is plot how all three of these phytoplankton classes change in relation to latitude. To do this, we’ll be taking “slices” of data and averaging them. Conceptually, think of a horizontal line across 50 degrees North - we’ll extract all that data, take an average, and then move on to the pixel below (e.g. 49.98 degrees North) and do the same until we hit 50 degrees South. Doing this for each phytoplankton class, we can now distill this information into something that can be put onto a line plot.

syn = ds["syncoccus_moana"].sel(latitude = slice(50, -50)).mean(dim=["longitude"])
pro = ds["prococcus_moana"].sel(latitude = slice(50, -50)).mean(dim=["longitude"])
pico = ds["picoeuk_moana"].sel(latitude = slice(50, -50)).mean(dim=["longitude"])

custom_colors = {
    "Prochlorococcus": plt.cm.Blues(0.6),
    "Synechococcus": plt.cm.Reds(0.6),
    "Picoeukaryotes": plt.cm.Greens(0.6)
}

fig, ax1 = plt.subplots(figsize=(6, 8))  # Taller figure for vertical layout

# Primary axis: Syn and Pico
ax1.plot(syn, syn["latitude"], label="Synechococcus", color=custom_colors["Synechococcus"])
ax1.plot(pico, pico["latitude"], label="Picoeukaryotes", color=custom_colors["Picoeukaryotes"])
ax1.set_ylabel("Latitude")
ax1.set_xlabel("Mean Abundance (Syn & Pico, cell ml$^{-1}$)", color="black")
ax1.tick_params(axis='x', labelcolor="black")

# Secondary axis: Prochlorococcus
ax2 = ax1.twiny()
ax2.plot(pro, pro["latitude"], label="Prochlorococcus", color=custom_colors["Prochlorococcus"], linestyle="--")
ax2.set_xlabel("Mean Abundance (Prochlorococcus, cell ml$^{-1}$)", color=custom_colors["Prochlorococcus"])
ax2.tick_params(axis='x', labelcolor=custom_colors["Prochlorococcus"])

# Combine legends
lines_1, labels_1 = ax1.get_legend_handles_labels()
lines_2, labels_2 = ax2.get_legend_handles_labels()
ax1.legend(lines_1 + lines_2, labels_1 + labels_2, loc="lower right")

plt.title("Latitudinal Means of Phytoplankton Groups", pad=10)
plt.grid(True)
plt.tight_layout()
plt.show()

Note that Prochlorococcus is on a separate axis here because the cells are so abundant. That doesn’t necessarily mean they represent a larger amount of biomass across the board though. Prochlorococcus are teeny tiny little cyanobacteria cells, and Synechococcus and picoeukaryotes are larger, so in the latter case, fewer cells can represent more overall biomass. We make this distinction here becuase when we later consider which phytoplankton class is “dominant” in an area, it is relative to their respective range of concentrations, not absolute cell counts.

The times, they are a changin’

For this exercise, we’ll download one year’s worth of PACE data to see how things are evolving over time. Using this time series, we’re going to use the “slice” function again, but this time we’ll extract a 10 degree x 10 degree box in the south Atlantic ocean. It can be smaller, or located elsewhere if you like, but either way, we’re going to average the data within that box for each individual time step.

Next, we’ll plot how all three of these phytoplankton classes change over the course of 1 year, allowing you to see the cycles of growth and decay relative to one another. It is useful to add this context because not all chlorophyll-a is equivalent, e.g., a bloom of Prochlorococcus may represent a different amount of trophic energy potential relative to blooms of other phytoplankton classes, or it may havce come about from a different oceanographic process.

import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from matplotlib.patches import Rectangle

# Bounding box
lon_min, lon_max = -54, -51
lat_min, lat_max = -42, -39

# Choose one phytoplankton group
title = "Prochlorococcus"
data = ds["prococcus_moana"].sel(latitude=slice(20, -60))
cmap = plt.cm.Blues
label = "Prochlorococcus conc. (cells mL⁻¹)"

# ---- Plot ----
fig, ax = plt.subplots(figsize=(8, 6), subplot_kw={'projection': ccrs.PlateCarree()})

# Base map
ax.set_title(title)
ax.coastlines()
ax.add_feature(cfeature.BORDERS, linewidth=0.3)
gl = ax.gridlines(draw_labels=True)
gl.top_labels = gl.right_labels = False

# Plot the data
img = data.plot(ax=ax, cmap=cmap, robust=True, add_colorbar=False)

# Add bounding box
ax.add_patch(Rectangle((lon_min, lat_min),
                       lon_max - lon_min,
                       lat_max - lat_min,
                       edgecolor='red',
                       facecolor='none',
                       linewidth=2,
                       transform=ccrs.PlateCarree()))

# Add colorbar
cbar = plt.colorbar(img, ax=ax, orientation='horizontal', pad=0.05, shrink=0.9)
cbar.set_label(label)

plt.tight_layout()
plt.show()

Load data and select our box.

# ---- Load Data ----
url = "https://cwcgom.aoml.noaa.gov/erddap/griddap/noaa_aoml_a113_ab04_933f"
dataset = xr.open_dataset(url)
da=dataset.sel(latitude=slice(-39,-42), longitude=slice(-54,-51))

# Get the mean over the box
syn_ts = da["syncoccus_moana"].mean(dim=["latitude", "longitude"])
pico_ts = da["picoeuk_moana"].mean(dim=["latitude", "longitude"])
pro_ts = da["prococcus_moana"].mean(dim=["latitude", "longitude"])

pro_ts
<xarray.DataArray 'prococcus_moana' (time: 41)> Size: 164B
array([ 76012.04 , 139162.08 , 177361.06 , 161316.4  , 222840.39 ,
       137385.   , 142513.7  , 114853.05 ,  29329.648, 222419.66 ,
       159956.17 ,  99857.34 ,  92749.43 ,  93915.49 ,  89002.47 ,
        78405.47 , 113439.734, 132991.88 , 105341.13 ,  68365.78 ,
        85625.   , 117346.81 ,  87780.3  , 124836.06 ,  76297.945,
       111245.4  , 130434.305, 147058.   , 114212.92 , 197231.17 ,
       199400.97 , 268445.97 , 258324.56 , 255930.8  , 271919.44 ,
       263647.38 , 260969.44 , 230369.28 , 278427.03 , 246284.44 ,
       308866.66 ], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 328B 2024-04-18 2024-04-26 ... 2025-03-02
# Create the figure and axis
fig, ax1 = plt.subplots(figsize=(12, 5))

custom_colors = {
    "Prochlorococcus": plt.cm.Blues(0.6),
    "Synechococcus": plt.cm.Reds(0.6),
    "Picoeukaryotes": plt.cm.Greens(0.6)
}

# Plot syn and pico on the left y-axis
ax1.plot(syn_ts["time"], syn_ts, 'o-', label="Synechococcus", color=custom_colors["Synechococcus"])
ax1.plot(pico_ts["time"], pico_ts, 'o-', label="Picoeukaryotes", color=custom_colors["Picoeukaryotes"])
ax1.set_xlabel("Time")
ax1.set_ylabel("Mean Abundance (Syn & Pico)", color="black")
ax1.tick_params(axis='y', labelcolor="black")

# Create a secondary y-axis for prococcus_moana
ax2 = ax1.twinx()
ax2.plot(pro_ts["time"], pro_ts, 'o--', label="Prochlorococcus", color=custom_colors["Prochlorococcus"])
ax2.set_ylabel("Mean Abundance (Prochlorococcus)", color=custom_colors["Prochlorococcus"])
ax2.tick_params(axis='y', labelcolor=custom_colors["Prochlorococcus"])

# Combine all legends
lines_1, labels_1 = ax1.get_legend_handles_labels()
lines_2, labels_2 = ax2.get_legend_handles_labels()
ax1.legend(lines_1 + lines_2, labels_1 + labels_2, loc="upper left")

# Add title and grid
plt.title("8-day Phytoplankton Abundance in Selected Box")
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Bringing it all together

A central theme of working with PACE data revolves around wrangling multiple dimensions of data into something you can conceptualize. Viewing three separate maps makes it a bit difficult to see the interactions between the different phytoplankton classes. This is important - we want to know where these boundaries are colliding, because this indicates that something is changing with the underlying oceanography.

Let’s try something a little different here. Instead of making three separate plots, let’s stack them all and make a false “true color” image by substituting the usual red, green, and blue radiance channels with Synechococcus (red), Picoeukaryotes (green), and Proclorococcus (blue). Can it be done?

Yes, but with a tiny bit of nuance here. We talked earlier about how higher cell counts does not unequivocally mean that this is a “dominant” class. We can do something called histogram stretching, which is a little trick some folks use to make their true color images pop a little more. Basically, for each phytoplankton class, let’s take all the data between the 2% and 98% percentile range, and normalize those data to the respective median. This ensure all phytoplankton classes are now on a level playing field when we stack them up. Higher or lower values no longer represent absolute cell counts, they now represent higher or lower weights relative to their central tendency. The same trick is often done for making “true color” images to make them pop a little more. Let’s give it a try:

# ---- Lazy Load Data ----
import xarray as xr
url = "https://cwcgom.aoml.noaa.gov/erddap/griddap/noaa_aoml_a113_ab04_933f"
ds = xr.open_dataset(url)

# When working with big data sets we want to subset before we start 
# working with them

# --- Get first week of Sept 2024
ds = ds.sel(time="2024-09-01")

# ---- Subset to North Atlantic off US Coast ----
extent = [-84, -20, 15, 60]
ds = ds.sel(latitude=slice(60,15), longitude=slice(-84,-20))

Create a “red, green, blue” stack

import numpy as np

# ---- Normalize using robust percentile-based scaling ----
def robust_normalize(arr):
    vmin, vmax = np.nanpercentile(arr, [2, 98])
    return np.clip((arr - vmin) / (vmax - vmin), 0, 1)

# ---- Define our colors ----
phyto_rgb = {
    "R": ds["syncoccus_moana"],
    "G": ds["picoeuk_moana"],
    "B": ds["prococcus_moana"]
}

# ---- Create a stack for implot ----
rgb_image = np.stack([robust_normalize(da.values) for da in phyto_rgb.values()], axis=-1)
rgb_image[np.isnan(rgb_image)] = 0
rgb_image = np.flipud(rgb_image)
# ---- Plot ----
fig, ax = plt.subplots(figsize=(10, 6), subplot_kw={'projection': ccrs.PlateCarree()})
ax.set_title("Relative Dominance of Phytoplankton Groups (RGB Blending)")
ax.coastlines()
ax.add_feature(cfeature.BORDERS, linewidth=0.3)

# Grid labels only
gl = ax.gridlines(draw_labels=True, linewidth=0, color='none')
gl.right_labels = gl.top_labels = False

# extent defined when data set up above
ax.imshow(rgb_image, origin='lower', extent=extent, transform=ccrs.PlateCarree())
ax.set_extent(extent, crs=ccrs.PlateCarree())

plt.tight_layout()
plt.show()

Now you can really see where each phytoplankton class dominates, and through the gradational mixing of colors, we can see where, and to what degree they overlap. Neat, right?

Create a legend

We still could use a bit of help contextualizing the above image, so let’s just make a little legend for it. Since we’re in tri-color space, let’s use the very useful, and in my opinion, underutilized ternary plot. These are fun little plots that show the relative contribution of three variables relative to one another. It helps us determine how to interpret the purple, teal, pink, and orange areas you see on the map.

from matplotlib.patches import Polygon

# Define triangle vertices (RGB space: R=Syn, G=Pico, B=Pro)
triangle = np.array([
    [0.5, np.sqrt(3)/2],  # top = Pico (Green)
    [0, 0],               # bottom left = Syn (Red)
    [1, 0]                # bottom right = Pro (Blue)
])

# Create barycentric coordinates for triangle
res = 150
bary_coords = np.array([
    [i / res, j / res, 1 - i / res - j / res]
    for i in range(res + 1)
    for j in range(res + 1 - i)
])
rgb_vals = bary_coords[:, [1, 0, 2]]  # R=Syn, G=Pico, B=Pro
rgb_vals = np.clip(rgb_vals, 0, 1)

# Convert to triangle (x, y) coordinates
xy_coords = (bary_coords[:, 0:1] * triangle[0] +
             bary_coords[:, 1:2] * triangle[1] +
             bary_coords[:, 2:3] * triangle[2])

# Plot the ternary color key
fig, ax = plt.subplots(figsize=(4, 4))
ax.set_aspect('equal')
ax.axis('off')

ax.scatter(xy_coords[:, 0], xy_coords[:, 1], c=rgb_vals, s=2)
ax.add_patch(Polygon(triangle, closed=True, edgecolor='black', fill=False))

# Labels
ax.text(0.5, np.sqrt(3)/2 + 0.05, "Picoeukaryotes\n(Green)", ha='center', fontsize=9)
ax.text(-0.05, -0.05, "Synechococcus\n(Red)", ha='right', fontsize=9)
ax.text(1.05, -0.05, "Prochlorococcus\n(Blue)", ha='left', fontsize=9)

plt.title("Ternary Color Legend - (Phytoplankton Dominance)", fontsize=10, pad=30)
plt.tight_layout()
plt.show()

Summary

There you have it! PACE’s first phytoplankton class algorithm. Note, even if these specific critters are not of particular interest to you, they are excellent indicators of underlying oceanographic processes (MOANA uses SST as an input as well). If you are a highly migratory species traversing the open seas, these boundaries may very well be influencial.