Exploring ship echosounder data from the Pacific Hake survey

Jupyter notebook accompanying the manuscript:

Echopype: A Python library for interoperable and scalable processing of ocean sonar data for biological information
Authors: Wu-Jung Lee, Emilio Mayorga, Landung Setiawan, Kavin Nguyen, Imran Majeed, Valentina Staneva

Introduction

Goals

  • Illustrate a common workflow for echosounder data conversion, calibration and use. This workflow leverages the standardization applied by echopype and the power, ease of use and familiarity of libraries in the scientific Python ecosystem.
  • Extract and visualize data with relative ease using geospatial and temporal filters.

Description

This notebook uses EK60 echosounder data collected during the 2017 Joint U.S.-Canada Integrated Ecosystem and Pacific Hake Acoustic Trawl Survey (‘Pacific Hake Survey’) to illustrate a common workflow for data conversion, calibration and analysis using echopype and core scientific Python software packages, particularly xarray, GeoPandas, pandas and NumPy.

Two days of cloud-hosted .raw data files are accessed by echopype directly from an Amazon Web Services (AWS) S3 “bucket” maintained by the NOAA NCEI Water-Column Sonar Data Archive. The total data used are 170 .raw files at approximately 25 MB each (1 Hz pinging rate from first light to dusk), corresponding to approximately 4.2 GB. With echopype, each file is converted to a standardized representation based on the SONAR-netCDF4 v1.0 convention and saved to the cloud-optimized Zarr format.

Data stored in the netCDF-based SONAR-netCDF4 convention can be conveniently and intuitively manipulated with xarray in combination with related scientific Python packages. Mean Volume Backscattering Strength (MVBS) is computed with echopype from each raw data file and exported to a netCDF file. Here, we define two geographical bounding boxes encompassing two ship tracks and use these to extract corresponding timestamp intervals from the GPS data, and then the corresponding MVBS data based on those intervals. Finally, these extracted MVBS subsets are plotted as track echograms.

Outline

  1. Establish AWS S3 file system connection and generate list of target EK60 .raw files
  2. Process S3-hosted raw files with echopype: convert, calibrate and export to standardized files
  3. Extract and process GPS locations from the Platform group of converted raw files
  4. Read MVBS and plot track echograms for time periods corresponding to two ship tracks

Running the notebook

This notebook can be run with a conda environment created using the conda environment file https://github.com/OSOceanAcoustics/echopype-examples/blob/main/binder/environment.yml. The notebook creates two directories, if not already present: ./exports/hakesurvey_convertedzarr and ./exports/hakesurvey_calibratednc. netCDF and Zarr files will be exported there.

Note

We encourage importing echopype as ep for consistency.

from pathlib import Path

import fsspec
import numpy as np
import geopandas as gpd
import xarray as xr

import matplotlib.pyplot as plt
from shapely.geometry import box
import cartopy.crs as ccrs
import cartopy.io.img_tiles as cimgt
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER

import echopype as ep
from echopype.qc import exist_reversed_time

import warnings
warnings.simplefilter("ignore", category=DeprecationWarning)
from dask.distributed import Client
client = Client()
client

Client

Client-0457e919-0407-11ef-80d7-820f588673e5

Connection method: Cluster object Cluster type: distributed.LocalCluster
Dashboard: /user/leewujung/proxy/8787/status

Cluster Info

Worker: 1

Comm: tcp://127.0.0.1:33021 Total threads: 2
Dashboard: /user/leewujung/proxy/33901/status Memory: 7.84 GiB
Nanny: tcp://127.0.0.1:43227
Local directory: /tmp/dask-scratch-space/worker-rtm3hewa

Worker: 2

Comm: tcp://127.0.0.1:36503 Total threads: 2
Dashboard: /user/leewujung/proxy/34185/status Memory: 7.84 GiB
Nanny: tcp://127.0.0.1:42299
Local directory: /tmp/dask-scratch-space/worker-sq1_hzly

Worker: 3

Comm: tcp://127.0.0.1:32945 Total threads: 2
Dashboard: /user/leewujung/proxy/38863/status Memory: 7.84 GiB
Nanny: tcp://127.0.0.1:38595
Local directory: /tmp/dask-scratch-space/worker-ewwsz9bc

Establish AWS S3 file system connection and generate list of target EK60 .raw files

Access and inspect the publicly accessible NCEI WCSD S3 bucket on the AWS cloud as if it were a local file system. This will be done through the Python fsspec file system and bytes storage interface. We will use fsspec.filesystem.glob (fs.glob) to generate a list of all EK60 .raw data files in the bucket, then filter on file names for target dates of interest.

The directory path on the ncei-wcsd-archive S3 bucket is s3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/. All .raw files from the 2017 Hake survey cruise are found here.

fs = fsspec.filesystem('s3', anon=True)

bucket = "ncei-wcsd-archive"
rawdirpath = "data/raw/Bell_M._Shimada/SH1707/EK60"
s3rawfiles = fs.glob(f"{bucket}/{rawdirpath}/*.raw")

# print out the last two S3 raw file paths in the list
s3rawfiles[-2:]
['ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170913-T180733.raw',
 'ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Winter2017-D20170615-T002629.raw']

Generate list of target EK60 .raw files from AWS S3 bucket based on dates. The dates are found in the middle string token (e.g., “D20170913”). Select files from 2 days, 2017-07-28 and 2017-07-29.

s3rawfiles = [
    s3path for s3path in s3rawfiles 
    if any([f"D2017{datestr}" in s3path for datestr in ['0728', '0729']])
]

print(f"There are {len(s3rawfiles)} target raw files available")
There are 170 target raw files available

Process S3-hosted raw files with echopype: convert, calibrate and export to standardized files

Loop through all the selected raw files on S3 and convert, calibrate and generate Mean Volume Backscattering Strength (MVBS). Save the raw converted and MVBS data to local files, as zarr and netCDF, respectively.

def populate_metadata(ed, raw_fname):
    """
    Manually populate into the "ed" EchoData object 
    additional metadata about the dataset and the platform
    """
    
    # -- SONAR-netCDF4 Top-level Group attributes
    survey_name = (
        "2017 Joint U.S.-Canada Integrated Ecosystem and "
        "Pacific Hake Acoustic Trawl Survey ('Pacific Hake Survey')"
    )
    ed['Top-level'].attrs['title'] = f"{survey_name}, file {raw_fname}"
    ed['Top-level'].attrs['summary'] = (
        f"EK60 raw file {raw_fname} from the {survey_name}, converted to a SONAR-netCDF4 file using echopype."
        "Information about the survey program is available at "
        "https://www.fisheries.noaa.gov/west-coast/science-data/"
        "joint-us-canada-integrated-ecosystem-and-pacific-hake-acoustic-trawl-survey"
    )

    # -- SONAR-netCDF4 Platform Group attributes
    # Per SONAR-netCDF4, for platform_type see https://vocab.ices.dk/?ref=311
    ed['Platform'].attrs['platform_type'] = "Research vessel"
    ed['Platform'].attrs['platform_name'] = "Bell M. Shimada"  # A NOAA ship
    ed['Platform'].attrs['platform_code_ICES'] = "315"

Create the directories where the exported files will be saved, if these directories don’t already exist.

base_dpath = Path('./exports/notebook2')
base_dpath.mkdir(exist_ok=True, parents=True)

converted_dpath = Path(base_dpath / 'hakesurvey_convertedzarr')
converted_dpath.mkdir(exist_ok=True)
calibrated_dpath = (base_dpath / 'hakesurvey_calibratednc')
calibrated_dpath.mkdir(exist_ok=True)

echopype processing

EchoData is an echopype object for conveniently handling raw converted data from either raw instrument files or previously converted and standardized raw netCDF4 and Zarr files. It is essentially a container for multiple xarray.Dataset objects, each corresponds to one of the netCDF4 groups specified in the SONAR-netCDF4 convention – the convention followed by echopype. The EchoData object can be used to conveniently accesse and explore the echosounder raw data and for calibration and other processing.

The cell below contains the main echopype workflow steps. For each raw file: - Access file directly from S3 via ep.open_raw to create a converted EchoData object in memory - Add global and platform attributes to EchoData object - Export to a local Zarr dataset (a collection of files encapsulated in a directory) - Generate calibrated Sv and then MVBS from the raw data in the EchoData object - Export MVBS to a local netcdf file

Note: Depending on your internet speed, this cell may take some time to run (potentially 20-30 mins).

# %%time

# for s3rawfpath in s3rawfiles:
#     raw_fpath = Path(s3rawfpath)
#     try:
#         # Access file directly from S3 to create a converted EchoData object in memory
#         ed = ep.open_raw(
#             f"s3://{s3rawfpath}",
#             sonar_model='EK60',
#             storage_options={'anon': True}
#         )
#         # Manually populate additional metadata about the dataset and the platform
#         populate_metadata(ed, raw_fpath.name)

#         # Save to converted Zarr format
#         ed.to_zarr(save_path=converted_dpath, overwrite=True)

#         # Use the EchoData object "ed" to generate calibrated and
#         # computed MVBS files that will be saved to netcdf
#         ds_Sv = ep.calibrate.compute_Sv(ed)
#         ds_MVBS = ep.commongrid.compute_MVBS(
#             ds_Sv,
#             range_meter_bin=5,  # in meters
#             ping_time_bin='20s'  # in seconds
#         )
#         ds_MVBS.to_netcdf(calibrated_dpath / f"MVBS_{raw_fpath.stem}.nc")
#     except Exception as e:
#         print(f"Failed to process raw file {raw_fpath.name}: {e}")

Test for time reversals

Small time reversals are found in EK60 datasets, including the 2017 Pacific Hake survey, where the ping_time (or GPS time1) value may be lower (older) than the preceding ping_time by a second. Such discontinuities can interfere with concatenating individual raw files to produce an aggregated dataset. The capability to identify and address these reversals is in the echopype.qc subpackage.

%%time
for datapath in converted_dpath.glob('*'):
    ed = ep.open_converted(datapath, chunks={})
    # Test for a negative ping_time increment in sequential timestamps, in the Sonar/Beam_group1 group
    if exist_reversed_time(ds=ed['Sonar/Beam_group1'], time_name="ping_time"):
        print(f"Reversed time in {datapath}")
CPU times: user 19.7 s, sys: 6.21 s, total: 25.9 s
Wall time: 2min 9s

There are no time reversals in this two-day dataset, fortunately.

Examine the EchoData object for one of the data files

echopype provides a user-friendly, convenient representation of an EchoData object that leverages the user-friendly xarray Dataset HTML representation. Since an EchoData object is effectively a container for multiple xarray.Dataset objects corresponding to netCDF4 groups, the notebook “print out” provides a summary view of all the groups and interactive access to summaries of each group.

Here, ed is the last object opened in the time reversal test, in the preceding cell.

ed = ep.open_converted(list(converted_dpath.glob('*'))[0], chunks={})
ed
EchoData: standardized raw data from /home/jovyan/shared/echopype-examples/notebooks/exports/notebook2/hakesurvey_convertedzarr/Summer2017-D20170729-T235813.zarr
    • <xarray.Dataset>
      Dimensions:  ()
      Data variables:
          *empty*
      Attributes:
          conventions:                 CF-1.7, SONAR-netCDF4-1.0, ACDD-1.3
          date_created:                2017-07-29T23:58:13Z
          keywords:                    EK60
          processing_level:            Level 1A
          processing_level_url:        https://echopype.readthedocs.io/en/stable/pr...
          sonar_convention_authority:  ICES
          sonar_convention_name:       SONAR-netCDF4
          sonar_convention_version:    1.0
          summary:                     EK60 raw file Summer2017-D20170729-T235813.r...
          title:                       2017 Joint U.S.-Canada Integrated Ecosystem ...

      • <xarray.Dataset>
        Dimensions:                 (channel: 3, time1: 534)
        Coordinates:
          * channel                 (channel) <U37 'GPT  18 kHz 009072058c8d 1-1 ES18...
          * time1                   (time1) datetime64[ns] 2017-07-29T23:58:13.128697...
        Data variables:
            absorption_indicative   (channel, time1) float64 dask.array<chunksize=(3, 534), meta=np.ndarray>
            frequency_nominal       (channel) float64 dask.array<chunksize=(3,), meta=np.ndarray>
            sound_speed_indicative  (channel, time1) float64 dask.array<chunksize=(3, 534), meta=np.ndarray>

      • <xarray.Dataset>
        Dimensions:              (channel: 3, time1: 1676, time2: 534)
        Coordinates:
          * channel              (channel) <U37 'GPT  18 kHz 009072058c8d 1-1 ES18-11...
          * time1                (time1) datetime64[ns] 2017-07-29T23:58:13.511395840...
          * time2                (time2) datetime64[ns] 2017-07-29T23:58:13.128697856...
        Data variables: (12/20)
            MRU_offset_x         float64 ...
            MRU_offset_y         float64 ...
            MRU_offset_z         float64 ...
            MRU_rotation_x       float64 ...
            MRU_rotation_y       float64 ...
            MRU_rotation_z       float64 ...
            ...                   ...
            sentence_type        (time1) <U3 dask.array<chunksize=(1676,), meta=np.ndarray>
            transducer_offset_x  (channel) float64 dask.array<chunksize=(3,), meta=np.ndarray>
            transducer_offset_y  (channel) float64 dask.array<chunksize=(3,), meta=np.ndarray>
            transducer_offset_z  (channel) float64 dask.array<chunksize=(3,), meta=np.ndarray>
            vertical_offset      (time2) float64 dask.array<chunksize=(534,), meta=np.ndarray>
            water_level          float64 ...
        Attributes:
            platform_code_ICES:  315
            platform_name:       Bell M. Shimada
            platform_type:       Research vessel

        • <xarray.Dataset>
          Dimensions:        (time1: 17325)
          Coordinates:
            * time1          (time1) datetime64[ns] 2017-07-29T23:58:13.128697856 ... 2...
          Data variables:
              NMEA_datagram  (time1) <U73 dask.array<chunksize=(17325,), meta=np.ndarray>
          Attributes:
              description:  All NMEA sensor datagrams

      • <xarray.Dataset>
        Dimensions:           (filenames: 1)
        Coordinates:
          * filenames         (filenames) int64 0
        Data variables:
            source_filenames  (filenames) <U92 dask.array<chunksize=(1,), meta=np.ndarray>
        Attributes:
            conversion_software_name:     echopype
            conversion_software_version:  0.8.1
            conversion_time:              2024-03-22T16:55:17Z

      • <xarray.Dataset>
        Dimensions:           (beam_group: 1)
        Coordinates:
          * beam_group        (beam_group) <U11 'Beam_group1'
        Data variables:
            beam_group_descr  (beam_group) <U131 dask.array<chunksize=(1,), meta=np.ndarray>
        Attributes:
            sonar_manufacturer:      Simrad
            sonar_model:             EK60
            sonar_serial_number:     
            sonar_software_name:     ER60
            sonar_software_version:  2.4.3
            sonar_type:              echosounder

        • <xarray.Dataset>
          Dimensions:                        (channel: 3, ping_time: 534,
                                              range_sample: 3957)
          Coordinates:
            * channel                        (channel) <U37 'GPT  18 kHz 009072058c8d 1...
            * ping_time                      (ping_time) datetime64[ns] 2017-07-29T23:5...
            * range_sample                   (range_sample) int64 0 1 2 ... 3954 3955 3956
          Data variables: (12/29)
              angle_alongship                (channel, ping_time, range_sample) float32 dask.array<chunksize=(3, 534, 3957), meta=np.ndarray>
              angle_athwartship              (channel, ping_time, range_sample) float32 dask.array<chunksize=(3, 534, 3957), meta=np.ndarray>
              angle_offset_alongship         (channel) float64 dask.array<chunksize=(3,), meta=np.ndarray>
              angle_offset_athwartship       (channel) float64 dask.array<chunksize=(3,), meta=np.ndarray>
              angle_sensitivity_alongship    (channel) float64 dask.array<chunksize=(3,), meta=np.ndarray>
              angle_sensitivity_athwartship  (channel) float64 dask.array<chunksize=(3,), meta=np.ndarray>
              ...                             ...
              transmit_bandwidth             (channel, ping_time) float64 dask.array<chunksize=(3, 534), meta=np.ndarray>
              transmit_duration_nominal      (channel, ping_time) float64 dask.array<chunksize=(3, 534), meta=np.ndarray>
              transmit_frequency_start       (channel) float64 dask.array<chunksize=(3,), meta=np.ndarray>
              transmit_frequency_stop        (channel) float64 dask.array<chunksize=(3,), meta=np.ndarray>
              transmit_power                 (channel, ping_time) float64 dask.array<chunksize=(3, 534), meta=np.ndarray>
              transmit_type                  <U2 ...
          Attributes:
              beam_mode:              vertical
              conversion_equation_t:  type_3

      • <xarray.Dataset>
        Dimensions:            (channel: 3, pulse_length_bin: 5)
        Coordinates:
          * channel            (channel) <U37 'GPT  18 kHz 009072058c8d 1-1 ES18-11' ...
          * pulse_length_bin   (pulse_length_bin) int64 0 1 2 3 4
        Data variables:
            frequency_nominal  (channel) float64 dask.array<chunksize=(3,), meta=np.ndarray>
            gain_correction    (channel, pulse_length_bin) float64 dask.array<chunksize=(3, 5), meta=np.ndarray>
            pulse_length       (channel, pulse_length_bin) float64 dask.array<chunksize=(3, 5), meta=np.ndarray>
            sa_correction      (channel, pulse_length_bin) float64 dask.array<chunksize=(3, 5), meta=np.ndarray>
    • channel
      PandasIndex
      PandasIndex(Index(['GPT  18 kHz 009072058c8d 1-1 ES18-11',
             'GPT  38 kHz 009072058146 2-1 ES38B',
             'GPT 120 kHz 00907205a6d0 4-1 ES120-7C'],
            dtype='object', name='channel'))
    • pulse_length_bin
      PandasIndex
      PandasIndex(Int64Index([0, 1, 2, 3, 4], dtype='int64', name='pulse_length_bin'))

  • Extract and process GPS locations from the Platform group of converted raw files

    Use xarray.open_mfdataset to open the Platform group from all the converted raw netcdf files as a single concatenated (combined) xarray dataset. Then extract GPS time1 (time stamp), latitude and longitude from this group and transform that data into a GeoPandas GeoDataFrame containing point-geometry objects that are readily manipulated via geospatial operations. A GeoDataFrame adds geospatial capabilities to a Pandas DataFrame.

    Due to the presence of multiple time coordinates in this group, care must be taken in defining how the concatenation (combine) operation is to be performed. This is captured in the arguments passed to open_mfdataset.

    %%time
    platform_ds = xr.open_mfdataset(
        str(converted_dpath / '*.zarr'), group='Platform', 
        engine='zarr',
        data_vars='minimal', coords='minimal',
        combine='nested', chunks={}
    )
    CPU times: user 1min 52s, sys: 12.3 s, total: 2min 5s
    Wall time: 3min 34s
    platform_ds
    <xarray.Dataset>
    Dimensions:              (channel: 3, time1: 244846, time2: 88959)
    Coordinates:
      * channel              (channel) <U37 'GPT  18 kHz 009072058c8d 1-1 ES18-11...
      * time1                (time1) datetime64[ns] 2017-07-28T00:05:36.103314944...
      * time2                (time2) datetime64[ns] 2017-07-28T00:05:34.897271808...
    Data variables: (12/20)
        MRU_offset_x         float64 nan
        MRU_offset_y         float64 nan
        MRU_offset_z         float64 nan
        MRU_rotation_x       float64 nan
        MRU_rotation_y       float64 nan
        MRU_rotation_z       float64 nan
        ...                   ...
        sentence_type        (time1) object dask.array<chunksize=(244846,), meta=np.ndarray>
        transducer_offset_x  (channel) float64 dask.array<chunksize=(3,), meta=np.ndarray>
        transducer_offset_y  (channel) float64 dask.array<chunksize=(3,), meta=np.ndarray>
        transducer_offset_z  (channel) float64 dask.array<chunksize=(3,), meta=np.ndarray>
        vertical_offset      (time2) float64 dask.array<chunksize=(88959,), meta=np.ndarray>
        water_level          float64 9.15
    Attributes:
        platform_code_ICES:  315
        platform_name:       Bell M. Shimada
        platform_type:       Research vessel
    • channel
      PandasIndex
      PandasIndex(Index(['GPT  18 kHz 009072058c8d 1-1 ES18-11',
             'GPT  38 kHz 009072058146 2-1 ES38B',
             'GPT 120 kHz 00907205a6d0 4-1 ES120-7C'],
            dtype='object', name='channel'))
    • time1
      PandasIndex
      PandasIndex(DatetimeIndex(['2017-07-28 00:05:36.103314944',
                     '2017-07-28 00:05:37.511096832',
                     '2017-07-28 00:05:37.669428224',
                     '2017-07-28 00:05:37.948589056',
                     '2017-07-28 00:05:39.000582144',
                     '2017-07-28 00:05:39.158913024',
                     '2017-07-28 00:05:40.353559040',
                     '2017-07-28 00:05:41.152783872',
                     '2017-07-28 00:05:41.311114752',
                     '2017-07-28 00:05:42.003654144',
                     ...
                     '2017-07-30 00:17:52.538341888',
                     '2017-07-30 00:17:53.276961792',
                     '2017-07-30 00:17:53.926177792',
                     '2017-07-30 00:17:54.084509184',
                     '2017-07-30 00:17:55.277079040',
                     '2017-07-30 00:17:55.924213248',
                     '2017-07-30 00:17:56.082544128',
                     '2017-07-30 00:17:57.274190848',
                     '2017-07-30 00:17:59.223481856',
                     '2017-07-30 00:17:59.381813248'],
                    dtype='datetime64[ns]', name='time1', length=244846, freq=None))
    • time2
      PandasIndex
      PandasIndex(DatetimeIndex(['2017-07-28 00:05:34.897271808',
                     '2017-07-28 00:05:37.162400768',
                     '2017-07-28 00:05:38.404471808',
                     '2017-07-28 00:05:40.670601216',
                     '2017-07-28 00:05:42.906728960',
                     '2017-07-28 00:05:45.152857088',
                     '2017-07-28 00:05:47.387984896',
                     '2017-07-28 00:05:49.643113984',
                     '2017-07-28 00:05:51.899244032',
                     '2017-07-28 00:05:54.145372160',
                     ...
                     '2017-07-30 00:17:36.490025984',
                     '2017-07-30 00:17:38.716151808',
                     '2017-07-30 00:17:40.951281152',
                     '2017-07-30 00:17:43.178409984',
                        '2017-07-30 00:17:45.393536',
                     '2017-07-30 00:17:47.619662848',
                     '2017-07-30 00:17:49.835790848',
                     '2017-07-30 00:17:52.050916864',
                     '2017-07-30 00:17:54.266042880',
                     '2017-07-30 00:17:56.492170240'],
                    dtype='datetime64[ns]', name='time2', length=88959, freq=None))
  • platform_code_ICES :
    315
    platform_name :
    Bell M. Shimada
    platform_type :
    Research vessel
  • We can use time1 (the timestamps for NMEA datagrams, or GPS timestamps) to examine the exact timestamp interval spanned by the combined dataset.

    print(f"{platform_ds.time1.values.min()}, {platform_ds.time1.values.max()}")
    2017-07-28T00:05:36.103314944, 2017-07-30T00:17:59.381813248

    To create the GeoPandas GeoDataFrame, first transform the latitude and longitude arrays to a single Pandas DataFrame that retains the time1 coordinate as a common index. This is done by using the DataFrame to_dataframe method together with a Pandas join operation. Then, a point GeoDataFrame is created from this DataFrame.

    gps_df = platform_ds.latitude.to_dataframe().join(platform_ds.longitude.to_dataframe())
    
    gps_df.head(3)
    /srv/conda/envs/notebook/lib/python3.10/site-packages/distributed/client.py:3162: UserWarning: Sending large graph of size 637.80 MiB.
    This may cause some slowdown.
    Consider scattering data ahead of time and using futures.
      warnings.warn(
    /srv/conda/envs/notebook/lib/python3.10/site-packages/distributed/client.py:3162: UserWarning: Sending large graph of size 637.80 MiB.
    This may cause some slowdown.
    Consider scattering data ahead of time and using futures.
      warnings.warn(
    latitude longitude
    time1
    2017-07-28 00:05:36.103314944 43.533072 -124.683998
    2017-07-28 00:05:37.511096832 43.533080 -124.684005
    2017-07-28 00:05:37.669428224 43.533167 -124.684000
    gps_gdf = gpd.GeoDataFrame(
        gps_df,
        geometry=gpd.points_from_xy(gps_df['longitude'], gps_df['latitude']), 
        crs="epsg:4326"
    )

    A simple, easily generated map plot of the point GeoDataFrame

    gps_gdf.plot(markersize=2)

    Read MVBS and plot track echograms for time periods corresponding to two ship tracks

    Read MVBS as a concatenated dataset

    Use xarray.open_mfdataset again to read and concatenate (combine) data files into a single xarray Dataset. This time, we’re reading the MVBS netCDF files.

    %%time
    MVBS_ds = xr.open_mfdataset(
        str(calibrated_dpath / 'MVBS_*.nc'), 
        data_vars='minimal', coords='minimal',
        combine='by_coords'
    )
    CPU times: user 6.77 s, sys: 601 ms, total: 7.38 s
    Wall time: 20.3 s
    MVBS_ds
    <xarray.Dataset>
    Dimensions:            (channel: 3, ping_time: 8819, echo_range: 150)
    Coordinates:
      * channel            (channel) <U37 'GPT  18 kHz 009072058c8d 1-1 ES18-11' ...
      * ping_time          (ping_time) datetime64[ns] 2017-07-28T00:05:20 ... 201...
      * echo_range         (echo_range) float64 0.0 5.0 10.0 ... 735.0 740.0 745.0
    Data variables:
        Sv                 (channel, ping_time, echo_range) float64 dask.array<chunksize=(3, 60, 150), meta=np.ndarray>
        frequency_nominal  (channel) float64 dask.array<chunksize=(3,), meta=np.ndarray>
    Attributes:
        processing_software_name:     echopype
        processing_software_version:  0.8.1
        processing_time:              2024-03-22T16:12:57Z
        processing_function:          commongrid.compute_MVBS
    • channel
      PandasIndex
      PandasIndex(Index(['GPT  18 kHz 009072058c8d 1-1 ES18-11',
             'GPT  38 kHz 009072058146 2-1 ES38B',
             'GPT 120 kHz 00907205a6d0 4-1 ES120-7C'],
            dtype='object', name='channel'))
    • ping_time
      PandasIndex
      PandasIndex(DatetimeIndex(['2017-07-28 00:05:20', '2017-07-28 00:05:40',
                     '2017-07-28 00:06:00', '2017-07-28 00:06:20',
                     '2017-07-28 00:06:40', '2017-07-28 00:07:00',
                     '2017-07-28 00:07:20', '2017-07-28 00:07:40',
                     '2017-07-28 00:08:00', '2017-07-28 00:08:20',
                     ...
                     '2017-07-30 00:14:40', '2017-07-30 00:15:00',
                     '2017-07-30 00:15:20', '2017-07-30 00:15:40',
                     '2017-07-30 00:16:00', '2017-07-30 00:16:20',
                     '2017-07-30 00:16:40', '2017-07-30 00:17:00',
                     '2017-07-30 00:17:20', '2017-07-30 00:17:40'],
                    dtype='datetime64[ns]', name='ping_time', length=8819, freq=None))
    • echo_range
      PandasIndex
      PandasIndex(Float64Index([  0.0,   5.0,  10.0,  15.0,  20.0,  25.0,  30.0,  35.0,  40.0,
                     45.0,
                    ...
                    700.0, 705.0, 710.0, 715.0, 720.0, 725.0, 730.0, 735.0, 740.0,
                    745.0],
                   dtype='float64', name='echo_range', length=150))
  • processing_software_name :
    echopype
    processing_software_version :
    0.8.1
    processing_time :
    2024-03-22T16:12:57Z
    processing_function :
    commongrid.compute_MVBS
  • Replace the channel dimension and coordinate with the frequency_nominal variable containing actual frequency values. Note that this step is possible only because there are no duplicated frequencies present.

    %%time
    MVBS_ds = ep.consolidate.swap_dims_channel_frequency(MVBS_ds)
    CPU times: user 1.54 s, sys: 193 ms, total: 1.74 s
    Wall time: 11.6 s
    MVBS_ds
    <xarray.Dataset>
    Dimensions:            (frequency_nominal: 3, ping_time: 8819, echo_range: 150)
    Coordinates:
      * ping_time          (ping_time) datetime64[ns] 2017-07-28T00:05:20 ... 201...
      * echo_range         (echo_range) float64 0.0 5.0 10.0 ... 735.0 740.0 745.0
      * frequency_nominal  (frequency_nominal) float64 1.8e+04 3.8e+04 1.2e+05
    Data variables:
        Sv                 (frequency_nominal, ping_time, echo_range) float64 dask.array<chunksize=(3, 60, 150), meta=np.ndarray>
        channel            (frequency_nominal) <U37 'GPT  18 kHz 009072058c8d 1-1...
    Attributes:
        processing_software_name:     echopype
        processing_software_version:  0.8.1
        processing_time:              2024-03-22T16:12:57Z
        processing_function:          commongrid.compute_MVBS

    Extract MVBS along two N-S tracks selected via geographical bounding boxes

    Define rectangular bounding boxes around two tracks oriented North-South, then plot a reference map showing all the GPS points (in red), the two bounding boxes (black), and the OOI mooring location (CE04 Oregon Offshore, yellow star) examined in the accompanying Jupyter notebook.

    tracksouth_bbox = gpd.GeoSeries(box(-125.17, 43.65, -125.14, 43.84), crs=gps_gdf.crs)
    tracknorth_bbox = gpd.GeoSeries(box(-125.17, 43.98, -125.14, 44.17), crs=gps_gdf.crs)
    basemap = cimgt.OSM()
    
    _, ax = plt.subplots(
        figsize=(7, 7), subplot_kw={"projection": basemap.crs}
    )
    bnd = gps_gdf.geometry.bounds
    ax.set_extent([bnd.minx.min() - 1, bnd.maxx.max() + 2, 
                   bnd.miny.min() - 0.8, bnd.maxy.max() + 2.2])
    ax.add_image(basemap, 7)
    ax.gridlines(draw_labels=True, xformatter=LONGITUDE_FORMATTER, yformatter=LATITUDE_FORMATTER)
    
    # GPS points
    gps_gdf.plot(ax=ax, markersize=0.1, color='red', 
                 transform=ccrs.PlateCarree())
    
    # Bounding box for selected tracks
    tracksouth_bbox.plot(ax=ax, edgecolor="black", linewidth=1.2, facecolor='none', 
                         transform=ccrs.PlateCarree())
    tracknorth_bbox.plot(ax=ax, edgecolor="black", linewidth=1.2, facecolor='none', 
                         transform=ccrs.PlateCarree())
    
    # OOI CE04 Oregon Offshore mooring location
    plt.plot(-124.95, 44.37, marker='*', color='yellow', markersize=13, 
             transform=ccrs.PlateCarree());

    Clip the GPS locations GeoPandas GeoDataFrame gps_gdf generated from the Platform group with the two rectangular regions, tracksouth_bbox and tracknorth_bbox. Extract from those clipped GeoDataFrames (tracksouth_gps_gdf and tracknorth_gps_gdf) the minimum and maximum time1 timestamps for each track and use them to select the corresponding time span in the MVBS data.

    Finally, create a MVBS Dataset subset for each track (tracksouth_MVBS_ds and tracknorth_MVBS_ds) by taking advantage of xarray’s “label”-based selection capability via the sel method. We select MVBS_ds data with ping_time values within the timestamp interval “slice” extracted from the geographical track. As we can see above, ping_time is a coordinate variable in the MVBS_ds Dataset.

    tracksouth_gps_gdf = gpd.clip(gps_gdf, tracksouth_bbox)
    tracksouth_MVBS_ds = MVBS_ds.sel(
        ping_time=slice(tracksouth_gps_gdf.index.min(), tracksouth_gps_gdf.index.max())
    )
    
    tracknorth_gps_gdf = gpd.clip(gps_gdf, tracknorth_bbox)
    tracknorth_MVBS_ds = MVBS_ds.sel(
        ping_time=slice(tracknorth_gps_gdf.index.min(), tracknorth_gps_gdf.index.max())
    )

    Plot MVBS echograms for the two N-S tracks, for all 3 frequencies

    The final step is to create echogram plots (range vs ping_time) for each echosounder frequency and each of the two selected ship tracks. That’s 6 subplots. We first define two functions to simplify the task. plot_echograms plots the echograms of the 3 frequencies as a column of subplots, extracting the data for each frequency by using xarray’s isel index selector method, which uses index counts rather than values. As we can see above, frequency_nominal is a coordinate of the Sv (MVBS) DataArray.

    def track_interval_str(trackdt):
        """ Create the timestamp interval title string for a plot column
        """
        track_interval_title_str = (
            f"{trackdt.index.min().strftime('%b-%d %H:%MZ')}"
            f" to {trackdt.index.max().strftime('%b-%d %H:%MZ')}"
        )
        return track_interval_title_str
    
    def plot_echograms(ds, freq_len, column_idx):
        """Plot echograms of the 3 frequencies for xarray dataset ds,
           as a column of subplots"""
        for f in range(freq_len):
            ax = axes[f][column_idx]
            # Select Sv data by frequency using the frequency_nominal coordinate index "f",
            # then plot the echogram of the selected data
            ds.Sv.isel(frequency_nominal=f).plot(
                ax=ax, 
                x='ping_time',
                y='echo_range',
                yincrease=False,
                vmin=-80,
                vmax=-50,
            )
            if f < 2:
                ax.set_xlabel(None);
    freq_len = len(MVBS_ds.frequency_nominal)
    
    fig, axes = plt.subplots(nrows=freq_len, ncols=2, constrained_layout=True, figsize=(16, 16))
    
    fig.suptitle(
        (f"    South Track, {track_interval_str(tracksouth_gps_gdf)}"
         "                                         "
         f"North Track, {track_interval_str(tracknorth_gps_gdf)}"),
        fontsize=16)
    
    plot_echograms(tracksouth_MVBS_ds, freq_len, column_idx=0) # left column
    plot_echograms(tracknorth_MVBS_ds, freq_len, column_idx=1) # right column

    Package versions

    import datetime, s3fs
    print(f"echopype: {ep.__version__}, xarray: {xr.__version__}, geopandas: {gpd.__version__}, "
          f"fsspec: {fsspec.__version__}, s3fs: {s3fs.__version__}")
    
    print(f"\n{datetime.datetime.utcnow()} +00:00")
    echopype: 0.8.4, xarray: 2023.12.0, geopandas: 0.14.2, fsspec: 2023.12.2, s3fs: 2023.12.2
    
    2024-04-26 15:56:06.174336 +00:00