from pathlib import Path
import fsspec
import numpy as np
import geopandas as gpd
import xarray as xr
import matplotlib.pyplot as plt
from shapely.geometry import box
import cartopy.crs as ccrs
import cartopy.io.img_tiles as cimgt
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
import echopype as ep
from echopype.qc import exist_reversed_time
import warnings
warnings.simplefilter("ignore", category=DeprecationWarning)
Exploring ship echosounder data from the Pacific Hake survey
Jupyter notebook accompanying the manuscript:
Echopype: A Python library for interoperable and scalable processing of ocean sonar data for biological information
Authors: Wu-Jung Lee, Emilio Mayorga, Landung Setiawan, Kavin Nguyen, Imran Majeed, Valentina Staneva
Introduction
Goals
- Illustrate a common workflow for echosounder data conversion, calibration and use. This workflow leverages the standardization applied by echopype and the power, ease of use and familiarity of libraries in the scientific Python ecosystem.
- Extract and visualize data with relative ease using geospatial and temporal filters.
Description
This notebook uses EK60 echosounder data collected during the 2017 Joint U.S.-Canada Integrated Ecosystem and Pacific Hake Acoustic Trawl Survey (‘Pacific Hake Survey’) to illustrate a common workflow for data conversion, calibration and analysis using echopype and core scientific Python software packages, particularly xarray, GeoPandas, pandas and NumPy.
Two days of cloud-hosted .raw data files are accessed by echopype directly from an Amazon Web Services (AWS) S3 “bucket” maintained by the NOAA NCEI Water-Column Sonar Data Archive. The total data used are 170 .raw files at approximately 25 MB each (1 Hz pinging rate from first light to dusk), corresponding to approximately 4.2 GB. With echopype, each file is converted to a standardized representation based on the SONAR-netCDF4 v1.0 convention and saved to the cloud-optimized Zarr format.
Data stored in the netCDF-based SONAR-netCDF4 convention can be conveniently and intuitively manipulated with xarray in combination with related scientific Python packages. Mean Volume Backscattering Strength (MVBS) is computed with echopype from each raw data file and exported to a netCDF file. Here, we define two geographical bounding boxes encompassing two ship tracks and use these to extract corresponding timestamp intervals from the GPS data, and then the corresponding MVBS data based on those intervals. Finally, these extracted MVBS subsets are plotted as track echograms.
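The bounding-box step described above can be sketched with the standard library alone. This is a minimal illustration rather than the GeoPandas-based approach used in this notebook: the GpsFix record, the time_interval_in_box helper and all coordinates below are hypothetical and made up for the example.

```python
from datetime import datetime
from typing import NamedTuple


class GpsFix(NamedTuple):
    """One GPS record (hypothetical structure, for illustration only)."""
    time: datetime
    lat: float
    lon: float


def time_interval_in_box(fixes, lon_min, lat_min, lon_max, lat_max):
    """Return (first, last) timestamps of the fixes inside the box, or None."""
    inside = [
        f.time for f in fixes
        if lon_min <= f.lon <= lon_max and lat_min <= f.lat <= lat_max
    ]
    if not inside:
        return None
    return min(inside), max(inside)


# Made-up fixes along a westward transect (not survey data)
fixes = [
    GpsFix(datetime(2017, 7, 28, 15, 0), 45.60, -124.40),
    GpsFix(datetime(2017, 7, 28, 16, 0), 45.60, -124.80),
    GpsFix(datetime(2017, 7, 28, 17, 0), 45.60, -125.20),
    GpsFix(datetime(2017, 7, 28, 18, 0), 45.60, -125.60),
]

# Box given as (lon_min, lat_min, lon_max, lat_max)
interval = time_interval_in_box(fixes, -125.3, 45.5, -124.7, 45.7)
print(interval)
```

The returned interval can then be used to slice any time-indexed dataset, which is the role the GPS-derived intervals play for the MVBS data here.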
Outline
- Establish AWS S3 file system connection and generate list of target EK60 .raw files
- Process S3-hosted raw files with echopype: convert, calibrate and export to standardized files
- Extract and process GPS locations from the Platform group of converted raw files
- Read MVBS and plot track echograms for time periods corresponding to two ship tracks
Running the notebook
This notebook can be run with a conda environment created using the conda environment file https://github.com/OSOceanAcoustics/echopype-examples/blob/main/binder/environment.yml. The notebook creates two directories, if not already present: ./exports/notebook2/hakesurvey_convertedzarr and ./exports/notebook2/hakesurvey_calibratednc. Zarr and netCDF files, respectively, will be exported there.
Note
We encourage importing echopype as ep for consistency.
Establish AWS S3 file system connection and generate list of target EK60 .raw files
Access and inspect the publicly accessible NCEI WCSD S3 bucket on the AWS cloud as if it were a local file system. This is done through the Python fsspec file system and bytes storage interface. We use fsspec.filesystem.glob (fs.glob) to generate a list of all EK60 .raw data files in the bucket, then filter on file names for target dates of interest.
The directory path on the ncei-wcsd-archive S3 bucket is s3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/. All .raw files from the 2017 Hake survey cruise are found here.
fs = fsspec.filesystem('s3', anon=True)

bucket = "ncei-wcsd-archive"
rawdirpath = "data/raw/Bell_M._Shimada/SH1707/EK60"

s3rawfiles = fs.glob(f"{bucket}/{rawdirpath}/*.raw")

# print out the last two S3 raw file paths in the list
s3rawfiles[-2:]
['ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170913-T180733.raw',
'ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Winter2017-D20170615-T002629.raw']
Generate list of target EK60 .raw files from AWS S3 bucket based on dates. The dates are found in the middle string token (e.g., “D20170913”). Select files from 2 days, 2017-07-28 and 2017-07-29.
s3rawfiles = [
    s3path for s3path in s3rawfiles
    if any([f"D2017{datestr}" in s3path for datestr in ['0728', '0729']])
]
print(f"There are {len(s3rawfiles)} target raw files available")
There are 170 target raw files available
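The substring match above works well for two known dates. As a complementary sketch, the date token can also be parsed into a real datetime, which makes range or weekday filters easier to express. The helpers parse_raw_timestamp and select_by_dates are hypothetical, and the sketch assumes that the token after "-T" encodes the time of day as HHMMSS, which the text above does not state.

```python
import re
from datetime import date, datetime
from pathlib import PurePosixPath

# Assumed naming pattern: <prefix>-DYYYYMMDD-THHMMSS.raw
_NAME_RE = re.compile(r"-D(\d{8})-T(\d{6})\.raw$")


def parse_raw_timestamp(s3path):
    """Return the datetime encoded in a raw file name, or None if absent."""
    match = _NAME_RE.search(PurePosixPath(s3path).name)
    if match is None:
        return None
    return datetime.strptime("".join(match.groups()), "%Y%m%d%H%M%S")


def select_by_dates(s3paths, target_dates):
    """Keep only the paths whose encoded date is in target_dates."""
    wanted = set(target_dates)
    selected = []
    for path in s3paths:
        timestamp = parse_raw_timestamp(path)
        if timestamp is not None and timestamp.date() in wanted:
            selected.append(path)
    return selected


paths = [
    "ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170728-T151434.raw",
    "ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170913-T180733.raw",
]
selected = select_by_dates(paths, [date(2017, 7, 28), date(2017, 7, 29)])
print(selected)
```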
Process S3-hosted raw files with echopype: convert, calibrate and export to standardized files
Loop through all the selected raw files on S3 and convert, calibrate and generate Mean Volume Backscattering Strength (MVBS). Save the raw converted and MVBS data to local files, as zarr and netCDF, respectively.
def populate_metadata(ed, raw_fname):
    """
    Manually populate into the "ed" EchoData object
    additional metadata about the dataset and the platform
    """

    # -- SONAR-netCDF4 Top-level Group attributes
    survey_name = (
        "2017 Joint U.S.-Canada Integrated Ecosystem and "
        "Pacific Hake Acoustic Trawl Survey ('Pacific Hake Survey')"
    )
    ed['Top-level'].attrs['title'] = f"{survey_name}, file {raw_fname}"
    ed['Top-level'].attrs['summary'] = (
        f"EK60 raw file {raw_fname} from the {survey_name}, converted to a SONAR-netCDF4 file using echopype. "
        "Information about the survey program is available at "
        "https://www.fisheries.noaa.gov/west-coast/science-data/"
        "joint-us-canada-integrated-ecosystem-and-pacific-hake-acoustic-trawl-survey"
    )

    # -- SONAR-netCDF4 Platform Group attributes
    # Per SONAR-netCDF4, for platform_type see https://vocab.ices.dk/?ref=311
    ed['Platform'].attrs['platform_type'] = "Research vessel"
    ed['Platform'].attrs['platform_name'] = "Bell M. Shimada"  # A NOAA ship
    ed['Platform'].attrs['platform_code_ICES'] = "315"
Create the directories where the exported files will be saved, if these directories don’t already exist.
base_dpath = Path('./exports/notebook2')
base_dpath.mkdir(exist_ok=True, parents=True)

converted_dpath = Path(base_dpath / 'hakesurvey_convertedzarr')
converted_dpath.mkdir(exist_ok=True)
calibrated_dpath = (base_dpath / 'hakesurvey_calibratednc')
calibrated_dpath.mkdir(exist_ok=True)
echopype processing
EchoData is an echopype object for conveniently handling raw converted data from either raw instrument files or previously converted and standardized raw netCDF4 and Zarr files. It is essentially a container for multiple xarray.Dataset objects, each corresponding to one of the netCDF4 groups specified in the SONAR-netCDF4 convention, which is the convention followed by echopype. The EchoData object can be used to conveniently access and explore the echosounder raw data, and for calibration and other processing.
The cell below contains the main echopype workflow steps. For each raw file:
- Access file directly from S3 via ep.open_raw to create a converted EchoData object in memory
- Add global and platform attributes to EchoData object
- Export to a local Zarr dataset (a collection of files encapsulated in a directory)
- Generate calibrated Sv and then MVBS from the raw data in the EchoData object
- Export MVBS to a local netCDF file
Note: Depending on your internet speed, this cell may take some time to run (potentially 20-30 mins).
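As a small illustration of the bookkeeping behind these steps, the sketch below maps one S3 object key to the URL that is read and to the two local outputs. The output_paths helper is hypothetical. The Zarr store name is an assumption (the raw file stem with a .zarr suffix inside the save directory), chosen because the converted files are later matched with a '*.zarr' pattern; the MVBS file name follows the f'MVBS_{raw_fpath.stem}.nc' pattern used in the processing cell.

```python
from pathlib import Path, PurePosixPath


def output_paths(s3key, converted_dir, calibrated_dir):
    """Map one S3 object key to its source URL and local output paths."""
    stem = PurePosixPath(s3key).stem
    return {
        # fs.glob returns keys without the scheme, so it is added back here
        "source_url": f"s3://{s3key}",
        # Assumption: the Zarr store is named after the raw file stem
        "zarr_store": Path(converted_dir) / f"{stem}.zarr",
        "mvbs_netcdf": Path(calibrated_dir) / f"MVBS_{stem}.nc",
    }


paths = output_paths(
    "ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170728-T151434.raw",
    "exports/notebook2/hakesurvey_convertedzarr",
    "exports/notebook2/hakesurvey_calibratednc",
)
for label, value in paths.items():
    print(f"{label}: {value}")
```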
%%time
for s3rawfpath in s3rawfiles:
    raw_fpath = Path(s3rawfpath)
    try:
        # Access file directly from S3 to create a converted EchoData object in memory
        ed = ep.open_raw(
            f's3://{s3rawfpath}',
            sonar_model='EK60',
            storage_options={'anon': True}
        )
        # Manually populate additional metadata about the dataset and the platform
        populate_metadata(ed, raw_fpath.name)

        # Save to converted Zarr format
        ed.to_zarr(save_path=converted_dpath, overwrite=True)

        # Use the EchoData object "ed" to generate calibrated and
        # computed MVBS files that will be saved to netcdf
        ds_Sv = ep.calibrate.compute_Sv(ed)
        ds_MVBS = ep.commongrid.compute_MVBS(
            ds_Sv,
            range_bin='5m',  # in meters
            ping_time_bin='20s',  # in seconds
        )
        ds_MVBS.to_netcdf(calibrated_dpath / f'MVBS_{raw_fpath.stem}.nc')
    except Exception as e:
        print(f'Failed to process raw file {raw_fpath.name}: {e}')
/Users/wujung/miniconda3/envs/echopype_examples_v084/lib/python3.10/site-packages/xarray/core/duck_array_ops.py:215: RuntimeWarning: invalid value encountered in cast
return data.astype(dtype, **kwargs)
CPU times: user 3min 18s, sys: 53.2 s, total: 4min 11s
Wall time: 9min
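Because Sv is expressed in decibels, a mean such as MVBS is conventionally taken in the linear domain and then converted back to dB. The sketch below illustrates that idea for a single depth along ping time only, whereas the cell above bins along both range and ping time. It is not echopype's implementation: mean_db and bin_mean_db are hypothetical helpers and the Sv values are made up.

```python
import math
from collections import defaultdict


def mean_db(values_db):
    """Average dB values in the linear domain and return the result in dB."""
    linear = [10 ** (v / 10) for v in values_db]
    return 10 * math.log10(sum(linear) / len(linear))


def bin_mean_db(times_s, sv_db, bin_seconds):
    """Group samples into fixed-width time bins and average each bin.

    times_s are seconds from an arbitrary start; returns {bin_start: mean_dB}.
    """
    bins = defaultdict(list)
    for t, sv in zip(times_s, sv_db):
        bin_start = int(t // bin_seconds) * bin_seconds
        bins[bin_start].append(sv)
    return {start: mean_db(vals) for start, vals in sorted(bins.items())}


# Made-up Sv samples (dB) at one depth, one every 5 s
times_s = [0, 5, 10, 15, 20, 25]
sv_db = [-70.0, -70.0, -60.0, -60.0, -80.0, -80.0]

mvbs = bin_mean_db(times_s, sv_db, bin_seconds=20)
for start, value in mvbs.items():
    print(f"bin starting at {start:>2d} s: {value:.2f} dB")
```

For the first bin, the linear-domain mean is about -62.60 dB, whereas a plain arithmetic mean of the dB values would give -65 dB; the stronger echoes dominate the average.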
Test for time reversals
Small time reversals are found in EK60 datasets, including the 2017 Pacific Hake survey, where the ping_time (or GPS time1) value may be lower (older) than the preceding ping_time by a second. Such discontinuities can interfere with concatenating individual raw files to produce an aggregated dataset. The capability to identify and address these reversals is in the echopype.qc subpackage.
for datapath in converted_dpath.glob('*'):
    ed = ep.open_converted(datapath)
    # Test for a negative ping_time increment in sequential timestamps, in the Sonar/Beam_group1 group
    if exist_reversed_time(ds=ed['Sonar/Beam_group1'], time_name='ping_time'):
        print(f'Reversed time in {datapath}')
There are no time reversals in this two-day dataset, fortunately.
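To make the idea of a time reversal concrete, the sketch below flags every timestamp that is earlier than its predecessor. It only illustrates the concept with made-up ping times and is not how echopype.qc implements the check; find_reversed_times is a hypothetical helper.

```python
from datetime import datetime, timedelta


def find_reversed_times(timestamps):
    """Return the indices i where timestamps[i] is earlier than timestamps[i - 1]."""
    return [
        i for i in range(1, len(timestamps))
        if timestamps[i] < timestamps[i - 1]
    ]


start = datetime(2017, 7, 28, 15, 14, 34)
# Made-up ping times: the fourth entry steps back by one second
ping_times = [start + timedelta(seconds=s) for s in (0, 1, 2, 1, 3, 4)]

reversed_idx = find_reversed_times(ping_times)
print(reversed_idx)  # [3]
```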
Examine the EchoData object for one of the data files
echopype provides a convenient representation of an EchoData object that builds on the user-friendly xarray Dataset HTML representation. Since an EchoData object is effectively a container for multiple xarray.Dataset objects corresponding to netCDF4 groups, the notebook “print out” provides a summary view of all the groups and interactive access to summaries of each group.
Here, ed is the last object opened in the time reversal test, in the preceding cell.
ed
-
<xarray.Dataset> Size: 0B Dimensions: () Data variables: *empty* Attributes: conventions: CF-1.7, SONAR-netCDF4-1.0, ACDD-1.3 date_created: 2017-07-28T15:14:34Z keywords: EK60 processing_level: Level 1A processing_level_url: https://echopype.readthedocs.io/en/stable/pr... sonar_convention_authority: ICES sonar_convention_name: SONAR-netCDF4 sonar_convention_version: 1.0 summary: EK60 raw file Summer2017-D20170728-T151434.r... title: 2017 Joint U.S.-Canada Integrated Ecosystem ...
-
<xarray.Dataset> Size: 30kB Dimensions: (channel: 3, time1: 534) Coordinates: * channel (channel) <U37 444B 'GPT 18 kHz 009072058c8d 1-1... * time1 (time1) datetime64[ns] 4kB 2017-07-28T15:14:34.69... Data variables: absorption_indicative (channel, time1) float64 13kB ... frequency_nominal (channel) float64 24B ... sound_speed_indicative (channel, time1) float64 13kB ...
-
<xarray.Dataset> Size: 77kB Dimensions: (channel: 3, time1: 1659, time2: 534) Coordinates: * channel (channel) <U37 444B 'GPT 18 kHz 009072058c8d 1-1 ES... * time1 (time1) datetime64[ns] 13kB 2017-07-28T15:14:36.2129... * time2 (time2) datetime64[ns] 4kB 2017-07-28T15:14:34.69374... Data variables: (12/20) MRU_offset_x float64 8B ... MRU_offset_y float64 8B ... MRU_offset_z float64 8B ... MRU_rotation_x float64 8B ... MRU_rotation_y float64 8B ... MRU_rotation_z float64 8B ... ... ... sentence_type (time1) <U3 20kB ... transducer_offset_x (channel) float64 24B ... transducer_offset_y (channel) float64 24B ... transducer_offset_z (channel) float64 24B ... vertical_offset (time2) float64 4kB ... water_level float64 8B ... Attributes: platform_code_ICES: 315 platform_name: Bell M. Shimada platform_type: Research vessel
-
<xarray.Dataset> Size: 5MB Dimensions: (time1: 17079) Coordinates: * time1 (time1) datetime64[ns] 137kB 2017-07-28T15:14:34.693743 ..... Data variables: NMEA_datagram (time1) <U73 5MB ... Attributes: description: All NMEA sensor datagrams
-
<xarray.Dataset> Size: 376B Dimensions: (filenames: 1) Coordinates: * filenames (filenames) int64 8B 0 Data variables: source_filenames (filenames) <U92 368B ... Attributes: conversion_software_name: echopype conversion_software_version: 0.8.4 conversion_time: 2024-04-25T18:02:09Z
-
<xarray.Dataset> Size: 568B Dimensions: (beam_group: 1) Coordinates: * beam_group (beam_group) <U11 44B 'Beam_group1' Data variables: beam_group_descr (beam_group) <U131 524B ... Attributes: sonar_manufacturer: Simrad sonar_model: EK60 sonar_serial_number: sonar_software_name: ER60 sonar_software_version: 2.4.3 sonar_type: echosounder
-
<xarray.Dataset> Size: 38MB Dimensions: (channel: 3, ping_time: 534, range_sample: 3957) Coordinates: * channel (channel) <U37 444B 'GPT 18 kHz 009072058... * ping_time (ping_time) datetime64[ns] 4kB 2017-07-28T... * range_sample (range_sample) int64 32kB 0 1 2 ... 3955 3956 Data variables: (12/29) angle_alongship (channel, ping_time, range_sample) int8 6MB ... angle_athwartship (channel, ping_time, range_sample) int8 6MB ... angle_offset_alongship (channel) float64 24B ... angle_offset_athwartship (channel) float64 24B ... angle_sensitivity_alongship (channel) float64 24B ... angle_sensitivity_athwartship (channel) float64 24B ... ... ... transmit_bandwidth (channel, ping_time) float64 13kB ... transmit_duration_nominal (channel, ping_time) float64 13kB ... transmit_frequency_start (channel) float64 24B ... transmit_frequency_stop (channel) float64 24B ... transmit_power (channel, ping_time) float64 13kB ... transmit_type <U2 8B ... Attributes: beam_mode: vertical conversion_equation_t: type_3
-
<xarray.Dataset> Size: 868B Dimensions: (channel: 3, pulse_length_bin: 5) Coordinates: * channel (channel) <U37 444B 'GPT 18 kHz 009072058c8d 1-1 ES18... * pulse_length_bin (pulse_length_bin) int64 40B 0 1 2 3 4 Data variables: frequency_nominal (channel) float64 24B ... gain_correction (channel, pulse_length_bin) float64 120B ... pulse_length (channel, pulse_length_bin) float64 120B ... sa_correction (channel, pulse_length_bin) float64 120B ...
Extract and process GPS locations from the Platform group of converted raw files
Use xarray.open_mfdataset to open the Platform group from all the converted raw Zarr files as a single concatenated (combined) xarray dataset. Then extract GPS time1 (time stamp), latitude and longitude from this group and transform that data into a GeoPandas GeoDataFrame containing point-geometry objects that are readily manipulated via geospatial operations. A GeoDataFrame adds geospatial capabilities to a pandas DataFrame.
Due to the presence of multiple time coordinates in this group, care must be taken in defining how the concatenation (combine) operation is to be performed. This is captured in the arguments passed to open_mfdataset.
%%time
platform_ds = xr.open_mfdataset(
    str(converted_dpath / '*.zarr'), group='Platform',
    engine='zarr',
    data_vars='minimal', coords='minimal',
    combine='nested'
)
CPU times: user 32.5 s, sys: 1.2 s, total: 33.7 s
Wall time: 34.6 s
platform_ds
<xarray.Dataset> Size: 11MB Dimensions: (channel: 3, time1: 244846, time2: 88959) Coordinates: * channel (channel) <U37 444B 'GPT 18 kHz 009072058c8d 1-1 ES... * time1 (time1) datetime64[ns] 2MB 2017-07-28T00:05:36.10331... * time2 (time2) datetime64[ns] 712kB 2017-07-28T00:05:34.897... Data variables: (12/20) MRU_offset_x float64 8B nan MRU_offset_y float64 8B nan MRU_offset_z float64 8B nan MRU_rotation_x float64 8B nan MRU_rotation_y float64 8B nan MRU_rotation_z float64 8B nan ... ... sentence_type (time1) object 2MB dask.array<chunksize=(244846,), meta=np.ndarray> transducer_offset_x (channel) float64 24B dask.array<chunksize=(3,), meta=np.ndarray> transducer_offset_y (channel) float64 24B dask.array<chunksize=(3,), meta=np.ndarray> transducer_offset_z (channel) float64 24B dask.array<chunksize=(3,), meta=np.ndarray> vertical_offset (time2) float64 712kB dask.array<chunksize=(88959,), meta=np.ndarray> water_level float64 8B 9.15 Attributes: platform_code_ICES: 315 platform_name: Bell M. Shimada platform_type: Research vessel