from pathlib import Path
import itertools as it
import datetime as dt
from dateutil import parser as dtparser
import fsspec
import xarray as xr
import matplotlib.pyplot as plt
import hvplot.xarray
import panel
import echopype as ep
import warnings
"ignore", category=DeprecationWarning) warnings.simplefilter(
Watching a solar eclipse using an OOI moored echosounder
Jupyter notebook accompanying the manuscript:
Echopype: A Python library for interoperable and scalable processing of ocean sonar data for biological information
Authors: Wu-Jung Lee, Emilio Mayorga, Landung Setiawan, Kavin Nguyen, Imran Majeed, Valentina Staneva
Introduction
Goals
- Illustrate a common workflow for echosounder data conversion, calibration and use. This workflow leverages the standardization applied by echopype. and the power, ease of use and familiarity of libraries in the scientific Python ecosystem.
- Demonstrate the ease to interoperate echosounder data with those from a different instrument in a single computing environment. Without
echopype
, additional wrangling across more than one software systems is needed to achieve the same visualization and comparison.
Description
This notebook uses EK60 echosounder data from the U.S. Ocean Observatories Initiative (OOI) to illustrate a common workflow for data conversion, combination, calibration and analysis using echopype
, as well as the data interoperability it enables. Without echopype
, additional wrangling across more than one software systems is needed to achieve the same visualization and comparison.
We will use data from the OOI Oregon Offshore Cabled Shallow Profiler Mooring collected on August 20-21, 2017. This was the day before and of a solar eclipse, during which the reduced sunlight affected the regular diel vertical migration (DVM) patterns of marine life. This change was directly observed using the upward-looking echosounder mounted on this mooring platform that happened to be within the totality zone. The effect of the solar eclipse was clearly seen by aligning and comparing the echosounder observations with solar radiation data collected by the Bulk Meteorology Instrument Package located on the nearby Coastal Endurance Oregon Offshore Surface Mooring, also maintained by the OOI.
The data used are 19 .raw
files with a total volume of approximately 1 GB. With echopype
functionality, the raw data files hosted on the OOI Raw Data Archive (an HTTP server) are directly parsed and organized into a standardized representation following in the SONAR-netCDF4 v1.0 convention, and stored to the cloud-optimized Zarr format. The individual converted files are later combined into a single entity that can be easily explored and manipulated.
Outline
- Establish connection with the OOI Raw Data Archive and generate list of target EK60
.raw
files - Process the archived raw files with
echopype
: convert and combine into a single quantity (anEchoData
object) in a standardized format. - Obtain solar radiation data from an OOI Thredds server.
- Plot the echosounder and solar radiation data together to visualize the zooplankton response to a solar eclipse.
Running the notebook
This notebook can be run with a conda environment created using the conda environment file https://github.com/OSOceanAcoustics/echopype-examples/blob/main/binder/environment.yml. The notebook creates a directory ./exports/notebook3
and save all generated Zarr and netCDF files there.
Warning
The compute_MVBS
step in this notebook is not efficient for lazy-loaded data with echopype
version 0.6.3. We plan to address this issue soon.
Note
We encourage importing echopype
as ep
for consistency.
Establish connection with the OOI Raw Data Archive and generate list of target EK60 .raw
files
Access and inspect the publicly accessible OOI Raw Data Archive (an HTTP server) as if it were a local file system. This will be done through the Python fsspec file system and bytes storage interface. We will use fsspec.filesystem.glob
(fs.glob
) to generate a list of all EK60 .raw
data files in the archive then filter on file names for target dates of interest.
= fsspec.filesystem('https') fs
= (
ooi_raw_url "https://rawdata.oceanobservatories.org/files/"
"CE04OSPS/PC01B/ZPLSCB102_10.33.10.143/2017/08"
)
Now let’s specify the range of dates we will be pulling data from. Note that the data filenames contain the time information but were recorded at UTC time.
def in_range(raw_file: str, start: dt.datetime, end: dt.datetime) -> bool:
"""Check if file url is in datetime range"""
= Path(raw_file).name
file_name = dtparser.parse(file_name, fuzzy=True)
file_datetime return file_datetime >= start and file_datetime <= end
= dt.datetime(2017, 8, 21, 7, 0)
start_datetime = dt.datetime(2017, 8, 22, 7, 0) end_datetime
On the OOI Raw Data Archive, the monthly folder is further split to daily folders, so we can simply grab data from the desired days.
= [f"{ooi_raw_url}/{day}" for day in range(start_datetime.day, end_datetime.day + 1)] desired_day_urls
desired_day_urls
['https://rawdata.oceanobservatories.org/files/CE04OSPS/PC01B/ZPLSCB102_10.33.10.143/2017/08/21',
'https://rawdata.oceanobservatories.org/files/CE04OSPS/PC01B/ZPLSCB102_10.33.10.143/2017/08/22']
Grab all raw files within daily folders by using the filesytem glob, just like the Linux glob
.
= it.chain.from_iterable([fs.glob(f"{day_url}/*.raw") for day_url in desired_day_urls]) all_raw_file_urls
= list(filter(
desired_raw_file_urls lambda raw_file: in_range(
raw_file, -dt.timedelta(hours=3), # 3 hour buffer to select files
start_datetime+dt.timedelta(hours=3)
end_datetime
),
all_raw_file_urls
))
print(f"There are {len(desired_raw_file_urls)} raw files within the specified datetime range.")
There are 19 raw files within the specified datetime range.
Process the archived raw files with echopype
Examine the workflow by processing just one file
Let’s first test the echopype
workflow by converting and processing 1 file from the above list.
We will use ep.open_raw
to directly read in a raw data file from the OOI HTTP server.
The type of sonar needs to be specified as an input argument. The echosounders on the OOI Regional Cabled Array are Simrad EK60 echosounder. All other uncabled echosounders are the Acoustic Zooplankton and Fisher Profiler (AZFP) manufacturered by ASL Environmental Sciences. Echopype
supports both of these and other instruments (see echopype documentation for detail).
Converting from raw data files to a standardized data format
Below we already know the path to the 1 file on the http server:
= ep.open_raw(raw_file=desired_raw_file_urls[0], sonar_model="ek60") echodata
Here echopype
read, parse, and convert content of the raw file into memory, and gives you a nice representation of the converted file below as a Python EchoData
object.
echodata
-
<xarray.Dataset> Size: 0B Dimensions: () Data variables: *empty* Attributes: conventions: CF-1.7, SONAR-netCDF4-1.0, ACDD-1.3 keywords: EK60 sonar_convention_authority: ICES sonar_convention_name: SONAR-netCDF4 sonar_convention_version: 1.0 summary: title: date_created: 2017-08-21T04:57:17Z
-
<xarray.Dataset> Size: 332kB Dimensions: (channel: 3, time1: 5923) Coordinates: * channel (channel) <U39 468B 'GPT 38 kHz 00907208dd13 5-1... * time1 (time1) datetime64[ns] 47kB 2017-08-21T04:57:17.3... Data variables: absorption_indicative (channel, time1) float64 142kB 0.009785 ... 0.05269 sound_speed_indicative (channel, time1) float64 142kB 1.494e+03 ... 1.49... frequency_nominal (channel) float64 24B 3.8e+04 1.2e+05 2e+05
-
<xarray.Dataset> Size: 190kB Dimensions: (time1: 1, time2: 5923, channel: 3) Coordinates: * time1 (time1) datetime64[ns] 8B 2017-08-21T04:57:17.329287 * time2 (time2) datetime64[ns] 47kB 2017-08-21T04:57:17.3292... * channel (channel) <U39 468B 'GPT 38 kHz 00907208dd13 5-1 OO... Data variables: (12/20) latitude (time1) float64 8B nan longitude (time1) float64 8B nan sentence_type (time1) float64 8B nan pitch (time2) float64 47kB 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 roll (time2) float64 47kB 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 vertical_offset (time2) float64 47kB 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 ... ... position_offset_y float64 8B nan position_offset_z float64 8B nan transducer_offset_x (channel) float64 24B 0.0 0.0 0.0 transducer_offset_y (channel) float64 24B 0.0 0.0 0.0 transducer_offset_z (channel) float64 24B 0.0 0.0 0.0 frequency_nominal (channel) float64 24B 3.8e+04 1.2e+05 2e+05 Attributes: platform_name: platform_type: platform_code_ICES:
-
<xarray.Dataset> Size: 96B Dimensions: (time1: 1) Coordinates: * time1 (time1) datetime64[ns] 8B 2017-08-21T04:57:17.329287 Data variables: NMEA_datagram (time1) <U22 88B '$SDVLW,0.000,N,0.000,N' Attributes: description: All NMEA sensor datagrams
-
<xarray.Dataset> Size: 484B Dimensions: (filenames: 1) Coordinates: * filenames (filenames) int64 8B 0 Data variables: source_filenames (filenames) <U119 476B 'https://rawdata.oceanobservator... Attributes: conversion_software_name: echopype conversion_software_version: 0.8.4.dev58+gae911de conversion_time: 2024-04-26T18:38:27Z
-
<xarray.Dataset> Size: 568B Dimensions: (beam_group: 1) Coordinates: * beam_group (beam_group) <U11 44B 'Beam_group1' Data variables: beam_group_descr (beam_group) <U131 524B 'contains backscatter power (un... Attributes: sonar_manufacturer: Simrad sonar_model: EK60 sonar_serial_number: sonar_software_name: ER60 sonar_software_version: 2.4.3 sonar_type: echosounder
-
<xarray.Dataset> Size: 229MB Dimensions: (channel: 3, ping_time: 5923, range_sample: 1072) Coordinates: * channel (channel) <U39 468B 'GPT 38 kHz 00907208d... * ping_time (ping_time) datetime64[ns] 47kB 2017-08-21... * range_sample (range_sample) int64 9kB 0 1 2 ... 1070 1071 Data variables: (12/29) frequency_nominal (channel) float64 24B 3.8e+04 1.2e+05 2e+05 beam_type (channel) int64 24B 0 1 0 beamwidth_twoway_alongship (channel) float64 24B 7.1 7.0 7.0 beamwidth_twoway_athwartship (channel) float64 24B 7.1 7.0 7.0 beam_direction_x (channel) float64 24B nan nan nan beam_direction_y (channel) float64 24B nan nan nan ... ... data_type (channel, ping_time) int8 18kB 1 1 1 ... 1 1 sample_time_offset (channel, ping_time) float64 142kB 0.0 ...... channel_mode (channel, ping_time) int8 18kB 0 0 0 ... 0 0 backscatter_r (channel, ping_time, range_sample) float32 76MB ... angle_athwartship (channel, ping_time, range_sample) float32 76MB ... angle_alongship (channel, ping_time, range_sample) float32 76MB ... Attributes: beam_mode: vertical conversion_equation_t: type_3
-
<xarray.Dataset> Size: 892B Dimensions: (channel: 3, pulse_length_bin: 5) Coordinates: * channel (channel) <U39 468B 'GPT 38 kHz 00907208dd13 5-1 OOI.... * pulse_length_bin (pulse_length_bin) int64 40B 0 1 2 3 4 Data variables: frequency_nominal (channel) float64 24B 3.8e+04 1.2e+05 2e+05 sa_correction (channel, pulse_length_bin) float64 120B 0.0 0.0 ... 0.0 gain_correction (channel, pulse_length_bin) float64 120B 24.0 ... 25.0 pulse_length (channel, pulse_length_bin) float64 120B 0.000256 ... ...
The EchoData
object can be saved to either the netCDF4
or zarr
formats through to_netcdf
or to_zarr
methods.
# Create directory for files generated in this notebook
= Path('./exports/notebook3')
base_dpath =True, parents=True) base_dpath.mkdir(exist_ok
# Create directory for single-raw-file example
= Path(base_dpath / 'onefiletest')
output_dpath =True) output_dpath.mkdir(exist_ok
# Save to netCDF format
=output_dpath, overwrite=True) echodata.to_netcdf(save_path
# Save to zarr format
=output_dpath, overwrite=True) echodata.to_zarr(save_path
Basic echo processing
At present echopype
supports basic processing funcionalities including calibration (from raw instrument data records to volume backscattering strength, \(S_V\)), denoising, and computing mean volume backscattering strength, \(\overline{S_V}\) or \(\text{MVBS}\). The Echodata
object can be passed into various calibrate and preprocessing functions without having to write out any intermediate files.
Here we demonstrate calibration to obtain \(S_V\). For EK60 data, by default the function uses environmental (sound speed and absorption) and calibration parameters stored in the data file. Users can optionally specify other parameter choices.
# Compute volume backscattering strength (Sv) from raw data
= ep.calibrate.compute_Sv(echodata) ds_Sv
The computed Sv
is stored with other variables used in the calibration operation.
ds_Sv
<xarray.Dataset> Size: 305MB Dimensions: (channel: 3, ping_time: 5923, range_sample: 1072, filenames: 1) Coordinates: * channel (channel) <U39 468B 'GPT 38 kHz 00907208d... * ping_time (ping_time) datetime64[ns] 47kB 2017-08-21... * range_sample (range_sample) int64 9kB 0 1 2 ... 1070 1071 * filenames (filenames) int64 8B 0 Data variables: (12/16) Sv (channel, ping_time, range_sample) float64 152MB ... echo_range (channel, ping_time, range_sample) float64 152MB ... frequency_nominal (channel) float64 24B 3.8e+04 1.2e+05 2e+05 sound_speed (channel, ping_time) float64 142kB 1.494e+... sound_absorption (channel, ping_time) float64 142kB 0.00978... sa_correction (channel, ping_time) float64 142kB 0.0 ...... ... ... angle_sensitivity_alongship (channel) float64 24B 21.9 23.0 23.0 angle_sensitivity_athwartship (channel) float64 24B 21.9 23.0 23.0 beamwidth_alongship (channel) float64 24B 7.1 7.0 7.0 beamwidth_athwartship (channel) float64 24B 7.1 7.0 7.0 source_filenames (filenames) <U119 476B 'https://rawdata.oc... water_level float64 8B 0.0 Attributes: processing_software_name: echopype processing_software_version: 0.8.4.dev58+gae911de processing_time: 2024-04-26T18:38:31Z processing_function: calibrate.compute_Sv
Quickly visualize the result
The default xarray
visualization functions are useful in getting a quick sense of the data.
First replace the channel
dimension and coordinate with the frequency_nominal
variable containing actual frequency values. Note that this step is possible only because there are no duplicated frequencies present.
= ep.consolidate.swap_dims_channel_frequency(ds_Sv) ds_Sv
=200000).plot.pcolormesh(
ds_Sv.Sv.sel(frequency_nominal='ping_time', cmap = 'jet', vmin=-80, vmax=-30
x; )
Note that the vertical axis is range_sample
. This is the bin (or sample) number as recorded in the data. A separate data variable in ds_Sv
contains the physical range (echo_range
) from the transducer in meters. echo_range
has the same dimension as Sv
and may not be uniform across all frequency channels or pings, depending on the echosounder setting during data collection.
Convert multiple files and combine into a single EchoData
object
Now that we verified that echopype
does work for a single file, let’s proceed to process all sonar data from August 20-21, 2017.
First, convert all desired files from the OOI HTTP server to a local directory.
# Create a directory for all converted files
= Path(base_dpath / 'convertallfiles')
output_dpath =True) output_dpath.mkdir(exist_ok
%%time
for raw_file_url in desired_raw_file_urls:
# Read and convert, resulting in echodata object
= ep.open_raw(raw_file=raw_file_url, sonar_model='ek60', use_swap=True)
ed =output_dpath, overwrite=True) ed.to_zarr(save_path
Then, assemble a list of EchoData
object from the converted files. Note that by using chunks={}
, the files are lazy-loaded and only metadata are read into memory, until more operations are executed.
= []
ed_list for converted_file in sorted(output_dpath.glob("*.zarr")):
# ed_list.append(ep.open_converted(converted_file))
={})) ed_list.append(ep.open_converted(converted_file, chunks
Combine all the opened files to a single EchoData
object.
= ep.combine_echodata(ed_list) ed_combined
Below we can inspect this combined EchoData
object just as before.
Expand the Provenance
group to see how the filenames of all 19 files combined are stored in the coordinate variable echodata_filename
, and many more data variables are added to retain information that was previously stored as attributes in the Provenance
group of individual EchoData
objects.
Expand the Sonar/Beam_group1
to see how data variables such as backscatter_r
and angle_alongship
contain chunked dask arrays (that are lazy-loadded), instead of regular numpy arrays (that are fully loaded into memory). This means that the computation needed to combine the EchoData
objects are specified but not yet executed, which is why the previous line ran through so fast. This is also the mechanism that allows dask to perform its magic for out-of-core computation.
ed_combined
-
<xarray.Dataset> Size: 0B Dimensions: () Data variables: *empty* Attributes: conventions: CF-1.7, SONAR-netCDF4-1.0, ACDD-1.3 date_created: 2017-08-21T04:57:17Z keywords: EK60 sonar_convention_authority: ICES sonar_convention_name: SONAR-netCDF4 sonar_convention_version: 1.0 summary: title:
-
<xarray.Dataset> Size: 6MB Dimensions: (channel: 3, time1: 109482) Coordinates: * channel (channel) <U39 468B 'GPT 38 kHz 00907208dd13 5-1... * time1 (time1) datetime64[ns] 876kB 2017-08-21T04:57:17.... Data variables: absorption_indicative (channel, time1) float64 3MB dask.array<chunksize=(3, 5923), meta=np.ndarray> frequency_nominal (channel) float64 24B dask.array<chunksize=(3,), meta=np.ndarray> sound_speed_indicative (channel, time1) float64 3MB dask.array<chunksize=(3, 5923), meta=np.ndarray>
-
<xarray.Dataset> Size: 4MB Dimensions: (channel: 3, time1: 19, time2: 109478) Coordinates: * time1 (time1) datetime64[ns] 152B 2017-08-21T04:57:17.3292... * channel (channel) <U39 468B 'GPT 38 kHz 00907208dd13 5-1 OO... * time2 (time2) datetime64[ns] 876kB 2017-08-21T04:57:17.329... Data variables: (12/20) MRU_offset_x float64 8B nan MRU_offset_y float64 8B nan MRU_offset_z float64 8B nan MRU_rotation_x float64 8B nan MRU_rotation_y float64 8B nan MRU_rotation_z float64 8B nan ... ... transducer_offset_y (channel) float64 24B dask.array<chunksize=(3,), meta=np.ndarray> transducer_offset_z (channel) float64 24B dask.array<chunksize=(3,), meta=np.ndarray> water_level float64 8B 0.0 pitch (time2) float64 876kB dask.array<chunksize=(5923,), meta=np.ndarray> roll (time2) float64 876kB dask.array<chunksize=(5923,), meta=np.ndarray> vertical_offset (time2) float64 876kB dask.array<chunksize=(5923,), meta=np.ndarray> Attributes: platform_code_ICES: platform_name: platform_type:
-
<xarray.Dataset> Size: 2kB Dimensions: (time1: 19) Coordinates: * time1 (time1) datetime64[ns] 152B 2017-08-21T04:57:17.329287 ...... Data variables: NMEA_datagram (time1) <U22 2kB dask.array<chunksize=(1,), meta=np.ndarray> Attributes: description: All NMEA sensor datagrams
-
<xarray.Dataset> Size: 27kB Dimensions: (filenames: 19, echodata_filename: 19) Coordinates: * filenames (filenames) int64 152B 0 1 2 3 ... 15 16 17 18 * echodata_filename (echodata_filename) <U26 2kB 'OOI-D20170821-... Data variables: (12/24) source_filenames (filenames) <U119 9kB dask.array<chunksize=(1,), meta=np.ndarray> conventions (echodata_filename) <U35 3kB 'CF-1.7, SONAR-... date_created (echodata_filename) <U20 2kB '2017-08-21T04:... keywords (echodata_filename) <U4 304B 'EK60' ... 'EK60' sonar_convention_authority (echodata_filename) <U4 304B 'ICES' ... 'ICES' sonar_convention_name (echodata_filename) <U13 988B 'SONAR-netCDF4... ... ... sonar_serial_number (echodata_filename) <U1 76B '' '' '' ... '' '' sonar_software_name (echodata_filename) <U4 304B 'ER60' ... 'ER60' sonar_software_version (echodata_filename) <U5 380B '2.4.3' ... '2.... sonar_type (echodata_filename) <U11 836B 'echosounder' ... beam_mode (echodata_filename) <U8 608B 'vertical' ... ... conversion_equation_t (echodata_filename) <U6 456B 'type_3' ... 't... Attributes: conversion_software_name: echopype conversion_software_version: 0.8.4.dev58+gae911de conversion_time: 2024-04-24T14:55:05Z is_combined: True combination_software_name: echopype combination_software_version: 0.8.4.dev58+gae911de combination_time: 2024-04-26T18:38:47Z
-
<xarray.Dataset> Size: 568B Dimensions: (beam_group: 1) Coordinates: * beam_group (beam_group) <U11 44B 'Beam_group1' Data variables: beam_group_descr (beam_group) <U131 524B dask.array<chunksize=(1,), meta=np.ndarray> Attributes: sonar_manufacturer: Simrad sonar_model: EK60 sonar_serial_number: sonar_software_name: ER60 sonar_software_version: 2.4.3 sonar_type: echosounder
-
<xarray.Dataset> Size: 8GB Dimensions: (channel: 3, ping_time: 109482, range_sample: 1072) Coordinates: * channel (channel) <U39 468B 'GPT 38 kHz 00907208d... * ping_time (ping_time) datetime64[ns] 876kB 2017-08-2... * range_sample (range_sample) int64 9kB 0 1 2 ... 1070 1071 Data variables: (12/29) angle_alongship (channel, ping_time, range_sample) float64 3GB dask.array<chunksize=(3, 3886, 1072), meta=np.ndarray> angle_athwartship (channel, ping_time, range_sample) float64 3GB dask.array<chunksize=(3, 3886, 1072), meta=np.ndarray> angle_offset_alongship (channel) float64 24B dask.array<chunksize=(3,), meta=np.ndarray> angle_offset_athwartship (channel) float64 24B dask.array<chunksize=(3,), meta=np.ndarray> angle_sensitivity_alongship (channel) float64 24B dask.array<chunksize=(3,), meta=np.ndarray> angle_sensitivity_athwartship (channel) float64 24B dask.array<chunksize=(3,), meta=np.ndarray> ... ... transmit_bandwidth (channel, ping_time) float64 3MB dask.array<chunksize=(3, 5923), meta=np.ndarray> transmit_duration_nominal (channel, ping_time) float64 3MB dask.array<chunksize=(3, 5923), meta=np.ndarray> transmit_frequency_start (channel) float64 24B dask.array<chunksize=(3,), meta=np.ndarray> transmit_frequency_stop (channel) float64 24B dask.array<chunksize=(3,), meta=np.ndarray> transmit_power (channel, ping_time) float64 3MB dask.array<chunksize=(3, 5923), meta=np.ndarray> transmit_type <U2 8B 'CW' Attributes: beam_mode: vertical conversion_equation_t: type_3
-
<xarray.Dataset> Size: 892B Dimensions: (channel: 3, pulse_length_bin: 5) Coordinates: * channel (channel) <U39 468B 'GPT 38 kHz 00907208dd13 5-1 OOI.... * pulse_length_bin (pulse_length_bin) int64 40B 0 1 2 3 4 Data variables: frequency_nominal (channel) float64 24B dask.array<chunksize=(3,), meta=np.ndarray> gain_correction (channel, pulse_length_bin) float64 120B dask.array<chunksize=(3, 5), meta=np.ndarray> pulse_length (channel, pulse_length_bin) float64 120B dask.array<chunksize=(3, 5), meta=np.ndarray> sa_correction (channel, pulse_length_bin) float64 120B dask.array<chunksize=(3, 5), meta=np.ndarray>
- channelPandasIndex
PandasIndex(Index(['GPT 38 kHz 00907208dd13 5-1 OOI.38|200', 'GPT 120 kHz 00907208a0b1 3-1 ES120-7CD', 'GPT 200 kHz 00907208dd13 5-2 OOI38|200'], dtype='object', name='channel'))
- pulse_length_binPandasIndex
PandasIndex(Index([0, 1, 2, 3, 4], dtype='int64', name='pulse_length_bin'))