Download Copernicus ERA5 Data with S3 without logging in

Written by Minh Phan

In this tutorial, you will download Copernicus ERA5 data from S3 without logging in. ERA5 is one of the best-known reanalysis datasets, providing a numerical assessment of the modern climate. Although we mentioned previously that streaming data from S3 is time-consuming if you are not in the bucket's region, we had good results with this dataset: the data arrived quickly, and we did not need extra code to split requests temporally, because S3 handles large requests efficiently. Most of the code in this notebook is adapted from the original notebook here

Variables

The table below lists the 18 ERA5 variables that are available on S3. All variables are surface or single level parameters sourced from the HRES sub-daily forecast stream.

| Variable Name | File Name | Variable type (fc/an) |
| --- | --- | --- |
| 10 metre U wind component | eastward_wind_at_10_metres.nc | an |
| 10 metre V wind component | northward_wind_at_10_metres.nc | an |
| 100 metre U wind component | eastward_wind_at_100_metres.nc | an |
| 100 metre V wind component | northward_wind_at_100_metres.nc | an |
| 2 metre dew point temperature | dew_point_temperature_at_2_metres.nc | an |
| 2 metre temperature | air_temperature_at_2_metres.nc | an |
| 2 metres maximum temperature since previous post-processing | air_temperature_at_2_metres_1hour_Maximum.nc | fc |
| 2 metres minimum temperature since previous post-processing | air_temperature_at_2_metres_1hour_Minimum.nc | fc |
| Mean sea level pressure | air_pressure_at_mean_sea_level.nc | an |
| Sea surface temperature | sea_surface_temperature.nc | an |
| Mean wave period | sea_surface_wave_mean_period.nc | |
| Mean direction of waves | sea_surface_wave_from_direction.nc | |
| Significant height of combined wind waves and swell | significant_height_of_wind_and_swell_waves.nc | |
| Snow density | snow_density.nc | an |
| Snow depth | lwe_thickness_of_surface_snow_amount.nc | an |
| Surface pressure | surface_air_pressure.nc | an |
| Surface solar radiation downwards | integral_wrt_time_of_surface_direct_downwelling_shortwave_flux_in_air_1hour_Accumulation.nc | fc |
| Total precipitation | precipitation_amount_1hour_Accumulation.nc | fc |

For our dataset, we collect air temperature (at 2 m), sea surface temperature, and the u and v wind components at 10 m, so that we can compute wind speed and direction later.
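As a rough illustration of that later step, the sketch below derives speed and direction from the two components. The function name `wind_speed_and_direction` is hypothetical, and the direction convention (degrees clockwise from north, naming where the wind blows from) is an assumption; adjust it if your analysis uses a different one.

```python
import numpy as np

def wind_speed_and_direction(u, v):
    """Return (speed, direction) from eastward (u) and northward (v) wind.

    Assumed convention: direction is in degrees clockwise from north
    and names where the wind is blowing FROM (meteorological convention).
    """
    speed = np.hypot(u, v)  # sqrt(u**2 + v**2), element-wise
    # arctan2(v, u) is the mathematical angle the wind blows TOWARD,
    # counter-clockwise from east; 270 - angle converts it to a
    # "blowing from" compass bearing.
    direction = (270.0 - np.degrees(np.arctan2(v, u))) % 360.0
    return speed, direction

# Three sample vectors: wind from the west, from the east, from the south
speed, direction = wind_speed_and_direction([10.0, -10.0, 0.0], [0.0, 0.0, 10.0])
print(speed)
print(direction)
```

Because only NumPy universal functions and arithmetic are used, the same function should also accept the `xarray` data arrays loaded from the downloaded files, provided both components share the same grid and time axis.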

Import necessary libraries

import boto3
import botocore
import datetime
import matplotlib.pyplot as plt
import os
import xarray as xr
import numpy as np
import pandas as pd
import sys

Download data

era5_bucket = 'era5-pds'
# Unsigned requests let us read the public bucket without AWS credentials
client = boto3.client('s3', config=botocore.client.Config(signature_version=botocore.UNSIGNED))

def download_era5_s3(var_era5, month_start, month_end, lat1=5, lat2=25, lon1=60, lon2=80):
    """
    var_era5: variable name (file name from the table above, without .nc)
    month_start: formatted as YYYY-MM
    month_end: formatted as YYYY-MM (right-exclusive)
    lat1, lat2, lon1, lon2: bounds of the region to keep (lat1 < lat2, lon1 < lon2)
    """
    s3_data_ptrn = '{year}/{month}/data/{var}.nc'

    path_temp_folder = 'demonstrated data/era5/temp'
    path_var_folder = f'demonstrated data/era5/{var_era5}'
    os.makedirs(path_temp_folder, exist_ok=True)
    os.makedirs(path_var_folder, exist_ok=True)

    data_file_ptrn = os.path.join(path_temp_folder, '{year}{month}_{var}.nc')
    sliced_data_file_ptrn = os.path.join(path_var_folder, '{year}{month}_{var}.nc')
    # freq='M' yields month-end dates, so month_end itself is excluded
    # (recent pandas versions spell this alias 'ME')
    months = pd.date_range(month_start, month_end, freq='M')
    for month in months:
        month_str = "{:02d}".format(month.month)
        s3_data_key = s3_data_ptrn.format(year=month.year, month=month_str, var=var_era5)
        data_file = data_file_ptrn.format(year=month.year, month=month_str, var=var_era5)
        if not os.path.isfile(data_file):  # skip the download if the file already exists
            print("Downloading %s from S3..." % s3_data_key)
            client.download_file(era5_bucket, s3_data_key, data_file)

        export_file = sliced_data_file_ptrn.format(year=month.year, month=month_str, var=var_era5)
        # slice(lat2, lat1) assumes latitude is stored in descending order;
        # the context manager closes the file before we delete it
        with xr.open_dataset(data_file) as ds:
            ds.sel(lat=slice(lat2, lat1), lon=slice(lon1, lon2)).to_netcdf(export_file)
        os.remove(data_file)  # remove the full file once the subset is saved

# download data for the 4 variables we need
# valid names are in the File Name column of the table above (drop the .nc suffix)

# month_end is not included in the dataset (right-exclusive)
download_era5_s3(var_era5='eastward_wind_at_10_metres', month_start='2003-01', month_end='2003-03')
download_era5_s3(var_era5='northward_wind_at_10_metres', month_start='2003-01', month_end='2003-03')

Downloading 2003/01/data/eastward_wind_at_10_metres.nc from S3...
Downloading 2003/02/data/eastward_wind_at_10_metres.nc from S3...
Downloading 2003/01/data/northward_wind_at_10_metres.nc from S3...
Downloading 2003/02/data/northward_wind_at_10_metres.nc from S3...
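
To preview which objects a call will request before downloading anything, the key pattern and the right-exclusive month range can be reproduced without touching S3. This is a minimal sketch using only the standard library; `era5_s3_keys` is a hypothetical helper that is not part of the original notebook, and it assumes the same `{year}/{month}/data/{var}.nc` layout used by the download function.

```python
def era5_s3_keys(var_era5, month_start, month_end):
    """List the S3 keys for each month in [month_start, month_end).

    month_start and month_end are 'YYYY-MM' strings; month_end is excluded,
    matching the right-exclusive behaviour described above.
    """
    year, month = (int(part) for part in month_start.split('-'))
    end_year, end_month = (int(part) for part in month_end.split('-'))
    keys = []
    # Tuples compare element by element, so this walks month by month
    # until the (year, month) pair reaches the exclusive end.
    while (year, month) < (end_year, end_month):
        keys.append(f'{year}/{month:02d}/data/{var_era5}.nc')
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return keys

for key in era5_s3_keys('eastward_wind_at_10_metres', '2003-01', '2003-03'):
    print(key)
```

For the range used above, this lists the January and February 2003 keys, the same two objects named in the download log.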