ERA5 validation

[1]:
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
from wind_validation import validate, create_report

Using the Tall tower dataset and ERA5 data

A nice source of data for validation of wind simulations is the Tall Tower dataset (see link at the end of the page for the paper). It contains data from more than 200 tall meteorological masts around the world that have been quality checked. In this example we will compare these data with the global reanalysis from ERA5. The ERA5 data have been generated by running the numerical weather prediction model from the ECMWF in hindcast mode using data assimilation on the historical data from many different data sources, including from satellites and routine weather stations. Due to it’s global spatial coverage and decadal time coverage the model has a rather coarse grid spacing of about ~25 km.

To make it easy to run this package without external depencies, we included the ERA5 data and a site from the tall tower dataset. Here, the datasets obs and mod were obtained by installing the tatoda package and running the following code:

import tatoda
obs = tatoda.read_data('Chinook')
obs = obs.sel(time='2010')
mod = tatoda.read_data('Chinook',dataset="ERA5")
mod = tatoda.make_era5_as_obs(mod, obs)

The result of this is that we have modelled time series from the ERA5 data and the observations from the Tall tower dataset at the Chinook site in the west of the US. We will open these by using open_dataset:

[2]:
mod = xr.open_dataset("data/ts_mod.nc")
obs = xr.open_dataset("data/ts_obs.nc")

For a fair validation we want to compare the same time stamps from model and observations. Therefore we first have to ensure that the sampling output are the same. Afterwards we also have to make sure that both wind speed and wind direction are set to ‘not available’ where either of them were not reporting any values. Also we have to select the time stamp from the model output where observations are available. This is all already done above in the call make_era5_as_obs. For more example how to do this, see the first example notebook RUNE_validation.ipynb.

[3]:
mod["wind_speed"] = mod["wind_speed"].where(~np.isnan(obs.wind_speed) & ~np.isnan(obs.wind_direction))
mod = mod.dropna(dim="time")
obs = obs.dropna(dim="time")

We can now plot the data to get an overview of the time series

[4]:
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 8))
obs['wind_speed'].plot(ax=ax1)
mod['wind_speed'].plot(ax=ax1)
obs['wind_direction'].plot(ax=ax2)
mod['wind_direction'].plot(ax=ax2)
[4]:
[<matplotlib.lines.Line2D at 0x7f513421b280>]
_images/ERA5_validation_7_1.png

Inspect the model & observation datasets:

[5]:
print(mod)
<xarray.Dataset>
Dimensions:         (height: 1, time: 51758)
Coordinates:
    south_north     float64 ...
    west_east       float64 ...
    pt              object ...
    crs             int8 ...
  * height          (height) int32 50
  * time            (time) datetime64[ns] 2010-01-01T00:05:00.000008940 ... 2...
Data variables:
    wind_speed      (time, height) float64 1.251 0.76 0.2764 ... 5.621 5.552
    wind_direction  (time, height) float64 14.95 12.3 0.2317 ... 203.6 204.2
[6]:
print(obs)
<xarray.Dataset>
Dimensions:         (time: 51758)
Coordinates:
  * time            (time) datetime64[ns] 2010-01-01T00:05:00.000008940 ... 2...
    west_east       float32 ...
    south_north     float32 ...
    height          float32 ...
    crs             int8 ...
Data variables:
    wind_speed      (time) float32 3.631 3.576 3.51 3.428 ... 2.542 2.175 2.325
    wind_direction  (time) float32 278.4 265.5 251.5 252.0 ... 208.6 232.0 230.9
Attributes:
    tower_name:       chinook
    institution:      Bonneville Power Administration
    boom_directions:  Not available
    location:         US
    offshore:         no
    tower_type:       Met mast
    creation_time:    2019-06-07-T15:03:26Z
    links:            https://transmission.bpa.gov/BUSINESS/OPERATIONS/WIND/M...
    history:

We now input the two time series and perform the validation according to the diagram.

Drawing

The actual validation is done by calling validate function on the two time series objects that have been create above.

[7]:
result = validate(obs, mod)
print(result)
<xarray.Dataset>
Dimensions:                   (height: 1)
Coordinates:
    west_east                 float32 ...
    south_north               float32 ...
  * height                    (height) int32 50
    crs                       int8 ...
    pt                        object ...
Data variables: (12/27)
    obs_wind_speed_mean       float32 4.406
    obs_wind_speed_std        float64 3.322
    obs_wind_direction_cmean  float64 236.7
    obs_wind_direction_cstd   float64 79.65
    obs_power_density_mean    float32 162.3
    obs_air_density_mean      float32 1.225
    ...                        ...
    wind_direction_cmae       (height) float64 45.5
    power_density_me          (height) float64 -32.31
    power_density_mpe         (height) float64 -19.9
    power_density_mae         (height) float64 32.31
    power_density_mape        (height) float64 19.9
    air_density_mpe           (height) float64 -1.946e-06
Attributes:
    tower_name:       chinook
    institution:      Bonneville Power Administration
    boom_directions:  Not available
    location:         US
    offshore:         no
    tower_type:       Met mast
    creation_time:    2019-06-07-T15:03:26Z
    links:            https://transmission.bpa.gov/BUSINESS/OPERATIONS/WIND/M...
    history:
    format:           timeseries

We can make a sector variable by calling the function wd_to_sector. After that we can also perform sectorwise validation by grouping the variable according to sector.

[8]:
from windkit.wind import wd_to_sector
sector = wd_to_sector(obs['wind_direction'])
sector.name = 'sector'
result_sec = validate(obs, mod, by=sector)
print(result_sec)
<xarray.Dataset>
Dimensions:                   (height: 1, sector: 12)
Coordinates:
  * sector                    (sector) float64 0.0 1.0 2.0 3.0 ... 9.0 10.0 11.0
    west_east                 float32 -119.5
    south_north               float32 45.83
  * height                    (height) int32 50
    crs                       int8 0
    pt                        <U7 'Chinook'
Data variables: (12/27)
    obs_wind_speed_mean       (sector) float32 2.174 2.999 2.951 ... 1.932 1.722
    obs_wind_speed_std        (sector) float64 1.623 1.924 1.905 ... 1.439 1.145
    obs_wind_direction_cmean  (sector) float64 1.432 30.26 59.21 ... 299.1 329.8
    obs_wind_direction_cstd   (sector) float64 8.892 8.517 8.722 ... 8.527 8.512
    obs_power_density_mean    (sector) float32 20.71 40.05 39.48 ... 16.56 8.971
    obs_air_density_mean      (sector) float32 1.225 1.225 1.225 ... 1.225 1.225
    ...                        ...
    wind_direction_cmae       (sector, height) float64 69.43 51.82 ... 76.82
    power_density_me          (sector, height) float64 1.359 -12.55 ... 10.3
    power_density_mpe         (sector, height) float64 6.561 -31.32 ... 114.8
    power_density_mae         (sector, height) float64 1.359 12.55 ... 10.3
    power_density_mape        (sector, height) float64 6.561 31.32 ... 114.8
    air_density_mpe           (sector, height) float64 -1.946e-06 ... -1.946e-06
Attributes:
    tower_name:       chinook
    institution:      Bonneville Power Administration
    boom_directions:  Not available
    location:         US
    offshore:         no
    tower_type:       Met mast
    creation_time:    2019-06-07-T15:03:26Z
    links:            https://transmission.bpa.gov/BUSINESS/OPERATIONS/WIND/M...
    history:
    format:           timeseries

Validation at all sites

We can also validate the whole Tall tower dataset, instead of only the Chinook site. Because this scripts downloads large amount of data we have already included the .nc file here. You can check the script validate_era5_using_tatoda.py in the examples directory if you want to know how you can make this dataset yourself. The file era5_vs_tatoda_time_series.nc contains the results from all sites from the Tall tower dataset. It is obtained in the same way as for the single site, but instead calling the validate() method on many different sites.

[9]:
result = xr.open_dataset("data/era5_vs_tatoda_time_series.nc")
print(result)
<xarray.Dataset>
Dimensions:                                        (point: 464)
Coordinates:
    south_north                                    (point) float64 ...
    west_east                                      (point) float64 ...
    crs                                            int8 ...
    height                                         (point) float64 ...
    name                                           (point) object ...
Dimensions without coordinates: point
Data variables: (12/47)
    obs_wind_speed_min                             (point) float32 ...
    obs_wind_speed_max                             (point) float32 ...
    obs_wind_speed_mean                            (point) float32 ...
    obs_wind_speed_median                          (point) float32 ...
    obs_wind_speed_std                             (point) float32 ...
    obs_wind_speed_var                             (point) float32 ...
    ...                                             ...
    power_density_mape                             (point) float64 ...
    air_dens_mpe                                   (point) float64 ...
    wind_direction_cme                             (point) float64 ...
    wind_direction_cmae                            (point) float64 ...
    wind_direction_crmse                           (point) float64 ...
    wind_direction_circular_median_absolute_error  (point) float64 ...
Attributes:
    tower_name:       42361
    institution:      Shell International E&P
    boom_directions:  Not available
    location:         US
    offshore:         yes
    tower_type:       Petrol Platform
    creation_time:    2019-06-07-T09:32:12Z
    links:            http://ndbc-load.nws.noaa.gov/station_page.php?station=...
    history:
    format:           timeseries

To create a report we can call create_report. This will open an html file in your browser where you can study the comparison in more detail.

[10]:
create_report(result, obs_name="Observations", mod_name="Model")
HTML report saved in: /tmp/tmpkrcdw89o.html

A couple of interesting things can be observed: the ERA5 data largely underestimate the wind speeds for many of the sites. This can be related to the course resolution (~25 km) of the ERA5 data. It can therefore not model microscale effects, that might be relevant at many of the sites in the Tall tower dataset, because masts are frequently placed on hill tops or mountains. In the report you can see all the sites on a map and you can click on each of the sites to explore the errors. Plots of the error distributions are available as well.

Finally, we can group the data by any variably by using the groupby functionality from xarray. For example, we can create a discrete variable from any properties of the dataset and use that to group all the error metrics and statistics. Below we stratify the dataset by different bins of height to investigate model error as a function of height. For each group, we compute the mean and standard deviation in each bin and create a plot. It can be seen that the errors are usually largest close to the surface, where microscale effects that the ERA5 model cannot resolve, play the largest role.

[11]:

bins = np.linspace(0, 140, 15) centers = 0.5 * (bins[1:] + bins[:-1]) labels = [f"({l},{h}]" for l, h in zip(bins[:-1], bins[1:])] grps = result.groupby_bins(result["height"], bins, labels=labels) fig, ax = plt.subplots(1, 1, figsize=(8, 4)) ax.scatter(result["height"], result["wind_speed_mae"], color="C0") ax.errorbar( centers, grps.mean(skipna=True)["wind_speed_mae"], yerr=grps.std(skipna=True)["wind_speed_mae"], fmt="o", markersize=8, capsize=10, color="k", ) ax.set(xlabel="Height agl (m)", ylabel="Wind speed MAE [m/s]") plt.show()
_images/ERA5_validation_21_0.png

References:

Ramon, J., Lledó, L., Pérez-Zanón, N., Soret, A., and Doblas-Reyes, F. J.: The Tall Tower Dataset: a unique initiative to boost wind energy research, Earth Syst. Sci. Data, 12, 429–439, https://doi.org/10.5194/essd-12-429-2020, 2020.

Tall tower website, https://talltowers.bsc.es/ (accessed on 04-03-2022)

Hersbach, H, Bell, B, Berrisford, P, et al. The ERA5 global reanalysis. Q J R Meteorol Soc. 2020; 146: 1999– 2049. https://doi.org/10.1002/qj.3803