Proposal and Evaluation of the Machine Learning Models for Correcting ERA5 Stress Equivalent Wind Forecasts as a Function of Atmospheric and Oceanic Conditions

Makarova, Evgeniia

Please use this identifier to cite or link to this item: http://hdl.handle.net/10261/288875


Title:	Proposal and Evaluation of the Machine Learning Models for Correcting ERA5 Stress Equivalent Wind Forecasts as a Function of Atmospheric and Oceanic Conditions
Author:	Makarova, Evgeniia
Advisor:	Portabella, Marcos
Keywords:	Scatterometer-based corrections; ERA5 biases; Machine learning; Ocean forcing
Publication date:	Dec-2022
Publisher:	Universidad Autónoma de Madrid; CSIC - Instituto de Ciencias del Mar (ICM)
Abstract:	This work aims to create a preliminary machine learning (ML) model for correcting local biases in the stress-equivalent winds of the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis, based on atmospheric and oceanic parameters. Several errors in the ECMWF global output for near-surface ocean winds have been reported when validated against scatterometer observations. An existing approach for correcting these biases (the so-called ERA* method) consists of scatterometer-based corrections accumulated over a certain time window at each grid point, which reduces local persistent biases. This approach is sensitive to scatterometer sampling and, in order to collect a statistically significant number of samples, assumes that such biases are static. This is not the case for errors due to moist convection or the diurnal cycle. For operational purposes, the temporal window is lagged with respect to the reanalysis forecast time, and the time difference between the scatterometer-based correction (SC) and the sample data collection can be ten days. We propose a preliminary ML setup that looks for a functional relationship between several oceanic and atmospheric variables and the persistent numerical weather prediction (NWP) errors observed in the NWP-scatterometer differences. This would make it possible to predict the biases of the stress-equivalent wind forecasts and to use the bias corrections in coupled weather or seasonal forecasts, or to account for them in climate runs. The candidate variables are first identified as ECMWF model parameters, such as stress-equivalent winds, their derivatives (curl and divergence), atmospheric-stability-related parameters, i.e., sea-surface temperature (SST), air temperature (Ta), relative humidity (rh), and surface pressure (sp), as well as SST gradients and ocean currents. This work evaluates the feasibility of such an approach and provides an overview of possible implementations of this regression.
Several ML algorithms are trained on a dataset that covers a period of 65 days and then evaluated. These algorithms include two libraries based on Gradient Boosting Decision Trees (GBDT), XGBoost and LightGBM, and feed-forward neural networks implemented with the scikit-learn library (MLPRegressor) and with the TensorFlow Keras API. The models are trained to reproduce the differences between collocated scatterometer (ASCAT-A) and ERA5 U10S. The resulting models are evaluated against a test dataset that covers the 23 days following the training period. The best-performing models are then selected to generate the corrections for the entire ERA5 forecasts. The corrected forecasts are collocated with an independent scatterometer, HSCAT-B, whose local pass time differs by 3.5 hours from that of ASCAT-A. Globally, the best-performing model is a TensorFlow-based neural network with 4 hidden layers of 256, 128, 64, and 32 neurons, with dropout used for regularization. Compared to ERA5 over the test period, it achieves a squared-error reduction of 5.54% globally, and up to 7.66% in the extra-tropics. In the tropics and high latitudes, the error variance reduction is 3.67% and 5.47%, respectively. This neural network setup outperforms the ERA* product in the extra-tropics and high latitudes, although not in the tropics. This work demonstrates that it is possible to reduce ERA5 local biases by using only NWP variables as model inputs, which makes this approach promising for operational use.
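To illustrate how percentage figures such as those quoted above can be obtained, here is a minimal Python sketch. It assumes that the "squared-error reduction" is the relative drop in mean squared difference against collocated scatterometer winds. The abstract does not give the exact formula, so this definition, the function name `squared_error_reduction`, and the synthetic data are assumptions for illustration only, not the thesis implementation.

```python
import numpy as np


def squared_error_reduction(scat, era5, corrected):
    """Percentage drop in mean squared difference against the scatterometer.

    Assumed definition (not taken from the thesis):
        100 * (MSE(scat, era5) - MSE(scat, corrected)) / MSE(scat, era5)
    """
    scat = np.asarray(scat, dtype=float)
    era5 = np.asarray(era5, dtype=float)
    corrected = np.asarray(corrected, dtype=float)
    mse_ref = np.mean((scat - era5) ** 2)        # uncorrected ERA5 error
    mse_new = np.mean((scat - corrected) ** 2)   # error after correction
    return 100.0 * (mse_ref - mse_new) / mse_ref


# Synthetic single wind component (m/s); purely illustrative values.
rng = np.random.default_rng(0)
scat = rng.normal(0.0, 5.0, 10_000)               # stand-in for scatterometer winds
era5 = scat + rng.normal(0.5, 1.0, scat.size)     # stand-in for ERA5: bias plus noise
corrected = era5 - 0.5                            # remove the known mean bias

reduction = squared_error_reduction(scat, era5, corrected)
print(f"squared-error reduction: {reduction:.2f}%")
```

A positive value means the corrected winds are closer to the scatterometer than the original ERA5 winds; a value of 0 means no improvement. Regional figures (tropics, extra-tropics, high latitudes) would follow by applying the same function to latitude-masked subsets.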
Description:	Final project presented by Evgeniia Makarova for a master's degree in Data Science at the Universidad Autónoma de Madrid (UAM), carried out under the supervision of Dr. Marcos Portabella Arnús of the Institut de Ciències del Mar (ICM-CSIC).-- 71 pages
URI:	http://hdl.handle.net/10261/288875
Appears in collections:	(ICM) Tesis