

AI and Machine learning for weather predictions

Peter Dueben

Royal Society University Research Fellow & ECMWF’s Coordinator for Machine Learning and AI Activities

This research used resources of the Oak Ridge Leadership Computing Facility (OLCF), which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.

The strength of a common goal

The ESIWACE, MAELSTROM and AI4Copernicus projects have received funding from the European Union under grant agreements No 823988, 955513 and 101016798.
Let’s start with definitions

Artificial intelligence (AI) is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans (Wikipedia).
Example: A self-driving car stops as it detects a cyclist crossing.

Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions… (Wikipedia).
Example: To learn to distinguish between a cyclist and other things from data.

Deep learning is part of a broader family of machine learning methods based on artificial neural networks (Wikipedia).
Example: The technique that is used to detect a cyclist in a picture.
Deep learning and artificial neural networks as one example of machine learning
The concept:
Take input and output samples from a large data set
Learn to predict outputs from inputs
Predict the output for unseen inputs

The key:
Neural networks can learn a complex task as a “black box”
No previous knowledge about the system is required
More data will allow for better networks
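The three steps above can be sketched with a tiny hand-written network, here learning y = sin(x) by plain gradient descent; the data set, network size and hyper-parameters are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: take input and output samples from a (toy) data set: y = sin(x).
x = rng.uniform(-3, 3, size=(256, 1))
y = np.sin(x)

# One hidden layer of 16 tanh units (sizes chosen arbitrarily).
W1 = rng.normal(0, 0.5, size=(1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, size=(16, 1)); b2 = np.zeros(1)

def forward(inp):
    h = np.tanh(inp @ W1 + b1)
    return h, h @ W2 + b2

loss0 = np.mean((forward(x)[1] - y) ** 2)

# Step 2: learn to predict outputs from inputs by minimising the MSE.
lr = 0.05
for _ in range(5000):
    h, y_hat = forward(x)
    g_y = 2 * (y_hat - y) / len(x)        # gradient of the MSE w.r.t. y_hat
    g_h = g_y @ W2.T * (1 - h ** 2)       # back-propagate through tanh
    W2 -= lr * (h.T @ g_y); b2 -= lr * g_y.sum(0)
    W1 -= lr * (x.T @ g_h); b1 -= lr * g_h.sum(0)

final_loss = np.mean((forward(x)[1] - y) ** 2)

# Step 3: predict the output for unseen inputs.
pred = forward(np.array([[0.5]]))[1]
```

The network treats the task as a "black box": no knowledge of the sine function enters the code, and more samples generally tighten the fit.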

The number of applications is increasing by the day:


Image recognition
Speech recognition
Healthcare
Gaming
Finance
Music composition and art

And weather?
Decision trees and random forests

- Decisions fork in tree structures until a prediction is made.
- "Random forest" methods train a multitude of decision trees, using the mean prediction or the value with the most hits as the result.
- Decision trees are often fast and accurate, and they are able to conserve some of the properties of the system.
- Decision trees often require a lot of memory (as they serve as an efficient look-up table).

An example for ecPoint:
Hewson and Pillosu 2020
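The "multitude of trees plus majority vote" idea can be sketched with bootstrapped depth-1 trees (stumps) on an invented toy data set; this is not the ecPoint configuration, just the bare mechanism:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary classification: the label is 1 where the single feature exceeds 0.
X = rng.uniform(-1, 1, size=200)
y = (X > 0).astype(int)

def fit_stump(Xb, yb):
    """Pick the threshold that best separates the classes (a depth-1 tree)."""
    best_thr, best_acc = 0.0, 0.0
    for thr in np.linspace(-1, 1, 41):
        acc = np.mean((Xb > thr).astype(int) == yb)
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return best_thr

# "Random forest" in miniature: train many stumps on bootstrap resamples...
thresholds = []
for _ in range(25):
    idx = rng.integers(0, len(X), len(X))   # bootstrap sample
    thresholds.append(fit_stump(X[idx], y[idx]))

# ...and take the value with the most hits (majority vote) as the result.
X_test = np.array([-0.5, 0.5])
votes = np.stack([(X_test > thr).astype(int) for thr in thresholds])
prediction = (votes.mean(axis=0) > 0.5).astype(int)
```

Each stump is a cheap look-up rule; the memory cost of a real forest comes from storing many much deeper trees.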
Two families of machine learning

Source: https://medium.com/@recrosoft.io/supervised-vs-unsupervised-learning-key-differences-cdd46206cdcb
Why would machine learning help in weather predictions?
Predictions of weather and climate are difficult:

• The Earth is huge, resolution is limited and we cannot represent all important processes within model simulations.
• The Earth System shows "chaotic" dynamics, which makes it difficult to predict the future based on equations.
• Some of the processes involved are not well understood.
• All Earth System components (atmosphere, ocean, land surface, cloud physics, …) are connected in a non-trivial way.

However, we have a huge number of observations.

➢ There are many application areas for machine learning throughout the workflow of numerical weather predictions.
➢ Machine learning also provides a number of opportunities for high performance computing.
Why now?
• Increase in data volume and knowledge
• New computing hardware
• New machine learning software
Slide from Torsten Hoefler (ETH)
Bauer et al. ECMWF SAC paper 2019
Destination Earth at the horizon…
What will machine learning for numerical weather and climate predictions
look like in 10 years from now?

Machine learning will replace conventional models  ⟷  Machine learning will have no long-term effect
The uncertainty range is still very large...


Can we replace conventional weather forecast systems by deep learning?

Could we base the entire model on neural networks and trash the conventional models?
There are limitations to existing models, and ECMWF provides access to hundreds of petabytes of data.

A simple test configuration:


▪ We retrieve historical data (ERA5) for geopotential at 500 hPa (Z500) for the last decades
(>65,000 global data sets)
▪ We map the global data to a coarse two-dimensional grid (60x31)
▪ We learn to predict the update of the field from one hour to the next using deep learning
▪ Once we have learned the update, we can perform predictions into the future

No physical understanding is required!

Dueben and Bauer GMD 2018
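In miniature, the recipe above looks as follows; a linear least-squares fit on a toy advected field stands in for the deep network, and the invented "dynamics" stand in for ERA5:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for the Z500 field: a 1-D periodic field whose "true dynamics"
# shift it by one grid point per step (pure advection, invented for illustration).
n, steps = 32, 400
state = rng.normal(size=n)
history = [state]
for _ in range(steps):
    state = np.roll(state, 1)
    history.append(state)
history = np.array(history)

# Learn to predict the next state from the current one: a least-squares
# linear map standing in for the deep network.
X_in, X_out = history[:-1], history[1:]
A, *_ = np.linalg.lstsq(X_in, X_out, rcond=None)

# Once the update is learned, iterate it to predict into the future.
forecast = history[0]
for _ in range(5):
    forecast = forecast @ A
forecast_error = np.max(np.abs(forecast - np.roll(history[0], 5)))
```

No physical understanding of the advection enters the fit; the update is learned purely from input/output pairs.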


Can we replace conventional weather forecast systems by deep learning?

Time evolution of Z500 for historic data and a neural network prediction.
Can you tell which one is the neural network?

➢ The neural network is picking up the dynamics nicely.


➢ Forecast errors are comparable if we compare like with like.
➢ There is a lot of progress at the moment.
Scher and Messori GMD 2019; Weyn, Durran, and Caruana JAMES 2019; Rasp and Thuerey 2020…
➢ Is this the future for medium-range weather predictions?

Unlikely…
➢ The simulations change dynamics in long integrations and it is unclear how
to fix conservation properties.
➢ It is unknown how to increase complexity and how to fix feature interactions.
➢ There are only ~40 years of data available.
Dueben and Bauer GMD 2018
Why is it hard to beat conventional forecast systems in the medium range?

Top-of-the-atmosphere cloud brightness temperature [K] for satellite observations and a simulation of the
atmosphere with 1.45 km resolution.
Dueben, Wedi, Saarinen and Zeman JMSJ 2020

Today, global weather forecast simulations have O(1,000,000,000) degrees of freedom, can represent many details of the Earth System, and show a breathtaking level of complexity.

They are based on decades of model developments and process understanding.

Readers’ Choice Award for the “Best Use of HPC in Physical Science” by HPCwire in 2020.
Why is it hard to beat conventional forecast systems in the medium range?

Dueben and Palmer MWR 2015:
Single precision runs in IFS are possible and time-to-solution is reduced by 40%.
Where can deep learning models potentially beat conventional Earth System models?

1-hour MetNet precipitation predictions by Google (NOAA forecast vs. ground truth vs. machine learning):
Agrawal, Barrington, Bromberg, Burge, Gazen, Hickey arXiv:1912.12132

Deep learning for multi-year ENSO forecasts: Ham, Kim, Luo Nature 2019

And climate?
Many application areas for machine learning across ECMWF
State-of-play and outline of the talk

There are many interesting application areas for machine learning to improve weather and climate predictions.

We are only at the very beginning of exploring the potential of machine learning in the different areas.

In the following, I will present a couple of example applications of machine learning at ECMWF.

I will name the main challenges that we are facing when using machine learning today.
Observations:
Detect the risk for the ignition of wildfires by lightning

• Observations for 15 variables are used as inputs, including soil moisture, 2m temperature, soil type, vegetation cover, relative humidity, and precipitation.
• The rate of radiant heat output from the Global Fire Assimilation System (GFAS) of CAMS (monitored by the MODIS satellite) was used to generate a "truth".
• 12,000 data points were used for training.
• Different machine learning tools (decision trees, random forest and AdaBoost) are used to classify the cases into "ignition" and "no-ignition".
• The best classifier has an accuracy of 77%.

Ruth Coughlan, Francesca Di Giuseppe, Claudia Vitolo and the SMOS-E project
Data assimilation: Bias-correct the forecast model in 4DVar data assimilation
• Data assimilation blends observations and the forecast model to generate initial conditions for weather predictions
• During data assimilation, the model trajectory is "synchronised" with observations for the same weather regimes
• It is possible to learn model error when comparing the model with (trustworthy) observations

Two approaches:
• Learn model error within the 4DVar data-assimilation framework for
“weak-constraint 4D-Var”
• Learn model error from a direct comparison of the model trajectory
to observations or analysis increments using deep learning
(column-based or three-dimensional)
Figure: background departures vs. the estimation from the neural network.
Benefit:
When the bias is learned, it can be used to:
• Correct for the bias during data-assimilation
to improve initial conditions
• Correct for the bias in forecast simulations
to improve predictions (discussed controversially)
• Understand model deficiencies
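A toy sketch of the second approach: estimate the bias from departures between the model and (perfect) observations, then correct the model with the learned bias. The state-dependent bias, all magnitudes and the linear fit standing in for the deep network are invented:

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented setup: the "true" state, and a model whose bias depends on the
# state itself (bias = 0.1 * truth - 0.5, chosen arbitrarily).
truth = rng.normal(10.0, 2.0, size=500)
model = truth - (0.1 * truth - 0.5)       # biased model trajectory

# Background departures: (trustworthy) observations minus the model.
departures = truth - model

# Learn the bias as a function of the model state: a linear fit standing in
# for the neural-network estimate.
a, b = np.polyfit(model, departures, 1)

# Correct the model with the learned bias.
corrected = model + (a * model + b)
rmse_before = np.sqrt(np.mean((model - truth) ** 2))
rmse_after = np.sqrt(np.mean((corrected - truth) ** 2))
```

In this noise-free toy the learned correction removes the bias completely; with real observations the residual error would of course stay finite.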

Patrick Laloyaux, Massimo Bonavita and Peter Dueben @ ECMWF + Thorsten Kurth and David Matthew Hall @ NVIDIA
To emulate parametrisation schemes
Method:
• Store input/output data pairs of a parametrisation scheme
• Use this data to train a neural network
• Replace the parametrisation scheme by the neural network within the model
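The three steps can be sketched as follows; the "scheme", its functional form and the polynomial emulator standing in for the neural network are purely illustrative:

```python
import numpy as np

# A stand-in "parametrisation scheme": a smooth tendency as a function of a
# single model variable (functional form invented for illustration).
def scheme(u):
    return -0.1 * u - 0.02 * u ** 3

# Step 1: store input/output data pairs of the scheme.
u_samples = np.linspace(-5.0, 5.0, 200)
tendencies = scheme(u_samples)

# Step 2: train an emulator on the pairs (a cubic polynomial fit standing in
# for the neural network).
emulator = np.poly1d(np.polyfit(u_samples, tendencies, deg=3))

# Step 3: replace the scheme by the emulator within a toy time-stepping loop.
u_emul, u_ref = 3.0, 3.0
for _ in range(100):
    u_emul += 0.1 * emulator(u_emul)   # model with the emulated scheme
    u_ref += 0.1 * scheme(u_ref)       # model with the original scheme
emulation_gap = abs(u_emul - u_ref)
```

Swapping the emulator into the time-stepping loop leaves the toy trajectory essentially unchanged, which is the property one hopes for in the real system.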

Why would you do this?
Neural networks are likely to be much more efficient and portable to heterogeneous hardware.

Active area of research:


Chevallier et al. JAM 1998, Krasnopolsky et al. MWR 2005, Rasp et al. PNAS 2018,
Brenowitz and Bretherton GRL 2018…

We emulate the non-orographic gravity wave drag within the Integrated Forecasting System (IFS)
Chantry, Hatfield, Dueben, Polichtchouk and Palmer https://arxiv.org/abs/2101.08195

Results:
• Nice relationship between neural network complexity and error reduction
• Similar cost when used within IFS on CPU hardware and 10 times faster when used offline on GPUs
• Emulator was used successfully to generate tangent linear and adjoint code within 4D-Var data assimilation
Hatfield, Chantry, Dueben, Lopez, Geer, Palmer in preparation
• Forecast error can be reduced when training with more angles and wavespeed elements
Numerical weather forecasts: To emulate the 3D cloud effects in radiation
Representing 3D cloud effects for radiation (SPARTACUS) within simulations of the Integrated Forecasting System is four times slower than the standard radiation scheme (Tripleclouds).
Can we emulate the difference between Tripleclouds and SPARTACUS using neural networks?

Scheme                          Rel. cost
Tripleclouds                    1.0
SPARTACUS                       4.4
Neural network                  0.003
Tripleclouds + neural network   1.003

Meyer, Hogan, Dueben, Mason https://arxiv.org/abs/2103.11919
Numerical weather forecasts: To precondition the linear solver
• Linear solvers are important to build efficient semi-implicit time-stepping schemes for atmosphere and ocean models.
• However, the solvers are expensive.
• The solver efficiency depends critically on the preconditioner that is approximating the inverse of a large matrix.

Can we use machine learning for preconditioning, predict the inverse of the matrix and reduce the number of
iterations that are required for the solver?

Testbed: A global shallow water model at 5 degree resolution but with real-world topography.
Method: Neural networks that are trained from the model state and the tendencies of full timesteps.

Figure: solver convergence with no preconditioner, the machine learning preconditioner, and the implicit Richardson preconditioner.

It turns out that the approach (1) works and is cheap, (2) is interpretable, and (3) is easy to implement even if no preconditioner is present.
Ackmann, Dueben, Smolarkiewicz and Palmer https://arxiv.org/abs/2010.02866
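The core idea, that a cheap approximation of the matrix inverse cuts the number of solver iterations, can be sketched with a Richardson iteration; the inverse diagonal stands in for what a trained network would predict, and the matrix and tolerances are invented:

```python
import numpy as np

rng = np.random.default_rng(4)

# A diagonally dominant matrix standing in for the semi-implicit solver matrix.
n = 50
A = 0.1 * rng.normal(size=(n, n)) + np.diag(5.0 + rng.uniform(0.0, 1.0, n))
b = rng.normal(size=n)

def richardson(M, tol=1e-8, max_iter=10_000):
    """Preconditioned Richardson iteration: x <- x + M (b - A x)."""
    x = np.zeros_like(b)
    for k in range(1, max_iter + 1):
        r = b - A @ x
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            return x, k
        x = x + M @ r
    return x, max_iter

# No preconditioner: a small scalar step keeps the plain iteration stable.
x_plain, iters_plain = richardson(0.05 * np.eye(n))

# "Learned" preconditioner stand-in: the inverse diagonal, a cheap
# approximation of A^-1 (the network would predict something of this kind).
x_prec, iters_prec = richardson(np.diag(1.0 / np.diag(A)))
```

The better the preconditioner approximates the inverse, the fewer iterations the solver needs, which is exactly the quantity a learned preconditioner tries to reduce.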
Post-processing and dissemination: Improve ensemble predictions

Ensemble predictions are important but expensive.


Can we improve ensemble skill scores from a small number of ensemble members via deep learning?
• Use global fields of five ensemble members as inputs.
• Correct the ensemble scores of temperature at 850 hPa and Z500 for a 2-day forecast towards a full 10-member ensemble forecast.
Grönquist, Yao, Ben-Nun, Dryden, Dueben, Lavarini, Li, Hoefler Phil Trans A 2021
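The flavour of the correction can be shown with synthetic ensembles: map the spread of a cheap 5-member ensemble towards that of a 50-member one. A single fitted scale factor stands in for the deep network, and all sizes and distributions are invented:

```python
import numpy as np

rng = np.random.default_rng(5)

# Each "case" is one forecast: a 50-member ensemble of which we can afford 5.
n_cases, n_full, n_small = 400, 50, 5
sigma = rng.uniform(0.5, 2.0, size=(n_cases, 1))     # per-case true spread
full = rng.normal(0.0, sigma, size=(n_cases, n_full))

small_std = full[:, :n_small].std(axis=1)   # spread of the cheap ensemble
full_std = full.std(axis=1)                 # spread of the full ensemble

# Learn a correction from the small-ensemble spread towards the full spread:
# one least-squares scale factor standing in for the deep network.
scale = np.sum(small_std * full_std) / np.sum(small_std ** 2)

err_raw = np.mean((small_std - full_std) ** 2)
err_corrected = np.mean((scale * small_std - full_std) ** 2)
```

The fitted factor comes out above one because the small ensemble systematically underestimates the spread; the real network corrects full fields, not just one scalar statistic.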
Post-processing and dissemination: ecPoint to post-process rainfall predictions
• Use forecast data as inputs
• Train against worldwide rainfall observations
• Improve local rainfall predictions by accounting for sub-grid variability and weather-dependent biases
• Use decision trees as machine learning tool

Figure: probability (%) of >50 mm / 12 h for the raw ensemble and for post-processed point rainfall, for forecast days D4 to D1.

Example: Devastating floods in Crete on 25 February 2019 (24 h rain).
Benefits: Earlier and more consistent signal with higher probabilities.
Timothy Hewson and Fatima Pillosu
1st domain specific problem for machine learning: Scale interactions
Weather and climate modelling: tools need to allow for scale interactions.
Machine learning: neural network tools allow for encoder/decoder structures.

Source: https://towardsdatascience.com

Source: UCAR

Can we use encoder/decoder networks to represent scale interactions?


Post-processing and dissemination: Precipitation down-scaling
Problem: Learn to map weather predictions from ERA5 reanalysis data at ~50 km resolution to E-OBS
local precipitation observations at ~10 km resolution over the UK.

Use case: Eventually, apply the tool to climate predictions to understand changes of local precipitation
pattern due to climate change.

Method: Use Tru-NET with a mixture of ConvGru layers to represent spatial-temporal scale interactions
and a novel Fused Temporal Cross Attention mechanism to improve time dependencies.

Model RMSE
Conventional forecast model 3.627
Hierarchical Convolutional GRU 3.266
Tru-Net 3.081

Adewoyin, Dueben, Watson, He, Dutta https://arxiv.org/abs/2008.09090


2nd domain specific problem: Multi-scale modelling on unstructured grids

Source: Willem Deconinck


Source: Polavarapu et al. 2005

Longitude/latitude vs. reduced Gaussian cubic octahedral grid

Problem: Find a three-dimensional machine learning solution that works on unstructured grids.

Solution: Maybe geometric deep learning and graph neural networks,
see the Master's thesis of Icíar Lloréns Jover @ EPFL (https://infoscience.epfl.ch/record/278138)
3rd domain specific problem: Use of deep learning hardware for conventional models

• Machine learning accelerators are focussing on low numerical precision and high flop rates.
• Example: TensorCores on NVIDIA Volta GPUs are optimised for half-precision matrix-matrix calculations with single-precision output.
→ 7.8 TFlops for double precision vs. 125 TFlops for half precision

Can we use TensorCores within our models?

Relative cost for model components for a non-hydrostatic model at 1.45 km resolution:

• The Legendre transform is the most expensive kernel. It consists of a large number of
standard matrix-matrix multiplications.
• If we can re-scale the input and output fields, we can use half precision arithmetic.
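The rescaling trick can be sketched with NumPy's float16: products of badly scaled fields overflow the half-precision range (max ~65504), while scaling the field to order one before the matrix product and scaling back afterwards keeps the result accurate. Sizes and magnitudes are invented:

```python
import numpy as np

rng = np.random.default_rng(6)

# A matrix-matrix product standing in for one Legendre-transform kernel.
L = rng.normal(size=(64, 64))
field = rng.uniform(-1, 1, size=(64, 32)) * 5.0e4   # values near the float16 limit

exact = L @ field   # double-precision reference

# Naive half precision: the products exceed float16's range and overflow.
naive = L.astype(np.float16) @ field.astype(np.float16)
overflowed = bool(np.isinf(naive).any())

# Re-scale the field to order one, multiply in half precision, scale back.
s = np.abs(field).max()
scaled = (L.astype(np.float16) @ (field / s).astype(np.float16)).astype(np.float64) * s
rel_err = np.max(np.abs(scaled - exact)) / np.max(np.abs(exact))
```

The remaining error reflects float16's ~3 decimal digits of precision, which is the trade-off accepted in exchange for the much higher flop rates.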
Half precision Legendre Transformations

Root-mean-square error for geopotential height at 500 hPa at 9 km resolution averaged over multiple
start dates. Hatfield, Chantry, Dueben, Palmer Best Paper Award PASC2019

The simulations are using an emulator to reduce precision (Dawson and Dueben GMD 2017) and
more thorough diagnostics are needed.
We have recently published our machine learning roadmap

https://events.ecmwf.int/event/232/
https://www.ecmwf.int/en/elibrary/19877-machine-learning-ecmwf-roadmap-next-10-years
Challenges and milestones
Different philosophy for domain and machine learning scientists
Approach: Support close collaborations // study explainable AI, trustworthy AI and physics
informed machine learning
For many applications off-the-shelf machine learning tools will not be sufficient
Approach: Foster cross-disciplinary collaborations // develop customised machine learning
tools // Benchmark Datasets
Difficult to learn from observations and to improve models
Approach: Learn from and exploit data assimilation // learn boundary conditions
from observations
Data avalanche
Approach: Anticipate data access and channelise requests // efficient use of
heterogeneous hardware
Different set of tools (e.g. Fortran on CPUs vs. Python on GPUs)
Approach: Training // Software // Hardware
Integrate machine learning tools into the conventional NWP and climate service workflow
Approach: Centralised tools and efforts // embed efforts into the scalability project
MAchinE Learning for Scalable meTeoROlogy and cliMate
(MAELSTROM) started 1st April

Science blog: https://www.ecmwf.int/en/about/media-centre/science-blog/2021/large-scale-machine-learning-applications-weather-and

Recording of the plenary presentation of the kick-off meeting:
https://bluejeans.com/s/KEXqf0tWpOy/
Conclusions

• There are a large number of application areas throughout the prediction workflow in weather and
climate modelling for which machine learning could make a difference.

• The weather and climate community is still only at the beginning of exploring the potential of machine learning (and in particular deep learning).

• Machine learning could not only be used to improve models; it could also be used to make them more efficient on future supercomputers.

• Further training opportunities on the internet, e.g. https://atcold.github.io/pytorch-Deep-Learning/ or http://introtodeeplearning.com/

Many thanks! [email protected] @PDueben


