Peter Dueben: Royal Society University Research Fellow & ECMWF's Coordinator For Machine Learning and AI Activities
The key:
Neural networks can learn a complex task as a “black box”
No previous knowledge about the system is required
More data will allow for better networks
And weather?
Decision trees and random forests
Source: https://medium.com/@recrosoft.io/supervised-vs-unsupervised-learning-key-differences-cdd46206cdcb
Why would machine learning help in weather predictions?
Predictions of weather and climate are difficult:
(Figure: skill of conventional models; no long-term effect)
Could we base the entire model on neural networks and discard the conventional models?
There are limitations to existing models, and ECMWF provides access to hundreds of petabytes of data
Time evolution of Z500 for historic data and a neural network prediction.
Can you tell which one is the neural network?
Unlikely…
➢ The simulations change dynamics in long integrations and it is unclear how to fix conservation properties.
➢ It is unknown how to increase complexity and how to fix feature interactions.
➢ There are only ~40 years of data available.
Dueben and Bauer GMD 2018
Why is it hard to beat conventional forecast systems in the medium range?
Top-of-the-atmosphere cloud brightness temperature [K] for satellite observations and a simulation of the
atmosphere with 1.45 km resolution.
Dueben, Wedi, Saarinen and Zeman JMSJ 2020
Today, global weather forecast simulations have O(1,000,000,000) degrees of freedom, can represent many details of the Earth System, and show a breathtaking level of complexity.
Readers’ Choice Award for the “Best Use of HPC in Physical Science” by HPCwire in 2020.
Why is it hard to beat conventional forecast systems in the medium range?
Deep learning for multi-year ENSO forecasts: Ham, Kim, Luo Nature 2019
And climate?
Many application areas for machine learning across ECMWF
State-of-play and outline of the talk
I will name the main challenges that we are facing when using
machine learning today
Observations:
Detect the risk of the ignition of wild fires by lightning
Ruth Coughlan, Francesca Di Giuseppe, Claudia Vitolo and the SMOS-E project
Data assimilation: Bias-correct the forecast model in 4DVar data assimilation
• Data assimilation blends observations and the forecast model to generate initial conditions for weather predictions
• During data assimilation the model trajectory is “synchronised” with observations for the same weather regimes
• It is possible to learn model error when comparing the model with
(trustworthy) observations
Two approaches:
• Learn model error within the 4DVar data-assimilation framework for
“weak-constraint 4D-Var”
• Learn model error from a direct comparison of the model trajectory
to observations or analysis increments using deep learning
(column-based or three-dimensional)
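A minimal sketch of the second approach, learning model error from a comparison of the model with analysis increments. Everything here is a stand-in: the data is synthetic, and a closed-form linear fit takes the place of a deep network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: model state x (e.g. a vertical column of temperatures)
# and the analysis increment d = analysis - background, assumed to contain
# a systematic, state-dependent model bias plus observation noise.
n_samples, n_levels = 500, 10
X = rng.normal(size=(n_samples, n_levels))
true_bias = X @ rng.normal(scale=0.3, size=(n_levels, n_levels))
D = true_bias + rng.normal(scale=0.05, size=(n_samples, n_levels))

# Fit a linear map from state to increment (closed-form least squares);
# a real application would use a column-based or 3D neural network here.
W, *_ = np.linalg.lstsq(X, D, rcond=None)

# The fitted map can now bias-correct a new background state.
x_new = rng.normal(size=n_levels)
correction = x_new @ W
```

The same fitted map could then be applied during data assimilation (to improve initial conditions) or during the forecast itself.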
Background departure Estimation from Neural Network
Benefit:
When the bias is learned, it can be used to:
• Correct for the bias during data-assimilation
to improve initial conditions
• Correct for the bias in forecast simulations to improve predictions (a matter of ongoing debate)
• Understand model deficiencies
Patrick Laloyaux, Massimo Bonavita and Peter Dueben @ ECMWF + Thorsten Kurth and David Matthew Hall @ NVIDIA
To emulate parametrisation schemes
Method:
• Store input/output data pairs of a parametrisation scheme
• Use this data to train a neural network
• Replace the parametrisation scheme by the neural network within the model
We emulate the non-orographic gravity wave drag within the Integrated Forecasting System (IFS)
Chantry, Hatfield, Dueben, Polichtchouk and Palmer https://arxiv.org/abs/2101.08195
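The three method steps above can be sketched with a toy scheme. The parametrisation function here is invented, and a random-feature network with a least-squares output fit stands in for a fully trained deep network:

```python
import numpy as np

rng = np.random.default_rng(1)

def parametrisation(x):
    # Toy stand-in for a physical parametrisation scheme
    # (the real scheme maps a model column to a tendency profile).
    return np.sin(x) + 0.1 * x**2

# 1. Store input/output data pairs of the scheme.
X = rng.uniform(-2, 2, size=(2000, 1))
Y = parametrisation(X)

# 2. Train a small network on the pairs: random tanh hidden layer,
#    output weights fitted by least squares.
h = 64
W1 = rng.normal(size=(1, h))
b1 = rng.normal(size=h)
H = np.tanh(X @ W1 + b1)
W2, *_ = np.linalg.lstsq(H, Y, rcond=None)

# 3. The trained network can now replace the scheme inside the model.
def emulator(x):
    return np.tanh(x @ W1 + b1) @ W2
```

A key practical advantage of the emulator, noted in the results below, is that it is differentiable by construction, which simplifies generating tangent-linear and adjoint code.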
Results:
• Clear relationship between neural network complexity and error reduction
• Similar cost when used within IFS on CPU hardware and 10 times faster when used offline on GPUs
• Emulator was used successfully to generate tangent linear and adjoint code within 4D-Var data assimilation
Hatfield, Chantry, Dueben, Lopez, Geer, Palmer in preparation
• Forecast error can be reduced when training with more angles and wavespeed elements
Numerical weather forecasts: To emulate the 3D cloud effects in radiation
Representing 3D cloud effects for radiation (SPARTACUS) within simulations of the Integrated Forecasting System is four times slower than the standard radiation scheme (Tripleclouds)
Can we emulate the difference between Tripleclouds and SPARTACUS using neural networks?
Rel. cost: Tripleclouds 1.0; SPARTACUS 4.4; Neural Network 0.003; Tripleclouds + Neural Network 1.003
Meyer, Hogan, Dueben, Mason https://arxiv.org/abs/2103.11919
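The idea of learning only the difference between a cheap and an expensive scheme can be sketched as follows. Both schemes here are invented stand-ins, and a polynomial least-squares fit takes the place of the neural network:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-ins: a cheap scheme and an expensive scheme whose outputs
# differ by a smooth, learnable correction (analogous to Tripleclouds
# vs. SPARTACUS fluxes; both functions here are invented).
def cheap(x):
    return x**2

def expensive(x):
    return x**2 + 0.5 * np.cos(x)

X = rng.uniform(-3, 3, size=(1000, 1))
target_diff = expensive(X) - cheap(X)

# Fit the *difference* only -- the cheap scheme stays in the model and
# the learned term supplies the 3D-effect correction on top of it.
coeffs = np.polyfit(X.ravel(), target_diff.ravel(), deg=6)

def corrected(x):
    return cheap(x) + np.polyval(coeffs, x)
```

Because only the small correction is learned, the cheap scheme still guarantees the bulk of the physics, which is reflected in the very small added cost (0.003) in the table above.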
Numerical weather forecasts: To precondition the linear solver
• Linear solvers are important to build efficient semi-implicit time-stepping schemes for atmosphere and ocean models.
• However, the solvers are expensive.
• The solver efficiency depends critically on the preconditioner that approximates the inverse of a large matrix.
Can we use machine learning for preconditioning, predict the inverse of the matrix and reduce the number of
iterations that are required for the solver?
Testbed: A global shallow water model at 5 degree resolution but with real-world topography.
Method: Neural networks that are trained from the model state and the tendencies of full timesteps.
It turns out that the approach is (1) effective and cheap, (2) interpretable, and (3) easy to implement even if no preconditioner is present.
Ackmann, Dueben, Smolarkiewicz and Palmer https://arxiv.org/abs/2010.02866
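A sketch of how a preconditioner reduces solver iterations: here the Jacobi diagonal stands in for the machine-learned approximate inverse, and the matrix is a toy symmetric positive-definite system rather than the shallow-water operator.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy SPD system standing in for the semi-implicit solver matrix.
n = 200
A = np.diag(rng.uniform(1.0, 100.0, size=n))  # badly scaled diagonal
A += 0.1 * np.ones((n, n))                     # weak coupling
b = rng.normal(size=n)

def cg(A, b, M_inv_diag=None, tol=1e-8, max_iter=1000):
    """Conjugate gradients with an optional diagonal preconditioner."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = r * M_inv_diag if M_inv_diag is not None else r
    p = z.copy()
    for k in range(max_iter):
        Ap = A @ p
        alpha = (r @ z) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) < tol:
            return x, k + 1
        z_new = r_new * M_inv_diag if M_inv_diag is not None else r_new
        beta = (r_new @ z_new) / (r @ z)
        p = z_new + beta * p
        r, z = r_new, z_new
    return x, max_iter

# Jacobi diagonal stands in for the learned approximate inverse.
x_plain, iters_plain = cg(A, b)
x_prec, iters_prec = cg(A, b, M_inv_diag=1.0 / np.diag(A))
```

The preconditioned run converges in far fewer iterations than the plain run; a learned preconditioner aims at the same effect for the real matrix, where no cheap analytic approximation is available.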
Post-processing and dissemination: Improve ensemble predictions
(Figure: raw ensemble vs. post-processed point rainfall for lead days D1–D4)
Source: https://towardsdatascience.com
Source: UCAR
Use case: Eventually, apply the tool to climate predictions to understand changes of local precipitation
pattern due to climate change.
Method: Use Tru-NET with a mixture of ConvGRU layers to represent spatio-temporal scale interactions and a novel Fused Temporal Cross Attention mechanism to improve time dependencies.
Model                           RMSE
Conventional forecast model     3.627
Hierarchical Convolutional GRU  3.266
Tru-NET                         3.081
Problem: Find a three-dimensional machine learning solution that works on unstructured grids.
• Machine learning accelerators are focussing on low numerical precision and high flop rates.
• Example: TensorCores on NVIDIA Volta GPUs are optimised for half-precision matrix-matrix calculations with single-precision output.
→ 7.8 TFlops for double precision vs. 125 TFlops for half precision
Relative cost for model components for a non-hydrostatic model at 1.45 km resolution:
• The Legendre transform is the most expensive kernel. It consists of a large number of
standard matrix-matrix multiplications.
• If we can re-scale the input and output fields, we can use half precision arithmetic.
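The rescaling idea can be sketched as follows: scale inputs to O(1), round to half precision, multiply with single-precision accumulation (as TensorCores do), and undo the scaling afterwards. The matrices below are synthetic stand-ins for the transform and the fields:

```python
import numpy as np

rng = np.random.default_rng(4)

# Fields with very different magnitudes -- naive float16 would overflow
# (float16 max is ~65504) or lose precision on the small field.
legendre_like = rng.normal(scale=1e-3, size=(64, 64))  # transform matrix
field = rng.normal(scale=1e5, size=(64, 32))           # spectral field

def scaled_half_matmul(a, b):
    """Rescale to O(1), round to float16, multiply with float32
    accumulation (mimicking TensorCores), then undo the scaling."""
    sa = np.abs(a).max()
    sb = np.abs(b).max()
    a16 = (a / sa).astype(np.float16)
    b16 = (b / sb).astype(np.float16)
    # float32 accumulation of float16 inputs, as on TensorCore hardware
    out = a16.astype(np.float32) @ b16.astype(np.float32)
    return out * (sa * sb)

approx = scaled_half_matmul(legendre_like, field)
exact = legendre_like @ field
rel_err = np.abs(approx - exact).max() / np.abs(exact).max()
```

The relative error stays at the level of float16 rounding (around 1e-3) even though the unscaled fields would be far outside the float16 range.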
Half precision Legendre Transformations
Root-mean-square error for geopotential height at 500 hPa at 9 km resolution averaged over multiple
start dates. Hatfield, Chantry, Dueben, Palmer Best Paper Award PASC2019
The simulations are using an emulator to reduce precision (Dawson and Dueben GMD 2017) and
more thorough diagnostics are needed.
We have recently published our machine learning roadmap
https://events.ecmwf.int/event/232/
https://www.ecmwf.int/en/elibrary/19877-machine-learning-ecmwf-roadmap-next-10-years
Challenges and milestones
Different philosophy for domain and machine learning scientists
Approach: Support close collaborations // study explainable AI, trustworthy AI and physics
informed machine learning
For many applications off-the-shelf machine learning tools will not be sufficient
Approach: Foster cross-disciplinary collaborations // develop customised machine learning
tools // Benchmark Datasets
Difficult to learn from observations and to improve models
Approach: Learn from and exploit data assimilation // learn boundary conditions
from observations
Data avalanche
Approach: Anticipate data access and channelise requests // efficient use of
heterogeneous hardware
Different set of tools (e.g. Fortran on CPUs vs. Python on GPUs)
Approach: Training // Software // Hardware
Integrate machine learning tools into the conventional NWP and climate service workflow
Approach: Centralised tools and efforts // embed efforts into the scalability project
MAchinE Learning for Scalable meTeoROlogy and cliMate
(MAELSTROM) started 1st April
• There are a large number of application areas throughout the prediction workflow in weather and
climate modelling for which machine learning could make a difference.
• The weather and climate community is still only beginning to explore the potential of machine learning (and in particular deep learning).
• Machine learning could not only be used to improve models, but also to make them more efficient on future supercomputers.