
Computer Physics Communications 306 (2025) 109359


Computer Programs in Physics

TorchDA: A Python package for performing data assimilation with deep learning forward and transformation functions ✩
Sibo Cheng a,b,∗ , Jinyang Min c , Che Liu c , Rossella Arcucci c
a CEREA, École des Ponts and EDF R&D, Île-de-France, France
b Data Science Institute, Department of Computing, Imperial College London, UK
c Department of Earth Science & Engineering, Imperial College London, UK

Abstract

Data assimilation techniques often struggle with complex, high dimensional physical systems, because high precision simulation of such systems is computationally expensive and the exact observation functions applicable to them are difficult to obtain. This has prompted growing interest in integrating deep learning models within data assimilation workflows, but current data assimilation software packages cannot accommodate deep learning models. This study presents a novel Python package that seamlessly combines data assimilation with deep neural networks serving as models for state transition and observation functions. The package, named TorchDA, implements the Kalman Filter, Ensemble Kalman Filter (EnKF), 3D Variational (3DVar), and 4D Variational (4DVar) algorithms, allowing flexible algorithm selection based on application requirements. Comprehensive experiments conducted on the Lorenz 63 system and a two-dimensional shallow water system demonstrate significantly enhanced performance over standalone model predictions without assimilation. The shallow water analysis validates the ability to assimilate between different physical quantity spaces in either the full space or a reduced order space. Overall, this software package enables flexible integration of deep learning representations within data assimilation, providing a versatile tool for tackling complex high dimensional dynamical systems across scientific domains.

Program summary
Program Title: TorchDA
CPC Library link to program files: https://doi.org/10.17632/bm5d7xk6gw.1
Developer’s repository link: https://github.com/acse-jm122/torchda
Licensing provisions: GNU General Public License version 3
Programming language: Python 3
External routines/libraries: PyTorch.
Nature of problem: Deep learning has recently emerged as a potent tool for establishing data-driven predictive and observation functions within data assimilation
workflows. Existing data assimilation tools like OpenDA and ADAO are not well-suited for handling predictive and observation models represented by deep neural
networks. This gap necessitates the development of a comprehensive package that harmonizes deep learning and data assimilation.
Solution method: This project introduces TorchDA, a novel computational tool based on the PyTorch framework, addressing the challenges posed by predictive
and observation functions represented by deep neural networks. It enables users to train their custom neural networks and effortlessly incorporate them into data
assimilation processes. This integration facilitates the incorporation of real-time observational data in both full and reduced physical spaces.

1. Introduction

Traditional numerical simulation methods often encounter challenges such as high computational costs, slow processing speeds, and noticeable long-term prediction errors when dealing with high dimensional physical fields [19,49,30,50]. To mitigate the predictive errors arising from forecasting models, data assimilation algorithms are frequently employed [50,12,18,1,46]. For example, data assimilation can contribute to reliable ocean climate prediction [27], and prove valuable in nuclear engineering applications [51,29,28]. Other examples involve using data assimilation techniques to improve the estimation of the hydrological state and fluxes [11], and using data assimilation in fluid dynamics, especially for complex flows and turbulent phenomena [54].


The review of this paper was arranged by Prof. Andrew Hazel.
* Corresponding author at: CEREA, École des Ponts and EDF R&D, Île-de-France, France.
E-mail address: [email protected] (S. Cheng).

https://doi.org/10.1016/j.cpc.2024.109359
Received 20 February 2024; Received in revised form 12 August 2024; Accepted 28 August 2024
Available online 4 September 2024
0010-4655/© 2024 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Main Notations

Data assimilation algorithms
𝑡, 𝑘: continuous and discrete time points
𝐱𝑘: state vector of the system at time 𝑘
𝐲𝑘: observation vector at time 𝑘
𝐗[∶,−1]: the last state vector in the state matrix 𝐗
ℳ𝑘: state transformation function
𝐇𝑘: linearised observation function
𝐁𝑘: background error covariance matrix
𝐑𝑘: observation error covariance matrix
𝐏𝑘: forecast error covariance matrix
𝐊𝑘: Kalman Gain matrix
𝐈: identity matrix
𝑖: index of ensemble members
𝐱𝑘(𝑖): state vector estimate for the 𝑖-th ensemble member
𝜺𝑘(𝑖): stochastic perturbation of each ensemble member 𝑖
𝐲𝑘(𝑖): observation estimate for the 𝑖-th member
𝛾: learning rate of the optimisation algorithm
ℋ: nonlinear observation function
𝐱̄𝑘: central state estimate at time 𝑘
𝐱𝑏: background state estimate
ℋ̂: observation operator in the reduced space
𝒥: objective function of variational data assimilation

Lorenz 63 test case
𝜎: ratio of kinematic viscosity to thermal diffusivity
𝑟: ratio of the Rayleigh number to the critical value
𝛽: geometrical factor

Shallow water test case
𝑢, 𝑣: horizontal and vertical velocity components
ℎ: height of the fluid
𝐸𝑢/𝐷𝑢: encoder/decoder of the velocity component 𝑢
𝐸ℎ/𝐷ℎ: encoder/decoder of the height field ℎ
𝐱̂: latent state estimate
𝐲̂: latent observation

Generally, data assimilation combines observational data with forecasting to enhance the fidelity of predictions and facilitate the dynamic updating of model states [50,12,18]. By iteratively refining model estimates through the integration of observations, data assimilation seeks to minimise the difference between predictions and observations [50,12,18]. However, the practical utilisation of data assimilation algorithms relies on the availability of explicit predictive and observation functions, which constrains their efficacy in scenarios where such functions are difficult to derive, notably for chaotic, high dimensional, or nonlinear systems [50,26,41,31]. Efforts to improve the speed of simulations involve data-driven methods, especially neural networks that approximate forward functions, but it is nontrivial to seamlessly integrate these data-driven approaches with conventional data assimilation tools [14,44,45].

In the last decade, deep learning has emerged as a powerful tool for creating prediction and observation models, especially when explicit functions are difficult to obtain. This advancement enables the establishment of data-driven predictive and observation functions by leveraging available state variables and observations [18]. Researchers have explored the integration of this technique within their data assimilation workflows [18]. In a notable illustration, Amendola et al. [1] harnessed autoencoders to encode carbon dioxide concentration states within a controlled environment into compact latent space vectors. Subsequently, they employed a Long Short-Term Memory (LSTM) model to characterise the transitions within these encoded latent space representations [1]. Correspondingly, Arcucci et al. [3] and Boudier et al. [8] reported noteworthy reductions in forecasting errors upon integrating deep learning with data assimilation, particularly within the context of the Lorenz system. Moreover, Cheng et al. [15] utilised the Generalised Latent Assimilation (GLA) technique and conducted experiments pertaining to flow in a tube, leading to substantial acceleration and remarkable reductions in computational expenses for predictive tasks over time. Another method, named Latent Space Data Assimilation (LSDA), adopted a Multi-Layer Perceptron (MLP) as the surrogate function to connect latent spaces directly, which leads to reduced computational costs [40]. In addition to improving prediction results by combining deep learning models and data assimilation algorithms, Brajard et al. [9] and Bocquet et al. [6] also utilised a combined deep learning and data assimilation approach to iteratively optimise the deep learning model, showing that deep learning models can achieve competitive performance on noisy and sparse observations with the assistance of data assimilation. However, all these instances necessitated the assembly of data assimilation workflows involving the integration of deep learning models. The work of Frerix et al. [26] provides an illustrative case: they made their code for training deep learning based inverse observation operators publicly accessible, but their workflow remains tied to their specific application. Hence, the development of a comprehensive package capable of seamlessly harmonising deep learning and data assimilation emerges as a necessary pursuit for the contemporary research community.

Several data assimilation libraries and software tools have been developed to facilitate the implementation of data assimilation algorithms. Noteworthy examples include OpenDA, DAPPER, PDAF and ADAO [43,47,42,22]. OpenDA presents a convoluted and intricate configuration workflow, demanding a profound understanding of the software [43]. ADAO lacks seamless integration with modern programming environments, which impedes automatic code completion [22]. DAPPER [47] is an open-access Python package that benchmarks the results of various data assimilation methods and provides comparative studies. However, a critical limitation is that these data assimilation tools are primarily suited to explicit predictive and observation functions in the context of background state assimilation [43,22]. Therefore, upon examination of well-established data assimilation tools such as OpenDA and ADAO, it becomes evident that existing tools are insufficient for handling predictive and observation models represented by deep neural networks.

The initial step in constructing a package that can integrate deep learning and data assimilation involves the selection of an appropriate deep learning framework capable of handling differentiation of neural networks. Statistics on the usage of deep learning frameworks in combined deep learning and data assimilation applications are presented in Fig. 1a, based on data assimilation related publications on Papers With Code.¹ It is noticeable that TensorFlow is the most popular framework in the current data assimilation research community. However, an examination of Google search trends² for the general Artificial Intelligence research community, as illustrated in Fig. 1b, indicates that PyTorch is experiencing fluctuating growth rather than a gradual decline when compared to TensorFlow. Therefore, selecting PyTorch as the basis of this package for integrating deep learning and data assimilation appears to be the better choice for future studies in the data assimilation domain.

In this project, a novel approach is proposed by integrating data assimilation with deep learning, and an innovative computational tool, named TorchDA, has been developed based on the PyTorch framework. This tool effectively addresses the data assimilation challenge posed by predictive and observation functions represented by deep neural networks. Unlike the previously mentioned GLA and LSDA, using TorchDA eliminates the requirement for creating surrogate functions to connect distinct latent spaces. Users have the autonomy to train their own neural networks to replace conventional predictive and observation models. Subsequently, these neural networks can be integrated into TorchDA to facilitate data assimilation computations. In addition, implementing data assimilation algorithms in PyTorch naturally allows GPU acceleration for parallel computing, which is proven to be significantly more efficient than CPU computation [61].

¹ Papers With Code: https://paperswithcode.com/.
² Google Trends: https://trends.google.com/trends/explore?date=2014-01-01%202024-01-01&q=%2Fm%2F0h97pvq,%2Fg%2F11bwp1s2k3,%2Fg%2F11gd3905v1&hl=en-US.
Fig. 1. Statistical data of combined deep learning and data assimilation applications, and interest trends of popular machine learning frameworks on Google.

On the other hand, the stochastic gradient descent method can also improve the convergence of the assimilation problem by avoiding getting stuck in local minima [63].

To verify the correctness of the implementations inside this package, experiments on two test cases, the Lorenz 63 system [37] and the shallow water model [58], were conducted. These test cases involved the application of various data assimilation algorithms, including the Kalman Filter, Ensemble Kalman Filter (EnKF), 3D Variational (3DVar), and 4D Variational (4DVar). Specifically, the filter-based algorithms applied data assimilation to future prediction states over time, while the variational methods concentrated on optimising the initial background state.

Compared to traditional data assimilation techniques driven by numerical models, this Python-based tool harnesses the capability of deep learning algorithms, markedly enhancing the efficiency of data assimilation computations. Moreover, this work resolves the limitations of traditional Python data assimilation tools such as ADAO, particularly the incapability to handle derivatives of deep neural networks. In summary, the main contributions of the proposed TorchDA package to the data assimilation community can be summarised as:

• A seamless integration of deep learning-based transformation and forward operators in data assimilation schemes;
• Embedded functionality for computing the tangent linear and adjoint models in deep learning frameworks;
• Efficient GPU computations for both Kalman-type and variational data assimilation algorithms;
• An easy implementation of the stochastic gradient descent method to improve the DA optimisation problem, in particular for variational methods.

The rest of this paper is organised as follows: it starts by presenting fundamental concepts of data assimilation and its combination with deep neural networks in Section 2, followed by an explanation of the package structure; computational examples on the Lorenz 63 model and the shallow water model are then provided.

2. Background

2.1. Data assimilation concept

As previously elucidated, data assimilation integrates observations with predictions of the state space, aiming to approximate the authentic state at a specific temporal instant [50,12,18]. A range of algorithms, notably including Kalman Filter-based techniques and variational methods, are commonly employed for data assimilation purposes [50,12]. Moreover, these methods are often combined with deep neural networks in current studies, which indicates the importance of explaining these approaches in detail [18].

2.1.1. Kalman filter

The Kalman Filter is a fundamental algorithm in data assimilation because it efficiently combines observed data and model predictions to yield refined state estimates [32,4].

At its core, the Kalman Filter embodies an iterative process comprising two primary steps: the prediction step (Time Update) and the update step (Measurement Update) [32,4]. Let 𝐱𝑘 represent the state vector of the system at time 𝑘, while 𝐲𝑘 signifies the corresponding observation vector. The prediction step produces an estimate of the state vector 𝐱𝑘|𝑘−1 based on the previous state 𝐱𝑘−1 and the linearised state transition ℳ𝑘, while accounting for the perturbations introduced by the process noise 𝐐𝑘:

𝐱𝑘|𝑘−1 = ℳ𝑘 𝐱𝑘−1|𝑘−1 .

Simultaneously, the error covariance matrix 𝐏𝑘|𝑘−1 evolves via the relation:

𝐏𝑘|𝑘−1 = ℳ𝑘 𝐏𝑘−1|𝑘−1 ℳ𝑘ᵀ + 𝐐𝑘 ,

where (.)ᵀ is the transpose operator. Subsequently, the update step refines the state estimate and the associated error covariance matrix by assimilating the observational information via the linearised observation function 𝐇𝑘. The Kalman Gain 𝐊𝑘 is calculated from the error covariance matrices of the observations and the predicted state vector. The updated state estimate 𝐱𝑘|𝑘 and the corresponding error covariance matrix 𝐏𝑘|𝑘 are calculated as:

𝐊𝑘 = 𝐏𝑘|𝑘−1 𝐇𝑘ᵀ (𝐇𝑘 𝐏𝑘|𝑘−1 𝐇𝑘ᵀ + 𝐑𝑘)⁻¹ ,
𝐱𝑘|𝑘 = 𝐱𝑘|𝑘−1 + 𝐊𝑘 (𝐲𝑘 − 𝐇𝑘 𝐱𝑘|𝑘−1) ,
𝐏𝑘|𝑘 = (𝐈 − 𝐊𝑘 𝐇𝑘) 𝐏𝑘|𝑘−1 .

This process shows the ability of the Kalman Filter to obtain better estimates of the current state by considering uncertainties in the system dynamics and the measurement process [32,4]. However, the Kalman Filter can only tackle linear problems, since it requires the transpose of both the state transition ℳ𝑘 and the observation function 𝐇𝑘. The Kalman Filter implementation inside the package is described in Algorithm 1.
It is worth mentioning that the Kalman Filter implementation in TorchDA assumes a constant 𝐏 matrix, which is a simplification of the standard Kalman Filter. In this context, the symbol [∅] denotes an empty list, and the operator ⊕ indicates the concatenation operation. In addition, 𝐗[∶,−1] refers to selecting the last state vector in the 𝐗𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑠 matrix, which is a sequence of state vectors along the timeline.

Algorithm 1 Formulation of the implemented Kalman Filter.
Inputs: ℳ, 𝐇, 𝐏, 𝐑, 𝐱𝑏, 𝐲
Parameters: 𝐱𝑘, 𝐗𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑠, 𝑁𝑦
𝐱𝑘 = 𝐱𝑏, 𝐗𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑠 ← [∅], 𝑁𝑦 ← number of uncorrelated observations
for 𝑘 ← 1 to 𝑁𝑦 do
    𝐗𝑘 = ℳ(𝐱𝑘−1,𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑠)
    𝐱𝑘 ← 𝐗𝑘[∶,−1]
    𝐱𝑘,𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑠 = 𝐱𝑘 + 𝐏𝐇ᵀ[𝐇𝐏𝐇ᵀ + 𝐑]⁻¹(𝐲𝑘 − 𝐇𝐱𝑘)
    𝐗𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑠 ← 𝐗𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑠 ⊕ 𝐗𝑘
end for
Output: 𝐗𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑠

2.1.2. Ensemble Kalman filter

The Ensemble Kalman Filter (EnKF) is a method that uses a group of model states to achieve better estimates of the current state [33]. It supports a nonlinear state transition ℳ𝑘 [33]. Meanwhile, the covariance matrix 𝐏𝑘 can be efficiently approximated by the ensemble process, so computational costs are also reduced [33] compared to explicitly evolving the 𝐏 matrix as done in a standard Kalman Filter. At the same time, the TorchDA package still allows for flow-dependent uncertainty estimates.

In essence, the EnKF algorithm mirrors the principles of the Kalman Filter, albeit with a distinct ensemble-based approach [33]. It operates iteratively, comprising a forecast step and an update step [33]. At the heart of the EnKF lies the ensemble of model state vectors {𝐱𝑘(𝑖)}, where 𝑖 indexes the ensemble members, 𝑘 signifies the time step, and 𝐱𝑘(𝑖) denotes the state vector estimate for the 𝑖-th ensemble member at time 𝑘. In the current version of TorchDA, we consider a stochastic EnKF implementation [57].

The forecast step generates an ensemble of forecasts that serves as a proxy for the model predictions. This step involves the perturbation of each ensemble member with a stochastic representation of the model's error, denoted by 𝜺𝑘(𝑖):

𝐱𝑘|𝑘−1(𝑖) = ℳ𝑘(𝐱𝑘−1|𝑘−1(𝑖) + 𝜺𝑘(𝑖)) .

Subsequently, the update step refines the ensemble estimates by assimilating observational data denoted as {𝐲𝑘(𝑖)}. This assimilation is pivotal as it introduces perturbations to the observations, thereby preserving the effectiveness of the ensemble process [10]. This perturbation of observations prevents the rapid decay of the overall ensemble variance, consequently preserving the capacity of the ensemble to capture and represent the underlying system dynamics [10]. The ensemble mean 𝐱̄𝑘|𝑘−1 is employed as a central estimate, and the covariance matrix 𝐏𝑘|𝑘−1 quantifies the spread of the ensemble forecasts. The Kalman Gain 𝐊𝑘 is calculated based on the ensemble perturbations, and the updated ensemble estimates 𝐱𝑘|𝑘(𝑖) are computed as:

𝐊𝑘 = 𝐏𝑘|𝑘−1 𝐇𝑘ᵀ (𝐇𝑘 𝐏𝑘|𝑘−1 𝐇𝑘ᵀ + 𝐑𝑘)⁻¹ ,
𝐱𝑘|𝑘(𝑖) = 𝐱𝑘|𝑘−1(𝑖) + 𝐊𝑘 (𝐲𝑘(𝑖) − 𝐇𝑘 𝐱𝑘|𝑘−1(𝑖)) .

The procedure of the EnKF is also presented in Fig. 2a for reference, and the implementation of the EnKF inside the package is described in Algorithm 2.

The EnKF algorithm is an ensemble-based approach, which provides it with the capability to explicitly capture and propagate the inherent uncertainty within the system dynamics and observational measurements [33,10]. Moreover, the ensemble nature facilitates the representation of nonlinear features in the underlying state distribution, which makes the EnKF algorithm particularly suitable for scenarios characterised by complex and nonlinear dynamics [33,10].

Algorithm 2 Formulation of the implemented EnKF.
Inputs: 𝑁𝑒, ℳ, 𝐇, 𝐏, 𝐑, 𝐱𝑏, 𝐲
Parameters: 𝐗𝑘, 𝐗𝑎𝑣𝑒𝑟𝑎𝑔𝑒, 𝐗𝑒𝑛𝑠𝑒𝑚𝑏𝑙𝑒, 𝐗𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑠, 𝑁𝑦
𝐗𝑘 ← 𝑁𝑒 ensemble members perturbed around 𝐱𝑏 with 𝐏
𝐗𝑎𝑣𝑒𝑟𝑎𝑔𝑒 ← [∅], 𝑁𝑦 ← number of uncorrelated observations
for 𝑘 ← 1 to 𝑁𝑦 do
    𝐗𝑚𝑒𝑎𝑛 = 0
    for 𝑖 ← 1 to 𝑁𝑒 do
        𝐗(𝑖) = ℳ(𝐗𝑘−1,𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑠(𝑖))
        𝐗𝑘(𝑖) ← 𝐗(𝑖)[∶,−1]
        𝐗𝑚𝑒𝑎𝑛 = 𝐗𝑚𝑒𝑎𝑛 + 𝐗𝑘(𝑖)
    end for
    𝐘 ← 𝑁𝑒 ensemble members perturbed around 𝐲𝑘 with 𝐑
    𝐱𝑚𝑒𝑎𝑛 = (1/𝑁𝑒) Σ𝑒=1..𝑁𝑒 𝐗𝑘(𝑒), where 𝐗𝑘(𝑒) is the 𝑒-th ensemble member
    𝐏𝑒 = 1/(𝑁𝑒 − 1) ⋅ (𝐗𝑘 − 𝐱𝑚𝑒𝑎𝑛)ᵀ(𝐗𝑘 − 𝐱𝑚𝑒𝑎𝑛)
    𝐗𝑘,𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑠 = 𝐗𝑘 + (𝐏𝑒𝐇ᵀ[𝐇𝐏𝑒𝐇ᵀ + 𝐑]⁻¹(𝐘ᵀ − 𝐇𝐗𝑘ᵀ))ᵀ
    𝐗𝑎𝑣𝑒𝑟𝑎𝑔𝑒 ← 𝐗𝑎𝑣𝑒𝑟𝑎𝑔𝑒 ⊕ ((1/𝑁𝑒) ⋅ 𝐗𝑚𝑒𝑎𝑛)
end for
Outputs: 𝐗𝑎𝑣𝑒𝑟𝑎𝑔𝑒

2.1.3. 3D variational data assimilation

The 3D Variational (3DVar) algorithm embodies a variational approach aimed at optimising the state estimate by assimilating observations while accounting for prior information and background uncertainties [48].

Fundamentally, the 3DVar framework formulates the assimilation process as an optimisation problem, with the objective of minimising the mismatch between the background state and observational data [48]. At its core, the algorithm seeks to determine the state estimate 𝐱0 that optimally merges the background state with the observational data [48]. This is accomplished through the construction of an objective function that quantifies the difference between the observed data 𝐲 and the background state estimate 𝐱𝑏:

𝐽(𝐱0) = ½ (𝐱0 − 𝐱𝑏)ᵀ 𝐁0⁻¹ (𝐱0 − 𝐱𝑏) + ½ (𝐲0 − ℋ(𝐱0))ᵀ 𝐑0⁻¹ (𝐲0 − ℋ(𝐱0)) ,

where 𝐁0 and 𝐑0 signify the background error covariance matrix and the observation error covariance matrix, respectively, and ℋ represents the observation operator. The observation operator ℋ can be a nonlinear mapping from the state space to the observation space. The optimisation process entails seeking the state estimate that minimises the objective function, thus yielding an assimilated state 𝐱0 that effectively blends model predictions and observations [48]. The 3DVar implementation inside the package is described in Algorithm 3.

The 3DVar algorithm presents several advantages, including the ability to naturally accommodate prior information, exploit statistical properties of the background state, and incorporate observation uncertainties [48]. However, 3DVar only considers the background state and observations at a single time point: it solves a static problem by directly optimising a single estimate without regard to future evolutions.

Algorithm 3 Formulation and minimisation of the implemented 3DVar.
Inputs: ℋ, 𝐁, 𝐑, 𝐱𝑏, 𝐲, 𝑁𝑖𝑡𝑒𝑟𝑎𝑡𝑖𝑜𝑛𝑠, 𝛾
Parameters: 𝐱𝑏𝑎
𝐱𝑏𝑎 ← optimisable state vector copied from 𝐱𝑏
for 𝑛 ← 1 to 𝑁𝑖𝑡𝑒𝑟𝑎𝑡𝑖𝑜𝑛𝑠 do
    𝒥(𝐱𝑏𝑎) = ‖𝐱𝑏𝑎 − 𝐱𝑏‖²_𝐁⁻¹ + ‖𝐲 − ℋ(𝐱𝑏𝑎)‖²_𝐑⁻¹
    𝐱𝑏𝑎 ← Adam(𝒥, 𝐱𝑏𝑎, 𝛾)
end for
Outputs: 𝐱𝑏𝑎
Fig. 2. Data assimilation algorithms.
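To connect the update equations of Section 2.1.1 with tensor operations, the following minimal PyTorch sketch performs a single linear Kalman analysis step with a constant 𝐏 matrix, in the spirit of the simplified filter of Algorithm 1. The function and variable names are ours, chosen for illustration; this is not code taken from the package.

import torch

def kalman_update(x, y, H, P, R):
    # One analysis step: x <- x + K (y - H x), with K = P H^T (H P H^T + R)^-1
    S = H @ P @ H.T + R                   # innovation covariance
    K = P @ H.T @ torch.linalg.inv(S)     # Kalman gain
    return x + K @ (y - H @ x)            # analysed state

# toy example: a 3-dimensional state observed directly
x_b = torch.zeros(3)                      # background state
H = torch.eye(3)                          # linear observation operator
P = torch.eye(3)                          # constant forecast error covariance
R = 1e-2 * torch.eye(3)                   # observation error covariance
y = torch.tensor([0.1, 1.2, 1.9])         # observation
x_a = kalman_update(x_b, y, H, P, R)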

2.1.4. 4D variational data assimilation

The 4D Variational (4DVar) algorithm is characterised by its intrinsic capability to assimilate a sequence of observations across a finite time interval, thus encompassing temporal dynamics and enhancing the accuracy of state estimation [48,55].

Like its 3DVar counterpart, the foundation of the 4DVar algorithm lies in variational principles. However, 4DVar extends its scope beyond the spatial dimensions to encompass the temporal dimension, thereby enabling the assimilation of observations within a predefined temporal window [48,55]. This temporal integration effectively provides the algorithm with the capability to address temporal evolution and capture intricate time-varying dynamics [48,55]. The 4DVar algorithm initiates an optimisation process aimed at ascertaining the optimal state trajectory that best aligns model predictions with observed data over the designated temporal interval [48,55]. The objective function represents a composite measure between the predicted states ℳ𝑘(𝐱𝑘) and the observed data 𝐲𝑘 at each assimilation time step, accounting for their temporal evolution:

𝐽(𝐱0) = ½ (𝐱0 − 𝐱𝑏)ᵀ 𝐁0⁻¹ (𝐱0 − 𝐱𝑏) + ½ Σ𝑘=0..𝑡 [ (𝐲𝑘 − ℋ(ℳ𝑘(𝐱𝑘)))ᵀ 𝐑𝑘⁻¹ (𝐲𝑘 − ℋ(ℳ𝑘(𝐱𝑘))) ] ,

where the summation spans the assimilation time steps, and 𝐱𝑏 denotes the background state estimate for the initial time step. The optimisation process involves determining the state trajectory 𝐱 that minimises the aggregate objective function, yielding an assimilated state trajectory that captures both spatial and temporal dynamics [48,55]. The procedure of 4DVar is also presented in Fig. 2b for reference, and the implementation of 4DVar inside the package TorchDA is described in Algorithm 4.

Algorithm 4 Formulation and minimisation of the implemented 4DVar.
Inputs: ℳ, ℋ, 𝐁, 𝐑, 𝐱𝑏, 𝐲, 𝑁𝑖𝑡𝑒𝑟𝑎𝑡𝑖𝑜𝑛𝑠, 𝛾
Parameters: 𝐱𝑏𝑎, 𝑁𝑦
𝐱𝑏𝑎 ← optimisable state vector copied from 𝐱𝑏
𝑁𝑦 ← number of uncorrelated observations
for 𝑛 ← 1 to 𝑁𝑖𝑡𝑒𝑟𝑎𝑡𝑖𝑜𝑛𝑠 do
    𝒥𝑏(𝐱𝑏𝑎) = ‖𝐱𝑏𝑎 − 𝐱𝑏‖²_𝐁⁻¹
    𝐱 = 𝐱𝑏𝑎
    𝒥𝑜 = ‖𝐲(1) − ℋ(𝐱𝑏𝑎)‖²_𝐑⁻¹
    for 𝑘 ← 2 to 𝑁𝑦 do
        𝐗 = ℳ(𝐱)
        𝐱 ← 𝐗[∶,−1]
        𝒥𝑜(𝐱) = 𝒥𝑜(𝐱) + ‖𝐲(𝑘) − ℋ(𝐱)‖²_𝐑⁻¹
    end for
    𝒥 = 𝒥𝑏 + 𝒥𝑜
    𝐱𝑏𝑎 ← Adam(𝒥, 𝐱𝑏𝑎, 𝛾)
end for
Outputs: 𝐱𝑏𝑎

The ability of the 4DVar algorithm to assimilate observations over a finite time window makes it particularly suited to scenarios characterised by rapidly evolving and time-dependent processes [48,55]. By integrating temporal information, the 4DVar technique contributes to a more reasonable true state estimation over the whole assimilation window than 3DVar, but 4DVar is more computationally expensive due to the forward predictions required through the time window.

For more technical details about data assimilation algorithms, the readers are referred to the review papers [12] and [18].

2.2. Integrating data assimilation and deep learning

As previously demonstrated, there is a growing trend in various applications and experiments that aim to combine data assimilation with deep learning algorithms. Given the computational challenges entailed by full space data assimilation, a pragmatic approach involves conducting data assimilation within a reduced space [2,5,16,62]. Recent research endeavours have addressed the imperative of computational efficiency by integrating data assimilation with deep learning through the utilisation of autoencoders [60,17]. These algorithms, collectively referred to as latent assimilation, effectively harness the computational efficacy inherent in deep learning while preserving the precision of data assimilation. Within a specific subset of latent assimilation methodologies, autoencoders are trained on individual physical quantities with the objective of reconstructing the original physical quantities within their respective representation spaces. The compressed representations situated at the bottleneck layer, positioned between the encoder and decoder components of the autoencoder, are subsequently employed for predictions and assimilation in latent space. To perform effective latent assimilation on the state vector, these compressed representations are usually mapped into different latent spaces, because this process can involve more information from different physical quantities. A schematic depiction of this latent assimilation concept is illustrated in Fig. 3.

Regarding latent assimilation, it falls within two broad methodological categories. The first involves the incorporation of latent state space data with full space observations during the assimilation process, while the second entails the utilisation of latent state space data with latent space observations. Peyron et al. [46] and Maulik et al. [39] adopted a strategy wherein observations in the full physical space are employed to refine reduced order models, which aligns with the first category. Conversely, several other researchers have explored different approaches involving the compression of both state variables and observations into latent space [1,17,38,35].
Fig. 3. Schematic diagram of the latent data assimilation with deep neural network surrogate models.
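Returning to the variational algorithms of Sections 2.1.3 and 2.1.4, the sketch below minimises a 4DVar-style objective with PyTorch autograd and Adam, in the spirit of Algorithms 3 and 4. The forward model and observation operator are placeholder callables standing in for user-trained networks, and all names and values are illustrative assumptions rather than the package's internal code.

import torch

def forward_model(x):          # stand-in for a trained surrogate advancing the state
    return 0.95 * x

def obs_operator(x):           # stand-in for a trained state-to-observation network
    return x[:2]

x_b = torch.tensor([1.0, 0.0, -1.0])     # background state
B_inv = torch.eye(3)                     # inverse background covariance
R_inv = 10.0 * torch.eye(2)              # inverse observation covariance
y_obs = [torch.tensor([0.9, 0.1]),       # observations over the time window
         torch.tensor([0.8, 0.2])]

x_ba = x_b.clone().requires_grad_(True)  # optimisable copy of the background state
optimiser = torch.optim.Adam([x_ba], lr=0.05)

for _ in range(200):
    optimiser.zero_grad()
    d_b = x_ba - x_b
    J = d_b @ B_inv @ d_b                # background term J_b
    x = x_ba
    for k, y_k in enumerate(y_obs):
        if k > 0:
            x = forward_model(x)         # roll the state to the next observation time
        d_o = y_k - obs_operator(x)
        J = J + d_o @ R_inv @ d_o        # accumulate observation terms J_o
    J.backward()                         # gradient (adjoint) obtained via autograd
    optimiser.step()

x_assimilated = x_ba.detach()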

In addition to latent representations learned by autoencoders, Casas et al. [13] propose a methodology that employs Recurrent Neural Networks (RNNs) to acquire assimilated results within a reduced space produced by Principal Components Analysis (PCA), thereby enhancing future predictions. Therefore, the emergence of latent assimilation emphasises the necessity for a more streamlined integration of data assimilation and deep learning.

In addition to employing autoencoders for representation compression, recent research endeavours have also explored the substitution of the observation operator, which typically maps the state space to the observation space, with deep neural networks. An illustrative study conducted by Frerix et al. [26] involved the training of a deep neural network to serve as an inverse observation operator, responsible for mapping the observation space back to the state space. This was accomplished through the modification of the cost function used during the optimisation steps, employing a variational approach within the domain of data assimilation [26]. Conversely, Storto et al. [53] adopted a different approach by training a standard forward observation operator, which maps state space variables to the observation space. However, they introduced a linearisation step for this operator around the background state [53]. This circumvented the need for directly solving a cost function involving optimisation through a neural network. Consequently, it is evident that neural networks are increasingly being considered as potential substitutes for observation operators in the realm of data assimilation.

3. Package structure

The software package is developed in Python and relies only on PyTorch as its unique dependency. The package encompasses five distinct modules, namely parameters, builder, executor, kalman_filter, and variational. This Python package offers a versatile and user-friendly framework tailored for the seamless integration of data assimilation with neural networks across various algorithms, including the Kalman Filter (KF), EnKF, 3DVar, and 4DVar. Below, a summary of the primary contents of each module is provided:

parameters: This module contains the Parameters class, enabling users to specify data assimilation parameters.
builder: The CaseBuilder class within this module facilitates the configuration and execution of data assimilation cases.
executor: The _Executor class implemented here is responsible for executing data assimilation cases.
kalman_filter: This module incorporates implementations of the KF and EnKF algorithms for data assimilation.
variational: The variational module accommodates the 3DVar and 4DVar algorithms for variational data assimilation.

To understand the more detailed prerequisites this package imposes on users, it is recommended to read the documents provided in the GitHub repository of this package.³ Some essential prerequisites on inputs and guarantees on outputs are outlined as follows.

Covariance matrices 𝐁 and 𝐑:
The software imposes specific constraints on the error covariance matrices 𝐁 and 𝐑; both must be positive definite matrices with row and column dimensions aligned with the background state vector length or the observational state vector length accordingly.

Background state 𝐱0:
The background state, denoted as 𝐱0, must conform to a 1D or 2D tensor with the shape ([batch size], state dimension), with batch size being an optional parameter. batch size is exclusively available for the 3DVar algorithm.

Observations 𝐲:
Observations or measurements, referred to as 𝐲, are required to be in the form of a 2D tensor. For the KF, EnKF, and 4DVar algorithms, the shape of 𝐲 should be (number of observations, state dimension of observations), with the number of observations being at least 1 in KF or EnKF, and a minimum of 2 in 4DVar. For the 3DVar algorithm, the shape of 𝐲 must be ([batch size], state dimension), and batch size is optional, with the default being batch size equal to 1.

State transformation function ℳ𝑘:
The state transformation function, denoted as ℳ𝑘, should be a callable code object capable of handling a 1D tensor input 𝐱0 with the shape (state dimension). The output of ℳ𝑘 must be a 2D tensor with the shape (time window sequence length, state dimension).

Observation operator 𝐇𝑘 or ℋ:
The observation operator, represented as 𝐇𝑘 or ℋ, can take the form of either a tensor or a callable code object. In the KF algorithm, 𝐇𝑘 can be a tensor. However, if ℋ is a callable code object, it should be equipped to process a 2D tensor input 𝐱0 with the shape ([number of ensemble], state dimension). By default, the number of ensembles is set to 1 for all algorithms except EnKF. The output of the callable ℋ code object should correspondingly possess the shape ([number of ensemble], measurement dimension), while in all other algorithms the output should be a 2D tensor with the shape (1, state dimension).

³ TorchDA: https://github.com/acse-jm122/torchda.
In the case of KF, where a callable ℋ code object is utilised, it is advisable to work with a 1D tensor input 𝐱0 (state dimension) to mitigate potential issues arising from Jacobian approximations and output uncertainties in this process. While KF has not been included as an option in the algorithm parameters due to these considerations, it is still accessible as an individual function for customisation in user applications.

Output structure:
Execution results are returned in the form of packed native Python dictionaries. Two distinct sets of outputs are available for the various underlying data assimilation algorithms:

1. In the context of the EnKF algorithm, it is possible to obtain two distinct categories of output, namely "average_ensemble_all_states" and "each_ensemble_all_states". The former represents the average ensemble states across the entire forwarding time window in the EnKF process, while the latter portrays the resulting states of each ensemble member throughout the entire forwarding time window.

2. For the 3DVar or 4DVar algorithms, the output includes "assimilated_state" and "intermediate_results". The "assimilated_state" signifies the optimised background state after the respective variational algorithm has completed its run. The "intermediate_results" encompass a sub-level dictionary, containing "J", "J_grad_norm", and "background_states" for each iteration in 3DVar, and "Jb", "Jo", "J", "J_grad_norm", and "background_states" for each iteration in 4DVar.

To provide a practical demonstration of how to use this package, some of the actual code used in the experiments on the computational examples is shown in code snippets. Users have the option to input parameters using a basic dictionary, as exemplified in code snippet 1, but it is advisable to utilise an instance of the Parameters class, as demonstrated in code snippet 2, which not only supports code completion tools but also minimises the likelihood of typing errors by hand. However, the most recommended approach is to employ setters, as shown in code snippet 3. This method follows the builder pattern, allowing users to dynamically configure parameters at runtime, which avoids the need to specify all parameters simultaneously. There are a total of 7 required parameters and 10 optional parameters depending on the specific assimilation algorithm choices. The required parameters consist of "algorithm", "device", "observation_model", "background_covariance_matrix", "observation_covariance_matrix", "background_state", and "observations". These parameters are crucial for executing the selected algorithm. For instance, users must specify the device they intend to use for running their applications, with currently supported options including CPU and GPU. Additionally, if optimisation-based algorithms such as 3DVar and 4DVar are chosen, users need to assign values for the learning rate, maximum iterations, and optimisation method. The current underlying optimisation algorithm employed is Adam [34], for which the learning rate serves as its only associated parameter.

Listing 1 Passing parameters by a Python dictionary.

params_dict = {
    "algorithm": torchda.Algorithms.Var4D,
    "observation_model": H,
    "background_covariance_matrix": B,
    "observation_covariance_matrix": R,
    "background_state": xb,
    "observations": y,
    "forward_model": M,
    "output_sequence_length": gap + 1,
    "observation_time_steps": time_obs,
    "gaps": [gap] * (len(time_obs) - 1),
    "learning_rate": 7.5e-3,
    "args": (rayleigh, prandtl, b),
}
results_4dvar = torchda.CaseBuilder().set_parameters(params_dict).execute()

Listing 2 Passing parameters by a parameter object.

parameters = torchda.Parameters(
    algorithm=torchda.Algorithms.EnKF,
    device=torchda.Device.CPU,
    observation_time_steps=time_obs,
    gaps=gaps,
    num_ensembles=Ne,
    observation_model=H,
    output_sequence_length=lstm_model.out_seq_length,
    forward_model=lstm_model,
    background_covariance_matrix=P0,
    observation_covariance_matrix=R,
    background_state=xtT[0],
    observations=y.T,
)
run_case = torchda.CaseBuilder(parameters=parameters)
results = run_case.execute()
xEnKF = results["average_ensemble_all_states"]
x_ens = results["each_ensemble_all_states"]

Listing 3 Passing parameters by setters.

case_to_run = (
    torchda.CaseBuilder()
    .set_observation_model(H)
    .set_background_covariance_matrix(B)
    .set_observation_covariance_matrix(R)
    .set_learning_rate(5)
    .set_max_iterations(300)
    .set_algorithm(torchda.Algorithms.Var3D)
    .set_device(torchda.Device.GPU)
)
### In the forward prediction loop ###
case_to_run.set_background_state(out).set_observations(y_imgs)
out = case_to_run.execute()["assimilated_state"]

This comprehensive description outlines the salient features and requirements of the software package, aiding users in effectively utilising it for data assimilation integrated with deep learning across a spectrum of algorithms. The overall structure of the package is also visually presented in Fig. 4.

4. Computational examples

To validate the functionality and correctness of the package while further enhancing comprehension of data assimilation algorithms, comprehensive assessments were conducted. These assessments involved the application of deep neural networks, encompassing convolutional autoencoders, residual networks, and multilayer LSTM networks, to two distinct models: the Lorenz 63 model [37] and the shallow water model [58]. The rationale for selecting the Lorenz 63 model and the shallow water model as illustrative examples is rooted in their inherently chaotic and highly dynamic nature [37,52,20], which makes them ideal scenarios for the integration of deep neural networks as surrogate models within the context of data assimilation [52,20]. For the Lorenz 63 model, the performance of EnKF and 4DVar was evaluated, while employing a multilayer LSTM network for forward predictions. In the case of the shallow water model, 3DVar and 4DVar were combined with autoencoders, residual networks, and multilayer LSTM networks separately. Specifically, autoencoders were employed to condense representations of full space physical quantities into reduced latent space representations, a residual network served as the observation operator, and a multilayer LSTM network was utilised for forward predictions in latent space.
Fig. 4. Package structure and dependencies.
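As a concrete illustration of the input conventions described in Section 3, the sketch below builds a toy forward model and observation operator with the expected tensor shapes, together with compatible covariance matrices. The specific dimensions and the placeholder dynamics are arbitrary assumptions chosen only to show the required shapes; they do not reproduce any experiment from this paper.

import torch

state_dim, obs_dim, seq_len = 3, 2, 5

def forward_model(x0):
    # Callable M: 1D state (state_dim,) in, 2D trajectory
    # (time window sequence length, state_dim) out.
    steps = [x0]
    for _ in range(seq_len - 1):
        steps.append(0.9 * steps[-1])      # placeholder dynamics
    return torch.stack(steps)

def obs_operator(x):
    # Callable H: (number of ensemble, state_dim) in,
    # (number of ensemble, measurement dimension) out.
    return x[:, :obs_dim]

B = torch.eye(state_dim)                   # positive definite background covariance
R = 1e-2 * torch.eye(obs_dim)              # positive definite observation covariance
x0 = torch.zeros(state_dim)                # 1D background state
y = torch.rand(4, obs_dim)                 # 2D observations, one row per observation time

print(forward_model(x0).shape, obs_operator(x0.unsqueeze(0)).shape)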

4.1. Lorenz 63 model

The Lorenz 63 system provides a simplified yet chaotic representation of dynamic systems, facilitating comprehensive assessments of data assimilation algorithms [24,23]. Its inherent chaos and sensitivity to initial conditions, along with a known mathematical solution, establish an appropriate benchmark for evaluating the capabilities of algorithms [37]. Specifically, the progression of the Lorenz 63 model is delineated through the set of differential equations presented below:

d𝑋/d𝑡 = −𝜎(𝑋 − 𝑌 ) ,   (1)
d𝑌/d𝑡 = −𝑋𝑍 + 𝑟𝑋 − 𝑌 ,   (2)
d𝑍/d𝑡 = 𝑋𝑌 − 𝛽𝑍 ,   (3)

where {𝑋, 𝑌 , 𝑍} ∈ ℝ³. 𝜎 represents the Prandtl number, denoting the ratio of kinematic viscosity to thermal diffusivity [37]. The parameter 𝑟 signifies the ratio of the Rayleigh number 𝑅𝑎 to the critical value of the Rayleigh number 𝑅𝑎𝑐, and 𝛽 represents a geometrical factor [37].

Given the inherent uncertainty associated with the Lorenz 63 system, particularly regarding its evolving trajectory for different initial conditions, a standardised initial state has been established as [𝑋, 𝑌 , 𝑍]ᵀ = [0, 1, 2]ᵀ. The control parameters of the system have been fixed as follows: 𝜎 = 10, 𝑟 = 35, and 𝛽 = 8∕3. The temporal interval between each state update has been set to 10⁻³ time units, and the total evolution duration has been defined as 25 time units. The numerical integration technique employed for this evolution is the forward Euler method. Following the execution of this simulation, the resulting data are visually depicted in Fig. 5.

To establish a forward model for sequential prediction, a 20-layer LSTM network was employed. This LSTM network possesses an input dimension of 3 and a hidden dimension of 12. It was trained with the parameter settings previously mentioned, except for the total evolution time, which was extended to 500 time units to ensure the sufficiency of training data. This network has the capability to predict 99 future time steps based on a given state vector at the current time step, because we aim to test the data assimilation effects in long-term predictions. To emulate this characteristic, the sampling frequency for observations was established at 0.5 time units. This implies that there are 0.5∕10⁻³ = 500 state evolutions occurring between each observation point. Additionally, in realistic scenarios, instruments used for measurements are susceptible to introducing measurement noise. In this context, the standard deviation of the observation noise was configured to 10⁻³ to simulate the presence of a moderately accurate instrument during measurements. Fig. 5 provides a representation of the outcomes resulting from this noisy sampling process.

4.1.1. Experiment without data assimilation

In the baseline setting, the multilayer LSTM network performed forward predictions without any corrections. Following the execution of predictions within a specified time window using only the forward model, an evaluation of the errors on each component was conducted. This evaluation involved the Relative Root Mean Square Error (RRMSE) metric, which is the Mean Square Error (MSE) on each component normalised by the mean square value of the corresponding component in the reference trajectory. By employing this normalised relative error calculation, both the predictions generated by the forward model and the reference trajectory were analysed. The resulting plots in Fig. 6a depict the errors observed in each component relative to the reference trajectory.

It is worth noting that the trajectory predicted by the forward model exhibits a notable disparity when compared to the reference trajectory. This phenomenon can be attributed to the chaotic nature of the Lorenz 63 system [37]. Chaotic systems like this exhibit unstable and nonperiodic behaviour, even when subjected to tiny numerical variations in initial conditions [37]. Consequently, predicting the behaviour of this system accurately at each time step, even with the aid of a potent sequential model such as the 20-layer LSTM network employed in this instance, remains a significant challenge.

4.1.2. Experiment of EnKF

In this study, the chosen assimilation algorithm is the EnKF, which comprises 50 ensemble members. The standard deviation of the ensemble members in this case is set to 1 for all three components 𝑋, 𝑌, and 𝑍. Following the computation of the RRMSE between the forward model predictions with EnKF correction and the reference trajectory, Fig. 6b shows the predictions of the forward model after EnKF correction, alongside the reference trajectory, complete with the RRMSE profiles for each component.

It is noticeable that the EnKF algorithm helped the predictions achieve 90.94% and 69.35% improvements in MSE and RRMSE respectively. The result demonstrates the promise of adopting EnKF as a foundation algorithm in the data assimilation package, which supports both a nonlinear forward model and a nonlinear observation operator.

4.1.3. Experiment of 4DVar

In contrast to EnKF, 4DVar directly optimises the evolution trajectory of the system based on all observations and the starting background state in the entire evolution trajectory. In this scenario, three specific observation points, indicated by three vertical dotted lines in Fig. 7a, were selected as constituents of the 4DVar process. The standard deviation of the background error is set to 1 for all three components 𝑋, 𝑌, and 𝑍, and the standard deviation of the observation noise was configured to 0.5. Both the standard deviation of the background error and that of the observation noise follow the same configuration as the previous experiment.
Fig. 5. Reference trajectory with noisy observations.
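For reference, a minimal sketch of the reference-trajectory generation described in Section 4.1: forward Euler integration of Eqs. (1)-(3) with the stated parameter values, followed by sparse noisy sampling of the observations. It reproduces the experimental setup only in outline, and all variable names are ours.

import torch

sigma, r, beta = 10.0, 35.0, 8.0 / 3.0        # control parameters of Section 4.1
dt, t_end = 1e-3, 25.0                        # time step and total evolution time
n_steps = int(t_end / dt)

def lorenz_rhs(state):
    x, y, z = state
    return torch.stack((-sigma * (x - y),
                        -x * z + r * x - y,
                        x * y - beta * z))

traj = torch.empty(n_steps + 1, 3)
traj[0] = torch.tensor([0.0, 1.0, 2.0])       # standardised initial state
for k in range(n_steps):
    traj[k + 1] = traj[k] + dt * lorenz_rhs(traj[k])   # forward Euler step

# observations every 0.5 time units (500 state evolutions) with noise std 1e-3
obs_every = int(0.5 / dt)
obs = traj[::obs_every] + 1e-3 * torch.randn_like(traj[::obs_every])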

Precisely, the optimisation algorithm employed in 4DVar is Adam [34], with a specified learning rate of 0.5. The optimisation process encompasses 200 iterations. Upon completion of the 4DVar process, it is noticeable from Fig. 7a that the 4DVar assimilated result trajectory has a shape more similar to the reference trajectory when compared to the EnKF trajectory in the same window. The RRMSE analysis was conducted to quantitatively evaluate this performance. To ensure an equitable comparison between the performance of EnKF and 4DVar, both algorithms were subjected to RRMSE calculations within the time interval demarcated by the vertical black dotted lines. Fig. 7b provides a closer examination of the prediction trajectories generated by both EnKF and 4DVar in comparison to the reference trajectory. Additionally, this figure includes the RRMSE values for both EnKF and 4DVar prediction trajectories. Meanwhile, for explicit demonstration and comparative purposes, all evaluation metrics are presented in Table 1.

Table 1
Evaluation metrics in different time windows for the Lorenz example.

                          0 to 25 units        7 to 8 units
No Assimilation   MSE       158.31               241.47
                  RRMSE       0.62                 0.79
EnKF              MSE        14.35 (90.94%↑)     177.46 (26.51%↑)
                  RRMSE       0.19 (69.35%↑)       0.68 (13.92%↑)
4DVar             MSE          -                  179.50 (25.66%↑)
                  RRMSE        -                    0.68 (13.92%↑)

It is discernible that the prediction trajectories achieved through the 4DVar and EnKF algorithms attained similar quantitative results. The 4DVar assimilated result achieved a slightly smoother trajectory within the specified time window compared with the EnKF result. Nevertheless, it is important to acknowledge that the quantitative results indicate the EnKF achieved a better result in this assimilation window. This is because EnKF has the property of dynamic representation of errors, which allows it to surpass 4DVar in terms of distance measurement to the reference trajectory [7].

4.2. Shallow water model

A standard shallow-water fluid mechanics system is commonly employed as a benchmark for assessing the performance of data assimilation algorithms, including 3DVar and 4DVar [52,20]. This system represents a nonlinear and time-dependent wave propagation problem. In this scenario, the initial condition is established as a cylindrical body of water with a specified radius, released at 𝑡 = 0. It is assumed that the horizontal length scale on the two-dimensional surface holds greater significance than the vertical scale perpendicular to the surface, and the Coriolis force is intentionally disregarded. These factors give rise to the Saint-Venant equations [58], which couple the velocity components of the fluid (represented as 𝑢 and 𝑣) in two dimensions, measured in 0.1 m/s, with the height of the fluid (ℎ), expressed in millimetres. These equations can be formally expressed as Eqs. (4) to (8). It is noteworthy that in this context, the vertical direction is denoted as the downward 𝑦 direction. The gravitational constant (𝑔), representative of the gravity of Earth, is appropriately scaled to unity (1), and the dynamical system is formulated in a non-conservative form.

∂𝑢/∂𝑡 = −𝑔 ∂ℎ/∂𝑥 − 𝑏𝑢   (4)
∂𝑣/∂𝑡 = −𝑔 ∂ℎ/∂𝑦 − 𝑏𝑣   (5)
∂ℎ/∂𝑡 = −∂(𝑢ℎ)/∂𝑥 − ∂(𝑣ℎ)/∂𝑦   (6)
𝑢(𝑡=0) = 0   (7)
𝑣(𝑡=0) = 0   (8)

4.2.1. Experimental settings

The initial values of 𝑢 and 𝑣 are both set to zero for the entire velocity field, and the water cylinder is positioned at a height of 0.1 mm, with an 8 mm radius, above the still water at the central location of the simulation area. The domain size (𝐿𝑥 × 𝐿𝑦) is (64 mm × 64 mm), discretised with a regular square grid of size (64 × 64). Equations (4) to (8) are approximated using a first-order finite difference method, and the discrete time 𝑘 replaces the continuous time 𝑡. Time integration is also first order, with a time interval of 𝛿𝑘 = 10⁻⁴ s, extending the simulation up to 𝑘final = 0.8 s.
Fig. 6. RRMSE on each component between reference trajectory and trajectory of forward model prediction with or without EnKF.
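The RRMSE metric used throughout Section 4 can be computed compactly as the MSE of each component normalised by the mean square value of that component in the reference trajectory. The sketch below is one reading of the definition given in Section 4.1.1 (taking the root of the normalised quantity, as the name suggests); the function name is ours, not the package's.

import torch

def rrmse(prediction, reference):
    # Relative RMSE per component for inputs of shape (time steps, components):
    # MSE normalised by the mean square of the reference, then square-rooted.
    mse = torch.mean((prediction - reference) ** 2, dim=0)
    ref_ms = torch.mean(reference ** 2, dim=0)
    return torch.sqrt(mse / ref_ms)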

To provide a visual demonstration of the initial condition and the subsequent wave propagation pattern mentioned earlier, a sequence of successive snapshots illustrating the height of the water cylinder is presented in Fig. 8.

At each evolution step, the entire simulation field for the physical quantities 𝑢 (𝐱𝑘) and ℎ (𝐲𝑘) is recorded as grayscale images at full size (64 × 64). These images of 𝑢 are treated as the full state space variable {𝐱𝑘}. A convolutional autoencoder is trained on these 𝐱𝑘 images to compress them from their original (64 × 64) size to a 1D latent space vector representation 𝐱̂𝑘 with 32 elements, which is then reconstructed back to the original image. Another convolutional autoencoder with the same architecture is trained on 𝐲𝑘 using the same process as the autoencoder for 𝐱𝑘. The encoders trained in each autoencoder are utilised in the subsequent latent data assimilation process. Additionally, a convolutional deep residual network is trained to map a given 𝐱𝑘 image to its corresponding 𝐲𝑘 image at the same time instance. This deep residual network serves as a model for the observation operator ℋ, which maps full space 𝐱𝑘 to full space 𝐲𝑘. These statements can also be denoted by the mathematical expressions presented in Eqs. (9) to (13). A randomly selected 1% of all generated data (80 samples) was utilised for validation of the 𝐱𝑘 autoencoder, the 𝐲𝑘 autoencoder, and the observation operator ℋ. All remaining data were adopted for training these neural networks.

state: 𝐱𝑘 = 𝐷𝑢(𝐸𝑢(𝐱𝑘))   (9)
latent state: 𝐱̂𝑘 = 𝐸𝑢(𝐱𝑘)   (10)
observation: 𝐲𝑘 = 𝐷ℎ(𝐸ℎ(𝐲𝑘)) = ℋ(𝐱𝑘) = ℋ(𝐷𝑢(𝐱̂𝑘))   (11)
observation operator: ℋ̂(𝐱̂𝑘) = 𝐸ℎ(ℋ(𝐷𝑢(𝐱̂𝑘)))   (12)
latent observation: 𝐲̂𝑘 = ℋ̂(𝐱̂𝑘) = 𝐸ℎ(𝐲𝑘)   (13)
Fig. 7. Reference trajectory, 4DVar prediction trajectory, and trajectory of EnKF correction on forward model prediction.

Fig. 8. A sequence of successive wave propagation snapshots of the water cylinder height, starting from the initial condition at 𝑘 = 0 up to 𝑘 = 0.06 s, with time interval Δ𝑘 = 200 × 𝛿𝑘 = 0.02 s.
Fig. 9. Latent space 𝐱̂ 𝑘 to full space 𝐲𝑘 : MSE for 3DVar latent assimilation and images showcase.
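Because each component in Eq. (12) is simply a PyTorch module, the composed reduced-space operator ℋ̂ can be assembled directly from the trained pieces. The sketch below uses placeholder linear layers standing in for the trained encoder, decoder, and observation networks of Section 4.2.1 to show the composition 𝐸ℎ ∘ ℋ ∘ 𝐷𝑢; all module definitions here are illustrative stand-ins, not the paper's trained models.

import torch
import torch.nn as nn

latent_dim, full_dim = 32, 64 * 64

D_u = nn.Linear(latent_dim, full_dim)   # decoder of the x_k (velocity u) autoencoder
H_full = nn.Linear(full_dim, full_dim)  # full space x_k -> full space y_k operator
E_h = nn.Linear(full_dim, latent_dim)   # encoder of the y_k (height h) autoencoder

def H_hat(x_latent):
    # Reduced-space observation operator of Eq. (12): E_h(H(D_u(x_hat)))
    return E_h(H_full(D_u(x_latent)))

x_hat = torch.randn(1, latent_dim)      # latent state estimate
y_hat = H_hat(x_hat)                    # latent observation, shape (1, latent_dim)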

𝐁 = 𝐈_(32×32) (the latent space is of dimension 32)    (14)

𝐑_full = 10⁻¹⁰ ⋅ 𝐈_(64×64)² (the full space is of dimension 64 × 64)    (15)

𝐑_latent = 10⁻¹⁰ ⋅ 𝐈_(32×32)    (16)

Moreover, a 20-layer LSTM network is employed to learn the evolution of 𝐱_k in the latent space, with an input dimension of 32 and a hidden dimension of 256. This LSTM network was validated on 1% of the sequential data (80 samples) selected at the end of the whole simulation window, and all other data were utilised to train the network. Following all the processes above, the only missing component is the operator ℋ̂, which can perform reduced space assimilation. This operator can be constructed by combining various elements from the previous steps, including 1) the decoder of 𝐱_k from the 𝐱_k autoencoder, 2) the model ℋ mapping the full space 𝐱_k to the full space 𝐲_k, and 3) the encoder of 𝐲_k from the 𝐲_k autoencoder, arranged in sequential order.

4.2.2. 3DVar: latent space 𝐱̂_k to full space 𝐲_k
In this scenario, the latent assimilation was conducted using an observation operator ℋ(𝐷_u(𝐱̂_k)) capable of mapping the latent space 𝐱̂_k to the full space 𝐲_k. This operator is constructed from the decoder of the 𝐱_k autoencoder and the full space 𝐱_k to full space 𝐲_k observation operator denoted as ℋ. In the case of 3DVar, three specific time instances are selected for latent assimilation. To be precise, latent assimilation is performed every 50 forward predictions, corresponding to the 1850th (t₁ = 0.185 s), 1900th (t₂ = 0.19 s), and 1950th (t₃ = 0.195 s) recording steps within the full simulation time window.

At each assimilation time point, the cost function for optimisation is constructed using 𝐁 in Eq. (14) and 𝐑_full in Eq. (15). The optimisation algorithm employed remains Adam [34], with a learning rate of 5 and a maximum of 300 iterations.

At each step of the latent vector prediction, decoding is performed using the decoder of the 𝐱_k autoencoder. Fig. 9b presents various elements: the original 𝐱_k image, the decoded prediction image, the decoded image derived from 3DVar assimilation, the absolute difference between the original 𝐱_k image and the decoded prediction image (with differences below 0.1 truncated), and the absolute difference between the original 𝐱_k image and the decoded image produced through 3DVar assimilation (with differences below 0.1 truncated). Subsequently, the Mean Squared Error (MSE) is calculated between the reconstructed image and the original image recorded at the corresponding time instance. The resulting curve depicting the evolution of the MSE over time steps is presented in Fig. 9a. Observing both the evaluation metric plot and the image showcase, it is noticeable in Fig. 9 that each application of 3DVar reduces the model prediction error by obtaining a better state estimate at the observation step.

4.2.3. 3DVar: latent space 𝐱̂_k to latent space 𝐲̂_k
In this context, latent assimilation is conducted utilising the observation operator denoted as ℋ̂, responsible for mapping the latent space 𝐱̂_k to the latent space 𝐲̂_k. ℋ̂ is an MLP neural network, trained separately to link the latent state space and the latent observation space (see Eq. (13)). Following the previous approach, the same three time instances were selected for latent assimilation. At each assimilation time point, the optimisation process constructs a cost function utilising 𝐁 in Eq. (14) and 𝐑_latent in Eq. (16). The settings of the optimisation algorithm remain unchanged, and the evaluation procedure also remains the same as before. To provide a comprehensive assessment of the performance of the latent assimilation, the RRMSE and the Structural Similarity Index Measure (SSIM) were evaluated in addition to the MSE metric. These evaluation metrics are presented in Table 2. As shown by Fig. 10, it is evident that both the latent space 𝐱̂_k to full space 𝐲_k and the latent space 𝐱̂_k to latent space 𝐲̂_k assimilation effectively reduce model prediction errors, but the reconstructed images show that the assimilation from latent space 𝐱̂_k to full space 𝐲_k slightly outperforms the latent data assimilation scheme with 𝐱̂_k and 𝐲̂_k. One possible reason is that the full space 𝐲_k contains more effective information than the highly compressed latent space 𝐲̂_k; however, the assimilation scheme adopted in Section 4.2.2 is relatively computationally expensive, because the optimisation step involves the calculation of a noticeably larger observation covariance matrix 𝐑.
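To make the latent-to-full workflow of Section 4.2.2 concrete, the sketch below assembles the composed observation operator ℋ(𝐷_u(·)) and configures a 3DVar case through the CaseBuilder interface summarised in Appendix A. The top-level import, the setter names derived from Listing 4, the string used to select the algorithm, and all placeholder modules and tensors are assumptions for illustration; in practice the decoder and ℋ would be the pretrained networks described above.

import torch
from torch import nn
from torchda import CaseBuilder  # assumes CaseBuilder is exposed at the top level of the torchda package

# Placeholder modules standing in for the pretrained decoder D_u and the full-space model H;
# in practice these would be the trained networks from the previous subsections.
decoder_x = nn.Sequential(nn.Linear(32, 64 * 64), nn.Unflatten(1, (1, 64, 64)))
full_obs = nn.Sequential(nn.Flatten(), nn.Identity())

def latent_to_full_obs(x_latent: torch.Tensor) -> torch.Tensor:
    """Composed observation operator H(D_u(x_hat_k)) of Section 4.2.2."""
    return full_obs(decoder_x(x_latent))

# Covariances from Eqs. (14)-(15); the flattened full observation space has 64 * 64 entries.
B = torch.eye(32)
R_full = 1e-10 * torch.eye(64 * 64)

x_background = torch.zeros(1, 32)      # latent background state at the assimilation step (placeholder)
y_observed = torch.zeros(1, 64 * 64)   # recorded full-space observation y_k (placeholder)

case = (
    CaseBuilder("3dvar_latent_to_full")
    .set_algorithm("3DVar")                      # how the algorithm is named is an assumption
    .set_observation_model(latent_to_full_obs)
    .set_background_covariance_matrix(B)
    .set_observation_covariance_matrix(R_full)
    .set_background_state(x_background)
    .set_observations(y_observed)
    .set_learning_rate(5.0)
    .set_max_iterations(300)
)
results = case.execute()

The same pattern would apply to the latent-to-latent scheme of Section 4.2.3 by replacing the composed operator with the network ℋ̂ and 𝐑_full with 𝐑_latent.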


Fig. 10. Latent space 𝐱̂ 𝑘 to latent space 𝐲̂ 𝑘 : MSE for 3DVar latent assimilation and images showcase.

Table 2
Evaluation metrics on each data assimilation (DA) point in 3DVar.

time step                                          1850th             1900th             1950th
No Assimilation                        MSE         0.0019             0.0053             0.0098
                                       RRMSE       0.0830             0.1382             0.1881
                                       SSIM        0.8974             0.7700             0.6080
Latent Space 𝐱̂_k to Full Space 𝐲_k     MSE         0.0003 (84.21%↑)   0.0003 (94.34%↑)   0.0002 (97.96%↑)
                                       RRMSE       0.0316 (61.93%↑)   0.0308 (77.71%↑)   0.0290 (84.58%↑)
                                       SSIM        0.9726 (7.73%↑)    0.9707 (20.68%↑)   0.9745 (37.61%↑)
Latent Space 𝐱̂_k to Latent Space 𝐲̂_k   MSE         0.0007 (63.16%↑)   0.0007 (86.79%↑)   0.0005 (94.90%↑)
                                       RRMSE       0.0507 (38.92%↑)   0.0518 (62.52%↑)   0.0443 (76.45%↑)
                                       SSIM        0.9473 (5.27%↑)    0.9459 (18.60%↑)   0.9572 (36.48%↑)
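For completeness, the following sketch shows one way the metrics in Table 2 could be reproduced; the RRMSE convention (L2 norm of the error divided by the L2 norm of the reference) and the use of scikit-image for the SSIM are assumptions about the exact definitions adopted here.

import numpy as np
from skimage.metrics import structural_similarity

def rrmse(pred: np.ndarray, ref: np.ndarray) -> float:
    # Relative RMSE: error norm divided by reference norm (assumed convention).
    return float(np.linalg.norm(pred - ref) / np.linalg.norm(ref))

def evaluate(pred_img: np.ndarray, ref_img: np.ndarray) -> dict:
    # MSE, RRMSE and SSIM between a reconstructed field and the reference snapshot.
    return {
        "MSE": float(np.mean((pred_img - ref_img) ** 2)),
        "RRMSE": rrmse(pred_img, ref_img),
        "SSIM": float(structural_similarity(pred_img, ref_img,
                                            data_range=float(ref_img.max() - ref_img.min()))),
    }

metrics = evaluate(np.random.rand(64, 64), np.random.rand(64, 64))  # placeholder 64x64 fields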

4.2.4. 4DVar: latent space 𝐱̂_k to full space 𝐲_k
In this scenario, latent assimilation is conducted using the full space operator ℋ, consistent with Section 4.2.2. The latent assimilation is initiated at the 1900th recorded step (t₁ = 0.19 s) and is combined with another observation recorded at the 1910th step (t₂ = 0.191 s) within the full simulation time window. At the assimilation time point, the cost function for optimisation is constructed using 𝐁 in Eq. (14) and 𝐑_full in Eq. (15). The optimisation algorithm remains Adam [34], with a learning rate of 2.5 and a maximum of 100 iterations. The evaluation procedure remains the same as before. It is noticeable in Fig. 11 that 4DVar is also capable of correcting model prediction errors by obtaining a better state estimate, but this better estimate involves consideration of all observations in the whole assimilation window.

4.2.5. 4DVar: latent space 𝐱̂_k to latent space 𝐲̂_k
In this scenario, latent assimilation is conducted utilising the observation operator denoted as ℋ̂. Similar to the previous approach, the same time instances were selected as the assimilation points. At these designated assimilation points, the cost function for optimisation is constructed utilising 𝐁 in Eq. (14) and 𝐑_latent in Eq. (16). The setup of the optimisation algorithm remains unchanged, and the evaluation procedure remains the same as before. To further inspect the performance of the latent assimilation results, assessments using the RRMSE and the SSIM in addition to the MSE metric were also conducted, and these findings are presented in Table 3 and Fig. 12. It is also evident that both assimilation schemes involving 4DVar can effectively reduce model prediction errors, and the scheme involving the full space 𝐲_k still slightly outperforms the other scheme stated in this section. However, similar to the 3DVar case, the computational cost of the assimilation scheme in Section 4.2.4 is also relatively high, because the optimisation step includes a significantly larger matrix 𝐑.

4.2.6. Results and discussion
It is evident that all outcomes resulting from the data assimilation process surpass the performance of pure forward predictions conducted without data assimilation. Furthermore, it is notable that, within the entire simulation time window, 3DVar exhibits a slightly superior outcome to 4DVar at the 1900th step for the latent 𝐱̂_k to full space 𝐲_k case. An important contributing factor to this distinction is the difference in optimisation steps, with 3DVar employing 300 iterations while 4DVar utilises only 100 iterations. This difference arises because 3DVar involves only a one-to-one state assimilation without the forward predictions required in 4DVar. Consequently, 4DVar is relatively computationally expensive. In practice, the computational time for 300 iterations in 3DVar and 100 iterations in 4DVar is roughly equivalent in benchmark testing.
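A hedged sketch of how the 4DVar configuration of Section 4.2.4 might be expressed with the CaseBuilder interface is given below; parameter names follow Listing 4 in Appendix A, while the algorithm string, the format of observation_time_steps, and the placeholder modules and tensors are illustrative assumptions rather than the exact setup used here.

import torch
from torch import nn
from torchda import CaseBuilder  # assumed top-level import, as in the 3DVar sketch above

# Placeholders standing in for the trained latent LSTM surrogate and the composed
# observation operator H(D_u(.)) of Section 4.2.2.
latent_lstm = nn.LSTM(input_size=32, hidden_size=256, num_layers=20)
latent_to_full_obs = nn.Sequential(nn.Linear(32, 64 * 64))

B = torch.eye(32)                      # Eq. (14)
R_full = 1e-10 * torch.eye(64 * 64)    # Eq. (15)
x_background = torch.zeros(1, 32)      # latent background state at the 1900th record (placeholder)
y_window = torch.zeros(2, 64 * 64)     # observations at the 1900th and 1910th records (placeholder)

case_4dvar = (
    CaseBuilder("4dvar_latent_to_full")
    .set_algorithm("4DVar")                       # naming convention assumed
    .set_forward_model(latent_lstm)
    .set_observation_model(latent_to_full_obs)
    .set_background_covariance_matrix(B)
    .set_observation_covariance_matrix(R_full)
    .set_background_state(x_background)
    .set_observations(y_window)
    .set_observation_time_steps([1900, 1910])     # record indices; exact format is an assumption
    .set_learning_rate(2.5)
    .set_max_iterations(100)
)
results_4dvar = case_4dvar.execute()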


Fig. 11. Latent space 𝐱̂ 𝑘 to full space 𝐲𝑘 : MSE for 4DVar latent assimilation and images showcase.

Fig. 12. Latent space 𝐱̂ 𝑘 to latent space 𝐲̂ 𝑘 : MSE for 4DVar latent assimilation and images showcase.

Hence, it becomes important for users to carefully balance computational costs against the quality of results when selecting an algorithm. In addition, the average inference time for all experiments conducted in the shallow water model is listed in Table 4; all tests were conducted on an NVIDIA GeForce RTX 3080 Ti Laptop GPU. The benchmarking results indicate a relatively acceptable time consumption for data assimilation aided forward predictions.

By testing the EnKF, 3DVar, and 4DVar algorithms that are commonly adopted in data assimilation on the Lorenz 63 system and the shallow water example, the result of each test case indicates that the package was implemented correctly. Throughout the whole process of utilising the package, users only need to set up the relevant parameters in a CaseBuilder object before launching the execution of the actual algorithm.
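As a rough illustration of the three configuration routes described later in this section (a Parameters object, a plain dictionary, and chained setter calls), consider the following sketch; the field values, the device string, and the acceptance of a plain dictionary by set_parameters are assumptions based on the description in this section and on the interfaces outlined in Appendix A.

import torch
from torchda import CaseBuilder, Parameters  # assumed top-level imports

# Minimal placeholder ingredients (shapes and values are illustrative only).
obs_operator = torch.nn.Identity()
B = torch.eye(32)
R = 1e-10 * torch.eye(32)
x_background = torch.zeros(1, 32)
y_observed = torch.zeros(1, 32)

# 1) A Parameters object (field names as in Listing 4, Appendix A).
params = Parameters(algorithm="3DVar", device="cuda",
                    observation_model=obs_operator,
                    background_covariance_matrix=B,
                    observation_covariance_matrix=R,
                    background_state=x_background,
                    observations=y_observed)
case = CaseBuilder(parameters=params)

# 2) Dictionary passing (an alternative route for the same settings).
case.set_parameters({"learning_rate": 5.0, "max_iterations": 300})

# 3) Chained setter calls (equivalent to the dictionary above).
case.set_learning_rate(5.0).set_max_iterations(300)

results = case.execute()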


Table 3
Evaluation metrics on each data assimilation (DA) point in 4DVar.

time step                                          1900th             1910th
No Assimilation                        MSE         0.0053             0.0060
                                       RRMSE       0.1382             0.1477
                                       SSIM        0.7700             0.7408
Latent Space 𝐱̂_k to Full Space 𝐲_k     MSE         0.0003 (94.34%↑)   0.0006 (90.00%↑)
                                       RRMSE       0.0318 (76.99%↑)   0.0456 (69.13%↑)
                                       SSIM        0.9697 (20.59%↑)   0.9564 (22.54%↑)
Latent Space 𝐱̂_k to Latent Space 𝐲̂_k   MSE         0.0007 (86.79%↑)   0.0006 (90.00%↑)
                                       RRMSE       0.0488 (64.69%↑)   0.0456 (69.13%↑)
                                       SSIM        0.9469 (18.68%↑)   0.9564 (22.54%↑)

Table 4
Average inference time for the 20-layer LSTM from the 1800th record to the 2000th record.

                                                   No Assimilation   With Assimilation
3DVar   Latent Space 𝐱̂_k to Full Space 𝐲_k         0.01 s            0.43 s
        Latent Space 𝐱̂_k to Latent Space 𝐲̂_k       0.01 s            0.07 s
4DVar   Latent Space 𝐱̂_k to Full Space 𝐲_k         0.01 s            0.13 s
        Latent Space 𝐱̂_k to Latent Space 𝐲̂_k       0.01 s            0.05 s
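The timings in Table 4 depend on hardware and on the number of optimisation iterations; a generic way to obtain comparable wall-clock figures is sketched below (a simple timing helper, not part of the TorchDA interface).

import time
import torch

def average_wall_time(fn, repetitions: int = 10) -> float:
    # Average wall-clock time of a callable, synchronising the GPU when one is present.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repetitions):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repetitions

# Hypothetical usage, comparing a plain forward rollout with an assimilation-aided run:
# t_forward = average_wall_time(lambda: lstm_rollout(latent_state, steps=200))
# t_assim = average_wall_time(lambda: case.execute())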

ℳ_k and ℋ can be set at any time before running the algorithm, and both should be callable objects, which is a loose constraint on ℳ_k and ℋ. This package is capable of handling either explicit or implicit ℳ_k and ℋ in any neural network architecture, which enhances flexibility and compatibility. Similarly, the data assimilation algorithm is provided as a selectable option and can also be set at any time before launching the execution. Users have at least three different approaches to set up parameters such as the algorithm, ℳ_k, and ℋ: passing a Parameters object, passing a dictionary, or calling setter methods. All setter methods support chained calling, so users are less likely to set up parameters incorrectly. Therefore, this package provides user-friendly interfaces without losing the flexibility of handling a nonlinear state transformation function ℳ_k and an observation operator ℋ represented by any neural network architecture.

5. Conclusion and future work

In this study, a novel Python package TorchDA was developed for seamlessly integrating data assimilation with deep learning models. TorchDA provides implementations of various algorithms including Kalman Filter, EnKF, 3DVar, and 4DVar methods. It allows users to substitute conventional numerical models with neural networks trained as models for the state transformation function and observation operator. This novel package offers the capability to conduct variational or Kalman-type data assimilation using a non-explicit transformation function represented by neural networks. This feature addresses a significant limitation of current data assimilation tools, which are unable to handle such non-explicit functions.

The functionality and effectiveness of the package were demonstrated through comprehensive experiments on the Lorenz 63 system and the shallow water equations. The Lorenz 63 example illustrated the capabilities of the EnKF and 4DVar algorithms integrated with multilayer LSTM networks to track the reference trajectory. The shallow water model assessment involved autoencoders and residual networks acting as observation operators between different physical quantities, validating the latent assimilation workflow. In both experiments, assimilated results using the package significantly outperformed raw model predictions without assimilation. Overall, this innovative package offers researchers a flexible tool to harness the representation power of deep learning for modelling within the data assimilation paradigm, including application examples such as ocean climate prediction, nuclear engineering, hydraulics, and fluid dynamics presented in the introduction section.

Future work can focus on expanding the built-in algorithm collection with more advanced techniques, such as differentiable EnKF [36], sigma-point Kalman Filters (SPKF), and Particle Filters (PF) [56]. This package can also incorporate hybrid data assimilation techniques in the future, including ETKF-3DVAR [59], Ensemble 4DVar (En4DVar) [64], and 4D Ensemble-Variational (4DEnVar) [64] methods. Furthermore, it is worth investigating scientific methods for evaluating the covariance matrices 𝐁_k and 𝐑_k in the future. The package demonstrates promising potential to facilitate novel deep learning and data assimilation integrated solutions, advancing scientific research across domains characterised by high dimensional nonlinear dynamics. Effort will also be dedicated to applying the new TorchDA package to real-world data assimilation problems, where more engineering techniques, such as localization and inflation schemes, will be further examined within the proposed package. To reduce the computational burden of 4DVar algorithms, future implementations will include an official setup of the incremental 4DVar algorithms [25] within the TorchDA package. Additionally, for the crucial task of covariance specification in data assimilation, covariance tuning methods (e.g., [21]) are planned for incorporation in future versions.

CRediT authorship contribution statement

Sibo Cheng: Writing – original draft, Software, Methodology, Formal analysis, Data curation, Conceptualization. Jinyang Min: Writing – original draft, Software, Data curation. Che Liu: Writing – review & editing, Software, Methodology. Rossella Arcucci: Writing – review & editing, Supervision, Project administration.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.


Acknowledgments

The first author acknowledges the support of the French Agence Nationale de la Recherche (ANR) under reference ANR-22-CPJ2-0143-01.

Appendix A. Python interfaces

Listing 4 General structure of the Parameters class.

@dataclass
class Parameters:
    algorithm
    device
    observation_model
    background_covariance_matrix
    observation_covariance_matrix
    background_state
    observations
    """Following parameters are OPTIONAL"""
    forward_model
    output_sequence_length
    observation_time_steps
    gaps
    num_ensembles
    start_time
    max_iterations
    learning_rate
    record_log
    args

Listing 5 General structure of the CaseBuilder class.

class CaseBuilder:
    def __init__(self, case_name=None, parameters=None):
        # Initialize case_name and parameters
        # If case_name is None, set it to the current timestamp
        # Initialize parameters as an instance of Parameters class

    # Method to set a batch of parameters
    def set_parameters(self, parameters):
        # If parameters is an instance of Parameters:
        #     Convert it to a dictionary
        # Create a new CaseBuilder object called checked_builder
        # For each parameter in parameters:
        #     Check if the parameter exists in Parameters class
        #     Check if there is a corresponding setter method
        #     Set the parameter using the setter method
        # Update the parameters of this object

    # Method to set an individual parameter
    def set_parameter(self, name, value):
        # Check if the parameter name exists in Parameters class
        # Check if there is a corresponding setter method for the parameter
        # Call the setter method with the given value

    # Methods to set various configuration parameters
    def set_[parameter_name](self, [parameter_name]):
        ...

    # Method to execute the data assimilation case
    def execute(self):
        # Set input parameters for execution
        # Run the data assimilation case
        # Return the results as a dictionary

    # Methods to retrieve results and parameters
    def get_results_dict(self):
        # Get the dictionary containing results

    def get_result(self, name):
        # Get a specific result by name

    def get_parameters_dict(self):
        # Get the configured parameters as a dictionary

Listing 6 General structure of the _Executor class.

class _Executor:
    def set_input_parameters(parameters):
        # Set input parameters for data assimilation
        # Update parameters of the Executor

    def __check_[algorithm]_parameters():
        # Check algorithm parameters for validity

    def __call_apply_[algorithm]():
        # Call apply_[algorithm] with configured parameters
        # Return algorithm results

    def __setup_device():
        # Set up computation device (CPU or GPU) based on user preference

    def run():
        # Run selected data assimilation algorithm
        # Return results as a dictionary

    def get_results_dict():
        # Get deep copy of results dictionary

    def get_result(name):
        # Get deep copy of a specific result by name

References

[1] M. Amendola, R. Arcucci, L. Mottet, C.Q. Casas, S. Fan, C. Pain, P. Linden, Y.-K. Guo, Data assimilation in the latent space of a neural network, preprint, arXiv:2012.12056, 2020.
[2] R. Arcucci, L. Mottet, C. Pain, Y.-K. Guo, Optimal reduced space for variational data assimilation, J. Comput. Phys. 379 (2019) 51–69.
[3] R. Arcucci, J. Zhu, S. Hu, Y.-K. Guo, Deep data assimilation: integrating deep learning with data assimilation, Appl. Sci. 11 (3) (2021) 1114.
[4] A. Bain, D. Crisan, Fundamentals of Stochastic Filtering, vol. 3, Springer, 2009.
[5] P. Binev, A. Cohen, W. Dahmen, R. DeVore, G. Petrova, P. Wojtaszczyk, Data assimilation in reduced modeling, SIAM/ASA J. Uncertain. Quantificat. 5 (1) (2017) 1–29.
[6] M. Bocquet, J. Brajard, A. Carrassi, L. Bertino, Bayesian inference of chaotic dynamics by merging data assimilation, machine learning and expectation-maximization, preprint, arXiv:2001.06270, 2020.
[7] M. Bocquet, P. Sakov, Joint state and parameter estimation with an iterative ensemble Kalman smoother, Nonlinear Process. Geophys. 20 (5) (2013) 803–818.
[8] P. Boudier, A. Fillion, S. Gratton, S. Gürol, S. Zhang, Data assimilation networks, J. Adv. Model. Earth Syst. 15 (4) (2023) e2022MS003353.
[9] J. Brajard, A. Carrassi, M. Bocquet, L. Bertino, Combining data assimilation and machine learning to emulate a dynamical model from sparse and noisy observations: a case study with the Lorenz 96 model, J. Comput. Sci. 44 (2020) 101171.
[10] G. Burgers, P.J. Van Leeuwen, G. Evensen, Analysis scheme in the ensemble Kalman filter, Mon. Weather Rev. 126 (6) (1998) 1719–1724.
[11] M. Camporese, M. Girotto, Recent advances and opportunities in data assimilation for physics-based hydrological modeling, Frontiers in Water 4 (2022) 948832.
[12] A. Carrassi, M. Bocquet, L. Bertino, G. Evensen, Data assimilation in the geosciences: an overview of methods, issues, and perspectives, Wiley Interdiscip. Rev.: Clim. Change 9 (5) (2018) e535.
[13] C.Q. Casas, R. Arcucci, P. Wu, C. Pain, Y.-K. Guo, A reduced order deep data assimilation model, Phys. D: Nonlinear Phenom. 412 (2020) 132615.
[14] C. Chen, Y. Dou, J. Chen, Y. Xue, A novel neural network training framework with data assimilation, J. Supercomput. 78 (17) (2022) 19020–19045.
[15] S. Cheng, J. Chen, C. Anastasiou, P. Angeli, O.K. Matar, Y.-K. Guo, C.C. Pain, R. Arcucci, Generalised latent assimilation in heterogeneous reduced spaces with machine learning surrogate models, J. Sci. Comput. 94 (1) (2023) 11.
[16] S. Cheng, D. Lucor, J.-P. Argaud, Observation data compression for variational assimilation of dynamical systems, J. Comput. Sci. 53 (2021) 101405.
[17] S. Cheng, I.C. Prentice, Y. Huang, Y. Jin, Y.-K. Guo, R. Arcucci, Data-driven surrogate model with latent data assimilation: application to wildfire forecasting, J. Comput. Phys. 464 (2022) 111302.
[18] S. Cheng, C. Quilodrán-Casas, S. Ouala, A. Farchi, C. Liu, P. Tandeo, R. Fablet, D. Lucor, B. Iooss, J. Brajard, et al., Machine learning with data assimilation and uncertainty quantification for dynamical systems: a review, IEEE/CAA J. Autom. Sin. 10 (6) (2023) 1361–1387.
[19] H. Cho, D. Venturi, G.E. Karniadakis, Numerical methods for high-dimensional kinetic equations, Uncertainty Quantification for Hyperbolic and Kinetic Equations (2017) 93–125.
[20] A. Cioaca, A. Sandu, Low-rank approximations for computing observation impact in 4D-Var data assimilation, Comput. Math. Appl. 67 (12) (2014) 2112–2126.
[21] G. Desroziers, L. Berre, B. Chapnik, P. Poli, Diagnosis of observation, background and analysis-error statistics in observation space, Q. J. R. Meteorol. Soc. 131 (613) (2005) 3385–3396.
[22] E.D.F. R&D, J.-P. ARGAUD, ADAO documentation - ADAO documentation, 2023.


[23] G. Evensen, F.C. Vossepoel, P.J. van Leeuwen, 3DVar and SC-4DVar for the Lorenz 63 model recursive smoother, in: Data Assimilation Fundamentals: A Unified Formulation of the State and Parameter Estimation Problem, Springer, 2022, pp. 157–167.
[24] G. Evensen, F.C. Vossepoel, P.J. van Leeuwen, EnKF with the Lorenz equations, in: Data Assimilation Fundamentals: A Unified Formulation of the State and Parameter Estimation Problem, Springer, 2022, pp. 151–156.
[25] A. Farchi, M. Chrust, M. Bocquet, P. Laloyaux, M. Bonavita, Online model error correction with neural networks in the incremental 4D-Var framework, J. Adv. Model. Earth Syst. 15 (9) (2023) e2022MS003474.
[26] T. Frerix, D. Kochkov, J. Smith, D. Cremers, M. Brenner, S. Hoyer, Variational data assimilation with a learned inverse observation operator, in: International Conference on Machine Learning, PMLR, 2021, pp. 3449–3458.
[27] Y. Fujii, E. Rémy, H. Zuo, P. Oke, G. Halliwell, F. Gasparin, M. Benkiran, N. Loose, J. Cummings, J. Xie, et al., Observing system evaluation based on ocean data assimilation and prediction systems: on-going challenges and a future vision for designing and supporting ocean observational networks, Front. Mar. Sci. 6 (2019) 417.
[28] H. Gong, Y. Yu, Q. Li, C. Quan, An inverse-distance-based fitting term for 3D-Var data assimilation in nuclear core simulation, Ann. Nucl. Energy 141 (2020) 107346.
[29] H. Gong, T. Zhu, Z. Chen, Y. Wan, Q. Li, Parameter identification and state estimation for nuclear reactor operation digital twin, Ann. Nucl. Energy 180 (2023) 109497.
[30] L. Györfi, M. Kohler, A. Krzyzak, H. Walk, et al., A Distribution-Free Theory of Nonparametric Regression, vol. 1, Springer, 2002.
[31] C.J. Halim, K. Kawamoto, Deep Markov models for data assimilation in chaotic dynamical systems, in: Advances in Artificial Intelligence: Selected Papers from the Annual Conference of Japanese Society of Artificial Intelligence (JSAI 2019) 33, Springer, 2020, pp. 37–44.
[32] A.H. Jazwinski, Stochastic Processes and Filtering Theory, Courier Corporation, 2007.
[33] M. Katzfuss, J.R. Stroud, C.K. Wikle, Understanding the ensemble Kalman filter, Am. Stat. 70 (4) (2016) 350–357.
[34] D.P. Kingma, J. Ba, Adam: a method for stochastic optimization, preprint, arXiv:1412.6980, 2014.
[35] C. Liu, R. Fu, D. Xiao, R. Stefanescu, P. Sharma, C. Zhu, S. Sun, C. Wang, EnKF data-driven reduced order assimilation system, Eng. Anal. Bound. Elem. 139 (2022) 46–55.
[36] X. Liu, G. Clark, J. Campbell, Y. Zhou, H.B. Amor, Enhancing state estimation in robots: a data-driven approach with differentiable ensemble Kalman filters, preprint, arXiv:2308.09870, 2023.
[37] E.N. Lorenz, Deterministic nonperiodic flow, J. Atmos. Sci. 20 (2) (1963) 130–141.
[38] J. Mack, R. Arcucci, M. Molina-Solana, Y.-K. Guo, Attention-based convolutional autoencoders for 3D-variational data assimilation, Comput. Methods Appl. Mech. Eng. 372 (2020) 113291.
[39] R. Maulik, V. Rao, J. Wang, G. Mengaldo, E. Constantinescu, B. Lusch, P. Balaprakash, I. Foster, R. Kotamarthi, Efficient high-dimensional variational data assimilation with machine-learned reduced-order models, Geosci. Model Dev. 15 (8) (2022) 3433–3445.
[40] S. Mohd Razak, A. Jahandideh, U. Djuraev, B. Jafarpour, Deep learning for latent space data assimilation in subsurface flow systems, SPE J. 27 (05) (2022) 2820–2840.
[41] H. Moradkhani, G. Nearing, P. Abbaszadeh, S. Pathiraja, Fundamentals of data assimilation and theoretical advances, in: Handbook of Hydrometeorological Ensemble Forecasting, 2018, pp. 1–26.
[42] L. Nerger, W. Hiller, J. Schröter, PDAF - the parallel data assimilation framework: experiences with Kalman filtering, in: Use of High Performance Computing in Meteorology, World Scientific, 2005, pp. 63–83.
[43] OpenDA-Association, GitHub - OpenDA-Association/OpenDA: Open data assimilation toolbox, 2023.
[44] N. Panda, M.G. Fernández-Godino, H.C. Godinez, C. Dawson, A data-driven nonlinear assimilation framework with neural networks, Comput. Geosci. 25 (2021) 233–242.
[45] S.G. Penny, T.A. Smith, T.-C. Chen, J.A. Platt, H.-Y. Lin, M. Goodliff, H.D. Abarbanel, Integrating recurrent neural networks with data assimilation for scalable data-driven state estimation, J. Adv. Model. Earth Syst. 14 (3) (2022) e2021MS002843.
[46] M. Peyron, A. Fillion, S. Gürol, V. Marchais, S. Gratton, P. Boudier, G. Goret, Latent space data assimilation by using deep learning, Q. J. R. Meteorol. Soc. 147 (740) (2021) 3759–3777.
[47] P.N. Raanes, Y. Chen, C. Grudzien, DAPPER: data assimilation with Python: a package for experimental research, J. Open Sour. Softw. 9 (94) (2024) 5150.
[48] F. Rabier, Z. Liu, Variational data assimilation: theory and overview, in: Proc. ECMWF Seminar on Recent Developments in Data Assimilation for Atmosphere and Ocean, Reading, UK, September 8–12, 2003, pp. 29–43.
[49] C.E. Rasmussen, C.K. Williams, et al., Gaussian Processes for Machine Learning, vol. 1, Springer, 2006.
[50] V. Shutyaev, Methods for observation data assimilation in problems of physics of atmosphere and ocean, Izv., Atmos. Ocean. Phys. 55 (2019) 17–31.
[51] D.J. Siefman, Development and application of data assimilation methods in reactor physics, Technical report, EPFL, 2019.
[52] L.M. Stewart, S.L. Dance, N.K. Nichols, Correlated observation errors in data assimilation, Int. J. Numer. Methods Fluids 56 (8) (2008) 1521–1527.
[53] A. Storto, G. De Magistris, S. Falchetti, P. Oddo, A neural network-based observation operator for coupled ocean-acoustic variational data assimilation, Mon. Weather Rev. 149 (6) (2021) 1967–1985.
[54] T. Suzuki, Data assimilation in fluid dynamics, Fluid Dyn. Res. 47 (5) (2015) 050001.
[55] O. Talagrand, 4D-VAR: four-dimensional variational assimilation, in: Advanced Data Assimilation for Geosciences: Lecture Notes of the Les Houches School of Physics: Special Issue, 2014, p. 1.
[56] Y. Tang, Z. Shen, Y. Gao, An Introduction to Ensemble-Based Data Assimilation Method in the Earth Sciences, Nonlinear Systems - Design, Analysis, Estimation and Control, 2016.
[57] P.J. Van Leeuwen, A consistent interpretation of the stochastic version of the ensemble Kalman filter, Q. J. R. Meteorol. Soc. 146 (731) (2020) 2815–2825.
[58] B.S. Venant, Theorie du mouvement non permanent des eaux avec application aux crues des rivieres et a l'introduction des marees dans leur lits, C. R. Seances Acad. Sci. 73 (1871) 147–154.
[59] X. Wang, D.M. Barker, C. Snyder, T.M. Hamill, A hybrid ETKF-3DVar data assimilation scheme for the WRF model. Part II: real observation experiments, Mon. Weather Rev. 136 (12) (2008) 5132–5147.
[60] Y. Wang, X. Shi, L. Lei, J.C.-H. Fung, Deep learning augmented data assimilation: reconstructing missing information with convolutional autoencoders, Mon. Weather Rev. 150 (8) (2022) 1977–1991.
[61] J. Wei, X. Luo, H. Huang, W. Liao, X. Lei, J. Zhao, H. Wang, Enable high-resolution, real-time ensemble simulation and data assimilation of flood inundation using distributed GPU parallelization, J. Hydrol. 619 (2023) 129277.
[62] D. Xiao, J. Du, F. Fang, C. Pain, J. Li, Parameterised non-intrusive reduced order methods for ensemble Kalman filter data assimilation, Comput. Fluids 177 (2018) 69–77.
[63] W. Zhan, G. Wu, H. Gao, Efficient decentralized stochastic gradient descent method for nonconvex finite-sum optimization problems, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, 2022, pp. 9006–9013.
[64] S. Zhu, B. Wang, L. Zhang, J. Liu, Y. Liu, J. Gong, S. Xu, Y. Wang, W. Huang, L. Liu, et al., A four-dimensional ensemble-variational (4DEnVar) data assimilation system based on GRAPES-GFS: system description and primary tests, J. Adv. Model. Earth Syst. 14 (7) (2022) e2021MS002737.

