ROM Paper
Rakesh Halder ∗a,b, Mohammadmehdi Ataei b, Hesam Salehipour b, Krzysztof Fidkowski a, Kevin Maki c
a Department of Aerospace Engineering, University of Michigan, Ann Arbor, MI, USA 48109
b Autodesk Research, Toronto, ON, CAN M5G 1M1
c Department of Naval Architecture and Marine Engineering, University of Michigan, Ann Arbor, MI, USA 48109
ABSTRACT
The use of deep learning has become increasingly popular in reduced-order models (ROMs) to obtain
low-dimensional representations of full-order models. Convolutional autoencoders (CAEs) are often
used to this end as they are adept at handling data that are spatially distributed, including solutions to
partial differential equations. When applied to unsteady physics problems, ROMs also require a model
for time-series prediction of the low-dimensional latent variables. Long short-term memory (LSTM)
networks, a type of recurrent neural network useful for modeling sequential data, are frequently
employed in data-driven ROMs for autoregressive time-series prediction. When making predictions
at unseen design points over long time horizons, error propagation is a frequently encountered issue,
where errors made early on can compound over time and lead to large inaccuracies. In this work, we
propose using bagging, a commonly used ensemble learning technique, to develop a fully data-driven
ROM framework referred to as the CAE-eLSTM ROM that uses CAEs for spatial reconstruction of
the full-order model and LSTM ensembles for time-series prediction. When applied to two unsteady
fluid dynamics problems, our results show that the presented framework effectively reduces error
propagation and leads to more accurate time-series prediction of latent variables at unseen points.
1 Introduction
Physics-based numerical simulation has become an indispensable tool in engineering and scientific applications,
allowing for accurate representations of complex physical phenomena. Physical simulations are formulated as sets of
governing equations, typically in the form of parameterized partial differential equations (PDEs) that are discretized over
a computational domain. Simulations often involve a set of design parameters µ that govern aspects such as boundary
conditions, the geometry of the computational domain, and physical properties. In tasks like design optimization,
where numerous simulations are run for different designs, it is important to achieve high accuracy, or fidelity. However,
high-fidelity simulation comes at a large computational cost, and this can lead to a bottleneck in the design optimization
process.
Employing reduced-order models (ROMs) is a prevalent strategy to mitigate this high computational cost.1 By utilizing
a limited set of training data from computed high-fidelity simulations, ROMs construct a low-dimensional surrogate
model that can be evaluated in real-time and provide accurate full-order model (FOM) approximations at unseen designs
within the parameter space covered by the training data. ROMs significantly reduce the degrees of freedom of the FOM
through the use of a low-dimensional embedding that can be effectively mapped back to the full-order state, leading to a
substantial decrease in computational cost. ROMs consist of two stages: an offline stage, which is computationally
intensive, involving the computation of high-fidelity solutions by solving the FOM to generate data snapshots and train
the low-dimensional surrogate model, and an online stage, where the model is used to approximate solutions at desired
points. Running the FOM requires numerically solving the governing equations over the computational domain, which
comes at a much larger computational cost than the ROM, which represents the FOM using a small number of variables.
∗ Corresponding Author: [email protected]. The presented research was conducted during the author's internship at Autodesk Research.
Most ROMs utilize the proper orthogonal decomposition,2 a linear method that uses the singular value decomposition
(SVD) to obtain a low-rank subspace consisting of a number of linearly independent basis vectors. A linear combination
of these basis vectors is computed using a set of expansion coefficients to approximate full-order states within the
solution space. The POD basis vectors are interpretable, allowing for visualization of dominant physical features.
Although POD is widely used, it encounters difficulties when applied to highly nonlinear problems, often requiring a
large number of basis vectors to provide reasonable accuracy.3
More recently, machine learning and artificial intelligence (AI) methods have been used to provide nonlinear mappings
between the low-dimensional embedding and high-fidelity solution space. In particular, deep learning4 approaches
have been used to develop ROMs that provide efficient nonlinear manifolds of physical systems. Convolutional
autoencoders (CAEs), a type of artificial neural network, have been used to build ROMs and shown to outperform
POD-based methods.5, 6 Artificial neural networks that use convolutional layers efficiently learn structures and patterns
present in spatially distributed data, including the solutions to PDEs. Autoencoders consist of two individual neural
networks: an encoder, which maps high-dimensional inputs to a low-dimensional latent space, and a decoder, which
maps the low-dimensional latent space to a reconstruction of the high-dimensional input. Although autoencoders can
effectively learn nonlinear relationships, they lack interpretability. Recent works have used variational autoencoders7, 8
to incorporate interpretable nonlinear modes into ROMs, although this results in lower reconstruction accuracy when
compared to vanilla autoencoders. Another approach for simulating systems governed by PDEs at a reduced cost is the
use of neural operators,9, 10 which implicitly learn the governing equations using deep neural networks. However, this
approach typically requires very large amounts of training data to cover parameter spaces effectively, limiting their use
in design optimization.
ROMs are categorized as either non-intrusive or projection-based; non-intrusive ROMs use a fully data-driven approach,
while projection-based ROMs incorporate the governing equations to solve a low-dimensional version of the FOM.
While projection-based ROMs can provide better accuracy and more physically consistent results, they incur a larger
computational cost and often exhibit stability issues11 that inhibit convergence. Additionally, they are generally not
portable between different solvers.
ROMs are developed for both steady and unsteady physics problems. The former involves computing a single
point estimate of the low-dimensional embedding, while the latter involves making predictions over a prescribed time
horizon. Unsteady non-intrusive ROMs combine either POD or CAE for spatial reconstruction of full-order states with
a model used to make time-series predictions of the low-dimensional embeddings. Deep learning methods are popular
for this as well, including long short-term memory (LSTM) neural networks12 and transformer neural networks.8 Both
LSTMs and transformers are powerful models for handling data that are sequential in nature such as time-series data.
Transformers utilize an architecture that is much more complex than the one LSTMs use, which can lead to better
performance for complex time-series problems and other tasks such as developing large language models.13 However,
this comes at an increased cost for both training and inference when compared to LSTMs, which use a significantly
lower number of parameters. The use of LSTMs for time-series forecasting is well-established,14, 15 making them a
popular choice.
When making time-series predictions at unseen data sets over a long time horizon, error propagation is a common issue.
Errors made in early predictions can accumulate and compound over time, leading to substantial inaccuracies. Unseen
data sets are particularly susceptible to this, as the model may not account for shifting data patterns. Additionally, there
is no feedback from previously seen data to correct errors. This phenomenon is commonly observed in data-driven
computational fluid dynamics (CFD) applications.16, 17 As a result, most studies in the literature focus on applying
unsteady non-intrusive ROMs to single-parameter problems,8, 12, 18 where the ROM is both trained on and used for a
single design. We have identified two previously published studies that combine CAEs and LSTMs for non-intrusive
ROMs and apply them to unseen designs.19, 20 The work by Maulik et al. does not mention the error propagation issue,
while the work by Hasegawa et al. presents results of low accuracy. A recent paper21 by Jeon et al. introduced a hybrid
AI-CFD method using flow residuals to address the issue of error propagation. However, this method is intrusive and
requires the ROM to have access to the CFD solver for computing the residuals, which leads to a more computationally
intensive online stage.
In this work, we propose the use of ensemble learning,22 a machine learning technique for improving the stability and
lowering the variance of predictive models. Ensemble learning involves combining multiple base models, referred to as
weak learners, to create a composite model that offers greater accuracy. To this end, we employ bootstrap aggregating
(bagging) as the ensemble learning method for the temporal portion of the ROM, which involves training the weak
learners on subsets of the dataset chosen randomly through sampling with replacement. The fully data-driven framework
is referred to as the CAE-eLSTM ROM, and our results show that using ensembles leads to significantly improved
stability and predictive performance when applied to two unsteady, incompressible, laminar fluid dynamics problems
using different CFD solvers.
2 Methods
In this section, we give an overview of both CAEs and LSTMs and how they are combined to develop an unsteady ROM
using bagging as an ensemble learning method. CAEs are used to both provide a low-dimensional latent representation
of the solution space and provide spatial reconstructions of full-order states. A temporal forecasting model of the latent
variables is also required when using unsteady ROMs. LSTMs, a type of recurrent neural network, are used in this
work. Although we do not provide direct comparisons to POD-based ROMs, a brief introduction to POD is given in this
section. Comparisons of reconstruction errors using POD and CAE are given in Appendix A.
Using training data obtained from a set of n solution snapshots calculated at chosen design points in the parameter
space, a snapshot matrix, X ∈ RN ×n , can be assembled.
X ∈ RN ×n = [x1 , x2 , · · · , xn ] = [x(µ1 ), x(µ2 ), · · · , x(µn )]. (1)
A subspace V associated with X exists such that V = span(X). It is assumed that V provides a good approximation of the solutions for µ within the parameter space if there is a sufficient variety of training data in X. A rank k set of orthonormal basis vectors [ψ 1 , ψ 2 , · · · , ψ k ] ∈ RN , where k ≪ N , is associated with V such that each solution xi in X can be reconstructed as a linear combination of the basis vectors,

xi ≈ ai1 ψ 1 + ai2 ψ 2 + · · · + aik ψ k , (2)

where ai are the low-dimensional expansion coefficients. The truncated singular value decomposition (SVD) of X
decomposes the matrix into two orthonormal matrices U ∈ RN ×n and V ∈ Rn×n , and a diagonal matrix Σ ∈ Rn×n
such that
X = U ΣV T . (3)
U contains a set of n left singular vectors that span the column space of X, V contains a set of n right singular vectors
that span the row space of X, and diag(Σ) ∈ Rn = [σ1 , σ2 , · · · , σn ] contains the singular values in decreasing order,
σ1 ≥ · · · ≥ σn ≥ 0. The first k vectors of U form the POD basis, Ψ ∈ RN ×k = [ψ 1 , ψ 2 , · · · , ψ k ]. The singular
values are a measure of the amount of information represented by each vector. Often, the singular values decay rapidly,
and only the first k singular vectors are chosen to form the POD basis to preserve the most important features of the
solution space. The relative information content E of the subspace is used in practice to choose a value of k,
E(k) = \frac{\sum_{j=1}^{k} \sigma_j^2}{\sum_{j=1}^{n} \sigma_j^2}, (4)
and k is chosen such that E(k) ≥ γ, where γ ∈ [0,1] is usually set to a value γ ≥ 0.95.23 Approximations of full-order
solutions at unseen design parameters x(µ∗ ) are obtained as
x(µ∗ ) ≈ Ψa∗ = a∗1 ψ 1 + a∗2 ψ 2 + · · · + a∗k ψ k . (5)
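As a concrete illustration of equations (1)-(5), the sketch below computes a truncated POD basis from a snapshot matrix with NumPy and selects the rank k from the relative information content; the snapshot matrix and tolerance used here are synthetic placeholders, not data from this work.

```python
import numpy as np

def pod_basis(X, gamma=0.95):
    """Compute a truncated POD basis from a snapshot matrix X (N x n).

    Returns the basis Psi (N x k) and the rank k chosen so that the
    relative information content E(k) >= gamma, as in equation (4).
    """
    # Thin SVD: X = U @ diag(s) @ Vt, singular values in decreasing order
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)        # E(k) for k = 1..n
    k = int(np.searchsorted(energy, gamma) + 1)    # smallest k with E(k) >= gamma
    return U[:, :k], k

def pod_reconstruct(Psi, x):
    """Project a full-order state onto the basis and reconstruct it (equation (5))."""
    a = Psi.T @ x          # expansion coefficients
    return Psi @ a         # low-rank approximation of x

# Example usage with synthetic data (N = 1000 degrees of freedom, n = 50 snapshots)
X = np.random.rand(1000, 50)
Psi, k = pod_basis(X, gamma=0.99)
x_approx = pod_reconstruct(Psi, X[:, 0])
```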
Convolutional neural networks were initially developed for computer vision applications, where they have been shown to
vastly outperform traditional statistical image processing techniques.24, 25 They have also demonstrated high predictive
performance when used in ROMs,5, 26 highlighting their versatility. Autoencoders are a type of feedforward neural
network that aim to accurately reconstruct inputs in the output layer, g : x → x̃ where x ≈ x̃. Autoencoders are
composed of two individual feedforward neural networks. The encoder genc : RN → Rk where k ≪ N maps a
high-dimensional input x into the low-dimensional latent space a, and the decoder gdec : Rk → RN maps the latent
variables back to an approximation of the high-dimensional input x̃. The combination of the two results in
g : x̃ = gdec ◦ genc (x). (6)
Deep neural networks provide nonlinear function maps through the use of activation functions. During training, the
neural network parameters are tuned using backpropagation,27 an algorithm utilizing automatic differentiation that
aims to minimize a differentiable loss function that measures the discrepancy between the computed and true outputs.
This allows for discovering highly complex and abstract functional relationships that do not follow any strong model
assumptions. The number of trainable parameters in vanilla artificial neural networks grows large with the number of
hidden layers and neurons, and can lead to a very large computational training cost.
Convolutional neural networks effectively implement parameter sharing to limit the total number of trainable parameters in the network: rather than a separate weight existing for each pair of neurons between adjacent layers, many neurons share the same set of weights. Convolutional layers use a number of filters that convolve over the input data, with each
filter having its own set of weights. Pooling layers are also used in convolutional networks to summarize the features
in input layers through operations including averaging and maximization. The input layer to a convolutional neural
network is composed of a number of channels, with each representing a different state. In images, this is generally
the levels of red, green, and blue present in each pixel, while for physical simulation data the states can represent
components of the normalized velocity, pressure, density, etc. Once an autoencoder is sufficiently trained and g(x) ≈ x
for all inputs over the training dataset X, the corresponding latent variables a can be passed to the decoder gdec (a)
to obtain accurate approximations x̃ for all data in X. High-dimensional data existing outside of the training set x∗
may also be well-approximated if an accurate approximation of the latent variables a∗ is obtained. Figure 1 shows an
example of the encoder section of a convolutional autoencoder, which is composed of convolutional, pooling, and fully
connected layers. A similar architecture in reverse would form the decoder.
Figure 1: Architecture of the encoder of a convolutional autoencoder (CAE) consisting of convolutional, pooling, and
fully connected layers.
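For reference, a minimal PyTorch convolutional autoencoder with this encoder-decoder structure is sketched below. The channel counts, kernel sizes, and latent dimension are illustrative placeholders rather than the architectures used in this work, which are listed in Appendix B.

```python
import torch
import torch.nn as nn

class CAE(nn.Module):
    """Minimal convolutional autoencoder: the encoder maps a 2-channel field to a
    k-dimensional latent vector, and the decoder maps it back to the full field."""
    def __init__(self, k=4, H=128, W=128):
        super().__init__()
        self.H, self.W = H, W
        self.encoder_conv = nn.Sequential(
            nn.Conv2d(2, 8, kernel_size=3, stride=2, padding=1),   # H/2 x W/2
            nn.LeakyReLU(0.25),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),  # H/4 x W/4
            nn.LeakyReLU(0.25),
        )
        flat = 16 * (H // 4) * (W // 4)
        self.enc_fc = nn.Linear(flat, k)
        self.dec_fc = nn.Linear(k, flat)
        self.decoder_conv = nn.Sequential(
            nn.ConvTranspose2d(16, 8, kernel_size=2, stride=2),
            nn.LeakyReLU(0.25),
            nn.ConvTranspose2d(8, 2, kernel_size=2, stride=2),
            nn.Sigmoid(),  # outputs constrained to [0, 1] after min-max scaling
        )

    def encode(self, x):
        z = self.encoder_conv(x)
        return self.enc_fc(z.flatten(start_dim=1))

    def decode(self, a):
        z = self.dec_fc(a).view(-1, 16, self.H // 4, self.W // 4)
        return self.decoder_conv(z)

    def forward(self, x):
        return self.decode(self.encode(x))
```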
The input and output layers of CAEs are often structured as two-dimensional states within each channel when used for two-dimensional physical simulations. To accommodate this structure, the training data need to be reshaped before being input into the network through the use of a reshape operator R, which maps each flattened state vector to a two-dimensional array for each channel.
Traditional recurrent neural networks are well-suited for handling sequential data; however, they have difficulties
learning and capturing long-range dependencies. A primary reason for this is the vanishing gradient problem,28 which
occurs when gradients shrink significantly as they are back-propagated through time, which can cause the model to
forget early sequence information. LSTMs were introduced29 to address this issue by using a more complex network
architecture that includes a series of gates which effectively control the flow of information. An individual LSTM cell
consists of a cell state ct which acts as an internal memory, and a hidden state ht , which serves as an output. The
cell state is responsible for selectively retaining or forgetting information from previous steps, while the hidden state
represents the model’s cumulative output. LSTM cells contain a forget gate ft , input gate it , and output gate ot . Forget
gates determine what information from previous states should be forgotten. Input gates decide what new pieces of
information should be stored in the current state. Output gates use the current and previous cell states to determine what
information is retained in the model. Using this architecture, the model is able to retain information that is relevant
for long-range sequential dependencies and discard information that becomes irrelevant over time, making LSTMs a
popular choice for sequential modeling. For a given input at ∈ Rk , the LSTM equations are given as
ft = σ(Ff (at))
it = σ(Fi (at))
ot = σ(Fo (at))
c̃t = tanh(Fc (at))
ct = ft ⊙ ct−1 + it ⊙ c̃t
ht = ot ⊙ tanh(ct), (8)
where σ is the sigmoid function, tanh is the hyperbolic tangent function, and ⊙ is the Hadamard product. F is a linear
function of the weights Wa ∈ Rnh and Wh ∈ Rnh and biases b ∈ Rnh , where nh is the number of neurons in the
hidden layer of each LSTM cell, and is given as
F = Wa at + Wh ht−1 + b. (9)
A unique set of weights and biases belongs to each gate or cell state. When using LSTMs for time-series prediction,
an autoregressive prediction is used, often referred to as the sliding window approach. For a given input sequence
containing multivariate data for w timesteps, the next step in the sequence is predicted. When making the next prediction,
the current prediction is incorporated into the input sequence by removing the first element and shifting the rest to
the prior position. Eventually, the model inputs will consist of only previously computed predictions, which makes
model performance sensitive to error propagation. The input sequence length, or window size w, is an important
hyperparameter to consider when training LSTMs. Although LSTMs are designed to effectively learn long-range
dependencies, using an input sequence length that is too long can lead to the inclusion of outdated and irrelevant
information. On the other hand, using an input sequence length that is too small can ignore important long-range
information. The optimal choice is highly problem-dependent; similar problems found in the literature can be consulted,
or methods such as cross-validation can be used to find an optimal value. A diagram of a single-layer LSTM making a
prediction one timestep ahead is shown in Figure 2.
Figure 2: Diagram of a single-layer LSTM neural network making a prediction one timestep ahead.
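A minimal sketch of this autoregressive, sliding-window prediction loop is given below using a generic PyTorch LSTM with a linear output layer; the window size, latent dimension, and model names are illustrative and do not correspond to the exact networks trained later in this work.

```python
import torch
import torch.nn as nn

class LatentLSTM(nn.Module):
    """LSTM that maps a window of w latent vectors (dimension k) to the next one."""
    def __init__(self, k=4, hidden=50, layers=2, dropout=0.1):
        super().__init__()
        self.lstm = nn.LSTM(k, hidden, num_layers=layers,
                            batch_first=True, dropout=dropout)
        self.out = nn.Sequential(nn.Linear(hidden, k), nn.Sigmoid())

    def forward(self, seq):                 # seq: (batch, w, k)
        h, _ = self.lstm(seq)
        return self.out(h[:, -1, :])        # prediction for the next timestep

def rollout(model, init_window, n_steps):
    """Autoregressive sliding-window prediction over n_steps timesteps."""
    window = init_window.clone()            # (1, w, k)
    preds = []
    model.eval()
    with torch.no_grad():
        for _ in range(n_steps):
            a_next = model(window)                          # (1, k)
            preds.append(a_next)
            # drop the oldest entry and append the new prediction
            window = torch.cat([window[:, 1:, :], a_next.unsqueeze(1)], dim=1)
    return torch.cat(preds, dim=0)          # (n_steps, k)
```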
While LSTMs are powerful models for time-series prediction, the quality of predictions over long time horizons is
highly sensitive to the weights and biases of the model. Additionally, the total number of weights and biases is usually
very large, which makes finding an optimal configuration very difficult, even when using state-of-the-art optimization
algorithms.
To mitigate the issue of high model variance, ensemble learning is a commonly used approach. By leveraging multiple
base models that exhibit high variance, ensemble methods significantly reduce errors and improve robustness. Boosting
and bootstrap aggregating (bagging) are the two main types of ensemble learning methods. Boosting30 trains individual
models sequentially, where each subsequent weak learner focuses on correcting prediction errors from the previous
ones. While boosting is widely used and offers good performance, the base models must be trained iteratively, which
incurs a large computational cost, especially in the context of neural networks.
Bagging31 is an algorithm that consists of two stages: bootstrapping and aggregation. Bootstrapping is a resampling
technique where multiple random subsets of the data set, chosen through sampling with replacement, are constructed.
The subsets of the data typically contain the same number of data points as the original data set. This approach leads to individual data points being present multiple times in the individual subsets. An example of bootstrapping is shown in Figure 3, where each bootstrapped data set contains varying numbers of instances of the original data points. Multiple base
models are trained individually on each of the bootstrapped data sets, which can be done in parallel. The aggregation
stage creates an ensemble model by taking an average of the predictions given by each weak learner. Given m weak
learners fi , the aggregate prediction f¯ is given as
\bar{f} = \frac{1}{m} \sum_{i=1}^{m} f_i. (10)
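The two stages of bagging translate into a few lines of code, as in the sketch below; `train_weak_learner` is a hypothetical placeholder for fitting any base model (here, one LSTM) on a resampled copy of the training data.

```python
import numpy as np

def bootstrap_indices(n_samples, m, rng=None):
    """Draw m bootstrap subsets, each of size n_samples, by sampling with replacement."""
    rng = rng or np.random.default_rng(0)
    return [rng.integers(0, n_samples, size=n_samples) for _ in range(m)]

def bagged_predict(models, x):
    """Aggregate the weak learners by averaging their predictions (equation (10))."""
    return np.mean([model(x) for model in models], axis=0)

# Usage sketch: `sequences` and `targets` hold the training windows and next-step
# labels, and `train_weak_learner` is a placeholder training routine.
# subsets = bootstrap_indices(len(sequences), m=64)
# models = [train_weak_learner(sequences[idx], targets[idx]) for idx in subsets]
# a_next = bagged_predict(models, current_window)
```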
Bagging reduces the issue of error propagation by averaging the errors of multiple weak learners, which will become
increasingly small as the number of learners grows, leading to much lower variance. Bagging also offers a considerable
gain in robustness and generalization by leveraging the diversity of the weak learners to increase the capability of
handling varying patterns outside of the training data. As an increasing number of weak learners are added, the reduction
in variance plateaus, leading to diminishing returns in performance gains. After a certain point, the computational cost
of adding more weak learners outweighs the marginal improvement in predictive performance. As the complexity of the
problem increases, the number of weak learners required to make accurate predictions also generally increases. While
theoretical considerations can be made to analyze the trade-offs between bias, variance, and covariance to determine
the optimal number of weak learners, this number is difficult to choose a priori. A reasonable choice is based on both the problem's complexity and on similar problems found in the literature; since no such reference problems are yet available for our setting, they are established in this work.
Figure 3: An example of bootstrapping, where random subsets of the original dataset are chosen through sampling with
replacement.
This section describes the non-intrusive ROM framework combining convolutional autoencoders for spatial reconstruc-
tion and LSTM ensembles utilizing bagging for temporal prediction, which we will refer to as a CAE-eLSTM ROM.
The offline stage is computationally intensive, requiring multiple solves of the FOM in addition to training multiple
neural networks. First, solutions to the FOM for designs µ in the training set of design parameters U train ∈ Rn×p are
obtained for all timesteps and are assembled into a matrix X. Next, a convolutional autoencoder with latent dimension
k is trained on X for a sufficient number of epochs such that inputs are accurately reconstructed. Then, the expansion
coefficients for the training data Atrain are computed by feeding the full-order states from X through the encoder genc .
Sequences of length w are then generated from Atrain and m LSTMs using the same set of initial weights and biases are
trained on individual data subsets chosen randomly through sampling with replacement. PyTorch,32 an open source
deep learning framework available for Python, is used to implement the autoencoder and LSTM using default weight
and bias initialization settings.
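A hedged sketch of this offline procedure is given below; `train_cae` and `train_lstm` are placeholders for ordinary PyTorch training loops, and sequence generation is shown for a single training trajectory for brevity.

```python
import numpy as np
import torch

def make_sequences(A, w):
    """Slice one latent trajectory A (T x k) into (window, next-step) training pairs."""
    X_seq = np.stack([A[t:t + w] for t in range(len(A) - w)])
    y_next = np.stack([A[t + w] for t in range(len(A) - w)])
    return X_seq, y_next

def offline_stage(snapshots, k, w, m, train_cae, train_lstm):
    """Sketch of the offline stage for a single training design.

    snapshots: full-order states of one training simulation, (T, channels, H, W).
    train_cae / train_lstm: placeholder training routines supplied by the user.
    In practice the CAE is trained on all designs and sequences are generated
    per design before being pooled; a single trajectory is used here for brevity.
    """
    cae = train_cae(snapshots, latent_dim=k)                        # 1. train the CAE
    with torch.no_grad():
        A = cae.encode(torch.as_tensor(snapshots).float()).numpy()  # 2. latent variables
    X_seq, y_next = make_sequences(A, w)                            # 3. sliding-window pairs
    rng = np.random.default_rng(0)
    ensemble = []
    for _ in range(m):                                              # 4. bagged LSTMs
        idx = rng.integers(0, len(X_seq), size=len(X_seq))          # bootstrap with replacement
        ensemble.append(train_lstm(X_seq[idx], y_next[idx]))
    return cae, ensemble
```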
A schematic describing the online stage of the ROM is shown in Figure 4. The online stage involves executing the
ROM to compute approximate solutions at a point µ∗ . First, the FOM is run for Ti timesteps, as an initial sequence
of latent variables is required for the LSTM ensemble. The computed full-order states are fed through the encoder to
obtain the initial sequence of latent variables A∗Ti ∈ Rk×w . The FOM must be run for at least Ti ≥ w timesteps, and
potentially longer depending on how useful simulation data from the initial timesteps are for the ROM. The latent
variables for the rest of the prediction horizon are calculated autoregressively using the LSTM ensemble by taking an
average of the predictions from the individual weak learners. After the predicted latent variables are found for each
timestep t, they are fed through the decoder to obtain approximate full-order fields. The online and offline stages are
outlined in Algorithm 1.
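Under the same assumptions, the online stage reduces to a short loop: encode the initial FOM snapshots, advance the latent variables autoregressively while averaging the weak learners at every step, and decode. The helper names below (`cae.encode`, `cae.decode`, the `ensemble` list) are illustrative, not the actual implementation.

```python
import torch

def online_stage(cae, ensemble, initial_snapshots, w, n_steps):
    """Sketch of the online stage of the CAE-eLSTM ROM.

    initial_snapshots: the first T_i full-order states computed by the FOM,
                       shape (T_i, channels, H, W), with T_i >= w.
    """
    with torch.no_grad():
        A_init = cae.encode(torch.as_tensor(initial_snapshots).float())   # (T_i, k)
        window = A_init[-w:].unsqueeze(0)                                 # last w latent states
        fields = []
        for _ in range(n_steps):
            # aggregate the weak learners by averaging their predictions (equation (10))
            a_next = torch.stack([lstm(window) for lstm in ensemble]).mean(dim=0)
            fields.append(cae.decode(a_next))                             # approximate full-order state
            # slide the window: drop the oldest latent state, append the prediction
            window = torch.cat([window[:, 1:, :], a_next.unsqueeze(1)], dim=1)
    return torch.cat(fields, dim=0)
```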
When using a ROM with a single LSTM as in previous works,19, 20 the LSTM is trained on the entire time-series dataset
and is used alone for time-series prediction. When compared to these works, the main difference in our presented ROM
framework is the use of an ensemble model for time-series prediction. Although this results in an increased offline
training cost, using an ensemble allows for better accuracy and stability over long prediction horizons. While the use of
an ensemble LSTM is simple, it can greatly improve ROM performance without requiring additional training data.
3 Results
The two test cases used to demonstrate the performance of the CAE-eLSTM ROM are a lid-driven cavity and the flow
over a cylinder. Both cases use two-dimensional computational domains. The lid-driven cavity case consists of three
design parameters controlling the geometry of the domain, while the cylinder case consists of two design parameters
controlling geometric and physical properties of the simulation. Different CFD solvers are used for each case; this
is done to illustrate the portability of the ROM and allow for the use of a structured grid for the cylinder case. Latin
hypercube sampling (LHS),33 a popular statistical method for generating samples from a multi-dimensional parameter space, is used to generate sets of design parameters that are split into training and test sets. Relative to plain uniform sampling, LHS aims to maximize the distance and minimize the correlation amongst the produced samples. The ROM is used to
predict the vertical and horizontal components of the velocity. The training data for each velocity component is scaled
to a range [0,1] using min-max scaling before input into the CAE. Similarly, the CAE latent variables are also scaled
using min-max scaling before being used for the LSTM. Data normalization improves performance and allows the
optimizer to learn optimal network parameters at a much faster rate.34 Both the CAE and LSTM are trained using the
mean squared error loss function. An NVIDIA DGX system, consisting of eight NVIDIA A100 GPUs, is used to train
the CAE and LSTMs, run the cylinder simulations, and for ROM inference. The lid-driven cavity simulations are run
on a local workstation using a single CPU core.
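A small sketch of the sampling and scaling steps is given below using SciPy's quasi-Monte Carlo module and a simple min-max transform; the parameter bounds and split sizes shown are illustrative placeholders rather than the values used in this work.

```python
import numpy as np
from scipy.stats import qmc

# Latin hypercube samples in a p-dimensional parameter space; bounds are placeholders.
p = 3
sampler = qmc.LatinHypercube(d=p, seed=0)
unit_samples = sampler.random(n=100)                      # samples in [0, 1]^p
l_bounds, u_bounds = [1.2, 1.3, -0.5], [1.8, 1.8, 0.5]    # illustrative bounds only
designs = qmc.scale(unit_samples, l_bounds, u_bounds)
train, test = designs[:90], designs[90:]                  # illustrative train/test split

def minmax_scale(data):
    """Scale a data array to [0, 1] (min-max scaling); also return the stats for inversion."""
    d_min, d_max = data.min(), data.max()
    return (data - d_min) / (d_max - d_min), (d_min, d_max)

def minmax_unscale(scaled, stats):
    """Invert the min-max scaling using the stored minimum and maximum."""
    d_min, d_max = stats
    return scaled * (d_max - d_min) + d_min
```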
Figure 4: Schematic of the online stage of the CAE-eLSTM ROM.
The performance of the ensemble ROM is tested against a CAE-LSTM ROM that uses a single LSTM model. Using
five different random initializations of the weights and biases prescribed by different seeds, the time-series of latent
variables are visually compared in addition to performance metrics averaged over the seeds. When training multiple
LSTMs on bootstrapped data sets for a single seed s, the same initial weights and biases are used for each individual
weak learner. Evaluating the ROM’s performance using different seeds allows for testing its sensitivity to the initial
weights and biases. Ns = 5 different seeds are used to initialize the LSTM model, with s =[1, 2, 3, 4, 5]. For a single
test point µ∗ and seed s, an error metric of interest is the relative L2 error in a field component averaged over the
prediction time horizon,
\epsilon_s = \frac{1}{T - T_i} \sum_{t=T_i+1}^{T} \frac{\| x_t - \tilde{x}_t \|_2}{\| x_t \|_2}. (11)
The errors are then averaged over each seed to give a single error metric ϵ̄
\bar{\epsilon} = \frac{1}{N_s} \sum_{s=1}^{N_s} \epsilon_s. (12)
We are also interested in how sensitive the error is to the initial weights and biases of the model. The sample standard
deviation σs of ϵs is used as a measure of this,
σs = std(ϵs ). (13)
A lower standard deviation in the error term indicates that the model performance over the prediction horizon does not
vary much with the initial weights and biases.
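These error metrics map directly onto a few lines of NumPy, sketched below under the assumption that the true and predicted fields over the prediction horizon are available as arrays.

```python
import numpy as np

def relative_error(x_true, x_pred):
    """Relative L2 error averaged over the prediction horizon (equation (11)).

    x_true, x_pred: arrays of shape (n_timesteps, N) covering t = T_i+1 .. T.
    """
    num = np.linalg.norm(x_true - x_pred, axis=1)
    den = np.linalg.norm(x_true, axis=1)
    return np.mean(num / den)

def seed_statistics(errors_per_seed):
    """Seed-averaged error and its sample standard deviation (equations (12)-(13))."""
    errors = np.asarray(errors_per_seed)
    return errors.mean(), errors.std(ddof=1)

# Example: eps_bar, sigma_s = seed_statistics([relative_error(Xt, Xp) for Xp in preds])
```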
3.1 Lid-Driven Cavity
The first test case is a geometrically parameterized two-dimensional lid-driven cavity flow, a popular benchmark
problem for CFD solvers. Unsteady, incompressible, laminar flow is simulated by solving the Navier-Stokes equations,
given as
\int_S \vec{V} \cdot d\vec{n}\, dS = 0, (14)

\int_\Omega \frac{\partial \vec{V}}{\partial t}\, d\Omega + \int_S \vec{V} \vec{V} \cdot d\vec{n}\, dS + \int_\Omega \nabla p\, d\Omega - \int_S \nu \left( \nabla \vec{V} + \nabla \vec{V}^T \right) \cdot d\vec{n}\, dS = 0, (15)

where \vec{V} = [u, v] is the velocity vector, Ω is the fluid domain, S is the face-area vector, ⃗n is the outward-pointing normal, ν is the kinematic viscosity, and p is the pressure. OpenFOAM,35 an open-source toolbox for multiphysics simulation, is
used. The boundary conditions and design parameters are shown in Figure 5. On each edge Γi , i ∈ [1, 2, 3, 4] of the
computational domain Ω, the pressure gradient ∇p is set to 0. No slip conditions exist on all of the edges except the top,
where u = 1, v = 0. The reference pressure is set to p = 0 at the bottom left corner. The first design parameter µ1
controls the horizontal length, the second µ2 controls the slanting length, while the third µ3 controls the slanting angle.
The Reynolds number Re is set to 400, and is related to the kinematic viscosity as
Re = 400 = \frac{\max(\mu_1, \mu_2)}{\nu(\mu)}. (16)
The computational mesh consists of 128 × 128 cells distributed uniformly in the x and y directions. A single cell exists
in the spanwise direction z, resulting in N = 16384. The initial condition is the solution to steady, incompressible,
laminar flow at Re = 20 which is computed using the standard OpenFOAM solver simpleFoam. Unsteady flow is
simulated using the OpenFOAM solver icoFoam for T = 5 seconds with data being saved every 0.025 seconds, leading
to 200 time snapshots for a single simulation. Both solvers utilize the finite volume method (FVM). Figure 6 shows
contours of u and v at T = 5 at three different sets of design parameters. A vortex that varies in shape, size, and
location with the design parameters is shown for both velocity components. The variation of the flow with µ is highly
nonlinear, making this a difficult prediction problem for ROMs.
A total of 100 sets of design parameters are generated and randomly split into 90 training samples and 10 test points, which are
given in Table I. The CAE-eLSTM ROM uses m = 64 bagged LSTMs and a window size w = 20. These parameters
were chosen through a trial-and-error process with the goal of maximizing predictive accuracy while trying to keep the
computational cost of training the LSTM ensemble relatively low. This was done by directly evaluating the performance
on the test set, while in practice methods like cross-validation would be used. Initial guesses for the window size were
based on similar problems found in the literature. An initial number of approximately 50 weak learners was used, which
was increased in increments of 8 until the performance gains plateaued. The CAE latent dimension is set to k = 4;
below this value, the CAE reconstruction errors were found to be worse, and increasing k offered no improvement. The
LSTM architecture consists of two hidden layers consisting of 50 neurons each and a dropout36 rate of 0.1. The output
layer contains a sigmoid activation function so the outputs are constrained to a range of [0, 1]. The CAE architecture is
given in Appendix B. The Adam optimizer37 is used to train both the CAE and LSTM, with an initial learning rate of
η = 5 × 10−4 and weight decay of λ = 1 × 10−6 . Dropout and weight decay are used to prevent overfitting. The CAE
is trained for a total of 200 epochs, while an individual LSTM is trained for 250 epochs. At the test points, the FOM is
simulated for Ti = 0.75 seconds (30 snapshots), or 15% of the total time horizon. The final 20 of these snapshots are
used as the initial input sequence for the LSTM ensemble.
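As a point of reference, the LSTM and optimizer settings stated above correspond roughly to the PyTorch configuration sketched below; this is a hedged summary of the listed hyperparameters, not the exact training script used in this work.

```python
import torch
import torch.nn as nn

k, w = 4, 20                      # latent dimension and window size for this case
lstm = nn.LSTM(input_size=k, hidden_size=50, num_layers=2,
               dropout=0.1, batch_first=True)
head = nn.Sequential(nn.Linear(50, k), nn.Sigmoid())   # outputs constrained to [0, 1]

params = list(lstm.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=5e-4, weight_decay=1e-6)
loss_fn = nn.MSELoss()            # mean squared error loss

# Each of the m = 64 weak learners is trained for 250 epochs on its own
# bootstrapped set of length-w latent sequences, starting from the same
# initial weights and biases.
```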
Figure 6: Snapshots at T = 5 of u (top) and v (bottom) for the lid-driven cavity problem at three different sets of design
parameters.
Figure 7 shows the trajectories of the latent variables computed using ensemble and single LSTMs at the test point
µ∗ = [1.299, 1.689, −0.0367] using five different seeds which control the initial weights and biases of the LSTM. It is
shown that for all of the latent variables, using LSTM ensembles leads to higher prediction accuracy and significantly
lower variance. When using a single LSTM, the different predictions quickly diverge from each other and the ground
truth. Due to their high capacity to learn, neural networks often exhibit high variance. Since neural networks contain a
large number of parameters, the effects of error propagation between different trained models are heightened due to
relatively large differences in autoregressive predictions being fed back into the model. In contrast, the ensemble model
is much less sensitive to the initial seed, and the predicted trajectories do not differ much. The second latent variable
exhibits diverging trajectories when using the ensemble model, but the effect is much less pronounced than when using a
single LSTM. Figure 8 shows contours of u and v at the test point as well as absolute ROM errors averaged over the last
10% of the simulation (t ∈ [4.5, 5] seconds) for a single seed. The errors are considerably larger when using a single LSTM throughout the computational domain for both u and v.

Test Case Index µ1 µ2 µ3
1 1.473 1.701 0.288
2 1.659 1.653 -0.372
3 1.593 1.611 0.455
4 1.209 1.443 0.371
5 1.299 1.689 -0.0367
6 1.371 1.311 -0.351
7 1.443 1.617 -0.487
8 1.232 1.335 -0.330
9 1.719 1.785 -0.455
10 1.281 1.449 -0.340
Table I: Test case design parameters for the lid-driven cavity problem.

Table II lists the seed-averaged relative errors ϵ̄ in u and
v for the test points. The ensemble method offers better performance in predicting both fields at all of the test points,
usually by wide margins. Table III lists the standard deviation of these errors; at all of the test points, the standard
deviation is also smaller. As there is a lower spread amongst the different seeds in the time-averaged error metrics,
this shows that using the ensemble method leads to much greater stability and reliability in predictions. Table IV lists
the computational costs associated with neural network training, CFD simulation, and ROM inference. The given
CFD wall time is the portion of the simulation over the prediction time horizon; using the ROM for inference over this
period of time offers a speed-up of approximately 9.4x over CFD. The ROM inference costs include autoregressive
prediction using the ensemble LSTM and obtaining full-order approximations using the decoder. The CAE is much
more expensive to train than a single LSTM as it has a larger number of trainable parameters. While a multi-GPU
architecture was used to train the LSTMs in parallel, the cost to train a single LSTM is given.
(Figure 7 panels: CAE latent variables #1, #2, and #4 versus time snapshot, comparing the ground truth with ensemble LSTM and single LSTM predictions for five seeds.)
Figure 7: Comparison of latent variable trajectories for the lid-driven cavity case at µ∗ = [1.299, 1.689, −0.0367].
Figure 8: Time snapshot of u (top) and v (bottom) at T = 5 and errors averaged over the last 10% of the simulation at
µ∗ = [1.299, 1.689, −0.0367].
Test Case Index ϵ̄, u (Ensemble) ϵ̄, u (Single) ϵ̄, v (Ensemble) ϵ̄, v (Single)
1 0.02421 0.07119 0.02514 0.07204
2 0.01958 0.04199 0.02174 0.03870
3 0.02622 0.05282 0.02868 0.06296
4 0.03640 0.06575 0.03924 0.06742
5 0.02442 0.06589 0.01996 0.05131
6 0.04177 0.04939 0.03964 0.04733
7 0.02445 0.03771 0.02789 0.03915
8 0.04559 0.05304 0.04251 0.05194
9 0.03313 0.04678 0.03891 0.04900
10 0.02877 0.04291 0.02717 0.04079
Table II: Average relative errors in u and v for the lid-driven cavity case. Boldface denotes the lowest values.
Table III: Standard deviation in relative errors of u and v for the lid-driven cavity case. Boldface denotes the lowest
values.
Table IV: Computational costs associated with the lid-driven cavity problem.
3.2 2D Cylinder
The next test case involves two-dimensional incompressible, unsteady, laminar flow over a cylinder between two solid
walls. The lid-driven cavity flow from the previous test case eventually reaches a steady state and therefore does not exhibit the long-term transient behavior that is commonly found in fluid dynamics problems. Laminar flow over a cylinder is a
well-studied problem in fluid dynamics, with both experimental and computational results present in the literature.38, 39
Unsteady cylinder flow is characterized by the presence of vortices that separate from the surface and form in the wake.
This distinctive pattern is known as the Von Kármán vortex street, where alternating vortices of a regular pattern are
shed downstream of the cylinder. The unsteady Navier-Stokes equations found in equations 14 and 15 are solved using
XLB,40 a Lattice Boltzmann method41 (LBM) library utilizing the JAX framework42 available for Python, which allows
for effective scaling across multiple CPUs, GPUs, and distributed multi-GPU systems. The cited works can be referred
to for an overview of the Lattice Boltzmann method and its implementation in XLB.
No-slip boundary conditions are applied to the cylinder’s surface and top and bottom walls. A Poiseuille flow profile is
used for the inlet velocity. Extrapolation outflow boundary conditions are used for the outlet to allow the fluid flow to
exit the domain freely. The computational domain measures 1536 × 512 voxels that are uniformly spaced. The cylinder
is centered at xc , yc = [160, 256] (zero-based indexing is used). Simulation results are down-sampled onto a grid that
measures 384 × 128 before being used for the ROM, resulting in N = 49152, as the original domain’s large size leads
to a very large training cost as well as memory usage. Two design parameters are used, the diameter d of the cylinder
and the Reynolds number Re. The bounds of the design parameters are given as
µ1 = d ∈ [48, 68],
µ2 = Re ∈ [120, 240].
The diameter d is set to an integer quantity by rounding to the nearest whole number. The wake structure behind the
cylinder undergoes instabilities43, 44 at a critical Reynolds number of approximately Rec = 180, where the vortices
transition to becoming turbulent. As a result, the prescribed range given for Re in the parameter space includes a
variety of physical regimes. Additionally, both Re and d control the size and periodicity of the shed vortices, making
this a difficult prediction problem for ROMs. The freestream velocity in the x-direction is set to u∞ = 0.001 in
non-dimensional units. The simulation is run for T = 750000 timesteps with data being saved every 1000 steps, leading
to 750 snapshots for a single simulation. In LBM simulations, time and space are often non-dimensionalized by ∆t and
∆x to arrive at dimensionless values of ∆x∗ = ∆t∗ = 1. However, one needs to ensure that ∆t/∆x ≪ 1 to avoid
compressibility errors. In these runs we picked ∆t/∆x = 0.001. The initial condition for a simulation of a given
diameter d is the solution to steady flow at Re = 20.
A total of 50 sets of design parameters are generated and split into 45 training samples and 5 test points, shown in Figure 9.
Table V also lists the test point design parameters. The ROM uses m = 96 bagged LSTMs and a window size of w =
30, which were again chosen through a trial-and-error process to maximize accuracy while keeping the number of weak
learners to a minimum. The CAE latent dimension is set to k = 10. Similar to the lid-driven cavity case, values below
k = 10 resulted in higher reconstruction errors and increasing the latent dimension further offers no improvement. A
bidirectional LSTM architecture45 is used. Bidirectional recurrent neural networks process sequential data in both
forward and backward directions, allowing the model to learn both past and future context. For problems that are
characterized by regularly repeating patterns such as a vortex street, bidirectional LSTMs can better learn the cyclic
nature of the pattern by leveraging context in both directions. The network consists of three hidden layers with 36
neurons each and a dropout rate of 0.15. The output layer again contains a sigmoid activation function, and the Adam
optimizer with the same initial learning rate η = 5 × 10−4 and weight decay of λ = 1 × 10−6 is used for both the
LSTM and CAE. Again, the CAE is trained for 200 epochs while an individual LSTM is trained for 250 epochs. The
CAE architecture is given in Appendix B. At the test points, the FOM is run for Ti = 300000 timesteps (300 snapshots),
or 40% of the total simulation time. This value is required to be high as the flow exhibits highly oscillatory behavior
initially, leading to very noisy latent variables that cannot be used for model training. As a result, latent variable
sequences are generated starting at the 200th snapshot, and Atrain does not contain time-series data of latent variables
before this point.
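The bidirectional configuration described here can be expressed through PyTorch's `bidirectional` flag, as in the brief sketch below; note that the output feature size doubles, so the assumed linear output layer takes 2 × 36 inputs.

```python
import torch.nn as nn

k, w = 10, 30                       # latent dimension and window size for this case
lstm = nn.LSTM(input_size=k, hidden_size=36, num_layers=3,
               dropout=0.15, batch_first=True, bidirectional=True)
head = nn.Sequential(nn.Linear(2 * 36, k), nn.Sigmoid())  # forward + backward features
```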
Figure 9: Training and test design parameters for the 2D cylinder case.
Figure 10: Comparison of latent variable trajectories for the 2D cylinder case at µ∗ = [51, 142.2].
Test Case Index d Re
1 53 231.6
2 51 142.2
3 65 154.8
4 57 171.0
5 61 225.6
Table V: Test case design parameters for the 2D cylinder problem.
Figure 10 shows the trajectories of the first four latent variables at the test point µ∗ = [51, 142.2]. For each latent variable, the predictions given by single LSTMs are initially similar, but eventually diverge in terms of both amplitude and frequency, leading to large inaccuracies. Using LSTM ensembles greatly reduces this effect, and the resulting latent variable trajectories follow the ground truth closely and do not differ greatly in amplitude or frequency. The regular pattern of the Kármán vortex street is well-predicted given a small amount of initial latent variable history.
Figure 11 shows snapshots of u and v at T = 750000 as well as ROM errors averaged over the last 10% of the simulation for a single seed. The errors, which are greatest in the wake of the cylinder and immediately downstream of it, are significantly lower throughout the computational domain when using LSTM ensembles. Table VI lists
the seed-averaged relative errors in u and v at the test points. The ensemble ROM again offers better performance in
predicting both fields at all of the test points. Table VII lists the standard deviation of these errors, which are again
lower at all of the test points. At the fourth test case index (µ∗ = [57, 171.0]), the errors and standard deviations are
similar. This test point lies in the middle of the parameter space and is well-surrounded by training samples; as a result,
the advantage gained using an ensemble method may be marginal. Table VIII lists the computational costs associated
with the problem. Using the ROM for inference offers a speed-up of approximately 27x over CFD (again, the given
CFD and ROM inference costs are for the portion of the simulation over the prediction time horizon). Due to the
increased domain size and amount of training data, the CAE and LSTM are more expensive to train when compared to
the lid-driven cavity case.
Test Case Index ϵ̄, u (Ensemble) ϵ̄, u (Single) ϵ̄, v (Ensemble) ϵ̄, v (Single)
1 0.04324 0.07975 0.3379 0.6162
2 0.03463 0.05838 0.2971 0.4871
3 0.02380 0.05195 0.1399 0.3315
4 0.05446 0.06636 0.3995 0.4900
5 0.03655 0.06029 0.2269 0.4059
Table VI: Average relative errors in u and v for the 2D cylinder case. Boldface denotes the lowest values.
Table VII: Standard deviation in relative errors of u and v for the 2D cylinder case. Boldface denotes the lowest values.
Figure 11: Time snapshots of u and v at T = 750000 and errors averaged over the last 10% of the simulation at
µ∗ = [51, 142.2].
4 Conclusion
This work presents a fully data-driven ROM framework for time-dependent physics simulations using convolutional
autoencoders and LSTM ensembles. The combined framework, referred to as the CAE-eLSTM ROM, is used to
mitigate the issue of error propagation when performing autoregressive time-series prediction at unseen data sets, a
common problem in data-driven CFD applications. The ROM obtains a low-dimensional solution manifold using
convolutional autoencoders, a type of artificial neural network useful for reconstructing spatially distributed data. Using
the encoder section of the autoencoder, the low-dimensional latent variables of the training data are used to generate
multivariate time-series sequences that are used to train LSTMs, a type of recurrent neural network useful for modeling
sequential data. Bagging, a popular ensemble learning technique, is used to train multiple base LSTMs on subsets
of the training sequences sampled randomly with replacement. Ensemble learning is chosen as it is a useful tool for
improving the stability and accuracy of machine learning methods. At unseen test points, the full-order model is run to
obtain an initial sequence of latent variables through the encoder and autoregressive time-series prediction is used to
predict the latent variables over a prescribed time horizon. The predicted latent variables are then passed through the
decoder to obtain spatial reconstructions of the FOM at different points in time.
When applied to two incompressible, unsteady, laminar fluid dynamics problems, the proposed ROM exhibits high
levels of accuracy when compared to a ROM using a single LSTM model. The use of LSTM ensembles significantly
diminishes the error propagation issue, with latent variable trajectories from different weight initializations showing
low levels of variance. Error metrics show that using the ensemble ROM leads to better predictive performance at all
test points. Furthermore, the fully data-driven nature of the ROM allows it to be applied to two different CFD solvers.
Although the cost associated with training ensemble neural networks is high, bagging allows for models to be trained in
parallel, and multi-GPU architectures can be used to significantly reduce the cost. The addition of an ensemble model
for time-series predictions, while simple, greatly improves the performance of the ROM and does not require additional
training data. A limitation of the presented framework is its significantly increased offline training cost. While this can
be alleviated to a great degree by training weak learners in parallel, multi-GPU platforms are expensive and may not
be readily available for practitioners. Additionally, the use of convolutional autoencoders limits the application of the
ROM to structured grids unless a mesh interpolation method is used. Future work will focus on further developing the
method to apply it to three-dimensional geometries, problems exhibiting turbulent flow, and more complex geometries
involving unstructured meshes. Problems consisting of time-series datasets that exhibit non-smooth variations over long
periods of time will be of particular interest, as they are more representative of real-world applications and can reap
greater benefits from ensemble learning. Computationally efficient optimal hyperparameter selection methods will also
be explored to lower the offline costs of training the ROM.
Author Declarations
Conflict of Interest
Author Contributions
Rakesh Halder: Conceptualization (equal), Formal Analysis (lead), Methodology (lead), Software (equal), Writ-
ing/Original Draft Preparation (lead)
Mohammadmehdi Ataei: Conceptualization (equal), Software (equal), Supervision (equal), Writing/Review &
Editing (equal)
Hesam Salehipour: Conceptualization (equal), Software (equal), Supervision (equal), Writing/Review & Editing
(equal)
Krzysztof Fidkowski: Writing/Review & Editing (equal)
Kevin Maki: Writing/Review & Editing (equal)
Appendix A Reconstruction Errors
Reconstruction error comparisons between CAE and POD in u and v are given for both test cases. The same test points
from the results section are used. Figure 13 shows the reconstruction errors between CAE (k = 4) and POD at k = 4
and k = 25 for the lid-driven cavity case at µ∗ = [1.299, 1.689, −0.0367]. For the same latent dimension, the CAE
reconstruction errors are significantly smaller than those from POD throughout the domain in both u and v. For k = 25,
reconstruction errors from POD are similar to that of CAE. Figure 12 shows reconstruction errors for the cylinder case
at µ∗ = [51, 142.2]. Again, it is shown that for the same latent dimension, the reconstruction error from a CAE is much
smaller than from POD. At k = 50, the error contours are comparable. Using POD would require the LSTM model to
handle a very large multivariate time-series problem, which would degrade the accuracy of predictions.
Figure 12: Reconstruction errors at T = 750000 in u and v between POD and CAE (k = 10) at µ∗ = [51, 142.2].
Figure 13: Reconstruction errors at T = 5 in u and v between POD and CAE (k = 4) at µ∗ = [1.299, 1.689, −0.0367].
Appendix B Convolutional autoencoder architectures
The convolutional autoencoder architectures for the lid-driven cavity and 2D cylinder cases are given in Tables IX
and X respectively. Both architectures are similar and use convolutional layers in the encoder section to progressively
reduce the number of pixels or grow the number of filters in subsequent layers. Convolutional layers are usually
followed by max-pooling layers that have batch normalization46 applied to them (referred to as pool-norm layers). Batch
normalization normalizes the input of each layer over a mini-batch, reducing internal covariate shift, leading to more
efficient gradient flow during backpropagation. Fully connected layers are present in both the encoder and decoder.
Although their inclusion leads to a larger number of network parameters, reconstruction accuracies are significantly
improved. The decoder consists of convolutional transpose layers that are used to progressively increase the number of
pixels. The output layer uses a sigmoid activation function to constrain the range of outputs to [0,1]. The leaky ReLU
activation function is used for convolutional and fully connected layers, and is given as,
\phi(x) = \begin{cases} x, & \text{if } x \ge 0 \\ \alpha x, & \text{if } x < 0 \end{cases}. (17)
A value of α = 0.25 is used for all leaky ReLU activation functions. The activation function is often used over the
standard ReLU activation function (α = 0). For inputs less than 0, the ReLU activation function returns a gradient of
zero, effectively rendering certain neurons inactive and inhibiting gradient flow, referred to as the dying ReLU problem.
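In PyTorch terms, the leaky ReLU of equation (17) and the pool-norm layers described above correspond roughly to the building blocks sketched below; the channel count is a placeholder.

```python
import torch.nn as nn

# Leaky ReLU with alpha = 0.25, which avoids the dying ReLU problem described above.
activation = nn.LeakyReLU(negative_slope=0.25)

def pool_norm(channels):
    """Max-pooling followed by batch normalization (a 'pool-norm' layer)."""
    return nn.Sequential(nn.MaxPool2d(kernel_size=2), nn.BatchNorm2d(channels))
```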
Table IX: Convolutional autoencoder architecture for the lid-driven cavity case.
Layer Number of Filters Kernel Size Activation Function Size of Output
Input 384 × 128 × 2
Convolutional 8 5×5 Leaky ReLU 192 × 64 × 8
Batch Norm 192 × 64 × 8
Convolutional 16 5×5 Leaky ReLU 96 × 32 × 16
Batch Norm 96 × 32 × 16
Convolutional 32 3×3 Leaky ReLU 96 × 32 × 32
Pool-Norm 2×2 48 × 16 × 32
Convolutional 64 3×3 Leaky ReLU 48 × 16 × 64
Pool-Norm 2×2 24 × 8 × 64
Reshape 12288
Fully Connected Leaky ReLU 128
Batch Norm 128
Fully Connected (Latent Space) 10
Fully Connected Leaky ReLU 128
Batch Norm 128
Fully Connected Leaky ReLU 12288
Batch Norm 12288
Reshape 24 × 8 × 64
Convolutional Transpose 32 3×3 Leaky ReLU 48 × 16 × 32
Batch Norm 48 × 16 × 32
Convolutional Transpose 16 3×3 Leaky ReLU 96 × 32 × 16
Batch Norm 96 × 32 × 16
Convolutional Transpose 8 5×5 Leaky ReLU 192 × 64 × 8
Batch Norm 192 × 64 × 8
Convolutional Transpose 2 5×5 Sigmoid 384 × 128 × 2
Table X: Convolutional autoencoder architecture for the 2D cylinder case.
References
1. David J Lucia, Philip S Beran, and Walter A Silva. Reduced-order modeling: new approaches for computational physics. Progress in Aerospace Sciences, 40(1-2):51–117, 2004.
2. G Berkooz, PJ Holmes, and John Lumley. The proper orthogonal decomposition in the analysis of turbulent flows. Annual Review of Fluid Mechanics, 25:539–575, 1993.
3. J.S. Hesthaven and S. Ubbiali. Non-intrusive reduced order modeling of nonlinear problems using neural networks. Journal of Computational Physics, 363:55–78, 2018.
4. Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
5. Kookjin Lee and K. Carlberg. Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. Journal of Computational Physics, 404, 2020.
6. Rakesh Halder, Krzysztof J Fidkowski, and Kevin J Maki. Non-intrusive reduced-order modeling using convolutional autoencoders. International Journal for Numerical Methods in Engineering, 123(21):5369–5390, 2022.
7. Hamidreza Eivazi, Soledad Le Clainche, Sergio Hoyas, and Ricardo Vinuesa. Towards extraction of orthogonal and parsimonious non-linear modes from turbulent flows. Expert Systems with Applications, 202:117038, 2022.
8. Alberto Solera-Rico, Carlos Sanmiguel Vila, M. A. Gómez, Yuning Wang, Abdulrahman Almashjary, Scott T. M. Dawson, and Ricardo Vinuesa. β-variational autoencoders and transformers for reduced-order modelling of fluid flows, 2023.
9. Sifan Wang, Hanwen Wang, and Paris Perdikaris. Learning the solution operator of parametric partial differential equations with physics-informed DeepONets. Science Advances, 7(40):eabi8605, 2021.
10. Zongyi Li, Hongkai Zheng, Nikola Kovachki, David Jin, Haoxuan Chen, Burigede Liu, Kamyar Azizzadenesheli, and Anima Anandkumar. Physics-informed neural operator for learning partial differential equations. ACM/JMS Journal of Data Science, 2021.
11. Sebastian Grimberg, Charbel Farhat, and Noah Youkilis. On the stability of projection-based model order reduction for convection-dominated laminar and turbulent flows. Journal of Computational Physics, 419:109681, 2020.
12. Sk M Rahman, Suraj Pawar, Omer San, Adil Rasheed, and Traian Iliescu. Nonintrusive reduced order modeling framework for quasigeostrophic turbulence. Physical Review E, 100(5):053306, 2019.
13. Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. A survey of large language models. arXiv preprint arXiv:2303.18223, 2023.
14. Felix A Gers, Jürgen Schmidhuber, and Fred Cummins. Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10):2451–2471, 2000.
15. Tim Hill, Marcus O'Connor, and William Remus. Neural network models for time series forecasts. Management Science, 42(7):1082–1092, 1996.
16. Sangseung Lee and Donghyun You. Data-driven prediction of unsteady flow over a circular cylinder using deep learning. Journal of Fluid Mechanics, 879:217–254, 2019.
17. Ricardo Vinuesa and Steven L Brunton. Enhancing computational fluid dynamics with machine learning. Nature Computational Science, 2(6):358–366, 2022.
18. Pin Wu, Feng Qiu, Weibing Feng, Fangxing Fang, and Christopher Pain. A non-intrusive reduced order model with transformer neural network and its application. Physics of Fluids, 34(11), 2022.
19. Romit Maulik, Bethany Lusch, and Prasanna Balaprakash. Reduced-order modeling of advection-dominated systems with recurrent neural networks and convolutional autoencoders. Physics of Fluids, 33(3), 2021.
20. Kazuto Hasegawa, Kai Fukami, Takaaki Murata, and Koji Fukagata. Machine-learning-based reduced-order modeling for unsteady flows around bluff bodies of various shapes. Theoretical and Computational Fluid Dynamics, 34:367–383, 2020.
21. Joongoo Jeon, Juhyeong Lee, Ricardo Vinuesa, and Sung Joong Kim. Residual-based physics-informed transfer learning: A hybrid method for accelerating long-term CFD simulations via deep learning. International Journal of Heat and Mass Transfer, 220:124900, 2024.
22. Omer Sagi and Lior Rokach. Ensemble learning: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(4):e1249, 2018.
23. Markus Mrosek, Carsten Othmer, and Rolf Radespiel. Reduced-order modeling of vehicle aerodynamics via proper orthogonal decomposition. SAE International Journal of Passenger Cars - Mechanical Systems, 12(3):225–236, 2019.
24. Yann LeCun, Bernhard Boser, John Denker, Donnie Henderson, Richard Howard, Wayne Hubbard, and Lawrence Jackel. Handwritten digit recognition with a back-propagation network. Advances in Neural Information Processing Systems, 2, 1989.
25. Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115:211–252, 2015.
26. Jiayang Xu and Karthik Duraisamy. Multi-level convolutional autoencoder networks for parametric prediction of spatio-temporal dynamics. Computer Methods in Applied Mechanics and Engineering, 372:113379, 2020.
27. David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.
28. Sepp Hochreiter. The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 6(02):107–116, 1998.
29. Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
30. Yoav Freund, Robert Schapire, and Naoki Abe. A short introduction to boosting. Journal-Japanese Society For Artificial Intelligence, 14(771-780):1612, 1999.
31. Leo Breiman. Bagging predictors. Machine Learning, 24:123–140, 1996.
32. Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 2019.
33. Ruichen Jin, Wei Chen, and Agus Sudjianto. An efficient algorithm for constructing optimal design of computer experiments. Journal of Statistical Planning and Inference, 134(1):268–287, 2005.
34. J. Sola and J. Sevilla. Importance of input data normalization for the application of neural networks to complex industrial problems. IEEE Transactions on Nuclear Science, 44:1464–1468, 1997.
35. Henry G Weller, Gavin Tabor, Hrvoje Jasak, and Christer Fureby. A tensorial approach to computational continuum mechanics using object-oriented techniques. Computers in Physics, 12(6):620–631, 1998.
36. Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.
37. Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014. Published as a conference paper at the 3rd International Conference for Learning Representations, San Diego, 2015.
38. David J Tritton. Experiments on the flow past a circular cylinder at low Reynolds numbers. Journal of Fluid Mechanics, 6(4):547–567, 1959.
39. Eileen M Saiki and S Biringen. Numerical simulation of a cylinder in uniform flow: application of a virtual boundary method. Journal of Computational Physics, 123(2):450–465, 1996.
40. Mohammadmehdi Ataei and Hesam Salehipour. XLB: A differentiable massively parallel lattice Boltzmann library in Python. Computer Physics Communications, 300:109187, 2024.
41. Shiyi Chen and Gary D Doolen. Lattice Boltzmann method for fluid flows. Annual Review of Fluid Mechanics, 30(1):329–364, 1998.
42. James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018.
43. J. H. Gerrard. Flow around circular cylinders; volume 1: fundamentals, by M. M. Zdravkovich, Oxford Science Publications, 1997. Journal of Fluid Mechanics, 350:375–378, 1997.
44. BN Rajani, A Kandasamy, and Sekhar Majumdar. Numerical simulation of laminar flow past a circular cylinder. Applied Mathematical Modelling, 33(3):1228–1247, 2009.
45. Mike Schuster and Kuldip K Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681, 1997.
46. Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pages 448–456. PMLR, 2015.