Parameterized Neural Ordinary Differential Equations: Applications To Computational Physics Problems
Abstract
This work proposes an extension of neural ordinary differential equations (NODEs) by introducing an
additional set of ODE input parameters to NODEs. This extension allows NODEs to learn multiple dynamics
specified by the input parameter instances. Our extension is inspired by the concept of parameterized ordinary
differential equations, which are widely investigated in computational science and engineering contexts,
where characteristics of the governing equations vary over the input parameters. We apply the proposed
parameterized NODEs (PNODEs) for learning latent dynamics of complex dynamical processes that arise
in computational physics, which is an essential component for enabling rapid numerical simulations for
time-critical physics applications. For this, we propose an encoder-decoder-type framework, which models
latent dynamics as PNODEs. We demonstrate the effectiveness of PNODEs with important benchmark
problems from computational physics.
Keywords: model reduction, deep learning, autoencoders, machine learning, nonlinear manifolds, neural
ordinary differential equations, latent-dynamics learning
1. Introduction
Such approaches typically i) collect solution data for a set of training parameter instances, ii) build a parameterized surrogate model, and iii) fit the model by training with the data collected in step i).
In the field of deep learning, similar efforts have been made for learning latent dynamics of various
physical processes [21, 6, 31, 39, 13]. Neural ordinary differential equations (NODEs), a method of learning
time-continuous dynamics in the form of a system of ordinary differential equations from data, comprise
a particularly promising approach for learning latent dynamics of dynamical systems. NODEs have been
studied in [45, 6, 40, 27, 7, 15], and this body of work has demonstrated their ability to successfully learn
latent dynamics and to be applied to downstream tasks [6, 39].
Because NODEs learn latent dynamics in the form of ODEs, they are a natural fit as a latent-dynamics model in reduced-order modeling of physical processes and have been applied to several computational physics problems, including turbulence modeling [34, 30] and future-state prediction in fluids problems [1]. As pointed out in [10, 5], however, NODEs learn a single set of network weights that best fits a given training data set. This results in an NODE model with limited expressivity and often leads to unnecessarily complex dynamics [12]. To overcome this shortcoming, we propose to extend NODEs to have a
set of input parameters that specify the dynamics of the NODE model, which leads to parameterized NODEs
(PNODEs). With this simple extension, PNODEs can represent multiple trajectories such that the dynamics
of each trajectory are characterized by the input parameter instance.
The main contributions of this paper are
• an extension to NODEs that enables them to learn multiple trajectories with a single set of network
weights; even for the same initial condition, the dynamics can be different for different input parameter
instances,
• a framework for learning latent dynamics of parameterized ODEs arising in computational physics
problems,
• a demonstration of the effectiveness of the proposed framework with advection-dominated benchmark
problems, which are a class of problems where classical linear latent-dynamics learning methods (e.g.,
principal component analysis) often fail to learn accurately [25].
2. Related work
Classical reduced-order modeling. Classical reduced-order modeling (ROM) techniques rely heavily on linear
methods such as the proper orthogonal decomposition (POD) [19], which is analogous to principal component
analysis [20], for constructing the mappings between a high-dimensional space and a low-dimensional space.
These ROMs then identify the latent-dynamics model by executing a (linear) projection process on the
high-dimensional equations, e.g., Galerkin projection [19]. We refer readers to [2, 3] for a complete survey on
classical methods.
Physics-aware deep-learning-based reduced-order modeling. Recent work has extended classical ROMs by
replacing proper orthogonal decomposition with nonlinear dimension reduction techniques emerging from
deep learning [25, 26]. These approaches operate by identifying a nonlinear mapping (via, e.g., convolutional
autoencoders) and subsequently identifying the latent dynamics as certain residual minimization problems,
which are defined on the latent space and are derived from the governing equations.
Another class of physics-aware methods includes explicitly modeling time-integration schemes [32, 41, 47, 14] and adding stability/structure-preserving constraints in the latent dynamics [11, 18]. We emphasize that our approach is closely related to [41], where neural networks are trained to approximate the action of a first-order time-integration scheme applied to the latent dynamics and, at each time step, the neural network takes a set of problem-specific parameters as well as the reduced state as input. Thus, our approach can be seen as a time-continuous generalization of the approach in [41].
Purely data-driven deep-learning-based reduced-order modeling. Another approach for developing deep-
learning-based ROMs is to learn both nonlinear mappings and latent dynamics in purely data-driven ways.
Latent dynamics are modeled as recurrent neural networks with long short-term memory (LSTM) units along
with linear POD mappings [44, 37, 30] or nonlinear mappings constructed via (convolutional) autoencoders
[16, 46, 29, 42].
Enhancing NODE. Augmented NODEs [10] extend NODEs by appending additional state variables to the hidden state, which allows NODEs to learn dynamics using the additional dimensions and, consequently, to have increased expressivity. ANODE [15] discretizes the integration range into a fixed number of steps (i.e., checkpoints) to mitigate numerical instability in the backward pass of NODEs; ACA [50] further extends this approach by adopting an adaptive step-size solver in the backward pass. ANODEV2 [49] proposes a coupled system of neural ODEs, where both the hidden state variables and the network weights are allowed to evolve over time and their dynamics are approximated by neural networks. Neural optimal control [5] formulates an NODE model as a controlled dynamical system and infers the optimal control via an encoder network. This formulation results in an NODE that adjusts its dynamics for different input data. Moreover, improved training strategies for NODEs have been studied in [12], and an extension using spectral elements in the discretization of NODEs has been proposed in [35].
3. Neural ODE
Neural ODEs (NODEs) are a family of deep neural network models that parameterize the time-continuous
dynamics of hidden states using a system of ODEs:
$$\frac{dz(t)}{dt} = f_\Theta(z(t), t; \Theta), \qquad (3.1)$$
where z(t) is a time-continuous representation of a hidden state, fΘ is a parameterized velocity function,
which defines the dynamics of hidden states over time, and Θ is a set of neural network weights. Given the
initial condition z(0) (i.e., the input), the hidden state z(t) at any time t can be obtained by solving the initial value problem (IVP) (3.1). To solve the IVP, a black-box differential equation solver can be employed and the hidden states can be computed with the desired accuracy:

z1, . . . , znt = ODESolve(z(0), fΘ, t1, . . . , tnt). (3.2)
In the backward pass, as proposed in [6], gradients are computed by solving another system of ODEs, which is derived using the adjoint sensitivity method [33]; this allows memory-efficient training of the NODE model. As pointed out in [10, 5], an NODE model learns a single dynamics for the entire data distribution and, thus, results in a model with limited expressivity.
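To make this concrete, the following is a minimal sketch of a NODE forward pass in PyTorch, assuming the torchdiffeq package (whose odeint_adjoint implements the adjoint-based backward pass of [6]); the architecture and dimensions are illustrative rather than those used in our experiments.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint_adjoint as odeint  # adjoint method for the backward pass [6]

class ODEFunc(nn.Module):
    """The parameterized velocity function f_Theta(z(t), t; Theta) of Eq. (3.1)."""
    def __init__(self, latent_dim=5, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, latent_dim),
        )

    def forward(self, t, z):
        # An autonomous velocity; t could be concatenated to z for non-autonomous dynamics.
        return self.net(z)

f = ODEFunc()
z0 = torch.randn(16, 5)            # a batch of initial hidden states z(0)
t = torch.linspace(0.0, 1.0, 100)  # time instances t_1, ..., t_nt
zs = odeint(f, z0, t)              # Eq. (3.2); shape (100, 16, 5)
```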
5. Applications to computational physics problems
We now investigate PNODEs within the context of performing model reduction of computational physics
problems. We start by formally introducing the full-order model that we seek to reduce. We then describe
our proposed framework, which uses PNODEs (or NODEs) as the reduced-order (latent-dynamics) model.
where û(t; µ), û : [0, T] × D → Rp denotes the reduced state, which is a low-dimensional representation of the high-dimensional state (i.e., p ≪ N). Analogously, û0(µ), û0 : D → Rp denotes the reduced parameterized initial condition, and fˆ(û, t; µ), fˆ : Rp × [0, T] × D → Rp denotes the reduced velocity. The objective of the ROM is to learn both a nonlinear mapping and a latent-dynamics model such that the ROM generates accurate approximations to the full-order-model solution, i.e., d(û) ≈ u.
where fˆΘ(·, ·; ·, Θ) : Rp × [0, T] × D → Rp denotes the reduced velocity, i.e., we model the ROM (Eq. (5.2)) as a PNODE. To achieve this goal, we propose a framework where, besides the latent-dynamics model described by the PNODE, two additional functions are required: i) an encoder, which maps a high-dimensional initial state u0(µ) to a reduced initial state û0(µ), and ii) a decoder, which maps a set of reduced states ûk, k = 1, . . . , nt, to a set of high-dimensional approximate states ũk, k = 1, . . . , nt. We approximate these functions with two neural networks: the encoder û = henc(u; θenc), henc : RN → Rp, and the decoder ũ = hdec(û; θdec), hdec : Rp → RN (i.e., d = hdec). Here, θ = (θenc, θdec) are the network weights.
With all these neural networks defined, the forward pass of the framework can be described as
1. encode a reduced initial state from the given initial condition: û0(µ) = henc(u0(µ); θenc),
2. solve the system of ODEs defined by the PNODE (or NODE) to obtain a set of reduced states ûk, k = 1, . . . , nt,
3. decode the set of reduced states to a set of high-dimensional approximate states: ũk = hdec(ûk; θdec), k = 1, . . . , nt, and
4. compute a loss function L(ũ1, . . . , ũnt, u1, . . . , unt).

Figure 1: The forward pass of the proposed framework: i) the encoder (red arrow), which provides a reduced initial state to the PNODE, ii) solving the PNODE (or NODE) with the initial state, which results in a set of reduced states, and iii) the decoder (blue arrows), which maps the reduced states to high-dimensional approximate states.
Figure 1 illustrates the computational graph of the forward pass in the proposed framework. We emphasize that the proposed framework takes only the initial states from the training data and the problem-specific ODE parameters µ as inputs. PNODEs can still learn multiple trajectories, which are characterized by the ODE parameters, even if the same initial state is given for different ODE parameters, which is not achievable with NODEs. Furthermore, the proposed framework is significantly simpler than the common neural-network settings for NODEs when they are used to learn latent dynamics, namely the sequence-to-sequence architectures of [6, 39, 48, 29], which require that a (part of a) sequence be fed into the encoder network to produce a context vector, which is then fed into the NODE decoder network as an initial condition.
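A minimal sketch of this forward pass is given below, again assuming PyTorch and torchdiffeq; one simple way to realize the parameterization is to concatenate µ to the latent state inside the velocity function. The names (PNODEFunc, forward_pass) and dimensions are hypothetical.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint

class PNODEFunc(nn.Module):
    """Parameterized reduced velocity: mu is concatenated to the latent state."""
    def __init__(self, latent_dim=5, mu_dim=2, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + mu_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, latent_dim),
        )
        self.mu = None  # problem-specific ODE parameters, set before each solve

    def forward(self, t, z):
        return self.net(torch.cat([z, self.mu], dim=-1))

def forward_pass(encoder, decoder, pnode, u0, mu, t):
    z0 = encoder(u0)           # 1. reduced initial state from the initial condition
    pnode.mu = mu              #    same initial state, different mu -> different dynamics
    zs = odeint(pnode, z0, t)  # 2. reduced states at t_1, ..., t_nt
    u_tilde = decoder(zs)      # 3. high-dimensional approximate states
    return u_tilde             # 4. e.g., loss = nn.functional.mse_loss(u_tilde, u_ref)
```

For a convolutional decoder, the leading time dimension of zs would be folded into the batch dimension before decoding; an MLP decoder broadcasts over it directly.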
6. Numerical experiments
In the following, we apply the proposed framework to learn the latent dynamics of parameterized dynamical systems from computational physics. We then demonstrate the effectiveness of the proposed framework with results of numerical experiments performed on these benchmark problems.
where nt is the number of time steps. The mode-2 unfolding [23] of the solution tensor U gives
where U (µktrain ) ∈ RN ×(nt +1) consists of the FOM solution snapshots for µktrain and the first column
corresponds to the initial condition u0(µktrain). Among the collected solution snapshots, only the first columns of U(µktrain), k = 1, . . . , ntrain (i.e., the initial conditions), are fed into the framework; the rest of the solution snapshots are used in computing the loss function.
Assuming the FOM arises from a spatially discretized partial differential equation, the total degrees of freedom N can be written as N = nu × n1 × · · · × nnd, where nu is the number of different types of solution variables (e.g., chemical species), ni is the number of grid points along the i-th spatial dimension, and nd denotes the number of spatial dimensions of the partial differential equation. Note that this spatially-distributed data representation is analogous to multi-channel images (i.e., nu corresponds to the number of channels); as such, we utilize (transposed) convolutional layers [24, 17] in our encoder and decoder.
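As a small illustration of this layout (with hypothetical shapes), a snapshot with nu solution variables on an n1 × n2 grid is arranged channel-first, exactly like a multi-channel image:

```python
import numpy as np

nu, n1, n2 = 4, 64, 32                 # e.g., 4 solution variables on a 64 x 32 grid
u_flat = np.random.rand(nu * n1 * n2)  # one FOM snapshot with N = nu * n1 * n2 entries

# Channel-first layout with nu "channels", as consumed by 2D convolutional layers;
# a batch for torch.nn.Conv2d would then have shape (batch, nu, n1, n2).
u_img = u_flat.reshape(nu, n1, n2)
```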
$$\frac{\| U(\mu^k_{\text{test}}) - \tilde{U}(\mu^k_{\text{test}}) \|_F}{\| U(\mu^k_{\text{test}}) \|_F}, \qquad (6.1)$$
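In code, this error metric is a ratio of Frobenius norms of snapshot matrices; a minimal NumPy sketch:

```python
import numpy as np

def relative_error(U_ref, U_approx):
    """Relative error of Eq. (6.1): the Frobenius norm of the difference between
    the reference and approximate snapshot matrices, normalized by the Frobenius
    norm of the reference. np.linalg.norm defaults to the Frobenius norm here."""
    return np.linalg.norm(U_ref - U_approx) / np.linalg.norm(U_ref)
```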
Table 1: Network architecture: kernel filter length κ, number of kernel filters nκ , and strides s at each (transposed) convolutional
layers.
Encoder Decoder
Conv-layer (4 layers) FC-layer (1 layer)
κ [16, 8, 4, 4] din = p, dout = 128
nκ [ 8, 16, 32, 64] Trans-conv-layer (4 layers)
s [ 2, 4, 4, 4] κ [ 4, 4, 8, 16]
FC-layer (1 layer) nκ [32, 16, 8, 1]
din = 128, dout = p s [ 4, 4, 4, 2]
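To illustrate how Table 1 translates into a network, the encoder could be assembled as below. This sketch assumes a one-dimensional input with 256 grid points (so that the flattened convolution output has the 128 entries the FC layer expects), ELU activations [8], and padding choices of our own; none of these are fixed by the table alone.

```python
import torch.nn as nn

p = 5  # reduced dimension

# Encoder of Table 1; the paddings are our assumptions, chosen so that a
# 256-point 1D input yields 64 channels x 2 points = 128 flattened features.
encoder = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=16, stride=2, padding=7),  nn.ELU(),  # -> (8, 128)
    nn.Conv1d(8, 16, kernel_size=8, stride=4, padding=2),  nn.ELU(),  # -> (16, 32)
    nn.Conv1d(16, 32, kernel_size=4, stride=4),            nn.ELU(),  # -> (32, 8)
    nn.Conv1d(32, 64, kernel_size=4, stride=4),            nn.ELU(),  # -> (64, 2)
    nn.Flatten(),                                                     # -> 128
    nn.Linear(128, p),
)
```

The decoder mirrors this structure with the FC layer first, followed by the four transposed-convolutional layers of Table 1.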
The relative errors for NODE and PNODE are 2.6648 × 10^{-3} and 2.6788 × 10^{-3}; the differences between the two models are negligible.
2 For setting p, we follow the results of the study on the effective latent dimension shown in [25].
Figure 2: Snapshots of reference solutions at t = {7.77, 11.7, 19.5, 23.3, 27.2, 35.0}.
Figure 3: Reconstruction: snapshots of reference solutions and approximated solutions using NODE (left) and PNODE (right) at t = {(35/15)k}, k = 1, . . . , 15.
The training parameter instances correspond to Dtrain = {(4.25 + (0.139)k, 0.015)}, k = 0, 2, 4, 6, the validating parameter instances correspond to Dval = {(4.25 + (0.139)k, 0.015)}, k = 1, 5, and the testing parameter instances correspond to Dtest = {(4.67, 0.015), (5.22, 0.015)}. Note that the initial condition is identical for all parameter instances, i.e., u0(µ) = 1.
Figure 4: Training, validating, and testing parameter instances for (a) Scenario 1 and (b) Scenario 2.
We train the framework with NODE and PNODE with the same set of hyperparameters. Again, the
reduced dimension is set to p = 5. Figures 5a–5b depict snapshots of reference solutions and approximated
solutions using NODE and PNODE. Both NODE and PNODE learn the boundary condition (i.e., 4.67 at x = 0) accurately. For NODE, however, this is only because the testing boundary condition lies exactly halfway between the two validating boundary conditions (and in the middle of the four training boundary conditions): minimizing the mean squared error drives NODE to learn a single trajectory whose boundary condition is the midpoint of the two validating boundary conditions, 4.389 and 4.944. Moreover, because NODE learns a single trajectory that minimizes the MSE, it fails to learn the correct dynamics and produces increasingly poor approximate solutions as time proceeds. In contrast, PNODE accurately approximates solutions up to the final time. Table 2 (second row) shows the relative ℓ2-errors (Eq. (6.1)) for both NODE and PNODE.
Continuing from the previous experiment, we test the second testing parameter instance, Dtest = {(5.22, 0.015)}, which is located outside Dtrain (i.e., next to µ(7) in Figure 4a). The results are shown in Figures 5c–5d: the NODE again learns only a single trajectory whose boundary condition lies in the middle of the validating parameter instances, whereas the PNODE accurately produces approximate solutions for the new testing parameter instance. Table 2 (third row) reports the relative errors.
Figure 5: Prediction Scenario 1: snapshots of reference solutions (red) and approximated solutions (green) using NODE (left) and PNODE (right) at t = {(35/15)k}, k = 1, . . . , 15, for µ1test = (4.67, 0.015) (top) and µ2test = (5.22, 0.015) (bottom).
Table 2: Prediction Scenario 1: the relative ℓ2-errors.
                NODE               PNODE
µ1test = µ(4)   4.3057 × 10^{-2}   3.6547 × 10^{-3}
µ2test = µ(8)   1.5740 × 10^{-1}   5.6900 × 10^{-3}
Next, in the second scenario, we vary both parameters µ1 and µ2 as shown in Figure 4b: the sets of the training, validating, and testing parameter instances correspond to
Dtrain = {(4.25 + (0.139)k, 0.015 + (0.002)l)}, {(k, l)} = {(0, 0), (0, 2), (2, 0), (2, 2)},
Dval = {(4.25 + (0.139)k, 0.015 + (0.002)l)}, {(k, l)} = {(1, 0), (0, 1), (2, 1), (1, 2)},
Dtest = {(4.25 + (0.139)k, 0.015 + (0.002)l)}, {(k, l)} = {(1, 1), (3, 2), (2, 3), (3, 3)}.
We have tested the full set of testing parameter instances, and Table 3 reports the relative errors; the results show that PNODE achieves sub-1% errors in most cases, whereas NODE incurs errors of around 10% in most cases. The 1.7% error of NODE for µ1test is achieved only because this testing parameter instance is located in the middle of the validating parameter instances (and the training parameter instances).
Table 3: Prediction Scenario 2: the relative ℓ2-errors.
                 NODE               PNODE
µ1test = µ(5)    1.7422 × 10^{-2}   3.2672 × 10^{-3}
µ2test = µ(10)   1.0713 × 10^{-1}   7.7303 × 10^{-3}
µ3test = µ(11)   8.9229 × 10^{-2}   8.5650 × 10^{-3}
µ4test = µ(12)   1.2377 × 10^{-1}   1.0735 × 10^{-2}
Figure 6: The geometry of the spatial domain for chemically reacting flow.
for i ∈ {H2, O2, H2O}. Here, (vH2, vO2, vH2O) = (2, 1, −2) denote the stoichiometric coefficients, (WH2, WO2, WH2O) = (2.016, 31.9, 18) denote the molecular weights in units of g·mol^{-1}, ρ = 1.39 × 10^{-3} g·cm^{-3} denotes the density of the mixture, R = 8.314 J·mol^{-1}·K^{-1} denotes the universal gas constant, and Q = 9800 K denotes the heat of reaction. The problem has two input parameters (i.e., nµ = 2), µ = (µ1, µ2) = (A, E), where A and E denote the pre-exponential factor and the activation energy, respectively.
Figure 6 depicts the geometry of the spatial domain and the boundary conditions are set as:
• Γ2: the inflow boundary with Dirichlet boundary conditions wT = 950 K and (wH2, wO2, wH2O) = (0.0282, 0.2259, 0),
• Γ1 and Γ3: Dirichlet boundary conditions wT = 300 K and (wH2, wO2, wH2O) = (0, 0, 0),
• Γ4, Γ5, and Γ6: homogeneous Neumann conditions,
and the initial condition is set as wT = 300 K and (wH2, wO2, wH2O) = (0, 0, 0) (i.e., the domain is initially empty of chemical species). For collecting data, we employ a finite-difference method with 64 × 32 uniform grid points (i.e., N = nu × n1 × n2 = 4 × 64 × 32) and the second-order backward differentiation formula (BDF2) with a uniform time step ∆t = 10^{-4} and final time 0.06 (i.e., nt = 600). Figure 7 depicts snapshots of the reference solutions of each species for the training parameter instance (µ1, µ2) = (2.3375 × 10^{12}, 5.6255 × 10^{3}).

Figure 7: Snapshots of reference solutions of temperature (first row), H2 (second row), O2 (third row), and H2O (fourth row) at t = {0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06} (from left to right).
Table 4: Network architecture: kernel filter length κ = κ1 = κ2 , number of kernel filters nκ , and strides s = s1 = s2 at each
(transposed) convolutional layer.
Encoder Decoder
Conv-layer (4 layers) FC-layer (1 layer)
κ [16, 8, 4, 4] din = p, dout = 512
nκ [ 8, 16, 32, 64] Trans-conv-layer (4 layers)
s [ 2, 2, 2, 2] κ [ 4, 4, 8, 16]
FC-layer (1 layer) nκ [32, 16, 8, 4]
din = 512, dout = p s [ 2, 2, 2, 2]
3 We have not investigated different scaling strategies for scaling of parameter instances as it is not the focus of this study.
The sets of the training, validating, and testing parameter instances correspond to
Dtrain = {(2.3375 × 10^{12} + (0.5946 × 10^{12})k, 5.6255 × 10^{3} + (0.482 × 10^{3})l)}, {(k, l)} = {(0, 0), (0, 2), (0, 4), (2, 0), (2, 2), (2, 4)},
Dval = {(2.3375 × 10^{12} + (0.5946 × 10^{12})k, 5.6255 × 10^{3} + (0.482 × 10^{3})l)}, {(k, l)} = {(0, 1), (1, 0), (1, 4), (2, 3)},
Dtest = {(2.3375 × 10^{12} + (0.5946 × 10^{12})k, 5.6255 × 10^{3} + (0.482 × 10^{3})l)}, {(k, l)} = {(0, 3), (1, 1), (1, 2), (1, 3), (2, 1), (3, 0), (3, 1), (4, 0)}.
Table 5 presents the relative ℓ2-errors of approximate solutions computed using NODE and PNODE
for testing parameter instances in the predictive scenario. The first three rows in Table 5 correspond to
the results of testing parameter instances at the middle three red circles in Figure 8. As expected, both
NODE and PNODE work well for these testing parameter instances: NODE is expected to work well for
these testing parameter instances because the single trajectory that minimizes the MSE over validating
parameter instances would be the trajectory associated with the testing parameter µ(8) . As we consider
testing parameter instances that are distant from µ(8), we observe PNODE to be significantly more accurate than NODE. From these observations, the NODE model can be considered to be overfitted to a single trajectory that minimizes the MSE. This overfitting can be mitigated to a certain extent by applying, e.g., early stopping; however, this cannot fundamentally fix the limitation of NODE (i.e., fitting a single trajectory to the entire input data distribution).
Table 5: Prediction: the relative ℓ2-errors.
                 NODE               PNODE
µ1test = µ(7)    9.2823 × 10^{-3}   4.2993 × 10^{-3}
µ2test = µ(8)    3.3450 × 10^{-3}   4.6429 × 10^{-3}
µ3test = µ(9)    4.1516 × 10^{-3}   5.0617 × 10^{-3}
µ4test = µ(4)    4.0835 × 10^{-2}   5.6011 × 10^{-3}
µ5test = µ(12)   3.4767 × 10^{-2}   4.4133 × 10^{-3}
µ6test = µ(16)   5.9410 × 10^{-2}   1.2935 × 10^{-2}
µ7test = µ(17)   5.4553 × 10^{-2}   1.1785 × 10^{-2}
µ8test = µ(18)   7.4881 × 10^{-2}   2.4660 × 10^{-2}
6.5. Problem 3: Quasi-1D Euler equation
For the third benchmark problem, we consider the quasi-one-dimensional Euler equations for modeling
inviscid compressible flow in a one-dimensional converging–diverging nozzle with a continuously varying
cross-sectional area [28]. The system of the governing equations is
$$\frac{\partial w}{\partial t} + \frac{1}{A} \frac{\partial f(w)}{\partial x} = g(w),$$

where

$$w = \begin{pmatrix} \rho \\ \rho u \\ e \end{pmatrix}, \qquad f(w) = \begin{pmatrix} \rho u \\ \rho u^2 + p \\ (e + p) u \end{pmatrix}, \qquad g(w) = \begin{pmatrix} 0 \\ \dfrac{p}{A} \dfrac{\partial A}{\partial x} \\ 0 \end{pmatrix},$$
with p = (γ − 1)ρε, ε = e/ρ − u^2/2, and A = A(x). Here, ρ denotes density, u denotes velocity, p denotes pressure, ε denotes energy per unit mass, e denotes total energy density, γ denotes the specific heat ratio, and A(x) denotes the converging–diverging nozzle cross-sectional area. We consider a specific heat ratio of γ = 1.3, a specific gas constant of R = 355.4 m^2/s^2/K, a total temperature of Ttotal = 300 K, and a total pressure of ptotal = 10^6 N/m^2. The cross-sectional area A(x) is determined by a cubic-spline interpolation over the points (x, A(x)) = {(0, 0.2), (0.25, 1.05µ), (0.5, µ), (0.75, 1.05µ), (1, 0.2)}, where µ determines the width of the middle cross-sectional area. Figure 9 depicts the schematic of the converging–diverging nozzle determined by A(x), parameterized by the width of the middle cross-sectional area, µ. A perfect gas, which obeys the ideal gas law (i.e., p = ρRT), is assumed.
Figure 9: The geometry of the spatial domain for the quasi-1D Euler equation: the converging–diverging nozzle.
For the initial condition, the initial flow field is computed as follows: a zero pressure-gradient flow field is constructed via the isentropic relations

$$M(x) = \frac{M_m A_m}{A(x)} \left( \frac{1 + \frac{\gamma - 1}{2} M(x)^2}{1 + \frac{\gamma - 1}{2} M_m^2} \right)^{\frac{\gamma + 1}{2(\gamma - 1)}}, \qquad p(x) = p_{\text{total}} \left( 1 + \frac{\gamma - 1}{2} M(x)^2 \right)^{\frac{-\gamma}{\gamma - 1}},$$

$$T(x) = T_{\text{total}} \left( 1 + \frac{\gamma - 1}{2} M(x)^2 \right)^{-1}, \qquad \rho(x) = \frac{p(x)}{R T(x)}, \qquad c(x) = \sqrt{\gamma \frac{p(x)}{\rho(x)}}, \qquad u(x) = M(x) c(x),$$
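These relations can be evaluated numerically; the sketch below (assuming SciPy, with hypothetical function names) solves the implicit area–Mach relation pointwise and then recovers the primitive variables:

```python
import numpy as np
from scipy.optimize import fsolve

gamma, R = 1.3, 355.4
T_total, p_total, M_m = 300.0, 1.0e6, 2.0

def mach(A, A_m, M_guess):
    """Solve the implicit area-Mach relation for M at one location; A_m is the
    area at x = 0.5. M_guess < 1 picks the subsonic branch, > 1 the supersonic."""
    ratio = lambda M: 1.0 + 0.5 * (gamma - 1.0) * M**2
    residual = lambda M: M - (M_m * A_m / A) * (ratio(M) / ratio(M_m)) ** (
        (gamma + 1.0) / (2.0 * (gamma - 1.0)))
    return fsolve(residual, M_guess)[0]

def primitive_state(M):
    """p, T, rho, u from the isentropic relations at a local Mach number M."""
    p = p_total * (1.0 + 0.5 * (gamma - 1.0) * M**2) ** (-gamma / (gamma - 1.0))
    T = T_total / (1.0 + 0.5 * (gamma - 1.0) * M**2)
    rho = p / (R * T)
    u = M * np.sqrt(gamma * p / rho)  # c = sqrt(gamma p / rho)
    return p, T, rho, u
```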
where M denotes the Mach number, c denotes the speed of sound, and a subscript m indicates the flow quantity at x = 0.5 m. The shock is located at x = 0.85 m, and the velocity across the shock (u2) is computed by using the jump relations for a stationary shock and the perfect-gas equation of state. The velocity across the shock satisfies the quadratic equation
$$\left( \frac{1}{2} - \frac{\gamma}{\gamma - 1} \right) u_2^2 + \frac{\gamma}{\gamma - 1} \frac{n}{m} u_2 - h = 0,$$
where m = ρ2u2 = ρ1u1, n = ρ2u2^2 + p2 = ρ1u1^2 + p1, and h = (e2 + p2)/ρ2 = (e1 + p1)/ρ1. The subscripts 1 and 2 indicate quantities to the left and to the right of the shock, respectively. We consider a specific Mach number of Mm = 2.0.
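Since the pre-shock state trivially satisfies the jump relations, u = u1 is one root of this quadratic, and u2 follows from Vieta's formula for the product of the roots; a small sketch (the helper name is hypothetical):

```python
def post_shock_velocity(rho1, u1, p1, e1, gamma=1.3):
    """Velocity u2 behind a stationary shock. Writing the quadratic as
    a*u^2 + b*u + c = 0 with a = 1/2 - gamma/(gamma - 1) and c = -h, the
    product of the roots is c/a; u = u1 is the trivial root, so u2 = (c/a)/u1."""
    h = (e1 + p1) / rho1            # total specific enthalpy (conserved)
    a = 0.5 - gamma / (gamma - 1.0)
    c = -h
    return (c / a) / u1
```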
For spatial discretization, we employ a finite-volume scheme with 128 equally spaced control volumes and fully implicit boundary conditions, which leads to N = nu n1 = 3 × 128 = 384. At each intercell face, the Roe flux-difference splitting method is used to compute the flux. For time discretization, we employ the backward Euler scheme with a uniform time step ∆t = 10^{-3} and final time 0.6 (i.e., nt = 600). Figure 10 depicts the snapshots of reference solutions of the Mach number M(x) for the middle cross-sectional area µ = 0.15 at t = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6}.
Figure 10: Snapshots of reference solutions of Mach number M(x) for µ = 0.15 at t = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6}.
The varying parameter of this problem is the width of the middle cross-sectional area, which determines the geometry of the spatial domain and, thus, determines the initial condition as well as the dynamics. Analogously to the previous two benchmark problems, we select 4 training parameter instances, 3 validating parameter instances, and 3 testing parameter instances (Figure 11).
Figure 11: Visualization of the training, validating, and testing parameter instances for the quasi-1D Euler equations.
Varying µ results in fairly distinct initial conditions, but does not significantly affect variations in the dynamics; both the initial condition and the dynamics are parameterized by the same input parameter, the width of the middle cross-sectional area of the spatial domain.
Table 6: Network architecture: kernel filter length κ, number of kernel filters nκ , and strides s at each layer of (transposed)
convolutional layers.
Encoder Decoder
Conv-layer (5 layers) FC-layer (1 layer)
κ [16, 8, 4, 4, 4] din = p, dout = 512
nκ [16, 32, 64, 64, 128] Trans-conv-layer (5 layers)
s [ 2, 2, 2, 2, 2] κ [ 4, 4, 4, 8, 16]
FC-layer (1 layer) nκ [64, 64, 32, 16, 3]
din = 512, dout = p s [ 2, 2, 2, 2, 2]
Our general observation is that the benefits of using PNODE are most pronounced when the dynamics are parameterized and there is a single initial condition. Moreover, we expect larger improvements in approximation accuracy over NODE when the dynamics vary significantly over the input parameters, for instance, in modeling infectious diseases such as the novel coronavirus (COVID-19) [43], where the dynamics of transmission are greatly affected by model parameters determined by, e.g., quarantine policy and social distancing.
7. Conclusions
In this study, we proposed a parameterized extension of neural ODEs and a novel framework for reduced-order modeling of complex numerical simulations of computational physics problems. Our simple extension allows neural ODE models to learn multiple complex trajectories. This extension overcomes the main drawback of neural ODEs, namely that only a single set of dynamics is learned for the entire data distribution. We have demonstrated the effectiveness of parameterized neural ODEs on several benchmark problems from computational fluid dynamics, and have shown that the proposed method outperforms neural ODEs.
8. Acknowledgments
This paper describes objective technical results and analysis. Any subjective views or opinions that might
be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the
United States Government. Sandia National Laboratories is a multimission laboratory managed and operated
by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell
International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under
contract DE-NA0003525.
Figure 12: Snapshots of reference solutions (solid red lines) and approximated solutions (dashed green lines) of Mach number M(x) for µ = 0.15 at t = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6}. The approximated solutions are obtained by using the framework with PNODE.
[1] I. Ayed, E. de Bézenac, A. Pajot, J. Brajard, and P. Gallinari, Learning dynamical systems
from partial observations, arXiv preprint arXiv:1902.11136, (2019).
[2] P. Benner, S. Gugercin, and K. Willcox, A survey of projection-based model reduction methods
for parametric dynamical systems, SIAM review, 57 (2015), pp. 483–531.
[3] P. Benner, M. Ohlberger, A. Cohen, and K. Willcox, Model Reduction and Approximation:
Theory and Algorithms, SIAM, 2017.
[4] M. Buffoni and K. Willcox, Projection-based model reduction for reacting flows, in 40th Fluid
Dynamics Conference and Exhibit, 2010, p. 5008.
[5] M. Chalvidal, M. Ricci, R. VanRullen, and T. Serre, Neural optimal control for representation
learning, arXiv preprint arXiv:2006.09545, (2020).
[6] R. T. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, Neural ordinary differential
equations, in Advances in neural information processing systems, 2018, pp. 6571–6583.
[7] M. Ciccone, M. Gallieri, J. Masci, C. Osendorfer, and F. Gomez, NAIS-Net: Stable deep networks from non-autonomous differential equations, in Advances in Neural Information Processing Systems, 2018, pp. 3025–3035.
[8] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, Fast and accurate deep network learning by
exponential linear units (ELUs), arXiv preprint arXiv:1511.07289, (2015).
[9] J. R. Dormand and P. J. Prince, A family of embedded Runge–Kutta formulae, Journal of Computational and Applied Mathematics, 6 (1980), pp. 19–26.
[10] E. Dupont, A. Doucet, and Y. W. Teh, Augmented neural ODEs, in Advances in Neural Information
Processing Systems, 2019, pp. 3140–3150.
[11] N. B. Erichson, M. Muehlebach, and M. W. Mahoney, Physics-informed autoencoders for
Lyapunov-stable fluid flow prediction, arXiv preprint arXiv:1905.10866, (2019).
[12] C. Finlay, J.-H. Jacobsen, L. Nurbekyan, and A. M. Oberman, How to train your neural ODE,
arXiv preprint arXiv:2002.02798, (2020).
[13] L. Fulton, V. Modi, D. Duvenaud, D. I. Levin, and A. Jacobson, Latent-space dynamics for
reduced deformable simulation, in Computer Graphics Forum, vol. 38, Wiley Online Library, 2019,
pp. 379–391.
[14] N. Geneva and N. Zabaras, Modeling the dynamics of PDE systems with physics-constrained deep auto-regressive networks, Journal of Computational Physics, 403 (2020), p. 109056.
[15] A. Gholami, K. Keutzer, and G. Biros, ANODE: Unconditionally accurate memory-efficient gradients for neural ODEs, arXiv preprint arXiv:1902.10298, (2019).
[16] F. J. Gonzalez and M. Balajewicz, Deep convolutional recurrent autoencoders for learning low-
dimensional feature dynamics of fluid systems, arXiv preprint arXiv:1808.01346, (2018).
[17] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT press, 2016.
[18] Q. Hernandez, A. Badias, D. Gonzalez, F. Chinesta, and E. Cueto, Deep learning of
thermodynamics-aware reduced-order models from data, arXiv preprint arXiv:2007.03758, (2020).
[21] M. Karl, M. Soelch, J. Bayer, and P. van der Smagt, Deep variational bayes filters: Unsupervised
learning of state space models from raw data, in International Conference on Learning Representations,
2017.
[22] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980,
(2014).
[23] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM review, 51 (2009),
pp. 455–500.
[24] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, 521 (2015), pp. 436–444.
[25] K. Lee and K. Carlberg, Deep conservation: A latent dynamics model for exact satisfaction of
physical conservation laws, arXiv preprint arXiv:1909.09754, (2019).
[26] K. Lee and K. T. Carlberg, Model reduction of dynamical systems on nonlinear manifolds using
deep convolutional autoencoders, Journal of Computational Physics, 404 (2020), p. 108973.
[27] Y. Lu, A. Zhong, Q. Li, and B. Dong, Beyond finite layer neural networks: Bridging deep
architectures and numerical differential equations, in International Conference on Machine Learning,
2018, pp. 3276–3285.
[28] R. W. MacCormack, Numerical computation of compressible and viscous flow, American Institute of
Aeronautics and Astronautics, Inc., 2014.
[29] R. Maulik, B. Lusch, and P. Balaprakash, Reduced-order modeling of advection-dominated systems
with recurrent neural networks and convolutional autoencoders, arXiv preprint arXiv:2002.00470, (2020).
[30] R. Maulik, A. Mohan, B. Lusch, S. Madireddy, P. Balaprakash, and D. Livescu, Time-series
learning of latent-space dynamics for reduced-order model closure, Physica D: Nonlinear Phenomena, 405
(2020), p. 132368.
[31] J. Morton, A. Jameson, M. J. Kochenderfer, and F. Witherden, Deep dynamical modeling
and control of unsteady fluid flows, in Advances in Neural Information Processing Systems, 2018,
pp. 9258–9268.
[32] S. Pawar, S. Rahman, H. Vaddireddy, O. San, A. Rasheed, and P. Vedula, A deep learning
enabler for nonintrusive reduced order modeling of fluid flows, Physics of Fluids, 31 (2019), p. 085101.
[33] L. S. Pontryagin, The mathematical theory of optimal processes, (1962).
[34] G. D. Portwood, P. P. Mitra, M. D. Ribeiro, T. M. Nguyen, B. T. Nadiga, J. A. Saenz,
M. Chertkov, A. Garg, A. Anandkumar, A. Dengel, et al., Turbulence forecasting via neural
ODE, arXiv preprint arXiv:1911.05180, (2019).
[35] A. Quaglino, M. Gallieri, J. Masci, and J. Koutník, SNODE: Spectral discretization of neural ODEs for system identification, arXiv preprint arXiv:1906.07038, (2019).
[36] A. Quarteroni, A. Manzoni, and F. Negri, Reduced Basis Methods for Partial Differential Equations:
an Introduction, vol. 92, Springer, 2015.
[37] S. M. Rahman, S. Pawar, O. San, A. Rasheed, and T. Iliescu, Nonintrusive reduced order
modeling framework for quasigeostrophic turbulence, Physical Review E, 100 (2019), p. 053306.
[38] M. Rewienski, A trajectory piecewise-linear approach to model order reduction of nonlinear dynamical
systems, PhD thesis, Massachusetts Institute of Technology, 2003.
[39] Y. Rubanova, R. T. Chen, and D. Duvenaud, Latent ODEs for irregularly-sampled time series, arXiv preprint arXiv:1907.03907, (2019).
[40] L. Ruthotto and E. Haber, Deep neural networks motivated by partial differential equations, Journal
of Mathematical Imaging and Vision, (2019), pp. 1–13.
[41] O. San, R. Maulik, and M. Ahmed, An artificial neural network framework for reduced order
modeling of transient flows, Communications in Nonlinear Science and Numerical Simulation, 77 (2019),
pp. 271–287.
[42] J. Tencer and K. Potter, Enabling nonlinear manifold projection reduced-order models by extending
convolutional neural networks to unstructured data, arXiv preprint arXiv:2006.06154, (2020).
[43] H. Wang, Z. Wang, Y. Dong, R. Chang, C. Xu, X. Yu, S. Zhang, L. Tsamlag, M. Shang, J. Huang, et al., Phase-adjusted estimation of the number of coronavirus disease 2019 cases in Wuhan, China, Cell Discovery, 6 (2020), pp. 1–8.
[44] Z. Wang, D. Xiao, F. Fang, R. Govindan, C. C. Pain, and Y. Guo, Model identification of
reduced order fluid dynamics systems using deep learning, International Journal for Numerical Methods
in Fluids, 86 (2018), pp. 255–268.
[45] E. Weinan, A proposal on machine learning via dynamical systems, Communications in Mathematics
and Statistics, 5 (2017), pp. 1–11.
[46] S. Wiewel, M. Becher, and N. Thuerey, Latent space physics: Towards learning the temporal
evolution of fluid flow, in Computer Graphics Forum, vol. 38, Wiley Online Library, 2019, pp. 71–82.
[47] X. Xie, G. Zhang, and C. G. Webster, Non-intrusive inference reduced order model for fluids using
deep multistep neural network, Mathematics, 7 (2019), p. 757.
[48] C. Yildiz, M. Heinonen, and H. Lahdesmaki, ODE2VAE: Deep generative second order ODEs with
Bayesian neural networks, in Advances in Neural Information Processing Systems, 2019, pp. 13412–13421.