Predicting Parametric Spatiotemporal Dynamics by Multi-Resolution PDE Structure-Preserved Deep Learning

3 Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China
4 Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China
5 Lucy Family Institute for Data & Society, University of Notre Dame, Notre Dame, IN, USA
6 Center for Sustainable Energy (ND Energy), University of Notre Dame, Notre Dame, IN, USA
* Corresponding author. E-mail: [email protected]
Abstract
Traditional data-driven deep learning models often struggle with high training costs, error
accumulation, and poor generalizability in complex physical processes. Physics-informed deep
learning (PiDL) addresses these challenges by incorporating physical principles into the model.
Most PiDL approaches regularize training by embedding governing equations into the loss func-
tion, yet this depends heavily on extensive hyperparameter tuning to weigh each loss term. To
this end, we propose to leverage physics prior knowledge by “baking” the discretized governing
equations into the neural network architecture via the connection between the partial differential
equations (PDE) operators and network structures, resulting in a PDE-preserved neural network
(PPNN). This method, embedding discretized PDEs through convolutional residual networks in
a multi-resolution setting, largely improves the generalizability and long-term prediction accu-
racy, outperforming conventional black-box models. The effectiveness and merit of the proposed
methods have been demonstrated across various spatiotemporal dynamical systems governed by
spatiotemporal PDEs, including reaction-diffusion, Burgers’, and Navier-Stokes equations.
INTRODUCTION
Computational modeling and simulation capabilities play an essential role in understanding,
predicting, and controlling various physical processes (e.g., turbulence, heat-flow coupling, and
fluid-structure interaction), which often exhibit complex spatiotemporal dynamics. These physical
phenomena are usually governed by partial differential equations (PDEs) and can be simulated
by solving these PDEs numerically based on, e.g., finite difference (FD), finite volume (FV), fi-
nite element (FE), or spectral methods. However, predictive modeling of complex spatiotemporal
dynamics using traditional numerical methods can be significantly challenging in many practical
scenarios: (1) governing equations for complex systems might not be fully known due to a lack
of complete understanding of the underlying physics, for which a first-principles numerical solver
cannot be built; (2) conventional numerical simulations are usually time-consuming, making them
infeasible for applications that require many repeated model queries, e.g., optimization design,
inverse problems, and uncertainty quantification (UQ), which are attracting increasing attention in
scientific discovery and engineering practice.
Recent advances in scientific machine learning (SciML) and ever-growing data availability open
up new possibilities to tackle these challenges. In the past few years, various deep neural net-
works (DNNs) have been designed to learn the spatiotemporal dynamics in latent spaces enabled
by proper orthogonal decomposition (POD) [1, 2, 3, 4] or convolutional encoding-decoding oper-
ations [5, 6, 7, 8]. In particular, fast neural simulators based on graph neural networks (GNN)
have been proposed and demonstrated to predict spatiotemporal physics on irregular domains with
unstructured meshes [9, 10]. Although showing good promise, most of these works are purely
data-driven and black-box in nature, which rely on “big data” and may have poor generalizability,
particularly in out-of-sample regimes in the parameter space. As a more promising strategy, baking
physics prior knowledge (e.g., conservation laws, governing equations, and constraints) into deep
learning is believed to be very effective to improve its sample efficiency and generalizability [11],
here referred to as physics-informed deep learning (PiDL). An impressive contribution in this di-
rection is physics-informed neural networks (PINNs) [12], where well-posed PDE information is
leveraged to enable deep learning in data-sparse regimes. The general idea of PINNs is to learn
(or solve) the PDE solutions with DNNs, where the loss functions are formulated as a combina-
tion of the data mismatch and residuals of known PDEs, unifying forward and inverse problems
within the same DNN optimization framework. The merits of PINNs have been demonstrated over
various scientific applications, including fast surrogate/meta modeling [13, 14, 15], parameter/field
inversion [16, 17, 18, 19], and solving high-dimensional PDEs [20, 21], to name a few. Due to the
scalability challenges of the pointwise fully-connected PINN formulation to learn continuous func-
tions [22, 23, 24] or operators [4, 5, 7, 28], many remedies and improvements in terms of training
and convergence have been proposed [29, 30, 31]. In particular, there is a growing trend in develop-
ing field-to-field discrete PINNs by leveraging convolution operations and numerical discretizations,
which have been demonstrated to be more efficient in spatiotemporal learning [32, 33]. For example,
convolutional neural networks (CNNs) or graph convolutional networks (GCNs) were built to approximate
the discrete PDE solutions, where the PDE residuals can be formulated in either strong or weak
forms by finite-difference [34, 35, 36], finite volume [37], or finite element methods [38, 39, 40, 41, 42].
Moreover, recurrent network formulations informed by discretized PDEs have been developed for
spatiotemporal dynamic control using model-based reinforcement learning [43].
In the realm of PINN framework, the term "physics-informed" generally denotes the incorpo-
ration of PDE residuals into the loss or likelihood functions to guide or constrain DNN training.
Despite this development, the question of how to effectively use physics-inductive bias—i.e., (par-
tially) known governing equations—to inform the learning architecture design remains an intriguing,
relatively unexplored area. The primary focus of this paper is to address this issue. Recent studies
have revealed the deep-rooted relationship between neural network structures and ordinary/partial
differential equations (ODEs/PDEs) [44, 45, 46, 47, 48, 49]. For example, Lu et al. [45] bridged deep
convolutional network architectures and numerical differential equations. Chen et al. [50] showed
that the residual networks (ResNets) [51] can be interpreted as the explicit Euler discretization of an
ODE, and ODEs can be used to formulate the continuous residual connection with infinite depths,
known as the NeuralODE [52]. Motivated by differential equations, novel deep learning architectures
have been recently developed in the computer science community, e.g., new convolutional ResNets
guided by parabolic and hyperbolic PDEs [47], GRAND as a graph network motivated by diffusion
equations [48], and PDE-GCN motivated by hyperbolic PDEs to improve over-smooth issues in
deep graph learning [49]. However, these studies mainly aimed to develop generic DNN architec-
tures with some desired features by utilizing specific properties of certain PDEs (e.g., diffusion,
dispersion, etc.), and the designed neural networks are not necessarily used to learn the physical
processes governed by those PDEs. An attempt was made by Shi et al. [53] to learn PDE-governed
dynamics by limiting trainable parameters of CNN using finite difference operators. Despite being a
novel attempt, the approach is still purely data-driven without effectively utilizing governing PDEs.
Therefore, this work explores PiDL through learning architecture design, inspired by the broader
concept of differentiable programming (∂P), which extends DNNs to more general computer programs
that can be trained in a similar fashion to deep learning models [54]. In general, a ∂P model is for-
mulated by marrying DNNs with a fully differentiable physics-based solver, and thus the gradients
can be back-propagated through the entire hybrid neural solver based on automatic differentia-
tion (AD) or discrete adjoint methods. Relevant works include universal differential equations
(UDE) [55], NeuralPDE [56], and others, where DNNs are formulated within a differentiable PDE
solver for physics-based modeling. In particular, this idea has been recently explored in predictive
modeling of rigid body dynamics [57, 58], epidemic dynamics [59], and fluid dynamics [60, 61, 62].
These studies imply great promise of incorporating physics-induced prior (i.e., PDE) into DNN
architectures.
In this paper, we present a creative approach to designing distinctive learning architectures
for predicting spatiotemporal dynamics, where the governing PDEs are preserved as convolution
operations and residual connections within the network architecture. This is in sharp contrast to
prior PiDL work where the physical laws were enforced as soft constraints within the loss functions,
supported by a comprehensive comparison between the proposed method and physics-informed
variants of multiple state-of-the-art neural operators. Specifically, we develop an auto-regressive
neural solver based on a convolutional ResNet framework, where the residual connections are con-
structed by preserving the PDE operators in governing equations, which are (partially) known a
priori, discretized on low-resolution grids. Meanwhile, encoding-decoding convolution operations
with trainable filters enable high-resolution state predictions on fine grids. Compared to classic
ResNets with black-box residual connections, the proposed PPNN is expected to be superior in
terms of both training efficiency and out-of-sample generalizability for, e.g., unseen boundary con-
ditions and parameters, and extrapolating in time. Conceptually, the proposed framework is similar
to using neural networks for closure modeling of classic numerical solvers, which has been explored
previously. However, several distinct features make our methodology more general, extending it
substantially beyond prior studies on merging machine learning with numerical solvers [63, 64, 65].
Our work is not focused on simply coupling a neural network with a numerical solver or train-
ing it to learn specific closures. Instead, the proposed framework integrates (partially or wholly
known) physical laws, expressed as PDE operators, directly into the neural networks. This leads to
a creative neural architecture design, reflecting a unique design strategy that leverages the profound
connection between neural network architecture components and ODEs/PDEs. The differentiability
brought by representing numerical operators with neural network components makes an end-to-end
time sequence training possible, which distinguishes the proposed method from closure model learning.
This strategy offers a fresh perspective on incorporating physical knowledge into neural network de-
sign, underscoring that such integration can enhance the model’s performance in predicting complex
spatiotemporal dynamics. When compared with the other approach of leveraging physics priors into
neural network training: the "physics-informed" methods, our proposed PPNN does show signifi-
cant merit in terms of cost, generalizability and long-term prediction accuracy. The contributions
of this work are summarized as follows: (i) a framework for physics-inspired learning architecture
design is presented, where the PDE structures are preserved by the convolution filters and residual
connection; (ii) multi-resolution information passing through network layers is proposed to improve
long-term model rollout predictions over large time steps; (iii) the superiority of the proposed PPNN
is demonstrated for PDE operator learning in terms of training complexity, extrapolability, and
generalizability in comparison with the baseline black-box models, using a series of comprehensive
numerical experiments on spatiotemporal dynamics governed by various parametric unsteady PDEs,
including reaction-diffusion equations, Burgers’ equations, and unsteady Navier-Stokes equations.
RESULTS AND DISCUSSION
Learning spatiotemporal dynamics governed by PDEs
We consider a multi-dimensional spatiotemporal system of u(x, t; λ) governed by a set of nonlinear
coupled PDEs parameterized by λ ∈ R^d, which is a d-dimensional parameter vector, while
x and t are spatial and temporal coordinates, respectively. Our goal is to develop a data-driven
neural solver for rapid predictions of spatiotemporal dynamics given different parameters λ. The
neural solver is formulated as a next-step DNN model by learning the dynamic transitions from the
current step t to the next time step t + ∆t (∆t is the time step).
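To make this auto-regressive formulation concrete, a minimal rollout loop might look like the sketch below (a PyTorch-style illustration; the function name `rollout` and the generic next-step model `f_theta` are hypothetical placeholders, not the implementation used in this work):

```python
import torch

def rollout(f_theta, u0, lam, n_steps):
    """Auto-regressive rollout of a next-step model: u_{t+dt} = f_theta(u_t, lam).

    u0:  initial state, shape (batch, channels, H, W)
    lam: physical-parameter tensor the model is conditioned on
    """
    u, trajectory = u0, [u0]
    for _ in range(n_steps):
        u = f_theta(u, lam)                 # advance the state by one learning step dt
        trajectory.append(u)
    return torch.stack(trajectory, dim=1)   # shape (batch, n_steps + 1, channels, H, W)
```

Because every predicted state is fed back as the next input, any per-step error compounds, which is exactly the error-accumulation issue discussed next.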
This study focuses on the learning architecture design for improving the robustness, stability,
and generalizability of data-driven next-step prediction models, which commonly suffer from considerable
error accumulation due to the auto-regressive formulation and fail to operate over a long-span
model rollout. In contrast to existing models, which are black-box, we propose a PDE-preserved
neural network (PPNN) architecture inspired by the relationship between network structures and
PDEs, by hypothesizing that the predictive performance can be significantly improved if the network
is constructed by preserving (partially) known governing PDEs of the spatiotemporal dynamics to
be learned. Specifically, the known portion of the governing PDEs in discrete forms are preserved
in residual connection blocks. As shown in Fig. 1a, the PPNN architecture features a residual con-
nection which consists of two parts: a trainable network and a PDE preserving network, where the
right-hand side (RHS) of the governing PDE, discretized on a finite difference grid, is represented by
a convolutional neural network. The weights of the PDE-preserved convolutional residual component
are determined by the discretization scheme and remain constant during training.
However, in practice, neural solvers are expected to roll out much faster than numerical solvers,
and the time step ∆t would be orders of magnitude larger than that used in conventional numerical
solvers, which may lead to catastrophic stability issues if naively embedding the discretized PDE into
the neural network. To this end, we implement a multi-resolution PPNN based on the convolutional
(conv) ResNet backbone (shown in Fig. 1b), where PDE-preserving blocks work on a coarse grid to
enable stable model rollout with large evolving steps. This is achieved by using the bilinear down-
sampling and bicubic up-sampling algorithms to auto-encode the PDE-preserved hidden feature in
a low-resolution space, which is then fed into the main residual connection in the original high-
resolution space.
Together with the trainable block, which consists of decoding-encoding convResNet blocks de-
fined on the fine mesh, PPNN enables predictions at a high resolution. Moreover, the network is
conditioned on physical parameters λ, enabling fast parametric inference and generalizing over the
high-dimensional parameter space. (More details are discussed in the METHODS section.)
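The sketch below illustrates one plausible way to assemble the multi-resolution residual connection described above. It is a simplified PyTorch-style illustration under assumed names (`PDEPreservedBlock`, `trainable_net`, etc.), using a diffusion-type RHS as the preserved operator; it is not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PDEPreservedBlock(nn.Module):
    """Fixed-stencil evaluation of a (partially) known PDE RHS on a coarse grid."""
    def __init__(self, dx, nu):
        super().__init__()
        lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]]) / dx**2
        self.register_buffer("lap", lap.view(1, 1, 3, 3))   # frozen, never trained
        self.nu = nu

    def rhs(self, u):
        u_pad = F.pad(u, (1, 1, 1, 1), mode="circular")      # periodic BCs assumed
        weight = self.lap.expand(u.shape[1], 1, 3, 3)        # depthwise stencil conv
        return self.nu * F.conv2d(u_pad, weight, groups=u.shape[1])

class PPNN(nn.Module):
    def __init__(self, trainable_net, pde_block, dt, coarse_size):
        super().__init__()
        self.trainable_net, self.pde = trainable_net, pde_block
        self.dt, self.coarse_size = dt, coarse_size

    def forward(self, u, lam):
        # PDE-preserving branch evaluated on a coarse grid (stable for large dt)
        u_c = F.interpolate(u, size=self.coarse_size, mode="bilinear")
        du_c = self.pde.rhs(u_c) * self.dt
        du = F.interpolate(du_c, size=u.shape[-2:], mode="bicubic")
        # trainable ConvResNet branch on the fine grid, conditioned on lam
        correction = self.trainable_net(u, lam) * self.dt
        return u + du + correction                            # residual connection
```

Registering the stencil as a buffer (rather than an `nn.Parameter`) is what keeps the PDE-preserving branch fixed during training, while the bilinear down-sampling and bicubic up-sampling let it operate stably on a coarser grid than the trainable branch.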
In this section, we evaluate the proposed PDE structure-preserved neural network (PPNN) on
three nonlinear systems with spatiotemporal dynamics, where the governing PDEs are known or
partially-known a priori. Specifically, the spatiotemporal dynamics governed by FitzHugh-Nagumo
reaction diffusion (RD) equations, Burgers’ equations, and incompressible Navier-Stokes (NS) equa-
tions with varying parameters λ (e.g., IC, diffusion coefficients, Reynolds number, etc.) in 2D
domains are studied. In particular, we will study the scenarios where either fully-known or incom-
plete/inaccurate governing PDEs are preserved. To demonstrate the merit of preserving the discrete
PDE structure in ConvResNet, the proposed PPNN is compared with the corresponding black-box
ConvResNet next-step model as a baseline, which is a CNN variant of the MeshGraphNet [9] (see
section Next-step prediction models based on convolutional ResNets). For a fair comparison, the
network architecture of the trainable portion of the PPNN is the same as the black-box baseline
model. Moreover, all models are compared on the same sets of training data in each test case. The
Figure 1: Schematic diagram of the proposed partial differential equation (PDE)-preserved neural network (PPNN).
a. A schematic representation illustrating the concept of the PPNN framework. b. A detailed schematic of the
ConvResNet-based PPNN, which consists of the trainable part and the PDE-preserving part. The two portions of
PPNN are combined together in a multi-resolution setting. The discretized form of the governing PDEs is embedded
into the network structure via prescribed convolution filters and the residual connection.
generalizability, robustness, and training and testing efficiency of the PPNN are investigated in
comparison with its corresponding black-box baseline. It is noted that the novelty of this work lies not in
exploring varied methods for learning closures for traditional PDE solvers but in the inventive inte-
gration of known physical laws into the architecture of convolutional residual neural networks. We,
therefore, consider it critical to compare the PPNN with its black-box counterpart, which learns from
data without explicit integration of the underlying physics. This comparison enables us to highlight
the unique benefits of integrating known physics into deep learning models, an area that has, to
date, received limited attention. Given the prevalence of black-box neural networks in data-driven
surrogate modeling where the governing PDEs are often known or partially known, this comparison
is both relevant and fair. We believe that this provides a valuable perspective and a substantial
contribution to the field. Moreover, it is also worth noting that PPNN is not constrained to any
specific DNN architectures. Rather, we demonstrate that it serves as a versatile framework that
can be synergistically combined with a variety of DNN architectures such as U-Net [66] – widely
recognized for its multi-scale structure, and Vision Transformer (ViT) [67], which has become the
backbone for most computer vision tasks (see section PPNN as a general framework for embedding
known physics). Moreover, the relationship between the PDE-preserving portion of PPNN
and numerical solvers is discussed. Note that we use the same network setting, i.e., same network
structure, hyperparameters and training epochs, for all the test cases (except for the NS system,
which requires slight modifications to accommodate three state variables). More details about the neural
network settings can be found in Section 3 of the supplementary information.
All the DNN predictions are evaluated against the high-resolution fully-converged numerical
solutions as the reference using a full-field error metric ϵt defined at time step t as,
$$\epsilon_t = \frac{1}{N} \sum_{i=1}^{N} \frac{\left\lVert f_{\tilde{\theta}}\big(\hat{u}_{t-1}(\lambda_i), \lambda_i\big) - u_t(\lambda_i) \right\rVert_2}{\left\lVert u_t(\lambda_i) \right\rVert_2},$$
where N indicates the number of testing physical parameters λi, ut(λi) is the reference solution
at time step t corresponding to the physical parameter λi, fθ̃ represents the trained neural network
function with optimized weights θ̃, and ût−1 represents the state predicted by the model at the previous
time step t − 1,
$$\hat{u}_t(\lambda_i) = f_{\tilde{\theta}}\big(\hat{u}_{t-1}(\lambda_i), \lambda_i\big), \qquad t = 1, \dots, n, \qquad \hat{u}_0(\lambda_i) = u_0(\lambda_i),$$
where n is the number of testing steps and u0(λi) represents the initial condition given λi. For brevity,
numerical details for each case are given in Section 4 of the supplementary information.
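A small helper along the following lines could be used to evaluate this metric (a sketch assuming a relative L2 norm averaged over the N testing parameters, consistent with the definition above; the array layout and names are hypothetical):

```python
import numpy as np

def relative_error(pred, ref):
    """epsilon_t: relative L2 rollout error averaged over N testing parameters.

    pred, ref: arrays of shape (N, n_steps, ...) holding predicted and reference
    states for each testing parameter lambda_i; returns an array of length
    n_steps with the averaged error at every time step t.
    """
    num = np.linalg.norm((pred - ref).reshape(*pred.shape[:2], -1), axis=-1)
    den = np.linalg.norm(ref.reshape(*ref.shape[:2], -1), axis=-1)
    return (num / den).mean(axis=0)     # average over the N testing parameters
```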
$$R_u(u, v) = u - u^3 - v + \alpha, \qquad R_v(u, v) = \beta (u - v), \tag{4}$$
where α = 0.01 represents the external stimulus and β = 0.25 is the reaction coefficient. The initial
condition (IC) u0 is a random field generated by sampling from a normal distribution,
which is then linearly scaled to [0.1, 1.1]. Given different ICs and diffusion coefficients γ, varying
dynamic spatial patterns of neuron activities can be simulated. Here, the next-step neural solvers are
trained to learn and used to predict the spatiotemporal dynamics for varying modeling parameters
(i.e., ICs and diffusion coefficients). Namely, we attempt to build a surrogate model in a very high-
dimensional parameter space λ ∈ R^d, where d = 65,537, since the dimensions for the IC and the
diffusion coefficient are 256² and 1, respectively. The reference solutions are obtained on the simulation
domain (x, y) ∈ [0, 6.4] × [0, 6.4], discretized with a fine mesh of 256 × 256 grid points, based on the finite
difference method.
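For illustration, one explicit finite-difference update of the FitzHugh-Nagumo system could be sketched as below (assuming both species share the diffusion coefficient γ, consistent with the single diffusion coefficient in the parameter vector, and periodic boundaries; the grid spacing dx and numerical time step dt are placeholders, and the actual reference-solver settings are given in the supplementary information):

```python
import numpy as np

def laplacian(f, dx):
    """5-point finite-difference Laplacian with periodic boundaries."""
    return (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
            np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4.0 * f) / dx**2

def rd_step(u, v, gamma, dx, dt, alpha=0.01, beta=0.25):
    """One explicit Euler step of the FitzHugh-Nagumo RD system with reaction terms of eq. (4)."""
    Ru = u - u**3 - v + alpha            # reaction term for u
    Rv = beta * (u - v)                  # reaction term for v
    u_new = u + dt * (gamma * laplacian(u, dx) + Ru)
    v_new = v + dt * (gamma * laplacian(v, dx) + Rv)
    return u_new, v_new
```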
Figure 2a shows the PPNN-predicted solution snapshots of the RD equations at four randomly
selected test parameters (i.e., randomly generated ICs and unseen diffusion coefficients). The pre-
diction results of baseline black-box ConvResNet (first row) and the proposed PPNN (second row)
are compared against the ground truth reference (third row). It can be seen that both models agree
with the reference solutions for t < 0.6T, showing good generalizability on testing ICs and γ for a
short-term model rollout. However, the error accumulation becomes noticeable for the black-box
baseline when t > T, and the spatial patterns of the baseline predictions significantly differ from the
reference at t = 2T, which is an expected issue for next-step predictors. In contrast, the results
of our PPNN show good agreement with the reference solutions over the entire time span [0, 2T]
on all testing parameters, showing great robustness, predictability, and generalizability in both the
spatiotemporal domain and parameter space. Predicted solutions on more testing parameters are
presented in Fig. S12.
To further examine the error propagation in time for both models, the relative testing errors
ϵt averaged over 100 randomly selected parameters in training and testing sets are computed and
plotted in Fig. 2, where Fig. 2c shows the averaged model-rollout error evaluated on 100 training
parameters and Fig. 2d shows the error averaged on 100 randomly generated testing parameters.
(Zoomed-in views of Fig. 2c and Fig. 2d can be found in Fig. 2g and Fig. 2h, respectively.) The model
is only trained within the range of 1T (100∆t), and it is clearly seen that the rollout error of the
black-box model significantly grows in the extrapolation range [T, 2T] (from 100∆t to 200∆t), where
∆t = 200δt is the learning step size, which is 200 numerical time steps δt. The error accumulation
becomes more severe for the unseen testing parameters. However, our PPNN predictions maintain
an impressively low error, even when extrapolating twice the length of the training range. Besides,
the scattering of the error ensemble is significantly reduced compared to the black-box baseline,
indicating great robustness of the PPNN for various testing parameters.
Viscous Burgers’ equation For the second case, we study the spatiotemporal dynamics gov-
erned by the viscous Burgers’ equations on a 2D domain with periodic boundary conditions,
$$\frac{\partial \boldsymbol{u}}{\partial t} + \boldsymbol{u} \cdot \nabla \boldsymbol{u} = \nu \nabla^2 \boldsymbol{u}, \quad t \in [0, T], \tag{6}$$
where u = [u(x, y, t), v(x, y, t)]^T ∈ R² is the velocity vector, T = 2 s is the time length we simulated,
and ν represents the viscosity. The initial condition (IC) u0 is generated according to
$$\begin{cases} \displaystyle u_0 = \sum_{i=-4}^{4} \sum_{j=-4}^{4} r^{(1)}_{i,j} \sin(ix + jy) + r^{(2)}_{i,j} \cos(ix + jy) \\[1.5ex] \displaystyle v_0 = \sum_{i=-4}^{4} \sum_{j=-4}^{4} r^{(3)}_{i,j} \sin(ix + jy) + r^{(4)}_{i,j} \cos(ix + jy) \end{cases} \qquad r^{(k)}_{i,j} \sim \mathcal{N}(0, 1), \; k = 1, 2, 3, 4, \tag{7}$$
where x, y are spatial coordinates of grid points, and r^(k)_{i,j}, k ∈ {1, 2, 3, 4}, are random variables sampled
independently from a normal distribution. The IC is normalized in the same way as mentioned in
the RD case. We attempt to learn the dynamics given different ICs and viscosities. Similar to the
RD case, the parameter space R^d is also high-dimensional (d = 324), as the IC is parameterized by
4 × 9² independent random variables and the scalar viscosity can also vary in the range [0.02, 0.07]. The
reference solution is generated by solving the Burgers' equations on the domain (x, y) ∈ [0, 3.2]²,
discretized by a fine mesh of 256 × 256 grid points using the finite difference method.
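A sketch of how such a random initial field (eq. 7) could be generated on the 256 × 256 grid is given below (illustrative only; the subsequent linear rescaling used in the RD case is omitted, and the function name is hypothetical):

```python
import numpy as np

def burgers_ic(n=256, L=3.2, modes=4, seed=None):
    """Random truncated-Fourier initial condition for the 2D Burgers' case (eq. 7)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, L, n, endpoint=False)
    X, Y = np.meshgrid(x, x, indexing="ij")
    u0 = np.zeros((n, n))
    v0 = np.zeros((n, n))
    for i in range(-modes, modes + 1):
        for j in range(-modes, modes + 1):
            r = rng.standard_normal(4)            # r^(1..4)_{i,j} ~ N(0, 1)
            u0 += r[0] * np.sin(i * X + j * Y) + r[1] * np.cos(i * X + j * Y)
            v0 += r[2] * np.sin(i * X + j * Y) + r[3] * np.cos(i * X + j * Y)
    return u0, v0
```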
Figure 2: Prediction comparison in the reaction-diffusion (RD) case and viscous Burgers’ case. a and b, Predicted
solution snapshots of u for the RD equations (a), and the velocity magnitude ∥u∥2 for the Burgers’ equations (b)
at different time steps and unseen parameters, obtained by black-box ConvResNet (baseline model, first rows), and
partial differential equation preserved neural network (PPNN, our method, second rows), compared against ground
truth (high-resolution numerical simulation, third rows). λ0 , λ1 are randomly selected testing (unseen) parameters in
each system. c-f, Relative prediction error ϵt of PPNN (blue lines) and black-box ConvResNet baseline (orange
lines) for the RD dynamics (c-d) and Burgers' equations (e-f), averaged on 100 randomly sampled training
parameters λ (c, e) and testing (unseen) parameters (d, f). The shaded area shows the maximum and minimum
relative errors of all testing trajectories. g, h, Zoomed-in views of the relative error curve of PPNN shown in c (g) and
d (h), respectively. i, j, Zoomed-in views of the relative error curve of PPNN shown in e (i) and f (j), respectively.
The velocity magnitude contours of the 2D Burgers’ equation with different testing parameters
are shown in Fig. 2b, obtained by the black-box baseline, PPNN, and reference numerical solver,
respectively. Note that none of the testing parameters are seen during training. (More predicted
solutions on different testing parameters are presented in Fig. S13.) Similar to the RD case, PPNN
shows a significant improvement over the black-box baseline in terms of long-term rollout error
accumulation and generalizability on unseen ICs and viscosity ν. Due to the strong convection effect,
black-box baseline predictions deviate from the reference very quickly, and significant discrepancies
in spatial patterns can be observed as early as t < 0.6T. In general, the black-box baseline suffers
from poor out-of-sample generalizability for unseen parameters, making its predictions useless.
Our PPNN significantly outperforms the black-box baseline, and its prediction results agree with
the reference for all testing samples. Although slight prediction noise is present after a long-term
model rollout (t > 1.2T), the overall spatial patterns can be accurately captured by the PPNN
even at the last learning step (t = 2T). The error propagation of both models is given in Fig. 2,
where the rollout errors ϵt at each time step, averaged over 100 randomly selected parameters from
training and testing sets, are plotted. Fig. 2e shows the averaged model rollout error evaluated
on 100 training parameters, while Fig. 2f shows the error averaged on 100 randomly generated
parameters, which are not used for training. Zoomed-in views of Fig. 2e and Fig. 2f can be found
in Fig. 2i and Fig. 2j, respectively. As both models are only trained with the 1T (100∆t) time
steps for each parameter in the training set, it is clear that the error of the black-box model grows
rapidly once it steps into the extrapolation range [T, 2T]. The error accumulation effect of the
black-box model becomes more obvious for those parameters which are not in the training set due
to the poor generalizability. In contrast, the error of PPNN predictions remains surprisingly low
even in the extrapolation range for both training and testing regimes, and there is nearly no error
accumulation. In addition, the error scattering significantly shrinks compared to that of the black-
box model, indicating significantly better accuracy, generalizability and robustness of the PPNN
compared to the black-box baseline.
Navier-Stokes equations The last case investigates the performance of PPNN in learning an un-
steady fluid system exhibiting complex vortex dynamics, which is governed by the 2D parametric
unsteady Navier-Stokes (NS) equations:
$$\frac{\partial \boldsymbol{u}}{\partial t} + \boldsymbol{u} \cdot \nabla \boldsymbol{u} = -\nabla p + \nu \nabla^2 \boldsymbol{u}, \quad t \in [0, T], \qquad \nabla \cdot \boldsymbol{u} = 0, \tag{8}$$
where u = [u(x, y, t), v(x, y, t)]^T ∈ R² is the velocity vector, p(x, y, t) ∈ R is the pressure, and
ν = 1/Re represents the kinematic viscosity (Re is the Reynolds number). The NS equations are
solved in a 2D rectangular domain (x, y) ∈ [0, 4] × [0, 1], where a jet with a dynamically changing jet
angle is placed at the inlet. Namely, the inflow boundary is defined by a prescribed velocity profile
u(0, y, t),
$$\boldsymbol{u}(0, y, t) = \begin{bmatrix} u(0, y, t) \\ v(0, y, t) \end{bmatrix} = \begin{bmatrix} \exp\!\left(-50\,(y - y_0)^2\right) \\ \sin(t)\cdot\exp\!\left(-50\,(y - y_0)^2\right) \end{bmatrix}, \tag{9}$$
where y0 represents the vertical position of the center of the inlet jet. The outflow boundary
condition is set as a pressure outlet with a reference pressure of p(4, y, t) = 0. No-slip boundary
conditions are applied on the upper and lower walls.
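As a concrete reading of eq. (9), the prescribed inlet profile can be evaluated as in the following sketch (a hypothetical helper, vectorized over the inlet coordinates y):

```python
import numpy as np

def inlet_velocity(y, t, y0):
    """Prescribed inflow profile of eq. (9): a jet centered at y0 with an oscillating angle."""
    envelope = np.exp(-50.0 * (y - y0) ** 2)
    u = envelope                      # streamwise component
    v = np.sin(t) * envelope          # transverse component oscillates in time
    return u, v
```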
In this case, the neural network models are expected to learn the fluid dynamics with varying
Reynolds number Re and jet locations y0. Namely, a two-dimensional physical parameter vector
λ = [Re, y0]^T is considered. In the training set, we use five different Re evenly distributed in the
range [2 × 10³, 1 × 10⁴] and 9 different jet locations y0 uniformly selected from 0.3 to 0.7. Figure 3a-b
shows the snapshots of velocity magnitude of the NS equations at two representative testing parameters,
which are not seen in the training set. To be specific, λ0 = [2500, 0.325]^T represents a relatively low
Reynolds number Re = 2500 with the jet located at y0 = 0.325, while λ1 = [8500, 0.575]^T is a higher
Reynolds number case (Re = 8500) with the jet located at y0 = 0.575. The rollout prediction results of the PPNN and baseline
black-box ConvResNet are compared with the ground truth reference. Although both models can
accurately capture the spatiotemporal dynamics at the beginning stage (when t ≤ 0.4T), showing
good predictive performance for the unseen parameters over a short-term rollout, the predictions
by the black-box model are soon overwhelmed by noise due to rapid error accumulation
(t > T). However, the proposed PPNN significantly outperforms the black-box baseline as it
managed to provide accurate rollout predictions even at the last testing step (t = 3T), extrapolating
to three times the length of the training range, indicating that preserving the PDE structure
can effectively suppress the error accumulation that is unavoidable in most auto-regressive neural
predictors. To further investigate the error propagation in time for both models, we plot the relative
testing errors ϵt against time in Fig. 3c-d, which are averaged over 5 randomly selected parameters
in both training (Fig. 3c) and testing sets (Fig. 3d). We can clearly see that PPNN managed to
maintain low rollout error in both training and extrapolation ranges, in contrast to the significantly
higher error accumulation in the black-box baseline results. In particular, the black-box model's
relative error grows visibly after only a short-term model rollout and increases rapidly once it enters
the extrapolation range, even for testing on the training parameter set (Fig. 3c); the errors
accumulate even faster for testing on unseen parameters (Fig. 3d). On the contrary, our
PPNN has almost no error accumulation and performs much more consistently between the training
and extrapolation ranges, with significantly lower rollout errors. The results again demonstrate
outstanding predictive accuracy and generalizability of the proposed method. Besides, PPNN also
shows a significantly smaller uncertainty range, indicating great robustness among different testing
parameters.
Reaction diffusion equations with unknown reaction term We first revisit the aforemen-
tioned FitzHugh–Nagumo RD equations. Here, we consider the scenario where only the diffusion
phenomenon is known in the FitzHugh–Nagumo RD dynamics. Namely, the reaction source terms
remain unknown and PPNN only preserves the incomplete RD equations, i.e., 2D diffusion equa-
tions,
$$\frac{\partial u}{\partial t} = \gamma \nabla^2 u. \tag{10}$$
All the case settings remain the same as those discussed previously. Although incomplete/inaccurate
prior knowledge about the RD system is preserved, our PPNN still shows a significant advantage over
the black-box baseline. Figure 4a compares the snapshots of reactant u at two randomly selected
unseen parameters λ2 and λ3 predicted by black-box baseline model (first rows), PPNN with the
diffusion terms preserved only (second rows), PPNN with the complete RD equation preserved
(third rows), against the ground truth (fourth rows). The PPNNs preserving either complete or
incomplete RD equations accurately capture the overall patterns and agree well with the reference
solutions, while the black-box baseline shows notable discrepancies and large errors, particularly at
t = 2T, which is twice the training phase length. At the last extrapolation step, the prediction
results of the black-box baseline show some visible noise and are less smooth compared to the results
obtained by preserving the complete RD equation, indicating that the lack of prior information on the
reactive terms could slightly reduce the improvement by PPNN. Figure 4b-c shows the relative model rollout
errors averaged over 100 test trajectories, which are not seen in the training set. The shaded area in
Figure 3: Prediction comparison in the case governed by Navier-Stokes (NS) equations. a-b, Predicted solution
snapshots of velocity magnitude ∥u∥2 for the NS equations obtained by black-box ConvResNet (baseline) and partial
differential equation preserved neural network (PPNN, ours), compared against the ground truth (high-resolution
numerical simulation), where λ0 is (Re = 2500, y0 = 0.325, shown in a), and λ1 is a high Reynolds number case (Re =
8500, y0 = 0.575, shown in b). c-d, Relative prediction error ϵt of PPNN (blue lines) and black-box ConvResNet
baseline (orange lines) at different time steps for the NS equations, averaged on 5 randomly sampled (c) training
parameters and (d) testing (unseen) parameters. The shaded areas show the scattering of the relative errors over all
testing trajectories.
the upper panel shows the error distribution range of these 100 test trajectories. Even though the preserved
PDEs are not complete/accurate, the mean relative error (blue line) remains almost the same
as that of the PPNN with fully-known PDEs (see Fig. 2a), which is significantly lower than that of the
black-box baseline (orange line), showing a great advantage of preserving governing equation
structures even if the prior physics knowledge is imperfect. Compared to the PPNN with fully-
known PDEs, the error distribution range obtained by preserving partially-known PDEs is increased and
the error ensemble is more scattered, implying slightly decreased robustness. Although the envelope of
the error scattering for incomplete PDEs is much larger than that of the case with fully-known PDEs,
this is due to a single outlier trajectory, which can be seen in Fig. 4c. This indicates that embedding
incomplete PDE terms will lead to restricted PPNN performance when the disregarded term
plays an important role in the dynamical system. In general, the standard deviation of the error
ensemble from the PPNN with the partially-known PDE (σ = 1.123 × 10⁻⁴) is still significantly lower
than that of the black-box baseline (σ = 3.412 × 10⁻⁴). In comparison, the standard deviation of
errors in the PPNN with fully-known PDEs over the 100 test trajectories is 0.854 × 10⁻⁴.
Figure 4: Prediction comparison in the cases where the governing equations are partially known. a, Predicted
solution snapshots of u for the reaction-diffusion (RD) equations at different time steps and unseen parameters,
obtained by black-box ConvResNet (baseline model), and partial differential equation (PDE)-preserved neural net-
work (PPNN, preserving diffusion terms only), and PPNN (preserving complete FitzHugh–Nagumo RD equations),
compared against ground truth. λ2 and λ3 are two randomly selected testing (unseen) parameters. b-c, Averaged
relative testing error ϵt of the PPNN with incomplete PDE (blue lines) and black-box ConvResNet baseline
(orange lines) for the RD dynamics evaluated on 100 randomly generated testing (unseen) parameters (same
parameters as shown in Fig. 2c). Shaded areas in b indicate envelopes of the maximum and minimum relative errors
of all testing trajectories, while the dashed lines in c indicate the relative error of each test trajectory. d-e, Predicted
solution snapshots of flow velocity magnitude ∥u∥2 obtained by black-box ConvResNet (baseline), PPNN (ours),
compared against ground truth (high-resolution numerical simulation) of the NS equations without (d) and with (e)
an unknown magnetic source term, respectively. The PPNN only preserves the NS equation portion for both scenarios,
which are at the same testing (unseen) parameter λ = [9000, 0.475]^T, which is not in the training set. f-g, Relative
prediction errors ϵt of the PPNN (blue line) and black-box ConvResNet baseline (orange line) for the NS
equations with (f) and without (g) an unknown magnetic body force, averaged on five randomly sampled unseen
parameters. The shaded area shows the scattering of relative errors for all testing trajectories.
Navier-Stokes equations with an unknown magnetic field In the second case, we consider
the complex magnetic fluid dynamic system governed by the Navier-Stokes equations with an
unknown magnetic field:
$$\frac{\partial \boldsymbol{u}}{\partial t} + \boldsymbol{u} \cdot \nabla \boldsymbol{u} = -\nabla p + \nu \nabla^2 \boldsymbol{u} + \boldsymbol{F}, \qquad \nabla \cdot \boldsymbol{u} = 0, \tag{11}$$
where u = [u, v]^T is the velocity vector; p is the pressure; while ν represents the kinematic viscosity.
Here F = [Fx, Fy]^T represents the body force introduced by a magnetic field:
$$F_x = m H \frac{\partial H}{\partial x}, \qquad F_y = m H \frac{\partial H}{\partial y}, \qquad H(x, y) = \exp\!\left[-8\left((x - L/2)^2 + (y - W/2)^2\right)\right], \tag{12}$$
where m = 0.16 is the magnetic susceptibility, and H is a time-invariant magnetic intensity. The
contour of the magnitude of the body force source term is shown in the supplementary information
(see Fig. S11). In this case, the magnetic field remains unknown and PPNN only preserves the
NS equations without the magnetic source term. All the other case settings remain unchanged as
described in the Navier-Stokes equations case.
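For reference, the body force of eq. (12) can be evaluated on a grid as sketched below (illustrative; the analytical gradient of H is used, and the domain lengths L = 4 and W = 1 follow the rectangular domain defined earlier):

```python
import numpy as np

def magnetic_body_force(X, Y, m=0.16, L=4.0, W=1.0):
    """Body force F = m * H * grad(H) of eq. (12) for the magnetic-field NS case.

    X, Y: 2D arrays of grid coordinates; returns (Fx, Fy) on the same grid.
    """
    H = np.exp(-8.0 * ((X - L / 2) ** 2 + (Y - W / 2) ** 2))
    dHdx = -16.0 * (X - L / 2) * H      # analytical derivative of H w.r.t. x
    dHdy = -16.0 * (Y - W / 2) * H      # analytical derivative of H w.r.t. y
    return m * H * dHdx, m * H * dHdy
```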
Similar to what we observed in the example of RD equations with the unknown reaction term,
the PPNN still retains a significant advantage over the black-box baseline even when preserving an
incomplete physics of the flow in a magnetic field. Fig. 4d-e shows the velocity magnitude ∥u∥2
results of the flow without (Fig. 4d) or with (Fig. 4e) a magnetic field at the same testing parameter
(λ = [Re = 9000, y0 = 0.475]^T), predicted by the PPNN and black-box ConvResNet, compared
against the reference solution. For both scenarios, only the NS equation portion is preserved in
the PPNN, i.e., the magnetic field remains unknown. Figure 4d shows the solution snapshots for
the flow without the magnetic field (i.e., PPNN preserving the complete physics), while Fig. 4e
shows the predictions of the flow with the magnetic field (i.e., PPNN preserving an incomplete
physics). Comparing the reference solutions at upper and lower panels, the spatiotemporal patterns
of the flow fields exhibit notable differences for the cases with and without magnetic fields. In both
scenarios, the black-box baseline model suffers during long-term model rollout; in particular, for
the flow within the magnetic field, the black-box baseline completely fails to capture the physics
when t > 2T. In both scenarios, the PPNN outperforms the black-box baseline. In particular, at
the last time step t = 3T, which is three times the training phase length, the black-box predictions
are totally overwhelmed by noise, while our PPNN predictions still agree with the reference very
well. Compared to the case preserving the complete physics (Fig. 4d), a slight deviation from
the reference solution can be observed in the PPNN predictions of the flow with an unknown
magnetic field (Fig. 4 e), indicating that incomplete prior knowledge could slightly affect the PPNN
performance negatively. Nonetheless, preserving the partially-known PDE structure still brings
significant merit. The error propagation is shown in Fig. 4f-g. The relative model rollout errors are
averaged over 5 randomly selected unseen parameters for the systems with (Fig. 4f) and without
(Fig. 4g) the magnetic field. Compared to the PPNN with completely-known PDEs, the PPNN
preserving incomplete/inaccurate prior knowledge does show a slight increase in the mean relative
error ϵt as well as in the error scattering, which implies a slight decrease in robustness. However,
the significant advantage over the black-box baseline remains, and almost no error accumulation is
observed in PPNN for both scenarios.
processes at all. This experiment aims to assess our model's performance when the physics is
completely mis-specified and to determine how this mismatch affects the overall model performance.
These results show the model's behavior under extreme conditions, where the underlying physics
might be either completely unknown or inaccurately specified.
Figure 5: Relative error ϵt comparison when wrong terms are embedded in partial differential equation preserved
neural network (PPNN), tested on the 2D Burgers' equation. The relative error of PPNN (blue line), its black-box
counterpart (orange line), and PPNN with completely wrong partial differential equation terms (green line)
tested on unseen parameters is shown in the figure. Solid lines show the mean relative error, while the shaded areas
show the distribution of all the 100 sample trajectories.
As depicted in Fig. 5, the performance
of the PPNN model suffers when the embedded PDE terms diverge significantly from the actual
physics. In such cases, the performance of the PPNN model is adversely affected, with its predictions
being worse than those of the black-box method. As expected, this result suggests that a certain
level of alignment between the embedded PDEs and the underlying physics is essential for optimal
performance. Notably, the error distribution range of the PPNN model is significantly narrower
than that of the black-box baseline, indicating that even mis-specified embedded PDEs impose an
inductive bias on the deep learning model.
Training cost Figure 6a-c shows the averaged relative (rollout) prediction error ϵT on n
testing parameters λ at the last time step T during the training process (n = 8 in RD, n = 6 in Burgers,
and n = 5 in NS). For all the cases, PPNN features a significantly (orders of magnitude) lower error
than the black-box model from a very early training stage. This means that, to achieve the same
(if not higher) level of accuracy, our PPNN requires significantly less training cost compared to the
black-box baseline. In addition, under the same training budget, the PPNN is much more accurate
than the black-box baseline, demonstrating the merit of PPNN by leveraging the prior knowledge
for network architecture design.
Inference cost The inference costs of different neural networks and numerical solvers on the
three testing cases (see section When the governing PDEs are fully known) with the model rollout
Figure 6: Testing error ϵT during training and the inference cost of partial differential equation (PDE)-preserved
neural network (PPNN), black-box baseline and numerical solvers. a-c, Averaged relative test error at the last time
step ϵT of PPNN (blue lines) and black-box ConvResNet (orange lines) during the training process of the different
cases in the section When the governing PDEs are fully known (governed by reaction-diffusion (a), Burgers' equation (b), and
Navier-Stokes equations (c)). d-f, Inference time cost of the numerical solver, PPNN, and black-box ConvResNet in the
cases governed by reaction-diffusion (d), Burgers' equation (e), and Navier-Stokes equations (f). The reaction-
diffusion and Burgers cases are inferred (simulated) on an NVIDIA RTX 3070 GPU, and the time is measured for
inferring/simulating 10 trajectories for 200 time steps. The Navier-Stokes case is inferred/simulated on a single Intel
Xeon Gold 6138 CPU, and the time is measured for inferring/simulating 1 trajectory for 219 time steps.
length of T are summarized in Fig. 6d-f. Due to the fast inference speed of neural networks, both
next-step neural models show significant speedup compared to the high-fidelity numerical solvers.
In particular, the speedup by the PPNN varies from 10× to 60× without significantly sacrificing the
prediction accuracy. Such speedup becomes even more substantial considering a longer model rollout
and enormous repeated model queries on a large number of different parameter settings, which are
commonly required in many-query applications such as optimization design, inverse problems, and
uncertainty quantification. Note that all models are compared on the same hardware (GPU or CPU)
to eliminate the difference introduced by hardware. However, as most legacy numerical solvers can
only run on CPUs, the speedup by neural models can be much more significant if they leverage
massive GPU parallelism. Admittedly, adding the PDE-preserving part inevitably increases the
inference cost compared to the black-box baseline, but the huge performance improvement by doing
so outweighs the slight computational overhead, as demonstrated in section When the governing
PDEs are fully known. We have to point out that the computation of the PDE-preserving portion is
not fully optimized, particularly in the NS case, where low-speed I/O interactions reduce the overall
speedup ratio compared to the numerical solver based on the mature CFD platform OpenFOAM.
Further performance improvements are expected by customized code optimizations in future work.
Figure 7: Prediction comparison between partial differential equation (PDE)-preserved neural network (PPNN),
the PDE-preserving part of PPNN (numerical solver results on a coarse mesh), the black-box baseline, and the label
data. a, Relative error at different time steps of PPNN (blue line), the black-box neural network (orange line),
and the coarse solver (green line), compared to the ground truth results obtained by icoFoam on a fine mesh. The
relative error is averaged over 5 test trajectories with randomly sampled parameters; these parameters are not
in the training set. The shaded area shows the maximum and minimum relative error of these testing trajectories. For
the coarse solver, 2 of the testing trajectories diverged (NaN) at the 72nd step, thus the green curve stops at the 71st
step. b, The contours show predicted solution snapshots of velocity magnitude ∥u∥2 for the NS equations, obtained
by black-box ConvResNet (baseline), the PDE-preserving part only (coarse solver), and PPNN (ours), compared against
the ground truth (high-resolution numerical simulation), where λ0 = [2500, 0.325]^T and λ2 = [9000, 0.475]^T, which are
unseen in the training set. The black color indicates NaN, i.e., solution blow up.
neural network structures and differential equations. From the numerical modeling perspective,
if our understanding of the underlying physics is complete and accurate (i.e., complete governing
PDEs are available), the PDE-preserving portion in PPNN can be interpreted as a numerical solver
with the explicit forward Euler scheme defined on a coarse mesh. For simplicity, we here refer to
this numerical solver derived from the fully-known PDE-preserving part as the "coarse solver". It
is interesting to see how well the coarse solver alone performs when the governing equations and
ICs/BCs/physical properties are fully known.
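Schematically, the coarse solver is simply the frozen PDE-preserving branch iterated on its own with the large learning step ∆t, i.e., an explicit forward Euler march on the coarse grid. A minimal sketch with hypothetical names (`pde_rhs` stands for the same fixed finite-difference right-hand side used inside PPNN):

```python
def coarse_solver_rollout(pde_rhs, u0_coarse, dt, n_steps):
    """Roll out the PDE-preserving branch alone: u_{t+dt} = u_t + dt * RHS(u_t).

    Without the trainable correction, this explicit Euler march on a coarse grid
    can damp out vortices or even diverge when dt violates the stability limit.
    """
    u, traj = u0_coarse, [u0_coarse]
    for _ in range(n_steps):
        u = u + dt * pde_rhs(u)
        traj.append(u)
    return traj
```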
We use the NS case as an example. Fig. 7b shows the magnitude of velocity ∥u∥2 predicted by
the PPNN, black-box ConvResNet and coarse solver, respectively, compared against the reference
solution. Two representative testing parameters are studied here: one is at a lower Reynolds number
Re = 2500, y0 = 0.325 (Fig. 7b, λ0), and the other is at a higher Reynolds number Re = 9000, y0 =
0.475 (Fig. 7b, λ2). It is clear that the predictions by the coarse solver noticeably deviate from
the reference solution from 0.4T , and most vortices are damped out due to the coarse spatial
discretization. This becomes worse in the higher Reynolds number scenario, where the coarse solver
predicted flow field is unphysical at 0.1T and the simulation completely diverged at t = 1.16T,
because the large learning time step makes traditional numerical solvers fail to satisfy the stability
constraint.
As shown in the error propagation curves in Fig. 7a, the coarse solver has large prediction errors
over the testing parameter set from the very beginning, which are much higher than those of the
black-box data-driven baseline. Since several of the testing trajectories by the coarse solver diverge
quickly after 70 evolving steps, the error propagation curve stops there.
This figure again empirically demonstrates that the PPNN structure not only overcomes the
error accumulation problem in black-box methods, but also significantly outperforms numerical
solvers that simply coarsen the spatiotemporal grids. On the other hand, for those trajectories that
do not diverge, the coarse solver’s relative errors are limited to a certain level, which is in contrast
to black-box, data-driven methods where the error constantly grows due to the error accumulation.
This phenomenon implies that preserving PDEs plays a critical role in addressing the issue of
error accumulation: the preserved PDE does not simply provide a rough estimate of the next step,
but carries underlying physics information that guides the longer-term prediction.
Figure 8: Prediction comparison when the partial differential equation preserved neural network (PPNN) uses different
deep neural networks as the trainable part. The relative error ϵt of 100 randomly sampled testing parameters λ is
shown in this figure. The solid lines show the averaged error over these 100 samples, with shaded areas showing the
maximum and minimum relative errors of all testing trajectories. Blue indicates the error of PPNN while
orange represents the corresponding black-box method. a. shows the relative error of ViT and its PPNN
variant. b. shows the relative error of U-Net and its PPNN counterpart.
els. This observation suggests a potential overfitting issue in the PPNN variant, warranting further
investigation.
By successfully incorporating PPNN with a variety of DNN architectures and exhibiting its
superior performance in the setting of the viscous Burgers’ equation, we furnish compelling evidence
that PPNN operates as a flexible framework for integrating known physics into deep neural networks.
This underlines its potential for enhancing the predictive accuracy and robustness across various
neural architectures. Moreover, our approach not only demonstrates compatibility with different
neural networks but also shows impressive generalizability across varying boundary conditions. For
additional insights into the application of PPNN on diverse boundary value problems, we invite
readers to refer to Section 1 in the supplementary information.
where ξ = [x, t] represents spatial and temporal coordinates and u ∈ R^n is the n-dimensional
state variable. In addition to the auto-regressive formulation, one can directly learn the operator
G using deep neural networks in a continuous manner, generally referred to as neural operators.
In the past few years, several continuous neural operator learning methods have been proposed,
e.g., DeepONet [4, 3] and Fourier Neural Operator (FNO) [5]. Although many of them have shown
great success for a handful of PDE-governed systems, it remains unclear how these methods perform
compared to our proposed PPNN in the challenging scenarios studied in this work:
• Limited training data for good generalizability in parameter space and temporal domain.
Therefore, we conduct a comprehensive comparison of PPNN with existing state-of-the-art (SOTA)
neural operators, including physics-informed neural network (PINN) [12], DeepONet [4, 3], and
Fourier Neural Operator (FNO) [5], on one of the previous test cases, Viscous Burgers’ equation,
where the PDE is fully known. (Strictly speaking, the original PINN by Raissi et al. [12] is not an
operator learner, but it can be easily extended to achieve this by augmenting the network input layer
with the parameter dimension, as shown in [13].) For a fair comparison, the problem setting and
training data remain the same for all the methods, and the number of trainable parameters of each
model is comparable (PINN: 1.94M parameters; DeepONet: 1.51M parameters; PPNN: 1.56M
parameters; please note that in DeepONet, we used two separate but identical neural networks to learn
the two components ux, uy of velocity u respectively to achieve optimal performance; each network
contains 0.755M trainable parameters). The exception is FNO, which has 0.58M trainable parameters
because the spatial Fourier transformation in FNO is too memory-hungry for a larger model to fit
into the GPU used for training (RTX A6000, 48GB RAM). It is worth noting that FNO could
be formulated either as a continuous operator or as an autoregressive model. Here we show the
performance of the continuous FNO. The performance of the autoregressive FNO (named aFNO) is
shown in Section 3.6 of the supplementary information; it is slightly better than the continuous
FNO in terms of testing error on unseen parameters. Besides, we also include a DeepONet
with significantly more trainable parameters (79.19M) to show the highest possible performance
DeepONet could achieve, which is named DeepONet-L. Note that since some of these models'
original forms cannot be directly applied to learn parametric spatiotemporal dynamics in multi-
variable settings, necessary modifications and improvements have been made. The implementation
details and hyper-parameters of these models are provided in the supplementary information (see
Section 3).
Predictive performance comparison All the models are used to predict the spatiotemporal
dynamics of 100 randomly generated initial fields which are not seen in training. The relative
prediction errors ϵt of the existing SOTA neural operators and PPNN are compared in Fig. 9.
As shown in Fig. 9a and 9b, PPNN significantly outperforms all the other SOTA baselines for
all the time steps in both training and testing parameter regimes. All the existing SOTA neural
operators have much higher prediction errors (several orders of magnitude higher) compared to
PPNN, especially when entering the extrapolation range (after 100 time steps), where the error
grows rapidly. In contrast, the relative error of PPNN predictions remains very low (~10^-3) and
barely accumulates as time evolves (shown in Fig 9a). The prediction errors of most continuous
neural operators do not grow monotonically since their predictions do not rely on auto-regressive
model rollout and thus do not suffer from error accumulation. However, the overall accuracy of all
continuous neural operators (particularly in extrapolation range) is much lower than that of PPNN.
Besides, PPNN exhibits a much smaller error scattering over different testing samples (shown in
Fig. 9b), indicating significantly higher robustness compared to existing SOTA methods. All of
these observations suggest the obvious superiority of the PPNN in terms of extrapolability in time.
The comparison of the generalizability in parameter space of all methods is shown in Fig. 9a,
where the dashed lines represent the averaged testing errors on the training parameter set, while the
solid lines indicate the errors on the testing parameter set. It is clear that all the continuous neural
operators have a significantly higher prediction error on the testing set than that on the training
set, while the PPNN’s prediction errors are almost the same on both the testing and training sets,
which are much lower than all the other methods, indicating a much better generalizability.
It is worth mentioning that a notable overfitting issue is observed in DeepONet with in-
creased trainable parameters, i.e., DeepONet-L. It can be seen that although the prediction errors
[Figure 9 plots: a. relative error (log scale) over time for PINN, DeepONet, DeepONet-L, FNO, and PPNN; b. mean relative testing error; c. memory footprint (MB); d. inference cost (seconds).]
Figure 9: Comparison between the partial differential equation preserved neural network (PPNN) and various neural
operators. a. Comparison of the relative error ϵ_t of the physics-informed neural network (PINN, purple lines),
deep operator network (DeepONet, orange lines), DeepONet-L (green lines), Fourier neural operator (FNO,
red lines), and PPNN (blue lines) for the velocity u in the viscous Burgers' equation, evaluated on 100 randomly
generated testing (unseen) parameters (solid lines) and 100 randomly selected training parameters (dashed lines).
Only the first 100 time steps are used for training. Note that the y axis is in log scale. b. Relative testing error
averaged over all testing parameters and time steps, ϵ̄_t, with error bars. Note that PINN has a much higher error
than the other models, which cannot be completely shown in this figure. The error bar indicates the lowest and
highest relative testing error ϵ_t among all the testing parameters, while the blue bar shows the mean relative testing
error ϵ̄_t. c. The memory footprint of different methods when testing 10 trajectories. Please note that the time cost
and memory footprint measured for PINN are the amounts required for inferring a single row of the target mesh at a
time; inferring the whole field requires more memory, which exceeds the inference device's capacity. d. The inference
cost of testing 10 trajectories on an NVIDIA RTX 3070 GPU. Please note the inference time of PINN is measured for
the inference-optimized variant of the original model, which is significantly faster than the original form used for
training.
of DeepONet-L and FNO are relatively lower on training parameter sets and interpolation regimes,
they rapidly increase when stepping into extrapolation ranges and unseen parameter regimes. We
would like to point out that there are “physics-informed” variants of DeepONet and FNO [7, 8],
which regulate the DNN training by minimizing the residual of governing PDEs, in conjunction with
data loss. However, these approaches typically necessitate the knowledge of the complete equation
forms to formulate the physics-informed loss, while our method excels at integrating partially known
physics, such as individual PDE operators, into the neural network structures. Moreover, the chal-
lenge of balancing DNN training with equation loss and label data is a well-documented issue, often
requiring sophisticated hyperparameter tuning to adjust the weights between equation loss and
data loss [30, 70, 71]. In addition, the use of equation loss for problems with a high-dimensional
parameter space poses a significant challenge in minimizing the composed loss function, leading to
marginal and often unstable improvement over purely data-driven methods. We provide a more
detailed comparison and discussion regarding the performance of these physics-informed variants of
FNO/DeepONet in the supplementary information (see Section 5).
Cost comparison Figures 9c and 9d show the memory footprint and time cost in the inference
phase. Even compared to the fastest baseline, DeepONet, PPNN is still about 20% faster, and
the memory footprint of PPNN is very close to that of the model with the smallest memory footprint,
DeepONet. It should be noted that all the models, including our PPNN, are not exhaustively
fine-tuned. Although carefully tuning hyperparameters may further improve the performance of
each model, the issues of generalizability and robustness presented above cannot be addressed by
hyperparameter tuning.
graph convolution operation, interpreted as a localized spectral filtering on unstructured data, can
be viewed as a generalization of the CNN’s convolution operation. By carefully designing spectral
filters, the concept of “PDE-preserving” can be incorporated for desired spatial PDE operators
through finite-volume-based or finite-element-based kernel functions. Although such an extension
would require rigorous mathematical derivations and extensive empirical studies, we believe it serves
as an intriguing direction for future research.
Furthermore, the ConvResNet formulation in the current version of PPNN is not mesh-invariant
due to the discrete convolution operation, suggesting it cannot directly process data represented on
different meshes without interpolations. However, the proposed PPNN framework can be extended
to accommodate mesh invariance. One potential way to achieve this is to use mesh-invariant con-
volutional layers, which apply the same operations to the input data regardless of the underlying
mesh structure. This could be realized, for instance, by employing geodesic convolutions or graph
convolution kernels in the spectral domain, allowing the model to adapt to variations in mesh resolution.
Additionally, integrating adaptive mesh refinement techniques into the training process might
provide another route towards mesh invariance. This strategy would involve dynamically adjusting
the mesh resolution by incorporating mesh information ∆x into the model, allowing the model to capture
mesh variations.
In real-world applications, training data can be gathered from experiments or in-situ sens-
ing, where data uncertainty may arise due to measurement noises in both inputs and training
labels. Our current PPNN model does not include an uncertainty quantification (UQ) capability,
but uncertainty propagation and quantification represent fascinating directions for future research.
Extending the PPNN model to incorporate Bayesian learning could be a potential solution. Tech-
niques like Bayesian neural networks using variational inference [72, 73, 74, 75] or deep ensemble
methods [76, 77, 78] may offer promising avenues for expanding the PPNN model to include UQ
capabilities.
Spatiotemporal dynamics constitute a fundamental aspect of numerous physics systems, ranging
from classical fields like fluid dynamics, acoustics, and electromagnetics to the intricate realm of
quantum mechanics. The governing equations for such dynamics often fall within the domain of
partial differential equations. Consequently, the ability to effectively solve these PDEs is imperative
for comprehending, modeling, and controlling the underlying physical processes. By integrating the
PDE structure into deep neural networks, PPNN represents a powerful tool for modeling such PDEs.
In the context of various physics applications, PPNN exhibits considerable potential. Contrasted
with traditional numerical solvers or earlier physics-informed neural networks, PPNN demonstrates
lower training and inference costs and the capacity to assimilate unknown physics from data. Ad-
ditionally, when compared to purely data-driven methods, PPNN provides enhanced accuracy in
out-of-sample scenarios while maintaining stability over prolonged model rollouts. The versatile
nature of PPNN makes it a promising candidate for applications in modeling and predicting dy-
namic physics systems, including heat transfer, turbulent flow, and electromagnetic fields. While
not delving into the specifics of each application, it is evident that PPNN holds significant promise
for speeding up the study and understanding of complex spatiotemporal dynamics across various
physics domains.
Conclusion
In this work, we proposed a physics-inspired deep learning framework, PDE-preserved neural
network (PPNN), aiming to learn parametric spatiotemporal physics, where the (partially) known
governing PDE structures are preserved via fixed convolutional residual connection blocks in a multi-
resolution setting. The PDE-preserving ConvResNet blocks together with trainable blocks in an
encoding-decoding manner bring the PPNN significant advantages in long-term model rollout accu-
racy, spatiotemporal/parameter generalizability, and training efficiency. The effectiveness and merit
have been demonstrated over a handful of challenging spatiotemporal prediction tasks, including
the FitzHugh–Nagumo reaction-diffusion equations, viscous Burgers' equations, and Navier-Stokes
equations, compared to the existing baselines, including ConvResNet, U-Net, Vision transformer,
PINN, DeepONet, and FNO. The proposed PPNN shows satisfactory predictive accuracy in testing
regimes and significantly lower error-accumulation effect for long-term model rollout in time, even
if the preserved physics is incomplete or inaccurate. Finally, the discussion on the inference and
training costs shows the great potential of the proposed model to serve as a reliable and efficient
surrogate model for spatiotemporal dynamics in many applications that require repeated model
queries, e.g., design optimization, data assimilation, uncertainty quantification, and inverse prob-
lems. While Direct Numerical Simulations (DNS) are used as the source of labeled training data in
our study, the data could just as well originate from experimental results or field observations. A
unique feature of PPNN, and one of its significant advances, lies in its ability to generalize to differ-
ent physical parameters and initial/boundary conditions. Unlike most label-free PINN techniques,
which act as PDE solvers for a given set of parameters and conditions, PPNN’s ability to adapt to
varying parameters and conditions underscores its capability to learn the PDE system. In general,
this work explored a creative design of leveraging physics-inductive bias in scientific machine/deep
learning and showcased how to use physical prior knowledge to inform the learning architecture
design, shedding new light on physics-informed deep learning from a different aspect. Therefore,
this work represents an inventive PiDL development and a significant advance in the realm of SciML.
METHODS
Problem formulation
We are interested in predictive modeling of physical systems with spatiotemporal dynamics,
which can be described by a set of parametric coupled PDEs in the general form,
\[
\frac{\partial \mathbf{u}}{\partial t} + \mathcal{F}\big[\mathbf{u}, \mathbf{u}^2, \ldots, \nabla_{\mathbf{x}}\mathbf{u}, \nabla^2_{\mathbf{x}}\mathbf{u}, \nabla_{\mathbf{x}}\mathbf{u}\cdot\mathbf{u}, \ldots; \boldsymbol{\lambda}\big] = 0, \quad \mathbf{x}, t \in \Omega \times [0, T], \ \boldsymbol{\lambda} \in \mathbb{R}^d, \tag{14a}
\]
\[
\mathcal{I}\big[\mathbf{x}, \mathbf{u}, \nabla_{\mathbf{x}}\mathbf{u}, \nabla^2_{\mathbf{x}}\mathbf{u}\cdot\mathbf{u}; \boldsymbol{\lambda}\big] = 0, \quad \mathbf{x} \in \Omega, \ t = 0, \ \boldsymbol{\lambda} \in \mathbb{R}^d, \tag{14b}
\]
\[
\mathcal{B}\big[t, \mathbf{x}, \mathbf{u}, \nabla^2_{\mathbf{x}}\mathbf{u}, \nabla_{\mathbf{x}}\mathbf{u}\cdot\mathbf{u}; \boldsymbol{\lambda}\big] = 0, \quad \mathbf{x}, t \in \partial\Omega \times [0, T], \ \boldsymbol{\lambda} \in \mathbb{R}^d, \tag{14c}
\]
where u = u(x, t; λ) ∈ R^n is the n-dimensional state variable; t denotes time and x ∈ Ω specifies
the space; F[·] is a complex nonlinear functional governing the physics, while differential operators
I[·] and B[·] describe the initial and boundary conditions (I/BCs) of the system, respectively;
λ ∈ R^d is a d-dimensional vector representing physical/modeling parameters in the governing
PDEs and/or I/BCs. Solving this parametric spatiotemporal PDE system typically relies on traditional
FD/FV/FE methods, which are computationally expensive in most cases. This is due to the
spatiotemporal discretization of the PDEs into a high-dimensional algebraic system, making the
numerical simulation time-consuming, particularly considering that a tiny step is often required for
the time integration to satisfy numerical stability constraints. Moreover, as the system solution
u(x, t; λ) is parameter-dependent, we have to start over and conduct the entire simulation for each
new parameter λ, making it infeasible for application scenarios that require many model queries,
e.g., parameter inference, optimization, and uncertainty quantification. Therefore, our objective
is to develop a data-driven neural solver for rapid spatiotemporal prediction, enabled by efficient
time-stepping with coarse-graining and the fast inference speed of neural networks. In particular, this
study focuses on the learning architecture design, preserving known PDE structures to improve
the robustness, stability, and generalizability of data-driven auto-regressive prediction models.
where f^(j) represents the generic neural network function of the j-th layer and θ^(j) are the corresponding
weights. For end-to-end spatiotemporal learning, f^(j) are often formulated by (graph) convolutional
neural networks with trainable convolution stencils and biases. In a ResNet block, the dimension of
the feature vectors (i.e., the image resolution and the number of channels) should remain the same
across all layers. ResNet-based next-step models have been demonstrated to be powerful and effective
for predicting complex spatiotemporal physics. One example is the MeshGraphNet [9],
which is a GNN-based ResNet and shows SOTA performance in spatiotemporal learning with
unstructured mesh data.
In this work, as we limit ourselves to structured data within regular domains, a CNN variant of
the MeshGraphNet, the Convolutional ResNet (ConvResNet)-based next-step model, is used as one of
the baseline black-box models, whose network structure is shown in Fig. 10a. The Con-
vResNet takes the previous state and physical parameters as the input and predicts the next-step
state using a residual connection across the entire hidden ConvResNet layers after a pixel shuf-
fle layer. The hidden layers consist of several ConvResNet blocks, constructed based on standard
convolution layers with residual connections and ReLU activation functions, followed by layer nor-
malization. To learn the dependence on the physical parameters λ, each scalar component of the physical
parameter vector is multiplied by a trainable matrix, which is obtained by the vector multiplication of
trainable weight vectors.
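A minimal sketch of this parameter-conditioning mechanism is given below, assuming a PyTorch implementation; the class name and the shapes are illustrative rather than the exact ones used in our code.

```python
import torch
import torch.nn as nn

class ParamToField(nn.Module):
    """Broadcast each scalar physical parameter to a 2D field via the outer
    product of two trainable vectors, so that it can be appended to the state
    as an additional input channel (names and shapes are illustrative)."""

    def __init__(self, n_params: int, nx: int, ny: int):
        super().__init__()
        self.vx = nn.Parameter(torch.randn(n_params, nx, 1))  # trainable vectors, shape (d, nx, 1)
        self.vy = nn.Parameter(torch.randn(n_params, 1, ny))  # trainable vectors, shape (d, 1, ny)

    def forward(self, lam: torch.Tensor) -> torch.Tensor:
        # lam: (batch, d)  ->  parameter fields: (batch, d, nx, ny)
        field = self.vx * self.vy                   # (d, nx, ny): trainable matrix per parameter
        return lam[:, :, None, None] * field[None]  # scale each field by its scalar parameter
```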
[Figure 10 schematics: a. black-box ConvResNet (inputs u_t and λ mapped to a field via trainable vectors; ConvResNet blocks with convolutions, ReLU, layer normalization, PixelShuffle, and a residual connection; output u_{t+1}). b. trainable portion of PPNN with the additional input F(u_t).]
Figure 10: Schematics of the deep neural network structures used in this work. a, Network architecture of the
baseline black-box ConvResNet-based next-step model. b, Network architecture of the trainable portion of the partial
differential equation (PDE)-preserved neural network (PPNN). The only difference between them is that the trainable
portion of PPNN has an extra input variable F(u_t), provided by the PDE-preserving portion of PPNN.
Residual connections and ODEs As discussed in [44, 50], the residual connection as defined
in Eq. 16 can be seen as a forward Euler discretization of an ODE,
\[
\frac{\partial \mathbf{z}(t)}{\partial t} = F\big(\mathbf{z}(t)\,|\,\boldsymbol{\theta}(t)\big), \quad \text{for } t \in (0, T], \tag{17}
\]
where z(t = 0) = z_0 and T is the total time. In ResNets, a fixed time step size of ∆t = 1 is set for the
entire time span and N·∆t = T. Namely, the depth of the residual connection (i.e., the number of
layers in a ResNet block) can be controlled by changing the total time T. On the other hand, an
ODE as given by Eq. 17 can be interpreted as a continuous ResNet block with an infinite number of
layers (i.e., infinite depth). Based on this observation, the classic ResNet structure can be extended
by discretizing an ODE using different time-stepping schemes (e.g., Euler, Runge-Kutta, leapfrog,
etc.). Moreover, we can also define a residual connection block by directly coupling a differentiable
ODE solver with a multi-layer perceptron (MLP) representing F(·), where the hybrid ODE-MLP is
trained as a whole differentiable program using back-propagation, which is known as a neural-ODE
block [50].
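A minimal sketch of this correspondence, assuming a convolutional realization of F(·), is shown below; the block design and the step size dt are illustrative.

```python
import torch
import torch.nn as nn

class EulerResBlock(nn.Module):
    """A residual block read as one forward-Euler step of dz/dt = F(z):
    z_{k+1} = z_k + dt * F(z_k), where F is a small conv net (illustrative)."""

    def __init__(self, channels: int, dt: float = 1.0):
        super().__init__()
        self.dt = dt
        self.F = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # The classic ResNet block corresponds to dt = 1; other time-stepping
        # schemes (e.g., Runge-Kutta) lead to different residual block designs.
        return z + self.dt * self.F(z)
```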
Convolution operations and PDEs In the neural ODE, an MLP is used to define F(·), which,
however, can be any neural network structure in a general setting. When dealing with structured
data (e.g., images, videos, physical fields), the features z(t, x) can be seen as spatial fields, and
convolution operations are often used to construct a CNN-based F(·). A profound relationship
between convolutions and differentiations has been presented in [79, 45, 80]. Following the deep
learning convention, a 2D convolution is defined as,
\[
\mathrm{conv}\big(\mathbf{z}, h^{(\boldsymbol{\theta})}\big) = \int \mathbf{z}(\mathbf{x}' - \mathbf{x})\, h^{(\boldsymbol{\theta})}(\mathbf{x})\, \mathrm{d}\mathbf{x}, \tag{18}
\]
where h represents the convolution kernel parameterized by θ. Based on the order of sum rules, the kernel
h can be designed to approximate any differential operator with a prescribed order of accuracy [81],
and thus the convolution in Eq. 18 can be expressed as [49],
\[
\mathrm{conv}\big(\mathbf{z}, h^{(\boldsymbol{\theta})}\big) = \mathcal{D}\big[\mathbf{u}, \ldots, \nabla_{\mathbf{x}}\mathbf{u}, \nabla^2_{\mathbf{x}}\mathbf{u}, \nabla_{\mathbf{x}}\mathbf{u}\cdot\mathbf{u}, \ldots; \boldsymbol{\theta}\big], \tag{19}
\]
where D is a discrete differential operator based on FD/FV/FE methods. For example, from the
point of view of the FDM, convolution filters can be seen as finite difference stencils of certain
differential operators, i.e., the discrete forms of certain PDE terms, and thus the PDEs can be used to
inform the ConvResNet architecture design.
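As a concrete illustration of this connection, the sketch below applies the standard 5-point finite-difference Laplacian as a fixed, non-trainable convolution kernel; the boundary handling is simplified, and the code is illustrative rather than the exact PDE-preserving block used in PPNN.

```python
import torch
import torch.nn.functional as F

def laplacian_conv(u: torch.Tensor, dx: float) -> torch.Tensor:
    """Apply the 5-point finite-difference Laplacian as a fixed 3x3 convolution.
    u has shape (batch, 1, H, W); zero padding is used here for brevity, whereas
    a real solver would pad according to the boundary conditions."""
    stencil = torch.tensor([[0.0,  1.0, 0.0],
                            [1.0, -4.0, 1.0],
                            [0.0,  1.0, 0.0]], dtype=u.dtype, device=u.device)
    kernel = stencil.view(1, 1, 3, 3) / dx ** 2
    return F.conv2d(u, kernel, padding=1)
```

Composing such fixed stencil convolutions with the nonlinear terms of the governing equations is, in essence, how discretized PDE operators can be embedded into a ConvResNet architecture.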
Data availability
All the datasets used in this study can be generated by the openly available Python scripts on
GitHub at https://github.com/jx-wang-s-group/ppnn upon publication.
Code availability
All the source codes to reproduce the results in this study will be openly available on GitHub
at https://github.com/jx-wang-s-group/ppnn upon publication.
References
[1] Hugo FS Lui and William R Wolf. Construction of reduced-order models for fluid flows using
deep feedforward neural networks. Journal of Fluid Mechanics, 872:963–994, 2019.
[2] Omer San, Romit Maulik, and Mansoor Ahmed. An artificial neural network framework for
reduced order modeling of transient flows. Communications in Nonlinear Science and Numerical
Simulation, 77:271–287, 2019.
[3] Han Gao, Jian-Xun Wang, and Matthew J Zahr. Non-intrusive model reduction of large-
scale, nonlinear dynamical systems using deep learning. Physica D: Nonlinear Phenomena,
412:132614, 2020.
[4] Stefania Fresca and Andrea Manzoni. Pod-dl-rom: enhancing deep learning-based reduced
order models for nonlinear parametrized pdes by proper orthogonal decomposition. Computer
Methods in Applied Mechanics and Engineering, 388:114181, 2022.
[5] Takaaki Murata, Kai Fukami, and Koji Fukagata. Nonlinear mode decomposition with convo-
lutional neural networks for fluid dynamics. Journal of Fluid Mechanics, 882, 2020.
[6] Arvind T Mohan, Dima Tretiak, Misha Chertkov, and Daniel Livescu. Spatio-temporal deep
learning models of 3d turbulence with physics informed diagnostics. Journal of Turbulence,
21(9-10):484–524, 2020.
[7] Romit Maulik, Bethany Lusch, and Prasanna Balaprakash. Reduced-order modeling of
advection-dominated systems with recurrent neural networks and convolutional autoencoders.
Physics of Fluids, 33(3):037106, 2021.
[8] Kai Fukami, Kazuto Hasegawa, Taichi Nakamura, Masaki Morimoto, and Koji Fukagata. Model
order reduction with neural networks: Application to laminar and turbulent flows. SN Com-
puter Science, 2(6):1–16, 2021.
[9] Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, and Peter Battaglia. Learning mesh-
based simulation with graph networks. In International Conference on Learning Representa-
tions, 2020.
[10] Xu Han, Han Gao, Tobias Pfaff, Jian-Xun Wang, and Liping Liu. Predicting physics in mesh-
reduced space with temporal attention. In International Conference on Learning Representa-
tions, 2022.
[11] Nathan Baker, Frank Alexander, Timo Bremer, Aric Hagberg, Yannis Kevrekidis, Habib Najm,
Manish Parashar, Abani Patra, James Sethian, Stefan Wild, et al. Workshop report on ba-
sic research needs for scientific machine learning: Core technologies for artificial intelligence.
Technical report, USDOE Office of Science (SC), Washington, DC (United States), 2019.
[12] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks:
A deep learning framework for solving forward and inverse problems involving nonlinear partial
differential equations. Journal of Computational physics, 378:686–707, 2019.
[13] Luning Sun, Han Gao, Shaowu Pan, and Jian-Xun Wang. Surrogate modeling for fluid flows
based on physics-constrained deep learning without simulation data. Computer Methods in
Applied Mechanics and Engineering, 361:112732, 2020.
[14] Ruiyang Zhang, Yang Liu, and Hao Sun. Physics-informed multi-lstm networks for meta-
modeling of nonlinear structures. Computer Methods in Applied Mechanics and Engineering,
369:113226, 2020.
[15] Ehsan Haghighat, Maziar Raissi, Adrian Moure, Hector Gomez, and Ruben Juanes. A physics-
informed deep learning framework for inversion and surrogate modeling in solid mechanics.
Computer Methods in Applied Mechanics and Engineering, 379:113741, 2021.
[16] Luning Sun and Jian-Xun Wang. Physics-constrained bayesian neural network for fluid flow re-
construction with sparse and noisy data. Theoretical and Applied Mechanics Letters, 10(3):161–
169, 2020.
[17] Amirhossein Arzani, Jian-Xun Wang, and Roshan M. D’Souza. Uncovering near-wall blood
flow from sparse data with physics-informed neural networks. Physics of Fluids, 33(7):071905,
2021.
[18] Lu Lu, Raphael Pestourie, Wenjie Yao, Zhicheng Wang, Francesc Verdugo, and Steven G
Johnson. Physics-informed neural networks with hard constraints for inverse design. SIAM
Journal on Scientific Computing, 43(6):B1105–B1132, 2021.
[19] Enrui Zhang, Ming Dao, George Em Karniadakis, and Subra Suresh. Analyses of internal
structures and defects in materials using physics-informed neural networks. Science advances,
8(7):eabk0644, 2022.
[20] Jiequn Han, Arnulf Jentzen, and E Weinan. Solving high-dimensional partial differential equa-
tions using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510,
2018.
[21] Dongkun Zhang, Lu Lu, Ling Guo, and George Em Karniadakis. Quantifying total uncer-
tainty in physics-informed neural networks for solving forward and inverse stochastic problems.
Journal of Computational Physics, 397:108850, 2019.
[22] Yibo Yang and Paris Perdikaris. Adversarial uncertainty quantification in physics-informed
neural networks. Journal of Computational Physics, 394:136–152, 2019.
[23] Ehsan Kharazmi, Zhongqiang Zhang, and George Em Karniadakis. hp-vpinns: Variational
physics-informed neural networks with domain decomposition. Computer Methods in Applied
Mechanics and Engineering, 374:113547, 2021.
[24] Ameya D Jagtap, Ehsan Kharazmi, and George Em Karniadakis. Conservative physics-
informed neural networks on discrete domains for conservation laws: Applications to forward
and inverse problems. Computer Methods in Applied Mechanics and Engineering, 365:113028,
2020.
[25] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning
nonlinear operators via deeponet based on the universal approximation theorem of operators.
Nature Machine Intelligence, 3(3):218–229, 2021.
[26] Zongyi Li, Nikola Borislavov Kovachki, Kamyar Azizzadenesheli, Kaushik Bhattacharya, An-
drew Stuart, Anima Anandkumar, et al. Fourier neural operator for parametric partial differ-
ential equations. In International Conference on Learning Representations, 2020.
[27] Sifan Wang, Hanwen Wang, and Paris Perdikaris. Learning the solution operator of para-
metric partial differential equations with physics-informed deeponets. Science advances,
7(40):eabi8605, 2021.
[28] Somdatta Goswami, Minglang Yin, Yue Yu, and George Em Karniadakis. A physics-informed
variational deeponet for predicting crack path in quasi-brittle materials. Computer Methods in
Applied Mechanics and Engineering, 391:114587, 2022.
[29] Ameya D Jagtap, Kenji Kawaguchi, and George Em Karniadakis. Adaptive activation functions
accelerate convergence in deep and physics-informed neural networks. Journal of Computational
Physics, 404:109136, 2020.
[30] Sifan Wang, Xinling Yu, and Paris Perdikaris. When and why pinns fail to train: A neural
tangent kernel perspective. Journal of Computational Physics, 449:110768, 2022.
[31] Sifan Wang, Shyam Sankaran, and Paris Perdikaris. Respecting causality is all you need for
training physics-informed neural networks. arXiv preprint arXiv:2203.07404, 2022.
[32] Han Gao, Luning Sun, and Jian-Xun Wang. PhyGeoNet: physics-informed geometry-adaptive
convolutional neural networks for solving parameterized steady-state PDEs on irregular domain.
Journal of Computational Physics, 428:110079, 2021.
[33] Pu Ren, Chengping Rao, Yang Liu, Jian-Xun Wang, and Hao Sun. Phycrnet: Physics-informed
convolutional-recurrent network for solving spatiotemporal pdes. Computer Methods in Applied
Mechanics and Engineering, 389:114399, 2022.
[34] Nicholas Geneva and Nicholas Zabaras. Modeling the dynamics of pde systems with physics-
constrained deep auto-regressive networks. Journal of Computational Physics, 403:109056,
2020.
[35] Han Gao, Luning Sun, and Jian-Xun Wang. Super-resolution and denoising of fluid flow
using physics-informed convolutional neural networks without high-resolution labels. Physics
of Fluids, 33(7):073603, 2021.
[36] Nils Wandel, Michael Weinmann, and Reinhard Klein. Teaching the incompressible navier–
stokes equations to fast neural surrogate models in three dimensions. Physics of Fluids,
33(4):047117, 2021.
[37] Rishikesh Ranade, Chris Hill, and Jay Pathak. Discretizationnet: A machine-learning based
solver for navier–stokes equations using finite volume discretization. Computer Methods in
Applied Mechanics and Engineering, 378:113722, 2021.
[38] Houpu Yao, Yi Gao, and Yongming Liu. Fea-net: A physics-guided data-driven model for
efficient mechanical response prediction. Computer Methods in Applied Mechanics and Engi-
neering, 363:112892, 2020.
[39] Sebastian K Mitusch, Simon W Funke, and Miroslav Kuchta. Hybrid fem-nn models: Com-
bining artificial neural networks with the finite element method. Journal of Computational
Physics, 446:110651, 2021.
[40] Zhenlin Wang, Xun Huan, and Krishna Garikipati. Variational system identification of the
partial differential equations governing microstructure evolution in materials: Inference over
sparse and spatially unrelated data. Computer Methods in Applied Mechanics and Engineering,
377:113706, 2021.
[41] Minglang Yin, Enrui Zhang, Yue Yu, and George Em Karniadakis. Interfacing finite elements
with deep neural operators for fast multiscale modeling of mechanics problems. Computer
Methods in Applied Mechanics and Engineering, page 115027, 2022.
[42] Han Gao, Matthew J Zahr, and Jian-Xun Wang. Physics-informed graph neural galerkin net-
works: A unified framework for solving pde-governed forward and inverse problems. Computer
Methods in Applied Mechanics and Engineering, 390:114502, 2022.
[43] Xin-Yang Liu and Jian-Xun Wang. Physics-informed dyna-style model-based deep reinforce-
ment learning for dynamic control. Proceedings of the Royal Society A: Mathematical, Physical
and Engineering Sciences, 477(2255):20210618, 2021.
[44] Eldad Haber and Lars Ruthotto. Stable architectures for deep neural networks. Inverse prob-
lems, 34(1):014004, 2017.
[45] Yiping Lu, Aoxiao Zhong, Quanzheng Li, and Bin Dong. Beyond finite layer neural networks:
Bridging deep architectures and numerical differential equations. In International Conference
on Machine Learning, pages 3276–3285. PMLR, 2018.
[46] François Rousseau, Lucas Drumetz, and Ronan Fablet. Residual networks as flows of diffeo-
morphisms. Journal of Mathematical Imaging and Vision, 62(3):365–375, 2020.
[47] Lars Ruthotto and Eldad Haber. Deep neural networks motivated by partial differential equa-
tions. Journal of Mathematical Imaging and Vision, 62(3):352–364, 2020.
[48] Ben Chamberlain, James Rowbottom, Maria I Gorinova, Michael Bronstein, Stefan Webb, and
Emanuele Rossi. Grand: Graph neural diffusion. In International Conference on Machine
Learning, pages 1407–1418. PMLR, 2021.
[49] Moshe Eliasof, Eldad Haber, and Eran Treister. PDE-GCN: Novel architectures for graph
neural networks motivated by partial differential equations. Advances in Neural Information
Processing Systems, 34, 2021.
[50] Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary
differential equations. Advances in neural information processing systems, 31, 2018.
[51] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image
recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition,
pages 770–778, 2016.
[52] Amir Gholami, Kurt Keutzer, and George Biros. Anode: Unconditionally accurate memory-
efficient gradients for neural odes. arXiv preprint arXiv:1902.10298, 2019.
[53] Zheng Shi, Nur Sila Gulgec, Albert S Berahas, Shamim N Pakzad, and Martin Takáč. Finite
difference neural networks: Fast prediction of partial differential equations. In 2020 19th IEEE
International Conference on Machine Learning and Applications (ICMLA), pages 130–135.
IEEE, 2020.
[54] Mike Innes, Alan Edelman, Keno Fischer, Chris Rackauckas, Elliot Saba, Viral B Shah, and
Will Tebbutt. A differentiable programming system to bridge machine learning and scientific
computing. arXiv preprint arXiv:1907.07587, 2019.
[55] Christopher Rackauckas, Yingbo Ma, Julius Martensen, Collin Warner, Kirill Zubov, Rohit
Supekar, Dominic Skinner, Ali Ramadhan, and Alan Edelman. Universal differential equations
for scientific machine learning. arXiv preprint arXiv:2001.04385, 2020.
[56] Yifan Sun, Linan Zhang, and Hayden Schaeffer. Neupde: Neural network based ordinary and
partial differential equations for modeling time-dependent data. In Mathematical and Scientific
Machine Learning, pages 352–372. PMLR, 2020.
[57] Andreas Hochlehnert, Alexander Terenin, Steindór Sæmundsson, and Marc Deisenroth. Learn-
ing contact dynamics using physically structured neural networks. In International Conference
on Artificial Intelligence and Statistics, pages 2152–2160. PMLR, 2021.
[58] Eric Heiden, David Millard, Erwin Coumans, Yizhou Sheng, and Gaurav S Sukhatme. Neural-
sim: Augmenting differentiable simulators with neural networks. In 2021 IEEE International
Conference on Robotics and Automation (ICRA), pages 9474–9481. IEEE, 2021.
[59] Maren Hackenberg, Marlon Grodd, Clemens Kreutz, Martina Fischer, Janina Esins, Linus
Grabenhenrich, Christian Karagiannidis, and Harald Binder. Using differentiable programming
for flexible statistical modeling. The American Statistician, pages 1–10, 2021.
[60] Dmitrii Kochkov, Jamie A Smith, Ayya Alieva, Qing Wang, Michael P Brenner, and Stephan
Hoyer. Machine learning–accelerated computational fluid dynamics. Proceedings of the National
Academy of Sciences, 118(21), 2021.
[61] Filipe De Avila Belbute-Peres, Thomas Economon, and Zico Kolter. Combining differentiable
pde solvers and graph neural networks for fluid flow prediction. In International Conference
on Machine Learning, pages 2402–2411. PMLR, 2020.
[62] Kiwon Um, Robert Brand, Yun Raymond Fei, Philipp Holl, and Nils Thuerey. Solver-in-the-
loop: Learning from differentiable physics to interact with iterative pde-solvers. Advances in
Neural Information Processing Systems, 33:6111–6122, 2020.
[63] Yohai Bar-Sinai, Stephan Hoyer, Jason Hickey, and Michael P Brenner. Learning data-driven
discretizations for partial differential equations. Proceedings of the National Academy of Sci-
ences, 116(31):15344–15349, 2019.
[64] Omer San and Romit Maulik. Neural network closures for nonlinear model order reduction.
Advances in Computational Mathematics, 44:1717–1750, 2018.
[65] Andrea Beck, David Flad, and Claus-Dieter Munz. Deep neural networks for data-driven les
closure models. Journal of Computational Physics, 398:108910, 2019.
[66] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks
for biomedical image segmentation. In Medical Image Computing and Computer-Assisted
Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9,
2015, Proceedings, Part III 18, pages 234–241. Springer, 2015.
[67] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai,
Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly,
et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv
preprint arXiv:2010.11929, 2020.
[68] Lu Lu, Xuhui Meng, Shengze Cai, Zhiping Mao, Somdatta Goswami, Zhongqiang Zhang,
and George Em Karniadakis. A comprehensive and fair comparison of two neural operators
(with practical extensions) based on FAIR data. Computer Methods in Applied Mechanics and
Engineering, 393:114778, 2022.
[69] Zongyi Li, Hongkai Zheng, Nikola Kovachki, David Jin, Haoxuan Chen, Burigede Liu, Kam-
yar Azizzadenesheli, and Anima Anandkumar. Physics-informed neural operator for learning
partial differential equations. arXiv preprint arXiv:2111.03794, 2021.
[70] Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich. Gradnorm: Gra-
dient normalization for adaptive loss balancing in deep multitask networks. In International
conference on machine learning, pages 794–803. PMLR, 2018.
[71] Levi McClenny and Ulisses Braga-Neto. Self-adaptive physics-informed neural networks using
a soft attention mechanism. arXiv preprint arXiv:2009.04544, 2020.
[72] Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, and David M Blei. Auto-
matic differentiation variational inference. Journal of machine learning research, 2017.
[73] Alex Graves. Practical variational inference for neural networks. Advances in neural information
processing systems, 24, 2011.
[74] Matthew D Hoffman, David M Blei, Chong Wang, and John Paisley. Stochastic variational
inference. Journal of Machine Learning Research, 2013.
[75] Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian approximation: Representing model
uncertainty in deep learning. In international conference on machine learning, pages 1050–1059.
PMLR, 2016.
[76] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable pre-
dictive uncertainty estimation using deep ensembles. Advances in neural information processing
systems, 30, 2017.
[77] Yaniv Ovadia, Emily Fertig, Jie Ren, Zachary Nado, David Sculley, Sebastian Nowozin, Joshua
Dillon, Balaji Lakshminarayanan, and Jasper Snoek. Can you trust your model’s uncertainty?
evaluating predictive uncertainty under dataset shift. Advances in neural information processing
systems, 32, 2019.
[78] Rahul Rahaman et al. Uncertainty quantification and deep ensembles. Advances in Neural
Information Processing Systems, 34:20063–20075, 2021.
[79] Bin Dong, Qingtang Jiang, and Zuowei Shen. Image restoration: Wavelet frame shrinkage,
nonlinear evolution pdes, and beyond. Multiscale Modeling & Simulation, 15(1):606–660, 2017.
[80] Zichao Long, Yiping Lu, Xianzhong Ma, and Bin Dong. Pde-net: Learning pdes from data. In
International Conference on Machine Learning, pages 3208–3216. PMLR, 2018.
[81] Zichao Long, Yiping Lu, and Bin Dong. Pde-net 2.0: Learning pdes from data with a numeric-
symbolic hybrid deep network. Journal of Computational Physics, 399:108925, 2019.
[82] Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, and Pe-
ter Battaglia. Learning to simulate complex physics with graph networks. In International
Conference on Machine Learning, pages 8459–8468. PMLR, 2020.
Acknowledgement: X.Y.L. and J.X.W. would like to acknowledge the funds from the Office of Naval
Research under award number N00014-23-1-2071 and the National Science Foundation under award
number OAC-2047127. H.S. acknowledges the funds from the National Natural Science Founda-
tion of China (No. 62276269) and the Beijing Natural Science Foundation (No. 1232009). L.L. was
supported by the funds from the U.S. Department of Energy under award number DE-SC0022953.
We would like to express our sincere gratitude to the three anonymous reviewers and the editor for
their valuable comments and suggestions, which contributed to enhancing the quality of this paper.
Author contributions: X.Y.L., H.S. and J.X.W. contributed to the ideation and design of the
research; X.Y.L. and J.X.W. performed the research (implemented the model, conducted numer-
ical experiments, analyzed the data, contributed materials/analysis tools); X.Y.L, M.Z. and L.L.
contributed to the comparison study with other models; X.Y.L. and J.X.W. wrote the manuscript;
H.S. and L.L. contributed to manuscript editing.
Supplementary Information:
Multi-resolution partial differential equations preserved learning frame-
work for spatiotemporal dynamics
\[
\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}, \quad t \in [0, 100], \ x \in \Omega = [0, L], \tag{20}
\]
where u denotes the diffusing material density, and t and x represent the time and space coordinates,
respectively. The simulation space is defined as Ω = [0, L] with L = 16π. The boundary conditions
were randomly sampled from two types, Dirichlet and Neumann boundaries, as below,
\[
\begin{cases}
u(x, t) = \beta b, & c = 0 \\
\dfrac{\partial u(x, t)}{\partial x} = b, & c = 1
\end{cases}
\qquad x \in \partial\Omega, \tag{21}
\]
where β = 5 is a scaling factor. We use b as the boundary value parameter and c to represent the
boundary type. In this case, c = 0 corresponds to a Dirichlet boundary and c = 1 indicates a Neumann
boundary. The sampling is based on the following distributions,
\[
P(c = 0) = P(c = 1) = 0.5, \qquad b \sim U(-1, 1), \tag{22}
\]
where P(·) represents the probability and U(−1, 1) is the uniform distribution over the range (−1, 1).
Meanwhile, the initial condition is sampled from a 256-dimensional space:
\[
u(x, 0) = \sum_{i=1}^{16}\sum_{j=1}^{16} \kappa_{i,j}\, \sin\!\big(i \cdot x / L + \varphi_{i,j}\big), \tag{23}
\]
where κ_{i,j} and φ_{i,j} are random variables sampled from uniform distributions over [−0.5, 0.5) and
[0, 1), respectively. The training and testing data are generated by a finite difference solver using
forward Euler time integration and a second-order-accurate central difference scheme. The spatial domain is
discretized into 128 grid points, the numerical time step is δt = 0.01, and the learning step size is
∆t = 50δt. The boundary value problem studied here has a high-dimensional parameter space
consisting of b, c, and the initial condition space, denoted as λ.
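A minimal NumPy sketch of this data-generation procedure (sampling Eqs. (21)-(23) and advancing one learning step with the forward-Euler/central-difference solver) is given below; the random seed and the boundary update details are illustrative rather than the exact solver settings.

```python
import numpy as np

rng = np.random.default_rng(0)
L, nx = 16 * np.pi, 128
x = np.linspace(0.0, L, nx, endpoint=False)
dx, dt = L / nx, 0.01

# Initial condition per Eq. (23): kappa_ij ~ U[-0.5, 0.5), phi_ij ~ U[0, 1)
kappa = rng.uniform(-0.5, 0.5, size=(16, 16))
phi = rng.uniform(0.0, 1.0, size=(16, 16))
i_idx = np.arange(1, 17)[:, None, None]                        # i = 1, ..., 16
u = (kappa[..., None] * np.sin(i_idx * x / L + phi[..., None])).sum(axis=(0, 1))

# Boundary condition per Eqs. (21)-(22): type c (0: Dirichlet, 1: Neumann), value b
c, b, beta = rng.integers(0, 2), rng.uniform(-1.0, 1.0), 5.0

# One learning step = 50 numerical steps of forward Euler with central differences (Eq. 20)
for _ in range(50):
    lap = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx ** 2
    u = u + dt * lap
    if c == 0:                                   # Dirichlet boundary: u = beta * b
        u[0] = u[-1] = beta * b
    else:                                        # Neumann boundary: du/dx = b (one-sided)
        u[0], u[-1] = u[1] - b * dx, u[-2] + b * dx
```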
Figure S1 shows the relative error ϵt of PPNN compared with its corresponding black-box method
tested with 256 randomly sampled unseen parameters λ for 200 time steps. Similar to other cases,
PPNN shows significantly lower relative error with an average error of merely 0.328% at the last
time step, which is almost one order of magnitude smaller than that of its black-box counterpart
(1.996%). Meanwhile, the error distribution range is significantly narrower.
Fig. S2 compares the PPNN prediction against the black-box results and the ground truth. We can
see the PPNN results are visually identical to the ground truth, while for the black-box method,
unphysical discontinuities can be observed, especially close to the end of each trajectory.
[Figure S1 plot: relative error vs. evolve steps for PPNN and the black-box baseline.]
Figure S1: The relative error for 256 randomly selected parameters λ, predicted by PPNN (blue line) compared to the
black-box baseline (orange line). Shaded areas indicate the error distribution range.
[Figure S2: predicted trajectories plotted over x and time for PPNN, the black-box baseline, and the ground truth.]
[Figure S3 plots: relative error vs. evolve steps for PPNN, the black-box baseline, PPNN-Enrichment Only, and PPNN-Residual Only; (a) 500 epochs, (b) 50 epochs.]
Figure S3: Ablation study of the PPNN framework. Relative error ϵ_t of 100 randomly selected testing parameters
λ in the viscous Burgers' case predicted by PPNN, the black-box ConvResNet, PPNN with input enrichment only
(PPNNEo), and PPNN with PDE residual only (PPNNRo). (a) shows the relative error after sufficient training
(500 training epochs), while (b) shows the error with insufficient training (i.e., 50 epochs). Solid lines indicate
the average error over the 100 testing parameters, while the shaded areas represent the error distribution range.
these two PPNN variants is compared with the complete PPNN and the black-box method, using
unseen testing parameters. Fig. S3(a), illustrating the relative error after sufficient training (500
epochs), shows that while PPNNEo’s performance closely mirrors that of the complete PPNN, the
full PPNN nonetheless delivers the lowest relative error and the tightest error distribution range over
the majority of testing steps. Meanwhile, PPNNRo suffers from significant error accumulation,
suggesting that training becomes notoriously challenging when PDE residuals are connected but
not included as input features in the trainable networks. Even so, PPNNRo exhibits a significantly
narrower error distribution range than the black-box baseline, indicating the value of embedding
PDEs for biasing the final results. The performance gap between PPNNRo and the full PPNN or
PPNNEo could be narrowed by using a higher-resolution mesh or a higher-accuracy scheme in the
PDE-preserving part, though this would come with increased inference time.
However, with insufficient training (50 epochs), as depicted in Fig. S3(b), PPNNEo experi-
ences rapid error accumulation after rolling out for 100 steps and exhibits a wide error distribution
range—worse than that of the black-box baseline—indicating it is more challenging to train due to
feature expansion. This can be substantially mitigated by adding the PDE-based residual connec-
tion, benefiting from the robust bias introduced by the PDE operators preserved on coarse grids.
Throughout all stages, the complete PPNN retains the best performance.
The results indicate that both the feature enrichment and the residual connection elements
are crucial to our model’s overall performance. The residual connection in particular stabilizes
training—especially in the early stages, which is particularly necessary when training in a sequence-
to-sequence (Seq2Seq) style—and significantly reduces training cost, while input enrichment can
eventually enhance accuracy at the final stage with sufficient training.
Table S1: Resolution used in the PPNN
which leverages multiple convolution layers (except FNO, which uses a channel-wise MLP) to compress
the multi-channel input field to a smaller field (for PPNN, the black-box ConvResNet, and FNO) or to a
hidden vector (for PINN and DeepONet). The physical parameters are first mapped to a field with
the same shape as the input field and then passed to the CNN encoder as an additional channel.
Here we use two trainable vectors (except for DeepONet) of shapes [n_x, 1] and [1, n_y] (where n_x, n_y are
the numbers of evaluation points of the input field in the x and y directions, respectively) to generate a
matrix, which is multiplied with the physical parameters to map them to a field.
[Figure S4 schematic: U-Net built from ConvResNet blocks, convolutions, ReLU activations, layer normalization, bilinear upsampling, and concatenation, with inputs u_t and λ (mapped to a field via trainable vectors) and output u_{t+1}.]
Figure S4: Schematic of the U-Net used in this paper.
The ViT consists of 6 attention encoder layers and a single attention decoder layer. In each multi-
head attention layer, we use 16 attention heads with a hidden dimension of 1024, while the output of
the encoder is a 2048-dimensional vector for each patch. The black-box baseline has 64,025,088 trainable
parameters, while the corresponding PPNN has 66,126,336 trainable parameters.
3.4 PINN
Modified MLP For PINN, we apply the modified multi-layer perceptron (MLP), which has been
“proved to be empirically & uniformly better than the conventional DeepONet architecture” [2].
The structure of this modified MLP is shown in Fig. S5(a).
[Figure S5 schematics: (a) the modified MLP, in which each hidden state h is combined with two encoded inputs α and β via f(h) = (1 − h)α + hβ; (b) the PINN structure, in which a CNN encoder compresses u_0 to a hidden vector and the space-time coordinates are appended to this hidden vector.]
Figure S5: (a) Structure of the modified MLP. (b) The PINN structure used in this work.
The PINN consists of a CNN encoder followed by a modified MLP to compress the input parameters,
i.e., the initial condition and viscosity, into a hidden vector. Then the space and time coordinates are appended
to the hidden vector and passed to another modified MLP. The structure is shown in Fig. S5(b).
The encoder consists of a 4-layer CNN with kernel sizes of 8, 6, 6, and 8, respectively. The first
three convolution layers have a stride of 3, while the last layer has a stride of 1. After the
CNN, the field is compressed to a vector of size 48. Both modified MLPs have 25 hidden
layers with 40000 neurons in each layer.
Because we have to evaluate the boundary values in PINN, the PINN has to deal with all the
points, including those at the bottom and right boundaries, which are omitted in PPNN and FNO
because these points have exactly the same values as the points at the top and left boundaries (periodic
BC). The output is the predicted solution with two velocity components u_x and u_y. In training,
the total loss L is a weighted summation of four components, L = w_ic L_ic + w_bc L_bc + w_eq L_eq + w_d L_d,
where w_ic = 20, w_bc = 1, w_eq = 1, and w_d = 20 are the balancing weights for the initial
condition loss L_ic, boundary loss L_bc, equation loss L_eq, and data loss L_d, respectively.
3.5 DeepONet
We followed the description of DeepONet as proposed in [3, 4]; the structure of DeepONet
is shown in Fig. S6.
[Figure S6 schematic: a branch net (convolution and batch-norm layers acting on u_0 and λ) and a trunk net (linear layers acting on the space and time coordinates x, y, t), whose outputs are multiplied to produce u(x, y, t).]
Figure S6: Structure of DeepONet (only the network used to predict u_x is shown; the network used to predict u_y
has a completely identical structure).
Based on the numerical experiments, we find that using two separate neural networks to predict the
two velocity components achieves slightly better performance. DeepONet
consists of two parts: a branch net and a trunk net. The branch net is responsible for handling the
hidden vector encoded from the input parameters, while the trunk net deals with the space and time
coordinates. The outputs of these two sub-nets are multiplied element-wise with each other to
generate the output: the velocity component u_x or u_y. The DeepONet has 8 convolution layers
with batch normalization in the branch net, while the trunk net consists of 4 linear layers. For DeepONet-L,
the branch net contains 11 convolution layers while the trunk net has 4 linear layers. For more details
please refer to the source code.
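A minimal sketch of the branch/trunk construction is given below; the actual branch net is the CNN described above, whereas a small MLP is used here only to keep the example short, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class TinyDeepONet(nn.Module):
    """DeepONet-style sketch: the branch net encodes the input function and
    parameters into p coefficients, the trunk net encodes a query point
    (x, y, t) into p basis values, and the prediction is their inner product."""

    def __init__(self, in_dim: int, p: int = 128):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, p))
        self.trunk = nn.Sequential(nn.Linear(3, 256), nn.ReLU(), nn.Linear(256, p))

    def forward(self, u0_lam: torch.Tensor, xyt: torch.Tensor) -> torch.Tensor:
        # u0_lam: (batch, in_dim) flattened initial condition and parameters
        # xyt:    (n_points, 3) query space-time coordinates
        b = self.branch(u0_lam)                 # (batch, p)
        t = self.trunk(xyt)                     # (n_points, p)
        return torch.einsum("bp,np->bn", b, t)  # (batch, n_points): one velocity component
```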
3.6 Fourier Neural Operators (FNO)
As discussed in the main text, FNO can be formulated either as an autoregressive model like
PPNN or as a continuous operator like DeepONet/PINN. To distinguish the two formulations, we
name the autoregressive FNO aFNO. aFNO learns the mapping G from the current state and the
physical parameters to the next-step state, i.e., G : [u_t; λ] ↦ u_{t+1}, where u is the state variable and
λ represents the physical parameters. The core component of FNO is the Fourier layer, which is
shown in Fig. S7. One Fourier layer consists of two parts (as shown in the orange dashed box of
Fig. S7). One part contains a spatial Fourier transformation, a convolution layer, and an inverse
Fourier transformation. The other part is a channel-wise linear layer. The outputs of the two parts
are summed together. Since we want to learn a dynamic process and the evaluation positions remain
unchanged, we replace the original space coordinate inputs [5] with the time coordinate. In total
there are 5 Fourier layers, each of which deals with 20 channels and projects to/from 12 Fourier
modes. Here we follow the original FNO paper and use GeLU [6] as the activation function for every
layer.
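For reference, a simplified PyTorch sketch of one such Fourier layer is shown below; the treatment of the retained Fourier modes is simplified relative to the reference FNO implementation, and only the channel and mode counts quoted above are carried over.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FourierLayer2d(nn.Module):
    """Sketch of one Fourier layer: a spectral path (FFT, learned complex
    weights on the lowest modes, inverse FFT) summed with a channel-wise
    linear path, followed by a GeLU activation."""

    def __init__(self, channels: int = 20, modes: int = 12):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (channels * channels)
        self.w_spec = nn.Parameter(
            scale * torch.randn(channels, channels, modes, modes, dtype=torch.cfloat))
        self.w_lin = nn.Conv2d(channels, channels, kernel_size=1)  # channel-wise linear path

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W); assumes H and W are larger than `modes`
        x_ft = torch.fft.rfft2(x)
        out_ft = torch.zeros_like(x_ft)
        m = self.modes
        out_ft[:, :, :m, :m] = torch.einsum(
            "bixy,ioxy->boxy", x_ft[:, :, :m, :m], self.w_spec)
        x_spec = torch.fft.irfft2(out_ft, s=x.shape[-2:])
        return F.gelu(x_spec + self.w_lin(x))
```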
[Figure S7 schematics: each Fourier layer sums a spectral path (Fourier transform, convolution, inverse Fourier transform) and a channel-wise linear path, with GeLU activations; the autoregressive aFNO takes u_t and λ as inputs and outputs u_{t+1}, while the continuous FNO takes u_0, λ, and the time coordinate t as inputs and outputs u_t.]
Figure S7: Structure of Fourier neural operators (FNOs). a. Structure of the autoregressive FNO (aFNO). b.
Structure of the continuous operator FNO (FNO).
The difference between the two FNO formulations in terms of relative error is shown in Fig. S8.
Although the continuous FNO is free of the error accumulation issue, aFNO shows a lower relative error in the
extrapolation range. However, compared to PPNN, aFNO has a significant performance gap on
both the training set and the testing set.
[Figure S8: relative error (log scale) of PPNN, FNO, and aFNO on the training and testing sets.]
numerical steps, with a numerical time step of δt = 0.01. The reference snapshots are collected every
other 20 numerical steps, beginning from the 160th step, to form the training trajectory. Namely,
the learning step is set as ∆t = 80δt. In the training set, there are 45 trajectories, each of which
includes 73 snapshots covering the total time T = 73∆t. The same FVM discretization scheme
is used to construct the PDE-preserving portion, which is defined on a coarse mesh of size 100 × 25
and operated with several sub-iterations with a step size of ∆t_1 = 2δt.
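A minimal sketch of such a coarse-mesh, sub-iterated PDE-preserving path is given below, assuming a PyTorch implementation; the restriction/prolongation operators, the callable pde_rhs, and the coarse resolution are illustrative placeholders rather than the exact PPNN components.

```python
import torch
import torch.nn.functional as F

def pde_preserving_step(u: torch.Tensor, pde_rhs, dt1: float, n_sub: int,
                        coarse_size=(25, 100)) -> torch.Tensor:
    """Restrict the fine-grid state to a coarse grid, advance it n_sub explicit
    sub-iterations of step size dt1 with the known discretized PDE right-hand
    side `pde_rhs` (a user-supplied callable), and prolongate back."""
    fine_size = u.shape[-2:]
    uc = F.interpolate(u, size=coarse_size, mode="bilinear", align_corners=False)
    for _ in range(n_sub):
        uc = uc + dt1 * pde_rhs(uc)   # forward-Euler sub-iteration on the coarse mesh
    return F.interpolate(uc, size=fine_size, mode="bilinear", align_corners=False)
```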
Fig. S9 presents the relative error of Pi-DeepONet compared to the DeepONet. The dashed lines
represent performance on the training set, while the solid lines illustrate the relative error on the
testing set. The performance of Pi-DeepONet varies significantly when trained with different loss
weighting parameters; in most cases, it underperforms compared to the DeepONet. This is probably
due to the fact that original Pi-DeepONet and DeepONet do not share the same DNN structure.
To isolate the influence of the physics-informed loss, we contrast the results of Pi-DeepONet 1–3
with Pi-DeepONet*, which has an identical DNN structure but is purely data-driven, as the PDE
weighting terms w_ic, w_bc, and w_eq are set to zero. We find that even with an identical DNN structure,
an improvement in prediction accuracy through physics-informed loss is not guaranteed. Instead,
any potential improvement depends heavily on the proper tuning of weighting hyperparameters.
Furthermore, it is worth highlighting that most of the Pi-DeepONets display a larger prediction error
at the start than at the end of trajectories. This is likely because the initial condition is sampled
from a very high-dimensional space, posing a learning challenge for Pi-DeepONet. However, as
more time steps are rolled out, the velocity decays and the effective dimensionality of the velocity field
decreases, making it easier for Pi-DeepONet to predict.
[Figure S9 plot: relative error ϵ_t vs. evolve steps for Pi-DeepONet 1–3, Pi-DeepONet*, and DeepONet.]
Figure S9: Relative error ϵ_t of different variants of (Pi-)DeepONet in the Burgers' equation case, averaged over
100 randomly selected parameters λ from the training set (dashed lines) and 100 from the testing set, including
unseen initial conditions and physical parameters, i.e., viscosity (solid lines), respectively. Pi-DeepONet 1 (blue),
Pi-DeepONet 2 (orange), Pi-DeepONet 3 (green), Pi-DeepONet* (red), DeepONet (purple).
[Figure S10 plots: relative error ϵ_t vs. evolve steps for FNO2d, FNO2d-Auto, FNO3d, PINO, and PINO-L1–L3 (left: all viscosities; right: a single training viscosity).]
Figure S10: Relative error ϵ_t of different variants of FNO/PINO in the Burgers' equation case. The left panel shows
the averaged relative error of FNO (blue), aFNO (orange), FNO3d (green), PINO (red), PINO-L (purple), PINO-
L2 (brown), and PINO-L3 (pink) on 100 randomly selected parameters λ from the training set (dashed lines) and
100 from the testing set, including unseen initial conditions and physical parameters, i.e., viscosity (solid lines),
respectively. The right panel shows the averaged relative error ϵ_t tested with one specific viscosity (ν = 0.02) from
the training set.
marginal here. However, unlike Pi-DeepONet, we did not observe a large variation in prediction
accuracy when altering the weighting terms of the loss function components, suggesting that PINO is
relatively less sensitive to the weighting hyperparameters. Moreover, we observe that the FNO2d
structures show much better performance than the FNO3d structures (used in PINO). We further
tested PINO's performance under a specific physical parameter (i.e., viscosity ν = 0.02), as shown in
the right panel of Fig. S10. This is the scenario studied in the original PINO paper. The brown curve
represents the relative error of PINO-L4, trained exclusively with viscosity ν = 0.02. In comparison
with other PINOs trained with a variety of viscosities, PINO-L4 exhibits a lower prediction error,
suggesting that PINO struggles to generalize across different physical parameters.
6 Extra contours
6.1 Contour of the “unknown” magnetic field added to the Navier-Stokes equation
Figure S11: The magnitude distribution in the computational domain of the “unknown” source term added to the
Navier-Stokes equation.
6.2 Extra comparison with black-box baselines
This section provides more contour comparisons among PPNN, the black-box ConvResNet, and the label
data on the testing (unseen) dataset, for the reaction-diffusion case (Fig. S12) and the viscous Burgers'
case (Fig. S13), respectively.
[Figure S12 contour panels for testing parameters λ2–λ10 at 0.01T, 0.6T, 1.2T, and 2T.]
Figure S12: Predicted solution snapshots of the reactant u for the reaction-diffusion (RD) equations, obtained by
the black-box ConvResNet (baseline) and PPNN (ours), compared against the ground truth, where λ2–λ10 are testing
parameters that are not in the training set.
[Figure S13 contour panels for testing parameters λ2–λ10 at 0.01T, 0.6T, 1.2T, and 2T.]
Figure S13: Predicted solution snapshots of the velocity u for the Burgers' equations, obtained by the black-box
ConvResNet (baseline) and PPNN (ours), compared against the ground truth, where λ2–λ10 are testing parameters
that are not in the training set.
In the extrapolated range in time (t ≥ T), PPNN shows great generalizability over the other methods, though some noise can be observed. The advantage of PPNN is even more obvious with testing (unseen) parameters: most methods fail to give an acceptable prediction even at the first time step. While FNO gives comparable results in the first few steps, PPNN shows much less discrepancy from the ground truth as time marches forward. Besides, PPNN maintains very consistent performance between testing and training parameters, indicating great generalizability across parameters.
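For context, these long-horizon comparisons rely on rolling a next-step surrogate out autoregressively from the same initial state up to 2T and inspecting snapshots along the way. The sketch below illustrates such a rollout loop under the assumption that the model maps the current state to the next one; the identity "model", tensor shapes, and step count are placeholders, not the released implementation.

```python
# Minimal autoregressive rollout sketch (assumed evaluation loop, not the authors' code).
import torch

@torch.no_grad()
def rollout(model: torch.nn.Module, u0: torch.Tensor, n_steps: int) -> torch.Tensor:
    """Repeatedly apply `model` (u_t -> u_{t+dt}) for n_steps steps, starting from u0."""
    states = [u0]
    for _ in range(n_steps):
        states.append(model(states[-1]))
    return torch.stack(states, dim=1)  # shape: [batch, n_steps + 1, ...]

# Hypothetical usage: an identity mapping stands in for a trained next-step surrogate
dummy_model = torch.nn.Identity()
u0 = torch.randn(1, 2, 64, 64)              # e.g., two velocity components on a 64x64 grid
trajectory = rollout(dummy_model, u0, 200)  # 200 steps, roughly covering 0 to 2T
```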
Figure S14: Predicted solution for the velocity magnitude ∥u∥2 of the Burgers' equations at different time steps for the training parameter λ9. Each row represents the predicted solution at a certain time step, while each column shows the results predicted by (from left to right) the numerical solver (ground truth), PPNN, FNO, DeepONet, DeepONet-L, and PINN, respectively.
Figure S15: Predicted solution for the velocity magnitude ∥u∥2 of the Burgers' equations at different time steps for the training parameter λ10. Each row represents the predicted solution at a certain time step, while each column shows the results predicted by (from left to right) the numerical solver (ground truth), PPNN, FNO, DeepONet, DeepONet-L, and PINN, respectively.
Figure S16: Predicted solution for the velocity magnitude ∥u∥2 of the Burgers' equations at different time steps for the unseen parameter λ11. Each row represents the predicted solution at a certain time step, while each column shows the results predicted by (from left to right) the numerical solver (ground truth), PPNN, FNO, DeepONet, DeepONet-L, and PINN, respectively.
Figure S17: Predicted solution for the velocity magnitude ∥u∥2 of the Burgers' equations at different time steps for the unseen parameter λ12. Each row represents the predicted solution at a certain time step, while each column shows the results predicted by (from left to right) the numerical solver (ground truth), PPNN, FNO, DeepONet, DeepONet-L, and PINN, respectively.