0% found this document useful (0 votes)

62 views17 pages

Journal of Computational Physics: Zichao Long, Yiping Lu, Bin Dong

Uploaded by

Aman Jalan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

62 views17 pages

Journal of Computational Physics: Zichao Long, Yiping Lu, Bin Dong

Uploaded by

Aman Jalan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 17

Journal of Computational Physics 399 (2019) 108925

Contents lists available at ScienceDirect

Journal of Computational Physics

www.elsevier.com/locate/jcp

PDE-Net 2.0: Learning PDEs from data with a

numeric-symbolic hybrid deep network
Zichao Long a , Yiping Lu a , Bin Dong b,c,∗
a
School of Mathematical Sciences, Peking University, Beijing, China
b
Beijing International Center for Mathematical Research, Peking University, Beijing, China
c
Center for Data Science, Peking University, Beijing, China

a r t i c l e i n f o a b s t r a c t

Article history: Partial differential equations (PDEs) are commonly derived based on empirical observations.
Received 29 November 2018 However, recent advances of technology enable us to collect and store massive amount of
Received in revised form 15 August 2019 data, which offers new opportunities for data-driven discovery of PDEs. In this paper, we
Accepted 1 September 2019
propose a new deep neural network, called PDE-Net 2.0, to discover (time-dependent) PDEs
Available online 6 September 2019
from observed dynamic data with minor prior knowledge on the underlying mechanism
Keywords: that drives the dynamics. The design of PDE-Net 2.0 is based on our earlier work [1]
Partial differential equations where the original version of PDE-Net was proposed. PDE-Net 2.0 is a combination of
Dynamic system numerical approximation of differential operators by convolutions and a symbolic multi-
Convolutional neural network layer neural network for model recovery. Comparing with existing approaches, PDE-Net
Symbolic neural network 2.0 has the most ﬂexibility and expressive power by learning both differential operators
and the nonlinear response function of the underlying PDE model. Numerical experiments
show that the PDE-Net 2.0 has the potential to uncover the hidden PDE of the observed
dynamics, and predict the dynamical behavior for a relatively long time, even in a noisy
environment.
© 2019 Elsevier Inc. All rights reserved.

1. Introduction

Differential equations, especially partial differential equations (PDEs), play a prominent role in many disciplines to de-
scribe the governing physical laws underlying a given system of interest. Traditionally, PDEs are derived mathematically or
physically based on some basic principles, e.g. from Schrödinger’s equations in quantum mechanics to molecular dynamic
models, from Boltzmann equations to Navier-Stokes equations, etc. However, the mechanisms behind many complex sys-
tems in modern applications (such as many problems in multiphase ﬂow, neuroscience, ﬁnance, biological science, etc.) are
still generally unclear, and the governing equations of these systems are commonly obtained by empirical formulas [2,3].
With the recent rapid development of sensors, computational power, and data storage in the last decade, huge quantities of
data can now be easily collected, stored and processed. Such vast quantity of data offers new opportunities for data-driven
discovery of (potentially new) physical laws. Then, one may ask the following interesting and intriguing question: can we
learn a PDE model to approximate the observed complex dynamic data?

* Corresponding author.
E-mail address: [email protected] (B. Dong).

https://doi.org/10.1016/j.jcp.2019.108925
0021-9991/© 2019 Elsevier Inc. All rights reserved.
2 Z. Long et al. / Journal of Computational Physics 399 (2019) 108925

1.1. Existing work and motivations

Earlier attempts on data-driven discovery of hidden physical laws include [4,5]. Their main idea is to compare numerical
differentiations of the experimental data with analytic derivatives of candidate functions, and apply the symbolic regression
and the evolutionary algorithm to determining the nonlinear dynamic system. When the form of the nonlinear response
function of a PDE is known, except for some scalar parameters, [6] presented a framework to learn these unknown param-
eters by introducing regularity between two consecutive time step using Gaussian process. Later in [7], a PDE constraint
interpolation method was introduced to uncover the unknown parameters of the PDE model. An alternative approach is
known as the sparse identification of nonlinear dynamics (SINDy) [8–13]. The key idea of SINDy is to first construct a
dictionary of simple functions and partial derivatives that are likely to appear in the equations. Then, it takes the advan-
tage of sparsity promoting techniques (e.g. 1 regularization) to select candidates that most accurately represent the data.
In [14], the authors studied the problem of sea surface temperature prediction (SSTP). They assumed that the underlying
physical model was an advection-diffusion equation. They designed a special neural network according to the general solu-
tion of the equation. Comparing with traditional numerical methods, their approach showed improvements in accuracy and
computation efficiency.
Recent work greatly advanced the progress of PDE identification from observed data. However, SINDy requires to build
a sufficiently large dictionary which may lead to high memory load and computation cost, especially when the number of
model variables is large. Furthermore, the existing methods based on SINDy treat spatial and temporal information of the
data separately and does not take full advantage of the temporal dependence of the PDE model. Although the framework
presented by [6,7] is able to learn hidden physical laws using less data than the approach based on SINDy, the explicit
form of the PDEs is assumed to be known except for a few scalar learnable parameters. The approach of [14] is specifically
designed for advection-diffusion equations, and cannot be readily extended to other types of equations. Therefore, extracting
governing equations from data in a less restrictive setting remains a great challenge.
The main objective of this paper is to design a transparent deep neural network to uncover hidden PDE models from
observed complex dynamic data with minor prior knowledge on the mechanisms of the dynamics, and to perform accurate
predictions at the same time. The reason we emphasize on both model recovery and prediction is because: 1) the ability to
conduct accurate long-term prediction is an important indicator of accuracy of the learned PDE model (the more accurate
is the prediction, the more confident we have on the underlying recovered PDE model); 2) the trained neural network
can be readily used in applications and does not need to be re-trained when initial conditions are altered. Our inspiration
comes from the latest development of deep learning techniques in computer vision. An interesting fact is that some popular
networks in computer vision, such as ResNet[15,16], have close relationship with ODEs/PDEs and can be naturally merged
with traditional computational mathematics in various tasks [17–26]. However, existing deep networks designed in deep
learning mostly emphasis on expressive power and prediction accuracy. These networks are not transparent enough to
be able to reveal the underlying PDE models, although they may perfectly fit the observed data and perform accurate
predictions. Therefore, we need to carefully design the network by combining knowledge from deep learning and numerical
PDEs.

1.2. Our approach

The proposed deep neural network is an upgraded version of our original PDE-Net [1]. The main difference is the use
of a symbolic network to approximate the nonlinear response function, which signiﬁcantly relaxes the requirement on
the prior knowledge on the PDEs to be recovered. During training, we no longer need to assume the general type of the
PDE (e.g. convection, diffusion, etc.) is known. Furthermore, due to the lack of prior knowledge on the general type of
the unknown PDE models, more carefully designed constraints on the convolution ﬁlters as well as the parameters of the
symbolic network are introduced. We refer to this upgraded network as PDE-Net 2.0.
Assume that the PDE to be recovered takes the following generic form

U t = F (U , ∇ U , ∇ 2 U , . . .), x ∈ ⊂ R2 , t ∈ [0, T ].

PDE-Net 2.0 is designed as a feed-forward network by discretizing the above PDE using forward Euler in time and finite
difference in space. The forward Euler approximation of temporal derivative makes PDE-Net 2.0 ResNet-like [15,18,21], and
the finite difference is realized by convolutions with trainable kernels (or filters). The nonlinear response function F is
approximated by a symbolic neural network, which shall be referred to as S ymNet. All the parameters of the S ymNet and
the convolution kernels are jointly learned from data. To grant full transparency to the PDE-Net 2.0, proper constraints are
enforced on the S ymNet and the filters. Full details on the architecture and constraints will be presented in Section 2.

1.3. Relation with model reduction

Data-driven discovery of hidden physical laws and model reduction have a lot in common. Both of them concern on
representing observed data using relatively simple models. The main difference is that, model reduction emphasis more on
numerical precision rather than acquiring the analytic form of the model.
Z. Long et al. / Journal of Computational Physics 399 (2019) 108925 3

It is common practice in model reduction to use a function approximator to express the unknown terms in the reduced
models, such as approximating subgrid stress for large-eddy simulation [27–29] or approximating interatomic forces for
coarse-grained molecular dynamic systems [30,31]. Our work may serve as an alternative approach to model reduction and
help with analyzing the reduced models.

1.4. Novelty

The particular novelties of our approach are that we impose appropriate constraints on the learnable filters and use
a properly designed symbolic neural network to approximate the response function F . Using learnable filters makes the
PDE-Net 2.0 more flexible, and enables more powerful approximation of unknown dynamics and longer time prediction (see
numerical experiments in Section 3 and Section 4). Furthermore, the constraints on the learnable filters and the use of a
deep symbolic neural network enable us to uncover the analytic form of F with minor prior knowledge on the dynamic,
which is the main advantage of PDE-Net 2.0 over the original PDE-Net. In addition, the composite representation by the
symbolic network is more efficient and flexible than SINDy. Therefore, the proposed PDE-Net 2.0 is distinct from the existing
learning based methods to discover PDEs from data.

2. PDE-Net 2.0: architecture, constraints and training

Given a series of measurements of some physical quantities {U (t , x, y ) : t = t 0 , t 1 , · · · , (x, y ) ∈ ⊂ R2 } ⊂ Rd with

d being the number of physical quantities of interest, we want to discover the governing PDEs from the observed data
{U (t , x, y )}. We assume that the observed data are associated with a PDE that takes the following general form:

U t (t , x, y ) = F (U , U x , U y , U xx , U xy , U y y , . . .), (1)

here U (t , ·) : → Rd , F (U , U x , U y , U xx , U xy , U y y , . . .) ∈ Rd , (x, y ) ∈ ⊂ R2 , t ∈ [0, T ]. Our objective is to design a feed-

forward network, called PDE-Net 2.0, to approximate the unknown PDE (1) from its solution samples in the way that: 1)
we are able to reveal the analytic form of the response function F and the differential operators involved; 2) we can con-
duct long-term prediction on the dynamical behavior of the equation for any given initial conditions. There are two main
components of the PDE-Net 2.0 that are combined together in the same network: one is automatic determination on the
differential operators involved in the PDE and their discrete approximations; the other is to approximate the nonlinear re-
sponse function F . In this section, we start with discussions on the overall framework of the PDE-Net 2.0 and then introduce
the details on these two components. Regularization and training strategies will be given near the end of this section.

2.1. Architecture of PDE-Net 2.0

Inspired by the dynamic system perspective of deep neural networks [17–22], we consider forward Euler as the temporal
discretization of the evolution PDE (1), and unroll the discrete dynamics to a feed-forward network. One may consider more
sophisticated temporal discretization which naturally leads to different network architectures [21]. For simplicity, we focus
on forward Euler in this paper.

2.1.1. δt-block:
Let Ũ (t + δt , ·) be the predicted value at time t + δt based on Ũ (t , ·). Then, we design an approximation framework as
follows

k
Ũ (t + δt , ·) ≈ Ũ (t , ·) + δt · S ymNetm ( D 00 Ũ , D 01 Ũ , D 10 Ũ , · · · ). (2)

Here, the operators D i j are convolution operators with the underlying ﬁlters denoted by q i j , i.e. D i j u = qi j u. The oper-
i+ j
ators D 10 , D 01 , D 11 , etc. approximate differential operators, i.e. D i j u ≈ ∂i ju . In particular, the operators D 00 is a certain
∂ x∂ y
averaging operator. The purpose of introducing the average operators in stead of simply using the identity is to improve the
expressive power of the network and enables it to capture more complex dynamics.
Other than the assumption that the observed dynamics is governed by a PDE of the form (1), we assume that the highest
order of the PDE is less than some positive integer. Then, we can assume that F is a function of m variables with known
m. The task of approximating F in (1) is equivalent to a multivariate regression problem. In order to be able to identify
k
the analytic form of F , we use a symbolic neural network denote by S ymNetm to approximate F , where k denotes the
k
depth of the network. Note that, if F is a vector function, we use multiple S ymNetm to approximate the components of F
separately.
Combining the aforementioned approximation of differential operators and the nonlinear response function, we obtain
an approximation framework (2) which will be referred to as a δt-block (see Fig. 1). Details of these two components can
be found later in Section 2.2 and Section 2.3.
4 Z. Long et al. / Journal of Computational Physics 399 (2019) 108925

Fig. 1. The schematic diagram of a δt-block.

Fig. 2. The schematic diagram of the PDE-Net 2.0.

2.1.2. Multiple δt-blocks:

One δt-block only guarantees the accuracy of one-step dynamics, which does not take error accumulation into considera-
tion. In order to facilitate a long-term prediction, we stack multiple δt-blocks into a deep network, and call this network the
PDE-Net 2.0 (see Fig. 2). The importance of stacking multiple δt-blocks will be demonstrated by our numerical experiments
in Section 3 and 4.
The PDE-Net 2.0 can be easily described as: (1) stacking one δt-block multiple times; (2) sharing parameters in all
δt-blocks. Given a set of observed data {U (t , ·)}, training a PDE-Net 2.0 with n δt-blocks needs to minimize the accumulated
error ||U (t + nδt , ·) − Ũ (t + nδt , ·)||22 , where Ũ (t + nδt , ·) is the output from the PDE-Net 2.0 (i.e. n δt-blocks) with input
U (t , ·), and U (t + nδt , ·) is observed training data.

2.2. Convolutions and differentiations

In the original PDE-Net [1], the learnable filters are properly constrained so that we can easily identify their corre-
spondence to differential operators. PDE-Net 2.0 adopts the same constrains as the original version of the PDE-Net. For
completeness, we shall review the related notions and concepts, and provide more details.
A profound relationship between convolutions and differentiations was presented by [32,33], where the authors discussed
the connection between the order of sum rules of filters and the orders of differential operators. Note that the definition of
convolution we use follows the convention of deep learning, which is defined as

( f q)[l1 , l2 ] := q[k1 , k2 ] f [l1 + k1 , l2 + k2 ].
k1 ,k2

This is essentially correlation instead of convolution in the mathematics convention. Note that if f is of finite size, we use
periodic boundary condition.
The order of sum rules is closely related to the order of vanishing moments in wavelet theory [34,35]. We first recall the
definition of the order of sum rules.
Z. Long et al. / Journal of Computational Physics 399 (2019) 108925 5

Deﬁnition 2.1 (Order of sum rules). For a ﬁlter q, we say q to have sum rules of order α = (α1 , α2 ), where α ∈ Z2+ , provided
that

kβ q[k] = 0 (3)
k∈Z2

for all β = (β1 , β2 ) ∈ Z2+ with |β| := β1 + β2 < |α | and for all β ∈ Z2+ with |β| = |α | but β = α . If (3) holds for all β ∈ Z2+
with |β| < K except for β = β̄ with certain β̄ ∈ Z2+ and |β̄| = J < K , then we say q to have total sum rules of order
K \{ J + 1}.

In practical implementation, the filters are normally finite and can be understood as matrices. For an N × N filter q (N
being an odd number), assuming the indices of q start from − N −2
1
, (3) can be written in the following simpler form
N −1 N −1

2
2

lβ1 mβ2 q[l, m] = 0.

l=− N −
2
1
m=− N −
2
1

The following proposition from [33] links the orders of sum rules with orders of differential operators.

Proposition 2.1. Let q be a ﬁlter with sum rules of order α ∈ Z2+ . Then for a smooth function F (x) on R2 , we have

1 ∂α
q[k] F (x + εk) = C α F (x) + O (ε ), as ε → 0. (4)
ε|α | ∂ xα
k∈Z2

If, in addition, q has total sum rules of order K \{|α | + 1} for some K > |α |, then
1 ∂α
|α |
q[k] F (x + εk) = C α α F (x) + O (ε K −|α | ), as ε → 0. (5)
ε ∂x
k∈Z2

According to Proposition 2.1, an α th order differential operator can be approximated by the convolution of a filter with α
order of sum rules. Furthermore, according to (5), one can obtain a high order approximation of a given differential operator
if the corresponding filter has an order of total sum rules with K > |α | + k, k 1. For example, consider filter
⎛ ⎞
1 0 −1
q = ⎝ 2 0 −2 ⎠ .
1 0 −1
It has a sum rules of order (1, 0), and a total sum rules of order 3\{2}. Thus, up to a constant and a proper scaling, q
corresponds to a discretization of ∂∂x with second order accuracy.
For an N × N filter q, define the moment matrix of q as

M (q) = (mi , j ) N × N , (6)

where
N −1
1
2
j
mi , j = k1i k2 q[k1 , k2 ], i , j = 0, 1, . . . , N − 1. (7)
i! j!
k1 ,k2 =− N −
2
1

We shall call the (i , j )-element of M (q) the (i , j )-moment of q for simplicity. For any smooth function f : R2 → R, we
apply convolution on the sampled version of f with respect to the ﬁlter q. By Taylor’s expansion, one can easily obtain the
following formula
N −1

2

q[k1 , k2 ] f (x + k1 δ x, y + k2 δ y )
k1 ,k2 =− N −
2
1

N −1

N −1 j
2
∂ i + j f k1i k2 i j
= q[k1 , k2 ] δ x δ y + o(|δ x| N −1 + |δ y | N −1 )
∂ i x∂ j y (x, y ) i ! j !
k1 ,k2 =− N − 1 i , j =0
2

N −1
∂ i + j f
= i j
mi , j δ x δ y · i j + o(|δ x| N −1 + |δ y | N −1 ). (8)
∂ x∂ y (x, y )
i , j =0
6 Z. Long et al. / Journal of Computational Physics 399 (2019) 108925

Fig. 3. The schematic diagram of S ymNet.

From (8) we can see that ﬁlter q can be designed to approximate any differential operator with prescribed order of accuracy
by imposing constraints on M (q).
For example, if we want to approximate ∂∂ux (up to a constant) by convolution q u where q is a 3 × 3 ﬁlter, we can
consider the following constrains on M (q):
⎛ ⎞ ⎛ ⎞
0 0 0 0 0
⎝1 ⎠ or ⎝ 1 0 ⎠. (9)
0

Here, means no constraint on the corresponding entry. The constraints described by the moment matrix on the left of (9)
guarantee the approximation accuracy is at least ﬁrst order, and the one on the right guarantees an approximation of at
least second order. In particular, when all entries of M (q) are constrained, e.g.
⎛ ⎞
0 0 0
M (q) = ⎝ 1 0 0 ⎠ ,
0 0 0

the corresponding ﬁlter can be uniquely determined. In the PDE-Net 2.0, all ﬁlters are learned subjected to partial constraints
on their associated moment matrices, with at least second order accuracy.

2.3. Design of S ymNet: a symbolic neural network

k
The symbolic neural network S ymNetm of the PDE-Net 2.0 is introduced to approximate the multivariate nonlinear
response function F of (1). Neural networks have recently been proven effective in approximating multivariate functions
in various scenarios [36–40]. For the problem we have in hand, we not only require the network to have good expressive
power, but also good transparency so that the analytic form of F can be readily inferred after training. Our design of
S ymNetm k
is motivated by EQL/EQL÷ proposed by [41,42].
k
The S ymNetm , as illustrated in Fig. 3, is a network that takes an m dimensional vector as input and has k hidden
2
layers. Fig. 3 shows the symbolic neural network with two hidden layers, i.e. S ymNetm , where f is a dyadic operation
unit, e.g. multiplication or division. In this paper, we focus on multiplication, i.e. we take f (a, b) = a × b. Different from
EQL/EQL÷ , each hidden layer of the S ymNetm k
directly takes the outputs from the preceding layer as inputs, rather than
a linear combination of them. Furthermore, it adds one additional variable (i.e. f (·, ·)) at each hidden layer. To better
understand S ymNetm k
, we present an example in Algorithm 1 showing how S ymNet 62 is constructed. In particular, when
b = 0, i = 1, 2, 3,
i

1 0 0 0 0 0 0 0 1 0 0 0 0
W1 = ,W2 =
0 1 0 0 0 0 0 0 0 1 0 0 0

and W 3 = (0, 0, 0, 0, 0, 0, −1, −1), then S ymNet 62 (u , u x , u y , v , v x , v y ) = −uu x − vu y which is the right-hand-side of ∂∂ut of
the Burgers’ equation without viscosity.
k
The S ymNetm can represent all polynomials of variables (x1 , x2 , . . . , xm ) with the total number of multiplications not
k
exceeding k. If needed, one can add more operations to the S ymNetm to increase the capacity of the network.
Z. Long et al. / Journal of Computational Physics 399 (2019) 108925 7

Algorithm 1 S ymNet 62 .
Input: u , u x , u y , v , v x , v y ∈ R,
(η1 , ξ1 )
= W 1 · (u , u x , u y , v , v x , v y )
+ b1 , W 1 ∈ R2×6 , b1 ∈ R2 ;
f 1 = η1 · ξ1 ;
(η2 , ξ2 )
= W 2 · (u , u x , u y , v , v x , v y , f 1 )
+ b2 , W 2 ∈ R2×7 , b2 ∈ R2 ;
f 2 = η2 · ξ2 ;
Output: F = W 3 · (u , u x , u y , v , v x , v y , f 1 , f 2 )
+ b3 ∈ R, W 3 ∈ R1×8 , b3 ∈ R.

k
Now, we show that S ymNetm is more compact than the dictionaries of SINDy. For that, we ﬁrst introduce some notions.

Deﬁnition 2.2. Deﬁne the set of all polynomials of m variables (x1 , · · · , xm ) with the total number of multiplications not
exceeding k as P k [x1 , · · · , xm ]. Here, the total number of multiplications of P k [x1 , · · · , xm ] is counted as follows:

• For any monomial of degree k, if k ≥ 2, then the number of multiplications of the monomial is counted as k − 1. When
k = 1 or 0, the count is 0.
• For any polynomial P , the total number of multiplications is counted as the sum of the number of multiplications of its
monomials.

m
k k+1
i =1 xi + i =1 xi with k < m are all members of P [x1 , · · · , xm ]. The elements in
k
For example, i =1 xi xi +1 and
P [x1 , · · · , xm ] are of simple forms when k is relatively small. The following proposition shows that S ymNetm
k k
can represent
all polynomials of variables (x1 , x2 , . . . , xm ) with the total number of multiplications not exceeding k. Note that the actual
capacity of S ymNetm k
is larger than P k [x1 , · · · , xm ], i.e. P k [x1 , · · · , xm ] is a subset of the set of functions that S ymNetm
k
can
represent.

Proposition 2.2. For any P ∈ P k [x1 , · · · , xm ], there exists a set of parameters for S ymNetm
k
such that

k
P = S ymNetm (x1 , · · · , xm ).

Proof. We prove this proposition by induction. When k = 1, the conclusion obviously holds. Suppose the conclusion holds
for k. For any polynomial P ∈ P k+1 [x1 , · · · , xm ], we only need to consider the cases when P has a total number of multipli-
cations greater than 1.
We take any monomial of P that has degree greater than 1, which we suppose take the form x1 x2 · A where A is a
monomial of variable (x1 , . . . , xm ). Then, P can be written as P = x1 x2 A + Q . Deﬁne new variable xm+1 = x1 x2 . Then, we
have

P = xm+1 A + Q ∈ P k [x1 , · · · , xm , xm+1 ].

By the induction hypothesis, there exists a set of parameters such that P = S ymNetm
k
+1 (x1 , · · · , xm+1 ).
k+1
We take the linear transform between the input layer and the ﬁrst hidden layer of S ymNetm as

W 1 · (x1 , · · · , xm )
+ b1 = (x1 , x2 )
.

Then, the output of the ﬁrst hidden layer is x1 , x2 , · · · , xm , x1 x2 . If we use it as the input of S ymNetm
k
+1 , we have

k
P (x1 , · · · , xm ) = S ymNetm +1 (x1 , · · · , xm , x1 x2 )
k +1
= S ymNetm (x1 , · · · , xm ),
which concludes the proof. 2

m+SINDy
l
constructs a dictionary that includes all possible monomials up to a certain degree. Observe that there are totally
l
monomials with m variables and a degree not exceeding l(∈ N). Our symbolic network, however, is more compact
k
than SINDy. The following proposition compares the complexity of S ymNetm and SINDy, whose proof is straightforward.

Proposition 2.3. Let P ∈ P k [x1 , · · · , xm ] and suppose P have monomials of degree ≤ l.

• The memory load of S ymNetm k

that approximates P is O (m + k). The number of ﬂops for evaluating S ymNetm
k
is O (k(m + k)).
m+l
• Constructing a dictionary with all possible polynomials of degree l requires a memory load of l , and evaluation of a linear
m+l
combination of dictionary members requires O ( l ) ﬂops.
8 Z. Long et al. / Journal of Computational Physics 399 (2019) 108925

We use the following example to show the advantage of S ymNet over SINDy. Consider two variables u , v and all of their
derivatives of order ≤ 2:

(u , v , u x , u y , · · · , v y y ) ∈ R12 .
Suppose the polynomial to be approximated is P = −uu x − vu y + u xx . For k = l = 3, the size of the dictionary of SINDy is
15
3
= 455 and the computation of linear combination of the elements requires 909 flops. The memory load of S ymNet 12
3
,
however, is 15 and an evaluation of the network requires 180 flops. Therefore, S ymNet can significantly reduce memory
k
load and computation cost when input data is large. Note that when k is large and l small, S ymNetm is worse than SINDy.
However, for system identification problems, we normally wish to obtain a compact representation (i.e. smaller k). Thus,
S ymNet takes full advantage of this prior knowledge and can significantly save on memory and computation cost which is
crucial in the training of the PDE-Net 2.0.

2.4. Loss function and regularization

We adopt the following loss function to train the proposed PDE-Net 2.0:

L = L data + λ1 L moment + λ2 L S ymNet ,

where the hyper-parameters λ1 and λ2 are chosen as λ1 = 0.001 and λ2 = 0.005. Now, we present details on each of the
term of the loss function and introduce pseudo-upwind as an additional constraint on PDE-Net 2.0.

2.4.1. Data approximation L data

Consider the data set {U j (t i , ·) : 1 ≤ i ≤ n, 1 ≤ j ≤ N }, where n is the number of δt-blocks and N is the total number
of samples. The index j indicates the j-th solution path with a certain initial condition of the unknown dynamics. Note
that one can split a long solution path into multiple shorter ones. We would like to train the PDE-Net 2.0 with n δt-blocks.
For a given n ≥ 1, every pair of the data {U j (t 0 , ·), U j (t i , ·)}, for each j and i ≤ n, is a training sample, where U j (t 0 , ·) is
the input and U j (t i , ·) is the label that we need to match with the output from the network. For that, we deﬁne the data
approximation term L data as:

1
n N
L data = li j /δt 2 , where li j = ||U j (t i , ·) − Ũ j (t i , ·)||22 ,
n
i =1 j =1

where Ũ j (t i , ·) is the output of the PDE-Net 2.0 with Ũ j (t 0 , ·) = U j (t 0 , ·) as the input.

2.4.2. Regularization: L moment and L S ymNet

For a given threshold s, deﬁned Huber’s loss function 1s as
s
|x| − if |x| > s
1s (x) = 1 2
2
2s
x else.

Then, we deﬁne L moment as

L moment = 1s ( M (qi j )[i 1 , j 1 ]),
i, j i1 , j1

where q i j are the filters of PDE-Net 2.0 and M (q) is the moment matrix of q. We use this loss function to regularize the
learnable filters to reduce overfitting. In our numerical experiments, we will use s = 0.01.
Given 1s as defined above, we use it to enforce sparsity on the parameters of S ymNet. This will help to reduce overfitting
and enable more stable prediction. The loss function L S ymNet is defined as

L S ymNet = 1s ( p ).
p ∈parameters of S ymNet

We set s = 0.001 in our numerical experiments.

2.4.3. Pseudo-upwind
In numerical PDEs, to ensure stability of a numerical scheme, we need to design conservation schemes or use upwind
schemes [43–45]. This is also important for PDE-Net 2.0 during inferencing. However, the challenge we face is that we do
not know apriori the form or the type of the PDE. Therefore, we introduce a method called pseudo-upwind to help with
maintaining stability of the PDE-Net 2.0.
Z. Long et al. / Journal of Computational Physics 399 (2019) 108925 9

Given a 2D filter q, define the flipping operators flipx (q) and flip y (q) as

flipx (q) [k1 , k2 ] = −q[−k1 , k2 ], k1 , k2 = −( N − 1)/2, · · · , ( N − 1)/2

and

ﬂip y (q) [k1 , k2 ] = −q[k1 , −k2 ], k1 , k2 = −( N − 1)/2, · · · , ( N − 1)/2.

In each δt-block of the PDE-Net 2.0, before we apply convolution with a filter, we first use S ymNet to determine whether
we should use the filter or flip it first. We use the following univariate PDE as an example to demonstrate our idea. Given
PDE ut = F (u , · · · ), suppose the input of a δt-block is u. The algorithm of pseudo-upwind is described by the following
Algorithm 2.

Algorithm 2 Pseudo-upwind in δt-block.

Input: u
u 01 = q01 u , u 10 = q10 u
F̃ 0 = S ymNet 6k (u , u 01 , u 10 , q02 u , q11 u , q20 u )
if ∂∂uF̃ 0 > 0 then
01
u x = u 01
else
u x = ﬂipx (q01 ) u
end if
if ∂∂uF̃ 0 > 0 then
10
u y = u 10
else
u y = ﬂip y (q10 ) u
end if
F̃ = S ymNet 6k (u , u x , u y , q02 u , q11 u , q20 u )
Return: u + δt F̃

Remark 2.1. Note that the algorithm does not exactly enforce upwind in general. This is why we call it pseudo-upwind. We
further note that:

• Given a PDE of the form ut = G (u )u x + H (u )u y + λ(u xx + u y y ), we can use G (u ) and H (u ) to determine whether we
should ﬂip a ﬁlter or not.
• For a vector PDE, such as U = (u , v )
, we can use, for instance, ∂ F̃ 0 ∂ F̃ 0
∂ u 01 , ∂ v 10 to determine how we should approximate u x
and u y in the δt-block [44].

2.5. Initialization and training

In the PDE-Net 2.0, parameters can be divided into three groups: 1) moment matrices to generate convolution kernels;
2) the parameters of S ymNet; and 3) hyper-parameters, such as the number of filters, the size of filters, the number of
δt-Blocks and number of hidden layers of S ymNet, regularization weights λ1 , λ2 , λ3 , etc. The parameters of the S ymNet
are shared across the computation domain , and are initialized by random sampling from a Gaussian distribution. For
the filters, we initialize D 01 , D 10 with second order pseudo-upwind scheme and central difference for all other filters. For
example, if the size of the filters were set to be 5 × 5, then the initial values of the convolution kernels q01 , q02 ∈ R5×5
are
⎛ ⎞ ⎛ ⎞
0 0
⎜ ⎟ ⎜ ⎟
q01 = ⎝ 0 0 −3 4 −1 ⎠ , q02 = ⎝ 0 1 −2 1 0 ⎠ .
0 0

We use layer-wise training to train the PDE-Net 2.0. We start with training the PDE-Net 2.0 on the first δ t-block with a
batch of data, and then use the results of the first δ t-block as the initialization and restart training on the first two δ t-blocks
with another batch. Repeat this procedure until we complete all n δ t-blocks. Note that all the parameters in each of the
δ t-block are shared across layers. In addition, we add a warm-up step before the training of the first δ t-block by fixing
filters and setting regularization term to be 0 (i.e. λ1 = λ2 = 0). The warm-up step is to obtain a good initial guess of the
parameters of S ymNet.
To demonstrate the necessity of having learnable filters, we will compare the PDE-Net 2.0 containing learnable filters
with the PDE-Net 2.0 having fixed filters. To differentiate the two cases, we shall call the PDE-Net 2.0 with fixed filters the
10 Z. Long et al. / Journal of Computational Physics 399 (2019) 108925

Table 1
PDE model identiﬁcation.

Correct PDE ut = −uu x − vu y + 0.05(u xx + u y y )

v t = −uv x − v v y + 0.05( v xx + v y y )

Frozen-PDE-Net 2.0 ut = −0.906uu x − 0.901vu y + 0.033u xx + 0.037u y y

v t = −0.907v v y − 0.902uv x + 0.039v xx + 0.032v y y

PDE-Net 2.0 ut = −0.979uu x − 0.973u y v + 0.052u xx + 0.051u y y

v t = −0.973uv x − 0.977v v y + 0.053v xx + 0.051v y y

“Frozen-PDE-Net 2.0”. Note that for Frozen-PDE-Net 2.0, the filters are fixed to be the initial values we choose to train the
regular PDE-Net 2.0. This is a natural choice since when we know apriori that the PDE is Burgers’ equation, it would be a
stable finite difference scheme. However, intuitively speaking, freezing any finite difference approximations of the differential
operators during training of PDE-Net 2.0 is not ideal, because you cannot possibly know which numerical scheme is best to
use without knowing the form of the PDE. Therefore, for inverse problem, it is better to learn both the PDE model and the
discretization of the PDE model simultaneously. This assertion is supported by our empirical comparisons between frozen
and regular PDE-Net 2.0 in Tables 1 and 3.

3. Numerical studies: Burgers’ equation

Burgers’ equation is a fundamental partial differential equation in many areas such as fluid mechanics and traffic flow
modeling. It has a lot in common with the Navier-Stokes equation, e.g. the same type of advective nonlinearity and the
presence of viscosity.

3.1. Simulated data, training and testing

In this section we consider a 2-dimensional Burger’s equation with periodic boundary condition on = [0, 2π ] × [0, 2π ],
∂U
∂t = −U · ∇ U + ν U , U = (u , v )

(10)
U |t =0 = U 0 (x, y ),

with (t , x, y ) ∈ [0, 4] × , where ν = 0.05.

The training data is generated by a ﬁnite difference scheme on a 128 × 128 mesh and then restricted to a 32 × 32 mesh.
1
The temporal discretization is 2nd order Runge-Kutta with time step δt = 1600 , the spatial discretization uses a 2nd order
upwind scheme for ∇ and the central difference scheme for . The initial value u 0 (x, y ), v 0 (x, y ) takes the following form,

2w 0 (x, y )
w (x, y ) = +c (11)
maxx, y | w 0 |

where w 0 (x, y ) = |k|,|l|≤4 λk,l cos(kx + ly ) + γk,l sin(kx + ly ), λk,l , γk,l ∼ N (0, 1), c ∼ U (−2, 2). Here, N (0, 1), U (−2, 2) repre-
sents the standard normal distribution and uniform distribution on [−2, 2] respectively. We also add noise to the generated
data:

U (t , x, y ) = U (t , x, y ) + 0.001 × M W (12)

where M = maxx, y ,t {U (t , x, y )}, W ∼ N (0, 1).

5
Suppose we know a priori that the order of the underlying PDE is no more than 2, we can use two S ymNet 12 to
approximate the right-hand-side nonlinear response function of (10) component-wise. Let U = (u , v )
. We denote the two
5
S ymNet 12 as Net u and Net v respectively. Then, each δt-block of the PDE-Net 2.0 can be written as

ũ (t i +1 , ·) = ũ (t i , ·) + δt · Net u ( D 00 ũ , D 01 ũ , · · · , D 20 ũ , D 00 ṽ , · · · , D 20 ṽ )

ṽ (t i +1 , ·) = ṽ (t i , ·) + δt · Net v ( D 00 ũ , D 01 ũ , · · · , D 20 ũ , D 00 ṽ , · · · , D 20 ṽ )

where { D i j : 0 ≤ i + j ≤ 2} are convolution operators.

During training and testing, the data is generated on-the-fly. The size of the filters that will be used is 5 × 5. The total
number of parameters in Net u and Net v is 336, and the number of trainable parameters in moment matrices is 105 for 5 × 5
filters (6 × 5 × 5 − 45 (constraint on moment)). During training, we use BFGS, instead of SGD, to optimize the parameters.
Z. Long et al. / Journal of Computational Physics 399 (2019) 108925 11

Fig. 4. Burgers’ equation: Prediction errors of the PDE-Net 2.0(orange) and Frozen-PDE-Net 2.0(blue) with 5 × 5 ﬁlters. In each plot, the horizontal axis
indicates the time of prediction in the interval (0, 400 × δt ] = (0, 4], and the vertical axis shows the relatively errors. The banded curves indicate the
25%-100% percentile of the relative errors among 1000 test samples. The darker regions indicate the 25%-75% percentile of the relative errors, which shows
that PDE-Net 2.0 can predict very well in most cases. (For interpretation of the colors in the ﬁgure(s), the reader is referred to the web version of this
article.)

Fig. 5. Burgers’ equation: The ﬁrst row shows the images of the true dynamics. The last two rows show the images of the predicted dynamics using the
Frozen-PDE-Net 2.0 (the second row) and PDE-Net 2.0 with 9 δt-blocks (the third row). Time step δt = 0.01.

We use 28 data samples per batch to train each δt-block and we only construct the PDE-Net 2.0 up to 9 layers, which
requires totally 420 data samples during the whole training procedure.

3.2. Results and discussions

We first demonstrate the ability of the trained PDE-Net 2.0 to recover the analytic form of the unknown PDE model. We
use the symbolic math tool in python to obtain the analytic form of S ymNet. Results are summarized in Table 1. As one
can see from Table 1 that we can recover the terms of the Burgers’ equation with good accuracy, and using learnable filters
helps with the identification of the PDE model. Furthermore, the terms that are not included in the Burgers’ equation all
have relatively small weights in the S ymNet (see Fig. 7).
We also demonstrate the ability of the trained PDE-Net 2.0 in prediction, i.e. the ability to generalize. After the PDE-Net
2.0 with n δt-blocks (1 ≤ n ≤ 9) is trained, we randomly generate 1000 initial guesses based on (11) and (12), feed them to
the PDE-Net 2.0, and measure the relative error between the predicted dynamics (i.e. the output of the PDE-Net 2.0) and
the actual dynamics (obtained by solving (10) using high precision numerical scheme). The relative error between the true
Ũ −U 22
data U and the predicted data Ũ is defined as = U −Ū 22
, where Ū is the spatial average of U . The error plots are shown
in Fig. 4. Some of the images of the predicted dynamics are presented in Fig. 5 and the errors maps are presented in Fig. 6.
As we can see that, even trained with noisy data, the PDE-Net 2.0 is able to perform long-term prediction. Having multiple
δt-blocks indeed improves predict accuracy. Furthermore, PDE-Net 2.0 performs significantly better than Frozen-PDE-Net
2.0. This suggests that when we do not know the PDE, we cannot possibly know how to properly discretize it. Therefore,
for inverse problems, it is better to learn both the PDE model and its discretization simultaneously.
12 Z. Long et al. / Journal of Computational Physics 399 (2019) 108925

Fig. 6. Burgers’ equation: The error maps of the Frozen-PDE-Net 2.0 (the ﬁrst row) and PDE-Net 2.0 (second row) having 9 δt-blocks. Time step δt = 0.01.

Fig. 7. Burgers’ equation: The coeﬃcients of the remainder terms for the u-component (left) and v-component (right) with the sparsity constraint on the
S ymNet (orange) and without the sparsity constraint (blue). The bands for both cases are computed based on 20 independent training.

Fig. 8. Burgers’ equation: Prediction errors of the PDE-Net 2.0 with (orange) and without (blue) sparsity constraint on the S ymNet. In each plot, the
horizontal axis indicates the time of prediction in the interval (0, 400 × δt ] = (0, 4], and the vertical axis shows the relatively errors. The banded curves
indicate the 25%-100% percentile of the relative errors among 1000 test samples.

3.3. Importance of L S ymNet and pseudo-upwind

This subsection demonstrates the importance of enforcing sparsity on the S ymNet and using pseudo-upwind. As we can
see from Fig. 7 that having sparsity constraints on the S ymNet helps with suppressing the weights on the terms that do
not exist in the Burgers’ equation. Furthermore, Fig. 8 and Fig. 9 show that having sparsity constraint on the S ymNet or
using pseudo-upwind can signiﬁcantly reduce prediction errors.
Z. Long et al. / Journal of Computational Physics 399 (2019) 108925 13

Fig. 9. Burgers’ equation: Prediction errors of the PDE-Net 2.0 with (orange) and without (blue) pseudo-upwind in each δt-block. In each plot, the horizontal
axis indicates the time of prediction in the interval (0, 400 × δt ] = (0, 4], and the vertical axis shows the relatively errors. The banded curves indicate the
25%-100% percentile of the relative errors among 1000 test samples.

Table 2
PDE model identiﬁcation. Note that the largest term of the
remainders (i.e. the ones that are not included in the table)
is 8e-4u y (for Frozen-PDE-Net 2.0) and 6e-5u (for PDE-Net
2.0).

Correct PDE ut = 0.1(u xx + u y y )

Frozen-PDE-Net 2.0 ut = 0.103u xx + 0.103u y y
PDE-Net 2.0 ut = 0.999u xx + 0.998u y y

4. Numerical studies: diffusion equation

Diffusion phenomenon has been studied in many applications in physics e.g. the collective motion of micro-particles in
materials due to random movement of each particle, or modeling the distribution of temperature in a given region over
time.

4.1. Simulated data, training and testing

Consider the 2-dimensional heat equation with periodic boundary condition on = [0, 2π ] × [0, 2π ]

∂u
∂t = c u , (t , x, y ) ∈ [0, 1] × ,
(13)
u |t =0 = u 0 (x, y ),
1
where c = 0.1. The training data of the heat equation is generated by 2nd order Runge-Kutta in time with δt = 1600 , and
central difference scheme in space on a 128 × 128 mesh. We then restrict the data to a 32 × 32 mesh. The initial value
u 0 (x, y ) is also generated from (11).

4.2. Results and discussions

The demonstration on the ability of the trained PDE-Net 2.0 to identify the PDE model is given in Table 2. As one can
see from Table 2 that we can recover the terms of the heat equation with good accuracy. Furthermore, all the terms that
are not included in the heat equation have much smaller weights in the S ymNet.
We also demonstrate the ability of the trained PDE-Net 2.0 in prediction. The testing method is exactly the same as the
method described in Section 3. Comparisons between PDE-Net 2.0 and Frozen-PDE-Net 2.0 are shown in Fig. 10, where we
can clearly see the advantage of learning the ﬁlters. Visualization of the predicted dynamics is given in Fig. 11. All these
results show that the learned PDE-Net 2.0 performs well in prediction.

5. Numerical studies: convection diffusion equation with a reactive source

Convection diffusion systems are mathematical models which correspond to the transferring of some physical quantities
such as energy or materials due to diffusion and convection. Speciﬁcally, a convection diffusion system with a reactive source
can be used to model a large range of chemical systems in which the transferring of materials competes with productions
of materials induced by several chemical reactions.
14 Z. Long et al. / Journal of Computational Physics 399 (2019) 108925

Fig. 10. Diffusion equation: Prediction errors of the PDE-Net 2.0 (orange) and Frozen-PDE-Net 2.0 (blue). In each plot, the horizontal axis indicates the time
of prediction in the interval (0, 150 × δt ] = (0, 1.5], and the vertical axis shows the relative errors. The banded curves indicate the 25%-100% percentile of
the relative errors among 1000 test samples.

Fig. 11. Diffusion equation: The ﬁrst row shows the images of the true dynamics. The second row shows the predicted dynamics using the Frozen-PDE-Net
2.0 with 9 δt-blocks. The third row shows the predicted dynamics using the PDE-Net 2.0 with 9 δt-blocks. Here, δt = 0.01.

5.1. Simulated data, training and testing

Consider a 2-dimensional convection diffusion equation with a reactive source and the periodic boundary condition on
= [0, 2π ] × [0, 2π ]:

ut = −uu x − vu y + ν u + λ( A )u − ω( A ) v ,
v t = −uv x − v v y + ν v + ω( A )u + λ( A ) v , (14)
2 2 2 2 2
A = u + v , ω = −β A , λ = 1 − A ,

where (t , x, y ) ∈ [0, 1.5] × , and ν = 0.1, β = 1. Training data is generated the same way as what we did for Burgers’
equation in Section 3. A 2nd order Runge-Kutta with time step δt = 1/10000 is adopted for temporal discretization. We
choose a 2nd order upwind scheme for the convection terms and the central difference scheme for on a 128 × 128 mesh.
We then restrict the data to a 32 × 32 mesh. Noise is added the same way as the Burgers’ equation. The initial values
u 0 (x, y ), v 0 (x, y ) are also generated from (11).

5.2. Results and discussions

The capability of the trained PDE-Net 2.0 to identify the underlying PDE model is demonstrated in Table 3. As one can
see that we can recover the terms of the reaction convection diffusion equation with good accuracy. Furthermore, all the
terms that are not included in this equation have relatively small weights in the S ymNet.
We also demonstrate the ability of the trained PDE-Net 2.0 in prediction. The testing method is exactly the same as
the method described in Section 3. Comparisons between PDE-Net 2.0 and Frozen-PDE-Net 2.0 are shown in Fig. 12. Vi-
sualization of the predicted dynamics and errors maps are given in Fig. 13 and Fig. 14. Similar to what we observed in
Section 3, we can clearly see the beneﬁt from learning discretizations. PDE-Net 2.0 obtains more accurate estimations of the
coeﬃcients for the nonlinear convection terms (i.e. the term −uu x − vu y in Table 3) and makes more accurate predictions
(Fig. 12) than Frozen-PDE-Net 2.0.
Z. Long et al. / Journal of Computational Physics 399 (2019) 108925 15

Table 3
PDE model identiﬁcation.

Correct PDE ut = −uu x − vu y + 0.1u + ( v − u )(u 2 + v 2 ) + u

v t = −uv x − v v y + 0.1 v − ( v + u )(u 2 + v 2 ) + v

Frozen-PDE-Net 2.0 ut = −0.86uu x − 0.90vu y + 0.09u xx + 0.09u y y

+1.02u 2 v − 1.02u 3 − 1.01uv 2 + 1.01u + 0.99v 3
v t = −0.87uv x − 0.85v v y + 0.09v xx + 0.09v y y
+1.04u 2 v − 1.02uv 2 − 1.01v 3 + 0.99v − 0.99u 3

PDE-Net 2.0 ut = −0.98vu y − 0.93uu x + 0.10u xx + 0.10u y y

−1.05uv 2 + 0.99v 3 − 0.98u 3 + 0.98u + 0.97u 2 v
v t = −0.99uv x − 0.96v v y + 0.10v y y + 0.10v xx
−1.04u 2 v − 1.02v 2 − 1.02uv 2 + 1.01v − 1.00u 3

Fig. 12. Reaction convection diffusion: Prediction errors of the PDE-Net 2.0 (orange) and Frozen-PDE-Net 2.0 (blue). In each plot, the horizontal axis indicates
the time of prediction in the interval (0, 200 × δt ] = (0, 2], and the vertical axis shows the relative errors. The banded curves indicate the 25%-100%
percentile of the relative errors among 1000 test samples.

Fig. 13. Reaction convection diffusion: The ﬁrst row shows the images of the true dynamics. The second row shows the predicted dynamics using the
Frozen-PDE-Net 2.0 with 24 δt-blocks. The third row shows the predicted dynamics using the PDE-Net 2.0 with 24 δt-blocks. Here, δt = 0.01.

6. Conclusions and future work

In this paper, we proposed a numeric-symbolic hybrid deep network, called PDE-Net 2.0, for PDE model recovery from
observed dynamic data. PDE-Net 2.0 is able to recover the analytic form of the PDE model with minor assumptions on the
mechanisms of the observed dynamics. For example, it is able to recover the analytic form of Burgers’ equation with good
conﬁdence without any prior knowledge on the type of the equation. Therefore, PDE-Net 2.0 has the potential to uncover
potentially new PDEs from observed data. Furthermore, after training, the network can perform accurate long-term pre-
diction without re-training for new initial conditions. The limitations and possible future extensions of the current version
of PDE-Net 2.0 is twofold: 1) having only addition and multiplication in the S ymNet may still be too restrictive, and one
may include division as an additional operation in S ymNet to further improve its expressive power; 2) using forward Euler
as temporal discretization is the most straightforward treatment, while a more sophisticated temporal scheme may help
with the model recovery and prediction. Both of these worth further exploration. Furthermore, we would like to apply the
16 Z. Long et al. / Journal of Computational Physics 399 (2019) 108925

Fig. 14. Reaction convection diffusion: The error maps of the Frozen-PDE-Net 2.0 (the ﬁrst row) and PDE-Net 2.0 (second row) having 24 δt-blocks. Time
step δt = 0.01.

network to real biological dynamic data. We will further investigate the reliability of the network and explore the possibility
to uncover new dynamical principles that have meaningful scientiﬁc explanations.

Declaration of competing interest

The authors declare that they have no known competing ﬁnancial interests or personal relationships that could have
appeared to inﬂuence the work reported in this paper.

Acknowledgements

Zichao Long is supported by The Elite Program of Computational and Applied Mathematics for PhD Candidates of Peking
University. Yiping Lu is supported by the Elite Undergraduate Training Program of the School of Mathematical Sciences at
Peking University. Bin Dong is supported in part by Beijing Natural Science Foundation (No. Z180001) and Beijing Academy
of Artiﬁcial Intelligence (BAAI).

References

[1] Z. Long, Y. Lu, X. Ma, B. Dong, Pde-net: learning pdes from data, in: International Conference on Machine Learning, 2018, pp. 3214–3222.
[2] C.E. Brennen, C.E. Brennen, Fundamentals of Multiphase Flow, Cambridge University Press, 2005.
[3] M. Efendiev, Mathematical modeling of mitochondrial swelling.
[4] J. Bongard, H. Lipson, Automated reverse engineering of nonlinear dynamical systems, Proc. Natl. Acad. Sci. USA 104 (24) (2007) 9943–9948.
[5] M. Schmidt, H. Lipson, Distilling free-form natural laws from experimental data, Science 324 (5923) (2009) 81–85.
[6] M. Raissi, G.E. Karniadakis, Hidden physics models: machine learning of nonlinear partial differential equations, J. Comput. Phys. 357 (2018) 125–141.
[7] M. Raissi, P. Perdikaris, G.E. Karniadakis, Physics informed deep learning (part II): data-driven discovery of nonlinear partial differential equations, arXiv
preprint, arXiv:1711.10566.
[8] S.L. Brunton, J.L. Proctor, J.N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proc. Natl. Acad.
Sci. USA (2016) 201517384.
[9] H. Schaeffer, Learning partial differential equations via data discovery and sparse optimization, Proc. R. Soc. A 473 (2197) (2017) 20160446.
[10] S.H. Rudy, S.L. Brunton, J.L. Proctor, J.N. Kutz, Data-driven discovery of partial differential equations, Sci. Adv. 3 (4) (2017) e1602614.
[11] H. Chang, D. Zhang, Identification of physical processes via combined data-driven and data-assimilation methods, arXiv preprint, arXiv:1810.11977.
[12] H. Schaeffer, G. Tran, R. Ward, L. Zhang, Extracting structured dynamical systems using sparse optimization with very few samples, arXiv preprint,
arXiv:1805.04158.
[13] Z. Wu, R. Zhang, Learning physics by data for the motion of a sphere falling in a non-Newtonian fluid, Commun. Nonlinear Sci. Numer. Simul. 67
(2019) 577–593.
[14] E. de Bezenac, A. Pajot, P. Gallinari, Deep learning for physical processes: incorporating prior scientific knowledge, arXiv preprint, arXiv:1711.07970.
[15] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 2016, pp. 770–778.
[16] K. He, X. Zhang, S. Ren, J. Sun, Identity mappings in deep residual networks, in: European Conference on Computer Vision, Springer, 2016, pp. 630–645.
[17] Y. Chen, W. Yu, T. Pock, On learning optimized reaction diffusion processes for effective image restoration, in: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, 2015, pp. 5261–5269.
[18] E. Weinan, A proposal on machine learning via dynamical systems, Commun. Math. Stat. 5 (1) (2017) 1–11.
[19] E. Haber, L. Ruthotto, Stable architectures for deep neural networks, Inverse Probl. 34 (1) (2017) 014004.
[20] S. Sonoda, N. Murata, Double continuum limit of deep neural networks, in: ICML Workshop Principled Approaches to Deep Learning, 2017.
[21] Y. Lu, A. Zhong, Q. Li, B. Dong, Beyond finite layer neural networks: bridging deep architectures and numerical differential equations, in: International
Conference on Machine Learning, 2018, pp. 3282–3291.
[22] B. Chang, L. Meng, E. Haber, F. Tung, D. Begert, Multi-level residual networks from dynamical systems view, in: ICLR, 2018.
[23] T.Q. Chen, Y. Rubanova, J. Bettencourt, D. Duvenaud, Neural ordinary differential equations, arXiv preprint, arXiv:1806.07366.
[24] T. Qin, K. Wu, D. Xiu, Data driven governing equations approximation using deep neural networks, arXiv preprint, arXiv:1811.05537.
[25] S. Wiewel, M. Becher, N. Thuerey, Latent-space physics: towards learning the temporal evolution of fluid flow, arXiv preprint, arXiv:1802.10123.
Z. Long et al. / Journal of Computational Physics 399 (2019) 108925 17

[26] B. Kim, V.C. Azevedo, N. Thuerey, T. Kim, M. Gross, B. Solenthaler, Deep fluids: a generative network for parameterized fluid simulations, arXiv preprint,
arXiv:1806.02071.
[27] K. Duraisamy, Z.J. Zhang, A.P. Singh, New approaches in turbulence and transition modeling using data-driven techniques, in: 53rd AIAA Aerospace
Sciences Meeting, 2015, p. 1284.
[28] K. Duraisamy, G. Iaccarino, H. Xiao, Turbulence modeling in the age of data, Annu. Rev. Fluid Mech. 51 (2019) 357–377.
[29] C. Ma, J. Wang, et al., Model reduction with memory and the machine learning of dynamical systems, arXiv preprint, arXiv:1808.04258.
[30] W. Noid, J.-W. Chu, G.S. Ayton, V. Krishna, S. Izvekov, G.A. Voth, A. Das, H.C. Andersen, The multiscale coarse-graining method. I. A rigorous bridge
between atomistic and coarse-grained models, J. Chem. Phys. 128 (24) (2008) 244114.
[31] L. Zhang, J. Han, H. Wang, R. Car, E. Weinan, Deep potential molecular dynamics: a scalable model with the accuracy of quantum mechanics, Phys. Rev.
Lett. 120 (14) (2018) 143001.
[32] J.-F. Cai, B. Dong, S. Osher, Z. Shen, Image restoration: total variation, wavelet frames, and beyond, J. Am. Math. Soc. 25 (4) (2012) 1033–1089.
[33] B. Dong, Q. Jiang, Z. Shen, Image restoration: wavelet frame shrinkage, nonlinear evolution pdes, and beyond, Multiscale Model. Simul. 15 (1) (2017)
606–660.
[34] I. Daubechies, Ten Lectures on Wavelets, vol. 61, SIAM, 1992.
[35] S. Mallat, A Wavelet Tour of Signal Processing, Elsevier, 1999.
[36] T. Poggio, H. Mhaskar, L. Rosasco, B. Miranda, Q. Liao, Why and when can deep-but not shallow-networks avoid the curse of dimensionality: a review,
Int. J. Autom. Comput. 14 (5) (2017) 503–519.
[37] U. Shaham, A. Cloninger, R.R. Coifman, Provable approximation properties for deep neural networks, Appl. Comput. Harmon. Anal. 44 (3) (2018)
537–557.
[38] H. Montanelli, Q. Du, Deep relu networks lessen the curse of dimensionality, arXiv preprint, arXiv:1712.08688.
[39] Q. Wang, et al., Exponential convergence of the deep neural network approximation for analytic functions, arXiv preprint, arXiv:1807.00297.
[40] J. He, L. Li, J. Xu, C. Zheng, Relu deep neural networks and linear finite elements, arXiv preprint, arXiv:1807.03973.
[41] G. Martius, C.H. Lampert, Extrapolation and learning equations, arXiv preprint, arXiv:1610.02995.
[42] S. Sahoo, C. Lampert, G. Martius, Learning equations for extrapolation and control, in: International Conference on Machine Learning, 2018,
pp. 4439–4447.
[43] X.-D. Liu, S. Osher, T. Chan, Weighted essentially non-oscillatory schemes, J. Comput. Phys. 115 (1) (1994) 200–212.
[44] C.-W. Shu, Essentially non-oscillatory and weighted essentially non-oscillatory schemes for hyperbolic conservation laws, in: Advanced Numerical
Approximation of Nonlinear Hyperbolic Equations, Springer, 1998, pp. 325–432.
[45] R.J. LeVeque, Finite Volume Methods for Hyperbolic Problems, vol. 31, Cambridge University Press, 2002.

Linear Equation 1
100% (1)
Linear Equation 1
25 pages
Math 9 Quart 1
100% (1)
Math 9 Quart 1
14 pages
5.1. Initial Value Problem - IVP
No ratings yet
5.1. Initial Value Problem - IVP
49 pages
Computational Fluid Dynamics Prof. Dr. Suman Chakraborty Department of Mechanical Engineering Indian Institute of Technology, Kharagpur
No ratings yet
Computational Fluid Dynamics Prof. Dr. Suman Chakraborty Department of Mechanical Engineering Indian Institute of Technology, Kharagpur
18 pages
Problem Set 5: Problem 1: Steady Heat Equation Over A 1D Fin With A Finite Volume Method
No ratings yet
Problem Set 5: Problem 1: Steady Heat Equation Over A 1D Fin With A Finite Volume Method
5 pages
MTH101 Complete Handouts
No ratings yet
MTH101 Complete Handouts
301 pages
DLP No.2 MNHS
100% (1)
DLP No.2 MNHS
3 pages
NCERT Solutions For CBSE Class 8 Maths Chapter 9 Algebraic Expressions and Identities
No ratings yet
NCERT Solutions For CBSE Class 8 Maths Chapter 9 Algebraic Expressions and Identities
20 pages
0709 2071 PDF
No ratings yet
0709 2071 PDF
20 pages
Linear Equation in Two Variables - 1
No ratings yet
Linear Equation in Two Variables - 1
5 pages
Local Discontinuous Galerkin Methods For High-Order Time-Dependent Partial Differential Equations
No ratings yet
Local Discontinuous Galerkin Methods For High-Order Time-Dependent Partial Differential Equations
57 pages
Unit Learning Plan Q1
No ratings yet
Unit Learning Plan Q1
6 pages
Central Library, Doon University: New Arrivals (December-2016)
No ratings yet
Central Library, Doon University: New Arrivals (December-2016)
9 pages
MATHEMATICS T Muar Marking SchemeTrialSem22021
No ratings yet
MATHEMATICS T Muar Marking SchemeTrialSem22021
9 pages
Problem Set 1: 2.29 / 2.290 Numerical Fluid Mechanics - Spring 2021
No ratings yet
Problem Set 1: 2.29 / 2.290 Numerical Fluid Mechanics - Spring 2021
7 pages
Documentation 229 FV Framework
No ratings yet
Documentation 229 FV Framework
42 pages
MBE 2036 Finding Roots: Dr. Yajing Shen
No ratings yet
MBE 2036 Finding Roots: Dr. Yajing Shen
43 pages
Deferred Correction Methods For Ordinary Differential Equations
No ratings yet
Deferred Correction Methods For Ordinary Differential Equations
33 pages
Edexcel Further Maths Scheme of Work
No ratings yet
Edexcel Further Maths Scheme of Work
93 pages
DL For Mesh Relaxation
No ratings yet
DL For Mesh Relaxation
17 pages
Partial: Implicit-Explicit Methods For Time-Dependent Differential Equations Ascher, Wetton
No ratings yet
Partial: Implicit-Explicit Methods For Time-Dependent Differential Equations Ascher, Wetton
27 pages
First Order Optimization Algorithms Via Discretization of Finite Time Convergent Flows
No ratings yet
First Order Optimization Algorithms Via Discretization of Finite Time Convergent Flows
24 pages
Geometric Algebra For Physicists - Errata PDF
No ratings yet
Geometric Algebra For Physicists - Errata PDF
6 pages
M.Tech - Computer Vision and Image Processing
No ratings yet
M.Tech - Computer Vision and Image Processing
21 pages
1 - Partial Derivatives Lec5
No ratings yet
1 - Partial Derivatives Lec5
9 pages
Distdgl: Distributed Graph Neural Network Training For Billion-Scale Graphs
No ratings yet
Distdgl: Distributed Graph Neural Network Training For Billion-Scale Graphs
9 pages
Contaminant Transport and Biodegradation A Numerical Model For Reactive Transport in Porous Media
No ratings yet
Contaminant Transport and Biodegradation A Numerical Model For Reactive Transport in Porous Media
8 pages
18CE310 Differential Equations and Fourier Series Syll
No ratings yet
18CE310 Differential Equations and Fourier Series Syll
6 pages
Nit Hamirpur Syllabus
No ratings yet
Nit Hamirpur Syllabus
37 pages
Stochastic Physics-Informed Neural Ordinary Differential Equations
No ratings yet
Stochastic Physics-Informed Neural Ordinary Differential Equations
35 pages
2.29 / 2.290 Numerical Fluid Mechanics - Spring 2021 Problem Set 2
No ratings yet
2.29 / 2.290 Numerical Fluid Mechanics - Spring 2021 Problem Set 2
5 pages
Crystal GCN
No ratings yet
Crystal GCN
7 pages
Paper 39
No ratings yet
Paper 39
10 pages
Exponential Equation
No ratings yet
Exponential Equation
13 pages
REVIEW Class 7 Math T3 W7
No ratings yet
REVIEW Class 7 Math T3 W7
2 pages
DL-PDE: Deep-Learning Based Data-Driven Discovery of Partial Differential Equations From Discrete and Noisy Data
No ratings yet
DL-PDE: Deep-Learning Based Data-Driven Discovery of Partial Differential Equations From Discrete and Noisy Data
28 pages
Problem Set 3: Notes: Please Provide A Hard-Copy (Handwritten/printed) of Your Solutions. Please Do Not Print Your Codes
No ratings yet
Problem Set 3: Notes: Please Provide A Hard-Copy (Handwritten/printed) of Your Solutions. Please Do Not Print Your Codes
5 pages
Scientific Machine Learning Through Physics-Informed Neural Networks: Where We Are and What's Next
No ratings yet
Scientific Machine Learning Through Physics-Informed Neural Networks: Where We Are and What's Next
67 pages
Pabalate-Math G9 Q4 WT 2
No ratings yet
Pabalate-Math G9 Q4 WT 2
5 pages
Neural Operator - Learning Maps Between Function Spaces
No ratings yet
Neural Operator - Learning Maps Between Function Spaces
93 pages
(Berg, Jens, and Kaj Nystrom), Data-Driven Discovery of PDEs in Complex Datasets, Journal of Computational Physics 384 (2019)
No ratings yet
(Berg, Jens, and Kaj Nystrom), Data-Driven Discovery of PDEs in Complex Datasets, Journal of Computational Physics 384 (2019)
14 pages
S0021999119306783
No ratings yet
S0021999119306783
1 page
Math g7 m2 Topic C Lesson 21 Student
No ratings yet
Math g7 m2 Topic C Lesson 21 Student
5 pages
16992-Article Text-20486-1-2-20210518
No ratings yet
16992-Article Text-20486-1-2-20210518
9 pages
UC Merced Previously Published Works
No ratings yet
UC Merced Previously Published Works
40 pages
GrADE A Graph Based Data Driven Solver F
No ratings yet
GrADE A Graph Based Data Driven Solver F
20 pages
2304.00388 Multilevel CNNs For Parametric PDEs
No ratings yet
2304.00388 Multilevel CNNs For Parametric PDEs
42 pages
NaikerMaths A-level-Best-Guess-Paper-2-2023
No ratings yet
NaikerMaths A-level-Best-Guess-Paper-2-2023
10 pages
PDE-LEARN - Using Deep Learning To Discover PDE From Noisy, Limited Data
No ratings yet
PDE-LEARN - Using Deep Learning To Discover PDE From Noisy, Limited Data
25 pages
pde 微分方程与神经网络
No ratings yet
pde 微分方程与神经网络
16 pages
Can Physics Informaed Neural Networks Beat Finite Element Method
No ratings yet
Can Physics Informaed Neural Networks Beat Finite Element Method
27 pages
2023-24 Xii Ut2
No ratings yet
2023-24 Xii Ut2
3 pages
New Bridges Between Deep Learning and Partial Differential Equations
No ratings yet
New Bridges Between Deep Learning and Partial Differential Equations
5 pages
2403.12938neural Differential Algebraic Equations
No ratings yet
2403.12938neural Differential Algebraic Equations
8 pages
PINN
100% (1)
PINN
22 pages
008 Parameterized Neural Ordinary Differential Equations: Applications To Computational Physics Problems
No ratings yet
008 Parameterized Neural Ordinary Differential Equations: Applications To Computational Physics Problems
19 pages
Neural Operator Graph Kernel Network For Partial Differential Equations
No ratings yet
Neural Operator Graph Kernel Network For Partial Differential Equations
21 pages
PDE1
No ratings yet
PDE1
29 pages
Journal of Computational Physics: M. Raissi, P. Perdikaris, G.E. Karniadakis
No ratings yet
Journal of Computational Physics: M. Raissi, P. Perdikaris, G.E. Karniadakis
22 pages
High Precision Differentiation Techniques For Data-Driven Solution of Nonlinear Pdes by Physics-Informed Neural Networks
No ratings yet
High Precision Differentiation Techniques For Data-Driven Solution of Nonlinear Pdes by Physics-Informed Neural Networks
28 pages
PDE2
No ratings yet
PDE2
20 pages
Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty qu-已压缩
No ratings yet
Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty qu-已压缩
26 pages
Enhancing Trajectory Prediction in Complex Dynamical Systems With Neural Ordinary Differential Equations
No ratings yet
Enhancing Trajectory Prediction in Complex Dynamical Systems With Neural Ordinary Differential Equations
12 pages
Raissi - PIDL Part 2
No ratings yet
Raissi - PIDL Part 2
19 pages
Solving Flows of Dynamical Systems by Deep Neural Networks and A Novel Deep Learning Algorithm
No ratings yet
Solving Flows of Dynamical Systems by Deep Neural Networks and A Novel Deep Learning Algorithm
12 pages
Continuity - Differentiability - JEE Mains PYQ 2020-2022
No ratings yet
Continuity - Differentiability - JEE Mains PYQ 2020-2022
123 pages
1812 11285v4
No ratings yet
1812 11285v4
46 pages
Deep XDE
No ratings yet
Deep XDE
21 pages
Gelbrecht Et Al. - 2021 - Neural Partial Differential Equations For Chaotic
No ratings yet
Gelbrecht Et Al. - 2021 - Neural Partial Differential Equations For Chaotic
11 pages
DeepXDE A Deep Learning Library For Solving Differ
No ratings yet
DeepXDE A Deep Learning Library For Solving Differ
17 pages
MMMMPR
No ratings yet
MMMMPR
25 pages
PDE-Net - Learning PDEs From Data
No ratings yet
PDE-Net - Learning PDEs From Data
9 pages
A Neural Network Based PDE Solving Algorithm With High Precision
No ratings yet
A Neural Network Based PDE Solving Algorithm With High Precision
12 pages
An Analysis of Universal Differential Equations For
No ratings yet
An Analysis of Universal Differential Equations For
10 pages
Koopman Neural Operator As A Mesh-Free Solver of Non-Linear Partial Differential Equations
No ratings yet
Koopman Neural Operator As A Mesh-Free Solver of Non-Linear Partial Differential Equations
18 pages
Sciadv Abi8605
No ratings yet
Sciadv Abi8605
10 pages
Deepxde: A Deep Learning Library For Solving Differential Equations
No ratings yet
Deepxde: A Deep Learning Library For Solving Differential Equations
21 pages
Physics-Informed Neural Networks
No ratings yet
Physics-Informed Neural Networks
22 pages
NeurIPS2024论文
No ratings yet
NeurIPS2024论文
23 pages
Pde Homework Solutions
100% (1)
Pde Homework Solutions
5 pages
TISCO Apprentice Exam Pattern
No ratings yet
TISCO Apprentice Exam Pattern
3 pages
Connections Between Deep Learning and Partial Differential Equations
No ratings yet
Connections Between Deep Learning and Partial Differential Equations
2 pages
Three Ways To Solve Partial Differential Equations With Neural
No ratings yet
Three Ways To Solve Partial Differential Equations With Neural
32 pages
Approximation of Solution Operators For High-Dimensional Pdes
No ratings yet
Approximation of Solution Operators For High-Dimensional Pdes
15 pages
Neural Networks For Bifurcation and Linear Stability Analysis of Steady States in Partial Differential Equations
No ratings yet
Neural Networks For Bifurcation and Linear Stability Analysis of Steady States in Partial Differential Equations
34 pages
BBBBBBBBB
No ratings yet
BBBBBBBBB
2 pages
Ai Lorenz Pinn
No ratings yet
Ai Lorenz Pinn
28 pages
Xu, Zhang, & Zeng (DLGA-PDE 2021)
No ratings yet
Xu, Zhang, & Zeng (DLGA-PDE 2021)
18 pages
Approximation of Solution Operators for High-dimensional PDEs部分1
No ratings yet
Approximation of Solution Operators for High-dimensional PDEs部分1
2 pages
Sun Et Al (SGA)
No ratings yet
Sun Et Al (SGA)
15 pages
Xu, Chang, & Zhang (DLGA-PDE 2020)
No ratings yet
Xu, Chang, & Zhang (DLGA-PDE 2020)
14 pages
PDENet
No ratings yet
PDENet
9 pages
Sensors 23 02792 v2
No ratings yet
Sensors 23 02792 v2
20 pages
Two-Dimensional Klein-Gordon and Sine-Gordon Numerical Solutions Based On Deep Neural Network
No ratings yet
Two-Dimensional Klein-Gordon and Sine-Gordon Numerical Solutions Based On Deep Neural Network
13 pages
Neural Differential Equations: A Comprehensive Review and Applications
No ratings yet
Neural Differential Equations: A Comprehensive Review and Applications
14 pages
Strong Form and Weak Form of Fea
No ratings yet
Strong Form and Weak Form of Fea
2 pages
The Role of Differential Equations in Mathematical Modeling and The Use of Neural Networks To Solve Differential Equations
No ratings yet
The Role of Differential Equations in Mathematical Modeling and The Use of Neural Networks To Solve Differential Equations
2 pages
2022 Predicting Parametric Spatiotemporal Dynamics by Multi-Resolution PDE Structure-Preserved Deep Learning
No ratings yet
2022 Predicting Parametric Spatiotemporal Dynamics by Multi-Resolution PDE Structure-Preserved Deep Learning
51 pages
Geometry-Dependent Solution Operator
No ratings yet
Geometry-Dependent Solution Operator
17 pages
NeurIPS 2022 Physics Embedded Neural Networks Graph Neural Pde Solvers With Mixed Boundary Conditions Paper Conference
No ratings yet
NeurIPS 2022 Physics Embedded Neural Networks Graph Neural Pde Solvers With Mixed Boundary Conditions Paper Conference
12 pages
Hybrid Neural Networks: Fundamentals and Applications for Interacting Biological Neural Networks with Artificial Neuronal Models
From Everand
Hybrid Neural Networks: Fundamentals and Applications for Interacting Biological Neural Networks with Artificial Neuronal Models
Fouad Sabry
No ratings yet
Computational Physics: Basic Concepts
From Everand
Computational Physics: Basic Concepts
Devang Patil
No ratings yet
Neural Networks and Fuzzy Logic
From Everand
Neural Networks and Fuzzy Logic
C. Naga Bhaskar
No ratings yet

Journal of Computational Physics: Zichao Long, Yiping Lu, Bin Dong

Uploaded by

Journal of Computational Physics: Zichao Long, Yiping Lu, Bin Dong

Uploaded by

Journal of Computational Physics 399 (2019) 108925

Contents lists available at ScienceDirect

Journal of Computational Physics

PDE-Net 2.0: Learning PDEs from data with a

1.1. Existing work and motivations

1.2. Our approach

1.3. Relation with model reduction

2. PDE-Net 2.0: architecture, constraints and training

Given a series of measurements of some physical quantities {U (t , x, y ) : t = t 0 , t 1 , · · · , (x, y ) ∈ ⊂ R2 } ⊂ Rd with

here U (t , ·) : → Rd , F (U , U x , U y , U xx , U xy , U y y , . . .) ∈ Rd , (x, y ) ∈ ⊂ R2 , t ∈ [0, T ]. Our objective is to design a feed-

2.1. Architecture of PDE-Net 2.0

Fig. 1. The schematic diagram of a δt-block.

Fig. 2. The schematic diagram of the PDE-Net 2.0.

2.1.2. Multiple δt-blocks:

2.2. Convolutions and differentiations

lβ1 mβ2 q[l, m] = 0.

M (q) = (mi , j ) N × N , (6)

Fig. 3. The schematic diagram of S ymNet.

2.3. Design of S ymNet: a symbolic neural network

P = xm+1 A + Q ∈ P k [x1 , · · · , xm , xm+1 ].

Proposition 2.3. Let P ∈ P k [x1 , · · · , xm ] and suppose P have monomials of degree ≤ l.

• The memory load of S ymNetm k

2.4. Loss function and regularization

L = L data + λ1 L moment + λ2 L S ymNet ,

2.4.1. Data approximation L data

where Ũ j (t i , ·) is the output of the PDE-Net 2.0 with Ũ j (t 0 , ·) = U j (t 0 , ·) as the input.

2.4.2. Regularization: L moment and L S ymNet

Then, we deﬁne L moment as

We set s = 0.001 in our numerical experiments.

Algorithm 2 Pseudo-upwind in δt-block.

2.5. Initialization and training

Correct PDE ut = −uu x − vu y + 0.05(u xx + u y y )

Frozen-PDE-Net 2.0 ut = −0.906uu x − 0.901vu y + 0.033u xx + 0.037u y y

PDE-Net 2.0 ut = −0.979uu x − 0.973u y v + 0.052u xx + 0.051u y y

3. Numerical studies: Burgers’ equation

3.1. Simulated data, training and testing

with (t , x, y ) ∈ [0, 4] × , where ν = 0.05.

where M = maxx, y ,t {U (t , x, y )}, W ∼ N (0, 1).

where { D i j : 0 ≤ i + j ≤ 2} are convolution operators.

3.2. Results and discussions

3.3. Importance of L S ymNet and pseudo-upwind

Correct PDE ut = 0.1(u xx + u y y )

4. Numerical studies: diffusion equation

4.1. Simulated data, training and testing

4.2. Results and discussions

5. Numerical studies: convection diffusion equation with a reactive source

5.1. Simulated data, training and testing

5.2. Results and discussions

Correct PDE ut = −uu x − vu y + 0.1u + ( v − u )(u 2 + v 2 ) + u

Frozen-PDE-Net 2.0 ut = −0.86uu x − 0.90vu y + 0.09u xx + 0.09u y y

PDE-Net 2.0 ut = −0.98vu y − 0.93uu x + 0.10u xx + 0.10u y y

6. Conclusions and future work

Declaration of competing interest

You might also like

here U (t , ·) : → Rd , F (U , U x , U y , U xx , U xy , U y y , . . .) ∈ Rd , (x, y ) ∈ ⊂ R2 , t ∈ [0, T ]. Our objective is to design a feed-

Correct PDE ut = −uu x − vu y + 0.1u + ( v − u )(u 2 + v 2 ) + u