PINN
Abstract
We introduce physics informed neural networks – neural networks that
are trained to solve supervised learning tasks while respecting any given law
of physics described by general nonlinear partial differential equations. In
this two part treatise, we present our developments in the context of solving
two main classes of problems: data-driven solution and data-driven discovery
of partial differential equations. Depending on the nature and arrangement
of the available data, we devise two distinct classes of algorithms, namely
continuous time and discrete time models. The resulting neural networks
form a new class of data-efficient universal function approximators that nat-
urally encode any underlying physical laws as prior information. In this first
part, we demonstrate how these networks can be used to infer solutions to
partial differential equations, and obtain physics-informed surrogate models
that are fully differentiable with respect to all input coordinates and free
parameters.
Keywords: Data-driven scientific computing, Machine learning, Predictive
modeling, Runge-Kutta methods, Nonlinear dynamics
1. Introduction
With the explosive growth of available data and computing resources, re-
cent advances in machine learning and data analytics have yielded transfor-
mative results across diverse scientific disciplines, including image recognition
[1], natural language processing [2], cognitive science [3], and genomics [4].
Secondly, the Bayesian nature of Gaussian process regression requires certain
prior assumptions that may limit the representation capacity of the model
and give rise to robustness/brittleness issues, especially for nonlinear prob-
lems [11].
The general aim of this work is to set the foundations for a new paradigm
in modeling and computation that enriches deep learning with the long-
standing developments in mathematical physics. These developments are
presented in the context of two main problem classes: data-driven solution
and data-driven discovery of partial differential equations. To this end, let
us consider parametrized and nonlinear partial differential equations of the
general form
$$u_t + \mathcal{N}[u; \lambda] = 0,$$
where u(t, x) denotes the latent (hidden) solution and N [·; λ] is a nonlinear
operator parametrized by λ. This setup encapsulates a wide range of prob-
lems in mathematical physics including conservation laws, diffusion processes,
advection-diffusion-reaction systems, and kinetic equations. As a motivating
example, the one-dimensional Burgers' equation [14] corresponds to the case where $\mathcal{N}[u; \lambda] = \lambda_1 u u_x - \lambda_2 u_{xx}$ and $\lambda = (\lambda_1, \lambda_2)$. Here, the subscripts denote
partial differentiation in either time or space. Given noisy measurements of
the system, we are interested in the solution of two distinct problems. The
first problem is that of predictive inference, filtering and smoothing, or data
driven solutions of partial differential equations [9, 5] which states: given fixed model parameters λ, what can be said about the unknown hidden state
u(t, x) of the system? The second problem is that of learning, system identi-
fication, or data-driven discovery of partial differential equations [10, 6, 15]
stating: what are the parameters λ that best describe the observed data?
In the general setting, we consider equations of the form $u_t + \mathcal{N}[u] = 0$ on a spatio-temporal domain $x \in \Omega$, $t \in [0, T]$, where $u(t, x)$ denotes the latent (hidden) solution, $\mathcal{N}[\cdot]$ is a nonlinear differential operator, and $\Omega$ is a subset of $\mathbb{R}^D$. In what follows, we put forth two
distinct classes of algorithms, namely continuous and discrete time models,
and highlight their properties and performance through the lens of different
benchmark problems. All code and data-sets accompanying this manuscript
are available at https://github.com/maziarraissi/PINNs.
For small values of the viscosity parameter, Burgers' equation can lead to shock formation that is notoriously hard to resolve by classical numerical methods. In one space dimension, Burgers' equation along with Dirichlet boundary conditions reads as
$$u_t + u u_x - (0.01/\pi)\, u_{xx} = 0, \quad x \in [-1, 1], \; t \in [0, 1],$$
$$u(0, x) = -\sin(\pi x), \qquad u(t, -1) = u(t, 1) = 0, \tag{3}$$
and we define $f(t, x) := u_t + u u_x - (0.01/\pi)\, u_{xx}$, approximating $u(t, x)$ by a deep neural network.
Correspondingly, the physics informed neural network f (t, x) takes the form
import numpy as np
import tensorflow as tf  # TensorFlow 1.x-style graph mode

def f(t, x):
    # u_net is the deep neural network approximating u(t, x); the name is assumed here
    u = u_net(t, x)
    u_t = tf.gradients(u, t)[0]      # du/dt via automatic differentiation
    u_x = tf.gradients(u, x)[0]      # du/dx
    u_xx = tf.gradients(u_x, x)[0]   # d2u/dx2
    f = u_t + u*u_x - (0.01/np.pi)*u_xx   # residual of Burgers' equation (3)
    return f
The shared parameters between the neural networks u(t, x) and f (t, x) can
be learned by minimizing the mean squared error loss
$$MSE = MSE_u + MSE_f, \tag{4}$$
where
$$MSE_u = \frac{1}{N_u} \sum_{i=1}^{N_u} \left| u(t_u^i, x_u^i) - u^i \right|^2,$$
and
$$MSE_f = \frac{1}{N_f} \sum_{i=1}^{N_f} \left| f(t_f^i, x_f^i) \right|^2.$$
Here, $\{t_u^i, x_u^i, u^i\}_{i=1}^{N_u}$ denote the initial and boundary training data on $u(t, x)$ and $\{t_f^i, x_f^i\}_{i=1}^{N_f}$ specify the collocation points for $f(t, x)$. The loss $MSE_u$ corresponds to the initial and boundary data, while $MSE_f$ enforces the structure imposed by equation (3) at a finite set of collocation points.
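For illustration, the two terms of equation (4) can be assembled as follows. This is a minimal sketch in the style of the snippet above; the tensors t_u, x_u, u_data (holding the Nu training points and measurements) and t_f, x_f (the Nf collocation points), as well as the name u_net, are assumptions made for the purpose of this sketch rather than part of the accompanying code.

u_pred = u_net(t_u, x_u)    # network prediction at the Nu initial/boundary points
f_pred = f(t_f, x_f)        # PDE residual at the Nf collocation points

mse_u = tf.reduce_mean(tf.square(u_pred - u_data))   # data mismatch, MSE_u
mse_f = tf.reduce_mean(tf.square(f_pred))            # equation residual, MSE_f
loss = mse_u + mse_f                                 # total loss of equation (4)

Minimizing this loss with any stochastic or quasi-Newton optimizer then fits the network to the data and the differential equation simultaneously.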
Figure 1 summarizes our results for the data-driven solution of the Burgers' equation. Specifically, given a set of Nu = 100 randomly distributed
initial and boundary data, we learn the latent solution u(t, x) by training all
3021 parameters of a 9-layer deep neural network using the mean squared
error loss of (4). Each hidden layer contained 20 neurons and a hyperbolic
tangent activation function. In general, the neural network should be given
sufficient approximation capacity in order to accommodate the anticipated
complexity of u(t, x). However, in this example, our choice aims to highlight
the robustness of the proposed method with respect to the well known issue
of over-fitting. Specifically, the term MSE_f in equation (4) acts as a regularization mechanism that penalizes solutions that do not satisfy equation
(3). Therefore, a key property of physics informed neural networks is that
they can be effectively trained using small data sets; a setting often encoun-
tered in the study of physical systems for which the cost of data acquisition
may be prohibitive.
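For concreteness, a network of the size used here can be sketched as below. This is a minimal Keras-style illustration, under the assumption that the 9 layers comprise 8 hidden layers of 20 tanh neurons plus a linear output layer, a configuration that indeed carries 3021 trainable parameters for the (t, x) to u mapping; it is not the exact implementation accompanying the paper.

import tensorflow as tf

# Fully connected surrogate for u(t, x): 8 hidden layers of 20 tanh units, linear output.
def build_u_model(num_hidden_layers=8, width=20):
    layers = [tf.keras.layers.Dense(width, activation="tanh", input_shape=(2,))]
    layers += [tf.keras.layers.Dense(width, activation="tanh")
               for _ in range(num_hidden_layers - 1)]
    layers += [tf.keras.layers.Dense(1)]   # linear output layer producing u
    return tf.keras.Sequential(layers)

model = build_u_model()
u_net = lambda t, x: model(tf.concat([t, x], axis=1))   # matches the u_net(t, x) calls above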
7
[Figure 1 graphic: the top panel shows the predicted u(t, x) over the (t, x) domain together with the 100 training points; the bottom panels compare the exact and predicted solutions at t = 0.25, 0.50, and 0.75.]
Figure 1: Burgers’ equation: Top: Predicted solution u(t, x) along with the initial and
boundary training data. In addition we are using 10,000 collocation points generated using
a Latin Hypercube Sampling strategy. Bottom: Comparison of the predicted and exact
solutions corresponding to the three temporal snapshots depicted by the white vertical
lines in the top panel. The relative L2 error for this case is 6.7 · 10⁻⁴. Model training took
approximately 60 seconds on a single NVIDIA Titan X GPU card.
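The Latin Hypercube sample of collocation points mentioned in the caption can be generated with standard tooling; one possible sketch uses SciPy's quasi-Monte Carlo module, although the sampler actually used in the accompanying code may differ.

import numpy as np
from scipy.stats import qmc

# Latin Hypercube sample of Nf = 10,000 collocation points over (t, x) in [0, 1] x [-1, 1].
sampler = qmc.LatinHypercube(d=2, seed=0)
points = qmc.scale(sampler.random(n=10_000), l_bounds=[0.0, -1.0], u_bounds=[1.0, 1.0])
t_f, x_f = points[:, 0:1], points[:, 1:2]   # column vectors of collocation coordinates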
By encoding the structure of the underlying physical law through the collocation points Nf, one can obtain a more accurate and data-efficient learning algorithm.¹ Finally, Table 2 shows the resulting relative L2 error for different numbers of hidden layers and different numbers of neurons per layer, while the total number of training and collocation points is kept fixed to Nu = 100 and Nf = 10,000, respectively. As expected, we observe that as the number of layers and neurons is increased (hence the capacity of the neural network to approximate more complex functions), the predictive accuracy improves.
¹ Note that the case Nf = 0 corresponds to a standard neural network model, i.e., a neural network that does not take into account the underlying governing equation.
Nu \ Nf     2000      4000      6000      7000      8000      10000
  20      2.9e-01   4.4e-01   8.9e-01   1.2e+00   9.9e-02   4.2e-02
  40      6.5e-02   1.1e-02   5.0e-01   9.6e-03   4.6e-01   7.5e-02
  60      3.6e-01   1.2e-02   1.7e-01   5.9e-03   1.9e-03   8.2e-03
  80      5.5e-03   1.0e-03   3.2e-03   7.8e-03   4.9e-02   4.5e-03
 100      6.6e-02   2.7e-01   7.2e-03   6.8e-04   2.2e-03   6.7e-04
 200      1.5e-01   2.3e-03   8.2e-04   8.9e-04   6.1e-04   4.9e-04
Table 1: Burgers' equation: Relative L2 error between the predicted and the exact solution u(t, x) for different numbers of initial and boundary training data Nu and different numbers of collocation points Nf. Here, the network architecture is fixed to 9 layers with 20 neurons per hidden layer.
Layers \ Neurons      10        20        40
   2               7.4e-02   5.3e-02   1.0e-01
   4               3.0e-03   9.4e-04   6.4e-04
   6               9.6e-03   1.3e-03   6.1e-04
   8               2.5e-03   9.6e-04   5.6e-04
Table 2: Burgers' equation: Relative L2 error between the predicted and the exact solution u(t, x) for different numbers of hidden layers and different numbers of neurons per layer. Here, the total number of training and collocation points is fixed to Nu = 100 and Nf = 10,000, respectively.
with periodic boundary conditions is given by
$$i h_t + 0.5\, h_{xx} + |h|^2 h = 0, \quad x \in [-5, 5], \; t \in [0, \pi/2]. \tag{5}$$
The shared parameters of the networks representing h(t, x) and the residual f(t, x) can be learned by minimizing the mean squared error loss MSE = MSE_0 + MSE_b + MSE_f, where
$$MSE_0 = \frac{1}{N_0} \sum_{i=1}^{N_0} \left| h(0, x_0^i) - h_0^i \right|^2,$$
$$MSE_b = \frac{1}{N_b} \sum_{i=1}^{N_b} \left( \left| h^i(t_b^i, -5) - h^i(t_b^i, 5) \right|^2 + \left| h_x^i(t_b^i, -5) - h_x^i(t_b^i, 5) \right|^2 \right),$$
and
$$MSE_f = \frac{1}{N_f} \sum_{i=1}^{N_f} \left| f(t_f^i, x_f^i) \right|^2.$$
Here, $\{x_0^i, h_0^i\}_{i=1}^{N_0}$ denotes the initial data, $\{t_b^i\}_{i=1}^{N_b}$ corresponds to the collocation points on the boundary, and $\{t_f^i, x_f^i\}_{i=1}^{N_f}$ represents the collocation points on $f(t, x)$. Consequently, MSE_0 corresponds to the loss on the initial data, MSE_b enforces the periodic boundary conditions, and MSE_f penalizes the Schrödinger equation not being satisfied on the collocation points.
To obtain a high-resolution test data set, we simulate equation (5) using conventional spectral methods. Specifically, starting from an initial state h(0, x) = 2 sech(x) and assuming periodic boundary conditions h(t, −5) = h(t, 5) and h_x(t, −5) = h_x(t, 5), we have integrated equation (5) up to a final time t = π/2 using the Chebfun package [22] with a spectral Fourier discretization with 256 modes and a fourth-order explicit Runge-Kutta temporal integrator with time-step ∆t = (π/2) · 10⁻⁶. Under our data-driven setting, all we observe are measurements $\{x_0^i, h_0^i\}_{i=1}^{N_0}$ of the latent function h(t, x) at time t = 0. In particular, the training set consists of these initial measurements together with the boundary and interior collocation points introduced above.
Here our goal is to infer the entire spatio-temporal solution h(t, x) of the
Schrödinger equation (5). We chose to jointly represent the latent func-
tion h(t, x) = [u(t, x) v(t, x)] using a 5-layer deep neural network with
100 neurons per layer and a hyperbolic tangent activation function. Fig-
ure 2 summarizes the results of our experiment. Specifically, the top panel of Figure 2 shows the magnitude of the predicted spatio-temporal solution $|h(t, x)| = \sqrt{u^2(t, x) + v^2(t, x)}$, along with the locations of the initial and boundary training data. The resulting prediction error is validated against the test data for this problem, and is measured at 1.97 · 10⁻³ in the relative L2-norm. A more detailed assessment of the predicted solution is pre-
sented in the bottom panel of Figure 2. In particular, we present a compar-
ison between the exact and the predicted solutions at different time instants
t = 0.59, 0.79, 0.98. Using only a handful of initial data, the physics informed
neural network can accurately capture the intricate nonlinear behavior of the
Schrödinger equation.
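Because h(t, x) is complex-valued, the network outputs its real and imaginary parts u and v, and the residual of equation (5) splits into two real components. A minimal sketch of this split, in the style of the earlier snippets and assuming the cubic nonlinear form quoted in equation (5) as well as a two-output network named h_net, reads:

def f_uv(t, x):
    u, v = h_net(t, x)                     # real and imaginary parts of h(t, x); name assumed
    u_t, v_t = tf.gradients(u, t)[0], tf.gradients(v, t)[0]
    u_x, v_x = tf.gradients(u, x)[0], tf.gradients(v, x)[0]
    u_xx, v_xx = tf.gradients(u_x, x)[0], tf.gradients(v_x, x)[0]
    # i h_t + 0.5 h_xx + |h|^2 h = 0 with h = u + i v yields two real residuals:
    f_u = u_t + 0.5*v_xx + (u**2 + v**2)*v   # imaginary part of the residual
    f_v = v_t - 0.5*u_xx - (u**2 + v**2)*u   # negative of the real part of the residual
    return f_u, f_v

The term MSE_f then averages |f_u|² + |f_v|², which equals |f|², over the collocation points.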
11
[Figure 2 graphic: the top panel shows the predicted |h(t, x)| over the (t, x) domain together with the 150 training points; the bottom panels compare the exact and predicted |h(t, x)| at three temporal snapshots.]
Figure 2: Schrödinger equation: Top: Predicted solution |h(t, x)| along with the initial and boundary training data. In addition we are using 20,000 collocation points generated using a Latin Hypercube Sampling strategy. Bottom: Comparison of the predicted and exact solutions corresponding to the three temporal snapshots depicted by the dashed vertical lines in the top panel. The relative L2 error for this case is 1.97 · 10⁻³.
Applying the general form of Runge-Kutta methods with q stages to the governing equation yields
$$u^{n+c_i} = u^n - \Delta t \sum_{j=1}^{q} a_{ij}\, \mathcal{N}[u^{n+c_j}], \quad i = 1, \ldots, q,$$
$$u^{n+1} = u^n - \Delta t \sum_{j=1}^{q} b_j\, \mathcal{N}[u^{n+c_j}]. \tag{7}$$
Here, $u^{n+c_j}(x) = u(t^n + c_j \Delta t, x)$ for $j = 1, \ldots, q$. This general form encapsulates both implicit and explicit time-stepping schemes, depending on the choice of the parameters $\{a_{ij}, b_j, c_j\}$. Equations (7) can be equivalently expressed as
$$u^n = u_i^n, \quad i = 1, \ldots, q, \qquad u^n = u_{q+1}^n, \tag{8}$$
where
$$u_i^n := u^{n+c_i} + \Delta t \sum_{j=1}^{q} a_{ij}\, \mathcal{N}[u^{n+c_j}], \quad i = 1, \ldots, q,$$
$$u_{q+1}^n := u^{n+1} + \Delta t \sum_{j=1}^{q} b_j\, \mathcal{N}[u^{n+c_j}]. \tag{9}$$
We proceed by placing a multi-output neural network prior on
$$\left[ u^{n+c_1}(x), \ldots, u^{n+c_q}(x), u^{n+1}(x) \right]. \tag{10}$$
This prior assumption, along with equations (9), results in a physics informed neural network that takes x as an input and outputs
$$\left[ u_1^n(x), \ldots, u_q^n(x), u_{q+1}^n(x) \right], \tag{11}$$
and the shared parameters of the neural networks (10) and (11) can be learned by minimizing the sum of squared errors
$$SSE = SSE_n + SSE_b,$$
where
$$SSE_n = \sum_{j=1}^{q+1} \sum_{i=1}^{N_n} \left| u_j^n(x^{n,i}) - u^{n,i} \right|^2,$$
and
$$SSE_b = \sum_{i=1}^{q} \left( |u^{n+c_i}(-1)|^2 + |u^{n+c_i}(1)|^2 \right) + |u^{n+1}(-1)|^2 + |u^{n+1}(1)|^2.$$
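To make the map from the network outputs (10) to the physics-informed outputs (11) concrete, a small sketch for the Burgers example is given below. The names multi_output_net, x, dt, u_data, and the arrays A (the q × q matrix of coefficients a_ij) and b are assumptions made for this sketch rather than the authors' exact implementation; the viscosity is taken to be the 0.01/π of equation (3).

import numpy as np
import tensorflow as tf

# x: (N, 1) training locations; A: (q, q) and b: (q,) implicit Runge-Kutta coefficients
# as float32 tensors; dt: scalar time-step; u_data: (N, 1) measurements at time t^n.
nu = 0.01 / np.pi                                    # viscosity of Burgers' equation (3)
U = multi_output_net(x)                              # (N, q+1): stages u^{n+c_j}(x), then u^{n+1}(x)
cols = tf.split(U, q + 1, axis=1)                    # one (N, 1) column per output
cols_x = [tf.gradients(c, x)[0] for c in cols[:q]]   # spatial derivatives of the q stages
cols_xx = [tf.gradients(cx, x)[0] for cx in cols_x]
N_stages = tf.concat([c*cx - nu*cxx                  # N[u^{n+c_j}] = u u_x - nu u_xx
                      for c, cx, cxx in zip(cols[:q], cols_x, cols_xx)], axis=1)
U_i = U[:, :q] + dt * tf.matmul(N_stages, A, transpose_b=True)     # u_i^n of equation (9)
U_q1 = U[:, q:] + dt * tf.matmul(N_stages, tf.reshape(b, (q, 1)))  # u_{q+1}^n of equation (9)
U_pred = tf.concat([U_i, U_q1], axis=1)              # the q+1 outputs of equation (11)
sse_n = tf.reduce_sum(tf.square(U_pred - u_data))    # SSE_n, broadcasting u_data over columns

The loop over columns is used purely for clarity; the derivatives of all stages could also be obtained with more economical forward-mode tricks.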
This scheme now allows us to infer the latent solution u(t, x) in a sequential fashion. Starting from initial data $\{x^{n,i}, u^{n,i}\}_{i=1}^{N_n}$ at time $t^n$ and data at the domain boundaries x = −1 and x = 1, we can use the above sum of squared errors loss to predict the solution at time $t^{n+1}$.
² To be precise, it is only the number of parameters in the last layer of the neural network that increases linearly with the total number of stages.
[Figure 3 graphic: the top panel shows u(t, x) over the (t, x) domain; the bottom panels compare the training data at t = 0.10 and the prediction at t = 0.90 with the exact solution.]
Figure 3: Burgers' equation: Top: Solution u(t, x) along with the location of the initial training snapshot at t = 0.1 and the final prediction snapshot at t = 0.9. Bottom: Initial training data and final prediction at the snapshots depicted by the white vertical lines in the top panel. The relative L2 error for this case is 8.2 · 10⁻⁴.
To the best of our knowledge, this is the first time that an implicit Runge-Kutta scheme of such high order has ever been used. Remarkably, starting from smooth
initial data at t = 0.1 we can predict the nearly discontinuous solution at
t = 0.9 in a single time-step with a relative L2 error of 8.2 · 10⁻⁴. This error is two orders of magnitude lower than the one reported in [9], and it is entirely
attributed to the neural network’s capacity to approximate u(t, x), as well as
to the degree that the sum of squared errors loss allows interpolation of the
training data. The network architecture used here consists of 4 layers with
50 neurons in each hidden layer.
Layers \ Neurons      10        25        50
   1               4.1e-02   4.1e-02   1.5e-01
   2               2.7e-03   5.0e-03   2.4e-03
   3               3.6e-03   1.9e-03   9.5e-04
Table 3: Burgers' equation: Relative final prediction error measured in the L2 norm for different numbers of hidden layers and neurons in each layer. Here, the number of Runge-Kutta stages is fixed to 500 and the time-step size to ∆t = 0.8.
The key parameters controlling the performance of our discrete time al-
gorithm are the total number of Runge-Kutta stages q and the time-step size
∆t. In table 4 we summarize the results of an extensive systematic study
where we fix the network architecture to 4 hidden layers with 50 neurons
per layer, and vary the number of Runge-Kutta stages q and the time-step
size ∆t. Specifically, we see how cases with low numbers of stages fail to
yield accurate results when the time-step size is large. For instance, the case
q = 1 corresponding to the classical trapezoidal rule, and the case q = 2
corresponding to the 4th-order Gauss-Legendre method, cannot retain their
predictive accuracy for time-steps larger than 0.2, thus mandating a solu-
tion strategy with multiple time-steps of small size. On the other hand, the
ability to push the number of Runge-Kutta stages to 32 and even higher
allows us to take very large time steps, and effectively resolve the solution
in a single step without sacrificing the accuracy of our predictions. More-
over, numerical stability is not sacrificed either, as implicit Runge-Kutta methods are the only family of time-stepping schemes that remain A-stable regardless of their order, thus making them ideal for stiff problems [24]. These properties
are unprecedented for an algorithm of such implementation simplicity, and
illustrate one of the key highlights of our discrete time approach.
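As a rough illustration of why large q permits a single large step, recall that a q-stage Gauss-Legendre scheme has formal order 2q, so the scale of the temporal discretization error, ∆t^{2q}, collapses very quickly with q even for ∆t = 0.8. The short loop below simply prints this scale for the stage counts of Table 4:

# Temporal error scale (dt)^(2q) of a q-stage (order 2q) Gauss-Legendre step of size dt = 0.8;
# the neural network approximation error is not included in this estimate.
dt = 0.8
for q in (1, 2, 4, 8, 16, 32, 64, 100, 500):
    print(q, dt**(2*q))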
q \ ∆t       0.2       0.4       0.6       0.8
   1      3.5e-02   1.1e-01   2.3e-01   3.8e-01
   2      5.4e-03   5.1e-02   9.3e-02   2.2e-01
   4      1.2e-03   1.5e-02   3.6e-02   5.4e-02
   8      6.7e-04   1.8e-03   8.7e-03   5.8e-02
  16      5.1e-04   7.6e-02   8.4e-04   1.1e-03
  32      7.4e-04   5.2e-04   4.2e-04   7.0e-04
  64      4.5e-04   4.8e-04   1.2e-03   7.8e-04
 100      5.1e-04   5.7e-04   1.8e-02   1.2e-03
 500      4.1e-04   3.8e-04   4.2e-04   8.2e-04
Table 4: Burgers' equation: Relative final prediction error measured in the L2 norm for different numbers of Runge-Kutta stages q and time-step sizes ∆t. Here, the network architecture is fixed to 4 hidden layers with 50 neurons in each layer.
The shared parameters of the neural networks (10) and (11) can again be learned by minimizing the sum of squared errors
$$SSE = SSE_n + SSE_b, \tag{14}$$
where
$$SSE_n = \sum_{j=1}^{q+1} \sum_{i=1}^{N_n} \left| u_j^n(x^{n,i}) - u^{n,i} \right|^2,$$
and
$$SSE_b = \sum_{i=1}^{q} |u^{n+c_i}(-1) - u^{n+c_i}(1)|^2 + |u^{n+1}(-1) - u^{n+1}(1)|^2 + \sum_{i=1}^{q} |u_x^{n+c_i}(-1) - u_x^{n+c_i}(1)|^2 + |u_x^{n+1}(-1) - u_x^{n+1}(1)|^2.$$
We generate a training and test data set by simulating the Allen-Cahn equation (13) using conventional spectral methods. Specifically, starting from an initial condition u(0, x) = x² cos(πx) and assuming periodic boundary conditions u(t, −1) = u(t, 1) and u_x(t, −1) = u_x(t, 1), we have integrated equation (13) up to a final time t = 1.0 using the Chebfun package [22] with a spectral Fourier discretization with 512 modes and a fourth-order explicit Runge-Kutta temporal integrator with time-step ∆t = 10⁻⁵.
In this example, we assume Nn = 200 initial data points that are ran-
domly sub-sampled from the exact solution at time t = 0.1, and our goal
is to predict the solution at time t = 0.9 using a single time-step with size
∆t = 0.8. To this end, we employ a discrete time physics informed neural
network with 4 hidden layers and 200 neurons per layer, while the output
layer predicts 101 quantities of interest corresponding to the q = 100 Runge-Kutta stages $u^{n+c_i}(x)$, i = 1, . . . , q, and the solution at final time $u^{n+1}(x)$.
Figure 4 summarizes our predictions after the network has been trained using
the loss function of equation (14). Evidently, despite the complex dynamics
leading to a solution with two sharp internal layers, we are able to obtain an
accurate prediction of the solution at t = 0.9 using only a small number of
scattered measurements at t = 0.1.
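A minimal sketch of such a multi-output network, in the same Keras style as before and treating the layer sizes exactly as stated in the text (the helper names and the training tensor x_train are assumptions), is:

import tensorflow as tf

q = 100   # number of implicit Runge-Kutta stages

# Input x, 4 hidden layers of 200 tanh units, and q + 1 outputs:
# the q stage values u^{n+c_i}(x) and the solution u^{n+1}(x).
def build_discrete_net(q, num_hidden_layers=4, width=200):
    layers = [tf.keras.layers.Dense(width, activation="tanh", input_shape=(1,))]
    layers += [tf.keras.layers.Dense(width, activation="tanh")
               for _ in range(num_hidden_layers - 1)]
    layers += [tf.keras.layers.Dense(q + 1)]
    return tf.keras.Sequential(layers)

model = build_discrete_net(q)
U = model(x_train)      # (Nn, q+1) outputs at the Nn = 200 measurement locations
u_next = U[:, -1:]      # predicted solution at t^{n+1} = 0.9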
[Figure 4 graphic: the top panel shows u(t, x) over the (t, x) domain; the bottom panels compare the training data at t = 0.10 and the prediction at t = 0.90 with the exact solution.]
Figure 4: Allen-Cahn equation: Top: Solution u(t, x) along with the location of the initial
training snapshot at t = 0.1 and the final prediction snapshot at t = 0.9. Bottom: Initial
training data and final prediction at the snapshots depicted by the white vertical lines in
the top panel. The relative L2 error for this case is 6.99 · 10⁻³.
We have introduced physics informed neural networks, a new class of universal function approximators that naturally encode any underlying physical laws that govern a given data-set and can be described by partial differential equations. In this work, we design data-driven algorithms for
inferring solutions to general nonlinear partial differential equations, and con-
structing computationally efficient physics-informed surrogate models. The
resulting methods showcase a series of promising results for a diverse collec-
tion of problems in computational science, and open the path for endowing
deep learning with the powerful capacity of mathematical physics to model
the world around us. As deep learning technology is continuing to grow
rapidly both in terms of methodological and algorithmic developments, we
believe that this is a timely contribution that can benefit practitioners across
a wide range of scientific domains. Specific applications that can readily en-
joy these benefits include, but are not limited to, data-driven forecasting of
physical processes, model predictive control, multi-physics/multi-scale mod-
eling and simulation.
We must note however that the proposed methods should not be viewed
as replacements of classical numerical methods for solving partial differen-
tial equations (e.g., finite elements, spectral methods, etc.). Such methods
have matured over the last 50 years and, in many cases, meet the robustness
and computational efficiency standards required in practice. Our message
here, as advocated in Section 3, is that classical methods such as the Runge-
Kutta time-stepping schemes can coexist in harmony with deep neural net-
works, and offer invaluable intuition in constructing structured predictive
algorithms. Moreover, the implementation simplicity of the latter greatly
favors rapid development and testing of new ideas, potentially opening the
path for a new era in data-driven scientific computing. This will be further
highlighted in the second part of this paper in which physics informed neural
networks are put to the test of data-driven discovery of partial differential
equations.
Acknowledgements
This work received support by the DARPA EQUiPS grant N66001-15-
2-4055, the MURI/ARO grant W911NF-15-1-0562, and the AFOSR grant
FA9550-17-1-0013. All data and codes used in this manuscript are publicly
available on GitHub at https://github.com/maziarraissi/PINNs.
References
[1] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[2] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015)
436–444.
[13] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, J. M. Siskind, Au-
tomatic differentiation in machine learning: a survey, arXiv preprint
arXiv:1502.05767 (2015).
[17] D. C. Liu, J. Nocedal, On the limited memory BFGS method for large scale optimization, Mathematical Programming 45 (1989) 503–528.
[21] R. Shwartz-Ziv, N. Tishby, Opening the black box of deep neural net-
works via information, arXiv preprint arXiv:1703.00810 (2017).