Abstract
Deep learning has been shown to be an effective tool in solving partial differential equations (PDEs)
through physics-informed neural networks (PINNs). PINNs embed the PDE residual into the
loss function of the neural network, and have been successfully employed to solve diverse forward
and inverse PDE problems. However, one disadvantage of the first generation of PINNs is that
they usually have limited accuracy even with many training points. Here, we propose a new
method, gradient-enhanced physics-informed neural networks (gPINNs), for improving the accuracy
of PINNs. gPINNs leverage gradient information of the PDE residual and embed the gradient into
the loss function. We tested gPINNs extensively and demonstrated the effectiveness of gPINNs
in both forward and inverse PDE problems. Our numerical results show that gPINN performs
better than PINN with fewer training points. Furthermore, we combined gPINN with the method
of residual-based adaptive refinement (RAR), a method for improving the distribution of training
points adaptively during training, to further improve the performance of gPINN, especially in PDEs
with solutions that have steep gradients.
Keywords: deep learning, partial differential equations, physics-informed neural networks,
gradient-enhanced, residual-based adaptive refinement
1. Introduction
Deep learning has achieved remarkable success in diverse applications; however, its use in solving
partial differential equations (PDEs) has emerged only recently [1]. As an alternative method to
traditional numerical PDE solvers, physics-informed neural networks (PINNs) [2, 3] solve a PDE
via embedding the PDE into the loss of the neural network using automatic differentiation. The
PINN algorithm is mesh-free and simple, and it can be applied to different types of PDEs, including
integro-differential equations [4], fractional PDEs [5], and stochastic PDEs [6, 7]. Moreover, one
main advantage of PINNs is that from the implementation point of view, PINNs solve inverse PDE
problems as easily as forward problems. PINNs have been successfully employed to solve diverse
problems in different fields, for example in optics [8, 9], fluid mechanics [10], systems biology [11],
and biomedicine [12].
Despite promising early results, there are still some issues in PINNs to be addressed. One open
problem is how to improve the PINN accuracy and efficiency. There are a few aspects of PINNs that
can be improved and have been investigated by researchers. For example, the residual points are
usually randomly distributed in the domain or grid points on a lattice, and other methods of training
point sampling and distribution have been proposed to achieve better accuracy when using the same
number of training points, such as the residual-based adaptive refinement (RAR) [4] and importance
sampling [13]. The standard loss function in PINNs is the mean square error, and Refs. [14, 15]
show that a properly-designed non-uniform training point weighting can improve the accuracy. In
PINNs, there are multiple loss terms corresponding to the PDE and initial/boundary conditions,
and it is critical to balance these different loss terms [16, 17]. Domain decomposition can be used
for problems in a large domain [18, 19, 20]. Neural network architectures can also be modified to
satisfy automatically and exactly the required Dirichlet boundary conditions [21, 22, 23], Neumann
boundary conditions [24, 25], Robin boundary conditions [26], periodic boundary conditions [27, 9],
and interface conditions [26]. In addition, if some features of the PDE solutions are known a priori,
it is also possible to encode them in network architectures, for example, multi-scale and
high-frequency features [28, 29, 30, 11, 31]. Moreover, the constraints in PINNs are usually soft
constraints, and hard constraints can be imposed by using the augmented Lagrangian method [9].
In PINNs, the neural network is trained to minimize the PDE residual, and thus the residual itself
is the only loss term associated with the PDE. This choice is straightforward and widely adopted,
and little attention has been paid so far to other types of losses derived from the PDE. However,
if the PDE residual is zero everywhere, then the gradient of the PDE residual must also be zero.
In this work, we develop the gradient-enhanced PINN (gPINN), which uses a new type of loss
function that leverages the gradient information of the PDE residual to improve the accuracy of
PINNs. The general idea of using gradient information has been demonstrated to be useful for
function regression, e.g., via Gaussian process regression [32] and neural networks (see the review article
[33]). However, these methods were only developed for function regression by using the function
gradients in addition to function values, and they cannot be used to solve forward or inverse
PDEs. The only related technique is input gradient regularization, which uses the gradient
of the loss with respect to the network input as an additional loss term [34]. However, input gradient
regularization is an alternative to L1 and L2 regularization, used to improve classification
accuracy [34], adversarial robustness [35, 36, 37], and interpretability [38, 35, 39]. Hence, as we
use the gradient of the PDE residual term with respect to the network inputs, our method is
fundamentally different from these prior approaches which use gradient information for solving
different objectives. In addition, we combine gPINN with the aforementioned RAR method to
further improve the performance.
The paper is organized as follows. In Section 2, after introducing the algorithm of PINN,
we present the extension to gPINN and gPINN with RAR. In Section 3, we demonstrate the
effectiveness of gPINN and RAR for eight different problems, including function approximation,
forward problems of PDEs, and inverse PDE-based problems. We systematically compare the
performance of PINN, gPINN, PINN with RAR, and gPINN with RAR. Finally, we conclude the
paper in Section 5.
2. Methods
We first provide a brief overview of physics-informed neural networks (PINNs) for solving for-
ward and inverse partial differential equations (PDEs) and then present the method of gradient-
enhanced PINNs (gPINNs) to improve the accuracy of PINNs. Next we discuss how to use the
residual-based adaptive refinement (RAR) method to further improve gPINNs.
We consider the following PDE for the solution u(x), parametrized by the parameters λ, defined
on a domain Ω:

f\left(x; \frac{\partial u}{\partial x_1}, \ldots, \frac{\partial u}{\partial x_d}; \frac{\partial^2 u}{\partial x_1 \partial x_1}, \ldots, \frac{\partial^2 u}{\partial x_1 \partial x_d}; \ldots; \lambda\right) = 0, \quad x = (x_1, \cdots, x_d) \in \Omega,    (1)

subject to suitable boundary conditions B(u, x) = 0 on ∂Ω, where B is a general boundary operator
(e.g., Dirichlet, Neumann, Robin, or periodic). We note that in PINNs the initial condition is treated
in the same way as the Dirichlet boundary condition.
To solve the PDE via PINNs, we first construct a neural network û(x; θ) with the trainable
parameters θ to approximate the solution u(x). We then use the constraints implied by the PDE
and the boundary conditions to train the network. Specifically, we use a set of points inside the
domain (Tf ) and another set of points on the boundary (Tb ). The loss function is then defined
as [3, 4]
\mathcal{L}(\theta; \mathcal{T}) = w_f \mathcal{L}_f(\theta; \mathcal{T}_f) + w_b \mathcal{L}_b(\theta; \mathcal{T}_b),

where

\mathcal{L}_f(\theta; \mathcal{T}_f) = \frac{1}{|\mathcal{T}_f|} \sum_{x \in \mathcal{T}_f} \left| f\left(x; \frac{\partial \hat{u}}{\partial x_1}, \ldots, \frac{\partial \hat{u}}{\partial x_d}; \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_1}, \ldots, \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_d}; \ldots; \lambda\right) \right|^2,    (2)

\mathcal{L}_b(\theta; \mathcal{T}_b) = \frac{1}{|\mathcal{T}_b|} \sum_{x \in \mathcal{T}_b} |B(\hat{u}, x)|^2.    (3)
For an inverse problem, some parameters λ of the PDE in Eq. (1) are unknown, and we instead
have extra measurements of u on a set of points \mathcal{T}_i. We then use the additional data loss

\mathcal{L}_i(\theta, \lambda; \mathcal{T}_i) = \frac{1}{|\mathcal{T}_i|} \sum_{x \in \mathcal{T}_i} |\hat{u}(x) - u(x)|^2

to learn the unknown parameters simultaneously with the solution u. The loss function for the
inverse problem is then defined as

\mathcal{L}(\theta, \lambda; \mathcal{T}) = w_f \mathcal{L}_f(\theta, \lambda; \mathcal{T}_f) + w_b \mathcal{L}_b(\theta, \lambda; \mathcal{T}_b) + w_i \mathcal{L}_i(\theta, \lambda; \mathcal{T}_i).
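For illustration, a minimal forward PINN for a toy problem can be assembled in DeepXDE roughly as follows. This is a sketch rather than the authors' released code: the toy PDE u'' + sin x = 0 with u(0) = u(π) = 0, the network size, and the numbers of points are illustrative, and API details (e.g., `iterations` vs. `epochs`) may differ across DeepXDE versions.

```python
import numpy as np
import deepxde as dde
from deepxde.backend import tf

# Toy problem (illustrative): u''(x) + sin(x) = 0 on [0, pi], u(0) = u(pi) = 0,
# whose exact solution is u(x) = sin(x).
def pde(x, u):
    du_xx = dde.grad.hessian(u, x)   # d^2u/dx^2 via automatic differentiation
    return du_xx + tf.sin(x)         # PDE residual f, which drives the loss L_f

geom = dde.geometry.Interval(0, np.pi)
bc = dde.icbc.DirichletBC(geom, lambda x: 0, lambda x, on_boundary: on_boundary)

data = dde.data.PDE(geom, pde, bc, num_domain=30, num_boundary=2)  # T_f and T_b
net = dde.nn.FNN([1, 20, 20, 20, 1], "tanh", "Glorot normal")
model = dde.Model(data, net)
model.compile("adam", lr=1e-3, loss_weights=[1, 1])  # weights w_f and w_b
model.train(iterations=10000)
```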
In PINNs, we only enforce the PDE residual f to be zero; because f (x) is zero for any x, we
know that the derivatives of f are also zero. Here, we assume that the exact solution of the PDE
is smooth enough such that the gradient of the PDE residual ∇f(x) exists, and then propose the
gradient-enhanced PINNs to enforce the derivatives of the PDE residual to be zero as well, i.e.,

\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \cdots, \frac{\partial f}{\partial x_d} \right) = \mathbf{0}, \quad x \in \Omega.
Then the loss function of gPINNs is

\mathcal{L} = w_f \mathcal{L}_f + w_b \mathcal{L}_b + w_i \mathcal{L}_i + \sum_{i=1}^{d} w_{g_i} \mathcal{L}_{g_i}(\theta; \mathcal{T}_{g_i}),

where

\mathcal{L}_{g_i}(\theta; \mathcal{T}_{g_i}) = \frac{1}{|\mathcal{T}_{g_i}|} \sum_{x \in \mathcal{T}_{g_i}} \left| \frac{\partial f}{\partial x_i}(x) \right|^2.
Here, \mathcal{T}_{g_i} is the set of residual points for the derivative ∂f/∂x_i. Although \mathcal{T}_f and \mathcal{T}_{g_i} (i = 1, \ldots, d) can
be different, in this study we choose \mathcal{T}_{g_i} to be the same as \mathcal{T}_f, and we use equal weights
w_{g_1} = w_{g_2} = \cdots = w_{g_d}. We determine the optimal value of the weight by grid search. We find
empirically from our numerical results that the performance of gPINN is sensitive to the weight
value in some PDEs (e.g., the Poisson equation in Section 3.2.1), while it is not sensitive for other
PDEs (e.g., the diffusion-reaction equation in Section 3.2.2).
For example, for the Poisson's equation ∆u = f in 1D, the additional loss term is

\mathcal{L}_g = w_g \frac{1}{|\mathcal{T}_g|} \sum_{x \in \mathcal{T}_g} \left| \frac{d^3 u}{dx^3} - \frac{df}{dx} \right|^2.
For the Poisson's equation in 2D, there are two additional loss terms:

\mathcal{L}_{g_1} = w_{g_1} \frac{1}{|\mathcal{T}_{g_1}|} \sum_{x \in \mathcal{T}_{g_1}} \left| \frac{\partial^3 u}{\partial x^3} + \frac{\partial^3 u}{\partial x \partial y^2} - \frac{\partial f}{\partial x} \right|^2,

\mathcal{L}_{g_2} = w_{g_2} \frac{1}{|\mathcal{T}_{g_2}|} \sum_{x \in \mathcal{T}_{g_2}} \left| \frac{\partial^3 u}{\partial x^2 \partial y} + \frac{\partial^3 u}{\partial y^3} - \frac{\partial f}{\partial y} \right|^2.
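With automatic differentiation, these extra loss terms can be generated by differentiating the residual itself. A minimal sketch in the style of the DeepXDE example above (again illustrative, not the authors' implementation): the residual function returns the residual together with its derivative, and each returned tensor receives its own weighted mean-squared loss.

```python
import deepxde as dde
from deepxde.backend import tf

def gpinn_pde(x, u):
    du_xx = dde.grad.hessian(u, x)
    f = du_xx + tf.sin(x)            # PDE residual, as in the PINN sketch above
    df_x = dde.grad.jacobian(f, x)   # derivative of the residual w.r.t. the input x
    return [f, df_x]                 # one mean-squared loss term per returned tensor

# The weight w_g of the gradient loss is set through the loss weights, e.g.,
# model.compile("adam", lr=1e-3, loss_weights=[1, 0.01, 1])
# where the assumed ordering [L_f, L_g, L_b] should be checked for your DeepXDE version.
```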
As we will show in our numerical examples, by enforcing the gradient of the PDE residual,
gPINN improves the accuracy of the predicted solution u and requires fewer training points.
Moreover, gPINN improves the accuracy of the predicted derivatives ∂u/∂x_i. One motivation of
gPINN is that the PDE residual of PINNs usually fluctuates around zero, and penalizing the slope
of the residual reduces this fluctuation and makes the residual closer to zero.
However, theoretical analysis of gPINN and the benefits of incorporating gradient information
requires extensive work, which is beyond the scope of this study. Here we provide some justification
from the works of input gradient regularization [34] for classification problems. Although gPINN
and input gradient regularization are developed in different setups for different objectives, they
both use the gradient of the original loss with respect to the network input as an additional loss
term. It has been observed empirically that the input gradient regularization improves the network
generalization [34]. It has also been demonstrated both empirically [35, 36, 37] and theoretically
[37] that input gradient regularization improves the network robustness against adversarial samples.
The benefits of generalization and robustness may also apply to gPINN. For PINNs, using more
training points leads to a smaller generalization error and thus a better accuracy [40]. Compared
to PINNs, better generalization of gPINN means that gPINN would be more generalizable to the
regions without training points and thus achieve better accuracy using fewer training points. While
there has not been a theoretical study of the robustness of PINNs, intuitively, if gPINN has better
robustness, then for inverse problems it would tolerate larger noise in the data measurements,
especially for a small number of training points.
To further improve gPINN for PDEs whose solutions have steep gradients, we combine gPINN with
the residual-based adaptive refinement (RAR) method [4], which adaptively adds residual points
where the PDE residual is large (Algorithm 1):

Step 1 Train the network on the current training set T for a certain number of iterations.

Step 2 Estimate the PDE residual at a set of randomly sampled points in Ω.

Step 3 Add m new points to the training set T where the residual is the largest.

Step 4 Repeat Steps 1, 2, and 3 n times, or until the mean residual falls below a threshold E.
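A sketch of this loop, reusing the `geom`, `pde`, `data`, and `model` objects from the earlier snippets (the candidate-set size, the retraining schedule, and the stopping threshold E are illustrative choices, not the settings used in this paper):

```python
import numpy as np

m, n, E = 10, 40, 1e-3  # points added per round, number of rounds, residual threshold
for _ in range(n):
    X = geom.random_points(100000)                           # dense candidate set
    residual = np.abs(model.predict(X, operator=pde))[:, 0]  # |f| at the candidates
    if residual.mean() < E:
        break
    worst = X[np.argsort(residual)[-m:]]                     # m points with the largest residual
    data.add_anchors(worst)                                  # add them to the training set
    model.compile("adam", lr=1e-3)
    model.train(iterations=1000)                             # retrain with the enlarged set
```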
3. Results
We apply our proposed gPINN and gPINN with RAR to solve several forward and inverse
PDE problems. In all examples, we use tanh as the activation function, and the other
hyperparameters for each example are listed in Table 1. All the code in this study is implemented
using the library DeepXDE [4] with the TensorFlow 1 backend and will be deposited on GitHub at
https://github.com/lu-group/gpinn.
Table 1: Hyperparameters for the 9 problems tested in this study (for each problem, listed by section number: network depth and width, optimizer, learning rate, and number of iterations). The Brinkman-Forchheimer problem in Section 3.3.1 has 3 sub-problems.
3.1. Function approximation

Before solving PDEs, we first demonstrate the effect of the gradient enhancement on function
approximation: we train a neural network (NN) û(x) to approximate a function u(x)
from the training dataset {(x_1, u(x_1)), (x_2, u(x_2)), \ldots, (x_n, u(x_n))}, where (x_1, x_2, \ldots, x_n) are
equispaced points in [0, 1]. The standard loss function to train the NN is
\mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} |u(x_i) - \hat{u}(x_i)|^2,
and we also consider the gradient-enhanced NN (gNN), which adds an extra loss term for the gradient:

\mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} |u(x_i) - \hat{u}(x_i)|^2 + w_g \frac{1}{n} \sum_{i=1}^{n} |\nabla u(x_i) - \nabla \hat{u}(x_i)|^2.
We performed the network training using different values of the weight wg , including 1, 0.1, and
0.01, and found that the accuracy of gNN is insensitive to the value of wg . Hence, here we will only
show the results of wg = 1.
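For concreteness, the gradient-enhanced regression loss can be written in a few lines of TensorFlow; this is a sketch in TF2 eager style (the paper itself uses DeepXDE with the TensorFlow 1 backend), and `net`, the data tensors, and `w_g` are placeholders.

```python
import tensorflow as tf

def gnn_loss(net, x, u, du, w_g=1.0):
    """x: inputs; u, du: observed function values and gradients (all tf tensors)."""
    with tf.GradientTape() as tape:
        tape.watch(x)
        u_hat = net(x)
    du_hat = tape.gradient(u_hat, x)  # d u_hat / d x via automatic differentiation
    return tf.reduce_mean((u_hat - u) ** 2) + w_g * tf.reduce_mean((du_hat - du) ** 2)
```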
When we use more training points, both NN and gNN have smaller L2 relative error of the
prediction of u, and gNN performs significantly better than NN with about one order of magnitude
smaller error (Fig. 1A). In addition, gNN is more accurate than NN for the prediction of the
derivative du/dx (Fig. 1B). As an example, the predictions of u and du/dx from NN and gNN using 15
training data points are shown in Figs. 1C and D, respectively. The standard NN has more than
10% error for u and du/dx, while gNN reaches about 1% error.
Figure 1: Example in Section 3.1: Comparison between NN and gNN. (A and B) L2 relative error of NN and gNN for (A) u and (B) du/dx using different numbers of training points. The line and shaded region represent the mean and one standard deviation of 10 independent runs. (C and D) Example of the predicted (C) u and (D) du/dx, respectively. The black dots show the locations of the 15 training data points.
3.2. Forward PDE problems
After demonstrating the effectiveness of adding the gradient loss on the function approximation,
we apply gPINN to solve PDEs.
3.2.1. Poisson equation in 1D

We first consider the one-dimensional Poisson equation

-\Delta u = \sum_{i=1}^{4} i \sin(ix) + 8\sin(8x), \quad x \in [0, \pi],

with the Dirichlet boundary conditions u(0) = 0 and u(π) = π. The analytic solution is

u(x) = x + \sum_{i=1}^{4} \frac{\sin(ix)}{i} + \frac{\sin(8x)}{8}.
Instead of using a loss term \mathcal{L}_b for the Dirichlet boundary conditions, we enforce them exactly by
choosing a surrogate of the solution of the form

\hat{u}(x) = x(\pi - x)\mathcal{N}(x) + x,    (5)

where \mathcal{N}(x) is a neural network; \hat{u} satisfies \hat{u}(0) = 0 and \hat{u}(\pi) = \pi by construction. The loss function
is then

\mathcal{L} = \mathcal{L}_f + w\mathcal{L}_g,

where w is the weight of the gradient loss \mathcal{L}_g.
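In DeepXDE, a hard constraint of this form can be imposed by transforming the network output; a sketch, assuming the surrogate û(x) = x(π − x)N(x) + x given above:

```python
import numpy as np
import deepxde as dde

net = dde.nn.FNN([1, 20, 20, 20, 1], "tanh", "Glorot normal")
# Output transform: u_hat(x) = x*(pi - x)*N(x) + x, so u_hat(0) = 0 and u_hat(pi) = pi exactly.
net.apply_output_transform(lambda x, y: x * (np.pi - x) * y + x)
```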
Figure 2: Example in Section 3.2.1: Comparisons between PINN and gPINNs with the loss weights w = 1 and 0.01. (A) L2 relative error of u. (B) L2 relative error of u′. (C) The mean value of the PDE residual after training. (D and E) Example of the predicted u and u′, respectively, when using 15 training points. The black dots show the locations of the residual points for training. (F and G) L2 relative errors of gPINN for u and u′ with different values of the weight w when using 20 training points. The shaded regions represent one standard deviation of 10 random runs. (H, I, J) Comparison between hard and soft constraints for PINN and gPINN. (H) L2 relative error of u. (I) L2 relative error of u′. (J) The mean value of the PDE residual after training.
We further investigate the effect of the weight w on the accuracy of gPINN using 20 residual points
(Figs. 2F and G). When w is small and close to 0, gPINN becomes a standard PINN (the black
horizontal line in Figs. 2F and G). When w is very large, the error of gPINN increases. There exists
an optimal weight at around w = 0.01, and when w is smaller than 1, gPINN always outperforms PINN.
In the results above, we enforce the boundary condition exactly as a hard constraint through
the network architecture of Eq. (5). Here we also compare PINN and gPINN for soft constraints
of boundary conditions by choosing û(x) = N (x) and using the additional loss term of Eq. (3).
Compared to hard constraints, when using soft constraints the errors of PINN for both u and du/dx
increase (Figs. 2H and I), although the PDE residual is similar (Fig. 2J). However, the errors of
gPINN with hard or soft constraints are almost the same (Figs. 2H, I, and J). Therefore, when using
soft constraints, the advantage of gPINN over PINN is even larger.
3.2.2. Diffusion-reaction equation

We next consider a one-dimensional diffusion-reaction equation with a source term, whose exact
solution is given in Eq. (6) (see Fig. 4A), subject to the boundary conditions

u(−π, t) = u(π, t) = 0.

Similarly to the previous example, we choose a surrogate of the solution, constructed from a neural
network N(x, t), that satisfies the initial and boundary conditions automatically. Here, we have two
loss terms for the gradient, and the total loss function is
L = Lf + wLgx + wLgt ,
where Lgx and Lgt are the derivative losses with respect to x and t, respectively.
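The two gradient terms can be generated in the same way as before; a sketch for a space-time residual (the source term R below is a placeholder, not the equation solved in this section; x[:, 0:1] is space and x[:, 1:2] is time in DeepXDE's convention):

```python
import deepxde as dde
from deepxde.backend import tf

def gpinn_pde_xt(x, u):
    du_t = dde.grad.jacobian(u, x, i=0, j=1)
    du_xx = dde.grad.hessian(u, x, i=0, j=0)
    R = tf.exp(-x[:, 1:2]) * tf.sin(x[:, 0:1])  # placeholder source term
    f = du_t - du_xx - R                        # PDE residual
    df_x = dde.grad.jacobian(f, x, i=0, j=0)    # term for L_gx
    df_t = dde.grad.jacobian(f, x, i=0, j=1)    # term for L_gt
    return [f, df_x, df_t]
```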
In the Poisson equation, the performance of gPINN depends on the value of the weight, but in
this diffusion-reaction system, gPINN is not sensitive to the value of w. gPINN with the values of
w = 0.01, 0.1, and 1 all outperform PINN by up to two orders of magnitude for the L2 relative
errors of u, du/dx, and du/dt, and for the mean absolute error of the PDE residual (Fig. 3). gPINN reaches
1% L2 relative error of u by using only 40 training points, while PINN requires more than 100
points to reach the same accuracy.
Figure 3: Example in Section 3.2.2: Comparison between PINN and gPINN. (A) L2 relative error of u for PINN and gPINN with w = 1, 0.1, and 0.01. (B) Mean absolute value of the PDE residual. (C) L2 relative error of du/dx. (D) L2 relative error of du/dt.
As an example, we show the exact solution, the predictions, and the error of PINN and gPINN
with w = 0.01 in Fig. 4 when the number of the residual points is 50. The PINN prediction has a
large error of about 100%. By contrast, the largest absolute error of the gPINN prediction is around
0.007, and its L2 relative error is 0.2%.
We note that gPINN appears to plateau at around 130 training points (Fig. 3), which is due to
network optimization. We show that a better accuracy is achieved using a smaller learning rate of
Figure 4: Example in Section 3.2.2: Comparison between PINN and gPINN using 50 residual points
for training. (A) The exact solution in Eq. (6). (B and C) The (B) prediction and (C) absolute error of PINN. (D
and E) The (D) prediction and (E) absolute error of gPINN with w = 0.1.
10⁻⁶ and more iterations (5 × 10⁶) in Fig. 5. The L2 relative errors of gPINN for u and du/dx decrease
to less than 0.01% when using 140 training points and do not saturate. In this case, the accuracy of
PINN is also worse than gPINN, especially when the number of training points is small.
Figure 5: Example in Section 3.2.2: PINN and gPINN are trained with a smaller learning rate and more iterations. L2 relative error of (A) u and (B) du/dx for PINN and gPINN with w = 0.1.
3.2.3. Poisson equation in 2D

We next consider a Poisson equation in two dimensions with zero Dirichlet boundary conditions,
where f is chosen according to an exact solution that has a sharp peak at the center of the domain,
parametrized by a = 10.
We use hard boundary constraints for both PINN and gPINN, and train PINN and gPINN
with 400 residual points. The weight of the additional loss terms in gPINN is chosen as 10⁻⁵. The
solutions and errors of PINN and gPINN are shown in Fig. 6. The solution of PINN is accurate in
most of the domain. However, because the solution has a peak at the center of the domain,
PINN has a large error near the center. By contrast, gPINN is very accurate in the entire domain.
3.3. Inverse PDE problems

In addition to solving forward PDE problems, we also apply gPINN to solve inverse PDE
problems.
Figure 6: Example in Section 3.2.3: Comparison between PINN and gPINN for a 2D Poisson problem.
3.3.1. Inferring the effective viscosity and permeability for the Brinkman-Forchheimer model
The Brinkman-Forchheimer model can be viewed as an extended Darcy’s law and is used to
describe wall-bounded porous media flows:
-\frac{\nu_e}{\epsilon} \nabla^2 u + \frac{\nu u}{K} = g, \quad x \in [0, H],

where the solution u is the fluid velocity, g denotes the external force, ν is the kinematic viscosity of
the fluid, ε is the porosity of the porous medium, and K is the permeability. The effective viscosity,
νe, is related to the pore structure and is difficult to determine. A no-slip boundary condition is
imposed, i.e., u(0) = u(1) = 0. The analytic solution for this problem is

u(x) = \frac{gK}{\nu}\left[1 - \frac{\cosh\left(r\left(x - \frac{H}{2}\right)\right)}{\cosh\left(\frac{rH}{2}\right)}\right],

where r = \sqrt{\nu\epsilon/(\nu_e K)}.
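A sketch of the inverse setup in DeepXDE, where νe is an external trainable variable learned jointly with the network; the parameter values, observation locations, and measured values below are placeholders rather than the paper's settings.

```python
import numpy as np
import deepxde as dde

g, nu, eps, K, H = 1.0, 1e-3, 0.4, 1e-3, 1.0  # placeholder parameters
nu_e = dde.Variable(1e-3)                     # unknown effective viscosity (initial guess)

def bf_pde(x, u):
    du_xx = dde.grad.hessian(u, x)
    f = -nu_e / eps * du_xx + nu / K * u - g  # Brinkman-Forchheimer residual
    return [f, dde.grad.jacobian(f, x)]       # gPINN: residual and its x-derivative

geom = dde.geometry.Interval(0, H)
bc = dde.icbc.DirichletBC(geom, lambda x: 0, lambda x, on_boundary: on_boundary)
x_obs = np.linspace(0.1, 0.9, 5)[:, None]     # hypothetical observation locations
u_obs = np.zeros_like(x_obs)                  # replace with the measured values of u
observe_u = dde.icbc.PointSetBC(x_obs, u_obs)

data = dde.data.PDE(geom, bf_pde, [bc, observe_u], num_domain=10, num_boundary=2, anchors=x_obs)
model = dde.Model(data, dde.nn.FNN([1, 20, 20, 1], "tanh", "Glorot normal"))
model.compile("adam", lr=1e-3, external_trainable_variables=[nu_e])
model.train(iterations=50000)
```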
We first infer the effective viscosity νe from 5 measurements of u; Fig. 7 shows an example using
only 10 PDE residual points. While PINN failed to predict u near the boundary, where the solution
has a steep gradient, gPINN still achieves good accuracy (Fig. 7D). During the training, the
predicted νe in PINN did not converge to the true value, while in gPINN the predicted νe is more
accurate (Fig. 7E).
Figure 7: Example in Section 3.3.1: Inferring νe by using PINN and gPINN from 5 measurements of u. (A) Relative error of νe. (B) L2 relative error of u. (C) L2 relative error of du/dx. (D) Example of the predicted u using 5 observations of u and 10 PDE residual points. The black squares in D show the observed locations. (E) The convergence of the predicted value for νe throughout training.
To further test the performance of gPINN, we next infer both νe and K, still from 5 measurements
of u. Similarly, the PINN solution of u struggles in the regions near the boundary, but gPINN
can still achieve good accuracy (Fig. 8A). Both PINN and gPINN converge to an accurate value
of K (Fig. 8C), but gPINN converges to a much more accurate value for νe than PINN (Fig. 8B).
Next, we add Gaussian noise (mean 0 and standard deviation 0.05) to the observed values and
infer both νe and K using 12 measurements of u (Fig. 9). Both PINN and gPINN converge to
an accurate value for K. Whereas PINN struggles with the added noise when learning u and νe,
gPINN performs well on both. However, after doubling the number of PDE training points
from 15 to 30 ("PINN 2x" in Fig. 9), PINN also performs well at inferring νe, though still
slightly worse than gPINN.
Figure 8: Example in Section 3.3.1: Inferring both νe and K. (A) The predicted u from PINN and gPINN.
(B) The convergence of the predicted value for νe throughout training. (C) The convergence of the predicted value
for K. The black squares in A show the observed locations of u.
Figure 9: Example in Section 3.3.1: Inferring both νe and K from noisy data. (A) The predicted u from PINN and gPINN. "PINN 2x" is PINN with twice as many PDE training points. The black squares show the observed measurements of u. (B) The convergence of the predicted value for νe throughout training. (C) The convergence of the predicted value for K.
3.3.2. Inferring the space-dependent reaction rate in a diffusion-reaction system
We consider a one-dimensional diffusion-reaction system in which the reaction rate k(x) is a
space-dependent function:
\lambda \frac{\partial^2 u}{\partial x^2} - k(x)u = f, \quad x \in [0, 1],
where λ = 0.01 is the diffusion coefficient, u is the solute concentration, and f = sin(2πx) is the
source term. The objective is to infer k(x) given measurements on u. The exact unknown reaction
rate is
k(x) = 0.1 + \exp\left(-0.5\,\frac{(x - 0.5)^2}{0.15^2}\right).
In addition, the condition u(x) = 0 is imposed at x = 0 and 1.
As the unknown parameter k is a function of x instead of just one constant, in addition to the
network of u, we use another network to approximate k. We choose the weight w = 0.01 in gPINN.
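One way to realize the second network in DeepXDE is to give a single network two outputs, one for u and one for k; a sketch with illustrative sizes (the observations of u would be attached to the first output, e.g., via dde.icbc.PointSetBC(x_obs, u_obs, component=0)):

```python
import numpy as np
import deepxde as dde
from deepxde.backend import tf

def pde_inverse(x, y):
    u, k = y[:, 0:1], y[:, 1:2]                        # output 0: u(x); output 1: k(x)
    du_xx = dde.grad.hessian(y, x, component=0)
    f = 0.01 * du_xx - k * u - tf.sin(2 * np.pi * x)   # residual of lambda*u_xx - k(x)*u = f(x)
    return [f, dde.grad.jacobian(f, x)]                # gPINN: residual and its derivative

net = dde.nn.FNN([1, 20, 20, 2], "tanh", "Glorot normal")
```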
We test the performance of PINN and gPINN by using 8 observations of u and 10 PDE residual
points for training. Both PINN and gPINN perform well in learning the solution u, though the
PINN solution slightly deviates from the exact solution around x = 0.8 (Fig. 10A). However, for
the inferred reaction rate k, gPINN's prediction is much more accurate than PINN's (Fig. 10B). Also,
the prediction of du/dx by gPINN is more accurate than that of PINN (Fig. 10C).
3.4. gPINN with RAR

To further improve the accuracy and training efficiency of gPINN for solving PDEs with
stiff solutions, we apply RAR to adaptively improve the distribution of residual points during the
training process.
3.4.1. Burgers' equation

We consider the one-dimensional Burgers' equation

\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2},

with ν = 0.01/π.
We first test PINN and gPINN for this problem. PINN converges very slowly and has a large
L2 relative error, while gPINN achieves one order of magnitude smaller error (∼0.2%) (the blue
and red lines in Fig. 11), as we expected.
Figure 10: Example in Section 3.3.2: Comparison between PINN and gPINN. (A) The prediction of u. (B) The prediction of k. (C) The prediction of du/dx. We used 8 observations of u (the black squares in A) and 10 residual points for training.
Figure 11: Example in Section 3.4.1: L2 relative errors of PINN, gPINN, PINN with RAR, and gPINN
with RAR. For RAR, we started from 1500 uniformly-distributed residual points and added 400 extra points.
The solution to this 1D Burgers’ equation is very steep near x = 0, so intuitively there should
be more residual points around that region. We first show the effectiveness of PINN with RAR
proposed in Ref. [4]. For RAR, we first train the network using 1500 uniformly-distributed residual
points and then gradually add 400 more residual points during training. We added 10 new points
at a time, i.e., m = 10 in Algorithm 1. For the Burgers’ equation, the solution has a steep gradient
around x = 0, and after the initial training of 1500 residual points, the region around x = 0
has the largest error of u and the PDE residual (Fig. 12). RAR automatically added new points
near the largest error, as shown in the left column in Fig. A.17, and then the errors of u and the
PDE residual consistently decrease as more points are added. By using RAR, the error of PINN
decreases very quickly, and PINN achieves an L2 relative error of ∼0.3% using only 1900 residual
points for training (the green line in Fig. 11).
We also used gPINN together with RAR. gPINN with RAR also added new points near x =
0, and the errors of u and the PDE residual consistently decrease when more points are added
adaptively (Fig. 12), similarly to PINN with RAR. The final accuracy of PINN with RAR and
gPINN with RAR is similar, but the error of gPINN with RAR drops much faster than that of PINN
with RAR when only 100 extra training points are used. Therefore, by pairing gPINN with RAR,
we achieve the best performance.
3.4.2. Allen-Cahn equation

We next consider the Allen-Cahn equation with diffusion coefficient D = 0.001 and the initial
condition u(x, 0) = x² cos(πx). The solution to this Allen-Cahn equation has multiple very steep
regions, similar to the solution of the Burgers' equation.
First, comparing PINN and gPINN, we can once again observe that gPINN has better accuracy
than PINN (the blue and red lines in Fig. 13). gPINN requires around 2000 training points to
reach 1% error, while PINN requires around 4000 training points to reach that same accuracy.
We next show the behavior and effectiveness of PINN with RAR and gPINN with RAR for this problem.
We first train the network using 500 uniformly-distributed residual points and then gradually add
Figure 12: Example in Section 3.4.1: Comparison between PINN with RAR and gPINN with RAR.
(First column) The absolute error of PINN for u. (Second column) The absolute error of gPINN for u. (Third
column) The absolute error of PINN for the PDE residual. (Fourth column) The absolute error of gPINN for the
PDE residual. (First row) No extra points have been added. (Second row) 100 extra points have been added. (Third
row) 200 extra points have been added. (Fourth row) 300 extra points have been added. (Fifth row) 400 extra points
have been added.
Figure 13: Example in Section 3.4.2: L2 relative errors of PINN, gPINN, PINN with RAR, and gPINN
with RAR. For RAR, there were 500 initial points and 3000 added points.
3000 more residual points during training with 30 training points added at a time. The solution u
has two peaks around x = −0.5 and x = 0.5, where the largest error occurs (Fig. 14B). The added
points added by RAR also fall in these two regions of high error, as shown in Fig. 14G. The error becomes
nearly uniform after 1200 points have been added (Fig. 14H), and then the added points become more
and more uniform (Figs. 14J and M). By using RAR, the error of gPINN decreases very quickly,
and by adding only 200 additional points (i.e., 700 in total), gPINN reaches 1% error. However,
gPINN with RAR begins to plateau at about 1500 training points with approximately 0.1% error,
which could be resolved by using a smaller learning rate, as shown in Section 3.2.2.
4. Discussion
We have shown that gPINN achieves improved accuracy over PINN using the same number
of training points for forward problems, and that it learns the unknown parameters more accurately in
inverse problems. Here we discuss the computational cost of gPINN and a further extension
using higher-order derivatives of the PDE residual.
When using the same number of residual points (each loss term is evaluated with the entire
training set), gPINN achieves better accuracy than PINN, but the computational cost of gPINN is
higher than that of PINN because of the additional loss terms with higher-order derivatives. To quantify
the computational overhead of gPINN, we define the relative computational cost of gPINN to PINN
as the training time of gPINN divided by the training time of PINN, which depends on the specific
problem such as the highest derivative order and the complexity of the PDE.
Figure 14: Example in Section 3.4.2: gPINN with RAR. (A, B, C) No extra points have been added. (A)
The initial distribution of the 500 residual points. (B) The absolute error of u. (C) The absolute error of the PDE
residual. (D, E, F) 300 extra points (point locations shown in D) have been added. (G, H, I) 1200 extra points
(point locations shown in G) have been added. (J, K, L) 2100 extra points (point locations shown in J) have been
added. (M, N, O) 3000 extra points (point locations shown in M) have been added.
For example, for the Burgers' equation in Section 3.4.1, the relative cost for different numbers of
training points in one trial is shown in Table 2, ranging from 1.85 to 2.01. The average relative cost
is ∼1.94, i.e., gPINN is about twice as computationally expensive as PINN. However, even if we
take the relative cost into consideration, gPINN is still more accurate than PINN in some cases. For
example, the cost of PINN with 4000 training points is twice the cost of PINN with 2000 points,
and thus the cost of PINN with 4000 training points is comparable to the cost of gPINN with
2000 training points, but gPINN with 2000 training points is more accurate than PINN with 4000
training points, as shown in Fig. 11. All computations are performed on the GPU of a workstation
with an Intel Core i9-11900F CPU and an NVIDIA GeForce RTX 3090 GPU.
Table 2: Relative computational cost of gPINN to PINN for the Burgers' equation in Section 3.4.1 (columns: 1500, 2000, 2500, 3000, 3500, and 4000 training points, and the average). The number of training points for both PINN and gPINN varies from 1500 to 4000.
We list the relative computational cost of gPINN to PINN for all the problems in Table 3. The
relative cost is between ∼1.5 and ∼2.0. We note that due to the computational overhead, gPINN
is not always more computationally efficient than PINN. For example, for the Poisson equation
(Figs. 2A and B), the relative cost is 1.69, and gPINN with 12 training points is less accurate than
PINN with 20 training points.
Table 3: Relative computational cost of gPINN to PINN for all the problems.
We also note that in this work, we compute higher-order derivatives by applying automatic
differentiation (AD) of first-order derivatives recursively. This nested approach results in
combinatorial amounts of redundant work and is not efficient [41, 42], and other methods, e.g.,
Taylor-mode AD, can be used for better computational performance. As shown in [43], an efficient
implementation of higher-order derivatives can reduce the computational cost by at least one
order of magnitude, which would greatly reduce the computational overhead of gPINN.
A more efficient implementation of gPINN remains to be developed in the future, but this relies
heavily on the development of deep learning libraries such as TensorFlow and PyTorch.
We have used the first-order derivatives of the PDE residual as additional loss terms. As an
extension, we may also consider higher-order derivatives. For example, in addition to the first-order
derivatives, we can also add the second-order derivatives of the PDE residual as additional loss
terms of the form

\left| \frac{\partial^2 f}{\partial x_i \partial x_j} \right|^2,

i.e., both first-order and second-order derivatives are in the loss. In this section, we show that
higher-order derivatives have the following two effects:
• Accuracy (Section 4.2.1): Adding higher-order derivatives does not always improve accuracy.
• Computational cost (Section 4.2.2): The cost of gPINN increases exponentially with respect
to the order of derivatives.
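As a sketch (continuing the toy DeepXDE residual from the sketches in Section 2; the residual and the weighting are illustrative), the second-order terms are obtained by differentiating the residual once more:

```python
import deepxde as dde
from deepxde.backend import tf

def gpinn2_pde(x, u):
    du_xx = dde.grad.hessian(u, x)
    f = du_xx + tf.sin(x)               # toy 1D Poisson residual
    df_x = dde.grad.jacobian(f, x)      # first-order derivative of the residual
    df_xx = dde.grad.jacobian(df_x, x)  # second-order derivative of the residual
    return [f, df_x, df_xx]             # each returned term receives its own weighted loss
```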
Figure 15: Comparisons between gPINN with first-, second-, and third-order derivatives for the Poisson equation in Section 3.2.1. (A) L2 relative error of u. (B) L2 relative error of u′.
Therefore, for the Poisson equation, the highest order of derivative that we can use is the second
order.
We also test the Burgers’ equation in Section 3.4.1 by adding the second-order derivatives.
Because the weight of the loss term of the first-order derivatives is 10⁻⁴, we try different weights of
the second-order derivatives from 10⁻⁴ to 10⁻⁷. We find that by adding the second-order derivatives,
gPINN does not converge at all (Fig. 16). The reason is that gPINN with second-order derivatives
is very difficult to train: the training loss decreases by only one order of magnitude.
Figure 16: L2 relative error of PINN and gPINN with first-order and second-order derivatives for the
Burgers’ equation in Section 3.4.1. “gPINN2” denotes gPINN with the second-order derivatives. The standard
deviations for gPINN2 are not shown, to keep the figure readable.
From the results of the Poisson equation and the Burgers’ equation, we find that adding higher-
order derivatives does not always improve accuracy. One possible reason could be that the
higher-order derivatives of the PDE residual may have poor regularity, which makes the network
optimization more difficult.
Table 4: Relative computational cost of gPINN with different order of the highest derivative to PINN.
5. Conclusion
In this paper, we proposed a new version of physics-informed neural networks (PINNs) with gra-
dient enhancement (gPINNs) for improved accuracy. We demonstrated the effectiveness of gPINN
in both forward and inverse PDE problems, including the Poisson equation, a diffusion-reaction equation,
the Brinkman-Forchheimer model, the Burgers' equation, and the Allen-Cahn equation. Our numerical results
from all of the examples show that gPINN clearly outperforms PINN with the same number of
training points in terms of the L2 relative errors of the solution and the derivatives of the solution.
For the inverse problems, gPINN learned the unknown parameters more accurately than PINN. In
addition, we combined gPINN with residual-based adaptive refinement (RAR) to further improve
the performance. For PDEs with solutions that have especially steep gradients, such as the Burgers'
equation and the Allen-Cahn equation, RAR allowed gPINN to perform well with far fewer
residual points.
In our numerical results, we showed the improved accuracy of gPINN compared to PINN, and
future theoretical work should be done to understand the benefits of incorporating gradient infor-
mation. Compared to PINN, in gPINN we have an extra hyperparameter—the weight coefficient
of the gradient loss. In some problems, the performance of gPINN is not sensitive to this weight,
but in some cases, there exists an optimal weight to achieve the best accuracy, and thus we need to
tune the weight. In this study, we determine the optimal value of the weight by grid search, and it
is an interesting research topic in the future to automatically determine an optimal weight such as
through the analysis of neural tangent kernel [17]. Moreover, it is possible to combine gPINN with
other extensions of PINN to further improve the performance, such as extended PINN (XPINN) [19]
and parareal PINN (PPINN) [18]. In this paper, we have not tested gPINN on high-dimensional
problems, which we leave for future work. We note that the number of additional loss terms equals
the problem dimension, and thus gPINN becomes more expensive for high-dimensional problems.
Acknowledgements
This work was supported by the DOE PhILMs project (no. DE-SC0019453) and OSD/AFOSR
MURI grant FA9550-20-1-0358. J.Y. and L.L. thank MIT’s PRIMES-USA program.
References
[4] L. Lu, X. Meng, Z. Mao, G. E. Karniadakis, DeepXDE: A deep learning library for solving
differential equations, SIAM Review 63 (1) (2021) 208–228.
[7] D. Zhang, L. Guo, G. E. Karniadakis, Learning in modal space: Solving time-dependent
stochastic PDEs using physics-informed neural networks, SIAM Journal on Scientific Comput-
ing 42 (2) (2020) A639–A665.
[8] Y. Chen, L. Lu, G. E. Karniadakis, L. Dal Negro, Physics-informed neural networks for inverse
problems in nano-optics and metamaterials, Optics Express 28 (8) (2020) 11618–11633.
[10] M. Raissi, A. Yazdani, G. E. Karniadakis, Hidden fluid mechanics: Learning velocity and
pressure fields from flow visualizations, Science 367 (6481) (2020) 1026–1030.
[11] A. Yazdani, L. Lu, M. Raissi, G. E. Karniadakis, Systems biology informed deep learning
for inferring parameters and hidden dynamics, PLoS Computational Biology 16 (11) (2020)
e1007575.
[14] Y. Gu, H. Yang, C. Zhou, Selectnet: Self-paced learning for high-dimensional partial differen-
tial equations, arXiv preprint arXiv:2001.04860 (2020).
[16] S. Wang, Y. Teng, P. Perdikaris, Understanding and mitigating gradient pathologies in physics-
informed neural networks, arXiv preprint arXiv:2001.04536 (2020).
[17] S. Wang, X. Yu, P. Perdikaris, When and why PINNs fail to train: A neural tangent kernel
perspective, arXiv preprint arXiv:2007.14527 (2020).
[18] X. Meng, Z. Li, D. Zhang, G. E. Karniadakis, PPINN: Parareal physics-informed neural net-
work for time-dependent PDEs, Computer Methods in Applied Mechanics and Engineering
370 (2020) 113250.
[19] A. D. Jagtap, G. E. Karniadakis, Extended physics-informed neural networks (XPINNs): A
generalized space-time domain decomposition based deep learning framework for nonlinear
partial differential equations, Communications in Computational Physics 28 (5) (2020) 2002–
2041.
[20] V. Dwivedi, B. Srinivasan, Physics informed extreme learning machine (PIELM)–a rapid
method for the numerical solution of partial differential equations, Neurocomputing 391 (2020)
96–118.
[21] I. E. Lagaris, A. Likas, D. I. Fotiadis, Artificial neural networks for solving ordinary and partial
differential equations, IEEE Transactions on Neural Networks 9 (5) (1998) 987–1000.
[22] H. Sheng, C. Yang, PFNN: A penalty-free neural network method for solving a class of second-
order boundary-value problems on complex geometries, arXiv preprint arXiv:2004.06490
(2020).
[23] N. Sukumar, A. Srivastava, Exact imposition of boundary conditions with distance functions
in physics-informed deep neural networks, arXiv preprint arXiv:2104.08426 (2021).
[24] K. S. McFall, J. R. Mahan, Artificial neural network method for solution of boundary value
problems with exact satisfaction of arbitrary boundary conditions, IEEE Transactions on Neu-
ral Networks 20 (8) (2009) 1221–1233.
[25] R. S. Beidokhti, A. Malek, Solving initial-boundary value problems for systems of partial dif-
ferential equations using neural networks and optimization techniques, Journal of the Franklin
Institute 346 (9) (2009) 898–913.
[27] S. Dong, N. Ni, A method for representing periodic functions and enforcing exactly periodic
boundary conditions with deep neural networks, arXiv preprint arXiv:2007.07442 (2020).
[28] W. Cai, X. Li, L. Liu, A phase shift deep neural network for high frequency approximation
and wave problems, SIAM Journal on Scientific Computing 42 (5) (2020) A3285–A3312.
[29] B. Wang, W. Zhang, W. Cai, Multi-scale deep neural network (MscaleDNN) methods for
oscillatory Stokes flows in complex domains, arXiv preprint arXiv:2009.12729 (2020).
[30] Z. Liu, W. Cai, Z. J. Xu, Multi-scale deep neural network (MscaleDNN) for solving Poisson-
Boltzmann equation in complex domains, arXiv preprint arXiv:2007.11207 (2020).
[31] S. Wang, H. Wang, P. Perdikaris, On the eigenvector bias of Fourier feature networks: From
regression to solving multi-scale PDEs with physics-informed neural networks, arXiv preprint
arXiv:2012.10047 (2020).
[32] Y. Deng, G. Lin, X. Yang, Multifidelity data fusion via gradient-enhanced Gaussian process
regression, arXiv preprint arXiv:2008.01066 (2020).
[35] A. S. Ross, F. Doshi-Velez, Improving the adversarial robustness and interpretability of deep
neural networks by regularizing their input gradients, in: Thirty-second AAAI conference on
artificial intelligence, 2018.
[36] A. G. Ororbia II, D. Kifer, C. L. Giles, Unifying adversarial training algorithms with data
gradient regularization, Neural Computation 29 (4) (2017) 867–887.
[37] C. Finlay, A. M. Oberman, Scaleable input gradient regularization for adversarial robustness,
Machine Learning with Applications 3 (2021) 100017.
[38] A. S. Ross, M. C. Hughes, F. Doshi-Velez, Right for the right reasons: Training differentiable
models by constraining their explanations, arXiv preprint arXiv:1703.03717 (2017).
[39] A. Ross, I. Lage, F. Doshi-Velez, The neural LASSO: Local linear sparsity for interpretable
explanations, in: Workshop on Transparent and Interpretable Machine Learning in Safety
Critical Environments, 31st Conference on Neural Information Processing Systems, 2017.
[40] Y. Shin, J. Darbon, G. E. Karniadakis, On the convergence of physics informed neural net-
works for linear second-order elliptic and parabolic type PDEs, arXiv preprint arXiv:2004.01806
(2020).
[42] C. C. Margossian, A review of automatic differentiation and its efficient implementation, Wiley
Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9 (4) (2019) e1305.
Appendix A. RAR for Burgers’ equation
Figure A.17: Locations of added points in PINN with RAR and gPINN with RAR for Burgers’ equation.
(Left) PINN. (Right) gPINN. (First row) The initial distribution of the 1500 residual points. No extra points have
been added. (Second row) 100 extra points have been added. (Third row) 200 extra points have been added. (Fourth
row) 300 extra points have been added. (Fifth row) 400 extra points have been added.