A Hands-On Introduction To Physics-Informed Neural Networks For Solving PDE
March 1, 2024
Abstract
I provide an introduction to the application of deep learning and neural networks for solving
partial differential equations (PDEs). The approach, known as physics-informed neural networks
(PINNs), involves minimizing the residual of the equation evaluated at various points within the
domain. Boundary conditions are incorporated either by introducing soft constraints with corre-
sponding boundary data values in the minimization process or by strictly enforcing the solution
with hard constraints. PINNs are tested on diverse PDEs extracted from two-dimensional physi-
cal/astrophysical problems. Specifically, we explore Grad-Shafranov-like equations that capture
magnetohydrodynamic equilibria in magnetically dominated plasmas. Lane-Emden equations that model the internal structure of stars in self-gravitating hydrostatic equilibrium are also considered. The flexibility of the method to handle various boundary conditions is illustrated through different examples, as well as its ease in solving parametric and inverse problems. The
corresponding Python codes based on PyTorch/TensorFlow libraries are made available.
∗ Corresponding author: [email protected]
Contents
1 Introduction
2 Fundamentals of the PINNs approach
3 Laplace equation
3.1 Using vanilla-PINNs on Dirichlet BCs problem
3.2 Using hard-PINNs on Dirichlet BCs problems
4 Poisson equations
4.1 Using vanilla-PINNs on Dirichlet problems
4.2 Using vanilla-PINNs on Neumann problems
4.3 Using vanilla-PINNs on Neumann-Dirichlet problems
4.4 Using vanilla-PINNs on Cauchy problems
5 Helmholtz equations
5.1 Specific problem related to astrophysics
5.2 Solutions using vanilla-PINNs with Dirichlet BCs
5.3 More general Helmholtz problems
6 Grad-Shafranov equations
6.1 Example of Soloviev equilibrium: the drop-like structure
6.2 Examples of Soloviev equilibria: toroidal fusion devices
6.3 Other examples of more general equilibria in a rectangular domain
7 Lane-Emden equations
7.1 A mathematical example
7.2 A physical example
11 Appendix
1 Introduction
Since the introduction of Physics-Informed Neural Networks (PINNs) by Raissi et al. (2019), there
has been a significant upsurge in interest in the PINNs technique, spanning various scientific fields.
Notably, this technique offers several advantages, such as numerical simplicity compared to conven-
tional schemes. Despite not excelling in terms of performance (accuracy and training computing
time), PINNs present a compelling alternative for addressing challenges that prove difficult for tra-
ditional methods, such as inverse problems or parametric partial differential equations (PDEs). For
comprehensive reviews, refer to Cuomo et al. (2022) and Karniadakis et al. (2021).
In this article, I introduce a tutorial on the PINNs technique applied to PDEs containing terms
based on the Laplacian in two dimensions (2D), extending the previous tutorial work applied to
ordinary differential equations (ODEs) by Baty & Baty (2023). More precisely, I focus on solving different Poisson-type equations, called Grad-Shafranov equations, representing magnetic equilibria in magnetically dominated plasmas (e.g. the solar corona), following some examples presented in Baty & Vigon (2024) and also other well known examples (see Cerfon et al. 2011). Additionally, PDEs representative of two-dimensional internal star structures are also solved, extending a previous work in a one-dimensional approximation (Baty 2023a, Baty 2023b). The latter equations are
generally called Lane-Emden equations in the literature. Note that, not only direct problems are
considered but also inverse problems for which we seek to obtain an unknown term in the equation.
Of course, for these latter problems, additional data on the solutions is required.
I demonstrate how the PINNs technique is particularly well-suited when non-Dirichlet conditions are imposed at the boundaries. This is evident in scenarios involving mixed Dirichlet-Neumann conditions, especially those relevant to the Cauchy problem.
The distinctive feature of the PINNs technique lies in minimizing the residual of the equation
at a predefined set of data known as collocation points. At these points, the predicted solution is
obligated to satisfy the differential equation. To achieve this, a physics-based loss function associated
with the residual is formulated and employed. In the original method introduced by Raissi et al.
(2019), often referred to as vanilla-PINNs in the literature, the initial and boundary conditions
necessary for solving the equations are enforced through another set of data termed training points,
where the solution is either known or assumed. These constraints are integrated by minimizing a
second loss function, typically a measure of the error like the mean-squared error. This loss function
captures the disparity between the predicted solution and the values imposed at the boundaries. The
integration of the two loss functions results in a total loss function, which is ultimately employed in
a gradient descent algorithm. An advantageous aspect of PINNs is its minimal reliance on extensive
training data, as only knowledge of the solution at the boundary is required for vanilla-PINNs. It’s
worth noting that, following the approach initially proposed by Lagaris (1998), there is another
option to precisely enforce boundary conditions to eliminate the need for a training dataset (see for
example Baty 2023b and Urban et al. 2023). This involves compelling the neural networks (NNs) to
consistently assign the prescribed value at the boundary by utilizing a well-behaved trial function.
The use of this second option, also referred to as hard-PINNs below, is illustrated in this paper.
This paper is structured as follows. In Section 2, we begin by introducing the fundamentals of the
PINNs approach for solving partial differential equations (PDEs). Section 3 focuses on the application to solving a simple Laplace equation in a rectangular domain, with the aim of comparing the use of vanilla-PINNs versus hard-PINNs on problems involving Dirichlet BCs. The use of PINNs solvers on Poisson equations with different types of BCs in rectangular domains is illustrated in Sect. 4. Section 5 focuses on the application to a particular Helmholtz equation, representative of magnetic
arcade equilibrium structures in the solar corona. Another application for computing magnetic
structures representative of curved loops is also considered as shown by the obtained solution of
Grad-Shafranov equations in Section 6. Section 7 is devoted to another astrophysical problem, that is, solving Lane-Emden equations representative of the internal equilibrium structures of polytropic self-gravitating gas spheres (a first order approximation for the structure of stars). Finally, the use of PINNs for solving parametric differential equations and inverse problems is illustrated in Section 8 and Section 9 respectively, by considering a stationary advection-diffusion problem. Conclusions
are drawn in Section 10.
2 Fundamentals of the PINNs approach

The PDE problems considered in this work can be written in the generic residual form

F(x, y, u, ux, uy, uxx, uxy, uyy, ...) = 0, (x, y) ∈ Ω, (1)

where u(x, y) denotes the desired solution and ux, uy, ... are the required associated partial derivatives of different orders with respect to x and y. Specific conditions must also be imposed at the domain boundary ∂Ω, depending on the problem (see below in the paper).
Note that the (x, y) space variables can also include non-cartesian coordinates (see Lane-Emden
equation).
For parametric and inverse problems, a similar one-dimensional residual form is considered,

F(x, µ, u, ux, uxx, ...) = 0, x ∈ Ω, µ ∈ Ωp, (2)

where the desired solution is now u(x, µ), with x being the space variable associated to the one-dimensional domain Ω and µ a scalar parameter taking different values in Ωp. For parametric problems, µ is treated exactly as a second variable in a 2D direct problem, but for inverse problems µ is instead considered as an unknown. Boundary conditions (BCs) are again necessary for parametric problems, but additional conditions, such as knowledge of the solution at some x values, must be added for inverse problems.
Note that, for the sake of simplicity, we have considered only one dimensional space variable
in this work for parametric and inverse problems. The extension to higher spatial dimensions is
however straightforward.
2.3 Classical deep learning approach with neural networks using training data
In the classical deep learning approach with neural networks (NNs), the model is trained exclusively
using available training data. This method involves feeding input data into the neural network,
which then adjusts its internal parameters through a training process to minimize the difference
between predicted and actual output values. The model learns patterns and relationships within
the training dataset to make predictions on new, unseen data. This approach is common in various
machine learning applications, where the emphasis is on leveraging labeled training examples to
achieve accurate predictions. In this way, NNs serve as non-linear approximators.
Figure 1: Schematic representation of the structure for a Neural Network (NN) applied to a non-
linear approximation problem. The input layer has two input variables (i.e. two neurons) for
the two space coordinate variables x and y. Three hidden layers with five neurons per layer are
connected with the input and the output layer, where the latter has a single variable (one neuron)
representing the predicted solution uθ (x, y). The minimization procedure using the loss function
Ldata (θ) is obtained by comparing uθ to a training data set of values udata taken in the 2D domain
Ω. In this simplified example, θ represents a total number of 81 scalar parameters.
Approximating the solution with a neural network. For any input x representing either
the spatial coordinates (x, y), or the combination of variables (x, µ), or only x, depending on the
problems, we want to be able to compute an approximation of the solution value u(x) and eventually
the parameter value µ (for inverse problems).
For this, we introduce what is called a multi-layer perceptron, which is one of the most common kinds of neural networks. Note that any other statistical model could alternatively be used. The goal is to calibrate its parameters θ such that uθ approximates the target solution u(x). uθ is a non-linear approximation function, organized into a sequence of L + 1 layers. The first layer N^0 is called the input layer and is simply

N^0(x) = x. (3)
Each subsequent layer ℓ is parameterized by its weight matrix W^ℓ ∈ R^{d_{ℓ-1} × d_ℓ} and a bias vector b^ℓ ∈ R^{d_ℓ}, with d_ℓ defined as the output size of layer ℓ. Layers ℓ with ℓ ∈ {1, ..., L − 1} are called hidden layers, and their output value can be defined recursively as

N^ℓ(x) = σ(W^ℓ N^{ℓ-1}(x) + b^ℓ). (4)

Here σ is a non-linear function, generally called the activation function. While the most commonly used one is the ReLU (ReLU(x) = max(x, 0)), we use the hyperbolic tangent tanh in this work, which is
more suited than ReLU for building PINNs. The final layer is the output layer, defined as

N^L(x) = W^L N^{L-1}(x) + b^L. (5)

Finally, the full neural network uθ is defined as uθ(x) = N^L(x). It can also be written as a sequence of non-linear functions,

uθ(x) = (N^L ∘ N^{L-1} ∘ ... ∘ N^0)(x), (6)

where ∘ denotes the function composition and θ = {W^ℓ, b^ℓ}_{ℓ=1,...,L} represents the parameters of the network.
Supervised learning approach using training data. The classical supervised learning ap-
proach assumes that we have at our disposal a dataset of Ndata known input-output pairs (x, u):

D = {(x_i^data, u_i^data)}_{i=1}^{Ndata},

for i ∈ {1, ..., Ndata}. uθ is considered to be a good approximation of u if the predictions uθ(x_i) are close to the target outputs u_i^data for every data sample i. We want to minimize the prediction error on the dataset, hence it is natural to search for a value θ⋆ solution of the following optimization problem:
θ⋆ = arg min_θ Ldata(θ), (7)

with

Ldata(θ) = (1/Ndata) Σ_{i=1}^{Ndata} (uθ(x_i) − u_i^data)². (8)
Ldata is called the loss function, and equation (7) the learning problem. It’s important to note that
the defined loss function relies on the mean squared error formulation, but it’s worth mentioning that
alternative formulations are also possible. Solving equation (7) is typically accomplished through
a (stochastic) gradient descent algorithm. This algorithm depends on automatic differentiation
techniques to compute the gradient of the loss Ldata with respect to the network parameters θ. The
algorithm is iteratively applied until convergence towards the minimum is achieved, either based on a predefined accuracy criterion or a specified maximum iteration number, as

θ_{j+1} = θ_j − lr ∇θ L(θ_j), (9)

with L = Ldata, for the j-th iteration (also called epoch in the literature), where lr is the learning rate parameter. In this work, we choose the well known Adam optimizer. This algorithm involves updating the network parameters θ iteratively in the opposite direction of the gradient to reduce the loss. The standard automatic differentiation technique is necessary to compute derivatives with respect to the NN parameters, i.e. weights and biases (Baydin et al. 2018). This technique consists of storing the various steps in the calculation of a compound function, then calculating its gradient using the chain rule. In practice, the learning process is significantly streamlined by lever-
aging open-source software libraries such as TensorFlow or PyTorch, especially when working with
Python. These libraries provide pre-implemented functions and tools for building, training, and op-
timizing neural network models. TensorFlow and PyTorch offer user-friendly interfaces, extensive
documentation, and a wealth of community support, making them popular choices for researchers
Figure 2: Schematic representation of the structure for a Physics-Informed Neural Network applied
for solving a PDE associated to a 2D direct problem with Dirichlet-like BCs (soft constraints). The
input layer has two input variables (i.e. two neurons) for the two space coordinate variables x and
y. Three hidden layers with five neurons per layer are connected with the input and the output
layer, where the latter has a single variable (one neuron) representing the predicted solution uθ (x, y).
Automatic Differentiation (AD) is used in the procedure in order to evaluate the partial derivatives
(i.e. ux,θ, uxy,θ, ...) necessary to form the PDE loss function LPDE(θ). The loss function Ldata(θ) is
obtained with soft constraints (i.e. via the training data set) imposed on the boundary domain ∂Ω.
and practitioners in the field of deep learning. Note that, in this work, the TensorFlow library is used for direct problems, while PyTorch is preferred for parametric and inverse problems.
A schematic representation of the architecture of a neural network designed to approximate a function u(x, y) with a supervised learning approach using a training data set of Ndata values is visible in Fig. 1. In this case, two input neurons (first layer) represent the two spatial variables, and the output neuron (last layer) is the predicted solution uθ. The intermediate (hidden) layers between the input and output layers, where the neural network learns complex patterns and representations, consist of 5 neurons per layer in this example of architecture. The total number of learned parameters is given by the formula (i × h1 + h1 × h2 + h2 × h3 + h3 × o + h1 + h2 + h3 + o), that is therefore equal to 81, as h1 = h2 = h3 = 5, i = 2 and o = 1 (i and o being the numbers of input and output neurons respectively, and hi being the number of neurons of the i-th hidden layer).
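To make this parameter count concrete, the short sketch below (not the author's released code; TensorFlow/Keras is assumed, and the function name and layer sizes are merely illustrative) builds the small network of Fig. 1 and verifies that it indeed contains 81 trainable parameters.

```python
import tensorflow as tf

def build_mlp(n_input=2, n_hidden=(5, 5, 5), n_output=1):
    """Multi-layer perceptron u_theta(x, y) with tanh activations (architecture of Fig. 1)."""
    model = tf.keras.Sequential([tf.keras.Input(shape=(n_input,))])
    for width in n_hidden:
        model.add(tf.keras.layers.Dense(width, activation="tanh"))
    model.add(tf.keras.layers.Dense(n_output))  # linear output layer
    return model

model = build_mlp()
# weights: 2*5 + 5*5 + 5*5 + 5*1 = 65, biases: 5 + 5 + 5 + 1 = 16, total = 81
print(model.count_params())  # -> 81
```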
Figure 3: Schematic representation, variant of the previous figure, where Neumann or Neumann-Dirichlet BCs are involved instead of purely Dirichlet BCs. The training data set includes additional knowledge on the exact derivatives u′_data.
The physics-based loss function associated with the residual is formed in a similar way,

LPDE(θ) = (1/Nc) Σ_{i=1}^{Nc} F(x_i; uθ)², (10)

where the evaluation of the residual equation is performed on a set of Nc points denoted as x_i. These points are commonly referred to as collocation points. A composite total loss function is typically formulated as follows,

L(θ) = ωdata Ldata(θ) + ωPDE LPDE(θ), (11)

where ωdata and ωPDE are weights to be assigned to ameliorate potential imbalances between the
two partial losses. These weights can be user-specified or automatically tuned. In this way, the
previously described gradient descent algorithm given by equation (9) is applied to iteratively reduce
the total loss. By including boundary data in the training dataset, the neural network can learn
to approximate the solution not only within the domain but also at the boundaries where the
known solutions are available. PINNs are thus well-suited for solving PDEs in inverse problems in
a data-driven manner.
In the context of solving PDEs in direct and parametric problems, for cases involving purely
Dirichlet BCs, the training data set is generally reduced to the sole solution value at the boundary.
However, for problems involving Neumann BCs or a combination of Neumann and Dirichlet BCs
(mixed Neumann-Dirichlet) the training dataset must extend beyond simple solution values. In these
cases (Neumann BCs), since Neumann BCs involve the specification of perpendicular derivative
values at the boundaries, the training dataset needs to include information about these derivatives.
Figure 4: Schematic representation of a hard constraints BCs problem corresponding to previous
figure (see text). The boundary constraints are enforced via trial function.
This ensures that the neural network learns to capture the behavior of the solution with respect to
the normal derivative at the boundary. In other cases (Mixed Neumann-Dirichlet BCs), the training
dataset should incorporate solution values at Dirichlet boundaries and also derivative information
at Neumann boundaries.
A schematic representation of the architecture of a neural network designed to solve a PDE using PINNs is visible in Fig. 2. For simplicity, the weights are taken to be unity in this example. This scheme corresponds to a direct problem involving Dirichlet BCs imposed with soft constraints, as
the solution on the boundary is not exactly enforced in this way. A similar problem involving
Neumann BCs or mixed Neumann-Dirichlet BCs is schematized in Fig. 3. Note that, the procedure
used to impose the derivative values for Neumann BCs is slightly different from the one proposed
in Baty (2023b). Indeed, in the latter work the first collocation point was used (see Eq. 5 in the
manuscript), contrary to the present work where the derivative values are part of the training data
set.
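As an illustration of how derivative values can enter the training loss, the sketch below (TensorFlow is assumed; x_b, y_b and dudn_b are hypothetical placeholders for boundary points and prescribed normal derivatives, here for a boundary whose normal is along the y direction) evaluates a Neumann contribution to Ldata by differentiating the network output at the boundary points.

```python
import tensorflow as tf

def neumann_loss(model, x_b, y_b, dudn_b):
    """Mean-squared mismatch between du/dy and the prescribed derivative data.

    x_b, y_b, dudn_b are float32 column tensors of shape (N, 1) (hypothetical data)."""
    with tf.GradientTape() as tape:
        tape.watch(y_b)
        u = model(tf.concat([x_b, y_b], axis=1))
    u_y = tape.gradient(u, y_b)                    # normal derivative at the boundary points
    return tf.reduce_mean(tf.square(u_y - dudn_b))
```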
As explained in the introduction, an alternative option to exactly enforce BCs is to employ a well-behaved trial function (Lagaris 1998), uθ(x, y) = A(x, y) + B(x, y)u*θ(x, y), where now u*θ(x, y) is the output value resulting from the NN transformation, to be distinguished from the final predicted solution uθ(x, y). In this way the use of a training data set is not necessary, and only LPDE(θ) survives to form the loss function L(θ). The latter variant, thus using hard boundary constraints, is schematized in Fig. 4.
For a 1D spatial (i.e. x) parametric problem, y must be replaced by µ, and the problem is thus
similar to the 2D spatial problems schematized in Figs 2-4.
Finally, for a 1D inverse problem, only one input neuron is needed for the space coordinate
(x), and the unknown parameter µ is learned exactly as an additional network parameter like the
weights and biases θ (see further in the paper), as can be seen in the schematic representation of
Fig. 5.
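A minimal PyTorch sketch of this idea is given below (illustrative names, not the author's code): the unknown coefficient µ is declared as an extra trainable parameter and handed to the optimizer together with the network weights and biases.

```python
import torch

# small MLP u_theta(x) for the 1D inverse problem of Fig. 5 (sizes are illustrative)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 20), torch.nn.Tanh(),
    torch.nn.Linear(20, 20), torch.nn.Tanh(),
    torch.nn.Linear(20, 1),
)
mu = torch.nn.Parameter(torch.tensor(0.5))   # initial guess for the unknown coefficient
optimizer = torch.optim.Adam(list(net.parameters()) + [mu], lr=1e-3)
# Inside the training loop, mu enters the PDE residual exactly like a known coefficient,
# and the optimizer updates it jointly with the network parameters theta.
```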
Figure 5: Schematic representation of an inverse problem. A 1D differential equation is considered
(ODE in fact) parametrized with an unknown coefficient µ to be discovered. The training data set
must include data from the whole domain Ω (not only at boundaries).
3 Laplace equation
In order to illustrate a first practical implementation of the method, we consider a simple Laplace
equation in 2D cartesian coordinates:
uxx + uyy = 0, (12)
with uxx and uyy being the second order derivatives of u with respect to x and y respectively. The
integration domain is a square with Ω = [−1, 1] × [−1, 1]. We also focus on a case having the exact solution u(x, y) = x² − y². Of course, the Dirichlet BCs must match the correct exact solution values. (The corresponding Python codes are available at https://github.com/hubertbaty/PINNS-PDE.)
For the Laplace equation, I propose to use the two variants of the PINNs technique on a Dirichlet BCs problem, namely the vanilla-PINNs and the hard-PINNs, and to compare their results.
Figure 6: (Left panel) Distribution of data sets showing the space localization of training data points
at the four boundaries (i.e. Ndata = 120) and collocation points (i.e. Nc = 400) inside the domain,
for solving Laplace problem using vanilla-PINNs. (Right panel) Evolution of the two partial losses
Ldata and LP DE as functions of the number of iterations (i.e. epochs).
3.1 Using vanilla-PINNs on Dirichlet BCs problem

First, we need to generate a set of training data points located at the four boundaries, i.e. x = ±1 and y = ±1. In this example we choose 120 points (30 per boundary) with a random distribution. We also need to generate the set of collocation points inside the domain, e.g. 400 randomly distributed points are used. Note that I use a pseudo-random distribution (Latin-hypercube strategy) in order to avoid empty regions. The resulting data generation is illustrated in Fig. 6 (left
panel).
We apply the PINN algorithm schematized in Fig. 2. The evolution of the two loss functions with the training epochs, reported in Fig. 6 (right panel), shows the convergence toward the predicted solution, as very small values are obtained at the end. Note that the training is stopped after 60000 epochs. For this problem, I have chosen a network architecture having 5 hidden layers with 20 neurons per layer. This represents 1761 learned parameters. A learning rate of lr = 2 × 10⁻⁴ is also chosen. The latter parameter choices slightly influence the results but are not fundamental as long as the number of layers/neurons is not too small (Baty 2023a). A faster convergence can also be obtained by taking a variable learning rate, with a value that decreases as the training proceeds.
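For concreteness, a condensed sketch of such a vanilla-PINN solver is given below (TensorFlow is assumed, with illustrative names; this is not the exact released code). It shows how the residual uxx + uyy is evaluated with automatic differentiation on the collocation points, and how the two partial losses are combined in an Adam training step with unit weights.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4)

def pde_residual(model, xy):
    """Evaluate u_xx + u_yy at the collocation points xy (shape (N, 2))."""
    x, y = xy[:, 0:1], xy[:, 1:2]
    with tf.GradientTape(persistent=True) as t2:
        t2.watch([x, y])
        with tf.GradientTape(persistent=True) as t1:
            t1.watch([x, y])
            u = model(tf.concat([x, y], axis=1))
        u_x = t1.gradient(u, x)
        u_y = t1.gradient(u, y)
    u_xx = t2.gradient(u_x, x)
    u_yy = t2.gradient(u_y, y)
    return u_xx + u_yy

@tf.function
def train_step(model, xy_c, xy_d, u_d):
    """One Adam update from collocation points xy_c and boundary data (xy_d, u_d)."""
    with tf.GradientTape() as tape:
        loss_pde = tf.reduce_mean(tf.square(pde_residual(model, xy_c)))
        loss_data = tf.reduce_mean(tf.square(model(xy_d) - u_d))
        loss = loss_data + loss_pde            # unit weights, as in this example
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```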
The solution and the error distribution at the end of the training are plotted in Fig. 7, exhibiting a maximum absolute error of order 0.0008. This is well confirmed by inspecting corresponding one-dimensional cuts comparing predicted and exact solutions, as can be seen in Fig. 8. Note that the predicted PINNs solution and associated error distribution are obtained using a third set of points
Figure 7: Solution (left panel) and absolute error (right panel) distributions as colored iso-contours
corresponding to problem associated to the previous figure.
(different from the collocation points) that is taken to be a uniform grid of 100 × 100 points here,
otherwise the error could be artificially small (overfitting effect). One must also note that the error
is higher near the boundary due to the coexistence of data/collocation points in these regions. In this way, once trained, the network allows one to predict the solution quasi-instantaneously at any point inside the integration domain, without the need for interpolation (as done e.g. with finite-difference methods when the point is situated between two grid points). The precision of PINNs is known to be very good, but lower than that of more traditional methods (e.g. finite-element codes). This is a general property of minimization techniques based on gradient descent algorithms (Press et al. 2007; Baty 2023). However, a finer tuning of the network parameters, together with the introduction of optimal combinations for the weights of the partial losses, can generally improve the results, which is beyond the scope of this work.
Traditional numerical schemes are generally characterized by some convergence order due to the space discretization. For example, a method of order two means that the associated truncation error is divided by 4 when the space discretization factor is divided by 2, resulting in a well-defined decreasing parabolic scaling law. PINNs are statistical methods that are consequently not expected to follow such a law. More precisely, multiplying by 2 (for example) the number of collocation points and/or the number of training data points does not necessarily lead to a dependence law for the error. The only well-established result is that there is a minimum number of points (depending on the problem) necessary for convergence towards an acceptable solution. There is also a minimum number of hidden layers and number of neurons per layer. An overly large architecture can even degrade the precision of the results. One can refer to tests done on Lane-Emden ODEs for the latter property
(Baty 2023a, Baty 2023b).
Figure 8: One-dimensional cuts of the solution obtained for different particular x and y values, compared with the exact analytical solution (solid line), corresponding to the previous figure.
3.2 Using hard-PINNs on Dirichlet BCs problems

As introduced above, the hard-constraint variant writes the predicted solution as

uθ(x, y) = A(x, y) + B(x, y) u*θ(x, y), (13)

where the function A(x, y) is designed to exactly satisfy the BCs without any adjustable parameter, and the remaining term B(x, y)u*θ is constructed so as to not contribute to the BCs. The adjustable parameters are thus contained only in the neural network output function, which is now u*θ, and the function B(x, y) must vanish at the boundaries. The choice of the two functions A and B is not unique and can affect the efficiency of the algorithm.
In the present example, following the prescriptions given by Lagaris (1998), we have B = (1 − x)(1 + x)(1 − y)(1 + y) and A = (1 − y²) + (x² − 1). It is uθ that is used to evaluate the loss function L(θ) = LPDE(θ) in the minimization procedure, as it should satisfy the PDE, contrary to u*θ. The loss function is based on the residual, which is simply F = uxx + uyy. Note that it is not always
simple (or even possible) to define the two functions A and B, especially when the boundaries are
more complex or/and the BCs involve Neumann conditions (see Lagaris 1998).
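The corresponding hard-constraint prediction can be sketched as follows (TensorFlow is assumed, names are illustrative), using the functions A and B quoted above; only LPDE is then minimized, and no boundary training points are needed.

```python
import tensorflow as tf

def u_hard(model, x, y):
    """Hard-constrained prediction u_theta = A + B * u*_theta (x, y: column tensors)."""
    A = (1.0 - y**2) + (x**2 - 1.0)                     # satisfies the Dirichlet BCs exactly
    B = (1.0 - x) * (1.0 + x) * (1.0 - y) * (1.0 + y)   # vanishes on the whole boundary
    return A + B * model(tf.concat([x, y], axis=1))
# The residual u_xx + u_yy is then evaluated on u_hard instead of the raw network output.
```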
Following the PINN algorithm schematized in Fig. 4, we can thus solve the same previous
Laplace equation. The results are plotted in Fig. 9 and Fig. 10. Note that only collocation points
are now necessary. The precision is better compared to results obtained using vanilla-PINNs, as the
maximum absolute error is now of order 5 × 10−5 . Moreover, the error is mainly localized inside the
domain. This was expected as the boundary values are exactly imposed with this variant. However,
as explained above, it is not always easy (or even possible) to use this method, contrary to the vanilla-PINN variant.
Figure 9: (Left panel) Distribution of data sets showing collocation points (i.e. Nc = 400) inside
the domain, for solving Laplace problem using hard-PINNs. (Right panel) Evolution of the total
loss L = LP DE as function of the number of iterations (i.e. epochs).
Figure 10: Solution (left panel) and absolute error (right panel) distributions as colored iso-contours
corresponding to problem associated to the previous figure.
4 Poisson equations
In this section, we test the method on Poisson-type equations in 2D cartesian coordinates. Thus,
we consider the following form,
uxx + uyy = f (x, y), (14)
having five manufactured exact solutions taken from Nishikawa (2023), corresponding respectively to the source terms

(a) f(x, y) = e^{xy}(x² + y²),
(b) f(x, y) = 1,
(c) f(x, y) = sinh(x),
(d) f(x, y) = 4(x² + y² + 1) e^{x²+y²},
(e) f(x, y) = e^{xy}(x² + y²) + sinh(x). (16)
We consider the integration domain Ω = [0, 1] × [0, 1] with different types of boundary conditions
specified below.
In this section, I propose to compare the use of different BCs with the vanilla-PINNs variant (only) on Poisson problems. The PDE loss function is now based on the residual equation F = uxx + uyy − f(x, y).
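In practice, only the residual changes with respect to the Laplace sketch of Section 3; for case (a), for instance, one may write the following (reusing the pde_residual routine sketched there, which returns uxx + uyy).

```python
import tensorflow as tf

def f_case_a(x, y):
    """Source term of case (a)."""
    return tf.exp(x * y) * (x**2 + y**2)

def poisson_residual(model, xy):
    """F = u_xx + u_yy - f(x, y), with the Laplacian from the earlier pde_residual sketch."""
    x, y = xy[:, 0:1], xy[:, 1:2]
    return pde_residual(model, xy) - f_case_a(x, y)
```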
Figure 11: Vanilla-PINNs solution (left panel) and associated absolute error (right panel) distri-
butions as colored iso-contours corresponding to Poisson-problem (case-a) equation with Dirichlet
BCs.
First, we need to generate a set of training data points located at the four boundaries, i.e. x = 0, x = 1, y = 0, and y = 1. As for the Laplace problem, we choose 120 points (30 per boundary) with a random distribution. We also need to generate the set of collocation points inside the domain, e.g. 400 randomly distributed points (with the Latin-hypercube strategy) are used.
We apply the PINN algorithm schematized in Fig. 2 to solve the equation for the first case (a). The evolution of the two loss functions with the training epochs, reported in Fig. 12, shows the convergence toward the predicted solution, as very small values are obtained at the end. Note that the training is stopped after 60000 epochs. For such a Poisson PDE, the PDE loss function is taken to be based on F = uxx + uyy − f(x, y). For this problem, I have chosen a network architecture having 6 hidden layers with 20 neurons per layer. This represents 2181 learned parameters. A learning rate of lr = 3 × 10⁻⁴ is also chosen.
The solution and the error distribution at the end of the training are plotted in Fig. 11, exhibiting a maximum absolute error of order 0.003 (the corresponding maximum relative error is similar to that of the previous Laplace problem). Similar results are obtained for the other equations, as one can see from the results reported in the appendix.
Figure 12: Evolution of the two partial losses Ldata and LP DE as functions of the number of iterations
(i.e. epochs) for Poisson equation (case-a) with Dirichlet BCs using vanilla-PINN, associated to
previous figure.
Of course, if the BCs are applied only on a part of the whole boundary ∂Ω (for example on 3 or
even on 2 of the four boundaries), the solution can be also obtained but with a lower accuracy.
Figure 13: Solution (left panel) and absolute error (right panel) distributions as colored iso-contours
corresponding to the Poisson problem (case-a) with Neumann BCs and vanilla-PINN.
Figure 14: Evolution of the two partial losses Ldata and LP DE as functions of the number of
iterations (i.e. epochs) for Poisson equation (case a) with Neumann BCs using vanilla-PINN.
Figure 15: Solution (left panel) and absolute error (right panel) distributions as colored iso-
contours corresponding to the Poisson problem (case a) with mixed Dirichlet-Neumann BCs and
vanilla-PINN (see text).
Figure 16: Solution (left panel) and absolute error (right panel) distributions as colored iso-contours
corresponding to the Poisson problem (case a) with Cauchy BCs and vanilla-PINN (see text).
5 Helmholtz equations
5.1 Specific problem related to astrophysics
Figure 17: Solutions (iso-contours of u(x, z)) predicted by PINNs solver for the Helmholtz equation
for two cases, i.e. for the parameters combination (a1 , a2 , a3 ) equal to (1, 0, 1) and (1, 0, −1) in left
and right panel respectively. The distribution of data sets (training and collocation points) is also
indicated using red and blue dots at the boundary and inside the domain respectively.
Following a previous work (Baty & Vigon 2024), we solve an Helmholtz equation,
∆u + c2 u = 0, (17)
where c is a constant and ∆ = ∂²/∂x² + ∂²/∂z² is the Cartesian Laplacian operator. The spatial integration
Figure 18: Evolution of the losses (training data and PDE) during the training as functions of
epochs, corresponding to the two cases respectively (see previous figure).
Figure 19: Predicted solution (left panel) and absolute error distribution (right panel) using colored
iso-contours, corresponding to the first case parameters combination (a1 , a2 , a3 ) equal to (1, 0, 1) of
the Helmholtz problem (see text and the two previous figures).
plane and magnetic field lines, via isocontour values of u. The Sun's surface is situated at z = 0, as z represents the altitude. The remaining magnetic field component By is simply added to the previous one in the case of translational symmetry, generally assumed in this invariant direction. In other words, the total magnetic field may be written as

B = ∇u × e_y + By e_y, (18)
where ey is the unit vector of the cartesian basis along the y direction. Exact solutions for triple
arcade structures can be obtained using Fourier series as

u(x, z) = exp(−νz) Σ_{k=1}^{3} a_k cos(kπx/L). (19)

The latter solution is periodic in x, and the relationship ν² = k²π²/L² − c² applies as a consequence
of the above Helmholtz equation. More details about the context can be found in Baty & Vigon
(2024) and references therein.
6 Grad-Shafranov equations
Another equation, known as the Grad-Shafranov (GS) equation, represents a second important application for approximating magnetic equilibria of plasmas in the solar corona: it is used to model curved loop-like structures. It is also used to approximate the magnetohydrodynamic (MHD) equilibria of plasmas confined in toroidal magnetic devices that aim at achieving thermonuclear fusion experiments, like tokamaks.
The GS equation can be written as

−(∂²ψ/∂R² + ∂²ψ/∂z² − (1/R) ∂ψ/∂R) = G(R, z, ψ), (20)
with a formulation using (R, ϕ, z) cylindrical like variables. The scalar function ψ(R, z) is the desired
solution allowing to deduce the poloidal magnetic field Bp (component in the (R, z) plane) via
Bp = (1/R) ∇ψ × e_ϕ + (F(ψ)/R) e_ϕ, (21)
in the axisymmetric approximation, where F (ψ) = RBϕ . The toroidal magnetic field component is
Bϕ = Bϕ eϕ oriented along the toroidal unit vector eϕ (perpendicular to the poloidal plane). The
right hand side source term G(R, z, ψ) includes a thermal pressure term and a second term involving
a current density, and must be specified in order to solve the equation 20 (see below).
The elliptic differential operator (left hand side of equation 20) can be rewritten by multiplying the equation by R, which leads to the following residual form

F = R ∂²ψ/∂R² + R ∂²ψ/∂z² − ∂ψ/∂R + R G(R, z, ψ) = 0. (22)
Figure 20: Predicted solution (left panel) and absolute error distribution (right panel) using colored
iso-contours, for the drop-like solution of GS equation and Soloviev equilibrium. The spatial loca-
tions for training and collocation data sets used on the boundary and interior domain respectively
are indicated with red and blue dots respectively.
Exact analytical solutions called Soloviev solutions are of particular importance for approximat-
ing the general solutions relevant of tokamaks and other variants of such toroidal magnetic devices
(Soloviev 1975). The latter are obtained by taking relatively simple expressions for the source term
G(R, z).
For example, assuming G = f0(R² + R0²) leads to the exact analytical solution (see Deriaz et al. 2011)

ψ = (f0 R0²/2) [a² − z² − (R² − R0²)²/(4R0²)], (23)

with a simple boundary condition ψ = 0 on a closed contour ∂Ω defined by

∂Ω = { R = R0 √(1 + 2a cos(α)/R0), z = a R0 sin(α), α ∈ [0, 2π] }, (24)
where R0, a, and f0 are parameters to be chosen. As can be seen below, the integration domain Ω bounded by ∂Ω has a peculiar drop-like form with an X-point topology at z = R = 0, as ∂ψ/∂z = ∂ψ/∂R = 0 at this point.
Here, we present the results obtained with our PINN solver using the parameter values f0 = 1, a = 0.5, and R0 = 1. The network architecture is similar to that of the arcade problem, where 7 hidden layers with 20 neurons per layer were chosen, which consequently represents a number of 2601 trainable parameters. We have used 80 training data points (i.e. Ndata = 80) with a distribution based on a
parameters. We have used 80 training data points (i.e. Ndata = 80) with a distribution based on a
uniform α angle generator, and Nc = 870 collocation points inside the integration domain. Contrary
to the case reported in Baty & Vigon (2024), the distribution of collocation points is obtained using
a pseudo-random generator with an additional concentration close to the X-point in order to get a
better predicted solution there. The results are obtained after a training process with a maximum
of 60000 epochs.
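The problem-specific ingredients can be sketched as follows (TensorFlow/NumPy are assumed, names are illustrative, and this is not the exact released code): the boundary training points are generated from the contour of Eq. (24), and the residual of Eq. (22) is formed from the derivatives ψR, ψRR and ψzz obtained with nested gradient tapes as in the previous sections.

```python
import numpy as np
import tensorflow as tf

f0, a, R0 = 1.0, 0.5, 1.0

def boundary_points(n=80):
    """Training points on the contour of Eq. (24), where psi = 0 is imposed
    (a uniform grid in alpha is used here for simplicity)."""
    alpha = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    R = R0 * np.sqrt(1.0 + 2.0 * a * np.cos(alpha) / R0)
    z = a * R0 * np.sin(alpha)
    return np.stack([R, z], axis=1).astype("float32")

def gs_residual(R, z, psi_R, psi_RR, psi_zz):
    """Residual of Eq. (22) with the Soloviev source term G = f0 (R^2 + R0^2)."""
    G = f0 * (R**2 + R0**2)
    return R * psi_RR + R * psi_zz - psi_R + R * G
```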
Figure 21: Predicted solutions (colored iso-contours of ψ) for tokamak ITER-like (a panel), spher-
ical tokamak NSTX-like (b panel), and spheromak-like (c panel) devices. Only spatial locations of
training data points where ψ = 0 (at the boundary) is imposed as soft (Vanilla-PINN) constraints
are indicated with red dots.
spherical tokamak. A third combination, with ϵ = 0.95, κ = 1, and δ = 0.2, together with A = 1, allows us to model magnetic equilibria representative of a spheromak. The results obtained are plotted in Fig. 21. The particular choice for these parameter values is explained in detail in Cerfon et al. (2011). The choice of the neural network architecture and of the numerical parameters for the gradient descent algorithm is similar to what has been done previously. Finally, note that a similar PINN solver tested on the same toroidal fusion devices has been developed by Jang et al. (2024).
Figure 22: Predicted solutions (iso-contours of ψ) of GS equation in a rectangular domain with
three source terms, G = 1, G = R2 z 3 , and G = R3 z 2 in left, middle, and right panel respectively.
Hard constraints are used to impose Dirichlet BCs.
vanilla-PINNs. Indeed, following the specification proposed by Lagaris (1998) and as expressed in
subsection 3.2, it is preferable to write
ψθ = (R − 0.5)(R − 1.5)(z − 0.5)(z + 0.5)ψθ∗ , (27)
where ψθ∗ is the output of the neural network. In this way, the predicted solution ψθ automatically
vanishes at the boundary and the training data set is not needed. Using PINNs solvers similar to
the previously described ones, predicted solutions for three different source terms are plotted in Fig.
22.
In order to show that the PINN solver can also be used for non-linear source terms G, we consider a case taken from Peng et al. (2020) with G = 2R²ψ[c2(1 − exp(−ψ²/σ²)) + (1/σ²)(c1 + c2ψ²) exp(−ψ²/σ²)], with σ² = 0.005, c1 = 0.8, and c2 = 0.2, in a domain Ω = [0.1, 1.6] × [−0.75, 0.75]. The results obtained with a PINNs solver using hard constraints are plotted in Fig. 23. In this case, the imposed Dirichlet condition value used is now ψ = 0.25. Consequently, we also use

ψθ = (R − 0.1)(R − 1.6)(z − 0.75)(z + 0.75) ψ*θ. (28)

As there is no exact solution available, it is not possible to evaluate the error. However, our solution compares well with the solution computed in Peng et al. (2020).
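For reference, the non-linear source term quoted above can be sketched as a simple function entering the R G term of the residual (TensorFlow is assumed, names are illustrative).

```python
import tensorflow as tf

sigma2, c1, c2 = 0.005, 0.8, 0.2

def source_G(R, psi):
    """Non-linear source term of Peng et al. (2020), as written above."""
    e = tf.exp(-psi**2 / sigma2)
    return 2.0 * R**2 * psi * (c2 * (1.0 - e) + (c1 + c2 * psi**2) * e / sigma2)
```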
7 Lane-Emden equations
The Lane-Emden (LE) equations are widely employed in astrophysics and relativistic mechanics. Before focusing on a precise case taken from astrophysics, it is instructive to introduce a more general mathematical form.
Figure 23: Predicted solutions (iso-contours of ψ) of GS equation in a rectangular domain with a
non linear source term G (see text). The location of collocation points are indicated with blue dots.
There is no training data points as hard constraints (Lagaris method) are used to impose Dirichlet
BCs.
In 2D cartesian coordinates, this general mathematical form can be written as

uxx + (α/x) ux + uyy + (β/y) uy + f(x, y) = 0,

where second order derivatives (uxx and uyy) and first order derivatives (ux and uy) of the desired solution are involved. The two scalars α and β are real shape parameters, f(x, y) is a given scalar source function, and the integration is done on a cartesian (x, y) domain.
The particularity of the Lane-Emden equation lies in the singularities at x = 0 and y = 0, which must be overcome by the numerical integration method. As can be seen below, solvers based on the PINNs algorithm are an excellent choice, as classical discretization is not needed. We also focus on the search for solutions in a rectangular domain with Cauchy-like conditions imposed at one or two of the four boundaries. The conditions at the other boundaries are assumed free or of Neumann type.
Indeed, such problems are representative of physical examples in astrophysical context (see second
subsection in this section).
As an example, we take a case with α = β = 2 and f = −6(2 + x² + y²). The integration domain is Ω = [0, 2] × [−1, 1]. The exact solution can be checked to be u = (1 + x²)(1 + y²). The PDE loss function is evaluated from the residual of the above equation.
Using a PINNs solver similar to previously described ones, we can predict the solution for three distinct problems involving a Cauchy-type condition at (at least) one boundary. Indeed, when we impose Cauchy conditions (i.e. using the exact solution and also its perpendicular derivative) at the two boundaries x = 0 and x = 2, the predicted PINN solution (plotted in Fig. 24) shows a rather good agreement with the exact solution inside the domain, as the maximum absolute error is 0.004. However, when the Cauchy condition is imposed only at one boundary, i.e. at x = 2, the accuracy is significantly deteriorated compared to the previous case (see Fig. 25, exhibiting a maximum absolute error of 0.30). This error is obviously mainly localized at the opposite boundary. In the astrophysical context, the prescription of the perpendicular derivative at the two other boundaries (i.e. at y = ±1) can be a reasonable hypothesis. Consequently, when we impose the Cauchy condition at x = 2 in addition to Neumann conditions at y = ±1, one can see that the predicted PINN solution
Figure 24: Predicted solution (iso-contours of ψ in top-left panel) of mathematical LE equation in
a rectangular domain with Cauchy condition imposed at two boundaries (x = 0, and x = 2). The
absolute error is plotted in top-right panel. One dimensional cuts (at given y and x values) show
the predicted solution versus the exact one.
is improved again. The latter result is clearly visible in Fig. 26, where the maximum absolute error is comparable to the first case with Cauchy conditions imposed at the two opposite boundaries.
Figure 25: Same as in previous figure, but using Cauchy condition at only one boundary (i.e. at
x = 2).
7.2 A physical example

The physical LE equation considered here, describing a rotating self-gravitating polytropic gas sphere, can be written as

(1/r²) ∂/∂r(r² ∂ψ/∂r) + (1/(r² sin θ)) ∂/∂θ(sin θ ∂ψ/∂θ) + ψⁿ = ω, (31)

where n is the polytropic index and ω is a constant representative of the rotation of the star (assumed to be uniform). Physically speaking, ψ is related to the mass density (see Baty 2023). For example, in the n = 1 case, it is exactly the mass density normalized to the value at the center. This is a dimensionless equation depending on two spatial coordinates (spherical geometry), with r = R/Rc (Rc being the star radius) and with θ the co-latitude angle varying between 0 and π. The problem is assumed axisymmetric and then does not depend on the remaining spherical angle. The latter equation can also be developed in the equivalent form (that is closer to the previous mathematical equation),

∂²ψ/∂r² + (2/r) ∂ψ/∂r + (1/r²) ∂²ψ/∂θ² + (1/(r² tan θ)) ∂ψ/∂θ + ψⁿ = ω. (32)
However, contrary to the mathematical form above, this physical form displays an obvious asymmetry between the two coordinates r and θ.
We can solve the previous LE equation in the particular case n = 1 without rotation (i.e. ω = 0). In this case, an analytical solution exists that is purely radial, as ψ = sin(r)/r. A PINN solver is developed using a PDE loss function based on the residual equation

F = r² ∂²ψ/∂r² + 2r ∂ψ/∂r + ∂²ψ/∂θ² + (1/tan θ) ∂ψ/∂θ + r² ψⁿ = 0. (33)

The BCs used in this problem are Cauchy conditions on the axis r = 0, that are ψ = 1 and ∂ψ/∂r = 0.
Figure 26: Same as in previous figure, but using Cauchy condition at only one boundary (i.e. at
x = 2) and Neumann condition at the two boundaries y ± 1.
Neumann BCs are also added at the two boundaries θ = 0, 2π, that are ∂ψ/∂θ = 0. Using 1000 collocation points randomly distributed in the domain Ω = [0, π] × [0, 2π] and 150 training data points at the three boundaries (50 per boundary), our PINNs solver is able to predict the exact solution with a precision similar to the ones obtained for the previous problems, as one can see in Figs 27-29. A total number of 70000 epochs is used for this example.
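The residual of Eq. (33) translates into a short routine such as the one below (TensorFlow is assumed, names are illustrative); the partial derivatives are again obtained with nested gradient tapes, and n is the polytropic index.

```python
import tensorflow as tf

n = 1.0   # polytropic index used in this example

def lane_emden_residual(r, theta, psi, psi_r, psi_rr, psi_t, psi_tt):
    """Residual of Eq. (33); psi_t and psi_tt denote the first and second
    derivatives of psi with respect to theta."""
    return (r**2 * psi_rr + 2.0 * r * psi_r
            + psi_tt + psi_t / tf.tan(theta)
            + r**2 * psi**n)
```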
Solutions for other index values (i.e. n) can also be easily obtained in the same way. Note that, without rotation (i.e. ω = 0), the solutions are purely radial (see Baty 2023a). In principle this is not the case when the rotation factor is not zero, and a physically relevant solution should depend on the angle θ. Unfortunately, the latter solution cannot be obtained using the Lane-Emden equation alone, as some extra conditions are lacking (see Chandrasekhar & Milne 1933).
Figure 27: Predicted PINNs solution (left panel) and associated absolute error distribution (right
panel) obtained from the physical Lane-Emden problem for n = 1 and ω = 0 in the (r, θ) plane.
Figure 28: (Left panel) Predicted PINNs solution obtained from the physical Lane-Emden problem
for n = 1 and ω = 0 (see previous figure) plotted in the associated (z, x) plane. (Right panel)
Evolution of the losses during the training process.
c ∂u/∂x − µ ∂²u/∂x² − 1 = 0, (34)

where u(x) is the desired solution in a one dimensional spatial domain with x in the range [0, 1]. As µ is a dissipation coefficient (i.e. a viscosity), and c is a constant coefficient analogous to a velocity (taken to be unity for simplicity), the latter equation represents a steady-state advection-diffusion problem with an additional constant source term (i.e. unity). An example of exact solution, satisfying homogeneous Dirichlet BCs u(0) = u(1) = 0, is

u(x) = x − (exp(x/µ) − 1)/(exp(1/µ) − 1). (35)
Figure 29: One dimensional cuts (at given r and θ values) showing the predicted solution versus
the exact one.
Such a solution is known to involve the formation of singular layers (i.e. at x = 1) when the viscosity employed is too small. As we are interested in learning the solutions at different viscosities with the same neural network, we now consider variable µ values taken in the range [0.1, 1.1]. A PINNs solver can thus be easily designed where the second neuron (see Fig. 5) now corresponds to µ values, and the desired solution must be properly called now u(x, µ). The corresponding residual equation form is now
F [u(x, µ), x, µ] = 0. (36)
We can generate random distributions of training boundary points (typically 20 points at x = 0 and 20 points at x = 1, corresponding to different viscosity values in the range indicated above) and
400 collocation points in the (x, µ) space Ω = [0, 1] × [0.1, 1.1] as one can see in Fig. 30. The exact
boundary values (i.e. with zero values in the present problem) are imposed at these boundaries in
order to minimize the training data loss function Ldata , and the residual equation is evaluated on
the collocation points for the variable viscosity in order to minimize the physics-based loss function
LPDE. The results are plotted in Fig. 31 (iso-contours in the trained plane) and Fig. 32 (cuts at different viscosity values), where one can see that the predicted PINNs solution agrees very well with the exact one whatever the µ value, with an error similar to the values reported for the previous problems.
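A minimal PyTorch sketch of the parametric residual evaluation is given below (illustrative names, not the author's released code): the network takes (x, µ) as input, and automatic differentiation provides ux and uxx with respect to the space coordinate only.

```python
import torch

# MLP u_theta(x, mu): two inputs, one output (sizes are illustrative)
net = torch.nn.Sequential(
    torch.nn.Linear(2, 20), torch.nn.Tanh(),
    torch.nn.Linear(20, 20), torch.nn.Tanh(),
    torch.nn.Linear(20, 1),
)

def residual(x, mu, c=1.0):
    """F[u(x, mu), x, mu] = c u_x - mu u_xx - 1 on column tensors x, mu of shape (N, 1)."""
    x = x.requires_grad_(True)
    u = net(torch.cat([x, mu], dim=1))
    u_x = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    return c * u_x - mu * u_xx - 1.0
# The physics loss is then, e.g., residual(x_c, mu_c).pow(2).mean() over the collocation points.
```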
Figure 30: Scatter plot of the collocation points (blue dots) and of the training data (red crosses at the two boundaries x = 0, 1) in the (x, µ) plane.
Figure 31: Colored iso-contours of the predicted PINNs solution u(x, µ) and exact solution in the
left and right panel respectively.
Figure 32: One-dimensional solution (red color) obtained for four different µ viscosity values situated
in the range of learned values (see legend) compared with the exact analytical solution (blue color),
corresponding to one-dimensional cut obtained from previous figure.
Figure 33: (Left panel) Set of data generated at different x values for a viscosity parameter µ = 0.15
from the expression Eq. 34 with additional random noise amplitude value 0.01. (Right panel)
Predicted PINNs solution obtained at the end of the training process (after 40000 epochs).
As an interesting potential use of the PINNs method, one can use a hard-PINNs solver where a formulation close to the Lagaris one (Lagaris 1998) allows one to capture well the boundary layer solution at decreasingly small viscosity values. More explicitly, this is obtained by using a trial function having an exponential decay as exp(−x/µ), as is the case for the exact solution at the singular boundary. The latter algorithm is named the semi-analytic PINN method (Gie et al. 2024).
Figure 34: (Left panel) The viscosity parameter value µ estimated by the PINNs solver during
the training process, corresponding to the noisy data with a noise amplitude of 0.01 (see previous
figure). (Right panel) The viscosity parameter value µ estimated by the PINNs solver during the
training process, corresponding to training data without noise.
the precision is good but not as good as that of traditional schemes, and it can prove insufficient for some problems.
Finally, PINNs can be particularly useful when solving parametric PDEs (as illustrated on
parametric advection-diffusion ODE in this report). In a similar way, PINNs can solve inverse
problems for which a coefficient (or a term) is not known and must be discovered.
Acknowledgements
Hubert Baty thanks V. Vigon (IRMA and INRIA TONUS team, Strasbourg) for stimulating discussions on the PINNs technique, J. Pétri (Strasbourg observatory) for pointing out the possibility of solving Lane-Emden equations, and L. Baty (CERMICS, École des Ponts ENPC) for help in the use of Python libraries and impromptu general discussions on neural networks.
References
[Baty & Baty (2023)] Baty H., Baty L. 2023, Preprint, https://doi.org/10.48550/arXiv.2302.12260
[Baty & Vigon (2024)] Baty H., Vigon V. 2024, Monthly Notices of the Royal Astronomical Society,
Volume 527, Issue 2, Pages 2575–2584, https://doi.org/10.1093/mnras/stad3320
[Baydin et al. (2018)] Baydin A.G., Pearlmutter B.A., Radul A.A., & Siskind J.M. 2018, Journal of Machine Learning Research, 18, 1, https://doi.org/10.48550/arXiv.1502.05767
[Bencheikh (2023)] Bencheikh A. 2023, Advances in Mathematics, Scientific Journal 12, 805
[Chandrasekhar & Milne (1933)] Chandrasekhar S., Milne E.A. 1933, Monthly Notices of the Royal Astronomical Society, Volume 93, Issue 5, Pages 390–406
[Cerfon & Freidberg (2010)] Cerfon A.J., Freidberg J.P., Phys. Plasmas 17, 032502 (2010)
[Cuomo et al. (2022)] Cuomo S., Di Cola V.S., Giampaolo F., Rozza G. Raissi, M., & Piccialli F.
2022, Journal of Scientific Computing, 92, 88
[Deriaz et al. (2011)] Deriaz E., Despres B., Faccanoni G., Pichon Gostaf K., Imbert-Gérard L.M.,
Sadaka G., & Sart, R. 2011, ESAIM Proc., 32, 76
[Gie et al. (2024)] Gie G.-M., Hong Y., Jung C.-Y. 2024, to appear in Applicable Analysis,
https://doi.org/10.1080/00036811.2024.2302405
[Itagaki et al. (2004)] Itagaki M., Kamisawada J., Oikawa S. 2004, Nucl. Fusion, 44, 427–437
[Jang et al. (2024)] Jang B., Kaptanoglu A., Pan R., Landreman M., Dorland W. 2024,
https://doi.org/10.48550/arXiv.2311.13491
[Karniadakis et al. (2021)] Karniadakis G.E., Kevrekidis I.G., Lu L, Perdikaris P., Wang S., & Yang
L. 2021, Nature reviews, 422, 440, https://doi.org/10.1038/s42254-021-00314-5
[Lagaris et al. (1998)] Lagaris I.E., Likas A., & Fotiadis D.I. 1998, IEEE Transactions on Neural Networks, 9(5), 987
[Peng et al. (2020)] Peng Z., Tang Q., Pan R., Tang X.-Z. 2020, SIAM Journal on Scientific Com-
puting Vol. 42, 5
[Press et al. (2007)] Press W.H., Teukolsky S.A., Vetterling W.T., & Flannery B.P. 2007, Numerical
Recipes 3rd Edition
[Raissi et al. (2019)] Raissi M., Perdikaris P., & Karniadakis G.E. 2019, Journal of Computational
Physics, 378, 686, https://doi.org/10.1016/j.jcp.2018.10.045
[Urbán et al. (2023)] Urban J.F., Stefanou P., Dehman C., & Pons J.A. 2023, MNRAS, 524, 32,
https://doi.org/10.1093/mnras/stad1810
11 Appendix
We have plotted in Figures 35 and 36 the results obtained when solving the Poisson problems (equations b-e, see main text) with Dirichlet and Neumann BCs using vanilla-PINNs, respectively.
Figure 35: PINNs solution and corresponding absolute/relative error distribution as colored iso-
contours corresponding to Poisson problem with Dirichlet BCs using vanilla-PINNs. The four
equations (b-e) examples are considered from top to bottom panels respectively (see text).
Figure 36: PINNs solution and corresponding absolute/relative error distribution as colored iso-
contours corresponding to Poisson problem with Neumann BCs using vanilla-PINNs. The two
equations (b-c) examples are considered from top to bottom panels respectively (see text).