
SciML - Physics Informed

ML/NN

Mark Asch - IMU/VLP/CSU

2023

SciML - PINN and co.


Program

1. Automatic differentiation for scientific machine learning:


(a) Differentiable programming with autograd and Py-
Torch.
(b) Gradients, adjoints, backpropagation.
(c) Adjoints and inverse problems.
(d) Neural networks for scientific machine learning.
(e) Physics-informed neural networks.
(f) The use of automatic differentiation in scientific ma-
chine learning.
(g) The challenges of applying automatic differentiation
to scientific applications.

SciML - PINN and co. 1


Recall: What is SciML?

SciML - PINN and co. 2


Recall: Differentiable programming

• Differentiable programming is a technique for automatically
computing the derivatives of functions.

• This can be done using a variety of techniques, including:


) Symbolic differentiation
) Numerical differentiation
) Automatic differentiation: a technique that applies
the chain rule systematically to the elementary operations
of a program to compute exact derivatives. This is the
most powerful technique for differentiable programming,
and it is the most commonly used technique in scientific
machine learning.

• The mathematical theory of differentiable programming is
based on the concept of gradients.

• Differentiable programming can be used to solve a variety
of problems in scientific machine learning, including:

SciML - PINN and co. 3


) Calculating the gradients of loss functions for machine
learning models.
) Solving differential equations.
) Performing optimization.
) Solving inverse and data assimilation prob-
lems.

SciML - PINN and co. 4


Recall: Automatic Differentiation

• Automatic differentiation is an umbrella term for a variety
of techniques for efficiently computing accurate
derivatives of more or less general programs.

• Many algorithms in machine learning, computer vision,
physical simulation, and other fields require the
calculation of gradients and other derivatives.

• Practitioners across many fields have built a wide set of
automatic differentiation tools, using different
programming languages, computational primitives, and
intermediate compiler representations.

• AD can be readily and extensively used and is thus


applicable to many industrial and practical Digital Twin
contexts [9].

SciML - PINN and co. 5


AD for SciML

• Recent progress in machine learning (ML) technology


has been spectacular.

• At the heart of these advances is the ability to obtain
high-quality solutions to non-convex optimization
problems for functions with billions—or even hundreds
of billions—of parameters.

• Incredible opportunity for progress in classical applied


mathematics problems.

SciML - PINN and co. 6


Automatic Differentiation—backprop,
autograd, etc.

• Backprop is a special case of autodiff.

• Autograd is a particular autodiff package.

• In practice, we will principally use PyTorch’s autodiff
functions.
Remark 1. Autodiff is NOT finite differences, nor symbolic
differentiation. Finite differences are too expensive (one
forward pass for each discrete point). They induce huge
numerical errors (truncation/approximation and roundoff)
and are very unstable in the presence of noise.

Remark 2. Autodiff is both efficient—linear in the cost of


computing the value—and numerically stable.

Remark 3. The goal of autodiff is not a formula, but a


procedure for computing derivatives.
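To make this concrete, here is a minimal, illustrative PyTorch autograd sketch (not from the original slides); the function and parameter values are arbitrary, and only standard torch calls are used.

# Minimal PyTorch autograd sketch (illustrative example, not from the slides):
# reverse-mode AD gives the full gradient in one backward pass, at a cost
# proportional to one evaluation of the function itself.
import torch

theta = torch.randn(1000, requires_grad=True)   # 1000 "parameters"

def loss(p):
    # an arbitrary smooth scalar function of the parameters
    return torch.sum(torch.sin(p) ** 2) + 0.1 * torch.sum(p ** 2)

L = loss(theta)
L.backward()                 # reverse-mode AD (backpropagation)
print(theta.grad.shape)      # torch.Size([1000]) -- gradient w.r.t. all parameters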

SciML - PINN and co. 7


Tools for AD

• New opportunities that exist because of the widespread,


open-source deployment of effective software tools for
automatic differentiation.

• Efficient software frameworks that natively run on
hardware accelerators (GPUs).

• These frameworks inspired high-quality software libraries


such as

) JAX,
) PyTorch,
) TensorFlow.

• The technology’s key feature is: the computational cost
of computing derivatives of a target loss function is
independent of the number of parameters;
SciML - PINN and co. 8


) this trait makes it possible for users to implement
gradient-based optimization algorithms for functions
with staggering numbers of parameters.

SciML - PINN and co. 9


AD Sayings

“Gradient descent can write code better than you,


I’m sorry.”
“Yes, you should understand backprop.”
“I’ve been using PyTorch a few months now and I’ve
never felt better. I have more energy. My skin is
clearer. My eye sight has improved.”

• Andrej Karpathy [~2017] (Tesla AI, OpenAI)

SciML - PINN and co. 10


Why use ML/Neural Networks for
SciML?

• Excellent, open-source tools and frameworks


) Autodiff
) PyTorch
) many, many others...

• Universal approximation property (UAP for NNs)

• Curse of dimensionality

SciML - PINN and co. 11


Recall: what is a NN?

• A Neural Network is a composition of nonlinear functions
(see Basic Course)

$NN(x) = W_3\,\sigma_2(W_2\,\sigma_1(W_1 x + b_1) + b_2) + b_3$

where we can add layers, and to each layer, add neurons.

• Training a NN: given observations y = f (x) of some


unknown function f, find the values of W that minimize
the loss function expressing the mismatch between the
predictions of NN(x) and the corresponding values of
y.
) hence the NN is just a function approximator

SciML - PINN and co. 12


Recall: why NNs?

• Neural Networks have two important properties:

) Universal Approximation property, which states that
for a given accuracy ε, one can construct a large NN
such that it can approximate any (reasonable) function
f, of arbitrary complexity, within the tolerance ε.
) Avoidance of the Curse of Dimensionality. If we were
to make a polynomial approximation with n coefficients
in each of d dimensions, then the complexity of this
approximator would be exponential in d. However, the
size of a NN needed to approximate a d-dimensional
function grows only polynomially in d.

SciML - PINN and co. 13


Recall: Universal Approximation for
Functions

Theorem 1 (Cybenko 1989). If σ is any continuous
sigmoidal function, then finite sums

$G(x) = \sum_{j=1}^{N} \alpha_j\, \sigma(y_j \cdot x + \theta_j)$

are dense in $C(I_d)$.

Theorem 2 (Pinkus 1999). Let $m_i \in \mathbb{Z}_+^d$, $i = 1, \ldots, s$,
and set $m = \max_i |m_i|$. Suppose that $\sigma \in C^m(\mathbb{R})$ is not
a polynomial. Then the space of single hidden layer neural
nets,

$\mathcal{M}(\sigma) = \mathrm{span}\{\sigma(w \cdot x + b) : w \in \mathbb{R}^d,\ b \in \mathbb{R}\},$

is dense in $C^{m_1,\ldots,m_s}(\mathbb{R}^d) = \bigcap_{i=1}^{s} C^{m_i}(\mathbb{R}^d)$.

SciML - PINN and co. 14


[Figure: a fully-connected network with an input layer (inputs x1 to x4), one hidden layer, and an output layer (output y).]

SciML - PINN and co. 15


From ML to SciML...

• In scientific machine learning, neural networks and
machine learning are used as the basis to solve problems
in CSE (Computational Science and Engineering)

• CSE is, in the majority, driven by systems of (P)DEs, since
we are interested in how systems evolve/change.
) As a consequence, the use of ML for the solution of
differential equations is an important topic.
) As we will see, ML can be used in other ways too.

SciML - PINN and co. 16


ML/NN Approaches

• Recall the “question of balance”

• Now, we are going to decompose the SciML block into
classes... (there are several ways of doing this)

• 3 major classes of approaches


) architecture-based

SciML - PINN and co. 17


) loss function
) hybrid approaches

• 4 widely-used families of approaches:


) SUMO - surrogate modeling
) PCL - physics constrained learning
) PINN - physics informed neural networks
) DeepONet/PINO/FNO - neural operators

• Others:
) Neural ODEs
) Differentiable physics
) Koopman theory

• An alternative classification:
) constrained (including PINN and PCL)
) encoded (DL-type architectures)
) operator-based (DeepONet, FNO)

• SUMO then falls into the “data-driven” category

SciML - PINN and co. 18


Forward and Inverse Problems

• Forward simulation is a major CSE task: predict the
system’s evolution, given some input conditions.
) Usually we solve a system of differential equations
with some forcing and boundary conditions
) Challenges:
! the major challenge is the extremely high computational
cost of simulating complex, multi-scale, multi-physics
systems.

SciML - PINN and co. 19


! quantifying uncertainty for predictions/forecasts
) SciML is helping to alleviate these issues by allowing
the possibility to learn from previous simulations, pro-
viding more powerful computational shortcuts whilst
having less impact on the simulation fidelity

• Inverse problems are tightly related to simulation and


solving them is crucial for many real-world tasks.
) Here the goal is to estimate a set of latent, hidden,
or unobserved parameters of a system given a set of
real-world observations of the system.
) Challenges:
! Inversion algorithms often require many forward
simulations to be run in order to match the predic-
tions of the physical model to the set of observa-
tions.
! Given the potentially high computational costs of
forward simulation stated above, this can render
many IP applications unfeasible.
! Inversion problems generally suffer from
ill-posedness.
! In such cases, sophisticated regularization schemes
are required to restrict the space of possible latent

SciML - PINN and co. 20


parameters the inversion algorithm can explore.
! Finally, real-world inversion usually suffers from
noise/uncertainty. This is challenging to quantify
and will increase the ill-posedness of the problem.
! Often a fully probabilistic framework is required to
model such processes.

SciML - PINN and co. 21


EQUATION DISCOVERY

SciML - PINN and co. 22


Equation Discovery

• There are many contexts where we do not fully under-


stand the system itself.

• We are unsure how to define the model F—this is a


type of inverse problem.

• Being able to learn about a system, for example by


discovering its governing equations, is powerful as it can
provide a general model of the system.

• SciML is aiding this discovery by allowing us to automate


the process and/or learn about complex processes which
are hard to intuit

SciML - PINN and co. 23


Equation Discovery: theory

• Suppose we have a dynamic system

$\frac{d}{dt}\, x(t) = f(x(t)), \qquad (1)$

where
) vector x(t) denotes the state of a system at time t,
) the function f represents the dynamic constraints
that define the equations of motion of the system,
such as Newton’s second law.
) The dynamics can be generalized to include parame-
terization, time dependence, and forcing.

• Key observation: for many systems of interest, the


function f consists of only a few terms, making it sparse
in the space of possible functions.

• To determine the function f from data,

SciML - PINN and co. 24


) we collect a time history of the state x(t)
) and either measure the derivative ẋ(t) or approximate
it numerically from x(t)
) The data are sampled at several times t1, . . . , tm and
arranged into two matrices:

$X = \begin{bmatrix} x^T(t_1) \\ x^T(t_2) \\ \vdots \\ x^T(t_m) \end{bmatrix}
   = \begin{bmatrix} x_1(t_1) & x_2(t_1) & \cdots & x_n(t_1) \\
                     x_1(t_2) & x_2(t_2) & \cdots & x_n(t_2) \\
                     \vdots   & \vdots   & \ddots & \vdots   \\
                     x_1(t_m) & x_2(t_m) & \cdots & x_n(t_m) \end{bmatrix}$

(rows indexed by time, columns by state variable), and similarly for $\dot{X}$.

• Next, we construct a library Θ(X) made up of candidate
nonlinear functions of the columns of X
) For example, Θ(X) may consist of constant, polynomial,
and trigonometric terms:

$\Theta(X) = \begin{bmatrix} 1 & X & X^{P_2} & X^{P_3} & \cdots & \sin(X) & \cos(X) & \cdots \end{bmatrix},$

where $X^{P_k}$ denotes the matrix of all k-th order polynomial terms in the state.

• Each column of Θ(X) represents a candidate function
for the right-hand side of Eq. 1.

SciML - PINN and co. 25


• There is tremendous freedom in choosing the entries
in this matrix of nonlinearities, because we believe that
only a few of these nonlinearities are active in each row
of f

• We set up a sparse regression problem to determine the
sparse vectors of coefficients

$\Xi = \begin{bmatrix} \xi_1 & \xi_2 & \cdots & \xi_n \end{bmatrix}$

that determine which nonlinearities are active,

$\dot{X} = \Theta(X)\,\Xi$

• Each column $\xi_k$ of Ξ is a sparse vector of coefficients
determining which terms are active in the right-hand
side of one of the row equations

$\dot{x}_k = f_k(x)$

in Eq. 1.

SciML - PINN and co. 26


• Once Ξ has been determined, a model of each row of
the governing equations may be constructed as follows,

$\dot{x}_k = f_k(x) = \Theta(x^T)\,\xi_k$

• Note that $\Theta(x^T)$ is a vector of symbolic functions of
elements of x, as opposed to Θ(X), which is a data
matrix.

• Thus,

$\dot{x} = f(x) = \Xi^T\,\big(\Theta(x^T)\big)^T$
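As an illustration of the sparse regression step above, here is a small NumPy sketch of sequentially thresholded least squares (a simplified stand-in for the SINDy/PySINDy implementations; the library, the threshold lam and the toy data dx/dt = -2x are assumptions made for the example).

# Sketch of SINDy's sequentially thresholded least squares (illustrative only).
import numpy as np

def sindy_stls(X_dot, Theta, lam=0.1, n_iter=10):
    # Theta: (m, p) library evaluated on the data, X_dot: (m, n) derivatives.
    Xi, *_ = np.linalg.lstsq(Theta, X_dot, rcond=None)    # initial dense fit
    for _ in range(n_iter):
        small = np.abs(Xi) < lam                           # threshold small coefficients
        Xi[small] = 0.0
        for k in range(X_dot.shape[1]):                    # refit each column on its support
            big = ~small[:, k]
            if big.any():
                Xi[big, k], *_ = np.linalg.lstsq(Theta[:, big], X_dot[:, k], rcond=None)
    return Xi

# Toy example: data from dx/dt = -2x (derivative given analytically here).
t = np.linspace(0, 2, 200)
x = np.exp(-2 * t).reshape(-1, 1)
x_dot = -2 * x
Theta = np.hstack([np.ones_like(x), x, x**2])              # library: [1, x, x^2]
print(sindy_stls(x_dot, Theta, lam=0.1))                   # approximately [0, -2, 0]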

SciML - PINN and co. 27


SINDy Scheme

SciML - PINN and co. 28


SINDy Application

SciML - PINN and co. 29


Equation Discovery: SINDy references

• Code:
https://faculty.washington.edu/kutz/page26/

• Paper:
https://www.pnas.org/doi/10.1073/pnas.1906995116

SciML - PINN and co. 30


ARCHITECTURE BASED
METHODS

SciML - PINN and co. 31


Architecture-based SciML

• Idea: change the architecture used in the ML algorithm


so that it incorporates scientific constraints.
) we open up the black box’s design and change parts
of it so that it obeys these constraints.
) Incorporating scientific principles in this way can re-
strict the range of models the algorithm can learn,
and result in more generalisable and interpretable
models.
) From a machine learning perspective, we are intro-
ducing a strong inductive bias into the model.
) These aspects will be more fully discussed in later
lectures on Bias and Ethics of ML.

• Approaches
) encode certain physical variables, for example using
an LSTM for intermediate variables
) encode symmetries, such as translational and rota-
tional invariance—this can be easily achieved with
convolutional neural networks (CNNs)

SciML - PINN and co. 32


) use Koopman theory [Brunton, Kutz]
) use physically constrained Gaussian processes

SciML - PINN and co. 33


SURROGATE MODELING

SciML - PINN and co. 34


Surrogate Modelling - SUMO

• Surrogate modeling is a technique used to approximate


a complex and expensive-to-evaluate function with a
simpler and cheaper-to-evaluate function.

• The surrogate model, also known as a metamodel or


emulator, is trained on a set of input-output data from
the complex function.

• Once trained, the surrogate model can be used to predict


the output of the complex function for any input value,
without having to evaluate the complex function directly.

SciML - PINN and co. 35


Surrogate Modelling - formulation

• The mathematical presentation of surrogate modeling


depends on the specific type of surrogate model being
used.

• However, there is a general framework that can be


applied to most surrogate models.
) Let f (x) be the complex function that we want to
approximate, and let s(x) be the surrogate model.

• The goal of surrogate modeling is to find a surrogate


model s(x) that is as close to the complex function f(x)
as possible, given a limited amount of training data.

• One way to measure the similarity between the complex


function and the surrogate model is to use the mean
squared error (MSE):

$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \big( f(x_i) - s(x_i) \big)^2$

SciML - PINN and co. 36


where N is the number of training data points, and xi
and f (xi) are the i-th input-output pair in the training
dataset.

• The surrogate model can be trained using a variety of


machine learning algorithms, such as kriging, radial basis
functions (RBFs), support vector machines (SVMs), and
artificial neural networks (ANNs).

SciML - PINN and co. 37


Surrogate Modelling - formulation II

• Given a set of training data points

(x1, y1), (x2, y2), ..., (xn, yn),

where xi is the input vector and yi is the output


value, the goal of surrogate modeling is to construct a
function ŷ(x) that approximates the true function y(x)
as accurately as possible.

• The most common approach to surrogate modeling is to


use a statistical model, such as a polynomial response
surface, kriging, or support vector machine. These
models can be trained on the training data points to
learn the relationship between the inputs and outputs.

• Once the surrogate model is trained, it can be used to


predict the output value for any input value x as follows:

ŷ(x) = Model(x)

SciML - PINN and co. 38


where Model(x) is the output of the surrogate model.
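As a concrete, purely illustrative sketch of these ideas, the snippet below trains a random-forest surrogate s(x) on input-output samples of an assumed expensive function f(x) using scikit-learn (an assumed dependency); the function, sample size and hyperparameters are arbitrary choices, not part of the original slides.

# Surrogate-modeling sketch (illustrative; scikit-learn is an assumed dependency).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def expensive_f(x):
    # stand-in for a costly simulation: y = f(x)
    return np.sin(3 * x[:, 0]) * np.exp(-x[:, 1] ** 2)

rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(200, 2))     # design points
y_train = expensive_f(X_train)                  # expensive evaluations (done once)

surrogate = RandomForestRegressor(n_estimators=200, random_state=0)
surrogate.fit(X_train, y_train)                 # train s(x) ~ f(x)

X_new = rng.uniform(-1, 1, size=(5, 2))
print(surrogate.predict(X_new))                 # cheap inference, no call to f
mse = np.mean((surrogate.predict(X_train) - y_train) ** 2)
print("training MSE:", mse)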

SciML - PINN and co. 39


Surrogate Modelling - examples

Surrogate modeling is used in a wide variety of applica-


tions, including:

• Engineering design: Surrogate models can be used to


accelerate the design process by providing cheaper and
faster predictions of the performance of different design
alternatives.

• Scientific computing: Surrogate models can be used to


reduce the computational cost of simulating complex
physical systems.

• Financial modeling: Surrogate models can be used to


predict the risk and return of financial investments.

• Machine learning: Surrogate models can be used to


interpret the predictions of complex machine learning
models.

SciML - PINN and co. 40


Here are some specific examples of surrogate modeling in
use:

• Aerospace engineering: Surrogate models are used to


design aircraft wings, engines, and other components.

• Automotive engineering: Surrogate models are used to


design car engines, suspensions, and other components.

• Chemical engineering: Surrogate models are used to de-


sign chemical reactors, pipelines, and other equipment.

• Oil and gas exploration: Surrogate models are used to


predict the flow of oil and gas through reservoirs.

• Pharmaceutical development: Surrogate models are


used to predict the properties of drugs and to design
clinical trials.

Conclusion

Surrogate modeling is a powerful technique for approx-


imating complex and expensive-to-evaluate functions. It is

SciML - PINN and co. 41


used in a wide variety of applications, including engineer-
ing design, scientific computing, financial modeling, and
machine learning.

SciML - PINN and co. 42


Surrogate Modelling - in practice

Definition 1. Surrogate models, also known as response


surfaces, black-box models, metamodels, or emulators, are
simplified approximations of more complex, higher order
models. These models are used to map input-data to
output-data, when the actual relationship between the two
is unknown or computationally too expensive to evaluate.

• Idea: a ML-trained model can substitute for all or part of a

SciML - PINN and co. 43


system of (P)DEs
) learn the complete, unknown input-output relation
(but no physics constraints)
) learn some sub-parametrization that is too compli-
cated to capture by classical methods

The surrogate modeling process, as depicted in the Figure,


consists of five stages:

1. Choice of the design parameters.

2. Generation of input-output pairs of data for training,


by simulations of the physics-based model, or from
experiments with varying parameter values. This is the
most expensive step.

3. Choice of the surrogate model, based on a suitable


supervised machine learning method.

4. Training of the machine learning model, using the train-


ing data. This can be an expensive step, but is usually
only performed once.

SciML - PINN and co. 44


5. Finally, use of the trained surrogate model to perform
computationally cheap inference for new values of the
design parameters, thus permitting
(a) an exhaustive search of the parameter space,
(b) an optimal parameter design, and/or
(c) a quantification of design uncertainties.

• Questions:
) But, can such a surrogate faithfully capture the com-
plex, nonlinear relationships between input and out-
put?
) And, if so, how can the surrogate do this?

• These two very important questions are fundamental for


any underlying system or process that we would like to
study using surrogate modeling.
) The answer to the first question is: “Yes, in the-
ory,” thanks to the universal approximation prop-
erty of very simple machine learning models—fully-
connected, feed-forward neural networks (FCNN) [Cy-
benko, Pinkus].

SciML - PINN and co. 45


) And the answer to the second question is: “Yes, in
practice,” with the aid of a large variety of supervised
learning techniques, of which FCNNs are just one
special case.

SciML - PINN and co. 46


Techniques for SUMO

• Both supervised and unsupervised¹ learning techniques


can be used for SUMO.

• Four of these techniques are recommended, since they


are robust and perform well:
) random forests and multi-layer perceptrons (FCNN)
in the case of regression, and
) support vector machines (SVM) and k-means clus-
tering in the case of classification.

• There is no single, perfect, globally applicable method


that will always do the best job.

• Usually, one should try a few, and then settle for the
one or two that are the simplest, but provide adequate
precision and especially robustness in the face of the
inherent uncertainties in the underlying processes—see
also the Ethics and Bias Lectures.
¹ Adversarial and self-supervised learning are also possible, but far more complicated to
implement.

SciML - PINN and co. 47


SUMO Principles

• The principle behind the SUMO approach is the follow-


ing:
) if a multiscale, multiphysics relationship between
design parameters and output performance can be
learned from data, then we can forgo—at least to
some extent—the underlying ODE, PDE and popu-
lation dynamics models, as well as time-consuming,
expensive physical/clinical experiments and trials.
) Moreover, once this relationship has been learned,
its use—the so-called inference phase—is very inex-
pensive and one can then envisage the solution of
optimization and uncertainty quantification problems
) We are in fact, constructing a digital twin [Asch2022].

• The approach proposed here is not to seek a complex


machine learning model, but to favor simpler models
that are easier to interpret, trust and deploy—see Ethics
and Bias Lecture.

SciML - PINN and co. 48


• These low-complexity models will not suffer from
brittleness and will preserve a good bias-variance trade-off.
They will also have more favorable interpretability, ethics
and bias properties.

SciML - PINN and co. 49


SUMO - Training Data

• To learn a data-driven model, we need training data.

• This data is obtained from experimental observations,


or model-based simulations, or some combination of the
two.

) Here we will often, if possible, use model-based sim-


ulations, calculated for example by a SIR model.

• For any supervised machine learning method, we first


need to carefully select and define response variables—
or functions—that is an unknown function of the input
parameters.

• When defining the output parameters for the machine


learning, extreme care needs to be taken to extract reli-
able information describing adequately the phenomenon
that we seek to explore, analyze and forecast

SciML - PINN and co. 50


• Often, sampling techniques must be used to ensure
a space-filling parameter range with good projection
properties.

• The LHS method is a generalization to higher dimensions
of the Latin square, which is an n × n array filled with n
different symbols, each occurring exactly once in each
row and exactly once in each column.
) Assuming a three-dimensional parameter space with
$n_s$ samples of each parameter, each sample is the only
one in each axis-aligned hyperplane containing it.
) On the other hand, building an LHS design with
the best maximin criterion on all projections provides
a space filling design in the whole space and on
projections.

• Hence, we obtain a training set that is well-balanced,
containing a wide range of behaviors, which in turn
will ensure the best possible training for the machine
learning models and thus good predictions.
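A minimal sketch of such a space-filling design, assuming SciPy (version 1.7 or later) is available; the dimension, sample count and parameter bounds below are arbitrary illustrations.

# Latin hypercube sampling sketch (scipy >= 1.7 assumed; bounds are made up).
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=3, seed=0)        # 3-dimensional parameter space
unit_samples = sampler.random(n=50)              # 50 samples in [0, 1)^3
# scale to physical parameter ranges (illustrative bounds)
lower, upper = [0.1, 1.0, -5.0], [2.0, 10.0, 5.0]
design = qmc.scale(unit_samples, lower, upper)
print(design.shape)                              # (50, 3) training inputs for the surrogate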

SciML - PINN and co. 51


SUMO - EDA

• In order to identify useful preliminary information about


the data and investigate the relationships between the
features and the response variables, an exploratory data
analysis—see Basic Course Lectures—should always be
performed, before attempting any surrogate modeling.

• This study can help to detect the interactions among
different variables in differing contexts, helping us to
understand the importance of their effect on the desired
performance of the system.

• The following techniques of EDA should, initially, be
applied (many more are possible):
) Summary statistics.
) Scatter plots.
) Correlation and partial correlation tables that ensure
the non-existence of nuisance information in the
database due to a confounding variable.

SciML - PINN and co. 52


SUMO - Concrete Workflow Example

[Workflow diagram: Data → EDA → Regression (RF, NN) and Classification (k-means, SVM) → Optimal Design]

• The approach described above proposes a universal
workflow for data analysis and surrogate modeling in
the light of optimal design choices.

SciML - PINN and co. 53


• This workflow, as illustrated in the Figure, can be applied
to almost any design/optimization problem.
) It suffices to replace our data by the reader’s data,
) our underlying model by the reader’s model,
) and then simply follow the steps of the workflow.

• Once the data is collected, exploratory data analysis


(EDA) is essential for:
1. Familiarization with the data and choice of response
variables.
2. Elimination of any unusual, or erroneous data points.
3. Preliminary identification of the most influential fea-
tures, or parameters, and the relations—or lack of
relations—between the features themselves, and be-
tween the features and the response variables.
4. Reduction of the complexity by identification of
collinear variables.

• The next steps are regression and classification.

• Note that classification, especially unsupervised, can be


considered as being a part of EDA, since it can help us
in determining response variables.

SciML - PINN and co. 54


• In our proposed workflow, we have purposefully selected
simple approaches because of their
) broad applicability,
) ease of computation and
) facility of interpretation—no black boxes here.

• We highly recommend, for regression:


1. Random forests, because of their established robust-
ness and their capacity to rank explanatory variables
by their importance.
2. Neural networks, of FCNN type, for their extreme
versatility and their universal approximation proper-
ties.

• We highly recommend, for classification:


1. k-means for initial unsupervised clustering and iden-
tification of groups of properties.
2. SVM for refined, supervised clustering that provides
a surrogate model.

• In the final step, which of course will be context-


dependent, we can exploit all the surrogate models

SciML - PINN and co. 55


found above for optimal design and process planning.

• Once we have a surrogate model, or several surrogate


models, at our disposition, we can address a number of
outer-loop problems [Asch2022], such as:
1. Optimization problems, where we seek to maximize,
or minimize some critical response variables.
2. Uncertainty quantification problems, where we seek
some kind of confidence interval around the parameter
values, given an estimation of the inherent material
or process variabilities.
3. Bayesian optimization problems, that combine the
above two.

• All of these require a large number of simulations, or


large volume of experimental data, that can now be
completely replaced by the surrogate model.
) Recall that the evaluation of a surrogate model,
whose training has already been done in an offline
computation, can be done in quasi-real time.

• For the surrogate models themselves, other ML tech-


niques can be envisaged that could provide further in-

SciML - PINN and co. 56


sight and better predictive power. Among these, we can
mention:
) SVM regression that is well-adapted to highly nonlin-
ear relationships.
) Functional data analysis (FDA) that is particularly
well-suited to time series data.
) Further exploration of more sophisticated neural net-
work architectures. This is particularly indicated in
the presence of multi-physics, or multi-modal data.

• The key findings of the approach presented here can be


summarized as follows.
) Firstly, exploratory data analysis, including correla-
tions, partial correlations and unsupervised classifica-
tion by k-means, leads to the judicious choice of a
few design parameters, and response variables.
) Then, supervised classification and regression meth-
ods, used together in synergy, reveal the optimal
parameter choices and propose hitherto unforeseen
operating regimes.
) We can identify the most influential design param-
eters and we can characterize the ranges of these

SciML - PINN and co. 57


parameters and identify optimal clusterings of these.

SciML - PINN and co. 58


PHYSICS CONSTRAINED
LEARNING

SciML - PINN and co. 59


Physics Constrained Learning - PCL

- Idea: use a NN as part of the (P)DE

• Given a physical relation

$F(u; \theta) = 0 \qquad (2)$

represented by an IBVP, or other functional relationship,


with

) u the physical quantity


) θ the (material/medium) properties/parameters

• Inverse Problem is defined as:

) Given observations/measurements of u at the
locations x = {x_i}

$u_{obs} = \{u(x_i)\}_{i \in I}$

SciML - PINN and co. 60


) Estimate the parameters θ by minimizing a
loss/objective/cost function

$L(\theta) = \left\| u(x) - u_{obs} \right\|_2^2$

subject to (2).

SciML - PINN and co. 61


PCL - use of a NN

• If θ = θ(x), model it by a NN

θ(x) ≈ NN(x)

• Express the numerical scheme, such as RK, finite
differences, finite elements, etc., for approximating the
(P)DE (2) as a computational graph G(θ).

• Use reverse-mode AD (aka. backpropagation) to compute
the gradient of L with respect to θ and the NN
coefficients (weights and biases).

• Minimize by a suitable gradient algorithm


) Adam, SGD (1st order)
) L-BFGS (quasi-Newton)
) trust-region (2nd order)
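The sketch below illustrates this PCL recipe on a deliberately simple 1D steady diffusion problem (a toy example with assumed grid, data and network sizes, not a production implementation): κ(x) is modeled by a small network, a finite-difference solve is part of the computational graph, and reverse-mode AD plus Adam trains the network from observations of u.

# Physics-constrained-learning sketch (illustrative, simplified 1D example):
# kappa(x) is modeled by a small NN, the finite-difference solver is part of
# the computational graph, and reverse-mode AD trains the NN from data on u.
import torch

torch.manual_seed(0)
N = 50
x = torch.linspace(0.0, 1.0, N + 1)
xm = 0.5 * (x[:-1] + x[1:])                       # cell midpoints
h = 1.0 / N
f = torch.ones(N - 1)                             # right-hand side f(x) = 1

def solve_poisson(kappa_mid):
    # assemble -d/dx(kappa du/dx) = f with u(0) = u(1) = 0 (finite differences)
    main = (kappa_mid[:-1] + kappa_mid[1:]) / h**2
    off = -kappa_mid[1:-1] / h**2
    A = torch.diag(main) + torch.diag(off, 1) + torch.diag(off, -1)
    return torch.linalg.solve(A, f)               # interior values of u

kappa_true = 1.0 + 0.5 * torch.sin(torch.pi * xm)
u_obs = solve_poisson(kappa_true)                 # synthetic observations

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1), torch.nn.Softplus())
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for it in range(2000):
    kappa = net(xm.unsqueeze(1)).squeeze(1)       # kappa(x) ~ NN(x), positive via Softplus
    u = solve_poisson(kappa)                      # forward solve inside the graph
    loss = torch.mean((u - u_obs) ** 2)           # data misfit L(theta)
    opt.zero_grad(); loss.backward(); opt.step()  # gradients via reverse-mode AD

print(float(loss))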

SciML - PINN and co. 62


PCL - Recall: use of AD

• Optimization problem: $\min_\theta L(u)$ subject to $F(\theta, u) = 0$.

• Suppose we have a computational graph for $u = G(\theta)$.

• Then $\tilde{L}(\theta) = L(G(\theta))$ and by the IFT we can compute
the gradient with respect to θ,
) first of F,

$\frac{\partial F}{\partial \theta} + \frac{\partial F}{\partial u}\frac{\partial G}{\partial \theta} = 0
 \;\Rightarrow\;
 \frac{\partial G}{\partial \theta} = -\left(\frac{\partial F}{\partial u}\right)^{-1}\frac{\partial F}{\partial \theta}$

) then of $\tilde{L}$, by the chain rule,

$\frac{\partial \tilde{L}}{\partial \theta} = \frac{\partial L}{\partial u}\frac{\partial G}{\partial \theta}
 = -\frac{\partial L}{\partial u}\left(\frac{\partial F}{\partial u}\right)^{-1}\frac{\partial F}{\partial \theta}$

• The first derivative is obtained directly from the loss


function, the second and third by reverse-mode AD

SciML - PINN and co. 63


PCL - applications

• PCL has been successfully applied to numerous academic


and real-life problems.
) Geomechanics, solid mechanics, fluid dynamics, seis-
mic inversion
) General approach to inverse modeling: ADCME
https://kailaix.github.io/ADCME.jl/latest/ [Xu, Darve]

SciML - PINN and co. 64


PCL - applications to stiff (P)DEs

PM = Penalty Method [Xu, Darve]

[Figure: "Physics Constrained Learning for Stiff Problems", comparing the penalty
method (PM) with first-order physics constrained learning. Credit: Kailai Xu, Eric Darve.]

SciML - PINN and co. 65


PHYSICS INFORMED
LEARNING

SciML - PINN and co. 66


Physics-Informed Neural Networks -
PINN background

- Idea: put the (P)DE into the ML algorithm, via the


cost function (among others)

• Please review the Optimization Lecture for


) unconstrained optimization with penalization/regularization
) constrained optimization with Lagrange Multipliers

• Physics-informed neural network (PINN) models
[Gholami, et al., NeurIPS, 2021]—see also the Introductory
Lecture.
) The typical approach is to incorporate physical do-
main knowledge as soft constraints on an empiri-
cal loss function and use existing machine learning
methodologies to train the model.
) It can be shown that, while existing PINN method-
ologies can learn good models for relatively trivial
problems, they can easily fail to learn relevant physi-
cal phenomena even for simple PDEs.

SciML - PINN and co. 67


! analyze several distinct situations of widespread
physical interest, including learning differential
equations with convection, reaction, and diffusion
operators.
! provide evidence that the soft regularization in
PINNs, which involves differential operators, can
introduce a number of subtle problems, including
making the problem ill-conditioned.
) Importantly, it can be shown that these possible
failure modes are not due to the lack of expressivity
in the NN architecture, but that the PINN’s setup
makes the loss landscape very hard to optimize.
) Two promising solutions to address these failure
modes.
! The first approach is to use curriculum regular-
ization, where the PINN’s loss term starts from a
simple PDE regularization, and becomes progres-
sively more complex as the NN gets trained.
! The second approach is to pose the problem as
a sequence-to-sequence learning task, rather than
learning to predict the entire space-time at once.
! And there are many, many more “fixes” - this implies
the necessity to treat each case with particular

SciML - PINN and co. 68


attention, and not entertain the “magic wand”
illusion...

SciML - PINN and co. 69


PINN - solving (P)DEs with NNs

Definition 2. The process of solving a differential equa-


tion with a neural network, or using a differential equation
as a regularizer in the loss function, is known as a physics-
informed neural network (PINN), since this allows for phys-
ical equations to guide the training of the neural network
in circumstances where data might be lacking.

• Idea: use the neural network to approximate the solution


to the differential equation, while also satisfying any
other physical constraints of the problem.

) For a scalar ODE, the neural network would have


one input, which is the independent variable, and one
output, which is the dependent variable. The neural
network would be trained to minimize a loss function
that includes both
! the error between the neural network’s output and
the known solution at some data points, and
! the error between the neural network’s output and
the differential equation itself.

SciML - PINN and co. 70


) For a PDE, the neural network would have multiple
inputs, which would represent the independent vari-
ables in the PDE, and multiple outputs, which would
represent the dependent variables in the PDE. The
neural network would be trained to minimize a loss
function that includes both the error between the
neural network’s output and the known solution at
some data points, and the error between the neural
network’s output and the PDE.
) PINNs can solve both direct and inverse problems.
) Warning: As higher frequencies and more multi-
scale features are added, more collocation points
and a larger neural network with significantly more
free parameters are typically required to accurately
approximate the solution. This creates a significantly
more complex optimization problem when training
the PINN—see below for pros and cons.

SciML - PINN and co. 71


SciML - hard vs. soft constraint

• Recall: there are two possible optimization strategies for


constraining the NN (ML) to respect the physics
1. Hard constraints
2. Soft constraints

• Suppose we have a (P)DE of the form

$\mathcal{F}(u(x, t)) = 0, \quad x \in \Omega \subset \mathbb{R}^d, \; t \in [0, T],$

where
) F is a differential operator representing the (P)DE
) u(x, t) is the state variable (i.e., quantity of interest),
with (x, t) the space-time variables
) T is the time horizon and Ω is the spatial domain
(empty for ODEs)
) initial and boundary conditions must be added for the
problem to be well-posed

SciML - PINN and co. 72


• Hard constraint: solve the constrained optimization
problem

$\min_u\; L(u) \quad \text{s.t.} \quad \mathcal{F}(u) = 0,$

where
) L(u) is the data (mismatch) loss term
) F is the constraint on the residual of the (P)DE
under consideration
) as was amply discussed in the DA/inverse problem
context, this type of (P)DE-constrained optimization
is usually quite difficult to code and to solve

• Soft constraint: solve the regularized/penalized
unconstrained optimization problem

$\min_u\; L(u) + \alpha_{\mathcal{F}}\, \mathcal{F}(u), \qquad (3)$

$L(u) = L_{u_0} + L_{u_b},$

where
) $L_{u_0}$ represents the misfit of the NN predictions
) $L_{u_b}$ represents the misfit of the initial/boundary
conditions

SciML - PINN and co. 73


) θ represents the NN parameters
) $\alpha_{\mathcal{F}}$ is a regularization parameter that controls the
emphasis on the PDE-based residual (which we ideally
want to be zero)

• Finally, we use ML methods (stochastic optimization,


etc.) to train the NN model to minimize the loss.

SciML - PINN and co. 74


PINN - warnings

1. Even with a large training set, this approach does


not guarantee that the NN will obey the conserva-
tion/governing equations in the constraint (3).

2. In many SciML problems, these sorts of constraints on


the system matter, as they correspond to physical mech-
anisms of the system. For example, if the conservation
of energy equation is only approximately satisfied, then
the system being simulated may behave qualitatively
differently or even result in unrealistic solutions.

3. This approach of incorporating physics-based
regularization, where the regularization constraint, $L_{\mathcal{F}}$,
corresponds to a differential operator, is very different from
incorporating much simpler norm-based regularization
(such as L1 or L2 regularization), as is common in ML
more generally. Here, the regularization operator, $L_{\mathcal{F}}$,
is non-trivially structured—it involves a differential
operator that could actually be ill-conditioned, and it does

SciML - PINN and co. 75


not correspond to a nice convex set as is the case for a
norm ball.

4. Moreover, $L_{\mathcal{F}}$ corresponds to actual physical quantities,


and there is often an important distinction between
satisfying the constraint exactly versus satisfying the
constraint approximately—the soft constraint approach
doing only the latter.

5. Adding/increasing the PDE-based soft constraint regu-


larization makes it more complex and harder to optimize,
especially for cases with non-trivial coefficients.

6. The loss landscape changes as the regularization
parameter $\alpha_{\mathcal{F}}$ is changed. Reducing the regularization
parameter can help alleviate the complexity of the loss
landscape, but this in turn leads to poor solutions with
high errors that do not satisfy the PDE/constraint.

SciML - PINN and co. 76


PINN - steps

1. The first step is to define a neural network architecture


that can be used to approximate the solution to the
differential equation.
(a) The neural network should have an input layer, an
output layer, and one or more hidden layers.
(b) The number of neurons in each layer and the activa-
tion function used in each layer must be chosen by
the user.

2. The second step is to collect a dataset of known


solutions to the differential equation.
(a) This dataset can be generated using numerical meth-
ods, or
(b) originate from experimental data.

3. The third step is to define the loss function.


(a) The loss function is a measure of the error between
the neural network’s output and the known solutions
to the differential equation.

SciML - PINN and co. 77


(b) The loss function should be chosen so that it penalizes
the neural network for making errors in both the
spatial and temporal domains.

4. The fourth step is to train the neural network.


(a) The training process is done using an optimization
algorithm, such as gradient descent.
(b) The optimization algorithm minimizes the loss func-
tion by adjusting the weights and biases of the neural
network.

Once the neural network has been trained, and validated, it


can be used to approximate the solution to the differential
equation at any point in the domain. The accuracy of the
approximation will depend on the size and quality of the
training dataset, as well as the architecture of the neural
network.
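To make these steps concrete, here is a minimal, self-contained PINN sketch in PyTorch for a simple scalar ODE, du/dx + u = 0 with u(0) = 1 (the architecture, optimizer settings and collocation points are illustrative assumptions, not prescriptions from the slides).

# Minimal PINN sketch in PyTorch (illustrative): solve du/dx + u = 0, u(0) = 1 on [0, 2].
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x_col = torch.linspace(0.0, 2.0, 100).unsqueeze(1)           # collocation points
x0 = torch.zeros(1, 1)                                        # boundary point x = 0

for it in range(5000):
    x = x_col.clone().requires_grad_(True)
    u = net(x)
    du_dx = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    loss_pde = torch.mean((du_dx + u) ** 2)                   # PDE residual term
    loss_bc = (net(x0) - 1.0).pow(2).mean()                   # soft boundary-condition term
    loss = loss_pde + loss_bc
    opt.zero_grad(); loss.backward(); opt.step()

x_test = torch.tensor([[0.5], [1.0], [2.0]])
print(net(x_test).detach().squeeze())        # compare with exp(-x): ~0.61, 0.37, 0.14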

SciML - PINN and co. 78


PINN - formulation

The mathematical formulation of a typical PINN loss


function is as follows:

$L = \mathcal{L}_u(X_u) + \mathcal{L}_b(X_b) + \mathcal{L}_r(X_r),$

where,

• L is the global loss function.

• $\mathcal{L}_u(X_u)$ is the loss term that penalizes the error between
the neural network’s output and the known solution at
the training points $X_u$.

• $\mathcal{L}_b(X_b)$ is the loss term that penalizes the error between
the neural network’s output and the boundary conditions
at the training points $X_b$.

• $\mathcal{L}_r(X_r)$ is the loss term that penalizes the error between
the neural network’s output and the residual of the
differential equation at the training points $X_r$.

SciML - PINN and co. 79


The diagram below depicts how a PINN works:

SciML - PINN and co. 80


PINN - Diagram

SciML - PINN and co. 81


PINN Formulation - Neural Network

• Neural Network:
) a basic/adequate definition is to simply consider a
NN as a mathematical function with some learnable
parameters
) more mathematically, let the network be defined as

$NN(x, \theta) : \mathbb{R}^{d_x} \times \mathbb{R}^{d_\theta} \to \mathbb{R}^{d_u}$

where
! x are the inputs to the network
! θ are a set of learnable parameters (usually,
weights)
! $d_x$, $d_\theta$ and $d_u$ are the dimensions of the network’s
inputs, parameters and outputs, respectively.
) The exact form of the network function is determined
by the neural network’s architecture. Here we use
feedforward fully-connected networks (FCNNs),
defined as

$NN(x, \theta) = f_n \circ \cdots \circ f_i \circ \cdots \circ f_1(x, \theta),$

SciML - PINN and co. 82


where
! $x \in \mathbb{R}^{d_0}$ is the input to the FCNN
! $NN \in \mathbb{R}^{d_n}$ is the output of the FCNN
! n is the number of layers (depth) of the FCNN
! $f_i(x, \theta_i) = \sigma_i(W_i x + b_i)$ are element-wise, nonlinear
activation functions (usually ReLU or hyperbolic
tangent)
! with $\theta_i = (W_i, b_i)$,
! $W_i \in \mathbb{R}^{d_i \times d_{i-1}}$ weight matrices, $b_i \in \mathbb{R}^{d_i}$ bias
vectors, and
! $\theta = (\theta_1, \ldots, \theta_i, \ldots, \theta_n)$ is the set of learnable
parameters/weights of the network.

SciML - PINN and co. 83


Recall: FCNN Architecture and
Activation

FCNN - architecture

[Figure: FCNN architecture with an input layer, several hidden layers, and an output layer.]

SciML - PINN and co. 84


NN - neuron activation

[Figure: activation of a single neuron and of a full layer. Each neuron computes
$a_1^{(1)} = \sigma\big(w_{1,0}\,a_0^{(0)} + w_{1,1}\,a_1^{(0)} + \cdots + w_{1,n}\,a_n^{(0)} + b_1^{(0)}\big)
 = \sigma\big(\textstyle\sum_{i} w_{1,i}\,a_i^{(0)} + b_1^{(0)}\big)$;
in matrix form, $a^{(1)} = \sigma\big(W^{(0)} a^{(0)} + b^{(0)}\big)$, i.e.
$y = \sigma(Wx + b)$ for a single hidden layer.]

SciML - PINN and co. 85


PINN Formulation - (P)DE

• Recall: PINNs use neural networks to solve problems


related to differential equations

• Consider a general boundary-value problem (I)BVP of


the form

$\mathcal{D}[u](x) = f(x), \quad x \in \Omega \subset \mathbb{R}^d, \qquad (4)$

$\mathcal{B}_k[u](x) = g_k(x), \quad x \in \Gamma_k \subset \partial\Omega,$

where
) D[u](x) is a differential operator
) u(x) is the solution
) Bk (·) are a set of boundary and/or initial conditions
that ensure uniqueness of the solution
) the variable x represents/includes both spatial and
time variables
) the full equation describes many possible contexts:
linear and nonlinear, time-dependent and indepen-
dent, irregular higher-order, cyclic BCs, etc.

SciML - PINN and co. 86


) To solve (4), PINNs use a neural network to directly
approximate the solution,

NN(x, θ) ≈ u(x)

• PINN provides a functional approximation to the
solution, and not a discretized solution similar to that
provided by traditional methods such as finite difference
methods
) as such PINNs are a mesh-free approach for solving
differential equations

SciML - PINN and co. 87


PINN Formulation - Loss Function

• Loss Function: Let F = 0 be the PDE, B = 0
the boundary/initial conditions, I = 0 the inversion
conditions, then the PINN loss is

$L(\theta, \lambda; T) = w_f L_f(\theta, \lambda; T_f) + w_b L_b(\theta; T_b) + w_i L_i(\theta, \lambda; T_i)$

where

$L_f(\theta, \lambda; T_f) = \frac{1}{|T_f|} \sum_{x \in T_f} \| F(\hat{u}, x, \lambda) \|_2^2$

$L_b(\theta; T_b) = \frac{1}{|T_b|} \sum_{x \in T_b} \| B(\hat{u}, x) \|_2^2$

$L_i(\theta, \lambda; T_i) = \frac{1}{|T_i|} \sum_{x \in T_i} \| I(\hat{u}, x) \|_2^2$

and
) x are the training points,
) û the approximate solution,
) λ the inversion coefficients,

SciML - PINN and co. 88


) w the weights that ensure balance among the different
loss function terms
) T are the sets of training points for each loss term

• The solution is then given by,

$\{\theta^*, \lambda^*\} = \underset{\theta, \lambda}{\operatorname{argmin}}\; L(\theta, \lambda; T)$

• Note: solving the inverse problems requires only the


addition of one term in the loss function, and nothing
more!
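As a brief sketch (with assumed data and network sizes), making the inversion coefficient a trainable parameter and adding one data-misfit term is all that changes relative to the forward PINN; here the unknown decay rate lam in du/dx + lam u = 0 is recovered from a few synthetic observations.

# Inverse-problem sketch (illustrative): the unknown coefficient lam in
# du/dx + lam*u = 0 is made a trainable parameter and one data-misfit term is added.
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
lam = torch.nn.Parameter(torch.tensor(0.5))               # inversion coefficient
opt = torch.optim.Adam(list(net.parameters()) + [lam], lr=1e-3)

x_obs = torch.tensor([[0.0], [0.5], [1.0]])
u_obs = torch.exp(-2.0 * x_obs)                           # synthetic data, true lam = 2

x_col = torch.linspace(0.0, 1.0, 50).unsqueeze(1)
for it in range(5000):
    x = x_col.clone().requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    loss = torch.mean((du + lam * u) ** 2) + torch.mean((net(x_obs) - u_obs) ** 2)
    opt.zero_grad(); loss.backward(); opt.step()

print(float(lam))                                         # trained to approach ~2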

SciML - PINN and co. 89


PINN Formulation - Error Analysis

• Error analysis has been derived²,³ in terms of
) optimization error $e_o = \|\hat{u}_T - u_T\|$
) generalization error $e_g = \|u_T - u_F\|$
) approximation error $e_a = \|u_F - u\|$

• then

$e := \|\hat{u}_T - u\| \le e_o + e_g + e_a$

² Lu, Karniadakis, SIAM Review, 2021.
³ Mishra, Molinaro; arXiv:2006.16144v2 and IMA J. of Numerical Analysis,
Volume 43, Issue 1, January 2023, Pages 1–43.

SciML - PINN and co. 90


PINN Formulation - Loss Function (II)

• The values for the loss function are available, in general,


at discrete points, often called collocation points.

• We will write down the terms explicitly in this case, for


the direct problem, with composite loss function,

$L(\theta) = L_D(x, \theta) + L_B(x, \theta), \qquad (5)$

where
) the (P)DE residual is defined as

$L_D(x, \theta) = \frac{\alpha_I}{N_I} \sum_{i=1}^{N_I} \big( \mathcal{D}[u](x_i, \theta) - f(x_i) \big)^2$

) the (I)BC residual is defined as

$L_B(x, \theta) = \sum_{k=1}^{N_k} \frac{\alpha_B^k}{N_B^k} \sum_{i=1}^{N_B^k} \big( \mathcal{B}_k[u](x_i^k, \theta) - g_k(x_i^k) \big)^2,$

SciML - PINN and co. 91


where
! $\{x_i\}_{i=1}^{N_I}$ is a set of collocation points sampled in
the interior of the domain
! $\{x_j^k\}_{j=1}^{N_B^k}$ is a set of points sampled along each
boundary condition (where k permits to separate
Dirichlet, Neumann, mixed, and initial conditions)
! $\alpha_I$ and $\alpha_B^k$ are well-chosen scalar weights, chosen
by suitable tuning methods, that ensure the terms
in the loss function are well-balanced.

SciML - PINN and co. 92


PINN Formulation - Loss Function
(III)

We can see, intuitively, that

• by minimizing the (P)DE residual, the method tries to


ensure that the solution learned by the network obeys
the underlying PDE, and

• by minimizing the (I)BC residual, the method tries to


ensure that the learned solution is unique by matching
it to the BCs.

• Note: a sufficient number of collocation and boundary


points must be chosen such that the PINN is able to
learn a consistent solution across the domain.

As usual (see Optimization Lecture), iterative schemes are


typically used to optimize this loss function

• variants of the stochastic gradient descent (SGD)


method, such as the Adam optimizer, or

SciML - PINN and co. 93


• quasi-Newton methods, such as the limited-memory
Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algo-
rithm are employed.

Note:

• These methods require the computation of the gradient


of the loss function with respect to the network param-
eters, which can computed easily and efficiently using
automatic differentiation provided systematically in all
modern deep learning libraries

• Note also that gradients of the network output with re-


spect to its inputs are also typically required to evaluate
the PDE residual in the loss function, and can similarly
be obtained and further differentiated through to update
the network’s parameters using automatic differentiation
once again.

Recall the global flowchart for PINN:

SciML - PINN and co. 94


Here is a case with a NN with 2 hidden layers:

[Figure 2.3: Schematic of a physics-informed neural network (PINN). A PINN is a neural
network, NN(x; θ), with trainable parameters θ, which directly approximates the solution,
u(x), to a differential equation, i.e. NN(x; θ) ≈ u(x). To train the PINN, a loss function is
used which is composed of two terms, one termed the “boundary” loss, which tries to match
the PINN solution to the known solution (and/or its derivatives) along the boundaries of
the domain, and another termed the “physics” loss, which tries to minimise the residual
of the underlying equation at a set of locations within the domain. The derivatives of
the PINN solution with respect to its inputs required by the boundary and physics losses
are obtained using autodifferentiation. PINNs can also be used for inverse problems, in
which case an additional “data” loss is added which compares the PINN solution with
known solution values at additional locations within the domain, and parameters of the
underlying equation are jointly optimised alongside θ.]

SciML - PINN and co. 95
underlying equation are jointly optimised alongside .
PINN - pros and cons

• Here are some of the advantages of using PINNs to


solve differential equations:
✓ They can be used to solve a wide variety of differential
equations, including ODEs and PDEs.
✓ They can be used to solve problems with complex
geometries and non-linear behavior.
✓ They are essentially mesh-free.
✓ They can be trained to be very accurate, even with
limited data and noisy data.
✓ They are relatively easy to implement, leveraging AD
capabilities.
✓ When they work, they can provide impressive speed-ups
of 3 to 4 orders of magnitude.

• Here are some of the disadvantages of using PINNs to


solve differential equations:
✗ They can produce a horrendous optimization
problem.

SciML - PINN and co. 96


✗ They can be computationally expensive to train.
✗ They have difficulty with high frequencies and
multiple scales.
✗ They can be sensitive to the choice of
hyperparameters, in particular to the network
architecture and size.
✗ They can be difficult to interpret, as the neural
network may learn a complex relationship between
the inputs and outputs that is not easily understood.

• Overall, PINNs are a promising new approach to solving


differential equations.
) They provide powerful tools that can be used to solve
a wide variety of problems, but they also have some
severe limitations.

SciML - PINN and co. 97


PINN - remedies

• A downside of training PINNs with the loss function


given by (5) is that the BCs are softly enforced.
) This means the learned solution may deviate from the
BCs because the BC term may not be fully minimized.
) Furthermore, it can be challenging to balance the
different objectives of the PDE and BC terms in the
loss function, which can lead to poor convergence
and solution accuracy.

• One possibility is to enforce BCs in a hard fashion by


using the neural network as part of a solution ansatz.
This will be shown in some of the examples below.

• Many other “fixes” have been formulated (see references,


in particular arXiv, where new solutions appear almost
daily...)

SciML - PINN and co. 98


PINN Remedies - enforcing hard BCs

• Idea: use the NN as part of a solution ansatz, that


by definition satisfies the BC, thus avoiding the soft
constraint on LB in (5)

• More precisely, we approximate the solution of the


(P)DE by
$\mathcal{C}[u](x, \theta) \approx u(x, \theta)$

where $\mathcal{C}$ is an appropriately selected constraining
operator that analytically/exactly enforces the BCs

• Example:

) suppose we want to enforce

u(x = 0) = 0

in a scalar ODE

SciML - PINN and co. 99


) The constraining operator and solution ansatz could
be chosen as

$\mathcal{C}[u](x, \theta) = (\tanh x)\, u(x, \theta)$

or any other function whose value at x = 0 is zero


! the function tanh(x) is zero at 0, forcing the BC
to always be obeyed, but non-zero away from 0,
allowing the network to learn the solution away
from the BC.

• With this approach, the BCs are always satisfied and


therefore the BC term in the loss function (5) can be
removed,
) the PINN can be trained using the simpler uncon-
strained loss function,

$L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \big( \mathcal{D}[\mathcal{C}u](x_i, \theta) - f(x_i) \big)^2$

where $\{x_i\}_{i=1}^{N}$ is a set of collocation points sampled
in the interior of the domain
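A brief, illustrative PyTorch fragment of this ansatz trick (the ODE du/dx = cos(x) with u(0) = 0 and the network size are assumptions chosen for clarity): the tanh factor enforces the BC exactly, so the loss contains only the PDE residual.

# Hard-BC ansatz sketch (illustrative): with u_hat(x) = tanh(x) * NN(x),
# the condition u(0) = 0 holds by construction; only the PDE residual is trained.
import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

def u_hat(x):
    return torch.tanh(x) * net(x)        # constraining operator C applied to the network

x = torch.linspace(0.0, 1.0, 50).unsqueeze(1).requires_grad_(True)
u = u_hat(x)
du_dx = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
loss = torch.mean((du_dx - torch.cos(x)) ** 2)   # residual of du/dx = cos(x), u(0) = 0
loss.backward()                                   # no boundary term needed in the loss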

SciML - PINN and co. 100


• Notes:
) There is no unique way of choosing the constraining
operator, and the definition of a suitable constraining
operator for complex geometries and/or complex BCs
may be difficult or sometimes even impossible, i.e.,
this strategy is problem dependent; in this case,
one may resort to the soft enforcement of boundary
conditions (5) instead.
) A promising approach for alleviating the difficulties
when higher frequencies and multi-scale features are
added to the solution, is to use a finite basis PINN ap-
proach (FBPINN) [Dolean, Heinlein, Mishra, Moseley
2023], where instead of using a single neural network
to represent the solution, many smaller neural net-
works are confined in overlapping subdomains and
summed together to represent the solution.

SciML - PINN and co. 101


Example: IBVP for Diffusion Equation

Compute $u(x, t) : \Omega \times [0, T] \to \mathbb{R}$ such that

$\frac{\partial u(x, t)}{\partial t} - \nabla \cdot \big(\kappa(x) \nabla u(x, t)\big) = f(x, t) \quad \text{in } \Omega \times (0, T), \qquad (6)$

$u(x, t) = g_D(x, t) \quad \text{on } \partial\Omega_D \times (0, T),$

$\kappa(x) \nabla u(x, t) \cdot n = g_R(x, t) \quad \text{on } \partial\Omega_R \times (0, T),$

$u(x, 0) = u_0(x) \quad \text{for } x \in \Omega.$

Note that κ(x) is, in general, a tensor (matrix) with
elements $\kappa_{ij}$.

• Direct problem: given κ, compute u.

• Inverse problem: given u, compute κ.

SciML - PINN and co. 102


PINN for the Diffusion Equation

[Figure: Procedure 2.1 and Fig. 1 from Lu, Meng, Mao, Karniadakis, SIAM Review, 2021:
schematic of a PINN for solving the diffusion equation $\partial u/\partial t = \partial^2 u/\partial x^2$ with mixed
boundary conditions (BC) $u(x, t) = g_D(x, t)$ on $\Gamma_D \subset \partial\Omega$ and
$\partial u/\partial n\,(x, t) = g_R(u, x, t)$ on $\Gamma_R \subset \partial\Omega$. The initial condition (IC) is treated as a
special type of boundary condition. $T_f$ and $T_b$ denote the two sets of residual points for
the equation and the BC/IC.]

Procedure 2.1 (the PINN algorithm for solving differential equations):
Step 1 Construct a neural network û(x; θ) with parameters θ.
Step 2 Specify the two training sets $T_f$ and $T_b$ for the equation and boundary/initial
conditions.
Step 3 Specify a loss function by summing the weighted L2 norms of both the PDE
and boundary condition residuals.
Step 4 Train the neural network to find the best parameters θ* by minimizing the
loss function L(θ; T).

• Use a FCNN to approximate u at the selected points x, with training data at the
residual points $T_f$ and $T_b$.

• Use AD to compute derivatives for the PDE and the boundary/initial conditions.

• Minimize the augmented, weighted loss function

$L(\theta; T) = w_f L_f(\theta; T_f) + w_b L_b(\theta; T_b)$

[Credit: Lu, Karniadakis, SIAM Review, 2021]

SciML - PINN and co. 103
PHYSICS OPERATOR
LEARNING

SciML - PINN and co. 104


Recall: Universal Approximation for
Operators

Theorem 3 (Chen, Chen 1995). Suppose σ is continuous,
non-polynomial, X is a Banach space, $K_1 \subset X$,
$K_2 \subset \mathbb{R}^d$ are compact sets, V is compact in $C(K_1)$, G is a
continuous operator from V into $C(K_2)$. Then, for any
$\epsilon > 0$, there exist positive integers m, n, p, constants
$c_i^k, \xi_{ij}^k, \theta_i^k, \zeta_k \in \mathbb{R}$, $w_k \in \mathbb{R}^d$, $x_j \in K_1$, such that

$\left| G(u)(y) - \sum_{k=1}^{p} \sum_{i=1}^{n} c_i^k\, \sigma\!\Big( \sum_{j=1}^{m} \xi_{ij}^k\, u(x_j) + \theta_i^k \Big)\, \sigma(w_k \cdot y + \zeta_k) \right| < \epsilon$

for all $u \in V$, $y \in K_2$.

SciML - PINN and co. 105


Operator Learning - DeepONet, FNO,
PINO

• Idea: train a NN to learn the (P)DE operator


) used by NVIDIA in FourCastNet

• A related set of approaches which incorporate governing


equations into their loss function are physics-informed
neural operators (PINO).

• These are neural networks which are similar to PINNs in


that they are designed to learn the solution to differen-
tial equations, but instead of learning a single solution
they learn an entire family of solutions by adding cer-
tain inputs of the differential equation as inputs to the
network.

• Thus, they do not need to be retrained to carry out


new simulations, and during inference they offer a fast
surrogate model.

SciML - PINN and co. 106


• From a mathematical standpoint, the goal is to learn
an operator to map function spaces to function spaces,
rather than just a single function.

• DeepONet consists of two sub-networks,


) a “branch” network that encodes the input function
(by taking discretised samples of the input function
at fixed locations as input), and
) a “trunk” network that encodes a set of input coordi-
nates.
) The solution of the differential equation is then ap-
proximated by merging the outputs of both of these
sub-networks.
) The network is trained using a loss function that
extends the PINN loss function (5) by averaging over
many random samples of the input function.

• FNO (Fourier Neural Operator): the PINO approach uses a physics-informed loss function when training Fourier neural operators.
) Similar to DeepONets, FNOs learn an operator to
map between function spaces.
) This is achieved by using a series of stacked Fourier

SciML - PINN and co. 107


layers, where the input to each layer is Fourier trans-
formed and truncated to a fixed number of Fourier
modes.
) This truncation allows the model to learn mappings
that are invariant to the number of discrete points
used in its inputs and outputs.
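A minimal sketch of one such Fourier layer (layer sizes and initialisation are assumptions; this is not the reference FNO code): the input is Fourier transformed, truncated to a fixed number of modes, multiplied by learnable complex weights, and transformed back.

```python
import torch

class SpectralConv1d(torch.nn.Module):
    """Sketch of a 1D Fourier layer: FFT, keep the lowest `modes` frequencies,
    mix channels with learnable complex weights, inverse FFT."""
    def __init__(self, channels: int, modes: int):
        super().__init__()
        self.modes = modes
        scale = 1.0 / channels
        self.weights = torch.nn.Parameter(
            scale * torch.randn(channels, channels, modes, dtype=torch.cfloat))

    def forward(self, x):                       # x: (batch, channels, n_points)
        x_ft = torch.fft.rfft(x)                # (batch, channels, n_points//2 + 1)
        out_ft = torch.zeros_like(x_ft)
        # mix channels mode-by-mode on the retained (truncated) frequencies only
        out_ft[:, :, :self.modes] = torch.einsum(
            "bim,iom->bom", x_ft[:, :, :self.modes], self.weights)
        return torch.fft.irfft(out_ft, n=x.size(-1))

# The truncation is what makes the layer independent of the discretisation: the same
# weights act on inputs sampled on grids of any (sufficiently fine) resolution.
layer = SpectralConv1d(channels=8, modes=12)
y = layer(torch.randn(4, 8, 64))
```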

SciML - PINN and co. 108


Operator Nets

• Use the Universal Operator Approximation Theorem...

$$\left| G(u)(y) - \sum_{k=1}^{p} \sum_{i=1}^{n} c_i^k \underbrace{\sigma\!\left( \sum_{j=1}^{m} \xi_{ij}^k\, u(x_j) + \theta_i^k \right)}_{\text{branch}} \underbrace{\sigma\!\left( w_k \cdot y + \zeta_k \right)}_{\text{trunk}} \right| < \epsilon,$$

where
) G is the solution operator,
) u is an input function,
) xi are “sensor” points,
) y are random points where we evaluate the output
function G(u).

• 2 main contenders:
) DeepONet
) Fourier Neural Operators (FNO)—a special case of
DeepONet

SciML - PINN and co. 109


DeepONet Architecture

[Figure: DeepONet architecture. A branch network encodes the input function through its samples u(x_1), ..., u(x_m) at fixed sensor locations; a trunk network encodes the evaluation coordinates y; the two outputs are merged to produce G(u)(y).]

Directly copied from the Theorem!

SciML - PINN and co. 110


DeepONet Loss Function

• Branch (FCNN, ResNet, CNN, etc.) and trunk networks (FCNN) are merged by an inner product.

• The prediction of the output function G(u), evaluated at points y, is then given by

$$G_\theta(u)(y) = \sum_{k=1}^{q} \underbrace{b_k(u(\mathbf{x}))}_{\text{branch}} \, \underbrace{t_k(y)}_{\text{trunk}} + b_0.$$

• The trainable weights and biases, θ, are computed by minimizing the loss (mini-batch with Adam, full batch with L-BFGS)

$$L_o(\theta) = \frac{1}{NP} \sum_{i=1}^{N} \sum_{j=1}^{P} \left| G_\theta(u^{(i)})(y_j^{(i)}) - G(u^{(i)})(y_j^{(i)}) \right|^2.$$
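A minimal PyTorch sketch of this architecture and loss (the layer sizes, number of sensors and toy data are assumptions, not the DeepXDE implementation of [3, 4]): the branch encodes u sampled at m fixed sensors, the trunk encodes an evaluation point y, and the two are merged by an inner product.

```python
import torch

m, q = 100, 64                                   # sensors, latent dimension (assumed)

branch = torch.nn.Sequential(                    # b_k(u(x_1),...,u(x_m)), k = 1..q
    torch.nn.Linear(m, 128), torch.nn.Tanh(), torch.nn.Linear(128, q))
trunk = torch.nn.Sequential(                     # t_k(y), k = 1..q
    torch.nn.Linear(1, 128), torch.nn.Tanh(), torch.nn.Linear(128, q))
b0 = torch.nn.Parameter(torch.zeros(1))

def G_theta(u_sensors, y):
    """u_sensors: (N, m) sampled input functions; y: (N, P, 1) evaluation points."""
    b = branch(u_sensors)                        # (N, q)
    t = trunk(y)                                 # (N, P, q)
    return torch.einsum("nq,npq->np", b, t) + b0 # inner product over the latent index k

# Operator loss L_o: mean squared error against the true outputs G(u^(i))(y_j^(i)).
u_batch = torch.randn(16, m)                     # toy data standing in for paired observations
y_batch = torch.rand(16, 50, 1)
s_batch = torch.randn(16, 50)
loss = ((G_theta(u_batch, y_batch) - s_batch) ** 2).mean()
loss.backward()                                  # then step with Adam (or L-BFGS, full batch)
```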

SciML - PINN and co. 111


DeepONet: Pros and Cons

• Pros:
  ✓ relatively fast training (compared to PINN)
  ✓ can overcome the curse of dimensionality (in some cases...)
  ✓ suitable for multiscale and multiphysics problems

• Cons:
  ✗ no guarantee that physics is respected
  ✗ require large training sets of paired input-output observations (expensive!)

SciML - PINN and co. 112


DeepONet Formulation (I)

• Parametric, linear/nonlinear operator plus IBC (IBVP)

O(u, s) = 0,
B(u, s) = 0,

• where
) u ∈ U is the input function (parameters),
) s ∈ S is the hidden solution function

• If ∃! solution s = s(u) ∈ S to the IBVP, then we can define the solution operator G : U → S by

G(u) = s(u).

SciML - PINN and co. 113


DeepONet Formulation (II)

• Approximate the solution map G by a DeepONet G_θ,

$$G_\theta(u)(y) = \sum_{k=1}^{q} \underbrace{b_k(u(\mathbf{x}))}_{\text{branch}} \, \underbrace{t_k(y)}_{\text{trunk}} + b_0,$$

where θ represents all the trainable weights and biases, computed by minimizing the loss at a set of P random output points {y_j}_{j=1}^{P},

$$L(u, \theta) = \frac{1}{P} \sum_{j=1}^{P} \left| G_\theta(u)(y_j) - s(y_j) \right|^2,$$

and s(y_j) is the PDE solution evaluated at the P locations in the domain of G(u).

SciML - PINN and co. 114


DeepONet Formulation (III)

• To obtain a vector output, a stacked version is defined


by repeated sampling over i = 1, . . . , N, giving the
overall operator loss

$$L_o(\theta) = \frac{1}{NP} \sum_{i=1}^{N} \sum_{j=1}^{P} \left| G_\theta(u^{(i)})(y_j^{(i)}) - s^{(i)}(y_j^{(i)}) \right|^2$$
N P i=1 j=1

SciML - PINN and co. 115


DeepONet + PINN = PI-DeepONet

• We can combine the two, to get the best of both worlds

L(θ) = w_f L_f(G_θ(u)(y)) + w_b L_b(G_θ(u)(y)) + w_o L_o(G_θ(u)(y))

• Results:⁴

) no need for paired input-output observations, just samples of the input function and the BC/IC (self-supervised learning)
) respects the physics
) improved predictive accuracy
) ideal for parametric PDE studies: optimization, parameter estimation, screening, etc.

⁴ Wang, Wang, Bhouri, Perdikaris. arXiv:2103.10974v1, arXiv:2106.05384, arXiv:2110.01654, arXiv:2110.13297.

SciML - PINN and co. 116


DeepONet

[Figure: PI-DeepONet schematic. The branch and trunk networks of a DeepONet feed both a PDE (physics) loss and a BC (operator) loss, which are minimized jointly. Credit: Wang, Wang, Perdikaris; arXiv, 2021.]

• Train by minimizing the composite loss

L(θ) = L_o(θ) + L_φ(θ),

where

) the operator loss is as above for DeepONet, or, using the IBC,

$$L_o(\theta) = \frac{1}{NP} \sum_{i=1}^{N} \sum_{j=1}^{P} \left| \mathcal{B}\!\left( u^{(i)}(x_j^{(i)}),\, G_\theta(u^{(i)})(y_j^{(i)}) \right) \right|^2$$

SciML - PINN and co. 117


) the physics loss is computed using the operator-network approximate solution,

$$L_\varphi(\theta) = \frac{1}{NQ} \sum_{i=1}^{N} \sum_{j=1}^{Q} \left| \mathcal{O}\!\left( u^{(i)}(x_j^{(i)}),\, G_\theta(u^{(i)})(y_j^{(i)}) \right) \right|^2$$

• This is self-supervised, and does not require paired


input-output observations!
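A minimal sketch of such a self-supervised PI-DeepONet loss, written for the toy antiderivative operator ds/dy = u(y), s(0) = 0 (the example used in the Wang et al. papers cited above). The network sizes, the random input functions and the sampling below are assumptions for illustration.

```python
import torch

m, q = 50, 40
x_sensors = torch.linspace(0, 1, m)                          # fixed sensor locations
branch = torch.nn.Sequential(torch.nn.Linear(m, 64), torch.nn.Tanh(), torch.nn.Linear(64, q))
trunk = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.Tanh(), torch.nn.Linear(64, q))

def G_theta(u_sensors, y):                                   # u_sensors: (N, m), y: (N, P, 1)
    return torch.einsum("nq,npq->np", branch(u_sensors), trunk(y))

# A batch of sampled input functions u (simple random sine profiles, an assumption).
N, P = 8, 32
coeff = torch.randn(N, 1)
u_sens = coeff * torch.sin(torch.pi * x_sensors)             # u^(i) at the sensors
y = torch.rand(N, P, 1, requires_grad=True)                  # residual points for the physics loss
u_at_y = coeff.unsqueeze(-1) * torch.sin(torch.pi * y)       # u^(i)(y_j), needed by the residual

s = G_theta(u_sens, y)                                       # (N, P)
ds_dy = torch.autograd.grad(s, y, torch.ones_like(s), create_graph=True)[0]

physics_loss = ((ds_dy.squeeze(-1) - u_at_y.squeeze(-1)) ** 2).mean()
bc_loss = (G_theta(u_sens, torch.zeros(N, 1, 1)) ** 2).mean()  # enforce s(0) = 0
loss = physics_loss + bc_loss                                # no paired input-output data used
loss.backward()
```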

SciML - PINN and co. 118


OTHER METHODS

SciML - PINN and co. 119


Differentiable Physics

A potentially powerful hybrid approach is to open up the


black box of a traditional algorithm and tightly integrate
ML models within it.

• This allows a more granular way of balancing the two


paradigms;
) ML can be inserted where we are unsure how to
solve a problem, or where the traditional workflow is
computationally expensive, and
) traditional components can be kept where we require
robust and interpretable outputs.

• Often, the performance of the traditional workflow is


improved whilst the ML components are easier to train,
are more interpretable, and require fewer parameters and
training data compared to a naive ML approach.

• A general approach for doing so is to use concepts from


the field of differentiable physics

SciML - PINN and co. 120


) many traditional scientific algorithms can be written
as a composition of basic and differentiable math-
ematical operations (such as matrix multiplication,
addition, subtraction, etc), and that
) modern automatic differentiation and differential pro-
gramming languages [Baydin et al., 2018] make it
easy to track and backpropagate the gradients of
these outputs with respect to their inputs.
) This unlocks the possibility of inserting and train-
ing gradient-based ML components (such as neural
networks) within traditional workflows, whereas oth-
erwise it may have been difficult to do so.

• A simple approach to start with is to re-implement


a traditional workflow inside a modern differentiable
programming language, such as PyTorch, TensorFlow or
JAX.

• Once a traditional algorithm is implemented, its de-


sign can be altered by treating certain parameters as
learnable, or by inserting new learned components.
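A minimal sketch of the simplest version of this idea (toy 1D heat equation, assumed periodic grid, synthetic data): a traditional explicit finite-difference solver is re-implemented in PyTorch and its diffusivity is treated as a learnable parameter, trained end-to-end by differentiating through every solver step.

```python
import torch

n, dx, dt, steps = 64, 1.0 / 64, 1e-4, 200

def solve(u0, kappa):
    u = u0
    for _ in range(steps):                       # explicit Euler + centred Laplacian
        lap = (torch.roll(u, 1) - 2 * u + torch.roll(u, -1)) / dx**2
        u = u + dt * kappa * lap
    return u

x = torch.linspace(0.0, 1.0 - dx, n)             # periodic grid
u0 = torch.sin(2 * torch.pi * x)
with torch.no_grad():                            # synthetic "observation", true kappa = 1
    u_obs = solve(u0, torch.tensor(1.0))

kappa = torch.nn.Parameter(torch.tensor(0.5))    # unknown physical parameter, made learnable
opt = torch.optim.Adam([kappa], lr=1e-2)
for it in range(200):
    opt.zero_grad()
    loss = ((solve(u0, kappa) - u_obs) ** 2).mean()
    loss.backward()                              # gradients flow through every solver step
    opt.step()
```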

SciML - PINN and co. 121


Neural ODEs

[Figure: Schematic of a neural ordinary differential equation (ODE), after Fig. 2.4 of [Moseley2022]. A neural network NN(u, t; θ) represents the unknown right-hand side du/dt = NN(u, t; θ); an ODE solver advances the input u(t = t_0) to a prediction û(t = t_1; θ), and a loss function matches this prediction to the true solution u(t = t_1).]

• The goal of a neural ODE is to learn the right-hand side term of an unknown ODE.

• A neural network NN(u, t; θ) is used to represent this term, which is trained by using many examples of the solution of the ODE at two times, u(t = t_0) and u(t = t_1).

• Neural differential equations [Chen et al., 2018; Rackauckas et al., 2020] combine neural networks with traditional numerical solvers in order to discover terms in the underlying equations; they were first proposed in the field of ML by Chen et al. in the context of continuous-depth models.

SciML - PINN and co. 122
• More specifically, a standard ODE solver is used to
model the solution of the ODE, u(t = t1), at time
t = t1 given the solution at time t = t0 and evaluations
of the network where needed.

• Then, the network's free parameters, θ, are updated by matching this estimated solution with the true solution and differentiating through the entire ODE solver.
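A minimal sketch of this training loop (assumed dimensions, toy data, and a hand-rolled RK4 integrator rather than a library solver such as torchdiffeq): the network plays the role of the unknown right-hand side, and its parameters receive gradients by backpropagating through the whole time integration.

```python
import torch

f_theta = torch.nn.Sequential(                  # NN(u, t; θ), the learned right-hand side
    torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

def rhs(u, t):
    return f_theta(torch.cat([u, t.expand_as(u)], dim=1))

def odeint_rk4(u0, t0, t1, n_steps=20):
    """Integrate du/dt = NN(u, t; θ) from t0 to t1 with classical RK4."""
    h = (t1 - t0) / n_steps
    u, t = u0, t0
    for _ in range(n_steps):
        k1 = rhs(u, t)
        k2 = rhs(u + 0.5 * h * k1, t + 0.5 * h)
        k3 = rhs(u + 0.5 * h * k2, t + 0.5 * h)
        k4 = rhs(u + h * k3, t + h)
        u = u + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t = t + h
    return u

# Training data: many example pairs (u(t0), u(t1)); toy random values here.
u_t0, u_t1 = torch.randn(64, 1), torch.randn(64, 1)
t0, t1 = torch.tensor(0.0), torch.tensor(1.0)
opt = torch.optim.Adam(f_theta.parameters(), lr=1e-3)
for epoch in range(100):
    opt.zero_grad()
    loss = ((odeint_rk4(u_t0, t0, t1) - u_t1) ** 2).mean()   # match the true solution at t1
    loss.backward()                                          # differentiate through the solver
    opt.step()
```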

SciML - PINN and co. 123


Other Approaches

• Recurrent NNs - see FIDL example

• Material design (META) uses Graph NNs

• GP and Ridge regression - used by Mendez

• LSTM

• Encoder-decoder

• in fact the list is never-ending...

• and now there is generative learning!

SciML - PINN and co. 124


GENERATIVE PHYSICS
LEARNING

SciML - PINN and co. 125


GPT

• GPT = Generative Pre-trained Transformer

• Theory:
) ingest huge volumes of data
) “fill in the gaps” using Markov Chains on tokens

• Applications
) NWP + Climatology (ClimaX by Microsoft)
) healthcare and drug-design (Alpha-Fold)
) etc.

• Quo Vadimus??? See Introductory and Ethics Lectures


for more details.

SciML - PINN and co. 126


APPLICATIONS

SciML - PINN and co. 127


Applications of ML for (P)DEs

• Literally, from ALL domains...

• See next lecture.

SciML - PINN and co. 128


Bibliography-Reviews

References

[1] S A Faroughi, N Pawar, C Fernandes, M. Raissi,


S. Das, N K Kalantari, S K Mahjour. Physics-
Guided, Physics-Informed, and Physics-Encoded Neu-
ral Networks in Scientific Computing. arXiv, 2023.
https://arxiv.org/pdf/2211.07377

[2] S. Cuomo, V. Schiano Di Cola, F. Giampaolo, G.


Rozza, M. Raissi, F. Piccialli. Scientific Machine Learn-
ing Through Physics–Informed Neural Networks: Where
we are and What’s Next. Journal of Scientific Comput-
ing (2022) 92:88.

[3] L Lu, P Jin, G Pang, Z Zhang, GE Karniadakis. Learn-


ing nonlinear operators via DeepONet based on the
universal approximation theorem of operators. Nature
Machine Intelligence 3 (3), 218-229 (2021).

SciML - PINN and co. 129


[4] L. Lu, X. Meng, Z. Mao, G. Karniadakis. DeepXDE:
A Deep Learning Library for Solving Differential Equa-
tions. SIAM Review, 63, 1 (2021).

[5] E. Darve, K. Xu. Physics constrained learning for data-


driven inverse modeling from sparse observations. J. of
Computational Physics, 453. (2022).

[6] M. Raissi, P. Perdikaris, G. E Karniadakis. Physics-


informed neural networks: A deep learning framework
for solving forward and inverse problems involving non-
linear partial differential equations. J. of Computational
Physics, 378, pp. 686-707, 2019.

[7] N. Kovachki, Z. Li, B. Liu, K. Bhattacharya, A. Stuart,


A. Anandkumar. Neural Operator: Learning Maps Be-
tween Function Spaces With Applications to PDEs. J.
of Machine Learning Research 24 (2023).

[8] M. Asch, M. Bocquet, M. Nodet. Data Assimilation:


Theory, Algorithms and Applications. SIAM, 2016.

[9] M. Asch. Digital Twins: from Model-Based to Data-


Driven. SIAM, 2022.

SciML - PINN and co. 130
