05 SciML - PINN
ML/NN
2023
) JAX,
) PyTorch,
) TensorFlow.
• Curse of dimensionality
$$G(x) = \sum_{j=1}^{N} \alpha_j \,\sigma(y_j \cdot x + \theta_j)$$
$$\mathcal{M}(\sigma) = \operatorname{span}\left\{ \sigma(w \cdot x + b) : w \in \mathbb{R}^d,\ b \in \mathbb{R} \right\},$$
is dense in $C^{m_1,\ldots,m_s}(\mathbb{R}^d) = \bigcap_{i=1}^{s} C^{m_i}(\mathbb{R}^d)$.
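To make the universal-approximation form above concrete, here is a minimal random-feature sketch (not from the lecture; the target function, unit count and sampling ranges are illustrative assumptions): the inner parameters $y_j, \theta_j$ are drawn at random and only the outer coefficients $\alpha_j$ are fitted by least squares.

```python
# Minimal illustration of G(x) = sum_j alpha_j * sigma(y_j * x + theta_j):
# random inner parameters, outer coefficients alpha fitted by least squares.
# Target function, N, and sampling ranges are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N = 50                                        # number of hidden units
x = np.linspace(-3, 3, 200)                   # 1D inputs
f = np.sin(x) + 0.5 * np.cos(3 * x)           # target function to approximate

y_j = rng.normal(size=N)                      # random inner weights
theta_j = rng.uniform(-3, 3, size=N)          # random inner biases
Phi = np.tanh(np.outer(x, y_j) + theta_j)     # features sigma(y_j*x + theta_j), shape (200, N)

alpha, *_ = np.linalg.lstsq(Phi, f, rcond=None)   # fit outer coefficients
G = Phi @ alpha
print("max abs error:", np.abs(G - f).max())
```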
[Figure: schematic of a feedforward neural network with inputs x1, x2, x3, x4 and a single output y]
• Others:
) Neural ODEs
) Differentiable physics
) Koopman theory
• An alternative classification:
) constrained (including PINN and PCL)
) encoded (DL-type architectures)
) operator-based (DeepONet, FNO)
$$\frac{d}{dt}x(t) = f(x(t)), \qquad (1)$$
where
) vector x(t) denotes the state of a system at time t,
) the function f represents the dynamic constraints that define the equations of motion of the system, such as Newton's second law,
) the dynamics can be generalized to include parameterization, time dependence, and forcing.
$$X = \begin{bmatrix} x^T(t_1) \\ x^T(t_2) \\ \vdots \\ x^T(t_m) \end{bmatrix} = \begin{bmatrix} x_1(t_1) & x_2(t_1) & \cdots & x_n(t_1) \\ x_1(t_2) & x_2(t_2) & \cdots & x_n(t_2) \\ \vdots & \vdots & \ddots & \vdots \\ x_1(t_m) & x_2(t_m) & \cdots & x_n(t_m) \end{bmatrix}$$
(rows indexed by time, columns by state component)
and similarly for Ẋ
$$\dot{X} = \Theta(X)\,\Xi,$$
where $\Theta(X)$ is a library of candidate functions evaluated on the data and each column $\xi_k$ of $\Xi$ is a sparse vector of coefficients that selects the active terms in $\dot{x}_k = f_k(x)$ in Eq. 1.
• Thus,
$$\dot{x}_k = f_k(x) = \Theta(x^T)\,\xi_k$$
• Code:
https://faculty.washington.edu/kutz/page26/
• Paper:
https://www.pnas.org/doi/10.1073/pnas.1906995116
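As a hedged illustration of the $\dot{X} = \Theta(X)\,\Xi$ idea (the toy system, library and threshold below are my assumptions, not the linked code), the sketch identifies $\dot{x}_1 = -2 x_1$, $\dot{x}_2 = x_1 x_2$ from data with a small polynomial library and sequentially thresholded least squares.

```python
# Sketch of the SINDy idea: build a library Theta(X) of candidate terms and
# solve X_dot = Theta(X) Xi with sequentially thresholded least squares.
# Toy system, library, and threshold are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
m = 500
X = rng.uniform(-1, 1, size=(m, 2))                           # sampled states x = (x1, x2)
X_dot = np.column_stack([-2 * X[:, 0], X[:, 0] * X[:, 1]])    # exact derivatives

# Candidate library Theta(X): [1, x1, x2, x1^2, x1*x2, x2^2]
Theta = np.column_stack([np.ones(m), X[:, 0], X[:, 1],
                         X[:, 0]**2, X[:, 0] * X[:, 1], X[:, 1]**2])

# Sequentially thresholded least squares for the sparse coefficients Xi
Xi, *_ = np.linalg.lstsq(Theta, X_dot, rcond=None)
for _ in range(10):
    Xi[np.abs(Xi) < 0.1] = 0.0                   # threshold small coefficients
    for k in range(X_dot.shape[1]):              # refit the surviving terms
        big = np.abs(Xi[:, k]) >= 0.1
        if big.any():
            Xi[big, k], *_ = np.linalg.lstsq(Theta[:, big], X_dot[:, k], rcond=None)

print(np.round(Xi, 3))   # expected: -2 on x1 (first column), 1 on x1*x2 (second column)
```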
• Approaches
) encode certain physical variables, for example using
an LSTM for intermediate variables
) encode symmetries, such as translational and rotational invariance; this can be easily achieved with convolutional neural networks (CNNs)
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( f(x_i) - s(x_i) \right)^2$$
$$\hat{y}(x) = \mathrm{Model}(x)$$
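As a toy illustration of surrogate fitting with this MSE criterion (the "expensive" function, the surrogate family and the sample sizes are illustrative assumptions):

```python
# Sketch of surrogate fitting: replace an "expensive" model f by a cheap
# surrogate s and measure the mean squared error on held-out points.
# The expensive model and the surrogate degree are illustrative assumptions.
import numpy as np

def f(x):                                   # stand-in for an expensive simulator
    return np.exp(-x**2) * np.sin(4 * x)

rng = np.random.default_rng(0)
x_train = rng.uniform(-2, 2, 40)
coeffs = np.polyfit(x_train, f(x_train), deg=9)   # cheap polynomial surrogate s(x)
s = np.poly1d(coeffs)

x_test = np.linspace(-2, 2, 200)
mse = np.mean((f(x_test) - s(x_test))**2)
print(f"surrogate MSE on test points: {mse:.2e}")
```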
Conclusion
• Questions:
) But, can such a surrogate faithfully capture the complex, nonlinear relationships between input and output?
) And, if so, how can the surrogate do this?
• Usually, one should try a few, and then settle for the one or two that are the simplest, but provide adequate precision and especially robustness in the face of the inherent uncertainties in the underlying processes; see also the Ethics and Bias Lectures.
1. Adversarial and self-supervised are also possible, but far more complicated to implement.
[Figure: typical ML workflow: Data → EDA → Regression/Classification (e.g., RF, NN, k-means, SVM) → Optimal Design]
$$F(u; \theta) = 0 \qquad (2)$$
$$u_{\mathrm{obs}} = \{ u(x_i) \}_{i \in I}$$
$$L(\theta) = \left\| u(x) - u_{\mathrm{obs}} \right\|_2^2 \quad \text{subject to (2).}$$
• If $\theta = \theta(x)$, model it by a NN:
$$\theta(x) \approx \mathrm{NN}(x)$$
where
) F is a differential operator representing the (P)DE
) u(x, t) is the state variable (i.e., quantity of interest), with (x, t) the space-time variables
) T is the time horizon and $\Omega$ is the spatial domain
(empty for ODEs)
) initial and boundary conditions must be added for the
problem to be well-posed
where
) $L_{u_0}$ represents the misfit of the NN predictions
) $L_{u_b}$ represents the misfit of the initial/boundary conditions
where
• $L_u(X_u)$ is the loss term that penalizes the error between the neural network's output and the known solution at the training points $X_u$.
• $L_b(X_b)$ is the loss term that penalizes the error between the neural network's output and the boundary conditions at the training points $X_b$.
• Neural Network:
) a basic/adequate definition is to simply consider a
NN as a mathematical function with some learnable
parameters
) more mathematically, let the network be defined as
$$\mathrm{NN}(x; \theta) : \mathbb{R}^{d_x} \times \mathbb{R}^{d_\theta} \to \mathbb{R}^{d_u},$$
where
→ x are the inputs to the network
→ $\theta$ is the set of learnable parameters (usually, weights)
→ $d_x$, $d_\theta$ and $d_u$ are the dimensions of the network's inputs, parameters and outputs, respectively.
) The exact form of the network function is determined by the neural network's architecture. Here we use feedforward fully-connected networks (FCNNs), defined as
FCNN - architecture
[Figure: a fully-connected layer mapping inputs $a_1^{(0)}, \ldots, a_n^{(0)}$ to activations $a_1^{(1)}, \ldots, a_m^{(1)}$ through weights $w_{j,i}$]
$$a_1^{(1)} = \sigma\!\left( w_{1,1}\,a_1^{(0)} + w_{1,2}\,a_2^{(0)} + \ldots + w_{1,n}\,a_n^{(0)} + b_1^{(0)} \right) = \sigma\!\left( \sum_{i=1}^{n} w_{1,i}\,a_i^{(0)} + b_1^{(0)} \right)$$
In matrix form,
$$\begin{pmatrix} a_1^{(1)} \\ a_2^{(1)} \\ \vdots \\ a_m^{(1)} \end{pmatrix} = \sigma\!\left( \begin{pmatrix} w_{1,1} & \cdots & w_{1,n} \\ w_{2,1} & \cdots & w_{2,n} \\ \vdots & \ddots & \vdots \\ w_{m,1} & \cdots & w_{m,n} \end{pmatrix} \begin{pmatrix} a_1^{(0)} \\ a_2^{(0)} \\ \vdots \\ a_n^{(0)} \end{pmatrix} + \begin{pmatrix} b_1^{(0)} \\ b_2^{(0)} \\ \vdots \\ b_m^{(0)} \end{pmatrix} \right)$$
that is,
$$a^{(1)} = \sigma\!\left( W^{(0)} a^{(0)} + b^{(0)} \right),$$
or $y = \sigma(Wx + b)$ for a single hidden layer.
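A minimal sketch of this layer recursion (layer sizes, the tanh activation and the random parameters are illustrative assumptions):

```python
# Minimal forward pass of a fully-connected network, applying
# a^(l+1) = sigma(W^(l) a^(l) + b^(l)) layer by layer.
# Layer sizes, tanh activation, and random parameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 1]                       # d_x = 4 inputs, two hidden layers, d_u = 1 output

# One (W, b) pair per layer: W^(l) has shape (sizes[l+1], sizes[l])
params = [(rng.normal(size=(n_out, n_in)), rng.normal(size=n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def fcnn(x, params):
    a = x
    for i, (W, b) in enumerate(params):
        z = W @ a + b
        a = np.tanh(z) if i < len(params) - 1 else z   # linear output layer
    return a

x = np.array([0.1, -0.3, 0.7, 0.2])        # inputs x1..x4 as in the earlier diagram
print(fcnn(x, params))
```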
Consider a general problem of the form
$$D[u](x) = f(x), \quad x \in \Omega, \qquad B_k[u](x) = g_k(x), \quad x \in \Gamma_k \subset \partial\Omega,$$
where
) D[u](x) is a differential operator
) u(x) is the solution
) Bk (·) are a set of boundary and/or initial conditions
that ensure uniqueness of the solution
) the variable x represents/includes both spatial and
time variables
) the full equation describes many possible contexts: linear and nonlinear, time-dependent and independent, irregular higher-order, cyclic BCs, etc.
$$\mathrm{NN}(x, \theta) \approx u(x)$$
where
$$L_f(\theta; T_f) = \left\| F(\hat{u}, x, \lambda) \right\|_2^2$$
$$L_b(\theta; T_b) = \left\| B(\hat{u}, x) \right\|_2^2$$
$$L_i(\theta, \lambda; T_i) = \frac{1}{|T_i|} \sum_{x \in T_i} \left\| I(\hat{u}, x) \right\|_2^2$$
and
) x are the training points,
) û the approximate solution,
) $\lambda$ the inversion coefficients,
$$\{\theta^*, \lambda^*\} = \operatorname*{arg\,min}_{\theta, \lambda}\, L(\theta, \lambda; T)$$
• then
$$e := \left\| \hat{u}_T - u \right\|_2 \leq e_o + e_g + e_a$$
Lu, Karniadakis, SIAM Review, 2021.
3
Mishra, Molinaro; arXiv:2006.16144v2 and IMA J. of Numerical Analysis,
Volume 43, Issue 1, January 2023, Pages 1–43.
where
) (P)DE residual is defined as
$$L_D(x, \theta) = \frac{\alpha_I}{N_I} \sum_{i=1}^{N_I} \left( D[u](x_i, \theta) - f(x_i) \right)^2$$
Note:
• Example:
u(x = 0) = 0
in a scalar ODE
$$L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left( D[Cu](x_i, \theta) - f(x_i) \right)^2$$
where $\{x_i\}_{i=1}^{N}$ is a set of collocation points sampled in the interior of the domain
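A hedged sketch of this constrained-output idea: the example condition u(0) = 0 can be enforced exactly with the output transform û(x) = x · NN(x), so only the interior residual is penalized; the specific ODE (u' = cos x, exact solution sin x), the architecture and the optimizer settings below are illustrative assumptions.

```python
# Sketch of a hard-constrained PINN for a scalar ODE: the condition u(0) = 0
# is built into the ansatz u_hat(x) = x * NN(x), and only the ODE residual
# u' - cos(x) is penalized at interior collocation points.
# The ODE, architecture, and training settings are illustrative assumptions.
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

def u_hat(x):
    return x * net(x)                    # enforces u_hat(0) = 0 by construction

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(3000):
    x = (torch.rand(128, 1) * 2 * torch.pi).requires_grad_(True)   # interior collocation points
    u = u_hat(x)
    u_x = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    loss = (u_x - torch.cos(x)).pow(2).mean()                       # ODE residual only
    opt.zero_grad(); loss.backward(); opt.step()

x_test = torch.linspace(0, 2 * torch.pi, 5).unsqueeze(1)
print(torch.cat([u_hat(x_test), torch.sin(x_test)], dim=1))         # compare to exact sin(x)
```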
Diffusion equation with mixed boundary conditions:
$$\frac{\partial u(x,t)}{\partial t} - \nabla \cdot \left( \kappa(x)\, \nabla u(x,t) \right) = f(x,t) \quad \text{in } \Omega \times (0, T), \qquad (6)$$
$$u(x,t) = g_D(x,t) \quad \text{on } \partial\Omega_D \times (0, T),$$
$$\kappa(x)\, \nabla u(x,t) \cdot n = g_R(x,t) \quad \text{on } \partial\Omega_R \times (0, T),$$
$$u(x,0) = u_0(x) \quad \text{for } x \in \Omega.$$
PINN for the Diffusion Equation
Step 3 Specify a loss function by summing the weighted $L^2$ norm of both the PDE equation and boundary condition residuals.
Step 4 Train the neural network to find the best parameters $\theta^*$ by minimizing the loss function $L(\theta; T)$.
[Figure: PINN schematic: the network NN(x, t; θ) outputs û; automatic differentiation builds the PDE residual ∂û/∂t − ∂²û/∂x² at the residual points T_f and the BC/IC residuals û(x, t) − g_D(x, t) and ∂û/∂n(x, t) − g_R(u, x, t) at the points T_b; the weighted loss is minimized to obtain θ*.]
Fig. 1: Schematic of a PINN for solving the diffusion equation $\partial u/\partial t = \partial^2 u/\partial x^2$ with mixed boundary conditions (BC) $u(x,t) = g_D(x,t)$ on $\Gamma_D \subset \partial\Omega$ and $\partial u/\partial n(x,t) = g_R(u,x,t)$ on $\Gamma_R \subset \partial\Omega$. The initial condition (IC) is treated as a special type of boundary condition. $T_f$ and $T_b$ denote the two sets of residual points for the equation and BC/IC. [Credit: Lu, Karniadakis, SIAM Review, 2021]
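A minimal PyTorch-style sketch of the PINN recipe in Fig. 1 for $u_t = u_{xx}$; the domain, the Dirichlet and initial data, the network size and the unit loss weights are illustrative assumptions, not the paper's setup.

```python
# A minimal PINN sketch for the 1D diffusion equation u_t = u_xx on
# x in [-1, 1], t in [0, 1]. Assumptions (not from the slides): Dirichlet BCs
# u(-1,t)=u(1,t)=0 and IC u(x,0)=sin(pi*x), whose exact solution is
# u(x,t)=sin(pi*x)*exp(-pi^2*t); hyperparameters are illustrative.
import torch

torch.manual_seed(0)

# Fully-connected network NN(x, t; theta) -> u_hat
net = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def u_hat(x, t):
    return net(torch.cat([x, t], dim=1))

def pde_residual(x, t):
    """Residual u_t - u_xx computed with automatic differentiation."""
    x = x.requires_grad_(True)
    t = t.requires_grad_(True)
    u = u_hat(x, t)
    u_t = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    return u_t - u_xx

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(5000):
    # Collocation points T_f in the interior, T_b on the boundary and at t = 0
    x_f = torch.rand(256, 1) * 2 - 1
    t_f = torch.rand(256, 1)
    x_b = torch.cat([torch.full((64, 1), -1.0), torch.full((64, 1), 1.0)])
    t_b = torch.rand(128, 1)
    x_0 = torch.rand(128, 1) * 2 - 1
    t_0 = torch.zeros(128, 1)

    loss_f = pde_residual(x_f, t_f).pow(2).mean()                            # PDE residual
    loss_b = u_hat(x_b, t_b).pow(2).mean()                                   # BC misfit
    loss_0 = (u_hat(x_0, t_0) - torch.sin(torch.pi * x_0)).pow(2).mean()     # IC misfit
    loss = loss_f + loss_b + loss_0                                          # weighted sum (weights = 1)

    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(f"step {step:5d}  loss {loss.item():.3e}")
```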
$$\left| G(u)(y) - \sum_{k=1}^{p} \underbrace{\sum_{i=1}^{n} c_i^k\, \sigma\!\left( \sum_{j=1}^{m} \xi_{ij}^k\, u(x_j) + \theta_i^k \right)}_{\text{branch}}\ \underbrace{\sigma(w_k \cdot y + \zeta_k)}_{\text{trunk}} \right| < \epsilon,$$
where
) G is the solution operator,
) u is an input function,
) xi are “sensor” points,
) y are random points where we evaluate the output
function G(u).
• 2 main contenders:
) DeepONet
) Fourier Neural Operators (FNO)—a special case of
DeepONet
DeepONet architecture
[Figure: the branch network takes the input-function samples u(x_1), ..., u(x_m) and the trunk network takes the query point y; their outputs are combined to give G(u)(y).]
$$G_\theta(u)(y) = \sum_{k=1}^{q} \underbrace{b_k(u(x))}_{\text{branch}}\, \underbrace{t_k(y)}_{\text{trunk}} + b_0$$
$$L_o(\theta) = \frac{1}{N P} \sum_{i=1}^{N} \sum_{j=1}^{P} \left( G_\theta(u^{(i)})(y_j^{(i)}) - G(u^{(i)})(y_j^{(i)}) \right)^2$$
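A minimal sketch of an (unstacked) DeepONet trained with this operator loss; the sensor count, the network sizes and the toy antiderivative data are illustrative assumptions.

```python
# A minimal (unstacked) DeepONet sketch: G_theta(u)(y) = sum_k b_k(u) t_k(y) + b0.
# Assumptions (not from the slides): m = 50 sensors, q = 40 basis functions,
# and synthetic antiderivative data G(u)(y) = int_0^y u(s) ds as a toy target.
import torch

torch.manual_seed(0)
m, q = 50, 40
xs = torch.linspace(0.0, 1.0, m)                      # fixed sensor locations x_1..x_m

branch = torch.nn.Sequential(torch.nn.Linear(m, 64), torch.nn.Tanh(), torch.nn.Linear(64, q))
trunk = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.Tanh(), torch.nn.Linear(64, q))
b0 = torch.nn.Parameter(torch.zeros(1))

def G_theta(u_sensors, y):
    """u_sensors: (N, m) samples of u at the sensors; y: (N, P, 1) query points."""
    b = branch(u_sensors)                              # (N, q)
    t = trunk(y)                                       # (N, P, q)
    return torch.einsum("nq,npq->np", b, t) + b0       # (N, P)

# Toy dataset: u(x) = a*sin(pi*x), so G(u)(y) = a*(1 - cos(pi*y))/pi
def make_batch(N=64, P=20):
    a = torch.rand(N, 1) * 2 - 1
    u = a * torch.sin(torch.pi * xs)                   # (N, m)
    y = torch.rand(N, P, 1)
    target = a * (1 - torch.cos(torch.pi * y.squeeze(-1))) / torch.pi
    return u, y, target

opt = torch.optim.Adam(list(branch.parameters()) + list(trunk.parameters()) + [b0], lr=1e-3)
for step in range(3000):
    u, y, target = make_batch()
    loss = (G_theta(u, y) - target).pow(2).mean()      # operator loss L_o
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 500 == 0:
        print(f"step {step:4d}  loss {loss.item():.3e}")
```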
• Pros:
✓ relatively fast training (compared to PINN)
✓ can overcome the curse of dimensionality (in some cases...)
✓ suitable for multiscale and multiphysics problems
• Cons:
✗ no guarantee that physics is respected
✗ require large training sets of paired input-output observations (expensive!)
$$O(u, s) = 0,$$
$$B(u, s) = 0,$$
• where
) $u \in U$ is the input function (parameters),
) $s \in S$ is the hidden solution function,
G(u) = s(u).
$$G_\theta(u)(y) = \sum_{k=1}^{q} \underbrace{b_k(u(x))}_{\text{branch}}\, \underbrace{t_k(y)}_{\text{trunk}} + b_0$$
$$L(u, \theta) = \frac{1}{P} \sum_{j=1}^{P} \left| G_\theta(u)(y_j) - s(y_j) \right|^2,$$
$$L_o(\theta) = \frac{1}{N P} \sum_{i=1}^{N} \sum_{j=1}^{P} \left( G_\theta(u^{(i)})(y_j^{(i)}) - s^{(i)}(y_j^{(i)}) \right)^2$$
• Results:4
4. Wang, Wang, Bhouri, Perdikaris. arXiv:2103.10974v1, arXiv:2106.05384, arXiv:2110.01654, arXiv:2110.13297.
[Figure: physics-informed DeepONet schematic: branch and trunk networks whose output feeds both the BC misfit and the PDE residual terms of the loss to be minimized.]
where
$$L_o(\theta) = \frac{1}{N P} \sum_{i=1}^{N} \sum_{j=1}^{P} \left( B\!\left( u^{(i)}(x_j^{(i)}),\, G_\theta(u^{(i)})(y_j^{(i)}) \right) \right)^2$$
$$L_p(\theta) = \frac{1}{N Q} \sum_{i=1}^{N} \sum_{j=1}^{Q} \left( O\!\left( u^{(i)}(x_j^{(i)}),\, G_\theta(u^{(i)})(y_j^{(i)}) \right) \right)^2$$
[Figure: schematic of a neural ODE: an ODE solver integrates du/dt = NN(u, t; θ) from the input u(t = t0) to the output û(t = t1; θ), which enters the loss function.]
Figure 2.4: Schematic of a neural ordinary differential equation (ODE) [Moseley2022]. The goal of a neural ODE is to learn the right-hand side term of an unknown ODE. A neural network NN(u, t; θ) is used to represent this term, which is trained by using many examples of the solution of the ODE at two times, u(t = t0) and u(t = t1). More specifically, a standard ODE solver is used to model the solution of the ODE, u(t = t1), at time t = t1 given the solution at time t = t0 and evaluations of the network where needed. Then, the network's free parameters, θ, are updated by matching this estimated solution with the true solution and differentiating through the entire ODE solver.
• The goal of a neural ODE is to learn the right-hand side term of an unknown ODE.
• A neural network NN(u, t; θ) is used to represent this term, which is trained by using many examples of the solution of the ODE at two times, u(t = t0) and u(t = t1).
• More specifically, a standard ODE solver is used to model the solution of the ODE, u(t = t1), at time t = t1 given the solution at time t = t0 and evaluations of the network where needed.
From [Moseley2022], Sec. 2.3.3.3 (Neural differential equations): "The approaches above are simple use cases of differentiable physics in that they only learn existing parameters in the traditional workflow. However, differentiable physics is more flexible than this; it also allows more complex ML modules (such as neural networks) to be inserted into traditional algorithms, and for these to be learnt by training the entire algorithm end-to-end. An example of this are neural differential equations [Chen et al., 2018a, Rackauckas et al., 2020], which combine neural networks with traditional numerical solvers in order to discover terms in underlying equations. Neural differential equations were first proposed in the field of ML by Chen et al. [2018a] in the context of continuous-depth models. More precisely, they used a neural network to learn the right-hand [...]"
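A minimal sketch of this training loop, differentiating through a hand-written RK4 solver; the toy dynamics du/dt = -u, the autonomous network NN(u; θ) (no explicit t-dependence), and all hyperparameters are illustrative assumptions.

```python
# A minimal neural-ODE sketch: the NN learns the right-hand side of an ODE and
# is trained by integrating with a differentiable RK4 step and matching u(t1).
# Toy true dynamics du/dt = -u, so u(t1) = u(t0)*exp(-(t1 - t0)); the step size,
# architecture, and data are illustrative assumptions.
import math
import torch

torch.manual_seed(0)
f_nn = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

def rk4_integrate(u0, t0, t1, n_steps=20):
    """Integrate du/dt = f_nn(u) from t0 to t1 with RK4; fully differentiable."""
    h = (t1 - t0) / n_steps
    u = u0
    for _ in range(n_steps):
        k1 = f_nn(u)
        k2 = f_nn(u + 0.5 * h * k1)
        k3 = f_nn(u + 0.5 * h * k2)
        k4 = f_nn(u + h * k3)
        u = u + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return u

t0, t1 = 0.0, 1.0
opt = torch.optim.Adam(f_nn.parameters(), lr=1e-3)
for step in range(2000):
    u0 = torch.rand(128, 1) * 4 - 2                  # pairs (u(t0), u(t1)) from the true ODE
    u1_true = u0 * math.exp(-(t1 - t0))
    u1_pred = rk4_integrate(u0, t0, t1)              # differentiate through the solver
    loss = (u1_pred - u1_true).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 500 == 0:
        print(f"step {step:4d}  loss {loss.item():.3e}")
```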
• LSTM
• Encoder-decoder
• Theory:
) ingest huge volumes of data
) “fill in the gaps” using Markov Chains on tokens
• Applications
) NWP + Climatology (ClimaX by Microsoft)
) healthcare and drug design (AlphaFold)
) etc.
References