
Deep Learning inspired by Diff. Eqs.

Noseong Park
Assistant Professor
[email protected]

Contents

• Ordinary Differential Equations
• A simple example of ODEs
• ODE Solvers
• Euler method vs. Residual connection
• Runge-Kutta method and Dormand-Prince method
• Neural Ordinary Differential Equations (NODEs)
• Adjoint sensitivity method and training mechanism
• Change of variable theorem and probability density estimation
• Physics-informed Neural Networks (PINNs)
• Other Topics related to Neural Ordinary/Controlled Differential Equations
  and Their Applications (by other students)

2
Ordinary Differential Equations

3
An Example of ODEs

• T1 has 100 liters of water, and T2 has 100 liters of fertilizer solution.
  The two tanks are connected, and 3 L/min of liquid flows in each direction.

• z(t) = (z1(t), z2(t)) denotes the amount of fertilizer in each tank at time t.

  z1' = inflow per minute - outflow per minute = -0.03 z1 + 0.03 z2
  z2' = inflow per minute - outflow per minute =  0.03 z1 - 0.03 z2

  ∴ z' = Az, or z' - Az = 0, where A = [ -0.03   0.03 ]
                                       [  0.03  -0.03 ]

• When we have an initial value z(0) = (0, 100), what is z(2)? This kind of
  problem is called an initial value problem (IVP) or forward problem.
  (A minimal numerical sketch of this IVP follows below.)
• Given data, what is A? This kind of problem is called an inverse or backward problem.

4
ODE Solvers

5
Euler method vs. Residual connection

• Among various ODE solvers, the (explicit) Euler method is the simplest:
  z(t + h) = z(t) + h * f(z(t), t), where h is the step-size.

• The (explicit) Euler method and the residual connection look similar to each
  other: a residual block computes z_{l+1} = z_l + f(z_l), i.e., an Euler step
  with h = 1. (A code comparison follows below.)
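A small PyTorch sketch (an illustration added here, not from the slides) that makes the analogy explicit; `f` is an arbitrary small MLP standing in for a learned vector field.

```python
import torch
import torch.nn as nn

# Any learned vector field f(z); a small MLP here for illustration.
f = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 4))

def euler_step(z, h):
    # Explicit Euler: z(t+h) = z(t) + h * f(z(t))
    return z + h * f(z)

def residual_block(z):
    # Residual connection: z_{l+1} = z_l + f(z_l), i.e., an Euler step with h = 1.
    return z + f(z)

z0 = torch.randn(1, 4)
print(euler_step(z0, h=1.0))   # identical computation ...
print(residual_block(z0))      # ... to the residual block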

6
Runge-Kutta (RK) method

<https://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods>
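The slide defers to the linked article; for reference, here is a minimal sketch of one step of the classical 4th-order RK method (the standard textbook formula, added for illustration).

```python
def rk4_step(f, t, z, h):
    """One step of the classical 4th-order Runge-Kutta method for z' = f(t, z)."""
    k1 = f(t, z)
    k2 = f(t + h / 2, z + h / 2 * k1)
    k3 = f(t + h / 2, z + h / 2 * k2)
    k4 = f(t + h, z + h * k3)
    return z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
```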
7
Dormand-Prince (DOPRI) method

• DOPRI compares a 4th-order and a 5th-order RK estimate at every step:
  • Use a larger step-size h if the difference (the estimated local error) is small.
  • Use a smaller step-size h if the difference is large.

• In other words, the (adaptive) step-size is inversely related to the estimated
  error.

• We omit its detailed mathematical definition. (A rough sketch of the
  accept/shrink logic follows below.)
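A simplified sketch of the adaptive-step idea (an illustration only, reusing the rk4_step helper above; real DOPRI obtains its 4th- and 5th-order estimates from one shared Butcher tableau, whose coefficients are omitted here as on the slide).

```python
import numpy as np

def adaptive_step(f, t, z, h, tol=1e-6):
    """Illustration of the accept/shrink logic only."""
    z_low = rk4_step(f, t, z, h)                      # one big step
    z_half = rk4_step(f, t, z, h / 2)                 # two half steps stand in
    z_high = rk4_step(f, t + h / 2, z_half, h / 2)    # for the higher-order estimate
    err = float(np.max(np.abs(z_high - z_low)))       # estimated local error
    if err <= tol:
        # Accept the step and propose a larger h for the next step.
        return z_high, t + h, min(2.0 * h, 0.9 * h * (tol / (err + 1e-12)) ** 0.2)
    # Reject the step and retry with a smaller h.
    return z, t, 0.5 * h
```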

8
Neural Ordinary Differential
Equations (NODEs)

9
An Example of NODEs

• A typical construction of NODEs is as follows: FE (feature extractor) → NODE layer → Output layer.
• The NODE layer is analogous to (continuous) residual layers.
• We can use the standard backpropagation algorithm to train it.
  (A minimal sketch of this construction follows below.)
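A minimal sketch of the FE → NODE → Output construction, assuming the third-party torchdiffeq package released with the NODE paper; the layer sizes and the integration interval [0, 1] are illustrative choices, not prescribed by the slides.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # third-party package from the NODE authors

class ODEFunc(nn.Module):
    """The learned vector field f(z, t; theta) that defines dz/dt."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))
    def forward(self, t, z):
        return self.net(z)

class NODEClassifier(nn.Module):
    """FE (feature extractor) -> NODE layer -> Output layer."""
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.fe = nn.Linear(in_dim, hid_dim)        # feature extractor
        self.func = ODEFunc(hid_dim)                # continuous "residual" block
        self.out = nn.Linear(hid_dim, n_classes)    # output layer
    def forward(self, x):
        z0 = self.fe(x)
        t = torch.tensor([0.0, 1.0])                # integrate from t=0 to t=1
        z1 = odeint(self.func, z0, t, method="dopri5")[-1]
        return self.out(z1)

model = NODEClassifier(in_dim=28 * 28, hid_dim=32, n_classes=10)
logits = model(torch.randn(8, 28 * 28))             # standard backprop works on `logits`
```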

10
Adjoint Sensitivity Method

• We can use the standard backpropagation to train NODEs.
  • However, the number of solver steps (the effective depth) chosen by DOPRI
    frequently becomes large, so storing the whole computation graph is costly.

• Alternatively, we can calculate the gradients with a reverse-mode integral,
  i.e., the adjoint sensitivity method.
  • There is no need to maintain the computation graph of the NODE forward pass.
  • Therefore, a space complexity of O(1) (with respect to the depth) can be obtained.

• Which one is better?
  • In our experience, it is case by case. (A one-line code comparison of the two
    modes follows below.)

(Note from the figure: information before t=1 is not needed once we are at t=1.)
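In torchdiffeq, the two training modes discussed above differ by a one-line change; this sketch reuses the ODEFunc module from the previous example.

```python
import torch
from torchdiffeq import odeint, odeint_adjoint

func = ODEFunc(dim=32)                # the vector field sketched above
z0 = torch.randn(8, 32)
t = torch.tensor([0.0, 1.0])

# Standard backprop through the solver: the graph of every internal DOPRI step
# is stored, so memory grows with the (adaptive) depth.
z1 = odeint(func, z0, t, method="dopri5")[-1]

# Adjoint sensitivity method: the forward graph is discarded; gradients are
# recovered by integrating an augmented ODE backward in time (O(1) memory in depth).
z1 = odeint_adjoint(func, z0, t, method="dopri5")[-1]
```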


11
Adjoint Sensitivity Method

• How to convert the discrete (residual) formulation on the left into the
  continuous one on the right: instead of the step-size h, take the limit h → 0
  and use an integral.

Residual network:
  Forward:   z_{t+1} = z_t + f(z_t, θ)
  Backward:  a_t = a_{t+1} + a_{t+1} ∂f(z_t, θ)/∂z_t,  where a_t = ∂L/∂z_t
  Gradients: ∂L/∂θ = a_{t+1} ∂f(z_t, θ)/∂θ

Adjoint method. Define the adjoint state a(t) = ∂L/∂z(t):
  Forward:   z(t1) = z(t0) + ∫_{t0}^{t1} f(z(t), t, θ) dt
  Backward:  da(t)/dt = -a(t)^T ∂f(z(t), t, θ)/∂z          (adjoint ODE)
  Gradients: dL/dθ = -∫_{t1}^{t0} a(t)^T ∂f(z(t), t, θ)/∂θ dt
12
Change of Variable Theorem

• Let (x, y) be a 2D coordinate and let T be an invertible transformation.
  • (u, v) = T(x, y) and (x, y) = T^{-1}(u, v).
  • The Jacobian matrix of T^{-1} collects the partial derivatives
    J = [∂x/∂u  ∂x/∂v; ∂y/∂u  ∂y/∂v].

• Change of variable theorem (using the absolute value of the determinant of
  the Jacobian):
  ∬ f(x, y) dx dy = ∬ f(T^{-1}(u, v)) |det J| du dv.

• The point: an integrand that is hard to integrate in (x, y) can become easy
  to integrate after transforming to (u, v). (A small numerical check follows below.)
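A small numerical check of the theorem (added here for illustration): a Gaussian-like integrand over the unit disk is awkward to integrate in Cartesian coordinates, but after transforming to polar coordinates, where the absolute Jacobian determinant of the inverse map is r, it has a closed form.

```python
import numpy as np

# Integrate f(x, y) = exp(-(x^2 + y^2)) over the unit disk.
# Cartesian (x, y): awkward domain, brute-force grid sum.
n = 1001
xs = np.linspace(-1, 1, n)
x, y = np.meshgrid(xs, xs)
inside = x**2 + y**2 <= 1.0
cell = (xs[1] - xs[0]) ** 2
cartesian = np.sum(np.exp(-(x**2 + y**2)) * inside) * cell

# Polar (r, theta): |det Jacobian| of the inverse map is r, and the integral
# factorizes into the closed form pi * (1 - e^{-1}).
polar = np.pi * (1.0 - np.exp(-1.0))

print(cartesian, polar)   # both are approximately 1.986
```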

13
Probability Density Estimation

• The same idea applies to probability densities: if y = g(x) for an invertible,
  differentiable g, then p(y) = p(x) |det(dy/dx)|^{-1} = p(x) |det(dx/dy)|.

• After taking the log on both sides,
  log p(y) = log p(x) - log |det(dy/dx)| = log p(x) + log |det(dx/dy)|.
<https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant>

14
Density Estimation in NODEs

• We can estimate p(z(2)) if we know p(z(0)), as follows:
  • log p(z(2)) = log p(z(1)) + log |det of Jacobian at z(1)|,
  • log p(z(1)) = log p(z(0)) + log |det of Jacobian at z(0)|.

• Suppose we design a generator using NODEs.
  • z(0) typically follows a unit Gaussian N(0, 1), so we know log p(z(0)).
  • Then, we can also know log p(z(2)).
  • Suppose z(2) is a specific dog image. We then know the probability (density)
    with which this specific image is generated by the generator.
    (A discrete sketch of this chain of updates follows below.)

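A discrete sketch (illustration only) of chaining the two log-density updates above; simple element-wise affine maps stand in for the NODE flow so the Jacobian determinants can be read off directly. The sign convention below uses the Jacobian of the forward map, which is equivalent to the slide's formulation written with the inverse-map Jacobian.

```python
import torch

# Element-wise affine maps standing in for the flow z(0) -> z(1) -> z(2);
# their Jacobians are diagonal, so log|det| is a sum of logs.
def step(z, scale, shift):
    z_next = scale * z + shift
    log_det_fwd = torch.log(torch.abs(scale)).sum()   # log|det dz_next/dz|
    return z_next, log_det_fwd

base = torch.distributions.Normal(0.0, 1.0)           # z(0) ~ unit Gaussian
z0 = base.sample((4,))
log_p_z0 = base.log_prob(z0).sum()                    # log p(z(0)) is known

z1, ld0 = step(z0, torch.tensor([2.0, 0.5, 1.5, 0.8]), torch.tensor(0.1))
z2, ld1 = step(z1, torch.tensor([0.7, 1.2, 0.9, 1.1]), torch.tensor(-1.0))

# log p(z(2)) = log p(z(1)) - log|det of forward Jacobian at z(1)|
#             = log p(z(1)) + log|det of inverse Jacobian|, as on the slide.
log_p_z2 = log_p_z0 - ld0 - ld1
print(log_p_z2)
```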

15
Physics-informed Neural Networks
(PINNs)

16
Partial Differential Equations

• In ODEs, we have only one variable, t.
  • z(t) describes the entire state.

• In PDEs, we have more variables.
  • z(x, t) describes the state at location x and time t.

• Many physical processes are described by PDEs.
  • Discretizing the spatial domain (so that only t remains continuous) and then
    solving z(t) is a popular technique to solve PDEs, known as the method of
    lines. (A small sketch follows below.)

(Figure: the state values on a spatial grid evolving over time.)
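A minimal method-of-lines sketch (added for illustration): the 1D heat equation z_t = z_xx is discretized in space with central differences, leaving an ODE system in t that any ODE solver can integrate.

```python
import numpy as np
from scipy.integrate import solve_ivp

# 1D heat equation z_t = z_xx on [0, 1] with zero boundary values.
n = 50
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
z0 = np.sin(np.pi * x)                      # initial condition z(x, 0)

def rhs(t, z):
    # Discretize z_xx with central differences; boundaries stay at zero.
    dz = np.zeros_like(z)
    dz[1:-1] = (z[2:] - 2.0 * z[1:-1] + z[:-2]) / dx**2
    return dz

# After spatial discretization, this is just an ODE system in t.
sol = solve_ivp(rhs, (0.0, 0.1), z0, method="RK45")
print(sol.y[:, -1])                         # z(x, t=0.1) on the grid
```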

17
An Intuitive Example of PINNs

• Suppose a regression task to predict the position x of a falling ball given time t.
  • Training data: (x, t) pairs; the network û(t; θ) maps a time t (possibly an
    unknown query time) to the position of the ball.

• There is one known governing equation that û should follow:
  u_tt - g = 0, where g = 9.80665 m/s².

• We can then use the following loss even with no training data, summed over
  sampled times t (a code sketch follows below):
  L(θ) = (û(0; θ) - 0)² + (tf.grad(tf.grad(û(t; θ), t), t) - 9.80665)²

• Which one do you think is better, plain regression or the PINN?
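A minimal PINN training loop for this example (a sketch under the slide's setup; PyTorch's torch.autograd.grad plays the role of the slide's tf.grad, and the network size, optimizer, and collocation points are illustrative assumptions).

```python
import torch
import torch.nn as nn

# u_hat(t; theta): a tiny network predicting the ball's position from time t.
u_hat = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
g = 9.80665
opt = torch.optim.Adam(u_hat.parameters(), lr=1e-3)

for _ in range(2000):
    t = torch.rand(128, 1) * 2.0            # collocation times, no labels needed
    t.requires_grad_(True)
    u = u_hat(t)
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]     # du/dt
    u_tt = torch.autograd.grad(u_t.sum(), t, create_graph=True)[0]  # d2u/dt2
    ic = u_hat(torch.zeros(1, 1))                                   # u(0) should be 0
    loss = ((u_tt - g) ** 2).mean() + (ic ** 2).mean()              # physics + IC residuals
    opt.zero_grad()
    loss.backward()
    opt.step()
```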


NODEs vs. PINNs

• In NODEs,
  • z(t) describes the entire state.
  • We learn an implicit governing equation from data.
  • We use a task-dependent loss to train NODEs, such as cross-entropy, MSE, etc.
  • We use NODEs for computer vision, time-series processing, NLP, etc.

• In PINNs,
  • z(x, t) describes a state at (x, t).
  • We use a given governing equation to define a loss function (aided by
    automatic differentiation).
  • We use PINNs for solving PDEs in various scientific domains.

19
Conclusions

• Deep learning inspired by diff. eqs. has proliferated over the past couple of
  years.

• It shows strong points in many machine learning and scientific tasks.

• Our lab, BigDyL at Yonsei, is now studying the following subjects for many
  machine learning tasks:
  • Neural Ordinary/Controlled/Rough Differential Equations
  • Physics-informed Neural Networks

20
References

• Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud, "Neural
  Ordinary Differential Equations," In NeurIPS, 2018.
• Will Grathwohl, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, David
  Duvenaud, "FFJORD: Free-form Continuous Dynamics for Scalable Reversible
  Generative Models," In ICLR, 2019.
• M. Raissi, P. Perdikaris, G. E. Karniadakis, "Physics-informed neural networks:
  A deep learning framework for solving forward and inverse problems involving
  nonlinear partial differential equations," Journal of Computational Physics,
  Volume 378, 1 February 2019, Pages 686-707.
• Jungeun Kim, Kookjin Lee, Dongeun Lee, Sheo Yon Jin, Noseong Park, "DPM: A
  Novel Training Method for Physics-Informed Neural Networks in Extrapolation,"
  In AAAI, 2021.

21
