Neural Ordinary Differential Equations: Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud
Differential Equations
$\frac{dz(t)}{dt} = f(z(t), t)$ (explicit form)
$z(t) = z(t_0) + \int_{t_0}^{t} f(z(s), s)\,ds$ (solution is a trajectory)
Deep Learning as Discretized Differential Equations
Many deep learning networks can be interpreted as ODE solvers.
Network     Fixed-step Numerical Scheme
DenseNet    Runge-Kutta
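The correspondence between layers and solver steps can be sketched in a few lines. This is a minimal illustration, not code from the talk: the dynamics `f(z) = -z` and the function names are hypothetical. It assumes the usual reading that a residual update `z + h * f(z)` is one forward Euler step, and compares it with a classical Runge-Kutta (RK4) step of the same size.

```python
import math

def f(z):
    # Hypothetical dynamics standing in for a residual branch: dz/dt = -z.
    return -z

def resnet_forward(z, num_layers, h):
    # Each "layer" applies the residual update z <- z + h * f(z),
    # which is one forward Euler step of size h.
    for _ in range(num_layers):
        z = z + h * f(z)
    return z

def rk4_forward(z, num_steps, h):
    # Classical fourth-order Runge-Kutta with the same step size.
    for _ in range(num_steps):
        k1 = f(z)
        k2 = f(z + 0.5 * h * k1)
        k3 = f(z + 0.5 * h * k2)
        k4 = f(z + h * k3)
        z = z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return z

exact = math.exp(-1.0)  # true solution z(1) for z(0) = 1
euler_err = abs(resnet_forward(1.0, 10, 0.1) - exact)
rk4_err = abs(rk4_forward(1.0, 10, 0.1) - exact)
```

With the same ten steps, the higher-order scheme lands much closer to the true trajectory than the Euler-style residual stack.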
But:
(1) What are the underlying dynamics?
(2) Adaptive step-size solvers provide better error handling.
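To make point (2) concrete, here is a rough sketch of an adaptive step-size solver. It assumes an embedded Heun-Euler pair as the error estimator, which is a simple stand-in and not necessarily the solver used in the talk. The name `adaptive_heun`, the tolerance, and the two test dynamics are all hypothetical.

```python
import math

def adaptive_heun(f, z0, t0, t1, tol=1e-4, h=0.1):
    """Integrate dz/dt = f(t, z) from t0 to t1; return (z(t1), function evaluations)."""
    t, z, nfe = t0, z0, 0
    while t1 - t > 1e-12:
        h = min(h, t1 - t)
        k1 = f(t, z)
        k2 = f(t + h, z + h * k1)
        nfe += 2
        low = z + h * k1                 # first-order (Euler) estimate
        high = z + 0.5 * h * (k1 + k2)   # second-order (Heun) estimate
        err = abs(high - low)            # local error estimate
        if err <= tol:                   # accept the step, otherwise retry
            t, z = t + h, high
        # Grow or shrink the next step based on the error estimate.
        scale = 0.9 * math.sqrt(tol / err) if err > 0 else 2.0
        h *= min(2.0, max(0.2, scale))
    return z, nfe

z_easy, nfe_easy = adaptive_heun(lambda t, z: -z, 1.0, 0.0, 1.0)
z_hard, nfe_hard = adaptive_heun(lambda t, z: -50.0 * z, 1.0, 0.0, 1.0)
```

The solver spends more function evaluations on the faster-changing dynamics and fewer on the easy one, while a fixed-step scheme would spend the same amount on both.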
“Neural” Ordinary Differential Equations
Instead of $y = F(x)$, solve $y = z(T)$ given the initial condition $z(0) = x$.
Parameterize $\frac{dz(t)}{dt} = f(z(t), t, \theta)$ with a neural network and solve the dynamics with a black-box ODE solver.
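A forward pass under this formulation might look like the sketch below. It is illustrative only: the one-unit "network" `f`, the parameter tuple `theta = (w, b)`, and the fixed-step RK4 routine `odesolve` are hypothetical stand-ins for a real model and a real black-box solver.

```python
import math

def f(z, t, theta):
    # Hypothetical one-unit "network"; theta = (w, b) are the learnable parameters.
    w, b = theta
    return math.tanh(w * z + b)

def odesolve(f, z0, t0, t1, theta, steps=100):
    # Fixed-step RK4 as a stand-in for a black-box ODE solver.
    h = (t1 - t0) / steps
    z, t = z0, t0
    for _ in range(steps):
        k1 = f(z, t, theta)
        k2 = f(z + 0.5 * h * k1, t + 0.5 * h, theta)
        k3 = f(z + 0.5 * h * k2, t + 0.5 * h, theta)
        k4 = f(z + h * k3, t + h, theta)
        z += (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return z

x = 0.5
y_identity = odesolve(f, x, 0.0, 1.0, theta=(0.0, 0.0))  # zero dynamics: output equals input
y_shift = odesolve(f, x, 0.0, 1.0, theta=(0.0, 1.0))     # constant velocity tanh(1)
```

The output `y` is the state at time `T`, so the "depth" of the model is set by the integration interval and the solver, not by a fixed number of layers.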
Continuous-time Backpropagation
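Residual network.
Forward: $z_{t+h} = z_t + h f(z_t, \theta)$
Backward: $a_t = a_{t+h} + h\, a_{t+h} \frac{\partial f(z_t, \theta)}{\partial z_t}$
Params: $\frac{\partial L}{\partial \theta} = h\, a_{t+h} \frac{\partial f(z_t, \theta)}{\partial \theta}$
Adjoint method. Define the adjoint state: $a(t) := \frac{\partial L}{\partial z(t)}$
Forward: $z(t+h) = z(t) + \int_t^{t+h} f(z(s), s, \theta)\,ds$
Backward: $a(t) = a(t+h) + \int_t^{t+h} a(s) \frac{\partial f(z(s), s, \theta)}{\partial z(s)}\,ds$
Adjoint DiffEq: $\frac{da(t)}{dt} = -a(t) \frac{\partial f(z(t), t, \theta)}{\partial z(t)}$
Params: $\frac{\partial L}{\partial \theta} = \int_t^{t+h} a(s) \frac{\partial f(z(s), s, \theta)}{\partial \theta}\,ds$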
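The adjoint method can be checked numerically on a toy problem. The sketch below assumes scalar dynamics $f(z, \theta) = \theta z$ and the loss $L = z(T)$, both chosen only because the gradients have closed forms ($\partial L/\partial z(0) = e^{\theta T}$ and $\partial L/\partial \theta = z(0)\,T e^{\theta T}$). The helper `rk4` and all variable names are hypothetical.

```python
import math

def rk4(deriv, state, t0, t1, steps=200):
    # Fixed-step RK4 on a list-valued state; h is negative when t1 < t0.
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = deriv(state, t)
        k2 = deriv([s + 0.5 * h * k for s, k in zip(state, k1)], t + 0.5 * h)
        k3 = deriv([s + 0.5 * h * k for s, k in zip(state, k2)], t + 0.5 * h)
        k4 = deriv([s + h * k for s, k in zip(state, k3)], t + h)
        state = [s + (h / 6.0) * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
        t += h
    return state

theta, z0, T = 0.7, 1.5, 1.0

# Forward pass: dz/dt = theta * z, loss L = z(T).
(zT,) = rk4(lambda s, t: [theta * s[0]], [z0], 0.0, T)

def augmented(s, t):
    z, a, g = s
    return [theta * z,    # dz/dt = f
            -a * theta,   # da/dt = -a * df/dz
            -a * z]       # dg/dt = -a * df/dtheta

# Backward pass from T to 0, starting at [z(T), dL/dz(T) = 1, 0].
z_back, dL_dz0, dL_dtheta = rk4(augmented, [zT, 1.0, 0.0], T, 0.0)
```

Note that the backward solve also recovers the initial state `z_back` without any stored intermediate values, which is where the constant-memory property comes from.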
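A Differentiable Primitive for AutoDiff
Forward: $z(t_1) = \mathrm{ODESolve}(z(t_0), f, t_0, t_1, \theta)$
Backward: $\left[z(t_0), \frac{\partial L}{\partial z(t_0)}, \frac{\partial L}{\partial \theta}\right] = \mathrm{ODESolve}\left(\left[z(t_1), \frac{\partial L}{\partial z(t_1)}, 0\right], \text{augmented dynamics}, t_1, t_0, \theta\right)$
Don’t need to store layer activations for the reverse pass: just follow the dynamics in reverse!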
Reversible networks (Gomez et al. 2018) also only require O(1)-memory, but
require very specific neural network architectures with partitioned dimensions.
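For contrast, a reversible block with partitioned dimensions can be sketched as follows. This assumes the additive-coupling form commonly used in reversible residual networks; the functions `F` and `G` are hypothetical placeholders and do not need to be invertible themselves.

```python
import math

def F(x):
    # Hypothetical residual function applied to the second partition.
    return math.tanh(x)

def G(x):
    # Hypothetical residual function applied to the first partition.
    return 0.5 * x * x

def forward(x1, x2):
    # The input is split into two partitions (x1, x2) that are updated in turn.
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    # Inputs are recomputed from outputs, so activations need not be stored.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

y1, y2 = forward(0.3, -1.2)
x1_rec, x2_rec = inverse(y1, y2)
```

The exact inversion depends on the split-and-couple structure, which is the architectural restriction the slide refers to.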
Reverse versus Forward Cost
- Empirically, reverse pass roughly half as expensive as forward pass.
- Adapts to instance difficulty.
- Dynamics become more demanding to compute during training.
- Adapts computation time according to complexity of diffeq.
- Whereas ODEs are guaranteed to be smooth.
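Continuous Normalizing Flows
Instantaneous Change of variables (iCOV):
- For $\frac{dz(t)}{dt} = f(z(t), t)$: $\frac{\partial \log p(z(t))}{\partial t} = -\mathrm{tr}\left(\frac{\partial f}{\partial z(t)}\right)$
- In other words, $\log p(z(t_1)) = \log p(z(t_0)) - \int_{t_0}^{t_1} \mathrm{tr}\left(\frac{\partial f}{\partial z(t)}\right) dt$
With an invertible F: $z_1 = F(z_0) \implies \log p(z_1) = \log p(z_0) - \log\left|\det \frac{\partial F}{\partial z_0}\right|$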
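The instantaneous change of variables can be verified on a case with a known answer. The sketch below assumes 1D linear dynamics $dz/dt = a z$ with a standard normal base density, so that $z(T) \sim \mathcal{N}(0, e^{2aT})$ exactly. The constants and the helper `log_normal` are hypothetical choices for illustration.

```python
import math

def log_normal(z, std):
    # Log density of N(0, std^2) evaluated at z.
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * (z / std) ** 2

a, T, steps = 0.8, 1.0, 1000
h = T / steps
z = 0.3
logp = log_normal(z, 1.0)  # z(0) ~ N(0, 1)

for _ in range(steps):
    # RK4 step for the state: dz/dt = a * z.
    k1 = a * z
    k2 = a * (z + 0.5 * h * k1)
    k3 = a * (z + 0.5 * h * k2)
    k4 = a * (z + h * k3)
    z += (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    # iCOV: d log p / dt = -tr(df/dz); the 1x1 Jacobian here is just a.
    logp += h * (-a)

expected = log_normal(z, math.exp(a * T))  # exact density of z(T)
```

Only the trace of the Jacobian is needed along the trajectory, whereas the discrete change of variables requires a log-determinant.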
[Figure: 1D and 2D density-fitting examples; panels: Data, Discrete-NF, CNF]
Is the ODE being correctly solved?
Stochastic Unbiased Log Density
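The slide content did not survive extraction, so the following is an assumption about what it refers to: a stochastic trace estimator (Hutchinson's estimator, $\mathrm{tr}(A) = \mathbb{E}_v[v^\top A v]$ for random sign vectors $v$), which would give an unbiased estimate of the trace term in the log density. The matrix `A` and the function names are hypothetical.

```python
import itertools
import random

# Hypothetical 3x3 Jacobian; its exact trace is 2.0 - 1.0 + 0.5 = 1.5.
A = [[2.0, 1.0, 0.0],
     [0.5, -1.0, 3.0],
     [4.0, 0.0, 0.5]]

def quad_form(A, v):
    # Computes v^T A v.
    n = len(v)
    return sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))

def hutchinson(A, num_samples, rng):
    # Average v^T A v over random sign (Rademacher) vectors v.
    n = len(A)
    total = 0.0
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        total += quad_form(A, v)
    return total / num_samples

exact_trace = sum(A[i][i] for i in range(3))
# Averaging over all 8 sign vectors shows the estimator is unbiased.
all_v = list(itertools.product((-1.0, 1.0), repeat=3))
mean_over_all = sum(quad_form(A, v) for v in all_v) / len(all_v)
estimate = hutchinson(A, 2000, random.Random(0))
```

Each sample needs only a vector-Jacobian product rather than the full Jacobian, at the cost of variance in the estimate.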
github.com/rtqichen/torchdiffeq
Thanks!
Extra Slides
Latent Space Visualizations
• Released an implementation of reverse-mode
autodiff through black-box ODE solvers.
- More fine-grained control than low-precision floats.