Differentiable Programming and Design Optimization
What is differentiable programming?
Derivatives in machine learning
Deep learning is behind most recent advances in AI
Computer vision, generative models, autonomous driving, speech recognition, machine translation
(Figure captions: Top-5 error rate on ImageNet (NVIDIA devblog); VQ-VAE (Razavi et al. 2019); Tesla Autopilot; word error rates (Huang et al., 2014); Google Neural Machine Translation System (GNMT))
Derivatives in machine learning
Deep learning is behind most recent advances in AI
Deep learning = nonlinear differentiable functions (programs) whose parameters are tuned by gradient-based optimization
Automatic differentiation
In practice the derivatives for gradient-based optimization come from running differentiable code via automatic differentiation
Many names:
- Automatic differentiation
- Algorithmic differentiation
- Autodiff
- Algodiff
- Autograd
- AD
Also remember:
- Backpropagation (backward propagation of errors)
- Backprop
Differentiable programming
Execute differentiable code via automatic differentiation
● Differentiable programming: writing software composed of differentiable and parameterized building blocks that are executed via automatic differentiation and optimized in order to perform a specified task
● A generalization of deep learning (neural networks are just a class of more general differentiable functions)
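As a minimal illustration (a made-up toy task in PyTorch, not from the slides): a small parameterized, differentiable program whose parameter is tuned by gradient descent to meet an objective.

    import torch

    theta = torch.tensor(0.1, requires_grad=True)   # tunable parameter of the program

    def program(x):
        # A differentiable "building block": scale the input, apply a nonlinearity.
        return torch.tanh(theta * x)

    target = torch.tensor(0.5)
    optimizer = torch.optim.SGD([theta], lr=0.1)
    for _ in range(200):
        optimizer.zero_grad()
        loss = (program(torch.tensor(2.0)) - target) ** 2
        loss.backward()    # automatic differentiation gives d(loss)/d(theta)
        optimizer.step()   # gradient-based optimization of the program's parameter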
How do we compute derivatives of computer code?
Derivatives as code
We can compute the derivatives not just of mathematical functions, but of general-purpose computer code (with control flow, loops, recursion, etc.)
(Portraits: Newton, c. 1665; Leibniz, c. 1675)
Manual differentiation
Find the analytical derivative using calculus, and implement it as code
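For instance (using the f(a, b) = log(a·b) example that appears later in these slides), one derives ∂f/∂a = 1/a and ∂f/∂b = 1/b by hand and implements both the function and its gradient:

    import math

    def f(a, b):
        return math.log(a * b)

    def grad_f(a, b):
        # Hand-derived: d/da log(a*b) = 1/a, d/db log(a*b) = 1/b
        return 1.0 / a, 1.0 / b

    print(f(2, 3), grad_f(2, 3))   # 1.791..., (0.5, 0.333...)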
Symbolic differentiation
Symbolic computation with Mathematica, Maple, Maxima, and deep learning frameworks such as Theano (graph optimization)
Problem: expression swell (e.g., in Theano)
Problem: only applicable to closed-form mathematical expressions, not to general code with control flow
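A minimal sketch of the idea, with SymPy as a stand-in for Mathematica/Maple/Maxima (the nested expression is made up to provoke swell): the symbolic derivative of a repeatedly composed closed-form expression grows much faster than the expression itself.

    import sympy as sp

    x = sp.symbols('x')

    # Repeated composition of a closed-form expression.
    expr = x
    for _ in range(4):
        expr = sp.sin(expr) * sp.cos(expr)

    dexpr = sp.diff(expr, x)   # symbolic derivative

    # Expression swell: the derivative has many more operations than the original.
    print(sp.count_ops(expr), sp.count_ops(dexpr))

Such symbolic engines only handle closed-form expressions; derivatives of code with loops or branches are outside their scope.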
Numerical differentiation
Finite difference approximation: ∂f/∂x_i ≈ (f(x + h·e_i) − f(x)) / h
More accurate schemes exist:
- Richardson extrapolation
- Differential quadrature
These increase rapidly in complexity and never completely eliminate the approximation error
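A minimal sketch of the forward-difference approximation, again on f(a, b) = log(a·b) (the step size h is an illustrative choice; too large gives truncation error, too small gives round-off error):

    import math

    def f(a, b):
        return math.log(a * b)

    def fd_grad(f, a, b, h=1e-6):
        # One extra function evaluation per input dimension.
        da = (f(a + h, b) - f(a, b)) / h
        db = (f(a, b + h) - f(a, b)) / h
        return da, db

    print(fd_grad(f, 2.0, 3.0))   # approximately (0.5, 0.333), but never exact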
Automatic differentiation
If we don’t need analytic derivative expressions, we can evaluate a gradient exactly with only one forward and one reverse execution of the program
Backprop or automatic differentiation?
(Timeline figure: 1960s, 1970s, 1980s; Griewank, 1989, revived reverse mode)
Automatic differentiation
All numerical algorithms, when executed, evaluate to compositions of a finite set of elementary operations with known derivatives
- Called a trace or a Wengert list (Wengert, 1964)
- Alternatively represented as a computational graph showing dependencies
Example: the program f(a, b) below, together with its computational graph (nodes a, b → * → c → log → d):

    f(a, b):
        c = a * b
        d = log(c)
        return d

Primal evaluation at (a, b) = (2, 3): c = 6, d = log(6) ≈ 1.791, giving 1.791 = f(2, 3).
Derivative evaluation: propagating tangents/adjoints through the same trace gives the “gradient” [0.5, 0.333] = f’(2, 3), i.e. [1/a, 1/b] (intermediate adjoints: 1 at d, 1/6 ≈ 0.166 at c).
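For reference, a minimal sketch (assuming PyTorch, which is introduced later in the deck) that reproduces these numbers with reverse-mode AD, using one forward (primal) execution and one reverse (adjoint) execution:

    import torch

    a = torch.tensor(2.0, requires_grad=True)
    b = torch.tensor(3.0, requires_grad=True)

    c = a * b           # primal: 6.0
    d = torch.log(c)    # primal: 1.791...

    d.backward()        # reverse pass: propagate adjoints through the recorded trace
    print(d.item(), a.grad.item(), b.grad.item())   # 1.791..., 0.5, 0.333...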
Automatic differentiation
Two main flavors:
- Forward mode: primals and derivatives (tangents) are propagated together, in the direction of the program’s execution
- Reverse mode: primals are computed in a forward pass, then derivatives (adjoints) are propagated backwards

Nested combinations (higher-order derivatives, Hessian–vector products, etc.):
- Forward-on-reverse
- Reverse-on-forward
- ...
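As an illustration (a minimal JAX sketch; the function f and the vector v are made up for this example, not from the slides): forward mode computes a Jacobian–vector product, reverse mode a vector–Jacobian product, and nesting them gives, e.g., a Hessian–vector product.

    import jax
    import jax.numpy as jnp

    def f(x):
        return jnp.sum(jnp.sin(x) ** 2)   # scalar-valued example function

    x = jnp.array([0.1, 0.2, 0.3])
    v = jnp.array([1.0, 0.0, 0.0])

    # Forward mode: push a tangent v through the program (Jacobian-vector product).
    primal, tangent = jax.jvp(f, (x,), (v,))

    # Reverse mode: pull an adjoint back through the program (vector-Jacobian product).
    primal2, vjp_fn = jax.vjp(f, x)
    gradient = vjp_fn(jnp.ones_like(primal2))[0]

    # Forward-on-reverse: Hessian-vector product without forming the full Hessian.
    hvp = jax.jvp(jax.grad(f), (x,), (v,))[1]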
Tools and communities
Two communities getting to know each other
Automatic differentiation community:
- Methods: theory of differentiation, adjoints, checkpointing, source transformation
- Applications: scientific computing, engineering design, computational fluid dynamics, Earth sciences, computational finance
- Community: 1st international conference in 1991 (http://www.autodiff.org/?module=Workshops)

Machine learning community:
- Methods: deep learning, differentiable programming, probability theory, Bayesian methods
- Applications: virtually all recent machine learning applications, pattern recognition, representation learning
- Community: 1st autodiff workshop at NeurIPS in 2016 (https://autodiff-workshop.github.io/)
Baydin, Atılım Güneş, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. 2018. “Automatic Differentiation in Machine Learning: a Survey.” Journal of Machine Learning Research (JMLR) 18 (153): 1–43. http://jmlr.org/papers/v18/17-468.html
Automatic differentiation
It is a (small) field of its own, with a dedicated community:
http://www.autodiff.org/
Non-machine-learning applications in industry and academia:
● Computational fluid dynamics
● Atmospheric sciences
● Computational finance
● Engineering design optimization
(Animations: fuel ignition; supersonic flow in a rocket nozzle. GIFs: Jason Koebler, SpaceX)
Tools and communities
International Conferences on AD; European Workshops on AD
Differentiable programming frameworks
Static graphs (define-and-run)
Prototypical examples: Theano, TensorFlow 1.0
- The user creates the graph using symbolic placeholders, in a mini-language (domain-specific language, DSL)
- Limited (and unintuitive) control flow and expressivity
- The graph gets “compiled” to take care of expression swell, in-place operations, etc.
(Figure: graph compilation in Theano)
Let’s implement an example function, first in pure Python and then as a static graph (code omitted)
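As a stand-in for the omitted code, a minimal define-and-run sketch of the earlier f(a, b) = log(a·b) example in TensorFlow 1.x-style graph mode (the choice of example function and the use of tf.compat.v1 are assumptions, not from the slides): the graph is built symbolically first, then executed in a session.

    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    # Define the graph symbolically, before any data flows through it.
    a = tf.placeholder(tf.float32)
    b = tf.placeholder(tf.float32)
    d = tf.log(a * b)
    grads = tf.gradients(d, [a, b])   # symbolic derivative nodes added to the graph

    # Run the compiled graph with concrete inputs.
    with tf.Session() as sess:
        print(sess.run([d] + grads, feed_dict={a: 2.0, b: 3.0}))   # [1.791..., 0.5, 0.333...]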
Dynamic graphs (define-by-run)
Prototypical example: PyTorch
General-purpose autodiff, usually via operator overloading
- The user writes regular programs in the host programming language; all language features (including control flow) are supported
- The graph is constructed automatically as the program runs
Let’s implement the same example, in pure Python and in PyTorch (code omitted)
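Again as a hedged stand-in for the omitted slide code, a minimal define-by-run sketch of the same example in PyTorch: ordinary Python (including control flow) is executed directly, and the graph is recorded as the program runs.

    import torch

    def f(a, b):
        c = a * b
        if c > 1:            # ordinary host-language control flow
            return torch.log(c)
        return c

    a = torch.tensor(2.0, requires_grad=True)
    b = torch.tensor(3.0, requires_grad=True)
    d = f(a, b)
    d.backward()
    print(d.item(), a.grad.item(), b.grad.item())   # 1.791..., 0.5, 0.333...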
Current state of differentiable programming
Evolution of frameworks
From: coarse-grained (module level) backprop
Towards: fine-grained, general-purpose automatic differentiation
- Theano (2008)
- Torch7 (2011) → torch-autograd (2015) → PyTorch (2016)
- HIPS autograd (2014)
- TensorFlow (2015) → TensorFlow eager execution (2017) → TensorFlow 2 (2019)
- JAX (2018)
Design optimization
Surrogates for differentiability
● Use a dataset of simulator runs to learn a differentiable approximation of the simulator (e.g., a deep generative model; see the sketch below)
Behl, Baydin, Gal, Torr, Vineet. “AutoSimulate: (Quickly) Learning Synthetic Data Generation.” ECCV 2020
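A minimal sketch of the general recipe (not the AutoSimulate method itself; the toy “simulator” and network sizes are made up): fit a neural network to (parameter, output) pairs from simulator runs, then use the differentiable surrogate to obtain gradients with respect to the design parameter.

    import math
    import random
    import torch
    import torch.nn as nn

    def simulator(theta):
        # Stand-in for a black-box, non-differentiable simulator.
        return math.sin(theta) + 0.1 * random.gauss(0.0, 1.0)

    # Dataset of (design parameter, simulator output) pairs.
    thetas = torch.linspace(-3.0, 3.0, 200).unsqueeze(1)
    outputs = torch.tensor([[simulator(t.item())] for t in thetas])

    # Differentiable surrogate trained to approximate the simulator.
    surrogate = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
    optimizer = torch.optim.Adam(surrogate.parameters(), lr=1e-2)
    for _ in range(500):
        optimizer.zero_grad()
        loss = ((surrogate(thetas) - outputs) ** 2).mean()
        loss.backward()
        optimizer.step()

    # Gradient of the (approximate) simulator output w.r.t. the design parameter.
    theta = torch.tensor([[0.5]], requires_grad=True)
    surrogate(theta).sum().backward()
    print(theta.grad)   # close to cos(0.5) for this toy simulator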
Example: exoplanet radiative transfer
● Posterior probability distributions of exoplanet atmospheric parameters conditioned on observed spectra, using radiative transfer simulators
● Surrogates allow up to 180x faster inference
Himes, Harrington, Cobb, Baydin, Soboczenski, O'Beirne, Zorzan, Wright, Scheffer, Domagal-Goldman, Arney. 2020. “Accelerating Bayesian Inference via Neural Networks: Application to Exoplanet Retrievals.” In AAS/Division for Planetary Sciences Meeting Abstracts (Vol. 52, No. 6, pp. 207-07)
Example: local generative surrogates
● Deep generative surrogates (GAN) successively trained in local neighborhoods
● Optimize the SHiP muon shield (GEANT4, FairRoot): minimize the number of recorded muons by varying the magnet geometry
Shirobokov, Belavin, Kagan, Ustyuzhanin, Baydin “Black-Box Optimization with Local Generative Surrogates” NeurIPS 2020
Example: universal probabilistic surrogates
● Replace a (slow) universal probabilistic program with a (fast) LSTM-based surrogate that works in the same address space
● Enables faster Bayesian inference
● Differentiable surrogate model can enable gradient-based inference engines
Munk, Scibior, Baydin, Stewart, Fernlund, Poursartip, Wood. “Deep Probabilistic Surrogate Networks for Universal Simulator Approximation.” ProbProg 2020
Differentiability without surrogates
Forth, Shaun A.; Evans, Trevor P. “Aerofoil Optimisation via AD of a Multigrid Cell-Vertex Euler Flow Solver.” 2002
Daniele Casanova, Robin S. Sharp, Mark Final, Bruce Christianson, Pat Symonds. “Application of Automatic Differentiation to Race Car Performance Optimisation.” In Automatic Differentiation of Algorithms: From Simulation to Optimization, Springer, 2002
End-to-end differentiable pipelines
● Complex experimental setups can be composed of a pipeline of distinct simulators (e.g., SHERPA → GEANT)
● One might need to differentiate through the whole end-to-end pipeline, which can be achieved by compositionality and the chain rule (see the sketch below)
Milutinovic, Baydin, Zinkov, Harvey, Song, Wood, Shen. 2017. “End-to-End Training of Differentiable Pipelines Across Machine Learning Frameworks.” NeurIPS 2017 Autodiff Workshop
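A minimal sketch of the compositional principle (generic PyTorch modules standing in for the distinct simulators; this is not the SHERPA/GEANT pipeline itself): a single backward call differentiates through the composed pipeline via the chain rule.

    import torch
    import torch.nn as nn

    # Two differentiable stages standing in for distinct components of a pipeline.
    stage1 = nn.Linear(4, 8)
    stage2 = nn.Linear(8, 1)

    x = torch.randn(1, 4)
    output = stage2(torch.tanh(stage1(x)))   # end-to-end composition of the stages

    output.sum().backward()                  # chain rule through both stages at once
    print(stage1.weight.grad.shape, stage2.weight.grad.shape)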
Differentiable programming in particle physics
● Differentiable analysis: unify the analysis pipeline by simultaneously optimizing the free parameters of an analysis with respect to the desired physics objective
● Differentiable simulation: enable efficient simulation-based inference, reducing the number of events needed by orders of magnitude
Baydin, Cranmer, Feickert, Gray, Heinrich, Held, Melo, Neubauer, Pearkes, Simpson, Smith, Stark, Thais, Vassilev, Watts. 2020. “Differentiable Programming in High-Energy Physics.” In Snowmass 2021 Letters of Interest (LOI), Division of Particles and Fields (DPF), American Physical Society. https://snowmass21.org/loi
Optimization of experimental design
● Design of instruments is a complex task, involving a combination of performance and cost considerations
● We need the next generation of tools to optimize modern and future particle detectors and experiments
● MODE (Machine-learning Optimized Design of Experiments) collaboration: https://mode-collaboration.github.io/
Baydin, Cranmer, de Castro Manzano, Delaere, Derkach, Donini, Dorigo, Giammanco, Kieseler, Layer, Louppe, Ratnikov, Strong, Tosi, Ustyuzhanin, Vischia, Yarar. 2021. “Toward Machine Learning Optimization of Experimental Design.” Nuclear Physics News 31 (1)
Summary
● What is differentiable programming?
○ How to compute derivatives
○ Automatic differentiation
○ Tools and communities
● Differentiable programming in practice
○ Current state of differentiable programming
● Design optimization
○ Surrogates
○ Direct differentiation
Thank you for listening
Questions?
Selected references
[1] T. A. Le, A. G. Baydin, and F. Wood. Inference compilation and universal probabilistic programming. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2017.
[2] A. Munk, A. Ścibior, A. G. Baydin, A. Stewart, G. Fernlund, A. Poursartip, and F. Wood. Deep probabilistic surrogate networks for universal simulator approximation. In PROBPROG, 2020.
[3] A. G. Baydin, L. Heinrich, W. Bhimji, L. Shao, S. Naderiparizi, A. Munk, J. Liu, B. Gram-Hansen, G. Louppe, L. Meadows, P. Torr, V. Lee, Prabhat, K. Cranmer, and F. Wood. Efficient probabilistic inference in the quest for physics beyond the
standard model. In NeurIPS, 2019.
[4] A. G. Baydin, L. Shao, W. Bhimji, L. Heinrich, L. F. Meadows, J. Liu, A. Munk, S. Naderiparizi, B. Gram-Hansen, G. Louppe, M. Ma, X. Zhao, P. Torr, V. Lee, K. Cranmer, Prabhat, and F. Wood. Etalumis: Bringing probabilistic programming to scientific
simulators at scale. In SC19, 2019.
[5] K. Cranmer, J. Brehmer, and G. Louppe. The frontier of simulation-based inference. Proceedings of the National Academy of Sciences, 117(48):30055–30062, 2020.
[6] B. Gram-Hansen, C. Schroeder, P. H. Torr, Y. W. Teh, T. Rainforth, and A. G. Baydin. Hijacking malaria simulators with probabilistic programming. In ICML workshop on AI for Social Good, 2019.
[7] B. Gram-Hansen, C. S. de Witt, R. Zinkov, S. Naderiparizi, A. Scibior, A. Munk, F. Wood, M. Ghadiri, P. Torr, Y. W. Teh, A. G. Baydin, and T. Rainforth. Efficient bayesian inference for nested simulators. In AABI, 2019.
[8] B. Poduval, A. G. Baydin, and N. Schwadron. Studying solar energetic particles and their seed population using surrogate models. In MML for Space Sciences workshop, COSPAR, 2021.
[9] G. Acciarini, F. Pinto, S. Metz, S. Boufelja, S. Kaczmarek, K. Merz, J. A. Martinez-Heras, F. Letizia, C. Bridges, and A. G. Baydin. Spacecraft collision risk assessment with probabilistic programming. In ML4PS (NeurIPS 2020), 2020.
[10] F. Pinto, G. Acciarini, S. Metz, S. Boufelja, S. Kaczmarek, K. Merz, J. A. Martinez-Heras, F. Letizia, C. Bridges, and A. G. Baydin. Towards automated satellite conjunction management with bayesian deep learning. In AI for Earth Sciences Workshop
(NeurIPS), 2020.
[11] G. Acciarini, F. Pinto, S. Metz, S. Boufelja, S. Kaczmarek, K. Merz, J. A. Martinez-Heras, F. Letizia, C. Bridges, and A. G. Baydin. Kessler: a machine learning library for space collision avoidance. In 8th European Conference on Space Debris, 2021.
[12] S. Shirobokov, V. Belavin, M. Kagan, A. Ustyuzhanin, and A. G. Baydin. Black-box optimization with local generative surrogates. In NeurIPS, 2020.
[13] H. S. Behl, A. G. Baydin, R. Gal, P. H. S. Torr, and V. Vineet. Autosimulate: (quickly) learning synthetic data generation. In 16th European Conference on Computer Vision (ECCV), 2020.
[14] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind. Automatic differentiation in machine learning: a survey. Journal of Machine Learning Research (JMLR), 18(153):1–43, 2018.
[15] A. G. Baydin, B. A. Pearlmutter, and J. M. Siskind. DiffSharp: An AD library for .net languages. In 7th International Conference on Algorithmic Differentiation, 2016.
[16] A. G. Baydin, R. Cornish, D. M. Rubio, M. Schmidt, and F. Wood. Online learning rate adaptation with hypergradient descent. In ICLR, 2018.
[17] H. Behl, A. G. Baydin, and P. H. Torr. Alpha maml: Adaptive model-agnostic meta-learning. In AutoML (ICML), 2019.
[18] A. G. Baydin, K. Cranmer, M. Feickert, L. Gray, L. Heinrich, A. Held, A. Melo, M. Neubauer, J. Pearkes, N. Simpson, N. Smith, G. Stark, S. Thais, V. Vassilev, and G. Watts. Differentiable programming in high-energy physics. In Snowmass
2021 Letters of Interest (LOI), Division of Particles and Fields (DPF), American Physical Society, 2020.
[19] A. G. Baydin, K. Cranmer, P. de Castro Manzano, C. Delaere, D. Derkach, J. Donini, T. Dorigo, A. Giammanco, J. Kieseler, L. Layer, G. Louppe, F. Ratnikov, G. C. Strong, M. Tosi, A. Ustyuzhanin, P. Vischia, and H. Yarar. Toward machine learning
optimization of experimental design. Nuclear Physics News International (Submitted), 2020.
[20] L. F. Guedes dos Santos, S. Bose, V. Salvatelli, B. Neuberg, M. Cheung, M. Janvier, M. Jin, Y. Gal, P. Boerner, and A. G. Baydin. Multi-channel auto-calibration for the atmospheric imaging assembly using machine learning. Astronomy &
Astrophysics (in press), 2021.
[21] A. D. Cobb, M. D. Himes, F. Soboczenski, S. Zorzan, M. D. O’Beirne, A. G. Baydin, Y. Gal, S. D. Domagal-Goldman, G. N. Arney, and D. Angerhausen. An ensemble of bayesian neural networks for exoplanetary atmospheric retrieval. The
Astronomical Journal, 158(1), 2019.
[22] C. Schroeder de Witt, B. Gram-Hansen, N. Nardelli, A. Gambardella, R. Zinkov, P. Dokania, N. Siddharth, A. B. Espinosa-Gonzalez, A. Darzi, P. Torr, and A. G. Baydin. Simulation-based inference for global health decisions. In ICML Workshop
on Machine Learning for Global Health, Thirty-seventh International Conference on Machine Learning (ICML 2020), 2020.
Supplementary slides
Forward vs reverse
Derivatives in machine learning
“Backprop” and gradient descent are at the core of most recent advances in machine learning
Pyro (2017), ProbTorch (2017), PyProb (2019)
- Variational inference
- “Neural” density estimation
- Transformed distributions via bijectors
- Normalizing flows (Rezende & Mohamed, 2015)
- Masked autoregressive flows (Papamakarios et al., 2017)
AD is at the core of machine learning
A new mindset and workflow, enabling differentiable algorithmic elements
● Neural Turing Machine, Differentiable Neural Computer (Graves et al. 2014, 2016)
○ Can infer algorithms: copy, sort, recall
● Stack-augmented RNN (Joulin & Mikolov, 2015)
● End-to-end memory network (Sukhbaatar et al., 2015)
● Stack, queue, deque (Grefenstette et al., 2015)
● Discrete interfaces (Zaremba & Sutskever, 2015)