Differentiable Programming and Design Optimization

This document discusses differentiable programming and automatic differentiation. It explains that differentiable programming involves writing software composed of differentiable and parameterized building blocks that are executed via automatic differentiation and optimized to perform tasks. Automatic differentiation allows derivatives of computer code to be computed exactly by executing the code in a forward and reverse manner, without needing analytic derivatives. This is known as backpropagation in deep learning but is a specialized case of the more general technique of automatic differentiation.

Differentiable programming

and design optimization

Atılım Güneş Baydin


[email protected]
First MODE Workshop on Differentiable Programming
Université catholique de Louvain
6 Sep 2021
Outline
● What is differentiable programming?
○ How to compute derivatives
○ Automatic differentiation
○ Tools and communities
● Differentiable programming in practice
○ Current state of differentiable programming
● Design optimization
○ Surrogates
○ Direct differentiation

What is differentiable programming?
Derivatives in machine learning
Deep learning is behind all recent advances
● Computer vision: top-5 error rate for ImageNet (NVIDIA devblog)
● Generative models: VQ-VAE (Razavi et al. 2019)
● Autonomous driving: Tesla Autopilot
● Speech recognition/synthesis: word error rates (Huang et al., 2014)
● Machine translation: Google Neural Machine Translation System (GNMT)
Deep learning = nonlinear differentiable functions (programs)
whose parameters are tuned by gradient-based optimization

(Ruder, 2017) http://ruder.io/optimizing-gradient-descent/

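Schematically, these optimizers iterate θ ← θ − η ∇θ L(θ). A minimal sketch in NumPy; the quadratic objective is an illustrative stand-in, not from the slides:

    import numpy as np

    def grad_loss(theta):
        # Analytic gradient of the stand-in objective L(theta) = ||theta - 3||^2
        return 2.0 * (theta - 3.0)

    theta = np.zeros(2)
    eta = 0.1                                 # learning rate (step size)
    for _ in range(100):
        theta = theta - eta * grad_loss(theta)  # gradient descent update

    print(theta)                              # converges toward [3., 3.]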
Automatic differentiation
In practice the derivatives for gradient-based optimization come from
running differentiable code via automatic differentiation

Many names:
- Automatic differentiation
- Algorithmic differentiation
- Autodiff
- Algodiff
- Autograd
- AD

Also remember:
- Backpropagation (backward propagation of errors)
- Backprop
Differentiable programming
Execute differentiable code via automatic differentiation

● Differentiable programming:
Writing software composed of differentiable and parameterized building
blocks that are executed via automatic differentiation and optimized in
order to perform a specified task
● A generalization of deep learning (neural networks are just one class
within the more general family of differentiable functions)

Andrej Karpathy (2017), “Software 2.0”
https://karpathy.medium.com/software-2-0-a64152b37c35
How do we compute derivatives of computer code?
Derivatives as code
We can compute the derivatives not just of mathematical functions, but of
general-purpose computer code (with control flow, loops, recursion, etc.)

(Figures: Newton, c. 1665; Leibniz, c. 1675)
Baydin, Pearlmutter, Radul, Siskind. 2018. “Automatic Differentiation in Machine Learning: a Survey.” Journal of Machine Learning Research (JMLR)
Manual
Find the analytical derivative using calculus, and implement it as code

Analytic derivatives are needed for theoretical insight:
- analytic solutions, proofs
- mathematical analysis, e.g., stability of fixed points

Unnecessary when we just need derivative evaluations for optimization
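For instance (an illustrative example, not from the slides): for f(x) = sin(x²), one derives f′(x) = 2x cos(x²) by hand and implements both:

    import math

    def f(x):
        return math.sin(x ** 2)

    def df(x):
        # Hand-derived via the chain rule: d/dx sin(x^2) = 2x cos(x^2)
        return 2 * x * math.cos(x ** 2)

    print(f(1.5), df(1.5))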
Symbolic differentiation
Symbolic computation with Mathematica, Maple, Maxima,
and deep learning frameworks such as Theano

Problem: expression swell
(mitigated by graph optimization, e.g., in Theano)
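Expression swell is easy to reproduce (a sketch using SymPy, which the slides do not name): iterating a compact map keeps the function small while its unsimplified symbolic derivative grows rapidly:

    import sympy as sp

    x = sp.symbols('x')
    expr = x
    for _ in range(4):
        expr = 4 * expr * (1 - expr)    # iterate the logistic map

    d = sp.diff(expr, x)                # unsimplified symbolic derivative
    print(sp.count_ops(expr), sp.count_ops(d))   # the derivative has far more ops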
Symbolic differentiation
Problem: only applicable to closed-form mathematical functions

You can find the derivative of a closed-form expression,
but not of an arbitrary program with loops and branches

Symbolic graph builders such as Theano and TensorFlow (1.0)
have limited, unintuitive control flow, loops, recursion
Numerical differentiation
Finite difference approximation of the gradient ∇f = (∂f/∂x₁, …, ∂f/∂xₙ):

    ∂f(x)/∂xᵢ ≈ (f(x + h eᵢ) − f(x)) / h

Problem: f needs to be evaluated n times,
once with each standard basis vector eᵢ

Problem: we must select the step size h and
we face approximation errors

Better approximations exist:
- Higher-order finite differences,
  e.g., center difference: ∂f(x)/∂xᵢ ≈ (f(x + h eᵢ) − f(x − h eᵢ)) / 2h
- Richardson extrapolation
- Differential quadrature

These increase rapidly in complexity
and never completely eliminate the error

Good to learn: still extremely useful as a quick check of our gradient implementations
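A minimal NumPy sketch of this gradient check, using the forward difference above (f here is the log(ab) example from the automatic differentiation slides that follow):

    import numpy as np

    def finite_diff_grad(f, x, h=1e-6):
        # Forward-difference approximation: one extra evaluation per dimension
        grad = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = 1.0                            # standard basis vector e_i
            grad[i] = (f(x + h * e) - f(x)) / h
        return grad

    f = lambda x: np.log(x[0] * x[1])
    print(finite_diff_grad(f, np.array([2.0, 3.0])))   # ≈ [0.5, 0.333]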
Automatic differentiation
If we don’t need analytic derivative expressions, we can
evaluate a gradient exactly with only one forward and one reverse execution

In machine learning, this is known as backpropagation or “backprop”
(Rumelhart, Hinton & Williams, Nature 323, 533–536, 9 October 1986)

- Automatic differentiation is more than backprop
- Or: backprop is a specialized reverse-mode automatic differentiation
Backprop or automatic differentiation?

1960s
- Precursors of backpropagation: Kelley, 1960; Bryson, 1961;
  Pontryagin et al., 1961; Dreyfus, 1962
- Forward mode: Wengert, 1964

1970s
- Reverse mode: Linnainmaa, 1970, 1976
- Control parameters: Dreyfus, 1973
- Werbos, 1974

1980s
- Automatic reverse mode: Speelpenning, 1980
- First NN-specific backprop: Werbos, 1982
- Parker, 1985; LeCun, 1985
- Revived backprop: Rumelhart, Hinton, Williams, 1986
- Revived reverse mode: Griewank, 1989

Recommended reading:
- Griewank, A., 2012. Who Invented the Reverse Mode of Differentiation? Documenta Mathematica, Extra Volume ISMP, pp. 389–400.
- Schmidhuber, J., 2015. Who Invented Backpropagation? http://people.idsia.ch/~juergen/who-invented-backpropagation.html
Automatic differentiation
All numerical algorithms, when executed, evaluate to compositions of
a finite set of elementary operations with known derivatives
- Called a trace or a Wengert list (Wengert, 1964)
- Alternatively represented as a computational graph showing dependencies

f(a, b):
    c = a * b
    d = log(c)
    return d

Computational graph: a, b → (*) → c → (log) → d

Primal evaluation with a = 2, b = 3: c = 6, d = 1.791, i.e., 1.791 = f(2, 3)

Derivative evaluation along the same trace: seed the output d with 1,
propagate through the local derivative of each node (∂d/∂c = 1/c = 0.166),
and obtain [0.5, 0.333] = f’(2, 3); these tangents/adjoints form the “gradient”
(checked in code below)
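The same example, verified with a general-purpose autodiff tool (a minimal sketch using PyTorch, which the slides introduce later as the prototypical define-by-run framework):

    import torch

    a = torch.tensor(2.0, requires_grad=True)
    b = torch.tensor(3.0, requires_grad=True)

    c = a * b             # primal: 6.0
    d = torch.log(c)      # primal: 1.791...

    d.backward()          # reverse pass, seeded with dd/dd = 1
    print(d.item())                       # 1.791... = f(2, 3)
    print(a.grad.item(), b.grad.item())   # 0.5, 0.333... = f'(2, 3)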
Automatic differentiation
Two main flavors:

- Forward mode: primals and derivatives (tangents) are propagated together
  in a single forward execution
- Reverse mode (a.k.a. backprop): primals are propagated forward, then
  derivatives (adjoints) are propagated in a reverse execution

Nested combinations
(higher-order derivatives, Hessian–vector products, etc.):
- Forward-on-reverse
- Reverse-on-forward
- ...
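Both modes, and their nested combinations, are exposed directly in JAX, one of the frameworks named below; a minimal sketch on the f(a, b) = log(a·b) example:

    import jax
    import jax.numpy as jnp

    def f(x):
        return jnp.log(x[0] * x[1])

    x = jnp.array([2.0, 3.0])

    # Forward mode: Jacobian-vector product; tangents ride along the primals
    primal, tangent = jax.jvp(f, (x,), (jnp.array([1.0, 0.0]),))
    print(primal, tangent)        # f(x) and df/dx0 = 0.5

    # Reverse mode: vector-Jacobian product; adjoints flow backward
    primal, f_vjp = jax.vjp(f, x)
    (adjoints,) = f_vjp(1.0)      # seed the scalar output with 1
    print(adjoints)               # [0.5, 0.333...], the gradient

    # Nested combination (forward-on-reverse): a Hessian-vector product
    hvp = jax.jvp(jax.grad(f), (x,), (jnp.array([1.0, 0.0]),))[1]
    print(hvp)                    # first column of the Hessian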
Tools and communities
Two communities getting to know each other

              Automatic differentiation                   Machine learning
Methods       Theory of differentiation, adjoints,        Deep learning, differentiable programming,
              checkpointing, source transformation        probability theory, Bayesian methods
Applications  Scientific computing, engineering design,   Virtually all recent machine learning
              computational fluid dynamics, Earth         applications, pattern recognition,
              sciences, computational finance             representation learning
Languages     C, C++, FORTRAN                             Python
Tools         ADOL-C, ADIFOR, Tapenade, etc.              PyTorch, TensorFlow, JAX, etc.
Community     1st international conference: 1991          1st autodiff workshop at NeurIPS: 2016
              http://www.autodiff.org/?module=Workshops   https://autodiff-workshop.github.io/

Baydin, Atılım Güneş, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. 2018. “Automatic Differentiation in Machine Learning: a Survey.” Journal of Machine Learning Research (JMLR) 18 (153): 1–43. http://jmlr.org/papers/v18/17-468.html
Automatic differentiation
It is a (small) field of its own,
with a dedicated community
http://www.autodiff.org/

Non-machine-learning applications in
industry and academia
● Computational fluid dynamics
● Atmospheric sciences
● Computational finance
● Engineering design optimization

(Figures: fuel ignition; supersonic flow in a rocket nozzle. GIFs: Jason Koebler, SpaceX)
Tools and community
International Conferences on AD:
● 7th at Oxford, UK, 2016
● 6th at Fort Collins, US, 2012
● 5th at Bonn, Germany, 2008
● 4th at Chicago, US, 2004
● 3rd at Nice, France, 2000
● 2nd at Santa Fe, US, 1996
● 1st at Breckenridge, US, 1991

European Workshops on AD:
● 23rd, Virtual, Worldwide, 2020
● 22nd at London, UK, 2019
● 21st at Jena, Germany, 2018
● 20th at INRIA Sophia-Antipolis, France, 2017
● 19th at Kaiserslautern, Germany, 2016
● 18th at Paderborn, Germany, 2015
● 17th at Argonne, US, 2015
● 16th at Jena, Germany, 2014
● 15th at INRIA Sophia-Antipolis, France, 2014
● 14th at Oxford, UK, 2013
● 1st at Nice, France, 2005

http://www.autodiff.org/?module=Workshops
Differentiable programming in practice
Differentiable programming frameworks

Two main possibilities:

- Static computational graphs (“define-and-run”):
  let the user define the graph as a data structure

- Dynamic computational graphs (“define-by-run”):
  construct the graph automatically
  (general-purpose automatic differentiation)
Static graphs (define-and-run)
Prototypical examples: Theano, TensorFlow 1.0
- The user creates the graph using symbolic placeholders, using a
  mini-language (domain-specific language, DSL)
- Limited (and unintuitive) control flow and expressivity
- The graph gets “compiled” to take care of expression swell, in-place ops
  (graph compilation in Theano)

Example: implementing a simple function as a static graph (see the sketch below)
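The slides' code listings are images and are not preserved here; as a stand-in, a minimal define-and-run sketch in TensorFlow 1.x style (via tf.compat.v1), building a graph for f(a, b) = log(a·b) and querying its gradients:

    import tensorflow as tf

    tf1 = tf.compat.v1
    tf1.disable_eager_execution()      # legacy define-and-run mode

    # Define the graph symbolically; placeholders stand in for inputs
    a = tf1.placeholder(tf.float32, name="a")
    b = tf1.placeholder(tf.float32, name="b")
    d = tf1.log(a * b)
    grads = tf1.gradients(d, [a, b])   # symbolic gradient nodes added to the graph

    # Only now is the graph executed, by feeding concrete values
    with tf1.Session() as sess:
        print(sess.run([d] + grads, feed_dict={a: 2.0, b: 3.0}))
        # [1.791..., 0.5, 0.333...]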
Dynamic graphs (define-by-run)
Prototypical example: PyTorch
General-purpose autodiff, usually via operator overloading
- The user writes regular programs in the host programming language;
  all language features (including control flow) are supported
- The graph is automatically constructed as the code runs

Example: implementing the same function define-by-run (see the sketch below)
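A corresponding define-by-run sketch in PyTorch (again a stand-in for the slides' listings), with native Python control flow inside the differentiated code:

    import torch

    def f(a, b, n):
        # Ordinary Python control flow; the graph is recorded as the code runs
        c = a * b
        for _ in range(n):        # loop bounds can depend on runtime values
            c = torch.log(c)
        return c

    a = torch.tensor(2.0, requires_grad=True)
    b = torch.tensor(3.0, requires_grad=True)

    out = f(a, b, n=1)
    out.backward()
    print(out.item(), a.grad.item(), b.grad.item())   # 1.791..., 0.5, 0.333...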
Current state of differentiable programming
Evolution of frameworks
From: coarse-grained (module-level) backprop
Towards: fine-grained, general-purpose automatic differentiation

- Theano (2008)
- Torch7 (2011) → torch-autograd (2015) → PyTorch (2016)
- HIPS autograd (2014)
- TensorFlow (2015) → TensorFlow eager execution (2017) → TensorFlow 2 (2019)
- JAX (2018)
Design optimization

Design optimization
A simulator (model) of the system maps inputs and design parameters θ to outputs

Optimal parameters minimize an objective computed on the simulator’s outputs:

    θ* = argmin over θ of objective(simulator(inputs, θ))

θ* can be efficiently found by gradient-based optimization if
∂objective/∂θ is available (see the sketch below)
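A minimal sketch of this loop in PyTorch; the differentiable “simulator” here is an illustrative stand-in, not a real instrument model:

    import torch

    def simulator(x, theta):
        # Illustrative stand-in for a differentiable simulator
        return torch.sin(theta[0] * x) + theta[1] * x ** 2

    def objective(y, target):
        return torch.mean((y - target) ** 2)

    x = torch.linspace(0.0, 1.0, 50)
    target = torch.sin(1.5 * x) + 0.3 * x ** 2    # outputs of an "ideal" design
    theta = torch.zeros(2, requires_grad=True)

    opt = torch.optim.Adam([theta], lr=0.05)
    for _ in range(500):
        opt.zero_grad()
        loss = objective(simulator(x, theta), target)
        loss.backward()    # d(objective)/d(theta) via reverse-mode AD
        opt.step()

    print(theta)           # approaches [1.5, 0.3]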
Surrogates for differentiability
When the simulator (model) of the system is non-differentiable:

● Run the simulator many times
● Generate a (large) dataset of input–output pairs capturing the simulator’s behavior
● Use the dataset to learn a differentiable approximation of the simulator
  (e.g., a deep generative model)

The resulting differentiable surrogate, with surrogate(inputs, θ) ≈ simulator(inputs, θ),
can then stand in for the simulator during gradient-based optimization (see the sketch below)
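A compact sketch of this recipe in PyTorch; the black-box “simulator” and the small MLP surrogate are illustrative stand-ins:

    import torch
    import torch.nn as nn

    def blackbox_simulator(theta):
        # Stand-in for a non-differentiable simulator (treat as gradient-free)
        return torch.sin(3 * theta) + theta ** 2

    # 1) Run the simulator many times to build a dataset
    theta_data = torch.rand(1000, 1) * 2 - 1       # designs sampled in [-1, 1]
    y_data = blackbox_simulator(theta_data)

    # 2) Learn a differentiable approximation from the input-output pairs
    surrogate = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
    opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
    for _ in range(2000):
        opt.zero_grad()
        loss = nn.functional.mse_loss(surrogate(theta_data), y_data)
        loss.backward()
        opt.step()

    # 3) Optimize the design through the surrogate's gradients
    theta = torch.zeros(1, requires_grad=True)
    opt2 = torch.optim.Adam([theta], lr=1e-2)
    for _ in range(300):
        opt2.zero_grad()
        surrogate(theta.unsqueeze(0)).sum().backward()   # minimize surrogate output
        opt2.step()
    print(theta.item())    # an (approximate) optimal design under the surrogate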


Example: a simple surrogate
Tuning synthetic image generation for computer vision

(Figure: number of synthetic images required during training with the photorealistic Arnold renderer)

Behl, Baydin, Gal, Torr, Vineet. “AutoSimulate: (Quickly) Learning Synthetic Data Generation.” ECCV 2020
Example: exoplanet radiative transfer
● Posterior probability distributions of exoplanet atmospheric parameters
conditioned on observed spectra, using radiative transfer simulators
● Surrogates allow up to 180x faster inference

Himes, Harrington, Cobb, Baydin, Soboczenski, O’Beirne, Zorzan, Wright, Scheffer, Domagal-Goldman, Arney. 2020. “Accelerating Bayesian Inference via Neural Networks: Application to Exoplanet Retrievals.” In AAS/Division for Planetary Sciences Meeting Abstracts (Vol. 52, No. 6, pp. 207-07).
Example: local generative surrogates
● Deep generative surrogates (GAN) successively trained in local neighborhoods
● Optimize SHiP muon shield (GEANT4, FairRoot), minimize number of recorded
muons by varying magnet geometry

Shirobokov, Belavin, Kagan, Ustyuzhanin, Baydin “Black-Box Optimization with Local Generative Surrogates” NeurIPS 2020
Example: universal probabilistic surrogates
● Replace a (slow) universal probabilistic program with a (fast) LSTM-based
surrogate that works in the same address space
● Enables faster Bayesian inference
● Differentiable surrogate model can enable gradient-based inference engines

(Figure: surrogate simulation of composite material heating cycles, 25x faster inference)

Munk, Ścibior, Baydin, Stewart, Fernlund, Poursartip, Wood. “Deep Probabilistic Surrogate Networks for Universal Simulator Approximation.” ProbProg 2020
Differentiability without surrogates

Alternatively, apply automatic differentiation
(e.g., source-to-source transformation) directly to the
non-differentiable simulator’s (model’s) code, turning it into a
differentiable simulator with the same inputs, parameters, and outputs
Differentiability without surrogates
● Use automatic differentiation tools to make the simulator directly differentiable
● Used in design optimization by the AD community for many decades

Forth, Shaun A.; Evans, Trevor P. “Aerofoil Optimisation via AD of a Multigrid Cell-Vertex Euler Flow Solver.” 2002

Casanova, Daniele; Sharp, Robin S.; Final, Mark; Christianson, Bruce; Symonds, Pat. “Application of Automatic Differentiation to Race Car Performance Optimisation.” In Automatic Differentiation of Algorithms: From Simulation to Optimization, Springer, 2002
End-to-end differentiable pipelines
● Complex experimental setups can be composed of a pipeline of several
distinct simulators (e.g., SHERPA -> GEANT)
● One might need to differentiate through the whole end-to-end pipeline, which
can be achieved by compositionality and the chain rule (see the sketch below)

DARPA Data Driven Discovery of Models (D3M)
https://datadrivendiscovery.org/

Milutinovic, Baydin, Zinkov, Harvey, Song, Wood, Shen. 2017. “End-to-End Training of Differentiable Pipelines Across Machine Learning Frameworks.” NeurIPS 2017 Autodiff Workshop
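A toy sketch of this compositionality in PyTorch; the two stages are illustrative stand-ins, not SHERPA or GEANT:

    import torch

    def stage1(theta):
        # Stand-in for the first simulator in the pipeline
        return torch.sin(theta) + theta ** 2

    def stage2(x):
        # Stand-in for the downstream simulator
        return torch.exp(-x) * x

    theta = torch.tensor(0.7, requires_grad=True)
    out = stage2(stage1(theta))    # end-to-end composition of the pipeline
    out.backward()                 # the chain rule links gradients across stages
    print(theta.grad)              # d(stage2 ∘ stage1)/d(theta)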
Differentiable programming in particle physics
● Differentiable analysis
Unify analysis pipeline by simultaneously
optimizing the free parameters of an analysis
with respect to the desired physics objective

● Differentiable simulation
Enable efficient simulation-based inference,
reducing the number of events needed by
orders of magnitude

Baydin, Cranmer, Feickert, Gray, Heinrich, Held, Melo, Neubauer, Pearkes, Simpson, Smith, Stark, Thais, Vassilev, Watts. 2020. “Differentiable Programming in High-Energy Physics.” In Snowmass 2021 Letters of Interest (LOI), Division of Particles and Fields (DPF), American Physical Society. https://snowmass21.org/loi
Optimization of experimental design
● Design of instruments is a complex
task, involving a combination of
performance and cost
considerations
● We need the next generation of
tools to optimize modern and
future particle detectors and
experiments
● MODE (Machine-learning Optimized
Design of Experiments)
collaboration!
https://mode-collaboration.github.io/

Baydin, Cranmer, de Castro Manzano, Delaere, Derkach, Donini, Dorigo, Giammanco, Kieseler, Layer, Louppe, Ratnikov, Strong, Tosi, Ustyuzhanin, Vischia, Yarar. 2021. “Toward Machine Learning Optimization of Experimental Design.” Nuclear Physics News 31 (1)
Summary
Summary
● What is differentiable programming?
○ How to compute derivatives
○ Automatic differentiation
○ Tools and communities
● Differentiable programming in practice
○ Current state of differentiable programming
● Design optimization
○ Surrogates
○ Direct differentiation

Thank you for listening
Questions?

Selected references
[1] T. A. Le, A. G. Baydin, and F. Wood. Inference compilation and universal probabilistic programming. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2017.
[2] A. Munk, A. Ścibior, A. G. Baydin, A. Stewart, G. Fernlund, A. Poursartip, and F. Wood. Deep probabilistic surrogate networks for universal simulator approximation. In PROBPROG, 2020.
[3] A. G. Baydin, L. Heinrich, W. Bhimji, L. Shao, S. Naderiparizi, A. Munk, J. Liu, B. Gram-Hansen, G. Louppe, L. Meadows, P. Torr, V. Lee, Prabhat, K. Cranmer, and F. Wood. Efficient probabilistic inference in the quest for physics beyond the
standard model. In NeurIPS, 2019.
[4] A. G. Baydin, L. Shao, W. Bhimji, L. Heinrich, L. F. Meadows, J. Liu, A. Munk, S. Naderiparizi, B. Gram-Hansen, G. Louppe, M. Ma, X. Zhao, P. Torr, V. Lee, K. Cranmer, Prabhat, and F. Wood. Etalumis: Bringing probabilistic programming to scientific
simulators at scale. In SC19, 2019.
[5] K. Cranmer, J. Brehmer, and G. Louppe. The frontier of simulation-based inference. Proceedings of the National Academy of Sciences, 117(48):30055–30062, 2020.
[6] B. Gram-Hansen, C. Schroeder, P. H. Torr, Y. W. Teh, T. Rainforth, and A. G. Baydin. Hijacking malaria simulators with probabilistic programming. In ICML workshop on AI for Social Good, 2019.
[7] B. Gram-Hansen, C. S. de Witt, R. Zinkov, S. Naderiparizi, A. Scibior, A. Munk, F. Wood, M. Ghadiri, P. Torr, Y. W. Teh, A. G. Baydin, and T. Rainforth. Efficient bayesian inference for nested simulators. In AABI, 2019.
[8] B. Poduval, A. G. Baydin, and N. Schwadron. Studying solar energetic particles and their seed population using surrogate models. In MML for Space Sciences workshop, COSPAR, 2021.
[9] G. Acciarini, F. Pinto, S. Metz, S. Boufelja, S. Kaczmarek, K. Merz, J. A. Martinez-Heras, F. Letizia, C. Bridges, and A. G. Baydin. Spacecraft collision risk assessment with probabilistic programming. In ML4PS (NeurIPS 2020), 2020.
[10] F. Pinto, G. Acciarini, S. Metz, S. Boufelja, S. Kaczmarek, K. Merz, J. A. Martinez-Heras, F. Letizia, C. Bridges, and A. G. Baydin. Towards automated satellite conjunction management with bayesian deep learning. In AI for Earth Sciences Workshop
(NeurIPS), 2020.
[11] G. Acciarini, F. Pinto, S. Metz, S. Boufelja, S. Kaczmarek, K. Merz, J. A. Martinez-Heras, F. Letizia, C. Bridges, and A. G. Baydin. Kessler: a machine learning library for space collision avoidance. In 8th European Conference on Space Debris, 2021.
[12] S. Shirobokov, V. Belavin, M. Kagan, A. Ustyuzhanin, and A. G. Baydin. Black-box optimization with local generative surrogates. In NeurIPS, 2020.
[13] H. S. Behl, A. G. Baydin, R. Gal, P. H. S. Torr, and V. Vineet. Autosimulate: (quickly) learning synthetic data generation. In 16th European Conference on Computer Vision (ECCV), 2020.
[14] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind. Automatic differentiation in machine learning: a survey. Journal of Machine Learning Research (JMLR), 18(153):1–43, 2018.
[15] A. G. Baydin, B. A. Pearlmutter, and J. M. Siskind. DiffSharp: An AD library for .net languages. In 7th International Conference on Algorithmic Differentiation, 2016.
[16] A. G. Baydin, R. Cornish, D. M. Rubio, M. Schmidt, and F. Wood. Online learning rate adaptation with hypergradient descent. In ICLR, 2018.
[17] H. Behl, A. G. Baydin, and P. H. Torr. Alpha maml: Adaptive model-agnostic meta-learning. In AutoML (ICML), 2019.
[18] A. G. Baydin, K. Cranmer, M. Feickert, L. Gray, L. Heinrich, A. Held, A. Melo, M. Neubauer, J. Pearkes, N. Simpson, N. Smith, G. Stark, S. Thais, V. Vassilev, and G. Watts. Differentiable programming in high-energy physics. In Snowmass
2021 Letters of Interest (LOI), Division of Particles and Fields (DPF), American Physical Society, 2020.
[19] A. G. Baydin, K. Cranmer, P. de Castro Manzano, C. Delaere, D. Derkach, J. Donini, T. Dorigo, A. Giammanco, J. Kieseler, L. Layer, G. Louppe, F. Ratnikov, G. C. Strong, M. Tosi, A. Ustyuzhanin, P. Vischia, and H. Yarar. Toward machine learning
optimization of experimental design. Nuclear Physics News International (Submitted), 2020.
[20] L. F. Guedes dos Santos, S. Bose, V. Salvatelli, B. Neuberg, M. Cheung, M. Janvier, M. Jin, Y. Gal, P. Boerner, and A. G. Baydin. Multi-channel auto-calibration for the atmospheric imaging assembly using machine learning. Astronomy &
Astrophysics (in press), 2021.
[21] A. D. Cobb, M. D. Himes, F. Soboczenski, S. Zorzan, M. D. O’Beirne, A. G. Baydin, Y. Gal, S. D. Domagal-Goldman, G. N. Arney, and D. Angerhausen. An ensemble of bayesian neural networks for exoplanetary atmospheric retrieval. The
Astronomical Journal, 158(1), 2019.
[22] C. Schroeder de Witt, B. Gram-Hansen, N. Nardelli, A. Gambardella, R. Zinkov, P. Dokania, N. Siddharth, A. B. Espinosa-Gonzalez, A. Darzi, P. Torr, and A. G. Baydin. Simulation-based inference for global health decisions. In ICML Workshop
on Machine Learning for Global Health, Thirty-seventh International Conference on Machine Learning (ICML 2020), 2020.
Supplementary slides
Forward vs reverse

Derivatives in machine learning
“Backprop” and gradient descent are at the core of all recent advances

Probabilistic programming and modeling:
Edward (2016), Pyro (2017), ProbTorch (2017), TensorFlow Probability (2018), PyProb (2019)

- Variational inference
- “Neural” density estimation
- Transformed distributions via bijectors
- Normalizing flows (Rezende & Mohamed, 2015)
- Masked autoregressive flows (Papamakarios et al., 2017)
AD is at the core of machine learning
A new mindset and workflow, enabling differentiable algorithmic elements
● Neural Turing Machine, Differentiable Neural Computer (Graves et al. 2014, 2016)
○ Can infer algorithms: copy, sort, recall
● Stack-augmented RNN (Joulin & Mikolov, 2015)
● End-to-end memory network (Sukhbaatar et al., 2015)
● Stack, queue, deque (Grefenstette et al., 2015)
● Discrete interfaces (Zaremba & Sutskever, 2015)

(Figure: DNC on binary number recall; Wikimedia Commons: Kjerish)
