Automatic Differentiation (1)
Slides prepared by: Atılım Güneş Baydin (gunes@robots.ox.ac.uk)
This lecture:
- Derivatives in machine learning
- Review of essential concepts (what is a derivative, Jacobian, etc.)
- How do we compute derivatives
- Automatic differentiation
Next lecture:
- Current landscape of tools
- Implementation techniques
- Advanced concepts (higher-order API, checkpointing, etc.)
Derivatives and machine learning
Derivatives in machine learning
“Backprop” and gradient descent are at the core of all recent advances
Computer vision
[Figures: top-5 error rate for ImageNet (NVIDIA devblog); Faster R-CNN (Ren et al., 2015); NVIDIA DRIVE PX 2 segmentation]
Speech and translation
[Figures: word error rates (Huang et al., 2014); Google Neural Machine Translation System (GNMT)]
Derivatives in machine learning
Probabilistic programming: Pyro (2017) and ProbTorch (2017)
- Variational inference
- “Neural” density estimation
- Transformed distributions via bijectors
- Normalizing flows (Rezende & Mohamed, 2015)
- Masked autoregressive flows (Papamakarios et al., 2017)
Derivatives in machine learning
At the core of all: differentiable functions (programs) whose parameters are
tuned by gradient-based optimization
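The parameter-tuning loop can be sketched in a few lines; the toy model, data point, and learning rate below are illustrative choices, not from the slides:

```python
# Minimal sketch of gradient-based parameter tuning.
# The model, data point, and learning rate are illustrative, not from the slides.

def loss(w):
    # squared error of a toy model y = w * x on one example
    x, y = 3.0, 6.0
    return (w * x - y) ** 2

def dloss_dw(w):
    # hand-derived gradient: d/dw (w*x - y)^2 = 2 * (w*x - y) * x
    x, y = 3.0, 6.0
    return 2.0 * (w * x - y) * x

w = 0.0
for _ in range(100):
    w -= 0.05 * dloss_dw(w)  # gradient descent step

print(round(w, 6))  # converges to the optimum w = 2.0
```

Automatic differentiation supplies `dloss_dw` mechanically, so only `loss` has to be written by hand.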
Automatic differentiation
Execute differentiable functions (programs) via automatic differentiation
A word on naming:
- Differentiable programming, a generalization of deep learning (Olah, LeCun)
“Neural networks are just a class of differentiable functions”
- Automatic differentiation
- Algorithmic differentiation
- AD
- Autodiff
- Algodiff
- Autograd
Also remember:
- Backprop
- Backpropagation (backward propagation of errors)
Essential concepts refresher
Derivative
Function of a real variable
- Notation: Leibniz dy/dx (c. 1675); Lagrange f′; Newton ẏ (c. 1665)
[Slides list the standard differentiation rules — around 15 such rules — and introduce the ∇ ("del") operator]
Partial derivative
Function of several real variables
Consider all partial derivatives simultaneously and accumulate all direct and
indirect contributions (Important: will be useful later)
Matrix calculus and machine learning
Extension to multivariable functions, by input/output type:
- Scalar input, scalar output: ordinary derivative
- Vector input, scalar output: gradient
- Scalar input, vector output: vector of derivatives (tangent)
- Vector input, vector output: Jacobian
How to compute derivatives
Derivatives as code
[Code examples omitted]
Manual differentiation
You can see papers like this: [figure omitted]
Symbolic differentiation
Symbolic computation with Mathematica, Maple, Maxima, and deep learning frameworks such as Theano (via graph optimization)
Problem: expression swell (e.g., in Theano)
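Expression swell can be observed even with a toy symbolic differentiator (illustrative code, not from the slides; the tuple encoding and the nesting depth are arbitrary choices):

```python
# Tiny symbolic differentiator illustrating "expression swell":
# repeated product/chain rules make the derivative expression much
# larger than the original. Toy code for illustration only.

def diff(e):
    # e is 'x', a number, or a tuple ('mul', e1, e2) / ('sin', e1)
    if e == 'x':
        return 1
    if isinstance(e, (int, float)):
        return 0
    op = e[0]
    if op == 'mul':
        _, a, b = e
        # product rule: (ab)' = a'b + ab'
        return ('add', ('mul', diff(a), b), ('mul', a, diff(b)))
    if op == 'sin':
        # chain rule: (sin u)' = cos(u) * u'
        return ('mul', ('cos', e[1]), diff(e[1]))

def size(e):
    # number of nodes in the expression tree
    return 1 if not isinstance(e, tuple) else 1 + sum(size(c) for c in e[1:])

expr = 'x'
for _ in range(5):
    expr = ('mul', ('sin', expr), expr)  # nest: sin(e) * e

d = diff(expr)
print(size(expr), size(d))  # the derivative is far larger than the expression
```

No simplification is performed, which is exactly the regime in which naive symbolic differentiation blows up.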
Symbolic differentiation
Problem: only applicable to closed-form mathematical expressions, not to general programs (e.g., with control flow)
Numerical differentiation
Finite difference approximation of derivatives
Higher-order schemes exist:
- Richardson extrapolation
- Differential quadrature
These increase rapidly in complexity and never completely eliminate the approximation error
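The finite-difference error behavior is easy to observe numerically; the test function and step sizes below are illustrative:

```python
# Finite differences approximate f'(x) but never eliminate the error:
# too large h -> truncation error dominates, too small h -> round-off
# error dominates. Test function and step sizes are illustrative.
import math

f, x = math.sin, 1.0
exact = math.cos(x)

errors = {}
for h in (1e-1, 1e-4, 1e-8, 1e-12):
    forward = (f(x + h) - f(x)) / h            # O(h) truncation error
    central = (f(x + h) - f(x - h)) / (2 * h)  # O(h^2) truncation error
    errors[h] = (abs(forward - exact), abs(central - exact))
    print(h, errors[h])
```

The error shrinks as h decreases, then grows again once floating-point cancellation takes over.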
Automatic differentiation
If we don’t need analytic derivative expressions, we can
evaluate a gradient exactly with only one forward and one reverse execution
[Timeline figure: 1960s, 1970s, 1980s]
Griewank (1989) revived reverse mode
Automatic differentiation
All numerical algorithms, when executed, evaluate to compositions of a finite set of elementary operations with known derivatives
- Called a trace or a Wengert list (Wengert, 1964)
- Alternatively represented as a computational graph showing dependencies

f(a, b):
    c = a * b
    d = log(c)
    return d

Primal evaluation at (a, b) = (2, 3): c = 6, d = log(6) ≈ 1.791, so 1.791 = f(2, 3)
Derivative evaluation over the same graph (tangent/adjoint values d̄ = 1, c̄ = 1/c ≈ 0.166) gives the "gradient" [0.5, 0.333] = f'(2, 3)
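One way to evaluate such a trace together with derivatives is forward-mode AD via dual numbers; this is a minimal sketch, not the slides' implementation:

```python
import math

# Forward-mode AD with dual numbers: every value carries (primal, tangent)
# and each elementary op applies its known derivative rule.
# A minimal sketch, not the slides' implementation.

class Dual:
    def __init__(self, primal, tangent=0.0):
        self.primal, self.tangent = primal, tangent

    def __mul__(self, other):
        # product rule: (u*v)' = u'*v + u*v'
        return Dual(self.primal * other.primal,
                    self.tangent * other.primal + self.primal * other.tangent)

def log(u):
    # chain rule: (log u)' = u' / u
    return Dual(math.log(u.primal), u.tangent / u.primal)

def f(a, b):
    c = a * b
    d = log(c)
    return d

# one forward pass per input direction
df_da = f(Dual(2.0, 1.0), Dual(3.0, 0.0)).tangent
df_db = f(Dual(2.0, 0.0), Dual(3.0, 1.0)).tangent
print(df_da, df_db)  # 0.5 and 0.333..., matching the slides
```

Each elementary op computes its derivative locally, so the gradient falls out of ordinary execution with no symbolic expression ever built.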
Automatic differentiation
Two main flavors:
- Forward mode: primals and derivatives (tangents) propagate together, from inputs to outputs
- Reverse mode: primals propagate from inputs to outputs; derivatives (adjoints) propagate from outputs back to inputs
Nested combinations give higher-order derivatives, Hessian–vector products, etc.:
- Forward-on-reverse
- Reverse-on-forward
- ...
What happens to control flow?
It disappears: branches are taken, loops are unrolled, functions are inlined, etc., until we are left with the linear trace of execution

f(a, b):
    c = a * b
    if c > 0:
        d = log(c)
    else:
        d = sin(c)
    return d

f(a = 2, b = 3): c = 6 > 0, so the trace takes the log branch; d = log(6) ≈ 1.791
f(a = 2, b = -1): c = -2, so the trace takes the sin branch; d = sin(-2) ≈ -0.909
The resulting trace is a directed acyclic graph (DAG) with a topological ordering
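A dual-number sketch makes this concrete: the comparison inspects only the primal value, so AD simply differentiates whichever branch executes (illustrative code, not from the slides):

```python
import math

# Branches are resolved at trace time: the comparison reads only the
# primal value, and AD differentiates whichever branch is taken.
# Minimal dual-number sketch (illustrative, not the slides' code).

class Dual:
    def __init__(self, primal, tangent=0.0):
        self.primal, self.tangent = primal, tangent

    def __mul__(self, other):
        return Dual(self.primal * other.primal,
                    self.tangent * other.primal + self.primal * other.tangent)

def log(u):
    return Dual(math.log(u.primal), u.tangent / u.primal)

def sin(u):
    return Dual(math.sin(u.primal), math.cos(u.primal) * u.tangent)

def f(a, b):
    c = a * b
    if c.primal > 0:      # control flow sees only the primal
        return log(c)
    else:
        return sin(c)

t_log = f(Dual(2.0, 1.0), Dual(3.0)).tangent   # log branch: d/da log(a*b) = 1/a
t_sin = f(Dual(2.0, 1.0), Dual(-1.0)).tangent  # sin branch: d/da sin(a*b) = b*cos(a*b)
print(t_log, t_sin)
```

The derivative is exact for the taken branch; it says nothing about the branch that was not executed.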
Forward mode
Primals: propagated from independent to dependent variables
Derivatives (tangents): propagated from independent to dependent variables, alongside the primals

f(x1, x2):
    v1 = x1 * x2
    v2 = log(x2)
    y1 = sin(v1)
    y2 = v1 + v2
    return (y1, y2)

Evaluating f(2, 3) with tangents (ẋ1, ẋ2) = (1, 0):
- v1 = 6, v̇1 = ẋ1·x2 + x1·ẋ2 = 3
- v2 = log(3) ≈ 1.098, v̇2 = ẋ2/x2 = 0
- y1 = sin(6) ≈ -0.279, ẏ1 = cos(v1)·v̇1 ≈ 2.880
- y2 = v1 + v2 ≈ 7.098, ẏ2 = v̇1 + v̇2 = 3

In general, forward mode evaluates a Jacobian–vector product: one forward pass with tangent vector ẋ yields ẏ = J_f ẋ
Here we evaluated the first column of the Jacobian, [∂y1/∂x1, ∂y2/∂x1] ≈ [2.880, 3]
The tangent vector ẋ can be any vector, not only unit vectors
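The Jacobian–vector product above can be reproduced with a small dual-number sketch (an illustrative implementation, not the slides' code):

```python
import math

# Forward mode evaluates a Jacobian-vector product in one pass.
# Dual-number sketch for the slides' example
# f(x1, x2) = (sin(x1*x2), x1*x2 + log(x2)).

class Dual:
    def __init__(self, primal, tangent=0.0):
        self.primal, self.tangent = primal, tangent

    def __mul__(self, other):
        return Dual(self.primal * other.primal,
                    self.tangent * other.primal + self.primal * other.tangent)

    def __add__(self, other):
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

def log(u):
    return Dual(math.log(u.primal), u.tangent / u.primal)

def sin(u):
    return Dual(math.sin(u.primal), math.cos(u.primal) * u.tangent)

def f(x1, x2):
    v1 = x1 * x2
    v2 = log(x2)
    return sin(v1), v1 + v2

# tangent vector (1, 0) extracts the first column of the Jacobian
y1, y2 = f(Dual(2.0, 1.0), Dual(3.0, 0.0))
print(y1.tangent, y2.tangent)  # ~2.880 and 3.0, as on the slides
```

A second pass with tangent vector (0, 1) would extract the other column; n passes recover the full Jacobian of an n-input function.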
Reverse mode
Primals: propagated from independent to dependent variables (forward sweep)
Derivatives (adjoints): propagated from dependent to independent variables (reverse sweep)

f(x1, x2):
    v1 = x1 * x2
    v2 = log(x2)
    y1 = sin(v1)
    y2 = v1 + v2
    return (y1, y2)

Forward sweep at f(2, 3): v1 = 6, v2 = log(3) ≈ 1.098, y1 = sin(6) ≈ -0.279, y2 ≈ 7.098
Reverse sweep with adjoints (ȳ1, ȳ2) = (1, 0):
- v̄1 = ȳ1·cos(v1) + ȳ2 = cos(6) ≈ 0.960
- v̄2 = ȳ2 = 0
- x̄1 = v̄1·x2 ≈ 2.880
- x̄2 = v̄1·x1 + v̄2/x2 ≈ 1.920

In general, reverse mode evaluates a transposed Jacobian–vector product: one reverse pass with adjoint vector ȳ yields x̄ = J_fᵀ ȳ
Here we evaluated the first row of the Jacobian, [∂y1/∂x1, ∂y1/∂x2] ≈ [2.880, 1.920]
For a function with a single scalar output, this is the gradient
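The reverse sweep can be sketched with a minimal tape. Note the naive adjoint accumulation below is only valid for tree-shaped traces like this one; a general DAG needs a reverse topological order. Illustrative code, not the slides' implementation:

```python
import math

# Reverse-mode AD with a minimal tape: the forward sweep records each
# intermediate with its parents and local derivatives; the reverse sweep
# pushes adjoints from the output back to the inputs.
# The naive depth-first accumulation is only correct for tree-shaped
# traces like this one; a general DAG needs a reverse topological order.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local derivative)
        self.adjoint = 0.0

def mul(a, b):
    return Var(a.value * b.value, [(a, b.value), (b, a.value)])

def log(a):
    return Var(math.log(a.value), [(a, 1.0 / a.value)])

def sin(a):
    return Var(math.sin(a.value), [(a, math.cos(a.value))])

def backward(y):
    y.adjoint = 1.0  # seed the output adjoint
    stack = [y]
    while stack:
        v = stack.pop()
        for parent, local in v.parents:
            parent.adjoint += v.adjoint * local
            stack.append(parent)

x1, x2 = Var(2.0), Var(3.0)
v1 = mul(x1, x2)
v2 = log(x2)          # not used by y1, so it receives zero adjoint
y1 = sin(v1)
backward(y1)          # adjoint seed (1, 0): the Jacobian row for y1
print(x1.adjoint, x2.adjoint)  # ~2.880 and ~1.920, matching the slides
```

A single reverse pass yields the sensitivities of one output with respect to all inputs, which is why reverse mode dominates in machine learning (one scalar loss, millions of parameters).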
Forward vs reverse summary
- In the extreme f: ℝ → ℝᵐ (one input, many outputs), use forward mode: one pass evaluates all output derivatives
- In the extreme f: ℝⁿ → ℝ (many inputs, one output), use reverse mode: one pass evaluates the full gradient
Backprop through normal PDF
[Computational graph: the normal density N(x; µ, σ) = exp(−(x − µ)² / (2σ²)) / (σ√(2π)) built from elementary operations (−, ·², ×, 1/·, sqrt, exp) with constants 0.5, 2, π]
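Writing out the forward and reverse sweeps by hand for this graph, and checking against the analytic derivative ∂f/∂x = f·(µ − x)/σ² (a sketch of what reverse mode does mechanically; the variable names are mine):

```python
import math

# Backprop through the normal PDF
#   f = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))
# written as elementary ops and differentiated by hand in reverse order,
# mirroring what reverse-mode AD does on this graph.

def normal_pdf_and_grad_x(x, mu, sigma):
    # forward sweep (the primal trace)
    u = x - mu
    s = u * u                        # (x - mu)^2
    t = -s / (2.0 * sigma ** 2)
    e = math.exp(t)
    z = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    f = z * e
    # reverse sweep (adjoints, seeded with f_bar = 1)
    e_bar = z                        # f = z * e
    t_bar = e_bar * e                # d exp(t)/dt = exp(t)
    s_bar = t_bar * (-1.0 / (2.0 * sigma ** 2))
    u_bar = s_bar * 2.0 * u          # d(u*u)/du = 2u
    x_bar = u_bar                    # u = x - mu  =>  du/dx = 1
    return f, x_bar

val, dfdx = normal_pdf_and_grad_x(1.0, 0.0, 2.0)
# analytic check: df/dx = f * (mu - x) / sigma^2
print(abs(dfdx - val * (0.0 - 1.0) / 4.0) < 1e-12)  # True
```

Each adjoint line is just the chain rule applied to one node of the graph, in reverse topological order.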
Summary
This lecture:
- Derivatives in machine learning
- Review of essential concepts (what is a derivative, etc.)
- How do we compute derivatives
- Automatic differentiation
Next lecture:
- Current landscape of tools
- Implementation techniques
- Advanced concepts (higher-order API, checkpointing, etc.)
References
Baydin, A.G., Pearlmutter, B.A., Radul, A.A. and Siskind, J.M., 2017. Automatic differentiation in machine learning: a survey.
Journal of Machine Learning Research (JMLR), 18(153), pp.1-153.
Baydin, Atılım Güneş, Barak A. Pearlmutter, and Jeffrey Mark Siskind. 2016. “Tricks from Deep Learning.” In 7th International
Conference on Algorithmic Differentiation, Christ Church Oxford, UK, September 12–15, 2016.
Baydin, Atılım Güneş, Barak A. Pearlmutter, and Jeffrey Mark Siskind. 2016. “DiffSharp: An AD Library for .NET Languages.” In 7th
International Conference on Algorithmic Differentiation, Christ Church Oxford, UK, September 12–15, 2016.
Baydin, Atılım Güneş, Robert Cornish, David Martínez Rubio, Mark Schmidt, and Frank Wood. 2018. “Online Learning Rate
Adaptation with Hypergradient Descent.” In Sixth International Conference on Learning Representations (ICLR), Vancouver,
Canada, April 30 – May 3, 2018.
Griewank, A. and Walther, A., 2008. Evaluating derivatives: principles and techniques of algorithmic differentiation (Vol. 105).
SIAM.
Extra slides
Forward mode (worked example)
Primals: independent → dependent
Derivatives (tangents): independent → dependent

f(a, b):
    c = a * b
    d = log(c)
    return d

Evaluating f(2, 3) with tangents (ȧ, ḃ) = (1, 0):
- c = 6, ċ = ȧ·b + a·ḃ = 3
- d = log(6) ≈ 1.791, ḋ = ċ/c = 0.5
So ∂f/∂a = 0.5; a second forward pass with (ȧ, ḃ) = (0, 1) would give ∂f/∂b
Reverse mode (worked example)
Primals: independent → dependent
Derivatives (adjoints): dependent → independent

f(a, b):
    c = a * b
    d = log(c)
    return d

Forward sweep at f(2, 3): c = 6, d = log(6) ≈ 1.791
Reverse sweep with adjoint d̄ = 1:
- c̄ = d̄/c ≈ 0.166
- ā = c̄·b = 0.5
- b̄ = c̄·a ≈ 0.333
A single reverse pass gives the full gradient [0.5, 0.333]