3 Gradient
Another example:
Gradients
▪ x scalar, y scalar: ∂y/∂x is a scalar
▪ x vector, y scalar: ∂y/∂x is a vector
▪ x scalar, y vector: ∂y/∂x is a vector
▪ x vector, y vector: ∂y/∂x is a matrix
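To make the vector-by-vector case concrete, the sketch below approximates ∂y/∂x by finite differences. The names `jacobian` and `f` are hypothetical and not from the slides, and the layout (one row per output, one column per input) is an assumption.

```python
def jacobian(f, x, h=1e-6):
    """Approximate the m-by-n matrix of partial derivatives dy_i/dx_j."""
    y0 = f(x)
    rows = [[0.0] * len(x) for _ in y0]
    for j in range(len(x)):
        x_step = list(x)
        x_step[j] += h                      # perturb one input at a time
        y1 = f(x_step)
        for i in range(len(y0)):
            rows[i][j] = (y1[i] - y0[i]) / h
    return rows


def f(x):
    # Hypothetical example with 2 inputs and 3 outputs.
    return [x[0] + x[1], x[0] * x[1], x[0] ** 2]


J = jacobian(f, [1.0, 2.0])
shape = (len(J), len(J[0]))   # vector y and vector x give a matrix
```

With a scalar y the same helper would return a single row, and with a scalar x a single column.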
Chain Rule
Generalize to Vectors
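▪ Generalize to vectors: ∂y/∂x = (∂y/∂u)(∂u/∂x), where each factor is a scalar, vector, or matrix depending on the shapes of x, u, and y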
Example 1
Assume
Compute
Decompose
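The formulas of this example did not survive extraction. As a stand-in, the sketch below assumes a hypothetical function z = (<x, w> - y)^2, decomposes it into a = <x, w>, b = a - y, z = b^2, and multiplies the local derivatives to get ∂z/∂w. The function and the name `grad_w` are assumptions for illustration only.

```python
def grad_w(x, w, y):
    a = sum(xi * wi for xi, wi in zip(x, w))   # a = <x, w>
    b = a - y                                   # b = a - y
    dz_db = 2.0 * b                             # z = b**2
    db_da = 1.0
    da_dw = x                                   # gradient of <x, w> w.r.t. w is x
    return [dz_db * db_da * xi for xi in da_dw]


x, w, y = [1.0, 2.0], [0.5, -1.0], 3.0
g = grad_w(x, w, y)
```

The result has the same shape as w, as expected for a scalar z and a vector w.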
Auto Differentiation (AD)
▪ Numerical differentiation
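For contrast with AD, numerical differentiation approximates a derivative from function values alone. The sketch below uses a central difference, which is one common choice and an assumption here; the helper name `numerical_derivative` is hypothetical.

```python
import math


def numerical_derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


approx = numerical_derivative(math.sin, 1.0)
exact = math.cos(1.0)
error = abs(approx - exact)   # small but nonzero, and sensitive to h
```

The result is only approximate and depends on the step size h, whereas AD applies exact derivative rules to each operation.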
Computation Graph
from mxnet import sym
a = sym.var('a')
b = sym.var('b')
c = 2 * a + b
# bind data to a and b later
Computation Graph
from mxnet import nd
a = nd.ones((2, 1))
b = nd.ones((2, 1))
c = 2 * a + b
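What "building a computation graph" means for c = 2*a + b can be sketched without any framework. The `Node` class below is a hypothetical illustration, not MXNet's implementation: every operation returns a new node that remembers its operator and inputs.

```python
class Node:
    """A value plus the operation and inputs that produced it."""

    def __init__(self, value, op="leaf", inputs=()):
        self.value = value
        self.op = op
        self.inputs = inputs

    def __add__(self, other):
        other = other if isinstance(other, Node) else Node(other, "const")
        return Node(self.value + other.value, "add", (self, other))

    def __mul__(self, other):
        other = other if isinstance(other, Node) else Node(other, "const")
        return Node(self.value * other.value, "mul", (self, other))

    __radd__ = __add__
    __rmul__ = __mul__


a = Node(1.0)
b = Node(1.0)
c = 2 * a + b          # records a "mul" node, then an "add" node on top of it

ops = (c.op, c.inputs[0].op, c.inputs[1].op)
```

Walking `c.inputs` recursively recovers the whole graph, which is the structure that differentiation later traverses.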
Two Modes
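▪ By chain rule: ∂y/∂x = (∂y/∂u_n)(∂u_n/∂u_{n-1}) ... (∂u_2/∂u_1)(∂u_1/∂x)
▪ Forward accumulation: multiply starting from the input side, ∂y/∂x = ∂y/∂u_n (∂u_n/∂u_{n-1} ( ... (∂u_2/∂u_1 ∂u_1/∂x)))
▪ Reverse accumulation (backpropagation): multiply starting from the output side, ∂y/∂x = (((∂y/∂u_n ∂u_n/∂u_{n-1}) ... ) ∂u_2/∂u_1) ∂u_1/∂x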
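Forward accumulation can be sketched with dual numbers: each value carries its derivative with respect to one chosen input, and both are updated together as the program runs. The `Dual` class, the `sin` wrapper, and the function y = sin(x^2) + 3x are hypothetical choices for illustration.

```python
import math


class Dual:
    """A value together with its derivative w.r.t. one chosen input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __radd__ = __add__
    __rmul__ = __mul__


def sin(u):
    return Dual(math.sin(u.value), math.cos(u.value) * u.deriv)


x = Dual(0.5, 1.0)        # seed: dx/dx = 1
y = sin(x * x) + 3 * x    # y.deriv now holds dy/dx at x = 0.5
```

No intermediate results need to be kept after they are consumed, but each run yields the derivative with respect to a single seeded input.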
Reverse Accumulation
[Figure: computation graph with a forward pass and a backward pass; the backward pass reads pre-computed results from the forward pass]
Reverse Accumulation Summary
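▪ Forward: evaluate the graph and store the intermediate results
▪ Backward: traverse the graph in reverse order, reading the stored results to accumulate gradients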
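The two passes can be sketched in a few lines of plain Python. The `Var` class, the `sin` wrapper, the `backward` helper, and the function y = x1*x2 + sin(x1) are all hypothetical; this is a minimal illustration of reverse accumulation, not how any particular framework implements it.

```python
import math


class Var:
    """Scalar node that remembers its parents and local derivatives."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents      # tuples of (parent, d self / d parent)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def sin(u):
    return Var(math.sin(u.value), ((u, math.cos(u.value)),))


def backward(output):
    # Order nodes so each one is visited after everything that uses it.
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local   # reads stored forward results


x1, x2 = Var(2.0), Var(3.0)
y = x1 * x2 + sin(x1)      # forward pass stores intermediate values
backward(y)                # one backward pass fills x1.grad and x2.grad
```

A single backward pass produces the gradient with respect to every input, at the cost of keeping the recorded graph in memory.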
Complexities
▪ Reverse accumulation memory complexity: O(n), where n is the number of operations, because all intermediate results of the forward pass must be recorded
▪ Forward accumulation time complexity: O(n) to compute one gradient, O(n*k) to compute gradients for k variables
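The O(n*k) cost can be observed by counting operations. The sketch below is an illustration under assumptions: it uses a hypothetical function (the product of k inputs), represents each quantity as a (value, derivative) pair, and runs one forward-accumulation pass per input.

```python
op_count = 0


def mul(u, v):
    """Multiply two (value, derivative) pairs, counting one operation."""
    global op_count
    op_count += 1
    return (u[0] * v[0], u[1] * v[0] + u[0] * v[1])


def product(pairs):
    # One forward pass costs len(pairs) - 1 multiplications.
    result = pairs[0]
    for p in pairs[1:]:
        result = mul(result, p)
    return result


xs = [1.0, 2.0, 3.0, 4.0]
k = len(xs)
gradient = []
for i in range(k):
    # One full pass per input: seed derivative 1 for x_i and 0 elsewhere.
    seeded = [(x, 1.0 if j == i else 0.0) for j, x in enumerate(xs)]
    gradient.append(product(seeded)[1])

ops_per_pass = k - 1
```

After the loop, `op_count` equals `k * ops_per_pass`, whereas reverse accumulation would obtain the same gradient from one forward and one backward pass over the graph.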