Backpropagation
TA: Yi Wen
[Plot of the example function, rendered on WolframAlpha]
Approach #1: Random Search
Intuition: take random steps in the domain of the function and keep only the steps that improve it
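A minimal sketch of this approach, assuming a generic scalar loss loss_fn and a NumPy parameter array w (both names are illustrative, not from the slides): propose random perturbations and accept only those that lower the loss.

import numpy as np

def random_search(loss_fn, w, steps=1000, step_scale=1e-3):
    best_w, best_loss = w, loss_fn(w)
    for _ in range(steps):
        # take a random step in the parameter space
        candidate = best_w + step_scale * np.random.randn(*w.shape)
        candidate_loss = loss_fn(candidate)
        if candidate_loss < best_loss:      # keep only improving steps
            best_w, best_loss = candidate, candidate_loss
    return best_w, best_loss

Random search is simple but scales poorly: the chance that a random step improves the loss shrinks rapidly as the number of parameters grows, which motivates the gradient-based approaches below.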
Approach #2: Numerical Gradient
Intuition: the rate of change of the function with respect to a variable, measured over a small region around the current point
Finite differences: \( \dfrac{\partial f}{\partial x} \approx \dfrac{f(x+h) - f(x-h)}{2h} \) for a small step \( h \)
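A sketch of the centered finite-difference approximation above, looping over each coordinate of the input (the function f and step size h are placeholders):

import numpy as np

def numerical_gradient(f, x, h=1e-5):
    # f: function returning a scalar; x: NumPy array, modified in place and restored
    grad = np.zeros_like(x)
    for i in range(x.size):
        old = x.flat[i]
        x.flat[i] = old + h
        f_plus = f(x)
        x.flat[i] = old - h
        f_minus = f(x)
        x.flat[i] = old                       # restore the original value
        grad.flat[i] = (f_plus - f_minus) / (2 * h)
    return grad

This takes two function evaluations per parameter, so it is far too slow for training but remains useful as a gradient check.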
Approach #3: Analytical Gradient
Recall: partial derivative by limit definition
\[ \frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \]
Recall: chain rule
\[ \frac{d}{dx} f\bigl(g(x)\bigr) = f'\bigl(g(x)\bigr)\, g'(x) \]
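The slide's worked example did not survive extraction; as an illustrative stand-in, applying the chain rule to \( f(w) = \sigma(wx + b) \) with the sigmoid \( \sigma(t) = \frac{1}{1 + e^{-t}} \):

\[
f = \sigma(z), \qquad z = wx + b
\]
\[
\frac{\partial f}{\partial w}
  = \frac{d\sigma}{dz} \cdot \frac{\partial z}{\partial w}
  = \sigma(z)\bigl(1 - \sigma(z)\bigr) \cdot x
\]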
https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b
Problem Statement: Backpropagation
Forward:  y = local(x, W, b)                    (input x and parameters W, b in; output y out)
Backward: dx, dW, db = grad_local(dy, x, W, b)  (upstream gradient dy in; gradients w.r.t. x, W, b out)
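A minimal sketch of this forward/backward contract, assuming the module is an affine layer \( y = xW + b \) (the slides specify only the interface, not the particular layer):

import numpy as np

def local(x, W, b):
    # forward: inputs and parameters in, output out
    return x @ W + b

def grad_local(dy, x, W, b):
    # backward: upstream gradient dy in, gradients w.r.t. every input out
    dx = dy @ W.T            # same shape as x
    dW = x.T @ dy            # same shape as W
    db = dy.sum(axis=0)      # same shape as b
    return dx, dW, db

Because each module exposes only this pair of functions, modules can be chained freely: the forward pass composes the locals, and the backward pass feeds each module's dx into the previous module as its dy.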
Modularity: Previous Example
Compound function, broken into intermediate variables (forward propagation)
Modularity: 2-Layer Neural Network
Compound function, broken into intermediate variables (forward propagation)
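A sketch of the forward pass with explicit intermediate variables, assuming the common 2-layer form \( f(x) = W_2 \max(0, W_1 x + b_1) + b_2 \) (the exact network on the slide is not recoverable from the extraction):

import numpy as np

def two_layer_forward(x, W1, b1, W2, b2):
    z1 = x @ W1 + b1          # intermediate: pre-activation of layer 1
    h  = np.maximum(0, z1)    # intermediate: ReLU hidden activations
    y  = h @ W2 + b2          # output
    cache = (x, z1, h)        # intermediates saved for the backward pass
    return y, cache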
Agenda
● Motivation
● Backprop Tips & Tricks
● Matrix calculus primer
Derivative w.r.t. Vector
Scalar-by-vector: for scalar \( y \) and \( x \in \mathbb{R}^n \), \( \frac{\partial y}{\partial x} \) is the gradient, a vector with entries \( \left(\frac{\partial y}{\partial x}\right)_i = \frac{\partial y}{\partial x_i} \)
Vector-by-vector: for \( y \in \mathbb{R}^m \) and \( x \in \mathbb{R}^n \), \( \frac{\partial y}{\partial x} \) is the Jacobian, an \( m \times n \) matrix with entries \( \left(\frac{\partial y}{\partial x}\right)_{ij} = \frac{\partial y_i}{\partial x_j} \)
Derivative w.r.t. Vector: Chain Rule
1. intermediate functions
2. local gradients
3. full gradients
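A sketch of those three steps for the illustrative composition \( y = \sigma(Wx) \), where the full gradient is the product of the local Jacobians:

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

W = np.random.randn(3, 4)
x = np.random.randn(4)

# 1. intermediate functions
u = W @ x                       # u = Wx
y = sigmoid(u)                  # y = sigma(u), elementwise

# 2. local gradients (Jacobians)
dy_du = np.diag(y * (1 - y))    # Jacobian of elementwise sigmoid is diagonal
du_dx = W                       # Jacobian of u = Wx is W itself

# 3. full gradient: the chain rule multiplies the Jacobians
dy_dx = dy_du @ du_dx           # shape (3, 4)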
Derivative w.r.t. Vector: Takeaway — the full gradient is the product of the local Jacobians along the chain, and the gradient of a scalar w.r.t. a vector has the same shape as that vector.
Derivative w.r.t. Matrix
Scalar-by-matrix: for scalar \( y \) and matrix \( W \), \( \frac{\partial y}{\partial W} \) has the same shape as \( W \), with entries \( \frac{\partial y}{\partial W_{ij}} \)
Vector-by-matrix: a 3-dimensional tensor of partial derivatives, which in practice we avoid forming explicitly
Derivative w.r.t. Matrix: Dimension Balancing
Rule of thumb: a gradient has the same shape as the quantity it differentiates, and matching the shapes alone pins down how the upstream gradient and the local inputs must be combined (transposed and ordered) in the gradient expression.
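A sketch of dimension balancing for \( y = Wx \) with a scalar loss \( L \): the shapes alone force \( dx = W^{\top} dy \) and \( dW = dy\, x^{\top} \) (the shapes below are illustrative):

import numpy as np

W  = np.random.randn(3, 4)    # (3, 4)
x  = np.random.randn(4)       # (4,)
dy = np.random.randn(3)       # dL/dy, same shape as y = Wx

# dL/dx must have shape (4,): the only shape-consistent product is W^T dy
dx = W.T @ dy                 # (4, 3) @ (3,) -> (4,)

# dL/dW must have shape (3, 4): the only shape-consistent product is dy x^T
dW = np.outer(dy, x)          # (3,) outer (4,) -> (3, 4)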
[Vector-by-vector: worked Jacobian examples from the slides]
Matrix multiplication [Backprop]: for \( Y = XW \), the backward pass is \( dX = dY\,W^{\top} \) and \( dW = X^{\top} dY \).
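A minimal sketch of these two rules with a batch of inputs (the shapes are illustrative):

import numpy as np

X  = np.random.randn(5, 4)    # batch of 5 inputs
W  = np.random.randn(4, 3)
Y  = X @ W                    # forward: (5, 3)

dY = np.random.randn(5, 3)    # upstream gradient dL/dY
dX = dY @ W.T                 # (5, 3) @ (3, 4) -> (5, 4), matches X
dW = X.T @ dY                 # (4, 5) @ (5, 3) -> (4, 3), matches W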
Elementwise function [Backprop]: for \( y = f(x) \) applied elementwise, \( dx = dy \odot f'(x) \) (the elementwise product of the upstream gradient with the local derivative).
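A sketch using ReLU as the elementwise function (an illustrative choice; any elementwise \( f \) follows the same pattern):

import numpy as np

x  = np.random.randn(5, 4)
y  = np.maximum(0, x)          # forward: elementwise ReLU

dy = np.random.randn(5, 4)     # upstream gradient dL/dy
dx = dy * (x > 0)              # backward: dy elementwise-times f'(x),
                               # since ReLU'(x) is 1 where x > 0, else 0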