cs231n 2018 Lecture04
Administrative
Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - 2 April 12, 2018
Where we are...
scores function: s = f(x; W) = Wx
SVM loss: L_i = Σ_{j≠y_i} max(0, s_j - s_{y_i} + 1)
want: ∇_W L
Optimization
Gradient descent
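A minimal sketch of vanilla gradient descent, on a toy quadratic loss (the loss and learning rate here are illustrative, not the course's SVM loss):

```python
import numpy as np

def gradient_descent(grad_fn, w0, lr=0.1, steps=100):
    """Vanilla gradient descent: repeatedly step opposite the gradient."""
    w = w0.astype(float)
    for _ in range(steps):
        w -= lr * grad_fn(w)   # step in the direction of steepest descent
    return w

# Toy loss L(w) = ||w - 3||^2, with gradient 2(w - 3); minimizer is w = [3, 3]
loss = lambda w: np.sum((w - 3.0) ** 2)
grad = lambda w: 2.0 * (w - 3.0)

w_opt = gradient_descent(grad, np.zeros(2))
```

In practice the gradient is computed by backpropagation rather than by a hand-derived formula, which is the subject of the rest of this lecture.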
Computational graphs
[Figure: computational graph for a linear classifier — inputs x and W feed a multiply node (*) producing scores s; a hinge loss node and a regularization node R combine through a + node into the total loss L]
Convolutional network
(AlexNet)
[Figure: AlexNet as a computational graph, from the input image and weights through the layers to the loss]
Neural Turing Machine
[Figure: computational graph of a Neural Turing Machine, from input to loss]
Backpropagation: a simple example
f(x, y, z) = (x + y) z
e.g. x = -2, y = 5, z = -4
Want: ∂f/∂x, ∂f/∂y, ∂f/∂z
Set q = x + y, so f = q z. Local gradients: ∂q/∂x = 1, ∂q/∂y = 1, ∂f/∂q = z, ∂f/∂z = q
Chain rule: ∂f/∂x = (∂f/∂q)(∂q/∂x) — at each node, gradient = upstream gradient × local gradient
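The forward and backward passes for this example can be written out directly (values match the slide: x = -2, y = 5, z = -4):

```python
# Forward and backward passes for f(x, y, z) = (x + y) * z
x, y, z = -2.0, 5.0, -4.0

# Forward pass
q = x + y            # q = 3
f = q * z            # f = -12

# Backward pass (chain rule, from the output back to the inputs)
df_dq = z            # local gradient of f = q*z w.r.t. q, = -4
df_dz = q            # = 3
df_dx = df_dq * 1.0  # dq/dx = 1, so upstream * local = -4
df_dy = df_dq * 1.0  # dq/dy = 1, so -4
```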
[Figure: a generic gate f with inputs x, y and output z. During backprop, the gate multiplies the upstream gradient ∂L/∂z by its “local gradients” ∂z/∂x, ∂z/∂y to produce the gradients ∂L/∂x, ∂L/∂y flowing to its inputs]
Another example: f(w, x) = 1 / (1 + e^{-(w0 x0 + w1 x1 + w2)})
Local gradients used at the nodes: d(x + c)/dx = 1, d(xy)/dx = y, d(xy)/dy = x
At each step, gradient = upstream gradient × local gradient. With an upstream gradient of 0.2 arriving at the multiply gates:
grad x0 = w0 × 0.2 = 2 × 0.2 = 0.4
grad w1 = x1 × 0.2 = -2 × 0.2 = -0.4
grad x1 = w1 × 0.2 = -3 × 0.2 = -0.6
Computational graph representation may not
be unique. Choose one where local gradients
at each node can be easily expressed!
sigmoid function
sigmoid gate
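The reason the sigmoid gate is a good choice of node: its local gradient has the simple closed form dσ/dx = σ(x)(1 − σ(x)), so the whole subgraph collapses into one cheap gate. A quick numerical check:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x = 1.0
s = sigmoid(x)

# Analytic local gradient of the sigmoid gate
analytic = s * (1.0 - s)

# Numerical gradient via centered differences
h = 1e-5
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
```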
Patterns in backward flow
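In the lecture these patterns are summarized as: the add gate is a gradient distributor, the mul gate a gradient "switcher" (each input receives the other input times the upstream gradient), and the max gate a gradient router. A small sketch:

```python
# Gradient patterns at common gates, with upstream gradient dL/dout = 2.0
x, y = 3.0, -4.0
upstream = 2.0

# add gate: distributes the upstream gradient unchanged to both inputs
dx_add, dy_add = upstream * 1.0, upstream * 1.0

# mul gate: "swaps" the inputs — each input's gradient is the OTHER input
dx_mul, dy_mul = upstream * y, upstream * x   # -8.0, 6.0

# max gate: routes the full gradient to the larger input, zero to the other
dx_max = upstream if x > y else 0.0           # 2.0 (x wins)
dy_max = upstream if y >= x else 0.0          # 0.0
```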
Gradients add at branches
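When a variable feeds more than one branch of the graph, the gradients flowing back from each branch accumulate by addition. An illustrative example (the function is mine, not the slide's):

```python
# f = x*y + x*z uses x twice; its gradient is the SUM of both contributions
x, y, z = 2.0, 3.0, 5.0

f = x * y + x * z

# Backward: each use of x contributes a gradient; they add up
df_dx = y + z    # = 8.0 (y from the x*y branch, z from the x*z branch)
df_dy = x
df_dz = x

# Numerical check with centered differences
h = 1e-5
num_dx = (((x + h) * y + (x + h) * z) - ((x - h) * y + (x - h) * z)) / (2 * h)
```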
Gradients for vectorized code (x, y, z are now vectors): the “local gradient” is now the Jacobian matrix — the derivative of each element of z with respect to each element of x.
Vectorized operations
e.g. an elementwise operation on a 4096-d input vector, producing a 4096-d output
Q: what is the size of the Jacobian matrix? [4096 x 4096!]
Q2: what does it look like? Each output element depends only on the corresponding input element, so the Jacobian is diagonal — in practice we never form it explicitly.
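A sketch of this point for elementwise max(0, x): the full Jacobian is diagonal, and the backward pass can exploit that with a simple elementwise mask instead of a matrix product (n = 8 here for illustration instead of 4096):

```python
import numpy as np

n = 8
rng = np.random.default_rng(0)
x = rng.standard_normal(n)

# Full Jacobian of out = maximum(0, x): J[i, j] = d out_i / d x_j.
# Off-diagonal entries are zero, so J is diagonal.
J = np.diag((x > 0).astype(float))

# In practice the backward pass just masks the upstream gradient:
upstream = rng.standard_normal(n)
dx = upstream * (x > 0)   # equivalent to J @ upstream, in O(n) not O(n^2)
```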
A vectorized example: f(x, W) = ‖W·x‖² = Σᵢ (W·x)ᵢ²
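A sketch of this example in code: with q = Wx, the backward pass gives ∂f/∂W = 2 q xᵀ and ∂f/∂x = 2 Wᵀ q (shapes match the variables they are gradients of), which we can verify numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 5))
x = rng.standard_normal(5)

# Forward: q = W x, f = ||q||^2
q = W @ x
f = np.sum(q ** 2)

# Backward: df/dq = 2q, then chain through the matrix multiply
dq = 2.0 * q
dW = np.outer(dq, x)   # df/dW = 2 q x^T  (same shape as W)
dx = W.T @ dq          # df/dx = 2 W^T q  (same shape as x)

# Numerical check of one entry of dW
h = 1e-5
Wp = W.copy()
Wp[0, 0] += h
num = (np.sum((Wp @ x) ** 2) - f) / h
```

Always check: the gradient with respect to a variable has the same shape as the variable.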
In discussion section: A matrix example...
Modularized implementation: forward / backward API
Graph (or Net) object (rough pseudo code)
[Figure: a single multiply gate z = x * y, with x, y, z scalars, implemented as a gate object exposing the forward / backward API]
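A minimal sketch of the multiply-gate object for this slide (names are illustrative, not the slide's exact code): forward() caches its inputs, because backward() needs them to form the local gradients:

```python
class MultiplyGate:
    def forward(self, x, y):
        self.x, self.y = x, y   # cache inputs for the backward pass
        return x * y

    def backward(self, dz):
        # Chain rule: gradient on each input = local gradient * upstream
        dx = self.y * dz        # dz/dx = y
        dy = self.x * dz        # dz/dy = x
        return dx, dy

gate = MultiplyGate()
z = gate.forward(-2.0, 5.0)     # z = -10
dx, dy = gate.backward(2.0)     # with upstream gradient dL/dz = 2
```

A Graph/Net object then just calls forward() on all gates in topological order, and backward() in the reverse order.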
Example: Caffe layers
Caffe Sigmoid Layer
In Assignment 1: Writing SVM / Softmax
Stage your forward/backward computation!
E.g. for the SVM: compute the margins in the forward pass, cache them, and reuse them in the backward pass.
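A sketch (not the assignment's actual code) of what staging the SVM loss looks like for a single example:

```python
import numpy as np

def svm_loss_staged(scores, y):
    """scores: (C,) class scores; y: index of the correct class."""
    # Stage 1 (forward): margins_j = max(0, s_j - s_y + 1), with margin_y = 0
    margins = np.maximum(0.0, scores - scores[y] + 1.0)
    margins[y] = 0.0
    loss = margins.sum()

    # Stage 2 (backward): backprop through the cached margins
    dscores = (margins > 0).astype(float)   # +1 for each violating class
    dscores[y] -= (margins > 0).sum()       # -1 per violation on the true class
    return loss, dscores

loss, dscores = svm_loss_staged(np.array([3.2, 5.1, -1.7]), y=0)
```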
Summary so far...
● neural nets will be very large: impractical to write down gradient formula
by hand for all parameters
● backpropagation = recursive application of the chain rule along a
computational graph to compute the gradients of all
inputs/parameters/intermediates
● implementations maintain a graph structure, where the nodes implement
the forward() / backward() API
● forward: compute result of an operation and save any intermediates
needed for gradient computation in memory
● backward: apply the chain rule to compute the gradient of the loss
function with respect to the inputs
Next: Neural Networks
Neural networks: without the brain stuff
(Before) Linear score function: f = W x
(Now) 2-layer Neural Network: f = W2 max(0, W1 x)
x (3072) → W1 → h (100) → W2 → s (10)
Full implementation of training a 2-layer Neural Network needs ~20 lines:
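A sketch in the spirit of the slide's ~20-line network: a 2-layer net with a sigmoid nonlinearity and squared-error loss, trained on random data by gradient descent (sizes and learning rate are illustrative):

```python
import numpy as np
from numpy.random import randn

np.random.seed(0)
N, D_in, H, D_out = 64, 1000, 100, 10
x, y = randn(N, D_in), randn(N, D_out)
w1, w2 = randn(D_in, H), randn(H, D_out)

losses = []
for t in range(200):
    # Forward pass: sigmoid hidden layer, linear output, squared-error loss
    h = 1.0 / (1.0 + np.exp(-x.dot(w1)))
    y_pred = h.dot(w2)
    loss = np.square(y_pred - y).sum()
    losses.append(loss)

    # Backward pass: backprop through each stage of the forward pass
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_w1 = x.T.dot(grad_h * h * (1 - h))   # sigmoid local gradient h(1-h)

    # Gradient descent parameter update
    w1 -= 1e-4 * grad_w1
    w2 -= 1e-4 * grad_w2
```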
In HW: Writing a 2-layer net
This image by Fotis Bobolas is
licensed under CC-BY 2.0
[Figure: a biological neuron — impulses carried toward the cell body along dendrites; the cell body; the axon carrying impulses away; the presynaptic terminal]
Be very careful with your brain analogies!
Biological Neurons:
● Many different types
● Dendrites can perform complex non-linear computations
● Synapses are not a single weight but a complex non-linear dynamical
system
● Rate code may not be adequate
Activation functions
Sigmoid, tanh, ReLU, Leaky ReLU, Maxout, ELU
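The standard definitions of these activations, sketched in numpy (the alpha values are common defaults, not from the slide; Maxout here uses two scalar linear pieces for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):
    return np.where(x > 0, x, a * x)       # small slope for negative inputs

def elu(x, a=1.0):
    return np.where(x > 0, x, a * (np.exp(x) - 1.0))

def maxout(x, w1, b1, w2, b2):
    # Maxout: the max of two linear functions of the input
    return np.maximum(w1 * x + b1, w2 * x + b2)

x = np.array([-2.0, 0.0, 3.0])
```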
Neural networks: Architectures
Example feed-forward computation of a neural network
Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - 99 April 12, 2018
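A sketch of the feed-forward computation for a small fully-connected network (layer sizes here are assumed for illustration): each layer is a matrix multiply plus bias, followed by a nonlinearity.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda s: 1.0 / (1.0 + np.exp(-s))    # sigmoid activation

# input 3 -> hidden 4 -> hidden 4 -> output 1
x = rng.standard_normal((3, 1))            # a random input vector
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))
W2, b2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 1))
W3, b3 = rng.standard_normal((1, 4)), rng.standard_normal((1, 1))

h1 = f(W1 @ x + b1)    # first hidden layer activations
h2 = f(W2 @ h1 + b2)   # second hidden layer activations
out = W3 @ h2 + b3     # output scores (no nonlinearity on the last layer)
```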
Summary
- We arrange neurons into fully-connected layers
- The abstraction of a layer has the nice property that it
allows us to use efficient vectorized code (e.g. matrix
multiplies)
- Neural networks are not really neural
- Next time: Convolutional Neural Networks