4.2 Backpropagation 1

The document discusses backpropagation and gradient descent in neural networks. It covers computational graphs and how gradients flow backward through the graph via the chain rule. Specifically, it works through examples in which the gradient at each node's inputs is obtained by multiplying the upstream gradient by the node's local gradient. It also discusses common patterns in the backward flow of gradients, such as addition gates acting as gradient distributors and maximum gates acting as gradient routers.


Where we are...

We have a scores function s = f(x; W) = W x, the SVM loss on those scores,
L_i = sum_{j != y_i} max(0, s_j - s_{y_i} + 1),
and the full loss = data loss + regularization,
L = (1/N) sum_i L_i + lambda R(W).

We want the gradient dL/dW so that we can update the weights.
Optimization

(The slide illustrates gradient descent as walking downhill on a loss landscape; the landscape and walking-man images are CC0 1.0 public domain.)
Gradient descent

● Numerical gradient: slow :(, approximate :(, easy to write :)
● Analytic gradient: fast :), exact :), error-prone :(

In practice: derive the analytic gradient, then check your implementation against the numerical gradient (a "gradient check").
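A minimal sketch of such a gradient check in numpy (the function name and test function are illustrative, not from the lecture):

    import numpy as np

    def numerical_gradient(f, x, h=1e-5):
        # centered finite differences: slow and approximate, but easy to write
        grad = np.zeros_like(x)
        it = np.nditer(x, flags=['multi_index'])
        while not it.finished:
            i = it.multi_index
            old = x[i]
            x[i] = old + h; fp = f(x)
            x[i] = old - h; fm = f(x)
            x[i] = old
            grad[i] = (fp - fm) / (2 * h)
            it.iternext()
        return grad

    # compare against an analytic gradient: for f(x) = sum(x**2) it is 2*x
    x = np.random.randn(3, 4)
    num = numerical_gradient(lambda x: np.sum(x ** 2), x)
    print(np.max(np.abs(num - 2 * x)))   # should be tiny (around 1e-9 or smaller)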
Computational graphs

Example graph for a linear classifier: the inputs x and W feed a multiply node (*) that produces the scores s = W x; the scores feed a hinge-loss node that produces the data loss; W also feeds a regularization node R(W); a final add node (+) sums the data loss and the regularization term into the total loss L.
Convolutional network (AlexNet)

The same idea scales up: AlexNet's computational graph takes an input image, passes it through many layers of weights, and ends in a loss. (The slide shows the AlexNet architecture figure.)
Backpropagation: a simple example

f(x, y, z) = (x + y) z, e.g. x = -2, y = 5, z = -4

Forward pass: q = x + y = 3, then f = q z = -12.

Want: df/dx, df/dy, df/dz.

Local gradients: dq/dx = 1, dq/dy = 1, df/dq = z, df/dz = q.

Backward pass, applying the chain rule from the output back toward the inputs:

df/df = 1
df/dz = q = 3
df/dq = z = -4
df/dy = df/dq * dq/dy = (-4) * 1 = -4      [upstream gradient] x [local gradient]
df/dx = df/dq * dq/dx = (-4) * 1 = -4      [upstream gradient] x [local gradient]

At every node, the gradient flowing back to an input is the upstream gradient arriving at the node multiplied by the node's local gradient.
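A minimal numeric sketch of this example in plain Python (the variable names are illustrative):

    # forward pass for f(x, y, z) = (x + y) * z
    x, y, z = -2.0, 5.0, -4.0
    q = x + y          # q = 3
    f = q * z          # f = -12

    # backward pass: multiply each local gradient by the upstream gradient
    df = 1.0           # gradient of f with respect to itself
    dz = q * df        # df/dz = q             -> 3
    dq = z * df        # df/dq = z             -> -4
    dx = 1.0 * dq      # dq/dx = 1, chain rule -> -4
    dy = 1.0 * dq      # dq/dy = 1, chain rule -> -4

    print(dx, dy, dz)  # -4.0 -4.0 3.0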
More generally, consider a single node f in the graph with inputs x, y and output z. During the forward pass the node can compute its "local gradients" dz/dx and dz/dy. During the backward pass it receives the upstream gradient dL/dz from further up the graph, multiplies it by each local gradient (chain rule), and passes the resulting gradients dL/dx = dL/dz * dz/dx and dL/dy = dL/dz * dz/dy downstream to the nodes that produced x and y. Each node only ever needs its own local gradients; backpropagation chains them together across the whole graph.
Another example:

f(w, x) = 1 / (1 + e^-(w0 x0 + w1 x1 + w2))

with w0 = 2, x0 = -1, w1 = -3, x1 = -2, w2 = -3. The forward pass through the graph gives w0 x0 = -2, w1 x1 = 6, w0 x0 + w1 x1 + w2 = 1, then e^-1 = 0.37, 1 + 0.37 = 1.37, and finally f = 1/1.37 = 0.73.

Walking backward from the output, each gate computes [upstream gradient] x [local gradient]:

● 1/x gate: local gradient -1/x^2 = -1/1.37^2, so [1.00] x [-0.53] = -0.53
● +1 gate: local gradient 1, so the gradient stays -0.53
● exp gate: local gradient e^x = e^-1 = 0.37, so [-0.53] x [0.37] = -0.20
● *(-1) gate: local gradient -1, so [-0.20] x [-1] = 0.20
● add gate: local gradient 1 on every input, so
  [0.2] x [1] = 0.2
  [0.2] x [1] = 0.2 (both inputs!)
  and w2 also receives 0.2
● multiply gate (w0 * x0): each input's local gradient is the value of the other input, so
  x0: [0.2] x [2] = 0.4
  w0: [0.2] x [-1] = -0.2
  (and similarly on the w1 * x1 branch: x1 gets [0.2] x [-3] = -0.6, w1 gets [0.2] x [-2] = -0.4)
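A plain-Python sketch of the forward and backward pass through this circuit, gate by gate (the intermediate variable names are illustrative); the final check anticipates the sigmoid-gate shortcut introduced next:

    import math

    w0, x0 = 2.0, -1.0
    w1, x1 = -3.0, -2.0
    w2 = -3.0

    # forward pass
    a = w0 * x0                           # -2
    b = w1 * x1                           #  6
    s = a + b + w2                        #  1
    f = 1.0 / (1.0 + math.exp(-s))        #  0.73

    # backward pass: [upstream gradient] x [local gradient] at every gate
    df = 1.0
    dden = df * (-1.0 / (1.0 + math.exp(-s)) ** 2)   # 1/x gate:   -0.53
    dexp = dden * 1.0                                # +1 gate:    -0.53
    dneg = dexp * math.exp(-s)                       # exp gate:   -0.20
    ds = dneg * -1.0                                 # *(-1) gate:  0.20
    da = db = dw2 = ds                               # add gate distributes: 0.2 each
    dw0, dx0 = da * x0, da * w0                      # mul gate:   -0.2, 0.4
    dw1, dx1 = db * x1, db * w1                      # mul gate:   -0.4, -0.6

    # same result via the sigmoid shortcut: dsigma/ds = (1 - f) * f
    assert abs(ds - (1 - f) * f * df) < 1e-9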
Computational graph representation may not be unique. Choose one where local gradients at each node can be easily expressed!

For example, the chain of gates above (*-1, exp, +1, 1/x) computes the sigmoid function sigma(x) = 1 / (1 + e^-x), whose derivative has the convenient form dsigma/dx = (1 - sigma(x)) sigma(x). Collapsing the chain into a single sigmoid gate gives the same gradient in one step:

[upstream gradient] x [local gradient]
[1.00] x [(1 - 0.73) (0.73)] = 0.2
Patterns in backward flow

● add gate: gradient distributor — it passes its upstream gradient, unchanged, to all of its inputs.
● max gate: gradient router — it routes the full upstream gradient to the input that was largest in the forward pass; the other inputs receive gradient 0.
● mul gate: gradient switcher — each input receives the upstream gradient scaled by the value of the other input.
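A minimal sketch of the three backward rules in plain Python (names are illustrative):

    def add_backward(dout):
        # gradient distributor: both inputs get the upstream gradient unchanged
        return dout, dout

    def max_backward(x, y, dout):
        # gradient router: only the winning input gets the upstream gradient
        return (dout, 0.0) if x >= y else (0.0, dout)

    def mul_backward(x, y, dout):
        # gradient switcher: each input gets the upstream gradient times the other input
        return dout * y, dout * x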
Gradients add at branches

When a variable is used by more than one node in the graph, the gradients flowing back along the different branches are summed at that variable (multivariate chain rule).
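A tiny sketch (not from the slides): with f = x*y + x, the variable x feeds two branches and its gradient accumulates with +=:

    x, y = 3.0, 4.0
    a = x * y          # branch 1 uses x
    f = a + x          # branch 2 uses x again

    df = 1.0
    da, dx = df, df    # add gate distributes the upstream gradient
    dx += da * y       # gradient from the multiply branch accumulates into dx
    # dx == y + 1 == 5.0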
Gradients for vectorized code

When x, y, z are now vectors, the "local gradient" dz/dx at a node f is the Jacobian matrix: the derivative of each element of z with respect to each element of x. The backward pass then multiplies the upstream gradient by this Jacobian to obtain the downstream gradient.
Vectorized operations

Consider f(x) = max(0, x) applied elementwise to a 4096-d input vector, producing a 4096-d output vector.

Q: What is the size of the Jacobian matrix?
[4096 x 4096!]

In practice we process an entire minibatch (e.g. 100 examples) at one time, so the Jacobian would technically be a [409,600 x 409,600] matrix :\

Q2: What does it look like?
Because the operation is elementwise, the Jacobian is diagonal: entry (i, i) is 1 if x_i > 0 and 0 otherwise. We never form it explicitly; the backward pass simply zeroes the upstream gradient wherever the input was negative.
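A numpy sketch of how this is exploited in practice: instead of ever forming the Jacobian, the ReLU backward pass applies an elementwise mask (the shapes are the ones from the slide; the names are illustrative):

    import numpy as np

    x = np.random.randn(100, 4096)      # a minibatch of 4096-d inputs
    out = np.maximum(0, x)              # forward: elementwise ReLU

    dout = np.random.randn(*out.shape)  # upstream gradient, same shape as out
    dx = dout * (x > 0)                 # backward: the implicit diagonal Jacobian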
A vectorized example: f(x, W) = ||W x||^2 = sum_i (W x)_i^2

Here x is an n-dimensional vector and W is an n x n matrix. Writing q = W x, we have f = sum_i q_i^2, so df/dq_i = 2 q_i. Applying the chain rule:

● gradient with respect to W: df/dW_ij = 2 q_i x_j, i.e. grad_W f = 2 q x^T
● gradient with respect to x: grad_x f = 2 W^T q

Always check: the gradient with respect to a variable should have the same shape as the variable (grad_W f has the shape of W, grad_x f has the shape of x).
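A numpy sketch of this example (the particular numbers are illustrative, not necessarily the slide's):

    import numpy as np

    W = np.array([[0.1, 0.5],
                  [-0.3, 0.8]])
    x = np.array([0.2, 0.4])

    # forward pass
    q = W.dot(x)            # q = W x
    f = np.sum(q ** 2)      # f = ||W x||^2

    # backward pass
    dq = 2.0 * q            # df/dq
    dW = np.outer(dq, x)    # grad_W f = 2 q x^T, same shape as W
    dx = W.T.dot(dq)        # grad_x f = 2 W^T q, same shape as x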
In discussion section: A matrix example...

Modularized implementation: forward / backward API

Graph (or Net) object (rough pseudocode): the graph keeps a topologically sorted list of gate/node objects; its forward() calls each gate's forward pass in order and returns the loss, and its backward() calls each gate's backward pass in reverse order, applying the chain rule piece by piece.
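The slide's pseudocode is not reproduced in this extract; a minimal sketch of the idea, assuming a strictly sequential chain of gates (illustrative only):

    class SequentialGraph(object):
        """A minimal, strictly sequential computational graph (illustrative only)."""
        def __init__(self, gates):
            self.gates = gates                  # list of gate objects, in forward order

        def forward(self, x):
            for gate in self.gates:             # forward the computational graph
                x = gate.forward(x)
            return x                            # output of the last gate (e.g. the loss)

        def backward(self, dout=1.0):
            for gate in reversed(self.gates):   # apply the chain rule gate by gate
                dout = gate.backward(dout)
            return dout                         # gradient w.r.t. the graph's input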
Modularized implementation: forward / backward API

As a concrete gate, consider a multiply node with inputs x, y and output z = x * y (x, y, z are scalars). Its forward function computes z and caches the input values; its backward function takes the upstream gradient dL/dz and returns, for each input variable, the local gradient times the upstream gradient: dL/dx = y * dL/dz and dL/dy = x * dL/dz.
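A sketch of that multiply gate in the same API (reconstructed, so treat the details as illustrative rather than the slide's exact code):

    class MultiplyGate(object):
        def forward(self, x, y):
            z = x * y
            self.x = x          # must cache the inputs: they are
            self.y = y          # the local gradients needed in backward
            return z

        def backward(self, dz):
            dx = self.y * dz    # [local gradient dz/dx] x [upstream gradient]
            dy = self.x * dz    # [local gradient dz/dy] x [upstream gradient]
            return dx, dy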
Example: Caffe layers

Real frameworks are organized the same way. In Caffe (licensed under BSD 2-Clause), each layer implements a forward and a backward method.

Caffe Sigmoid Layer

The Sigmoid layer's forward pass applies sigma(x) elementwise; its backward pass multiplies the incoming top_diff by the local gradient sigma(x) * (1 - sigma(x)) — the chain rule again — to produce the gradient passed down to the layer below.
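A rough Python analogue of such a layer (not Caffe's actual C++ code); note that the cached output is all the backward pass needs:

    import numpy as np

    class SigmoidLayer(object):
        def forward(self, x):
            self.out = 1.0 / (1.0 + np.exp(-x))   # cache the output
            return self.out

        def backward(self, top_diff):
            # local gradient of the sigmoid is out * (1 - out); multiply by top_diff (chain rule)
            return top_diff * self.out * (1.0 - self.out)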
In Assignment 1: Writing SVM / Softmax

Stage your forward/backward computation! E.g. for the SVM, compute the scores, then the margins, then the loss in the forward pass; in the backward pass, walk back through the same intermediate stages (margins, then scores) to get the gradient on W.
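A hedged sketch of what such staged code might look like for the multiclass SVM loss (the names margins, dscores, etc. are illustrative, not the assignment's required interface):

    import numpy as np

    def svm_loss_staged(W, X, y):
        """W: (D, C) weights, X: (N, D) data, y: (N,) correct class indices."""
        N = X.shape[0]

        # forward pass, staged into named intermediates
        scores = X.dot(W)                                   # (N, C)
        correct = scores[np.arange(N), y][:, None]          # (N, 1)
        margins = np.maximum(0, scores - correct + 1.0)     # (N, C)
        margins[np.arange(N), y] = 0.0
        loss = margins.sum() / N

        # backward pass, walking back through the same stages
        dmargins = (margins > 0).astype(float) / N          # max gate routes the gradient
        dscores = dmargins.copy()
        dscores[np.arange(N), y] -= dmargins.sum(axis=1)    # correct-class score appears in every margin
        dW = X.T.dot(dscores)                               # (D, C), same shape as W
        return loss, dW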
Summary so far...
● neural nets will be very large: impractical to write down gradient formulas by hand for all parameters
● backpropagation = recursive application of the chain rule along a computational graph to compute the gradients of all inputs/parameters/intermediates
● implementations maintain a graph structure, where the nodes implement the forward() / backward() API
● forward: compute the result of an operation and save any intermediates needed for gradient computation in memory
● backward: apply the chain rule to compute the gradient of the loss function with respect to the inputs
