
Lecture 4: Neural Networks and Backpropagation

Fei-Fei Li, Yunzhu Li, Ruohan Gao. April 13, 2023
Announcements

- AWS credit: create an account and submit the account number (ID) using the Google form by 4/13.
- Assignment 1 is due Fri 4/21 at 11:59pm.
Administrative: Project Proposal

- Due Mon 4/24.
- TA expertise is posted on the webpage:
  http://cs231n.stanford.edu/office_hours.html
Administrative: Discussion Section

Discussion section tomorrow: Backpropagation
Recap
- We have some dataset of (x, y) pairs, e.g. images and their labels.
- We have a score function: s = f(x; W) = Wx
- We have a loss function, e.g.:
  Softmax: L_i = -log( exp(s_{y_i}) / sum_j exp(s_j) )
  SVM: L_i = sum_{j != y_i} max(0, s_j - s_{y_i} + 1)
  Full loss: L = (1/N) sum_i L_i + R(W)
Finding the best W: Optimize with Gradient Descent

[Figure: a loss landscape with a walker descending it. Landscape image is CC0 1.0 public domain; walking man image is CC0 1.0 public domain.]
Gradient descent

- Numerical gradient: slow :(, approximate :(, easy to write :)
- Analytic gradient: fast :), exact :), error-prone :(

In practice: derive the analytic gradient, then check your implementation against the numerical gradient (a gradient check; see the sketch below).
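A minimal sketch of such a gradient check, assuming a scalar-valued `loss_fn(W)` callable; the function names and the error tolerance are illustrative, not from the slides.

```python
import numpy as np

def numerical_gradient(loss_fn, W, h=1e-5):
    """Centered-difference estimate of dLoss/dW, one element at a time."""
    grad = np.zeros_like(W)
    it = np.nditer(W, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = W[idx]
        W[idx] = old + h
        loss_plus = loss_fn(W)
        W[idx] = old - h
        loss_minus = loss_fn(W)
        W[idx] = old                       # restore the original value
        grad[idx] = (loss_plus - loss_minus) / (2 * h)
        it.iternext()
    return grad

def gradient_check(analytic_grad, loss_fn, W):
    """Compare the analytic gradient against the numerical estimate."""
    num_grad = numerical_gradient(loss_fn, W)
    rel_error = np.abs(analytic_grad - num_grad) / (
        np.maximum(1e-8, np.abs(analytic_grad) + np.abs(num_grad)))
    return rel_error.max()                 # should be tiny, e.g. < 1e-6
```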
Stochastic Gradient Descent (SGD)

The full sum over all N examples is expensive when N is large! Approximate the sum using a minibatch of examples; minibatch sizes of 32 / 64 / 128 are common.
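A sketch of the resulting vanilla minibatch SGD loop; `sample_minibatch` and `loss_and_gradient` are hypothetical helpers assumed here, not functions defined on the slides.

```python
# Hypothetical helpers assumed: sample_minibatch(data, size) and
# loss_and_gradient(W, x_batch, y_batch) -> (loss, dW).
def train_sgd(W, data, learning_rate=1e-3, batch_size=64, num_steps=1000):
    for step in range(num_steps):
        x_batch, y_batch = sample_minibatch(data, batch_size)  # e.g. 32/64/128 examples
        loss, dW = loss_and_gradient(W, x_batch, y_batch)      # approximates the full-sum gradient
        W -= learning_rate * dW                                # gradient descent step
    return W
```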
Last time: fancy optimizers

- SGD
- SGD+Momentum
- RMSProp
- Adam
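As a reminder of what two of those optimizers do per step, here is a hedged sketch of the standard update rules; the hyperparameter names and defaults are illustrative.

```python
import numpy as np

def sgd_momentum_step(w, dw, v, lr=1e-3, rho=0.9):
    # Momentum keeps a running velocity and steps along it.
    v = rho * v - lr * dw
    return w + v, v

def adam_step(w, dw, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam keeps running first and second moments of the gradient.
    m = beta1 * m + (1 - beta1) * dw
    v = beta2 * v + (1 - beta2) * dw * dw
    m_hat = m / (1 - beta1 ** t)           # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```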
Last time: learning rate scheduling

- Step: reduce the learning rate at a few fixed points. E.g. for ResNets, multiply the LR by 0.1 after epochs 30, 60, and 90.
- Cosine: alpha_t = 0.5 * alpha_0 * (1 + cos(t * pi / T))
- Linear: alpha_t = alpha_0 * (1 - t / T)
- Inverse sqrt: alpha_t = alpha_0 / sqrt(t)

where alpha_0 is the initial learning rate, alpha_t is the learning rate at epoch t, and T is the total number of epochs.
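A minimal sketch of those schedules as functions of the epoch t, using the same alpha_0 and T notation as above; milestones and gamma in the step schedule are illustrative.

```python
import math

def step_lr(t, alpha_0, milestones=(30, 60, 90), gamma=0.1):
    # Multiply the LR by gamma at each milestone epoch passed so far.
    return alpha_0 * gamma ** sum(t >= m for m in milestones)

def cosine_lr(t, alpha_0, T):
    return 0.5 * alpha_0 * (1 + math.cos(math.pi * t / T))

def linear_lr(t, alpha_0, T):
    return alpha_0 * (1 - t / T)

def inv_sqrt_lr(t, alpha_0):
    return alpha_0 / math.sqrt(max(t, 1))  # guard against t = 0
```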
Today: Deep Learning

Dall-E 2

“Teddy bears working on new AI research on the moon in the 1980s.” “Rabbits attending a college seminar on human anatomy.” “A wise cat meditating in the Himalayas searching for enlightenment.”

Image source: Sam Altman, https://openai.com/dall-e-2/, https://twitter.com/sama/status/1511724264629678084
Ramesh et al., Hierarchical Text-Conditional Image Generation with CLIP Latents, 2022.

GPT-4

Image source: https://openai.com/research/gpt-4

Segment Anything Model (SAM)

Kirillov et al., Segment Anything, 2023
Neural Networks

Neural networks: the original linear classifier

(Before) Linear score function: f = Wx

Neural networks: 2 layers

(Before) Linear score function: f = Wx
(Now) 2-layer Neural Network: f = W2 max(0, W1 x)

(In practice we will usually add a learnable bias at each layer as well)
Why do we want non-linearity?

In the original (x, y) coordinates, the red and blue points cannot be separated by a linear classifier. After applying the feature transform f(x, y) = (r(x, y), θ(x, y)), the points can be separated by a linear classifier in the new (r, θ) coordinates.
Neural networks: also called fully connected network

(Before) Linear score function: f = Wx
(Now) 2-layer Neural Network: f = W2 max(0, W1 x)

“Neural Network” is a very broad term; these are more accurately called “fully-connected networks” or sometimes “multi-layer perceptrons” (MLP).
(In practice we will usually add a learnable bias at each layer as well)

Neural networks: 3 layers

2-layer Neural Network: f = W2 max(0, W1 x)
or 3-layer Neural Network: f = W3 max(0, W2 max(0, W1 x))

Neural networks: hierarchical computation

x (3072) --W1--> h (100) --W2--> s (10)

Neural networks: learning 100s of templates

With a hidden layer of size 100, the network learns 100 templates instead of 10, and the second layer shares templates between classes.
Neural networks: why is max operator important?

In f = W2 max(0, W1 x), the max function is called the activation function.

Q: What if we try to build a neural network without one?

A: We end up with a linear classifier again! Without the activation, f = W2 W1 x = W3 x, so the two layers collapse into a single linear transform (see the check below).
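A small numerical check of that claim with random matrices; this is just an illustration, not code from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3072)
W1 = rng.standard_normal((100, 3072))
W2 = rng.standard_normal((10, 100))

# Without a nonlinearity, stacking two linear layers is just one linear layer.
two_layer_linear = W2 @ (W1 @ x)
single_layer = (W2 @ W1) @ x
print(np.allclose(two_layer_linear, single_layer))   # True

# With the max nonlinearity, the composition is no longer linear in x.
two_layer_relu = W2 @ np.maximum(0, W1 @ x)
```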
Activation functions

Sigmoid, tanh, ReLU, Leaky ReLU, Maxout, ELU.
ReLU is a good default choice for most problems.

Neural networks: Architectures

Left: a “2-layer Neural Net”, or “1-hidden-layer Neural Net”.
Right: a “3-layer Neural Net”, or “2-hidden-layer Neural Net”.
These are “Fully-connected” layers.
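For reference, a hedged sketch of the common activation functions listed above (Maxout is omitted since it takes several linear inputs rather than a single pre-activation).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))
```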
Example feed-forward computation of a neural network

Full implementation of training a 2-layer Neural Network needs ~20 lines:

- Define the network
- Forward pass
- Calculate the analytical gradients
- Gradient descent

(A sketch of such an implementation follows.)
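A minimal sketch in the spirit of the slide's ~20-line implementation, assuming a sigmoid nonlinearity, a squared-error loss on random toy data, and full-batch gradient descent; the layer sizes and learning rate are illustrative.

```python
import numpy as np

# Define the network: toy data and randomly initialized weights.
N, D_in, H, D_out = 64, 1000, 100, 10
x, y = np.random.randn(N, D_in), np.random.randn(N, D_out)
w1, w2 = np.random.randn(D_in, H), np.random.randn(H, D_out)

for t in range(2000):
    # Forward pass: 2-layer net with a sigmoid nonlinearity.
    h = 1.0 / (1.0 + np.exp(-x.dot(w1)))
    y_pred = h.dot(w2)
    loss = np.square(y_pred - y).sum()          # simple L2 loss for illustration

    # Backward pass: calculate the analytical gradients.
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_w1 = x.T.dot(grad_h * h * (1 - h))     # sigmoid local gradient: h * (1 - h)

    # Gradient descent step.
    w1 -= 1e-4 * grad_w1
    w2 -= 1e-4 * grad_w2
```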
Setting the number of layers and their sizes

More neurons = more capacity.

Do not use the size of the neural network as a regularizer. Use stronger regularization instead.

(Web demo with ConvNetJS: http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html)
[Image of a biological neuron by Fotis Bobolas, licensed under CC-BY 2.0.]

[Neuron diagram by Felipe Perucho, licensed under CC-BY 3.0. Labels: dendrite, cell body, axon, presynaptic terminal; impulses are carried toward the cell body by the dendrites and carried away from the cell body by the axon. The slide compares this to an artificial neuron with a sigmoid activation function.]

Biological neurons: complex connectivity patterns. Neurons in a neural network: organized into regular layers for computational efficiency. But neural networks with random connections can work too!
(Neuron image is CC0 Public Domain. Xie et al, “Exploring Randomly Wired Neural Networks for Image Recognition”, arXiv 2019)

Be very careful with your brain analogies!
Biological Neurons:
● Many different types
● Dendrites can perform complex non-linear computations
● Synapses are not a single weight but a complex non-linear dynamical system

[Dendritic Computation. London and Hausser]
Plugging in neural networks with loss functions

- Nonlinear score function: s = W2 max(0, W1 x)
- SVM loss on the predictions
- Regularization
- Total loss: data loss + regularization

Problem: How to compute gradients?

If we can compute dL/dW1 and dL/dW2, then we can learn W1 and W2.
(Bad) Idea: Derive the gradients on paper

- Problem: Very tedious: lots of matrix calculus, need lots of paper.
- Problem: What if we want to change the loss? E.g. use softmax instead of SVM? Need to re-derive from scratch.
- Problem: Not feasible for very complex models!

Better Idea: Computational graphs + Backpropagation

[Graph: x and W feed a multiply node (*) that produces the scores s; the scores feed a hinge loss node, W also feeds a regularization node R, and an add node (+) combines the two into the total loss L.]
Convolutional network (AlexNet): a much larger computational graph from the input image, through the weights, to the loss.
(Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.)

Really complex neural networks!! E.g. a Neural Turing Machine is still a computational graph from input to loss.
(Figures reproduced with permission from a Twitter post by Andrej Karpathy.)

Solution: Backpropagation
Backpropagation: a simple example

f(x, y, z) = (x + y) z, e.g. x = -2, y = 5, z = -4

Forward pass: q = x + y = 3, then f = q * z = -12.

Want: df/dx, df/dy, df/dz. Work backward through the graph:

- df/df = 1
- df/dz = q = 3
- df/dq = z = -4
- Chain rule: df/dy = (df/dq) * (dq/dy) = (-4) * 1 = -4   [upstream gradient x local gradient]
- Chain rule: df/dx = (df/dq) * (dq/dx) = (-4) * 1 = -4   [upstream gradient x local gradient]

(A code sketch of this example follows.)
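A hedged code sketch of that worked example, writing out the forward pass and then the backward pass by hand.

```python
# Forward pass for f(x, y, z) = (x + y) * z
x, y, z = -2.0, 5.0, -4.0
q = x + y            # q = 3
f = q * z            # f = -12

# Backward pass: multiply upstream gradients by local gradients (chain rule).
df_df = 1.0
df_dz = q * df_df    # local gradient of f = q*z w.r.t. z is q  ->  3
df_dq = z * df_df    # local gradient w.r.t. q is z             -> -4
df_dx = 1.0 * df_dq  # local gradient of q = x + y w.r.t. x is 1 -> -4
df_dy = 1.0 * df_dq  # local gradient w.r.t. y is 1              -> -4
print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0
```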
In general, each node f in the computational graph receives an “upstream gradient” (the gradient of the loss with respect to the node's output), multiplies it by its “local gradient” (the gradient of the node's output with respect to each input), and passes the resulting “downstream gradients” back to its inputs.
Another example: f(w, x) = 1 / (1 + e^(-(w0 x0 + w1 x1 + w2)))

Walking backward through the graph, each step multiplies the upstream gradient by the local gradient. Two highlights from the worked example:

- The add gates pass the upstream gradient through unchanged: [upstream gradient] x [local gradient] = [0.2] x [1] = 0.2 (both inputs!)
- At the multiply gate for the first input pair: w0: [0.2] x [-1] = -0.2, and x0: [0.2] x [2] = 0.4.

The computational graph representation may not be unique. Choose one where the local gradients at each node can be easily expressed! For example, the whole sigmoid function sigma(x) = 1 / (1 + e^(-x)) can be treated as a single node.

Sigmoid local gradient: d sigma(x) / dx = (1 - sigma(x)) * sigma(x)

So the gradient through the sigmoid node is
[upstream gradient] x [local gradient] = [1.00] x [(1 - 1/(1+e^-1)) * (1/(1+e^-1))] = [1.00] x [(1 - 0.73) * (0.73)] = 0.2
Patterns in gradient flow

- add gate: gradient distributor. E.g. inputs 3 and 4 give output 7; an upstream gradient of 2 is passed unchanged to both inputs (2 and 2).
- mul gate: “swap multiplier”. E.g. inputs 2 and 3 give output 6; with upstream gradient 5, the input 2 receives 5*3=15 and the input 3 receives 2*5=10.
- copy gate: gradient adder. E.g. input 7 is copied to two outputs (7 and 7); upstream gradients 4 and 2 add to a downstream gradient of 4+2=6.
- max gate: gradient router. E.g. max(4, 5) = 5; an upstream gradient of 9 is routed entirely to the larger input (9 to the 5, 0 to the 4).
Backprop Implementation: “Flat” code

Forward pass: compute the output.
Backward pass: compute the grads, walking the graph in reverse: base case, sigmoid, add gate, add gate, multiply gate, multiply gate.

(A sketch of such flat code for the sigmoid example above follows.)
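A minimal flat-code sketch for f(w, x) = 1 / (1 + e^(-(w0 x0 + w1 x1 + w2))); the variable names are illustrative, chosen so that each backward line mirrors one forward line.

```python
import math

def f_forward_backward(w0, x0, w1, x1, w2):
    # Forward pass: compute the output, one node at a time.
    s0 = w0 * x0
    s1 = w1 * x1
    s2 = s0 + s1
    s3 = s2 + w2
    L = 1.0 / (1.0 + math.exp(-s3))       # sigmoid

    # Backward pass: compute the grads, in reverse order.
    grad_L = 1.0                          # base case
    grad_s3 = (1 - L) * L * grad_L        # sigmoid local gradient
    grad_s2 = 1.0 * grad_s3               # add gate distributes
    grad_w2 = 1.0 * grad_s3
    grad_s0 = 1.0 * grad_s2               # add gate distributes
    grad_s1 = 1.0 * grad_s2
    grad_w0 = x0 * grad_s0                # multiply gate swaps inputs
    grad_x0 = w0 * grad_s0
    grad_w1 = x1 * grad_s1
    grad_x1 = w1 * grad_s1
    return L, (grad_w0, grad_x0, grad_w1, grad_x1, grad_w2)

# Illustrative inputs: w0 = 2 and x0 = -1 match the worked gradients above
# (-0.2 and 0.4); the remaining values are assumed for the sketch.
print(f_forward_backward(2.0, -1.0, -3.0, -2.0, -3.0))
```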
“Flat” Backprop: Do this for assignment 1!

Stage your forward/backward computation! E.g. for the SVM, keep intermediates such as the margins from the forward pass so the backward pass can reuse them; the slide shows the same staging for a two-layer neural net.
Backprop Implementation: Modularized API

A Graph (or Net) object runs the forward pass by calling forward() on each node in topological order, and the backward pass by calling backward() on each node in reverse order (rough pseudo code).

Modularized implementation: forward / backward API

Each Gate / Node / Function object (here a multiply gate z = x * y, where x, y, z are scalars, shown with actual PyTorch code on the slide) implements:
- forward: compute the output z and cache some values for use in backward (here, x and y).
- backward: take the upstream gradient dL/dz and multiply it by the local gradients to get dL/dx and dL/dy.
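A rough sketch of that Gate/Node API in plain Python; the class and method names are illustrative (the slide's version is PyTorch-style).

```python
class MultiplyGate:
    """z = x * y for scalars, exposing a forward/backward API."""

    def forward(self, x, y):
        # Need to cache some values for use in backward.
        self.x, self.y = x, y
        return x * y

    def backward(self, dz):
        # Multiply the upstream gradient dz by the local gradients.
        dx = self.y * dz     # dz/dx = y
        dy = self.x * dz     # dz/dy = x
        return dx, dy

gate = MultiplyGate()
z = gate.forward(3.0, -4.0)      # -12.0
dx, dy = gate.backward(2.0)      # dx = -8.0, dy = 6.0
```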
Example: PyTorch operators

PyTorch sigmoid layer
- Forward (the forward kernel is actually defined elsewhere in the source)
- Backward

(Source: the PyTorch codebase, linked on the slide.)
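To illustrate the same forward/backward split at the user level, here is a hedged sketch of a custom sigmoid written with torch.autograd.Function; it mirrors, but is not, PyTorch's internal implementation.

```python
import torch

class Sigmoid(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        out = 1.0 / (1.0 + torch.exp(-x))
        ctx.save_for_backward(out)               # cache the output for backward
        return out

    @staticmethod
    def backward(ctx, grad_output):
        out, = ctx.saved_tensors
        return grad_output * (1.0 - out) * out   # upstream x local gradient

x = torch.randn(4, requires_grad=True)
y = Sigmoid.apply(x).sum()
y.backward()                                     # populates x.grad
```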
So far: backprop with scalars

What about vector-valued functions?

Recap: Vector derivatives

- Scalar to Scalar: regular derivative. If x changes by a small amount, how much will y change?
- Vector to Scalar: derivative is the Gradient. For each element of x, if it changes by a small amount, then how much will y change?
- Vector to Vector: derivative is the Jacobian. For each element of x, if it changes by a small amount, then how much will each element of y change?
Backprop with Vectors

The loss L is still a scalar! With vector inputs x (size Dx) and y (size Dy) and vector output z (size Dz) at a node f:

- “Upstream gradient”: dL/dz, a vector of size Dz. For each element of z, how much does it influence L?
- “Local gradients”: the Jacobian matrices dz/dx of size [Dx x Dz] and dz/dy of size [Dy x Dz].
- “Downstream gradients”: dL/dx (size Dx) and dL/dy (size Dy), each computed by a matrix-vector multiply of a Jacobian with the upstream gradient.

Gradients of variables wrt the loss have the same dims as the original variable.
Backprop with Vectors: f(x) = max(0, x) (elementwise)

4D input x: [ 1, -2, 3, -1 ]   ->   4D output z: [ 1, 0, 3, 0 ]

Upstream gradient, 4D dL/dz: [ 4, -1, 5, 9 ]

Jacobian dz/dx:
[ 1 0 0 0 ]
[ 0 0 0 0 ]
[ 0 0 1 0 ]
[ 0 0 0 0 ]

4D dL/dx = [dz/dx] [dL/dz] = [ 4, 0, 5, 0 ]

The Jacobian is sparse: off-diagonal entries are always zero! Never explicitly form the Jacobian -- instead use implicit multiplication (see the sketch below).
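A sketch of that implicit Jacobian multiplication for elementwise ReLU in numpy, using the same numbers as the example.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0, -1.0])
dL_dz = np.array([4.0, -1.0, 5.0, 9.0])     # upstream gradient

z = np.maximum(0, x)                         # forward: [1, 0, 3, 0]

# Implicit Jacobian multiply: never build the 4x4 Jacobian explicitly;
# just zero out the upstream gradient wherever the input was not positive.
dL_dx = dL_dz * (x > 0)                      # [4, 0, 5, 0]
print(z, dL_dx)
```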
Backprop with Matrices (or Tensors)

The loss L is still a scalar! With matrix (or tensor) inputs x of shape [Dx×Mx] and y of shape [Dy×My] and output z of shape [Dz×Mz]:

- “Upstream gradient”: dL/dz, of shape [Dz×Mz]. For each element of z, how much does it influence L?
- “Local gradients”: the Jacobian matrices dz/dx of shape [(Dx×Mx)×(Dz×Mz)] and dz/dy of shape [(Dy×My)×(Dz×Mz)]. For each element of y, how much does it influence each element of z?
- “Downstream gradients”: dL/dx of shape [Dx×Mx] and dL/dy of shape [Dy×My], computed by a (generalized) matrix-vector multiply of each Jacobian with the upstream gradient.

dL/dx always has the same shape as x!
Backprop with Matrices: Matrix Multiply y = x w

x: [N×D]
[  2 1 -3 ]
[ -3 4  2 ]

w: [D×M]
[ 3 2 1 -1 ]
[ 2 1 3  2 ]
[ 3 2 1 -2 ]

y: [N×M]
[ 13 9 -2 -6 ]
[  5 2 17  1 ]

Upstream gradient dL/dy: [N×M]
[  2 3 -3 9 ]
[ -8 1  4 6 ]

Also see the derivation in the course notes:
http://cs231n.stanford.edu/handouts/linear-backprop.pdf

Jacobians: dy/dx: [(N×D)×(N×M)], dy/dw: [(D×M)×(N×M)]. For a neural net we may have N=64, D=M=4096; each Jacobian takes ~256 GB of memory! Must work with them implicitly.

Q: What parts of y are affected by one element of x?
A: x_{n,d} affects the whole row y_{n,·}.

Q: How much does x_{n,d} affect y_{n,m}?
A: By w_{d,m}.

So dL/dx = (dL/dy) w^T, with shapes [N×D] = [N×M] [M×D].
By similar logic: dL/dw = x^T (dL/dy), with shapes [D×M] = [D×N] [N×M].

These formulas are easy to remember: they are the only way to make the shapes match up!
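A small numpy sketch of those two formulas with random data; the shapes mirror the example above, and the asserts check that each gradient has the same shape as its variable.

```python
import numpy as np

N, D, M = 2, 3, 4
x = np.random.randn(N, D)
w = np.random.randn(D, M)

y = x @ w                      # forward: [N×M]
dL_dy = np.random.randn(N, M)  # upstream gradient

# Backward: the only way to make the shapes match up.
dL_dx = dL_dy @ w.T            # [N×M] [M×D] -> [N×D]
dL_dw = x.T @ dL_dy            # [D×N] [N×M] -> [D×M]

assert dL_dx.shape == x.shape and dL_dw.shape == w.shape
```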
Summary for today:

● (Fully-connected) Neural Networks are stacks of linear functions and nonlinear activation functions; they have much more representational power than linear classifiers.
● Backpropagation = recursive application of the chain rule along a computational graph to compute the gradients of all inputs/parameters/intermediates.
● Implementations maintain a graph structure, where the nodes implement the forward() / backward() API.
● forward: compute the result of an operation and save any intermediates needed for gradient computation in memory.
● backward: apply the chain rule to compute the gradient of the loss function with respect to the inputs.
Next Time: Convolutional Neural Networks!
