
Lecture 4: Neural Networks and Backpropagation

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 1 April 08, 2021


Announcements: Assignment 1

Assignment 1 due Fri 4/16 at 11:59pm

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 2 April 08, 2021


Administrative: Project Proposal

Due Mon 4/19

TA expertise is posted on the webpage:
(http://cs231n.stanford.edu/office_hours.html)

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 3 April 08, 2021


Administrative: Discussion Section

Discussion section tomorrow:

Backpropagation

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 4 April 08, 2021


Administrative: Midterm Updates

- The midterm will be held on Tues, May 4 and is worth 15% of your grade.
- It will be available for 24 hours on Gradescope, from May 4, 12PM PDT to May 5, 11:59 AM PDT.
- You may take it in any consecutive 3-hour timeframe within that window.
- The exam will be designed for 1.5 hours.
- Open book and open internet, but no collaboration.
- Only make private posts during those 24 hours.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 5 April 08, 2021


Recap: from last time

f(x,W) = Wx + b

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 6 April 08, 2021


Recap: loss functions
Linear score function

SVM loss (or softmax)

data loss + regularization

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 7 April 08, 2021


Finding the best W: Optimize with Gradient Descent

Landscape image is CC0 1.0 public domain


Walking man image is CC0 1.0 public domain

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 8 April 08, 2021


Gradient descent

Numerical gradient: slow :(, approximate :(, easy to write :)


Analytic gradient: fast :), exact :), error-prone :(

In practice: Derive the analytic gradient, then check your implementation with the numerical gradient.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 9 April 08, 2021
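As a concrete sketch of that recipe (illustrative NumPy code, not from the slides), a numerical gradient computed with centered finite differences can be compared against a hand-derived analytic gradient:

import numpy as np

def numerical_gradient(f, W, h=1e-5):
    # Centered finite differences: slow and approximate, but easy to write.
    grad = np.zeros_like(W)
    it = np.nditer(W, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = W[idx]
        W[idx] = old + h; fp = f(W)
        W[idx] = old - h; fm = f(W)
        W[idx] = old
        grad[idx] = (fp - fm) / (2 * h)
        it.iternext()
    return grad

# Gradient check on a toy loss f(W) = sum(W**2), whose analytic gradient is 2W:
W = np.random.randn(3, 4)
analytic = 2 * W
numeric = numerical_gradient(lambda W: np.sum(W ** 2), W)
print(np.max(np.abs(analytic - numeric)))   # should be tiny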


Stochastic Gradient Descent (SGD)
The full sum over all N training examples is expensive when N is large!
Approximate the sum using a minibatch of examples; minibatch sizes of 32 / 64 / 128 are common.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 10 April 08, 2021
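A minimal sketch of the vanilla minibatch SGD loop (the gradient function and hyperparameters here are illustrative placeholders, not from the slides):

import numpy as np

def train_sgd(W, X_train, y_train, grad_fn, lr=1e-3, batch_size=128, num_steps=1000):
    # grad_fn(W, X_batch, y_batch) is assumed to return dL/dW on the minibatch.
    N = X_train.shape[0]
    for step in range(num_steps):
        idx = np.random.choice(N, batch_size, replace=False)   # sample a minibatch
        grad = grad_fn(W, X_train[idx], y_train[idx])           # approximate full-sum gradient
        W -= lr * grad                                          # parameter update
    return W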


What we are going to discuss today!
Linear score function

SVM loss (or softmax)

data loss + regularization

How to find the best W?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 11 April 08, 2021


Problem: Linear Classifiers are not very powerful

Visual viewpoint: linear classifiers learn only one template per class.
Geometric viewpoint: linear classifiers can only draw linear decision boundaries.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 12 April 08, 2021


Pixel Features

The linear classifier computes class scores directly from the raw pixels: f(x) = Wx

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 13 April 16, 2020


Image Features

First compute a feature representation of the image, then compute class scores on top of the features: f(x) = Wx

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 14 April 16, 2020


Image Features: Motivation

Cannot separate red and blue points with a linear classifier.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 15 April 16, 2020


Image Features: Motivation

f(x, y) = (r(x, y), θ(x, y))

Cannot separate red and blue points with a linear classifier. After applying the feature transform (from Cartesian coordinates (x, y) to polar coordinates (r, θ)), the points can be separated by a linear classifier.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 16 April 16, 2020


Example: Color Histogram

Bin each pixel's color and count: every pixel adds +1 to its bin.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 17 April 16, 2020
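A rough sketch of such a color-histogram feature in NumPy (the hue conversion and the bin count are illustrative choices, not from the slides):

import numpy as np

def color_histogram(img, bins=16):
    # img: H x W x 3 uint8 array; bin each pixel's hue and count (+1 per pixel).
    r, g, b = img[..., 0] / 255.0, img[..., 1] / 255.0, img[..., 2] / 255.0
    mx = np.maximum(np.maximum(r, g), b)
    mn = np.minimum(np.minimum(r, g), b)
    d = (mx - mn) + 1e-8
    hue = np.where(mx == r, ((g - b) / d) % 6.0, 0.0)
    hue = np.where(mx == g, (b - r) / d + 2.0, hue)
    hue = np.where(mx == b, (r - g) / d + 4.0, hue)
    hist, _ = np.histogram(hue / 6.0, bins=bins, range=(0.0, 1.0))
    return hist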


Example: Histogram of Oriented Gradients (HoG)

Divide the image into 8x8 pixel regions. Within each region, quantize the edge direction into 9 bins.
Example: a 320x240 image gets divided into 40x30 bins; each bin has 9 numbers, so the feature vector has 30*40*9 = 10,800 numbers.
Lowe, “Object recognition from local scale-invariant features”, ICCV 1999
Dalal and Triggs, "Histograms of oriented gradients for human detection," CVPR 2005

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 18 April 16, 2020


Example: Bag of Words
Step 1: Build codebook. Extract random patches, then cluster the patches to form a “codebook” of “visual words”.

Step 2: Encode images

Fei-Fei and Perona, “A bayesian hierarchical model for learning natural scene categories”, CVPR 2005

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 19 April 16, 2020


Image Features

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 20 April 16, 2020


Image features vs ConvNets
Feature extraction pipeline: hand-designed feature extraction followed by a trainable classifier f that outputs 10 numbers giving scores for classes; only the classifier is trained.

ConvNet: the whole network, from the raw image to the 10 numbers giving scores for classes, is trained end-to-end.

Krizhevsky, Sutskever, and Hinton, “Imagenet classification with deep convolutional neural networks”, NIPS 2012.
Figure copyright Krizhevsky, Sutskever, and Hinton, 2012. Reproduced with permission.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 21 April 16, 2020


One Solution: Feature Transformation
f(x, y) = (r(x, y), θ(x, y))

Transform data with a cleverly chosen feature transform f, then apply a linear classifier.

Examples: Color Histogram, Histogram of Oriented Gradients (HoG)

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 22 April 08, 2021


Today: Neural Networks

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 23 April 08, 2021


Neural networks: the original linear classifier

(Before) Linear score function:

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 24 April 08, 2021


Neural networks: 2 layers

(Before) Linear score function: f = Wx

(Now) 2-layer Neural Network: f = W2 max(0, W1 x)

(In practice we will usually add a learnable bias at each layer as well)

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 25 April 08, 2021


Neural networks: also called fully connected network

(Before) Linear score function:


(Now) 2-layer Neural Network

“Neural Network” is a very broad term; these are more accurately called
“fully-connected networks” or sometimes “multi-layer perceptrons” (MLP)
(In practice we will usually add a learnable bias at each layer as well)

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 26 April 08, 2021


Neural networks: 3 layers

(Before) Linear score function: f = Wx

(Now) 2-layer Neural Network: f = W2 max(0, W1 x)
or 3-layer Neural Network: f = W3 max(0, W2 max(0, W1 x))

(In practice we will usually add a learnable bias at each layer as well)

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 27 April 08, 2021


Neural networks: hierarchical computation
(Before) Linear score function:
(Now) 2-layer Neural Network

x (3072) → W1 → h (100) → W2 → s (10)

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 28 April 08, 2021
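In NumPy-like code, this two-layer computation is simply the following (a sketch using the dimensions above; the weight scale is illustrative):

import numpy as np

x = np.random.randn(3072)              # flattened 32x32x3 input image
W1 = 0.01 * np.random.randn(100, 3072)
W2 = 0.01 * np.random.randn(10, 100)

h = np.maximum(0, W1.dot(x))           # hidden layer, 100 values
s = W2.dot(h)                          # class scores, 10 values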


Neural networks: learning 100s of templates
(Before) Linear score function:
(Now) 2-layer Neural Network

x (3072) → W1 → h (100) → W2 → s (10)

Learn 100 templates instead of 10. Share templates between classes

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 29 April 08, 2021


Neural networks: why is max operator important?
(Before) Linear score function:
(Now) 2-layer Neural Network

The max function is called the activation function.


Q: What if we try to build a neural network without one?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 30 April 08, 2021


Neural networks: why is max operator important?
(Before) Linear score function:
(Now) 2-layer Neural Network

The max function is called the activation function.


Q: What if we try to build a neural network without one?

A: We end up with a linear classifier again!

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 31 April 08, 2021
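A quick numeric sketch of that answer (shapes are illustrative): composing two linear layers without a nonlinearity collapses into a single linear classifier.

import numpy as np

W1 = np.random.randn(100, 3072)
W2 = np.random.randn(10, 100)
x = np.random.randn(3072)

s_two_layer = W2.dot(W1.dot(x))     # "2-layer" net with no activation function
W_combined = W2.dot(W1)             # collapses to a single (10 x 3072) matrix
assert np.allclose(s_two_layer, W_combined.dot(x))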


Activation functions
Sigmoid Leaky ReLU

tanh Maxout

ReLU ELU

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 32 April 08, 2021


Activation functions

ReLU is a good default choice for most problems

Sigmoid Leaky ReLU

tanh Maxout

ReLU ELU

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 33 April 08, 2021
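For reference, minimal NumPy definitions of the activations named above (Maxout is omitted because it takes several linear inputs; the alpha values are common defaults, not prescribed by the slides):

import numpy as np

def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def tanh(x): return np.tanh(x)
def relu(x): return np.maximum(0, x)
def leaky_relu(x, a=0.01): return np.where(x > 0, x, a * x)
def elu(x, a=1.0): return np.where(x > 0, x, a * (np.exp(x) - 1))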


Neural networks: Architectures

A “2-layer Neural Net” is also called a “1-hidden-layer Neural Net”; a “3-layer Neural Net” is also called a “2-hidden-layer Neural Net”. Their layers are “fully-connected” layers.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 34 April 08, 2021


Example feed-forward computation of a neural network

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 35 April 08, 2021


Full implementation of training a 2-layer Neural Network needs ~20 lines:

- Define the network
- Forward pass
- Calculate the analytical gradients
- Gradient descent
Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 40 April 08, 2021
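The ~20 lines themselves did not survive extraction; the sketch below is in the same spirit (random data, sigmoid hidden layer, squared-error loss, full-batch gradient descent; the sizes and learning rate are illustrative):

import numpy as np
from numpy.random import randn

# Define the network
N, D_in, H, D_out = 64, 1000, 100, 10
x, y = randn(N, D_in), randn(N, D_out)
w1, w2 = randn(D_in, H), randn(H, D_out)

for t in range(2000):
    # Forward pass
    h = 1.0 / (1.0 + np.exp(-x.dot(w1)))   # hidden layer (sigmoid)
    y_pred = h.dot(w2)
    loss = np.square(y_pred - y).sum()

    # Calculate the analytical gradients (backprop)
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_w1 = x.T.dot(grad_h * h * (1 - h))

    # Gradient descent
    w1 -= 1e-4 * grad_w1
    w2 -= 1e-4 * grad_w2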


Setting the number of layers and their sizes

more neurons = more capacity


Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 4 - 41 13 Jan 2016
Do not use the size of the neural network as a regularizer. Use stronger regularization instead.

(Web demo with ConvNetJS: http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html)

Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 4 - 42 13 Jan 2016
This image by Fotis Bobolas is
licensed under CC-BY 2.0

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 43 April 08, 2021


[Figure: a biological neuron. Dendrites carry impulses toward the cell body; the axon carries impulses away from the cell body to the presynaptic terminal.]
This image by Felipe Perucho is licensed under CC-BY 3.0

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 44 April 08, 2021




[Figure: the same biological neuron (dendrites carrying impulses toward the cell body, the axon carrying impulses away, presynaptic terminal), alongside the mathematical model of a neuron whose weighted, summed inputs pass through a sigmoid activation function.]
This image by Felipe Perucho is licensed under CC-BY 3.0

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 46 April 08, 2021




Biological neurons: complex connectivity patterns. Neurons in a neural network: organized into regular layers for computational efficiency.

This image is CC0 Public Domain

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 48 April 08, 2021


Biological neurons: complex connectivity patterns. But neural networks with random connections can work too!

This image is CC0 Public Domain


Xie et al, “Exploring Randomly Wired Neural Networks for Image Recognition”, arXiv 2019

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 49 April 08, 2021


Be very careful with your brain analogies!
Biological Neurons:
● Many different types
● Dendrites can perform complex non-linear computations
● Synapses are not a single weight but a complex non-linear dynamical
system

[Dendritic Computation. London and Hausser]

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 50 April 08, 2021


Plugging in neural networks with loss functions
Nonlinear score function
SVM Loss on predictions

Regularization

Total loss: data loss + regularization

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 51 April 08, 2021


Problem: How to compute gradients?
Nonlinear score function
SVM Loss on predictions

Regularization

Total loss: data loss + regularization

If we can compute dL/dW1 and dL/dW2, then we can learn W1 and W2

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 52 April 08, 2021


(Bad) Idea: Derive the gradients on paper
Problem: Very tedious: Lots of
matrix calculus, need lots of paper
Problem: What if we want to
change loss? E.g. use softmax
instead of SVM? Need to
re-derive from scratch =(
Problem: Not feasible for very
complex models!

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 53 April 08, 2021


Better Idea: Computational graphs + Backpropagation

[Computational graph: inputs x and W feed a multiply node (*) that produces the scores s; a hinge loss node turns the scores into the data loss, a regularization node computes R(W), and an add node (+) combines them into the total loss L.]

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 54 April 08, 2021


Convolutional network (AlexNet): a much larger computational graph, from the input image and weights through many layers to the loss.

Figure copyright Alex Krizhevsky, Ilya Sutskever, and


Geoffrey Hinton, 2012. Reproduced with permission.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 55 April 08, 2021


Really complex neural networks!!

[Figure: a very large computational graph from the input image to the loss.]

Figure reproduced with permission from a Twitter post by Andrej Karpathy.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 56 April 08, 2021


Neural Turing Machine

Figure reproduced with permission from a Twitter post by Andrej Karpathy.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - April 08, 2021


Solution: Backpropagation

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 58 April 08, 2021


Backpropagation: a simple example

e.g. x = -2, y = 5, z = -4

Want: the gradient of the output with respect to each input.

Work backward from the output node, applying the chain rule at each step:
[downstream gradient] = [Upstream gradient] x [Local gradient]
Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - 74 April 13, 2017
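The expression being differentiated appears only in the slide images; assuming the classic example used with these values, f(x, y, z) = (x + y) * z, the numbers work out as follows (a sketch):

# Forward pass
x, y, z = -2.0, 5.0, -4.0
q = x + y              # q = 3
f = q * z              # f = -12

# Backward pass: chain rule at every node
df_df = 1.0            # base case
df_dz = q * df_df      # mul gate ("swap multiplier"): 3
df_dq = z * df_df      # -4
df_dx = 1.0 * df_dq    # add gate distributes the gradient: -4
df_dy = 1.0 * df_dq    # -4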
At each node f in the graph, the “local gradient” relates the node's output to its inputs; multiplying it by the “Upstream gradient” arriving from the loss gives the “Downstream gradients” passed back to the inputs.
Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 80 April 08, 2021


Another example:

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 81 April 08, 2021


Another example:

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 82 April 08, 2021


Another example:

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 83 April 08, 2021


Another example:

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 84 April 08, 2021


Another example:

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 85 April 08, 2021


Another example:

[upstream gradient] x [local gradient]

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 86 April 08, 2021


Another example:

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 87 April 08, 2021


Another example:

[upstream gradient] x [local gradient]

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 88 April 08, 2021


Another example:

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 89 April 08, 2021


Another example:

[upstream gradient] x [local gradient]

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 90 April 08, 2021


Another example:

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 91 April 08, 2021


Another example:

[upstream gradient] x [local gradient]

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 92 April 08, 2021


Another example:

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 93 April 08, 2021


Another example:

[upstream gradient] x [local gradient]


[0.2] x [1] = 0.2
[0.2] x [1] = 0.2 (both inputs!)

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 94 April 08, 2021


Another example:

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 95 April 08, 2021


Another example:

[upstream gradient] x [local gradient]


w0: [0.2] x [-1] = -0.2
x0: [0.2] x [2] = 0.4

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 96 April 08, 2021


Another example: Sigmoid function

Computational graph representation may not be unique. Choose one where local gradients at each node can be easily expressed!

Sigmoid local gradient: d/dx sigma(x) = (1 - sigma(x)) sigma(x), where sigma(x) = 1/(1+e^-x)

[upstream gradient] x [local gradient]
[1.00] x [(1 - 1/(1+e^-1)) (1/(1+e^-1))] = [1.00] x [(1 - 0.73) (0.73)] = 0.2

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 100 April 08, 2021
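A quick numeric check of that sigmoid shortcut (values as on the slide):

import numpy as np

sig = 1.0 / (1.0 + np.exp(-1.0))    # sigmoid of the value 1.00 entering the gate: ~0.73
local_grad = (1 - sig) * sig        # (1 - 0.73) * 0.73 ~= 0.20
downstream = 1.00 * local_grad      # [upstream] x [local] ~= 0.2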
Patterns in gradient flow

add gate: gradient distributor. E.g., inputs 3 and 4 give output 7; an upstream gradient of 2 is passed unchanged to both inputs.

mul gate: “swap multiplier”. E.g., inputs 2 and 3 give output 6; with an upstream gradient of 5, the downstream gradients are 5*3 = 15 and 2*5 = 10 (each input receives the upstream gradient times the other input).

copy gate: gradient adder. E.g., an input of 7 is copied to two outputs; upstream gradients of 4 and 2 add up to a downstream gradient of 4+2 = 6.

max gate: gradient router. E.g., inputs 4 and 5 give output 5; an upstream gradient of 9 is routed entirely to the larger input, and the other input receives gradient 0.
Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 104 April 08, 2021
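A minimal sketch of these four backward rules as functions (the example call uses the slide's mul-gate numbers):

def add_backward(upstream):
    # gradient distributor: every input gets the upstream gradient unchanged
    return upstream, upstream

def mul_backward(a, b, upstream):
    # "swap multiplier": each input gets the upstream gradient times the other input
    return upstream * b, upstream * a

def copy_backward(upstream1, upstream2):
    # gradient adder: upstream gradients from the copies sum up
    return upstream1 + upstream2

def max_backward(a, b, upstream):
    # gradient router: the whole gradient goes to whichever input was larger
    return (upstream, 0.0) if a > b else (0.0, upstream)

print(mul_backward(2, 3, 5))   # (15, 10), matching 5*3 and 2*5 above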
Backprop Implementation: “Flat” code

Forward pass: compute the output.
Backward pass: compute the grads by walking the forward pass in reverse: the base case (the gradient of the output with respect to itself), then the sigmoid gate, the add gates, and the multiply gates.
Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 111 April 08, 2021
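The staged code itself is only in the slide images; below is a sketch of what such “flat” forward/backward code looks like for a small sigmoid neuron (the input values are partly inferred from the gradients shown earlier and partly assumed, so treat them as illustrative):

import numpy as np

# Forward pass: compute output, staging every intermediate value
w0, x0, w1, x1, w2 = 2.0, -1.0, -3.0, -2.0, -3.0   # assumed example inputs
s0 = w0 * x0
s1 = w1 * x1
s2 = s0 + s1
s3 = s2 + w2
L = 1.0 / (1.0 + np.exp(-s3))      # sigmoid output

# Backward pass: compute grads, walking the forward pass in reverse
grad_L = 1.0                       # base case
grad_s3 = grad_L * (1 - L) * L     # sigmoid gate
grad_w2 = grad_s3                  # add gate distributes
grad_s2 = grad_s3
grad_s0 = grad_s2                  # add gate distributes
grad_s1 = grad_s2
grad_w0 = grad_s0 * x0             # multiply gate swaps inputs
grad_x0 = grad_s0 * w0
grad_w1 = grad_s1 * x1
grad_x1 = grad_s1 * w1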
“Flat” Backprop: Do this for assignment 1!
Stage your forward/backward computation!
E.g. for the SVM: stage intermediate values such as the margins in the forward pass, then backprop through them.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 112 April 08, 2021
“Flat” Backprop: Do this for assignment 1!
E.g. for two-layer neural net:

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 113 April 08, 2021
Backprop Implementation: Modularized API
Graph (or Net) object (rough pseudo code)

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 114 April 08, 2021
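The pseudocode on the slide did not survive extraction; a rough sketch of such a Graph (or Net) object follows (the attribute and method names are illustrative):

class ComputationalGraph:
    def __init__(self):
        self.nodes = []                      # gates/nodes, stored in topological order

    def forward(self):
        for node in self.nodes:              # run each gate's forward pass in order
            node.forward()
        return self.nodes[-1].output         # the loss

    def backward(self):
        for node in reversed(self.nodes):    # visit gates in reverse topological order
            node.backward()                  # chain local gradients with upstream gradients
        return [node.grad for node in self.nodes]   # gradients on inputs/parameters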
Modularized implementation: forward / backward API

Gate / Node / Function object (actual PyTorch code): a multiply gate z = x * y, where x, y, z are scalars.
Forward: need to stash some values for use in backward.
Backward: multiply the upstream and local gradients.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 115 April 08, 2021
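A sketch of such a multiply gate written against that forward/backward API (not the slide's exact code):

class MultiplyGate:
    def forward(self, x, y):
        z = x * y
        self.x, self.y = x, y      # stash values needed for the backward pass
        return z

    def backward(self, dz):        # dz is the upstream gradient dL/dz
        dx = self.y * dz           # [local gradient dz/dx] * [upstream gradient]
        dy = self.x * dz           # [local gradient dz/dy] * [upstream gradient]
        return dx, dy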
Example: PyTorch operators

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 116 April 08, 2021
PyTorch sigmoid layer
Forward

Source

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 117 April 08, 2021
PyTorch sigmoid layer
Forward

Forward actually
defined elsewhere...

Source

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 118 April 08, 2021
PyTorch sigmoid layer
Forward

Forward actually
defined elsewhere...

Backward

Source

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 119 April 08, 2021
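For comparison, a user-level sketch of the same layer written as a custom PyTorch autograd Function (the built-in implementation shown on the slide lives in the library's C/C++ source):

import torch

class Sigmoid(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        out = 1.0 / (1.0 + torch.exp(-x))
        ctx.save_for_backward(out)            # stash the output for the backward pass
        return out

    @staticmethod
    def backward(ctx, grad_output):
        out, = ctx.saved_tensors
        return grad_output * (1 - out) * out  # upstream x local sigmoid gradient

x = torch.randn(5, requires_grad=True)
y = Sigmoid.apply(x)
y.sum().backward()                            # populates x.grad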
Summary for today:
● (Fully-connected) Neural Networks are stacks of linear functions and
nonlinear activation functions; they have much more representational
power than linear classifiers
● backpropagation = recursive application of the chain rule along a
computational graph to compute the gradients of all
inputs/parameters/intermediates
● implementations maintain a graph structure, where the nodes implement
the forward() / backward() API
● forward: compute result of an operation and save any intermediates
needed for gradient computation in memory
● backward: apply the chain rule to compute the gradient of the loss
function with respect to the inputs

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 120 April 08, 2021
So far: backprop with scalars

Next time: vector-valued functions!

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 121 April 08, 2021
Next Time: Convolutional Networks!

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 122 April 08, 2021
Recap: Vector derivatives
Scalar to Scalar

Regular derivative: if x changes by a small amount, how much will y change?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 123 April 08, 2021
Recap: Vector derivatives
Scalar to Scalar. Regular derivative: if x changes by a small amount, how much will y change?

Vector to Scalar. Derivative is the Gradient: for each element of x, if it changes by a small amount, how much will y change?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 124 April 08, 2021
Recap: Vector derivatives
Scalar to Scalar. Regular derivative: if x changes by a small amount, how much will y change?

Vector to Scalar. Derivative is the Gradient: for each element of x, if it changes by a small amount, how much will y change?

Vector to Vector. Derivative is the Jacobian: for each element of x, if it changes by a small amount, how much will each element of y change?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 125 April 08, 2021
Backprop with Vectors

Loss L is still a scalar! A node f takes vector inputs x (dimension Dx) and y (dimension Dy) and produces a vector output z (dimension Dz).

“Upstream gradient” dL/dz (dimension Dz): for each element of z, how much does it influence L?

“Local gradients” dz/dx and dz/dy are Jacobian matrices of shapes [Dx x Dz] and [Dy x Dz].

“Downstream gradients” dL/dx (dimension Dx) and dL/dy (dimension Dy) are obtained by a matrix-vector multiply of each local Jacobian with the upstream gradient.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 132 April 08, 2021
Gradients of variables wrt loss have the same dims as the original variable

Loss L still a scalar! dL/dx has dimension Dx (like x) and dL/dy has dimension Dy (like y); the upstream gradient dL/dz has dimension Dz (like z): for each element of z, how much does it influence L?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 133 April 08, 2021
Backprop with Vectors: f(x) = max(0, x) (elementwise)

4D input x:  [ 1, -2, 3, -1 ]
4D output z: [ 1, 0, 3, 0 ]

Upstream gradient, 4D dL/dz: [ 4, -1, 5, 9 ]

Jacobian dz/dx:
[ 1 0 0 0 ]
[ 0 0 0 0 ]
[ 0 0 1 0 ]
[ 0 0 0 0 ]

4D dL/dx = [dz/dx] [dL/dz] = [ 4, 0, 5, 0 ]

The Jacobian is sparse: off-diagonal entries are always zero! Never explicitly form the Jacobian -- instead use implicit multiplication (the upstream gradient passes through only where the input is positive).

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 140 April 08, 2021
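In code, the implicit multiplication for this ReLU example is just an elementwise mask (a NumPy sketch with the slide's numbers):

import numpy as np

x = np.array([1.0, -2.0, 3.0, -1.0])
z = np.maximum(0, x)                      # forward: [1, 0, 3, 0]

dL_dz = np.array([4.0, -1.0, 5.0, 9.0])   # upstream gradient
dL_dx = dL_dz * (x > 0)                   # implicit Jacobian multiply: [4, 0, 5, 0]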
Backprop with Matrices (or Tensors)

Loss L still a scalar! Now x has shape [Dx×Mx], y has shape [Dy×My], and the output z has shape [Dz×Mz]. dL/dx always has the same shape as x!

“Upstream gradient” dL/dz has shape [Dz×Mz]: for each element of z, how much does it influence L?

“Local gradients” dz/dx and dz/dy are (generalized) Jacobian matrices of shapes [(Dx×Mx)×(Dz×Mz)] and [(Dy×My)×(Dz×Mz)]: for each element of y, how much does it influence each element of z?

“Downstream gradients” dL/dx [Dx×Mx] and dL/dy [Dy×My] come from a matrix-vector multiply of each local Jacobian with the upstream gradient.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 144 April 08, 2021
Backprop with Matrices: Matrix Multiply

x: [N×D]                w: [D×M]
[  2  1 -3 ]            [ 3 2 1 -1 ]
[ -3  4  2 ]            [ 2 1 3  2 ]
                        [ 3 2 1 -2 ]

y: [N×M]                dL/dy: [N×M]
[ 13  9 -2 -6 ]         [  2  3 -3  9 ]
[  5  2 17  1 ]         [ -8  1  4  6 ]

Jacobians: dy/dx has shape [(N×D)×(N×M)] and dy/dw has shape [(D×M)×(N×M)].
For a neural net we may have N=64, D=M=4096; each Jacobian takes 256 GB of memory! Must work with them implicitly!

Q: What parts of y are affected by one element of x?
A: One element of x affects the whole corresponding row of y.

Q: How much does one element of x affect one element of y?
A: By the corresponding element of w (since y[n,m] is the sum over d of x[n,d] * w[d,m]).

Putting these together:
dL/dx = (dL/dy) w^T        [N×D] = [N×M] [M×D]
By similar logic:
dL/dw = x^T (dL/dy)        [D×M] = [D×N] [N×M]

These formulas are easy to remember: they are the only way to make shapes match up!

Also see the derivation in the course notes:
http://cs231n.stanford.edu/handouts/linear-backprop.pdf

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 152 April 08, 2021
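A NumPy sketch of those two formulas; the shape annotations in the comments show why this is the only way everything lines up (the sizes here are illustrative):

import numpy as np

N, D, M = 2, 3, 4
x = np.random.randn(N, D)
w = np.random.randn(D, M)

y = x.dot(w)                     # forward: [N x M]
dL_dy = np.random.randn(N, M)    # upstream gradient: [N x M]

dL_dx = dL_dy.dot(w.T)           # [N x M] [M x D] -> [N x D], same shape as x
dL_dw = x.T.dot(dL_dy)           # [D x N] [N x M] -> [D x M], same shape as w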
A vectorized example:

Always check: The gradient with respect to a variable should have the same shape as the variable.

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - 168 April 08, 2021
In discussion section: A matrix example...

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - 169 April 13, 2017
