
Lecture 2:

Image Classification with Linear Classifiers



Administrative: Assignment 1
Out tomorrow, due 4/19 at 11:59pm. It covers:

- K-Nearest Neighbor
- Linear classifiers: SVM, Softmax
- Two-layer neural network
- Image features



Administrative: Course Project

Project proposal due 4/22 (Monday) 11:59pm.

Contact your assigned TA for initial guidance (Canvas -> People -> Groups); each project team will have a TA assigned to them for future questions. Reach us on Ed.

Use the Google Form to find project partners (will be posted later today).

"Is X a valid project for 231n?" --- ask via an Ed private post or TA office hours.

More info on the website.


Administrative: Discussion Sections

This Friday, 12:30pm-1:20pm, in person at NVIDIA Auditorium and remote on Zoom (recording will be made available).

Topic: Python / NumPy, Google Colab
Presenter: Chengshu (Eric) Li (TA)


Syllabus

Deep Learning Basics:
- Data-driven approaches
- Linear classification & kNN
- Loss functions
- Optimization
- Backpropagation
- Multi-layer perceptrons
- Neural Networks

Convolutional Neural Networks:
- Convolutions
- PyTorch / TensorFlow
- Activation functions
- Batch normalization
- Transfer learning
- Data augmentation
- Momentum / RMSProp / Adam
- Architecture design

Computer Vision Applications:
- RNNs / Attention / Transformers
- Image captioning
- Object detection and segmentation
- Style transfer
- Video understanding
- Generative models
- Self-supervised learning
- Vision and Language
- 3D vision
- Robot learning
- Human-centered AI
- Fairness & ethics


Image Classification
A Core Task in Computer Vision

Today:
● The image classification task
● Two basic data-driven approaches to image classification
○ K-nearest neighbor and linear classifier



Image Classification: A core task in Computer Vision

(assume given a set of possible labels)


{dog, cat, truck, plane, ...}

cat



The Problem: Semantic Gap

What the computer sees: an image is just a tensor of integers in [0, 255], e.g. 800 x 600 x 3 (3 RGB channels).


Challenges: Viewpoint variation

All pixels change when the camera moves!


Challenges: Illumination



Challenges: Background Clutter



Challenges: Occlusion



Challenges: Deformation



Challenges: Intraclass variation



Challenges: Context

Image source: https://www.linkedin.com/posts/ralph-aboujaoude-diaz-40838313_technology-artificialintelligence-computervision-activity-6912446088364875776-h-Iq?utm_source=linkedin_share&utm_medium=member_desktop_web


Modern computer vision algorithms



An image classifier

Unlike, e.g., sorting a list of numbers, there is no obvious way to hard-code an algorithm for recognizing a cat or any other class.
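To make the gap concrete, here is the kind of function we would have to write; a minimal sketch, with the body deliberately left as the open problem (the function name and shapes are illustrative, not from the slides):

```python
def classify_image(image):
    """Return a class label for `image`, e.g. a (32, 32, 3) array of pixels.

    There is no obvious sequence of rules to put here: no fixed test on
    raw pixel values reliably recognizes a cat across poses and lighting.
    """
    raise NotImplementedError("no hard-coded rule works reliably")
```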


Attempts have been made

Find edges -> find corners -> ... ?

John Canny, "A Computational Approach to Edge Detection", IEEE TPAMI 1986


Machine Learning: Data-Driven Approach
1. Collect a dataset of images and labels
2. Use Machine Learning algorithms to train a classifier
3. Evaluate the classifier on new images
Example training set

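This changes the API: instead of one classify function we write two. A minimal sketch of that interface (the function names and the memorizing "model" are illustrative assumptions):

```python
def train(images, labels):
    """Machine learning: build a model from training images and labels."""
    model = {"images": images, "labels": labels}  # e.g. just memorize everything
    return model

def predict(model, test_images):
    """Use the model to predict labels for previously unseen images."""
    # A trivial placeholder rule; a real classifier goes here.
    return [model["labels"][0] for _ in test_images]
```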


Nearest Neighbor Classifier



First classifier: Nearest Neighbor

Train: memorize all data and labels.
Predict: output the label of the most similar training image.


First classifier: Nearest Neighbor
Given training data with labels (deer, bird, plane, cat, car) and a query image, use a distance metric to find the most similar training image and predict its label.


Distance Metric to compare images

L1 distance: d_1(I_1, I_2) = Σ_p |I_1^p - I_2^p|

(Compare images pixel by pixel: take the absolute differences and add them all up.)
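As a sketch, assuming two same-shape images loaded as NumPy uint8 arrays:

```python
import numpy as np

def l1_distance(img1, img2):
    """Sum of absolute pixel-wise differences between two images."""
    # Cast to a signed type first so uint8 subtraction cannot wrap around.
    return np.sum(np.abs(img1.astype(np.int32) - img2.astype(np.int32)))
```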


Nearest Neighbor classifier



Nearest Neighbor classifier

Memorize training data



Nearest Neighbor classifier

For each test image:
- find the closest training image
- predict the label of that nearest image
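The original slides walk through this classifier in NumPy; a minimal re-creation, assuming images are flattened into integer rows of an N x D array and labels are a NumPy array (names are illustrative):

```python
import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        """X: N x D array of training images, y: length-N label array. Just memorize."""
        self.Xtr = X.astype(np.int32)
        self.ytr = y

    def predict(self, X):
        """For each test row, return the label of the closest training row (L1)."""
        X = X.astype(np.int32)
        y_pred = np.empty(X.shape[0], dtype=self.ytr.dtype)
        for i in range(X.shape[0]):
            distances = np.sum(np.abs(self.Xtr - X[i]), axis=1)
            y_pred[i] = self.ytr[np.argmin(distances)]
        return y_pred
```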


Nearest Neighbor classifier

Q: With N examples, how fast are training and prediction?

A: Training is O(1), prediction is O(N).

This is bad: we want classifiers that are fast at prediction; slow training is OK.


Nearest Neighbor classifier

Many methods exist for fast / approximate nearest neighbor search (beyond the scope of 231N!).

A good implementation: https://github.com/facebookresearch/faiss

Johnson et al., "Billion-scale similarity search with GPUs", arXiv 2017


What does this look like?

1-nearest neighbor
K-Nearest Neighbors
Instead of copying label from nearest neighbor,
take majority vote from K closest points

(Decision regions shown for K = 1, K = 3, and K = 5.)
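A sketch of the change to prediction, assuming non-negative integer class labels and the same flattened-row convention as above:

```python
import numpy as np

def knn_predict(Xtr, ytr, x, k=3):
    """Majority vote over the k closest training examples under L1 distance."""
    distances = np.sum(np.abs(Xtr.astype(np.int32) - x.astype(np.int32)), axis=1)
    nearest = np.argsort(distances)[:k]   # indices of the k closest training rows
    votes = np.bincount(ytr[nearest])     # count each label among the neighbors
    return np.argmax(votes)               # the most common label wins
```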


K-Nearest Neighbors: Distance Metric

L1 (Manhattan) distance: d_1(I_1, I_2) = Σ_p |I_1^p - I_2^p|
L2 (Euclidean) distance: d_2(I_1, I_2) = sqrt( Σ_p (I_1^p - I_2^p)^2 )


K-Nearest Neighbors: Distance Metric

L1 (Manhattan) distance vs. L2 (Euclidean) distance

(Figure: K = 1 decision regions under each metric.)


K-Nearest Neighbors: try it yourself!

http://vision.stanford.edu/teaching/cs231n-demos/knn/



Hyperparameters

What is the best value of k to use?
What is the best distance metric to use?

These are hyperparameters: choices about the algorithm itself that we set rather than learn from the data.

They are very problem/dataset-dependent: you must try them all out and see what works best.



Setting Hyperparameters

Idea #1: Choose hyperparameters that work best on the training data.
BAD: K = 1 always works perfectly on training data.

Idea #2: Choose hyperparameters that work best on the test data.
BAD: no idea how the algorithm will perform on new data. Never do this!

Idea #3: Split data into train and val; choose hyperparameters on the val set and evaluate on the test set.
Better! (A sketch of this loop follows below.)
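A sketch of Idea #3 for choosing k, assuming pre-split arrays and the knn_predict sketch above (the helper name and candidate list are illustrative):

```python
import numpy as np

def choose_k(Xtrain, ytrain, Xval, yval, candidates=(1, 3, 5, 7, 9)):
    """Return the k with the best accuracy on the held-out validation set."""
    best_k, best_acc = None, -1.0
    for k in candidates:
        preds = np.array([knn_predict(Xtrain, ytrain, x, k=k) for x in Xval])
        acc = np.mean(preds == yval)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k  # then touch the test set exactly once, using this k
```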


Setting Hyperparameters
Idea #4: Cross-Validation: Split data into folds, try each fold as the validation set, and average the results.

(Diagram: five folds plus a held-out test set; each of the five runs uses a different fold for validation.)

Useful for small datasets, but not used too frequently in deep learning.
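A sketch of this procedure for one value of k, under the same assumptions as the validation example above:

```python
import numpy as np

def cross_validate_k(X, y, k, num_folds=5):
    """Average k-NN validation accuracy over num_folds train/val splits."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    accs = []
    for i in range(num_folds):
        Xval, yval = X_folds[i], y_folds[i]                  # fold i validates
        Xtr = np.concatenate(X_folds[:i] + X_folds[i + 1:])  # the rest trains
        ytr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
        preds = np.array([knn_predict(Xtr, ytr, x, k=k) for x in Xval])
        accs.append(np.mean(preds == yval))
    return np.mean(accs)
```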


Example Dataset: CIFAR10
10 classes
50,000 training images
10,000 testing images

(Figure: test images alongside their nearest neighbors in the training set.)

Alex Krizhevsky, "Learning Multiple Layers of Features from Tiny Images", Technical Report, 2009.


Setting Hyperparameters

Example of 5-fold cross-validation for the value of k. Each point is a single outcome; the line goes through the mean, and the bars indicate the standard deviation.

(It seems that k ≈ 7 works best for this data.)




k-Nearest Neighbor with pixel distance is never used.

- Distance metrics on raw pixels are not informative.

(Example figure: an original image next to occluded, one-pixel-shifted, and tinted versions; all three modified images have the same pixel distance to the original.)


K-Nearest Neighbors: Summary
In image classification we start with a training set of images and labels, and must predict labels on the test set.

The K-Nearest Neighbors classifier predicts labels based on the K nearest training examples.

Distance metric and K are hyperparameters.

Choose hyperparameters using the validation set; only run on the test set once, at the very end!


Linear Classifier



Parametric Approach

Image x: array of 32x32x3 numbers (3072 numbers total)
    x --> f(x, W) --> 10 numbers giving class scores
W: parameters or weights




Parametric Approach: Linear Classifier

f(x, W) = Wx + b

x: 3072 x 1 flattened image (32x32x3 numbers)
W: 10 x 3072 weights
b: 10 x 1 bias
f(x, W): 10 x 1 class scores
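A minimal sketch of this score function with the shapes from the slide (the random initialization is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(3072, 1)).astype(np.float64)  # flattened 32x32x3 image
W = rng.standard_normal((10, 3072)) * 0.0001                 # 10 x 3072 weights
b = np.zeros((10, 1))                                        # 10 x 1 biases

scores = W @ x + b   # 10 x 1: one score per class
print(scores.ravel())
```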


Neural Network

(Figure: linear classifiers as the building blocks of a neural network.)


Linear layers appear inside modern architectures: [Krizhevsky et al. 2012], [He et al. 2015]


Recall CIFAR10

50,000 training images, each 32x32x3
10,000 test images




Example with an image with 4 pixels, and 3 classes (cat/dog/ship)
Algebraic Viewpoint

Flatten the input image into a vector: x = [56, 231, 24, 2].

f(x, W) = Wx + b:

| 0.2  -0.5   0.1   2.0 |   |  56 |   |  1.1 |   | -96.8 |  cat score
| 1.5   1.3   2.1   0.0 | . | 231 | + |  3.2 | = | 437.9 |  dog score
| 0.0   0.25  0.2  -0.3 |   |  24 |   | -1.2 |   | 61.95 |  ship score
         W (3x4)            |   2 |    b (3x1)    scores
                              x
Interpreting a Linear Classifier



Interpreting a Linear Classifier: Visual Viewpoint

(Figure: each row of W, reshaped back to 32x32x3, acts as one learned "template" image per class.)


Interpreting a Linear Classifier: Geometric Viewpoint

f(x, W) = Wx + b

Each image is a point in the 3072-dimensional pixel space (an array of 32x32x3 numbers); each class score is a linear function over this space.


Hard cases for a linear classifier
Case 1. Class 1: first and third quadrants. Class 2: second and fourth quadrants.
Case 2. Class 1: 1 <= L2 norm <= 2. Class 2: everything else.
Case 3. Class 1: three modes. Class 2: everything else.


Linear Classifier – Choose a good W

1. Define a loss function that quantifies our unhappiness with the scores across the training data.
2. Come up with a way of efficiently finding the parameters that minimize the loss function (optimization).


Suppose: 3 training examples, 3 classes. With some W the scores f(x, W) are:

        cat image   car image   frog image
cat        3.2         1.3         2.2
car        5.1         4.9         2.5
frog      -1.7         2.0        -3.1


A loss function tells how good our current classifier is.

Given a dataset of examples {(x_i, y_i)}_{i=1}^N, where x_i is an image and y_i is an (integer) label, the loss over the dataset is the average of the loss over the examples:

L = (1/N) Σ_i L_i(f(x_i, W), y_i)


Multiclass SVM loss:

Given an example (x_i, y_i), where x_i is the image and y_i is the (integer) label, and using the shorthand s = f(x_i, W) for the scores vector, the SVM loss has the form:

L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1)

Interpreting it: plotted against the difference in scores between the correct and an incorrect class, the loss is a "hinge": it decreases linearly until the correct-class score exceeds the incorrect score by the margin of 1, and is zero after that.


Applying it to the scores above:

cat example:  L = max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1)
                = max(0, 2.9) + max(0, -3.9)
                = 2.9 + 0 = 2.9

car example:  L = max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1)
                = max(0, -2.6) + max(0, -1.9)
                = 0 + 0 = 0

frog example: L = max(0, 2.2 - (-3.1) + 1) + max(0, 2.5 - (-3.1) + 1)
                = max(0, 6.3) + max(0, 6.6)
                = 6.3 + 6.6 = 12.9

Losses: 2.9, 0, 12.9


Loss over the full dataset is the average:

L = (2.9 + 0 + 12.9) / 3 = 5.27
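A quick sketch that reproduces these numbers, with the scores arranged one column per training example (variable names are mine):

```python
import numpy as np

scores = np.array([[ 3.2, 1.3,  2.2],   # cat scores for the three examples
                   [ 5.1, 4.9,  2.5],   # car scores
                   [-1.7, 2.0, -3.1]])  # frog scores
y = np.array([0, 1, 2])                 # correct class for each column

losses = []
for i in range(scores.shape[1]):
    s = scores[:, i]
    margins = np.maximum(0, s - s[y[i]] + 1)
    margins[y[i]] = 0                   # skip the j = y_i term
    losses.append(margins.sum())

print(losses)           # [2.9, 0.0, 12.9]
print(np.mean(losses))  # 5.266..., i.e. ~5.27
```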
Multiclass SVM loss: questions (consider the car example above, with scores cat 1.3, car 4.9, frog 2.0 and loss 0):

Q1: What happens to the loss if the car scores decrease by 0.5 for this training example?

Q2: What are the min/max possible values of the SVM loss L_i?

Q3: At initialization W is small, so all s ≈ 0. What is the loss L_i, assuming N examples and C classes?


Q4: What if the sum was over all classes (including j = y_i)?


Q5: What if we used mean instead of sum?
Q6: What if we used a different formulation, e.g. the squared hinge max(0, s_j - s_{y_i} + 1)^2?
Multiclass SVM Loss: Example code

import numpy as np

def L_i_vectorized(x, y, W):
    # First calculate scores
    scores = W.dot(x)
    # Then calculate the margins s_j - s_{y_i} + 1
    margins = np.maximum(0, scores - scores[y] + 1)
    # only sum j != y_i, so when j = y_i, set to zero
    margins[y] = 0
    # sum across all j
    loss_i = np.sum(margins)
    return loss_i
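For instance, a contrived input whose scores reproduce the cat column from the worked example (x and W here are made up just so that W.dot(x) yields those scores):

```python
import numpy as np

x = np.array([1.0])
W = np.array([[3.2], [5.1], [-1.7]])  # W.dot(x) == [3.2, 5.1, -1.7]
print(L_i_vectorized(x, y=0, W=W))    # 2.9, matching the worked example above
```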


Softmax classifier



Softmax Classifier (Multinomial Logistic Regression)

Want to interpret raw classifier scores as probabilities.

Softmax function: with the scores s = f(x_i, W),

P(Y = k | X = x_i) = e^{s_k} / Σ_j e^{s_j}

Probabilities must be >= 0: exponentiate the scores.
Probabilities must sum to 1: normalize by the total.

cat    3.2   --exp-->   24.5   --normalize-->  0.13
car    5.1             164.0                   0.87
frog  -1.7               0.18                  0.00

(unnormalized log-probabilities / logits  ->  unnormalized probabilities  ->  probabilities)

To maximize the probability of the correct class, minimize its negative log probability. Putting it all together:

L_i = -log P(Y = y_i | X = x_i) = -log( e^{s_{y_i}} / Σ_j e^{s_j} )

Here: L_i = -log(0.13) = 2.04

Equivalently: comparing the predicted probabilities [0.13, 0.87, 0.00] against the correct probabilities [1.00, 0.00, 0.00] with the Kullback–Leibler divergence gives the cross-entropy loss.
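A compact sketch of the whole pipeline on these scores (subtracting the max before exponentiating is a standard numerical-stability trick, not something shown on the slide):

```python
import numpy as np

scores = np.array([3.2, 5.1, -1.7])   # cat, car, frog

shifted = scores - np.max(scores)     # stability: avoids overflow in exp
unnormalized = np.exp(shifted)
probs = unnormalized / unnormalized.sum()
print(probs.round(2))                 # [0.13 0.87 0.  ]

loss = -np.log(probs[0])              # correct class is cat
print(round(loss, 2))                 # 2.04
```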


Softmax Classifier (Multinomial Logistic Regression)

Q1: What are the min/max possible values of the softmax loss L_i?

Q2: At initialization all s_j will be approximately equal; what is the softmax loss L_i, assuming C classes?


Softmax vs. SVM

softmax: L_i = -log( e^{s_{y_i}} / Σ_j e^{s_j} )    vs.    SVM: L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1)


Softmax vs. SVM

Assume three examples, each with the correct class first, and scores:

[10, -2, 3]
[10, 9, 9]
[10, -100, -100]

Q: What is the softmax loss and the SVM loss for each?
Softmax vs. SVM

Q: What is the softmax loss and the SVM loss if I double the correct class score from 10 -> 20?

[20, -2, 3]
[20, 9, 9]
[20, -100, -100]
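A sketch that answers both questions numerically, assuming the correct class is the first entry of each score vector:

```python
import numpy as np

def svm_loss(s, y=0):
    margins = np.maximum(0, s - s[y] + 1)
    margins[y] = 0
    return margins.sum()

def softmax_loss(s, y=0):
    s = s - np.max(s)                 # numerical stability
    return -np.log(np.exp(s[y]) / np.exp(s).sum())

for s in ([10, -2, 3], [10, 9, 9], [10, -100, -100],
          [20, -2, 3], [20, 9, 9], [20, -100, -100]):
    s = np.array(s, dtype=np.float64)
    print(s, " SVM:", svm_loss(s), " softmax:", round(softmax_loss(s), 4))
```

The SVM loss is zero in every case and stays zero when the correct score doubles, while the softmax loss is always positive and keeps shrinking: softmax always wants the correct-class probability pushed toward 1.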
Coming up: f(x,W) = Wx + b

- Regularization
- Optimization

