Lecture 2 PDF
Image Classification
A Core Task in Computer Vision
- K-Nearest Neighbor
- Linear classifiers: SVM, Softmax
- Two-layer neural network
- Image features
Find your teammates on Piazza (the pinned “Search for Teammates” post)
“Is X a valid project for 231n?”: ask in a private Piazza post or at TA office hours
Today:
● The image classification task
● Two basic data-driven approaches to image classification
○ K-nearest neighbor and linear classifier
cat
Images: CC0 1.0 public domain; images by Umberto Salvagnin, sare bear, and Tom Thai are licensed under CC-BY 2.0.
John Canny, “A Computational Approach to Edge Detection”, IEEE TPAMI 1986
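Nearest neighbor classifier:
- Train: memorize all data and labels
- Predict: return the label of the training image most similar to the query data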
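Distance metric to compare images
L1 distance: $d_1(I_1, I_2) = \sum_p \lvert I_1^p - I_2^p \rvert$, i.e. take the absolute difference at each pixel $p$, then add them all up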
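A minimal NumPy sketch of the nearest neighbor classifier with the L1 distance. The class and method names (`NearestNeighbor`, `train`, `predict`) are illustrative assumptions, and images are assumed to be flattened into the rows of a 2-D array.

```python
import numpy as np


class NearestNeighbor:
    """1-nearest-neighbor classifier using the L1 (Manhattan) distance."""

    def train(self, X, y):
        # X: (N, D) array, one flattened image per row; y: (N,) labels.
        # "Training" only memorizes all data and labels: O(1) work.
        # Cast to float so that uint8 pixel differences cannot wrap around.
        self.X_train = np.asarray(X, dtype=np.float64)
        self.y_train = np.asarray(y)

    def predict(self, X):
        # X: (M, D) array of query images; returns (M,) predicted labels.
        X = np.asarray(X, dtype=np.float64)
        y_pred = np.empty(X.shape[0], dtype=self.y_train.dtype)
        for i in range(X.shape[0]):
            # L1 distance to every training example: sum of absolute
            # pixel-wise differences, so each prediction costs O(N).
            distances = np.sum(np.abs(self.X_train - X[i]), axis=1)
            # Copy the label of the closest training example.
            y_pred[i] = self.y_train[np.argmin(distances)]
        return y_pred
```

Usage: after `nn = NearestNeighbor()` and `nn.train(X_train, y_train)`, call `nn.predict(X_test)` to label each test row.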
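Q: With N examples, how fast are training and prediction?
A: Training is O(1) and prediction is O(N). This is backwards: we want classifiers that are fast at prediction, while slow training is acceptable.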
A good implementation of fast nearest neighbor search:
https://github.com/facebookresearch/faiss
1-nearest neighbor
Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 2 - 27 April 1, 2021
K-Nearest Neighbors
Instead of copying the label from the nearest neighbor, take a majority vote from the K closest points.
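The majority vote can be sketched in a few lines of NumPy, assuming the L1 distance from earlier. The function name `knn_predict` and the tie-breaking rule (the smallest label wins) are illustrative choices, not something the lecture specifies.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Predict a label for each row of X_query by majority vote among
    the k nearest training points under the L1 distance."""
    X_train = np.asarray(X_train, dtype=np.float64)
    X_query = np.asarray(X_query, dtype=np.float64)
    y_train = np.asarray(y_train)
    preds = []
    for q in X_query:
        # L1 distance from this query to every training point.
        distances = np.sum(np.abs(X_train - q), axis=1)
        # Indices of the k closest training points.
        nearest = np.argsort(distances, kind="stable")[:k]
        # Majority vote. np.unique returns labels in sorted order, so
        # argmax breaks ties in favour of the smallest label.
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)
```

With K=1 this reduces to the nearest neighbor classifier; larger K smooths the decision boundaries, so a single outlier no longer decides the label of the points around it.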
[Figure: two decision-region panels, both with K=1]
Interactive demo: http://vision.stanford.edu/teaching/cs231n-demos/knn/
Hyperparameters such as K and the distance metric are very problem/dataset-dependent.
You must try them all out and see what works best.
[Figure: ways of splitting the data, each with a portion labeled "train"]
Never do this!
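Cross-validation (split the data into folds, try each fold as the validation set, and average the results): useful for small datasets, but not used too frequently in deep learning.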
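One common way to choose a hyperparameter such as K is k-fold cross-validation. The sketch below is a hedged illustration: the helper names (`knn_predict`, `cross_validate_k`), the default of 5 folds, and the assumption that the data has already been shuffled are all choices made here, not taken from the lecture.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k):
    # Majority vote among the k nearest neighbors under the L1 distance.
    preds = []
    for q in X_query:
        distances = np.sum(np.abs(X_train - q), axis=1)
        nearest = np.argsort(distances, kind="stable")[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)


def cross_validate_k(X, y, k_choices, num_folds=5):
    """Return {K: mean validation accuracy} using num_folds-fold
    cross-validation. Assumes (X, y) has already been shuffled."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    results = {}
    for k in k_choices:
        accuracies = []
        for i in range(num_folds):
            # Fold i is the validation set; the other folds are training data.
            X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
            y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
            y_pred = knn_predict(X_tr, y_tr, X_folds[i], k)
            accuracies.append(np.mean(y_pred == y_folds[i]))
        # Average the results over all folds.
        results[k] = float(np.mean(accuracies))
    return results
```

The best K is then `max(results, key=results.get)`; the test set is touched only once, after K has been fixed. Each K costs `num_folds` full rounds of prediction, which is why this is practical for small datasets but rarely for deep learning.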
Alex Krizhevsky, “Learning Multiple Layers of Features from Tiny Images”, Technical Report, 2009.
Original image is CC0 public domain
[Figure: Dimensions = 1, Points = 4; Dimensions = 2, Points = $4^2$]
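The figure's point counts follow a simple rule: keeping 4 points along every axis needs $4^d$ points in $d$ dimensions. The small sketch below (the function name `points_for_grid` is hypothetical) makes that exponential growth concrete, including for the 3072 dimensions of a 32x32x3 image.

```python
def points_for_grid(points_per_axis, dimensions):
    # Covering every axis with the same number of evenly spaced points
    # needs points_per_axis ** dimensions points in total.
    return points_per_axis ** dimensions


for d in (1, 2, 3):
    print(d, points_for_grid(4, d))  # prints 1 4, then 2 16, then 3 64

# A 32x32x3 image is a point in 3072 dimensions.
print(len(str(points_for_grid(4, 3072))))  # prints 1850 (decimal digits)
```

A count with 1850 digits is far beyond any real dataset, so training images can never cover pixel space densely; this is one reason nearest neighbor on raw pixels works poorly.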
Image: array of 32x32x3 numbers (3072 numbers total), stretched into a column vector x (3072x1)
W: parameters or weights (10x3072)
b: bias (10x1)
f(x,W) = Wx + b: 10 numbers giving class scores, with shapes 10x1 = (10x3072)(3072x1) + 10x1
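A minimal NumPy sketch of the score function $f(x, W) = Wx + b$ with the CIFAR-10 shapes above. The random `W`, zero `b`, and random image are placeholders (assumptions for illustration); in practice W and b are learned from the training data.

```python
import numpy as np

num_classes, D = 10, 32 * 32 * 3  # 10 classes, 3072 input numbers

rng = np.random.default_rng(0)
# Placeholder parameters: real W and b would be learned.
W = 0.01 * rng.standard_normal((num_classes, D))  # weights, 10x3072
b = np.zeros(num_classes)                         # bias, 10 entries


def f(x, W, b):
    """Linear score function f(x, W) = Wx + b."""
    return W @ x + b


# Stand-in for a 32x32x3 image, flattened into 3072 numbers.
image = rng.integers(0, 256, size=(32, 32, 3))
x = image.reshape(-1).astype(np.float64)

scores = f(x, W, b)
print(scores.shape)             # (10,)
print(int(np.argmax(scores)))   # index of the highest-scoring class
```

Unlike nearest neighbor, prediction needs only W and b, not the training set: one 10x3072 matrix-vector product per image.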
Linear classifiers
[Figure: generated image captions, "Man in black shirt is playing guitar." and "Construction worker in orange safety vest is working on road."]
Karpathy and Fei-Fei, “Deep Visual-Semantic Alignments for Generating Image Descriptions”, CVPR 2015
Figures copyright IEEE, 2015. Reproduced for educational purposes.
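Example: input image with 4 pixels (a 2x2 image with pixel values 56, 231, 24, 2), stretched into a column x; rows shown for the cat and dog classes:

$$
\underbrace{\begin{bmatrix} 0.2 & -0.5 & 0.1 & 2.0 \\ 1.5 & 1.3 & 2.1 & 0.0 \end{bmatrix}}_{W}
\underbrace{\begin{bmatrix} 56 \\ 231 \\ 24 \\ 2 \end{bmatrix}}_{x}
+
\underbrace{\begin{bmatrix} 1.1 \\ 3.2 \end{bmatrix}}_{b}
=
\begin{bmatrix} -96.8 \\ 437.9 \end{bmatrix}
\begin{matrix} \leftarrow \text{Cat score} \\ \leftarrow \text{Dog score} \end{matrix}
$$

f(x,W) = Wx + b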
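The arithmetic in the example above can be checked directly. This NumPy sketch uses only the numbers from the slide (cat and dog rows); the `class_names` list is an illustrative addition.

```python
import numpy as np

# Weights and biases from the worked example (rows: cat, dog).
W = np.array([[0.2, -0.5, 0.1, 2.0],
              [1.5,  1.3, 2.1, 0.0]])
b = np.array([1.1, 3.2])

# The 2x2 input image, stretched (row by row) into a column of 4 pixels.
image = np.array([[56, 231],
                  [24, 2]], dtype=np.float64)
x = image.reshape(-1)  # [56, 231, 24, 2]

scores = W @ x + b     # cat: 0.2*56 - 0.5*231 + 0.1*24 + 2.0*2 + 1.1
class_names = ["cat", "dog"]
print(scores)                               # approximately [-96.8, 437.9]
print(class_names[int(np.argmax(scores))])  # dog
```

Each class score is the dot product of one row of W with the pixels, plus that class's bias. With these weights the dog class wins by a wide margin, so this particular W is not yet a good classifier for a cat image.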
Plot created using Wolfram Cloud. Cat image by Nikita is licensed under CC-BY 2.0.