Introduction to Object Recognition
Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others
Overview
• Basic recognition tasks
• A statistical learning approach
• Traditional or “shallow” recognition pipeline
• Bags of features
• Classifiers
• Next time: neural networks and “deep” recognition pipeline
Common recognition tasks
Image classification
• outdoor/indoor
• city/forest/factory/etc.
Image tagging
• street
• people
• building
• mountain
• …
Object detection
• find pedestrians
Activity recognition
• walking
• shopping
• rolling a cart
• sitting
• talking
• …
Image parsing
[Figure: image regions labeled sky, mountain, building, tree, banner, street lamp, market, people]
Image description
This is a busy street in an Asian city.
Mountains and a large palace or
fortress loom in the background. In the
foreground, we see colorful souvenir
stalls and people walking around and
shopping. One person in the lower left
is pushing an empty cart, and a couple
of people in the middle are sitting,
possibly posing for a photograph.
Image classification
The statistical learning
framework
• Apply a prediction function to a feature representation of
the image to get the desired output:
f( [image of an apple] ) = “apple”
f( [image of a tomato] ) = “tomato”
f( [image of a cow] ) = “cow”
The statistical learning framework
y = f(x)
• y: output
• f: prediction function
• x: image feature
Testing: test image → features → learned model → prediction
Slide credit: D. Hoiem
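A minimal sketch of this y = f(x) pattern with a linear prediction function; the feature extraction is assumed to happen beforehand, and the class list and parameter names (CLASSES, W, b) are illustrative, not from the slides:

```python
import numpy as np

# Hypothetical label set, echoing the examples above
CLASSES = ["apple", "tomato", "cow"]

def predict(x, W, b):
    """y = f(x): score each class with a learned linear model and
    return the highest-scoring label."""
    scores = W @ x + b          # W: (n_classes, n_features), b: (n_classes,)
    return CLASSES[int(np.argmax(scores))]

# At training time W and b are fit to labeled (feature, label) pairs;
# at test time we extract features from the image and apply f.
```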
Traditional recognition pipeline
Image pixels → hand-designed feature extraction → trainable classifier → object class
Clustering
• Cluster the descriptors extracted from the training set
[Figure: descriptor space partitioned into clusters, e.g. cluster k]
Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
• Assign each feature to the nearest center
• Recompute each cluster center as the mean of all features
assigned to it
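A compact NumPy sketch of the clustering step above (this is the standard K-means procedure; the iteration count, seed, and array names are illustrative assumptions):

```python
import numpy as np

def kmeans(features, k, n_iters=50, seed=0):
    """Cluster feature vectors (n_samples x dim) into k centers."""
    rng = np.random.default_rng(seed)
    # Randomly initialize K cluster centers by sampling data points
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each feature to the nearest center
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of the features assigned to it
        new_centers = np.array([features[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):   # converged
            break
        centers = new_centers
    return centers, labels
```

The resulting centers serve as the “visual words” of the vocabulary used in the following slides.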
Example visual vocabulary
[Figure: appearance codebook of example cluster-center patches]
Source: B. Leibe
Bag-of-features steps
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”
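A sketch of steps 3–4, assuming local descriptors have already been extracted (step 1) and a vocabulary of cluster centers learned (step 2); the function and argument names are illustrative:

```python
import numpy as np

def bag_of_words_histogram(descriptors, vocabulary):
    """Quantize local descriptors against the visual vocabulary and
    return a normalized histogram of visual-word frequencies."""
    # Step 3: assign each descriptor to its nearest visual word
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    # Step 4: count word occurrences and normalize to frequencies
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)
```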
Bags of features: Motivation
• Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)
Image pixels → hand-designed feature extraction → trainable classifier → object class
Classifiers: Nearest neighbor
[Figure: a test example among training examples from class 1 and class 2, with k = 5]
K-nearest neighbor classifier
• Assign the test example the label that is most common among its k nearest training examples
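A minimal k-nearest-neighbor sketch in NumPy (k = 5 as in the figure; Euclidean distance and integer class labels are assumptions for illustration):

```python
import numpy as np

def knn_predict(test_x, train_X, train_y, k=5):
    """Assign test_x the majority label among its k nearest training examples."""
    dists = np.linalg.norm(train_X - test_x, axis=1)   # distance to every training example
    nearest = np.argsort(dists)[:k]                    # indices of the k closest examples
    votes = np.bincount(train_y[nearest])              # labels assumed to be small non-negative ints
    return votes.argmax()
```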
Which separator is best?
Support vector machines
• Find hyperplane that maximizes the margin
between the positive and negative examples
x_i positive (y_i = +1):  w · x_i + b ≥ +1
x_i negative (y_i = −1):  w · x_i + b ≤ −1
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Finding the maximum margin hyperplane
1. Maximize margin 2 / ||w||
2. Correctly classify all training data:
   x_i positive (y_i = +1):  w · x_i + b ≥ +1
   x_i negative (y_i = −1):  w · x_i + b ≤ −1
Quadratic optimization problem:
   min_{w,b} (1/2) ||w||²  subject to  y_i (w · x_i + b) ≥ 1
SVM parameter learning
• Separable data:  min_{w,b} (1/2) ||w||²  subject to  y_i (w · x_i + b) ≥ 1
• Non-separable data:  min_{w,b} (1/2) ||w||² + C Σ_{i=1}^{n} max(0, 1 − y_i (w · x_i + b))
[Figure: decision boundary at 0 with margin boundaries at +1 and −1]
Demo: http://cs.stanford.edu/people/karpathy/svmjs/demo
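A rough sketch of minimizing the non-separable objective above by subgradient descent; the learning rate, iteration count, and the choice of plain gradient descent (rather than a QP or SMO solver) are illustrative assumptions:

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=1e-3, n_iters=1000):
    """Minimize (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w·x_i + b)).
    Labels y must be in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        margins = y * (X @ w + b)
        violated = margins < 1                    # points inside the margin or misclassified
        # Subgradient of the objective with respect to w and b
        grad_w = w - C * (y[violated][:, None] * X[violated]).sum(axis=0)
        grad_b = -C * y[violated].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Predicted label for a new point x: sign(w · x + b)
```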
Nonlinear SVMs
• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable
Φ: x → φ(x)
Nonlinear SVMs
• Linearly separable dataset in 1D:
[Figure: data points on a 1D axis x around 0]
The kernel trick
• Linear SVM decision function:  w · x + b = Σ_i α_i y_i (x_i · x) + b
  (w is the learned weight vector; the training points x_i with nonzero α_i are the support vectors)
• Nonlinear SVM decision function:  y(x) = Σ_i α_i y_i (φ(x_i) · φ(x)) + b = Σ_i α_i y_i K(x_i, x) + b
• The kernel K(x, y) = φ(x) · φ(y) lets us work in the lifted feature space without computing φ explicitly
Polynomial kernel:  K(x, y) = (c + x · y)^d
Gaussian kernel
• Also known as the radial basis function (RBF) kernel:
  K(x, y) = exp( −(1 / (2σ²)) ||x − y||² )
[Figure: K(x, y) as a function of ||x − y||; decision boundary of a Gaussian-kernel SVM with support vectors (SVs) marked]
Kernels for histograms
• Histogram intersection:  K(h1, h2) = Σ_{i=1}^{N} min(h1(i), h2(i))
• Square root (Hellinger) kernel:  K(h1, h2) = Σ_{i=1}^{N} √(h1(i) h2(i))
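A small sketch of these kernels as plain NumPy functions, usable inside the kernelized decision function Σ_i α_i y_i K(x_i, x) + b; the default values of c, d, and σ are illustrative, not from the slides:

```python
import numpy as np

def polynomial_kernel(x, y, c=1.0, d=3):
    """K(x, y) = (c + x·y)^d"""
    return (c + np.dot(x, y)) ** d

def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel: K(x, y) = exp(-||x - y||^2 / (2 sigma^2))"""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def histogram_intersection_kernel(h1, h2):
    """K(h1, h2) = sum_i min(h1(i), h2(i))"""
    return np.minimum(h1, h2).sum()
```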
SVMs: Pros and cons
• Pros
• Kernel-based framework is very powerful, flexible
• Training is convex optimization, globally optimal solution can
be found
• Amenable to theoretical analysis
• SVMs work very well in practice, even with very small
training sample sizes
• Cons
• No “direct” multi-class SVM, must combine two-class SVMs
(e.g., with one-vs-others)
• Computation, memory (esp. for nonlinear SVMs)
Generalization
• Generalization refers to the ability to correctly classify never-before-seen examples
• Can be controlled by turning “knobs” that affect the complexity of the model
[Figure: training and test error as functions of model complexity, with underfitting at one extreme and overfitting at the other]
Effect of training set size
Source: D. Hoiem
Validation
• Split the data into training, validation, and test subsets
• Use training set to optimize model parameters
• Use validation set to choose the best model
• Use test set only to evaluate performance
[Figure: validation set loss as a function of model complexity, with the stopping point marked]
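A minimal sketch of the split described above; the split fractions and the generic train/evaluate protocol comments are illustrative assumptions:

```python
import numpy as np

def split_data(X, y, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle and split the data into training, validation, and test subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(train_frac * len(X))
    n_val = int(val_frac * len(X))
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

# Protocol:
# 1. Fit each candidate model on the training set.
# 2. Pick the model with the lowest validation-set loss.
# 3. Report performance once, on the held-out test set.
```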