Introduction to object recognition

Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others
Overview
• Basic recognition tasks
• A statistical learning approach
• Traditional or “shallow” recognition pipeline
• Bags of features
• Classifiers
• Next time: neural networks and “deep”
recognition pipeline
Common recognition tasks
Image classification
• outdoor/indoor
• city/forest/factory/etc.
Image tagging
• street
• people
• building
• mountain
• …
Object detection
• find pedestrians
Activity recognition
• walking
• shopping
• rolling a cart
• sitting
• talking
• …
Image parsing
(example region labels: sky, mountain, buildings, tree, banner, street lamp, market, people)
Image description
This is a busy street in an Asian city.
Mountains and a large palace or
fortress loom in the background. In the
foreground, we see colorful souvenir
stalls and people walking around and
shopping. One person in the lower left
is pushing an empty cart, and a couple
of people in the middle are sitting,
possibly posing for a photograph.
Image classification
The statistical learning framework
• Apply a prediction function to a feature representation of
the image to get the desired output:

f(apple image) = “apple”
f(tomato image) = “tomato”
f(cow image) = “cow”
The statistical learning framework

y = f(x), where x is the image feature, f is the prediction function, and y is the output

• Training: given a training set of labeled examples {(x1, y1), …, (xN, yN)}, estimate the prediction function f by minimizing the prediction error on the training set
• Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x) (a toy sketch of this workflow follows)
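A minimal sketch of this framework, assuming scikit-learn is available and using random vectors as stand-ins for real image features (all names and sizes here are illustrative only):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 128))    # feature vectors of 100 training images
y_train = rng.integers(0, 3, size=100)   # labels, e.g. 0 = apple, 1 = tomato, 2 = cow
X_test = rng.normal(size=(10, 128))      # features of never-before-seen test images

f = LinearSVC().fit(X_train, y_train)    # training: estimate f from {(x_i, y_i)}
y_pred = f.predict(X_test)               # testing: y = f(x) for each test example
```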
Steps

Training: training images → image features + training labels → learned model
Testing: test image → image features → learned model → prediction

Slide credit: D. Hoiem
Traditional recognition pipeline

Image pixels → hand-designed feature extraction → trainable classifier → object class

• Features are not learned
• Trainable classifier is often generic (e.g., SVM)
Bags of features
Traditional features: Bags-of-features
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”
1. Local feature extraction
• Sample patches and extract descriptors (a toy sketch follows below)
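A toy NumPy sketch of this step, using raw normalized pixel patches sampled on a dense grid as descriptors; real pipelines would use SIFT- or HOG-style descriptors, and the function name and parameters are illustrative:

```python
import numpy as np

def extract_patch_descriptors(image, patch_size=16, stride=8):
    # image: 2-D grayscale array; returns one flattened, L2-normalized patch per grid position
    H, W = image.shape
    descriptors = []
    for r in range(0, H - patch_size + 1, stride):
        for c in range(0, W - patch_size + 1, stride):
            patch = image[r:r + patch_size, c:c + patch_size]
            v = patch.flatten().astype(np.float32)
            v /= (np.linalg.norm(v) + 1e-8)   # normalize for some robustness to illumination
            descriptors.append(v)
    return np.array(descriptors)
```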
2. Learning the visual vocabulary
• Cluster the descriptors extracted from the training set; the cluster centers form the visual vocabulary

Slide credit: Josef Sivic
Review: K-means clustering
• Want to minimize sum of squared Euclidean
distances between features xi and their
nearest cluster centers mk
$$D(X, M) = \sum_{k} \sum_{i \in \text{cluster } k} (x_i - m_k)^2$$

Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
• Assign each feature to the nearest center
• Recompute each cluster center as the mean of all features assigned to it (a NumPy sketch of this loop follows below)
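A compact NumPy sketch of the algorithm above; the initialization strategy and iteration cap are simple illustrative choices:

```python
import numpy as np

def kmeans(X, K, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    # randomly initialize K cluster centers from the data points
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # assign each feature to its nearest center (squared Euclidean distance)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # recompute each center as the mean of the features assigned to it
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```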
Example visual vocabulary


Appearance codebook
Source: B. Leibe
Bag-of-features steps
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words” (steps 3 and 4 are sketched below)
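A toy NumPy sketch of steps 3 and 4, assuming `vocabulary` is the K×D array of cluster centers learned above and `descriptors` is the M×D array of local descriptors from one image (names are illustrative):

```python
import numpy as np

def bag_of_features(descriptors, vocabulary):
    # quantize: assign each descriptor to its nearest visual word
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    # represent the image as a normalized histogram of visual-word frequencies
    hist = np.bincount(words, minlength=len(vocabulary)).astype(np.float32)
    return hist / (hist.sum() + 1e-8)
```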
Bags of features: Motivation
• Orderless document representation: frequencies of
words from a dictionary Salton & McGill (1983)

US Presidential Speeches Tag Cloud
http://chir.ag/projects/preztags/
Spatial pyramids
• Partition the image into increasingly fine sub-regions (level 0: whole image, level 1: 2×2 grid, level 2: 4×4 grid) and compute a bag-of-features histogram for each cell; a toy sketch follows below

Lazebnik, Schmid & Ponce (CVPR 2006)
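A rough NumPy sketch of the idea, where `words` holds the visual-word index of each local descriptor and `keypoints` their (x, y) pixel positions; the per-level weighting used by Lazebnik et al. is omitted for brevity, and all names are illustrative:

```python
import numpy as np

def spatial_pyramid(words, keypoints, image_size, vocab_size, levels=2):
    H, W = image_size
    feats = []
    for level in range(levels + 1):
        cells = 2 ** level                                  # 1x1, 2x2, 4x4, ...
        rows = np.minimum((keypoints[:, 1] * cells / H).astype(int), cells - 1)
        cols = np.minimum((keypoints[:, 0] * cells / W).astype(int), cells - 1)
        for r in range(cells):
            for c in range(cells):
                # bag-of-features histogram for the words falling in this cell
                in_cell = words[(rows == r) & (cols == c)]
                feats.append(np.bincount(in_cell, minlength=vocab_size))
    return np.concatenate(feats).astype(np.float32)
```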


Spatial pyramids
• Scene classification results
Spatial pyramids
• Caltech101 classification results
Traditional recognition pipeline

Image pixels → hand-designed feature extraction → trainable classifier → object class
Classifiers: Nearest neighbor

(figure: training examples from class 1, training examples from class 2, and a test example)

f(x) = label of the training example nearest to x

All we need is a distance function for our inputs


No training required!
K-nearest neighbor classifier
• For a new point, find the k closest points
from training data
• Assign the class label by majority vote among the labels of those k points (a NumPy sketch follows below)

k=5
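A minimal NumPy sketch of the k-NN rule described above, using Euclidean distance and a majority vote (ties are broken arbitrarily by argmax; labels are assumed to be non-negative integers):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    preds = []
    for x in X_test:
        d2 = ((X_train - x) ** 2).sum(axis=1)   # squared distances to all training points
        nearest = np.argsort(d2)[:k]            # indices of the k nearest neighbors
        votes = np.bincount(y_train[nearest])   # vote among their labels
        preds.append(votes.argmax())
    return np.array(preds)
```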
K-nearest neighbor classifier

Which classifier is more robust to outliers?

Credit: Andrej Karpathy, http://cs231n.github.io/classification/
Linear classifiers

Find a linear function to separate the classes:


f(x) = sgn(w · x + b)
Visualizing linear classifiers

Source: Andrej Karpathy, http://cs231n.github.io/linear-classify/


Nearest neighbor vs. linear classifiers
• NN pros:
• Simple to implement
• Decision boundaries not necessarily linear
• Works for any number of classes
• Nonparametric method
• NN cons:
• Need good distance function
• Slow at test time
• Linear pros:
• Low-dimensional parametric representation
• Very fast at test time
• Linear cons:
• Only handles two classes in the basic formulation
• How to train the linear function?
• What if data is not linearly separable?
Support vector machines
• When the data is linearly separable, there may
be more than one separator (hyperplane)

Which separator
is best?
Support vector machines
• Find hyperplane that maximizes the margin
between the positive and negative examples
$x_i$ positive ($y_i = 1$): $x_i \cdot w + b \ge 1$
$x_i$ negative ($y_i = -1$): $x_i \cdot w + b \le -1$

For support vectors, $x_i \cdot w + b = \pm 1$

Distance between a point and the hyperplane: $\dfrac{|x_i \cdot w + b|}{\|w\|}$

Therefore, the margin is $2 / \|w\|$

(figure: support vectors and the margin)

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining
and Knowledge Discovery, 1998
Finding the maximum margin hyperplane
1. Maximize the margin $2 / \|w\|$
2. Correctly classify all training data:
$x_i$ positive ($y_i = 1$): $x_i \cdot w + b \ge 1$
$x_i$ negative ($y_i = -1$): $x_i \cdot w + b \le -1$

Quadratic optimization problem:

$$\min_{w, b} \; \frac{1}{2} \|w\|^2 \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1$$

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining
and Knowledge Discovery, 1998
SVM parameter learning
• Separable data:

$$\min_{w, b} \; \frac{1}{2} \|w\|^2 \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1$$

(maximize the margin while classifying all training data correctly)

• Non-separable data:

$$\min_{w, b} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i (w \cdot x_i + b)\bigr)$$

(maximize the margin while minimizing classification mistakes)
SVM parameter learning

$$\min_{w, b} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i (w \cdot x_i + b)\bigr)$$

(figure: the hinge loss as a function of $y_i(w \cdot x_i + b)$, with the decision boundary at 0 and the margins at +1 and −1; a toy gradient-descent sketch follows below)

Demo: http://cs.stanford.edu/people/karpathy/svmjs/demo
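A rough NumPy sketch of minimizing the non-separable objective above by subgradient descent on the hinge loss; the learning rate and iteration count are arbitrary illustrative choices, not a tuned recipe:

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, n_iters=1000):
    # X: n x d feature matrix, y: labels in {-1, +1}
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        margins = y * (X @ w + b)
        viol = margins < 1                                 # points inside the margin or misclassified
        # subgradient of 0.5*||w||^2 + C * sum(hinge)
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```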
Nonlinear SVMs
• General idea: the original input space can
always be mapped to some higher-dimensional
feature space where the training set is separable

Φ: x → φ(x)

Nonlinear SVMs
• Linearly separable dataset in 1D

• Non-separable dataset in 1D

• We can map the data to a higher-dimensional space, e.g. x → (x, x²), where it becomes separable

Slide credit: Andrew Moore


The kernel trick
• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable

• The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that

K(x, y) = φ(x) · φ(y)

(to be valid, the kernel function must satisfy Mercer’s condition)
The kernel trick
• Linear SVM decision function:

$$w \cdot x + b = \sum_i \alpha_i y_i \, x_i \cdot x + b$$

(w is the learned weight vector; the sum runs over the support vectors $x_i$)

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining
and Knowledge Discovery, 1998
The kernel trick
• Linear SVM decision function:

$$w \cdot x + b = \sum_i \alpha_i y_i \, x_i \cdot x + b$$

• Kernel SVM decision function:

$$\sum_i \alpha_i y_i \, \varphi(x_i) \cdot \varphi(x) + b = \sum_i \alpha_i y_i \, K(x_i, x) + b$$

• This gives a nonlinear decision boundary in the original feature space

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining
and Knowledge Discovery, 1998
Polynomial kernel: $K(x, y) = (c + x \cdot y)^d$
Gaussian kernel
• Also known as the radial basis function (RBF) kernel:

$$K(x, y) = \exp\left(-\frac{1}{\sigma^2} \|x - y\|^2\right)$$

(figures: K(x, y) plotted as a function of ||x – y||, and an example with the support vectors highlighted)
Kernels for histograms
• Histogram intersection:

$$K(h_1, h_2) = \sum_{i=1}^{N} \min\bigl(h_1(i),\, h_2(i)\bigr)$$

• Square root (Bhattacharyya kernel):

$$K(h_1, h_2) = \sum_{i=1}^{N} \sqrt{h_1(i)\, h_2(i)}$$

(toy NumPy sketches of these and the earlier kernels follow below)
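Toy NumPy sketches of the kernels above, written for batches of inputs (rows of X and Y are feature vectors; rows of H1 and H2 are histograms); parameter defaults are illustrative:

```python
import numpy as np

def polynomial_kernel(X, Y, c=1.0, d=3):
    # K(x, y) = (c + x . y)^d for every pair of rows
    return (c + X @ Y.T) ** d

def gaussian_kernel(X, Y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / sigma^2)
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / sigma ** 2)

def histogram_intersection_kernel(H1, H2):
    # K(h1, h2) = sum_i min(h1(i), h2(i))
    return np.minimum(H1[:, None, :], H2[None, :, :]).sum(axis=2)

def bhattacharyya_kernel(H1, H2):
    # K(h1, h2) = sum_i sqrt(h1(i) * h2(i))
    return np.sqrt(H1[:, None, :] * H2[None, :, :]).sum(axis=2)
```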
SVMs: Pros and cons
• Pros
• Kernel-based framework is very powerful, flexible
• Training is convex optimization, globally optimal solution can
be found
• Amenable to theoretical analysis
• SVMs work very well in practice, even with very small
training sample sizes

• Cons
• No “direct” multi-class SVM, must combine two-class SVMs
(e.g., with one-vs-others)
• Computation, memory (esp. for nonlinear SVMs)
Generalization
• Generalization refers to the ability to correctly classify never-before-seen examples
• Can be controlled by turning “knobs” that affect the complexity of the model

(figure: a training set with known labels and a test set with unknown labels)
Diagnosing generalization ability
• Training error: how does the model perform on the data on which it was trained?
• Test error: how does it perform on never-before-seen data?

(figure: training error and test error vs. model complexity; test error is high both in the underfitting regime at low complexity and in the overfitting regime at high complexity)

Source: D. Hoiem
Underfitting and overfitting
• Underfitting: training and test error are both high
• Model does an equally poor job on the training and the test set
• Either the training procedure is ineffective or the model is too
“simple” to represent the data
• Overfitting: Training error is low but test error is high
• Model fits irrelevant characteristics (noise) in the training data
• Model is too complex or amount of training data is insufficient

(figure: example fits illustrating underfitting, good generalization, and overfitting)
Effect of training set size

(figure: test error vs. model complexity, plotted for few training examples and for many training examples)

Source: D. Hoiem
Validation
• Split the data into training, validation, and test subsets
• Use training set to optimize model parameters
• Use validation set to choose the best model (a toy data-splitting sketch follows below)
• Use test set only to evaluate performance
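A minimal NumPy sketch of such a split; the fractions and function name are illustrative, and scikit-learn's train_test_split is a common alternative:

```python
import numpy as np

def split_data(X, y, val_frac=0.15, test_frac=0.15, seed=0):
    # shuffle indices, then carve off test, validation, and training subsets
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(test_frac * len(X))
    n_val = int(val_frac * len(X))
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```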

(figure: training, validation, and test loss vs. model complexity; the minimum of the validation loss marks the stopping point for model selection)
