Introduction to Object Recognition
Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others
Overview
• Basic recognition tasks
• A statistical learning approach
• Traditional or “shallow” recognition pipeline
• Bags of features
• Classifiers
• Next time: neural networks and “deep” recognition pipeline
Common recognition tasks
Image classification
• outdoor/indoor
• city/forest/factory/etc.
Image tagging
• street
• people
• building
• mountain
• …
Object detection
• find pedestrians
Activity recognition
• walking
• shopping
• rolling a cart
• sitting
• talking
• …
Image parsing
[Figure: image regions labeled sky, mountain, building, tree, banner, street lamp, market, people]
Image description
This is a busy street in an Asian city.
Mountains and a large palace or
fortress loom in the background. In the
foreground, we see colorful souvenir
stalls and people walking around and
shopping. One person in the lower left
is pushing an empty cart, and a couple
of people in the middle are sitting,
possibly posing for a photograph.
Image classification
The statistical learning
framework
• Apply a prediction function to a feature representation of
the image to get the desired output:
f( [image of an apple] ) = “apple”
f( [image of a tomato] ) = “tomato”
f( [image of a cow] ) = “cow”
The statistical learning framework
y = f(x)
• y: output
• f: prediction function
• x: image feature
Testing: test image → features → learned model → prediction
Slide credit: D. Hoiem
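A minimal sketch of this y = f(x) pattern with a linear prediction function; the feature extraction is assumed to happen beforehand, and the class list and parameter names (CLASSES, W, b) are illustrative, not from the slides:

```python
import numpy as np

# Hypothetical label set, echoing the examples above
CLASSES = ["apple", "tomato", "cow"]

def predict(x, W, b):
    """y = f(x): score each class with a learned linear model and
    return the highest-scoring label."""
    scores = W @ x + b          # W: (n_classes, n_features), b: (n_classes,)
    return CLASSES[int(np.argmax(scores))]

# At training time W and b are fit to labeled (feature, label) pairs;
# at test time we extract features from the image and apply f.
```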
Traditional recognition pipeline
Image pixels → hand-designed feature extraction → trainable classifier → object class
Clustering
• Cluster the descriptors extracted from the training set
[Figure: descriptor space partitioned into clusters, e.g. cluster k]
Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
• Assign each feature to the nearest center
• Recompute each cluster center as the mean of all features
assigned to it
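A compact NumPy sketch of the clustering step above (this is the standard K-means procedure; the iteration count, seed, and array names are illustrative assumptions):

```python
import numpy as np

def kmeans(features, k, n_iters=50, seed=0):
    """Cluster feature vectors (n_samples x dim) into k centers."""
    rng = np.random.default_rng(seed)
    # Randomly initialize K cluster centers by sampling data points
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each feature to the nearest center
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of the features assigned to it
        new_centers = np.array([features[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):   # converged
            break
        centers = new_centers
    return centers, labels
```

The resulting centers serve as the “visual words” of the vocabulary used in the following slides.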
Example visual vocabulary
[Figure: appearance codebook of example cluster-center patches]
Source: B. Leibe
Bag-of-features steps
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”
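A sketch of steps 3–4, assuming local descriptors have already been extracted (step 1) and a vocabulary of cluster centers learned (step 2); the function and argument names are illustrative:

```python
import numpy as np

def bag_of_words_histogram(descriptors, vocabulary):
    """Quantize local descriptors against the visual vocabulary and
    return a normalized histogram of visual-word frequencies."""
    # Step 3: assign each descriptor to its nearest visual word
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    # Step 4: count word occurrences and normalize to frequencies
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)
```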
Bags of features: Motivation
• Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)
Image pixels → hand-designed feature extraction → trainable classifier → object class
Classifiers: Nearest neighbor
[Figure: a test example among training examples from class 1 and class 2, with k = 5]
K-nearest neighbor classifier
• Assign the test example the label that is most common among its k nearest training examples
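A minimal k-nearest-neighbor sketch in NumPy (k = 5 as in the figure; Euclidean distance and integer class labels are assumptions for illustration):

```python
import numpy as np

def knn_predict(test_x, train_X, train_y, k=5):
    """Assign test_x the majority label among its k nearest training examples."""
    dists = np.linalg.norm(train_X - test_x, axis=1)   # distance to every training example
    nearest = np.argsort(dists)[:k]                    # indices of the k closest examples
    votes = np.bincount(train_y[nearest])              # labels assumed to be small non-negative ints
    return votes.argmax()
```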
Which separator is best?
Support vector machines
• Find hyperplane that maximizes the margin
between the positive and negative examples
x_i positive (y_i = +1):  w · x_i + b ≥ +1
x_i negative (y_i = −1):  w · x_i + b ≤ −1
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Finding the maximum margin hyperplane
1. Maximize margin 2 / ||w||
2. Correctly classify all training data:
   x_i positive (y_i = +1):  w · x_i + b ≥ +1
   x_i negative (y_i = −1):  w · x_i + b ≤ −1
Quadratic optimization problem:
   min_{w,b} (1/2) ||w||²  subject to  y_i (w · x_i + b) ≥ 1
SVM parameter learning
• Separable data:  min_{w,b} (1/2) ||w||²  subject to  y_i (w · x_i + b) ≥ 1
• Non-separable data:  min_{w,b} (1/2) ||w||² + C Σ_{i=1}^{n} max(0, 1 − y_i (w · x_i + b))
[Figure: decision boundary at 0 with margin boundaries at +1 and −1]
Demo: http://cs.stanford.edu/people/karpathy/svmjs/demo
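A rough sketch of minimizing the non-separable objective above by subgradient descent; the learning rate, iteration count, and the choice of plain gradient descent (rather than a QP or SMO solver) are illustrative assumptions:

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=1e-3, n_iters=1000):
    """Minimize (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w·x_i + b)).
    Labels y must be in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        margins = y * (X @ w + b)
        violated = margins < 1                    # points inside the margin or misclassified
        # Subgradient of the objective with respect to w and b
        grad_w = w - C * (y[violated][:, None] * X[violated]).sum(axis=0)
        grad_b = -C * y[violated].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Predicted label for a new point x: sign(w · x + b)
```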
Nonlinear SVMs
• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable
Φ: x → φ(x)
Nonlinear SVMs
• Linearly separable dataset in 1D:
[Figure: data points on a 1D axis x around 0]
The kernel trick
• Linear SVM decision function:  w · x + b = Σ_i α_i y_i (x_i · x) + b
  (w is the learned weight vector; the training points x_i with nonzero α_i are the support vectors)
• Nonlinear SVM decision function:  y(x) = Σ_i α_i y_i (φ(x_i) · φ(x)) + b = Σ_i α_i y_i K(x_i, x) + b
• The kernel K(x, y) = φ(x) · φ(y) lets us work in the lifted feature space without computing φ explicitly
Polynomial kernel:  K(x, y) = (c + x · y)^d
Gaussian kernel
• Also known as the radial basis function (RBF) kernel:
  K(x, y) = exp( −(1 / (2σ²)) ||x − y||² )
[Figure: K(x, y) as a function of ||x − y||; decision boundary of a Gaussian-kernel SVM with support vectors (SVs) marked]
Kernels for histograms
• Histogram intersection:  K(h1, h2) = Σ_{i=1}^{N} min(h1(i), h2(i))
• Square root (Hellinger) kernel:  K(h1, h2) = Σ_{i=1}^{N} √(h1(i) h2(i))
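A small sketch of these kernels as plain NumPy functions, usable inside the kernelized decision function Σ_i α_i y_i K(x_i, x) + b; the default values of c, d, and σ are illustrative, not from the slides:

```python
import numpy as np

def polynomial_kernel(x, y, c=1.0, d=3):
    """K(x, y) = (c + x·y)^d"""
    return (c + np.dot(x, y)) ** d

def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel: K(x, y) = exp(-||x - y||^2 / (2 sigma^2))"""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def histogram_intersection_kernel(h1, h2):
    """K(h1, h2) = sum_i min(h1(i), h2(i))"""
    return np.minimum(h1, h2).sum()
```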
SVMs: Pros and cons
• Pros
• Kernel-based framework is very powerful, flexible
• Training is convex optimization, globally optimal solution can
be found
• Amenable to theoretical analysis
• SVMs work very well in practice, even with very small
training sample sizes
• Cons
• No “direct” multi-class SVM, must combine two-class SVMs
(e.g., with one-vs-others)
• Computation, memory (esp. for nonlinear SVMs)
Generalization
• Generalization refers to the ability to correctly classify never-before-seen examples
• Can be controlled by turning “knobs” that affect the complexity of the model
[Figure: training and test error as functions of model complexity, with underfitting at one extreme and overfitting at the other]
Effect of training set size
Source: D. Hoiem
Validation
• Split the data into training, validation, and test subsets
• Use training set to optimize model parameters
• Use validation set to choose the best model
• Use test set only to evaluate performance
[Figure: validation set loss as a function of model complexity, with the stopping point marked]
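A minimal sketch of the split described above; the split fractions and the generic train/evaluate protocol comments are illustrative assumptions:

```python
import numpy as np

def split_data(X, y, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle and split the data into training, validation, and test subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(train_frac * len(X))
    n_val = int(val_frac * len(X))
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

# Protocol:
# 1. Fit each candidate model on the training set.
# 2. Pick the model with the lowest validation-set loss.
# 3. Report performance once, on the held-out test set.
```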