Course Material for CS 391

This document discusses image features for object and scene recognition using machine learning. It describes the machine learning framework of training classifiers on labeled image features to perform recognition. Common image features include global descriptors like GIST and bags-of-features built from local patches. Bags-of-features represent images as histograms of visual words learned from clustering local descriptors. Spatial information can improve recognition of actions by extracting space-time interest points. Overall, the document outlines key steps in extracting and encoding image features to train classifiers for visual recognition tasks.


Image Features for Recognition
CSC 391: Introduction to Computer Vision
Recognition review
• Recognition tasks
• scene categorization, annotation, detection, activity recognition, parsing
• Object categorization
• Machine learning framework
• training, testing, generalization
• Example classifiers
• Nearest neighbor
• Linear classifiers
The machine learning framework
y = f(x)
where x is the image feature, f is the prediction function, and y is the output prediction.

• Training: given a training set of labeled examples {(x1, y1), …, (xN, yN)}, estimate the prediction function f by minimizing the prediction error on the training set
• Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)
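The train/test framework above can be sketched with a minimal nearest-neighbor classifier (a hypothetical illustration, not code from the course; feature values and class names are made up):

```python
import numpy as np

class NearestNeighborClassifier:
    """Minimal 1-NN classifier: 'training' just stores the labeled features."""

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        return self

    def predict(self, X_test):
        preds = []
        for x in np.asarray(X_test, dtype=float):
            # f(x): assign the label of the closest training feature (Euclidean distance).
            dists = np.linalg.norm(self.X - x, axis=1)
            preds.append(self.y[np.argmin(dists)])
        return np.array(preds)

# Toy 2-D "image features": two well-separated clusters with scene labels.
X_train = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
y_train = ["beach", "beach", "forest", "forest"]
clf = NearestNeighborClassifier().fit(X_train, y_train)
print(clf.predict([[0.05, 0.1], [5.1, 5.0]]))
```

Here the "training" step is trivial (memorize the data); the prediction error on the training set is zero by construction, which is why generalization has to be checked on held-out test examples.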
Steps
• Training: Training Images → Image Features → Training (with Training Labels) → Learned Model
• Testing: Test Image → Image Features → Learned Model → Prediction

Slide credit: D. Hoiem
Image features
• Spatial support:
  • Pixel or local patch
  • Segmentation region
  • Bounding box
  • Whole image


Image features
• Global image features for whole-image
classification tasks

• GIST descriptors
• Bags of features
GIST descriptors
• Oliva & Torralba (2001)

http://people.csail.mit.edu/torralba/code/spatialenvelope/
Bags of features
Bag-of-features steps
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”

1. Local feature extraction

• Regular grid or interest regions


1. Local feature extraction

Detect patches → normalize patch → compute descriptor

Slide credit: Josef Sivic
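The detect → normalize → describe pipeline can be sketched on a regular grid (a hypothetical illustration; real systems typically use SIFT-like descriptors rather than raw normalized patches):

```python
import numpy as np

def extract_patch_descriptors(image, patch_size=8, stride=8):
    """Extract patches on a regular grid and compute a simple descriptor for each.

    Descriptor here: the flattened patch, zero-mean (brightness-normalized)
    and unit-norm (contrast-normalized).
    """
    H, W = image.shape
    descriptors = []
    for r in range(0, H - patch_size + 1, stride):
        for c in range(0, W - patch_size + 1, stride):
            patch = image[r:r + patch_size, c:c + patch_size].astype(float).ravel()
            patch -= patch.mean()             # normalize away brightness
            norm = np.linalg.norm(patch)
            if norm > 1e-8:
                patch /= norm                 # normalize away contrast
            descriptors.append(patch)
    return np.array(descriptors)

img = np.random.default_rng(0).random((32, 32))   # stand-in for a grayscale image
desc = extract_patch_descriptors(img)
print(desc.shape)  # (16, 64): a 4x4 grid of 8x8 patches, each flattened to 64 dims
```

Swapping the grid loop for an interest-point detector gives the "interest regions" variant mentioned above; the descriptor step is unchanged.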


2. Learning the visual vocabulary

Cluster the local descriptors; the resulting cluster centers form the visual vocabulary.

Slide credit: Josef Sivic

Review: K-means clustering
• Want to minimize the sum of squared Euclidean distances between features x_i and their nearest cluster centers m_k:

D(X, M) = Σ_k Σ_{i ∈ cluster k} ||x_i − m_k||²

Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
• Assign each feature to the nearest center
• Recompute each cluster center as the mean of all features assigned to it
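The two-step algorithm above can be sketched as follows (a minimal illustration; production implementations such as scikit-learn's KMeans add smarter initialization and multiple restarts):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal K-means: returns cluster centers and per-point assignments."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Randomly initialize K centers by picking K distinct data points.
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each feature goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each center as the mean of its assigned features.
        centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
    return centers, labels

# Two obvious clusters of 2-D "descriptors".
X = np.array([[0, 0], [0.2, 0.1], [0.1, 0.3], [5, 5], [5.1, 4.8], [4.9, 5.2]])
centers, labels = kmeans(X, K=2)
```

Each iteration can only decrease the objective D(X, M), which is why the procedure converges (to a local minimum that depends on the random initialization).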
Bag-of-features steps
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”
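Steps 3 and 4 — quantizing each descriptor against the vocabulary and histogramming the resulting visual words — can be sketched as (a hypothetical example with made-up 2-D descriptors):

```python
import numpy as np

def bag_of_words_histogram(descriptors, vocabulary):
    """Quantize each descriptor to its nearest visual word, then histogram.

    descriptors: (N, D) local features from one image
    vocabulary:  (K, D) cluster centers ("visual words") from K-means
    Returns a length-K histogram normalized to sum to 1.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    vocabulary = np.asarray(vocabulary, dtype=float)
    # Hard assignment: nearest visual word for each descriptor.
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

vocab = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])    # K = 3 visual words
desc = np.array([[0.1, 0.2], [4.9, 5.1], [5.2, 4.8], [9.8, 0.1]])
print(bag_of_words_histogram(desc, vocab))  # → histogram [0.25, 0.5, 0.25]
```

The fixed-length histogram is what gets fed to the classifier, regardless of how many local features the image produced; the "sparse coding, non-exclusive assignment" variants below replace the hard argmin with softer assignments.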
Visual vocabularies: Details
• How to choose vocabulary size?
• Too small: visual words not representative of all patches
• Too large: quantization artifacts, overfitting
• Right size is application-dependent

• Improving efficiency of quantization
  • Vocabulary trees (Nister and Stewenius, 2005)
• Improving vocabulary quality
  • Discriminative/supervised training of codebooks
  • Sparse coding, non-exclusive assignment to codewords
• More discriminative bag-of-words representations
  • Fisher vectors (Perronnin et al., 2007), VLAD (Jégou et al., 2010)
• Incorporating spatial information

Bags of features for action recognition
• Space-time interest points

Juan Carlos Niebles, Hongcheng Wang, and Li Fei-Fei, "Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words," IJCV 2008.
Credit: slide set developed by S. Lazebnik, University of Illinois at Urbana-Champaign
