Image Classification AI
• Image classification.
• Bag-of-words.
• K-means clustering.
• Classification.
• K nearest neighbors.
• Naïve Bayes.
Course roadmap:
2. Geometry-based vision (Lectures 7 – 13). See also 16-822: Geometry-based Methods in Vision.
3. Physics-based vision (Lectures 14 – 17). See also 16-823: Physics-based Methods in Vision and 15-463: Computational Photography.
4. Learning-based vision. We are starting this part now.
[Figure: street market scene with labeled regions: mountain, trees, building, vendors, people, ground]
Object categorization: which objects are present? (mountain, tree, building, banner, street lamp, vendor, people)
Scene categorization: what type of scene is it? (outdoor, marketplace, city)
Activity / event recognition: what are these people doing?
Object recognition: is it really so hard?
Find the chair in this image: matching a template ("This is a chair") with normalized correlation does not solve it.
Brady, M. J., & Kersten, D. (2003). Bootstrapped learning of novel objects. Journal of Vision, 3(6), 413-422.
Why is this hard?
• Challenge: variable illumination (Michelangelo, 1475-1564)
• Challenge: occlusion (Magritte, 1957)
• Challenge: background clutter
Slide credit: Svetlana Lazebnik
Image Classification
Image Classification: Problem
Data-driven approach
• Collect a database of images with labels
• Use ML to train an image classifier
• Evaluate the classifier on test images
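A minimal sketch of this data-driven pipeline (scikit-learn; the load_labeled_images helper and its random "features" are placeholders standing in for a real dataset and featurizer):

```python
# Minimal data-driven image classification pipeline (sketch).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def load_labeled_images():
    # Placeholder: random 64-D "features" for 200 images, 3 classes.
    rng = np.random.default_rng(0)
    return rng.normal(size=(200, 64)), rng.integers(0, 3, size=200)

X, y = load_labeled_images()                                        # 1. collect a labeled database
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)     # 2. train an image classifier
print("test accuracy:", clf.score(X_test, y_test))                  # 3. evaluate on test images
```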
Bag of words
What object do these parts belong to?
Some local features are very informative.
An object can be represented as a collection of local parts.
Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)
Bag-of-features: an old idea (e.g., texture recognition and information retrieval)
Texture recognition: represent a texture by a histogram of local texture elements.

The same idea applied to documents: word-count histograms for two newspaper snippets.

word:    Tartan  robot  CHIMP  CMU  bio  soft  ankle  sensor
doc 1:      1      6      2     1    0    0      0      1
doc 2:      0      4      0     1    4    5      3      2

http://www.fodey.com/generators/newspaper/snippet.asp
A document (datapoint) is a vector of counts over each word (feature). The raw counts are commonly reweighted by term frequency and inverse document frequency (tf-idf).
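A small sketch of tf-idf weighting on word-count vectors (plain NumPy; the toy counts are the two documents from the table above, and this is one common variant of the idf formula):

```python
# tf-idf weighting of word-count histograms (sketch).
import numpy as np

counts = np.array([[1, 6, 2, 1, 0, 0, 0, 1],    # document 1
                   [0, 4, 0, 1, 4, 5, 3, 2]],   # document 2
                  dtype=float)

tf = counts / counts.sum(axis=1, keepdims=True)   # term frequency per document
df = (counts > 0).sum(axis=0)                     # number of documents containing each word
idf = np.log(counts.shape[0] / df)                # inverse document frequency
tfidf = tf * idf                                  # words shared by all documents get weight 0
print(tfidf)
```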
Pipeline:
1. Dictionary Learning: learn visual words using clustering
2. Encode: build Bags-of-Words (BOW) vectors for each image
3. Classify: train and test data using BOWs

First: dictionary learning.
Detect patches
[Mikolajczyk and Schmid ’02]
[Matas, Chum, Urban & Pajdla ’02]
[Sivic & Zisserman, ’03]
…
How do we learn the dictionary?
Clustering: group the patch descriptors in feature space; the cluster centers form the visual vocabulary.
K-means clustering (a small code sketch follows below):
1. Select initial centroids at random.
2. Assign each object to the cluster with the nearest centroid.
3. Recompute the centroid of each cluster from the objects assigned to it.
4. Repeat steps 2-3 until the assignments stop changing.
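A compact k-means sketch in NumPy (random initialization; a real dictionary-learning run would cluster many thousands of patch descriptors rather than the toy 2-D points used here):

```python
# k-means clustering (sketch): alternate assignment and centroid updates.
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # 1. random initial centroids
    for _ in range(n_iters):
        # 2. assign each point to the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # 3. recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):               # 4. stop when nothing changes
            break
        centroids = new_centroids
    return centroids, assign

# toy usage: cluster 2-D points into 3 "visual words"
pts = np.random.default_rng(1).normal(size=(300, 2))
centers, labels = kmeans(pts, k=3)
print(centers)
```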
[Figure: example appearance codebooks of clustered image patches; a second dictionary shown as another example. Source: B. Leibe]
Encode: build Bags-of-Words (BOW) vectors for each image.
1. Quantization: each image feature gets associated to a visual word (its nearest cluster center).
2. Histogram: count the number of visual word occurrences.
[Figure: BOW histogram showing the frequency of each codeword]
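A sketch of this encoding step in NumPy: quantize each local descriptor to its nearest visual word, then build the count histogram. The `descriptors` and `vocabulary` arrays are assumed to come from the patch detector and the k-means step above.

```python
# Encode an image as a bag-of-words histogram (sketch).
import numpy as np

def encode_bow(descriptors, vocabulary):
    """descriptors: (n, d) local features; vocabulary: (k, d) cluster centers."""
    # 1. Quantization: nearest cluster center for every descriptor
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    # 2. Histogram: count occurrences of each visual word
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()          # normalize so images with different feature counts compare

# toy usage
rng = np.random.default_rng(0)
vocab = rng.normal(size=(10, 128))    # 10 visual words, 128-D descriptors (e.g., SIFT-like)
desc = rng.normal(size=(50, 128))     # 50 local features from one image
print(encode_bow(desc, vocab))
```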
Classify: train and test data using BOWs.
K nearest neighbors
Naïve Bayes
K nearest neighbors
[Figure: distribution of data from two classes]
• Important to normalize the features; dimensions may have very different scales.
How many neighbors (k)? The value of k is a hyperparameter.
The choice of distance metric (e.g., L2, cosine, chi-squared) is also a hyperparameter.
Visualization: L2 distance
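A quick sketch of these distance choices on histogram-like feature vectors (plain NumPy; the small epsilon in the chi-squared distance is just to avoid division by zero and is an implementation choice, not part of the definition):

```python
# Common distance metrics for comparing feature vectors / histograms (sketch).
import numpy as np

def l2_distance(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def cosine_distance(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def chi_squared_distance(a, b, eps=1e-10):
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))

a = np.array([1, 6, 2, 1, 0, 0, 0, 1], dtype=float)
b = np.array([0, 4, 0, 1, 4, 5, 3, 2], dtype=float)
print(l2_distance(a, b), cosine_distance(a, b), chi_squared_distance(a, b))
```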
CIFAR-10 and NN results
k-nearest neighbor
• Find the k closest points from training data
• Labels of the k points “vote” to classify
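A minimal k-nearest-neighbor classifier sketch (NumPy, L2 distance; a real system would vectorize over many queries or use a KD-tree / approximate search):

```python
# k-nearest-neighbor classifier (sketch): no training beyond storing the data.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    dists = np.linalg.norm(X_train - x_query, axis=1)   # L2 distance to every training point
    nearest = np.argsort(dists)[:k]                      # indices of the k closest points
    votes = Counter(y_train[nearest])                    # labels of the k points "vote"
    return votes.most_common(1)[0][0]

# toy usage
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))
y_train = (X_train[:, 0] > 0).astype(int)
print(knn_predict(X_train, y_train, np.array([0.5, -0.2]), k=5))
```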
Hyperparameters
• What is the best distance to use?
• What is the best value of k to use?
• Very problem-dependent
• Must try them all and see what works best
How to pick hyperparameters?
• Methodology
  – Train and test
  – Train, validate, test
• Use a held-out validation set, or cross-validation when data is limited (see the sketch below).
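A sketch of picking k with cross-validation (scikit-learn; the synthetic data is just a stand-in for BOW feature vectors):

```python
# Choosing the hyperparameter k by cross-validation (sketch).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16))                 # stand-in for BOW feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # stand-in labels

for k in (1, 3, 5, 7, 15):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k={k}: mean accuracy {scores.mean():.3f}")
```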
Cons
• Storage requirements: all training data must be kept around.
• Training: O(1)
• Testing: O(MN) (every test image is compared against all training images)
• Hmm…
  – Normally we need the opposite
  – Slow training (ok), fast testing (necessary)
Naïve Bayes
[Figure: distribution of data from two classes]
X is the set of observed features from a single image; each x is an observed feature (e.g., a visual word).
For classification, z is a discrete random variable (e.g., car, person, building).
In our context, the naïve Bayes classifier solves the MAP estimation problem:

  z* = argmax_z p(z | x)                        (posterior)
     = argmax_z p(x | z) p(z) / p(x)            (Bayes' rule)
     = argmax_z p(x | z) p(z)                   (remove constants: p(x) does not depend on z)

Recall the naïve assumption: the features are conditionally independent given the class,
  p(x | z) = ∏_i p(x_i | z),
so to compute the MAP estimate we only need the per-word likelihoods p(x_i | z) and the prior p(z).
Convert the image to a histogram representation (data = histogram of visual word counts):

word:     Tartan  robot  CHIMP  CMU   bio   soft  ankle  sensor
count:       0      4      0     1     4     5      3      2
p(x|z):    0.00   0.21   0.00  0.05  0.21  0.26   0.16   0.11

http://www.fodey.com/generators/newspaper/snippet.asp
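A sketch of a naïve Bayes classifier over word-count histograms (plain NumPy; the Laplace smoothing term alpha is an added assumption so that zero counts do not zero out the whole likelihood, and the toy histograms are made up):

```python
# Naive Bayes over bag-of-words histograms (sketch).
# Works in log space: log p(z) + sum_i count_i * log p(word_i | z).
import numpy as np

def train_naive_bayes(histograms, labels, n_classes, alpha=1.0):
    """Estimate log p(word | z) and log p(z), with Laplace smoothing alpha."""
    n_words = histograms.shape[1]
    log_prior = np.zeros(n_classes)
    log_likelihood = np.zeros((n_classes, n_words))
    for z in range(n_classes):
        class_hists = histograms[labels == z]
        log_prior[z] = np.log(len(class_hists) / len(histograms))
        word_counts = class_hists.sum(axis=0) + alpha
        log_likelihood[z] = np.log(word_counts / word_counts.sum())
    return log_prior, log_likelihood

def predict(histogram, log_prior, log_likelihood):
    # MAP estimate: argmax_z  log p(z) + sum_i x_i log p(word_i | z)
    return int(np.argmax(log_prior + log_likelihood @ histogram))

# toy usage with two classes and 8 words
hists = np.array([[1, 6, 2, 1, 0, 0, 0, 1],
                  [0, 4, 0, 1, 4, 5, 3, 2],
                  [2, 5, 1, 0, 0, 1, 0, 0],
                  [0, 3, 0, 2, 5, 4, 2, 1]], dtype=float)
labels = np.array([0, 1, 0, 1])
lp, ll = train_naive_bayes(hists, labels, n_classes=2)
print(predict(np.array([0, 4, 0, 1, 4, 5, 3, 2], dtype=float), lp, ll))
```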
Support vector machine (linear classifier)

[Figure: distribution of data from two classes separated by a line]

The decision boundary is the line w·x + b = 0. Important property: we are free to choose any normalization of w, since scaling w and b by the same factor leaves the line unchanged. In 3D (and higher dimensions) the boundary is a hyperplane (plane).

The support vectors are the training points closest to the boundary; the margin is the distance from the boundary to these points.

Objective function and constraints (hard margin):
  minimize   (1/2) ||w||^2
  subject to y_i (w·x_i + b) >= 1   for all i

A misclassified point violates these constraints, so we allow a 'soft' margin with slack variables ξ_i >= 0.

'Soft' margin objective:
  minimize   (1/2) ||w||^2 + C ∑_i ξ_i
  subject to y_i (w·x_i + b) >= 1 - ξ_i  and  ξ_i >= 0   for all i
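A sketch of a soft-margin linear SVM on BOW-style features (scikit-learn's LinearSVC, which optimizes the hinge-loss form of the objective above; the data here is synthetic):

```python
# Soft-margin linear SVM (sketch). C controls the trade-off between a wide
# margin and the penalty paid for misclassified (slack) points.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))                 # stand-in for BOW vectors
y = (X[:, :2].sum(axis=1) > 0).astype(int)     # synthetic labels

for C in (0.01, 1.0, 100.0):
    clf = LinearSVC(C=C).fit(X, y)
    print(f"C={C}: training accuracy {clf.score(X, y):.3f}")
```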
Basic reading:
• Szeliski, Chapter 14.