Image Features and Categorization
Computer Vision
Jia-Bin Huang, Virginia Tech
Administrative stuffs
• HW 4
• Due 11:55pm on Wed, Oct 31
• Object Detection
• Action Recognition
• Image features
• Color, texture, gradient, shape, interest points
• Histograms, feature encoding, and pooling
• CNNs as features
[Figure: a scene with labeled objects: trees, bear, camera, man, rabbit, grass, forest]
Describe, predict, or interact with the object based on visual cues
• Can I put stuff in it?
• Is it dangerous?
• Is it alive?
• How fast does it run? Is it soft?
• Does it have a tail? Can I poke with it?
Why do we care about categories?
• From an object’s category, we can make predictions about its
behavior in the future, beyond what is immediately perceived.
• Pointers to knowledge
• Help to understand individual cases not previously encountered
• Communication
Theory of categorization
• Definitional approach
• Prototype approach
• Exemplar approach
Definitional approach:
classical view of categories
• Plato & Aristotle
• Categories are defined by a list of
properties shared by all elements in a
category
• Category membership is binary
• Every member in the category is equal
Levels of categorization: superordinate, basic, subordinate
• Basic level: dog, cat, cow
• Subordinate level: Doberman, German shepherd
• Sub-subordinate: “Fido”
Rosch et al., Principles of Categorization, 1978
Image categorization
• Cat vs Dog
Image categorization
• Object recognition
• Fine-grained recognition
Visipedia Project
Image categorization
• Place recognition
Testing
[Pipeline: Test Image → Image Features → Trained Classifier → Prediction (“Outdoor”)]
• Image features: map images to feature space
[Scatter plot: “x” and “o” samples in a 2D feature space with axes x1 and x2]
• Classifiers: map feature space to label space
[Scatter plots: the same “x”/“o” samples, before and after a decision boundary is drawn between the two classes]
Different types of classification
• Exemplar-based: transfer category labels from
examples with most similar features
• What similarity function? What parameters?
• Linear classifier: confidence in positive label is a
weighted sum of features
• What are the weights?
• Non-linear classifier: predictions based on more
complex function of features
• What form does the classifier take? Parameters?
• Generative classifier: assign to the label that best
explains the features (makes features most likely)
• What is the probability function and its parameters?
Note: You can always fully design the classifier by hand, but usually this is too
difficult. Typical solution: learn from training examples.
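The first two classifier types above can be sketched in a few lines of NumPy. The Euclidean similarity, the toy 2D features, and the hand-set weights are illustrative assumptions, not part of the slides:

```python
import numpy as np

def nearest_neighbor_predict(X_train, y_train, x):
    """Exemplar-based: copy the label of the most similar training example.
    Similarity here is negative Euclidean distance (one possible choice)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

def linear_predict(w, b, x):
    """Linear classifier: confidence in the positive label is a weighted
    sum of features; threshold at zero to get a hard label."""
    return 1 if w @ x + b > 0 else 0

# toy 2D feature space with two classes
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(nearest_neighbor_predict(X, y, np.array([0.1, 0.0])))  # 0
print(linear_predict(np.array([1.0, 1.0]), -1.0, np.array([0.9, 1.1])))  # 1
```

In practice the weights of the linear classifier are learned from the labeled training examples, as the note above says.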
Training and testing phases
[Pipeline: Training Images + Training Labels → Image Features → Training → Trained Classifier;
Test Image → Image Features → Trained Classifier → Prediction (“Outdoor”)]
Q: What are good features for…
• recognizing a beach?
Q: What are good features for…
• recognizing cloth fabric?
Q: What are good features for…
• recognizing a mug?
What are the right features?
• Object: shape
• Local shape info, shading, shadows, texture
• Scene : geometric layout
• linear perspective, gradients, line segments
• Material properties: albedo, feel, hardness
• Color, texture
• Action: motion
• Optical flow, tracked points
General principles of representation
• Coverage
• Ensure that all relevant info is
captured
• Concision
• Minimize number of features without
sacrificing coverage
• Directness
• Ideal features are independently
useful for prediction
Image representations
• Templates
• Intensity, gradients, etc.
• Histograms
• Color, texture, SIFT descriptors, etc.
• Average of features
[Figure: intensity template and image gradient examples]
Image representations: histograms
Global histogram
- Represent distribution of features
• Color, texture, depth, …
[Figure: “Space Shuttle Cargo Bay” image and its global histogram]
Images from Dave Kauchak
Image representations: histograms
• Data samples in 2D
[Scatter plot: samples over Feature 1 and Feature 2]
• Probability or count of data in each bin
• Marginal histogram on feature 1
[Figure: the samples with bins along Feature 1]
• Marginal histogram on feature 2
[Figure: the samples with bins along Feature 2]
• Joint histogram
[Figure: the samples with a grid of bins over both features]
Modeling multi-dimensional data
[Figure: the same samples modeled with a joint histogram vs. marginal histograms]
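The marginal and joint histograms above map directly onto NumPy's `histogram` and `histogram2d`; the Gaussian toy data and the choice of 8 bins are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy 2D data: 500 samples of (feature 1, feature 2)
data = rng.normal(size=(500, 2))

# marginal histograms: counts per bin along each feature separately
h1, edges1 = np.histogram(data[:, 0], bins=8)
h2, edges2 = np.histogram(data[:, 1], bins=8)

# joint histogram: counts on an 8x8 grid over both features
joint, ex, ey = np.histogram2d(data[:, 0], data[:, 1], bins=8)

# summing the joint histogram over feature 2 recovers the marginal on feature 1
assert np.allclose(joint.sum(axis=1), h1)
```

Note how quickly the joint representation grows: 8 bins per dimension gives 64 bins in 2D but 8^d bins in d dimensions, which motivates the clustering-based quantization discussed below.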
Computing histogram distance
• Histogram intersection (similarity):
  intersection(h_i, h_j) = Σ_{m=1}^{K} min(h_i(m), h_j(m))
• Chi-squared distance:
  χ²(h_i, h_j) = ½ Σ_{m=1}^{K} (h_i(m) − h_j(m))² / (h_i(m) + h_j(m))
[Rubner et al. The Earth Mover's Distance as a Metric for Image Retrieval, IJCV 2000]
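Both measures are one-liners over K-bin histograms stored as NumPy arrays; the small `eps` guard against empty bins is an added assumption, not in the formulas:

```python
import numpy as np

def intersection(hi, hj):
    """Histogram intersection: overlap between two normalized histograms.
    Equals 1.0 for identical normalized histograms."""
    return np.minimum(hi, hj).sum()

def chi_squared(hi, hj, eps=1e-10):
    """Chi-squared distance: 0.5 * sum (hi - hj)^2 / (hi + hj).
    eps avoids division by zero when both bins are empty."""
    return 0.5 * ((hi - hj) ** 2 / (hi + hj + eps)).sum()

a = np.array([0.5, 0.3, 0.2])
b = np.array([0.4, 0.4, 0.2])
print(intersection(a, a), chi_squared(a, a))  # 1.0 0.0
print(intersection(a, b))                     # 0.9
```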
Histograms: implementation issues
• Quantization
• Grids: fast but applicable only with few dimensions
• Clustering: slower but can quantize data in higher
dimensions
• Matching
• Histogram intersection or Euclidean may be faster
• Chi-squared often works better
• Earth mover’s distance is good when nearby bins
represent similar values
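The two quantization strategies can be sketched as follows; the grid range, bin counts, sample points, and cluster centers are toy assumptions:

```python
import numpy as np

def grid_quantize(X, bins_per_dim, lo, hi):
    """Grid quantization: fast, but the number of bins grows
    exponentially with the feature dimension."""
    idx = np.floor((X - lo) / (hi - lo) * bins_per_dim).astype(int)
    idx = np.clip(idx, 0, bins_per_dim - 1)
    # flatten per-dimension bin indices into a single bin id
    return np.ravel_multi_index(idx.T, (bins_per_dim,) * X.shape[1])

def cluster_quantize(X, centers):
    """Cluster quantization: assign each sample to its nearest center,
    which scales to higher dimensions."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

X = np.array([[0.1, 0.1], [0.9, 0.9]])
print(grid_quantize(X, 4, 0.0, 1.0))   # [ 0 15]
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
print(cluster_quantize(X, centers))    # [0 1]
```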
What kind of things do we compute histograms of?
• Color
• Image patches
• BoW histogram
• Codewords
Image categorization with bag of
words
Training
1. Extract keypoints and descriptors for all training images
2. Cluster descriptors
3. Quantize descriptors using cluster centers to get “visual words”
4. Represent each image by normalized counts of “visual words”
5. Train classifier on labeled examples using histogram values as features
Testing
6. Extract keypoints/descriptors and quantize into visual words
7. Compute visual word histogram
8. Compute label or confidence using classifier
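Steps 2 to 4 and 6 to 7 above can be sketched end to end with NumPy (the keypoint detector of steps 1 and 6 and the classifier of steps 5 and 8 are omitted); the minimal k-means, the 8-D stand-in descriptors, and the vocabulary size of 10 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Minimal k-means for building the visual-word codebook (step 2)."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        assign = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(0)
    return centers

def bow_histogram(descriptors, centers):
    """Quantize descriptors to their nearest codeword (step 3), then
    build a normalized count histogram (step 4)."""
    words = np.linalg.norm(descriptors[:, None] - centers[None], axis=2).argmin(1)
    h = np.bincount(words, minlength=len(centers)).astype(float)
    return h / h.sum()

# stand-in for descriptors pooled from all training images
train_desc = rng.normal(size=(200, 8))
codebook = kmeans(train_desc, k=10)

# represent one "image" (its set of descriptors) as a BoW histogram
img_desc = rng.normal(size=(30, 8))
h = bow_histogram(img_desc, codebook)
print(h.shape)  # (10,)
```

The resulting fixed-length histogram `h` is what the classifier of steps 5 and 8 would consume.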
Bag of visual words image classification
• Posterior probability
• Average/max pooling
• Second-order pooling [Joao et al. PAMI 2014]
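Average and max pooling of per-descriptor codes can be illustrated directly; the one-hot toy codes below are an assumption:

```python
import numpy as np

# per-descriptor codeword assignments as one-hot rows (3 descriptors, 3 codewords)
codes = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0]])

avg_pooled = codes.mean(axis=0)  # soft histogram of codeword frequencies
max_pooled = codes.max(axis=0)   # presence/absence of each codeword
print(avg_pooled)
print(max_pooled)  # [1. 1. 0.]
```

Average pooling keeps frequency information, while max pooling only records whether a codeword occurred at all.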
2012 ImageNet 1K (Fall 2012)
[Bar chart: classification error (0–40) by team; entries include ISI, Oxford, XRCE/INRIA, LEAR, U. of Amsterdam, and SuperVision]
Shallow vs. deep learning
• Engineered vs. learned features
[Diagram: shallow pipeline (Image → Feature extraction → Classifier → Label) vs. deep pipeline (Image → stacked Convolution layers → Dense layers → Label)]
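The two building blocks of the deep stack, convolution (implemented as cross-correlation, the CNN convention) and max pooling, can be sketched in plain NumPy; the hand-set edge filter and toy image are assumptions standing in for learned filters and real inputs:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Direct 2D 'valid' convolution (no kernel flip, i.e. cross-correlation)."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i+kh, j:j+kw] * kernel).sum()
    return out

def max_pool2(x):
    """2x2 max pooling with stride 2 (odd trailing rows/cols are dropped)."""
    H, W = x.shape
    return x[:H//2*2, :W//2*2].reshape(H//2, 2, W//2, 2).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)
edge = np.array([[1.0, -1.0]])                 # a horizontal difference filter
feat = np.maximum(conv2d_valid(img, edge), 0)  # convolution + ReLU
print(max_pool2(feat).shape)  # (3, 2)
```

In a real CNN the filter values are learned from labeled data rather than engineered, which is the point of the slide above.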
Gradient-Based Learning Applied to Document Recognition, LeCun, Bottou, Bengio and Haffner, Proc. of the IEEE, 1998
+ Data*
ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky, Sutskever, and Hinton, NIPS 2012
* Rectified activations and dropout
Slide Credit: L. Zitnick
Convolutional activation features
• Shape of regions
• Image features
• Coverage, concision, directness
• Color, gradients, textures, motion, descriptors
• Histogram, feature encoding, and pooling
• CNN as features
• Image/region categorization
Next lecture –
Foundations of Deep Learning
[Pipeline: Training Images + Training Labels → Image Features → Trained Classifier; Test Image → Image Features → Trained Classifier → Prediction (“Outdoor”)]