Image Features and Categorization: Computer Vision Jia-Bin Huang, Virginia Tech

Image categorization involves mapping images to categories or labels based on their visual content. This can be done by first extracting image features that encode visual properties like color, texture, shapes, etc. These features are then fed into a classifier which learns during a training phase to map features to categories/labels. The trained classifier can then take features from new, unlabeled images and predict their categories. Common applications include object, scene, and fine-grained recognition, as well as semantic segmentation.

Image Features and

Categorization

Computer Vision
Jia-Bin Huang, Virginia Tech
Administrative stuff

• Final project proposal


• Due 11:55 PM on Mon, Oct 29
• Find group members on Piazza.
• Submission via Canvas

• HW 4
• Due 11:55pm on Wed, Oct 31

• Demo of modern interactive image segmentation


Review: Interpreting Intensity
• Light and color
–What an image records
• Filtering in spatial domain
• Filtering = weighted sum of neighboring pixels
• Smoothing, sharpening, measuring texture
• Filtering in frequency domain
• Filtering = change frequency of the input image
• Denoising, sampling, image compression
• Image pyramid and template matching
• Filtering = a way to find a template
• Image pyramids for coarse-to-fine search and multi-scale detection
• Edge detection
• Canny edge = smooth -> derivative -> thin -> threshold -> link
Review: Correspondence and Alignment
• Interest points
• Find distinct and repeatable points in images
• Harris-> corners, DoG -> blobs
• SIFT -> feature descriptor
• Feature tracking and optical flow
• Find motion of a keypoint/pixel over time
• Lucas-Kanade:
• brightness consistency, small motion, spatial coherence
• Handle large motion:
• iterative update + pyramid search
• Fitting and alignment
• Find the transformation parameters that best align matched points
• Object instance recognition
• Keypoint-based object instance recognition and search
Review: Perspective and 3D Geometry
• Projective geometry and camera models
• What’s the mapping between image and world coordinates? $x = K [R \; t] X$
• Single view metrology and camera calibration
• How can we measure the size of 3D objects in an image?
• How can we estimate the camera parameters?
• Photo stitching
• What’s the mapping between two images taken without camera translation?
• Epipolar Geometry and Stereo Vision
• What’s the mapping between two images taken with camera translation?
• Structure from motion
• How can we recover 3D points from multiple images?
Review: Grouping and Segmentation
• Grouping and Segmentation
• How do we group pixels into meaningful regions?
• Uses of segmentation: efficiency, better features, object region proposals, or obtaining the segmented object itself

• EM Algorithm, Mixture of Gaussians


• How do we deal with missing data?
• Maximum likelihood estimation
• Probabilistic inference
• Expectation-Maximization algorithm

• MRFs and Graph Cut


• How do we encode pixel dependencies?
• Markov Random Fields
• Graph Cuts
Recognition and Learning
• Image Features and Categorization

• Foundations of Deep Learning

• Convolutional Neural Networks

• Object Detection

• Part and Pixel Labeling

• Action Recognition

• Vision and Language


Today: Image features and categorization

• General concepts of categorization


• Why? What? How?

• Image features
• Color, texture, gradient, shape, interest points
• Histograms, feature encoding, and pooling
• CNN as feature

• Image and region categorization


What do you see in this image?

[Figure: photograph with labeled regions: trees, bear, camera, man, rabbit, grass, forest]
Describe, predict, or interact with the object based on visual cues
• Is it dangerous?
• Is it alive?
• How fast does it run?
• Is it soft?
• Does it have a tail?
• Can I poke with it?
• Can I put stuff in it?
Why do we care about categories?
• From an object’s category, we can make predictions about its future behavior, beyond what is immediately perceived.
• Pointers to knowledge
• Help to understand individual cases not previously encountered
• Communication
Theory of categorization

How do we determine if something is a member of a particular category?

• Definitional approach

• Prototype approach

• Exemplar approach
Definitional approach: classical view of categories
• Plato & Aristotle
• Categories are defined by a list of properties shared by all elements in a category
• Category membership is binary
• Every member in the category is equal

[Figures: The Categories (Aristotle); Aristotle by Francesco Hayez]

Slide Credit: A. A. Efros


Prototype or sum of exemplars?
• Prototype model: category judgments are made by comparing a new exemplar to the prototype.
• Exemplar model: category judgments are made by comparing a new exemplar to all the old exemplars of a category or to the exemplar that is the most appropriate.
Slide Credit: Torralba
Levels of categorization [Rosch 70s]
Definition of Basic Level:
• Similar shape: Basic level categories are the highest-level category for which their members have similar shapes.
• Similar motor interactions: … for which people interact with its members using similar motor sequences.
• Common attributes: … there are a significant number of attributes in common between pairs of members.
[Figure: category hierarchy and a plot of similarity against level (subordinate, basic, superordinate)]
• Superordinate levels: animal, …, quadruped
• Basic level: dog, cat, cow
• Subordinate level: German shepherd, Doberman, …, “Fido”, …
Rosch et al. Principles of categorization, 1978
Image categorization

• Cat vs Dog
Image categorization
• Object recognition

Caltech 101 Average Object Images


Image categorization

• Fine-grained recognition

Visipedia Project
Image categorization
• Place recognition

Places Database [Zhou et al. NIPS 2014]


Image categorization
• Visual font recognition

[Chen et al. CVPR 2014]


Image categorization
• Dating historical photos

1940 1953 1966 1977

[Palermo et al. ECCV 2012]


Image categorization
• Image style recognition

[Karayev et al. BMVC 2014]


Region categorization
• Layout prediction

Assign regions to orientation


Geometric context [Hoiem et al. IJCV 2007]

Assign regions to depth


Make3D [Saxena et al. PAMI 2008]
Region categorization
• Semantic segmentation from RGBD images

[Silberman et al. ECCV 2012]


Region categorization
• Material recognition

[Bell et al. CVPR 2015]


Training phase
[Diagram: Training Images → Image Features → Classifier Training (with Training Labels) → Trained Classifier]
Testing phase
[Diagram, training: Training Images → Image Features → Classifier Training (with Training Labels) → Trained Classifier]
[Diagram, testing: Test Image → Image Features → Trained Classifier → Prediction (e.g., Outdoor)]
• Image features: map images to feature space

[Figure: scatter plot of two classes of samples (x and o) in a 2D feature space; axes x1 and x2]
• Classifiers: map feature space to label space
[Figure: two scatter plots of the x and o samples in feature space; axes x1 and x2]
Different types of classification
• Exemplar-based: transfer category labels from examples with most similar features
• What similarity function? What parameters?
• Linear classifier: confidence in positive label is a weighted sum of features
• What are the weights?
• Non-linear classifier: predictions based on more complex function of features
• What form does the classifier take? Parameters?
• Generative classifier: assign to the label that best explains the features (makes features most likely)
• What is the probability function and its parameters?
Note: You can always fully design the classifier by hand, but usually this is too difficult. Typical solution: learn from training examples.
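The first two classifier types can be sketched in a few lines. This is a minimal illustration, not a recommended implementation: the toy data, the Euclidean distance used as the similarity function, and the hand-picked weights are all assumptions made for the example.

```python
import numpy as np

def nearest_neighbor_predict(train_x, train_y, query):
    """Exemplar-based: copy the label of the most similar training example."""
    # Euclidean distance plays the role of the (assumed) similarity function.
    dists = np.linalg.norm(train_x - query, axis=1)
    return train_y[np.argmin(dists)]

def linear_confidence(weights, bias, query):
    """Linear classifier: confidence in the positive label is a weighted sum of features."""
    return float(np.dot(weights, query) + bias)

# Toy 2D feature space: class 0 near the origin, class 1 near (3, 3).
train_x = np.array([[0.0, 0.0], [0.5, 0.2], [3.0, 3.0], [2.8, 3.3]])
train_y = np.array([0, 0, 1, 1])
query = np.array([2.5, 2.9])

nn_label = nearest_neighbor_predict(train_x, train_y, query)

# Hand-picked weights for illustration only; normally they are learned from training examples.
score = linear_confidence(np.array([1.0, 1.0]), -3.0, query)
linear_label = int(score > 0)
```

Both classifiers assign the query to class 1 here. The difference is in what is stored: the exemplar-based classifier keeps every training example, while the linear classifier keeps only the weights and bias.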
Testing phase
[Diagram, training: Training Images → Image Features → Classifier Training (with Training Labels) → Trained Classifier]
[Diagram, testing: Test Image → Image Features → Trained Classifier → Prediction (e.g., Outdoor)]
Q: What are good features for…
• recognizing a beach?
Q: What are good features for…
• recognizing cloth fabric?
Q: What are good features for…
• recognizing a mug?
What are the right features?

Depend on what you want to know!

• Object: shape
• Local shape info, shading, shadows, texture
• Scene: geometric layout
• linear perspective, gradients, line segments
• Material properties: albedo, feel, hardness
• Color, texture
• Action: motion
• Optical flow, tracked points
General principles of representation
• Coverage
• Ensure that all relevant info is captured

• Concision
• Minimize number of features without sacrificing coverage

• Directness
• Ideal features are independently useful for prediction
Image representations

• Templates
• Intensity, gradients, etc.

• Histograms
• Color, texture, SIFT descriptors, etc.
[Figure: image, intensity template, gradient template]

• Average of features
Image representations: histograms

Global histogram
- Represent distribution of features
• Color, texture, depth, …
[Figure: example image “Space Shuttle Cargo Bay”]
Images from Dave Kauchak
Image representations: histograms
• Data samples in 2D (axes: Feature 1, Feature 2)
• Probability or count of data in each bin
• Marginal histogram on feature 1
• Marginal histogram on feature 2
• Joint histogram
Modeling multi-dimensional data
Joint histogram
• Requires lots of data
• Loss of resolution to avoid empty bins
Marginal histogram
• Requires independent features
• More data/bin than joint histogram
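The trade-off between joint and marginal histograms can be seen on a small example. The sketch below uses NumPy and random 2D samples; the sample count and bin count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.random((200, 2))  # 200 samples, 2 features, values in [0, 1)

bins = 4
edges = np.linspace(0.0, 1.0, bins + 1)

# Joint histogram: one count per (feature 1 bin, feature 2 bin) cell.
joint, _, _ = np.histogram2d(samples[:, 0], samples[:, 1], bins=[edges, edges])

# Marginal histograms: one count per bin for each feature on its own.
marginal_1, _ = np.histogram(samples[:, 0], bins=edges)
marginal_2, _ = np.histogram(samples[:, 1], bins=edges)

# The same 200 samples are spread over 16 joint cells but only 4 marginal bins.
mean_per_joint_cell = joint.mean()
mean_per_marginal_bin = marginal_1.mean()
```

With 4 bins per feature, the joint histogram has 16 cells averaging 12.5 samples each, while each marginal has 4 bins averaging 50. The gap grows exponentially with the number of features, which is why joint histograms need far more data.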
Modeling multi-dimensional data
• Clustering
• Use the same cluster centers for all images
[Figure: 2D feature space (Feature 1 vs. Feature 2) with cluster-based bins]
Computing histogram distance
• Histogram intersection

$$\mathrm{histint}(h_i, h_j) = 1 - \sum_{m=1}^{K} \min\big(h_i(m), h_j(m)\big)$$

• Chi-squared histogram matching distance

$$\chi^2(h_i, h_j) = \frac{1}{2} \sum_{m=1}^{K} \frac{[h_i(m) - h_j(m)]^2}{h_i(m) + h_j(m)}$$

• Earth mover’s distance (cross-bin similarity measure)
• minimal cost paid to transform one distribution into the other

[Rubner et al. The Earth Mover's Distance as a Metric for Image Retrieval, IJCV 2000]
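The histogram intersection and chi-squared distances translate directly into NumPy. In this sketch the histograms are assumed to be normalized to sum to 1, and bins that are empty in both histograms are skipped in the chi-squared sum to avoid dividing by zero (a handling choice made here, not something the slide specifies).

```python
import numpy as np

def histogram_intersection_distance(h_i, h_j):
    """1 minus the summed bin-wise minimum; 0 for identical normalized histograms."""
    return 1.0 - np.minimum(h_i, h_j).sum()

def chi_squared_distance(h_i, h_j):
    """Half the sum of squared bin differences divided by the bin sums."""
    num = (h_i - h_j) ** 2
    den = h_i + h_j
    mask = den > 0  # assumption: bins empty in both histograms contribute nothing
    return 0.5 * np.sum(num[mask] / den[mask])

h_a = np.array([0.5, 0.5, 0.0])
h_b = np.array([0.25, 0.25, 0.5])

d_int = histogram_intersection_distance(h_a, h_b)
d_chi = chi_squared_distance(h_a, h_b)
```

Both functions compare bin m only with bin m. Earth mover’s distance also accounts for mass moved between neighboring bins and is not covered by this sketch.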
Histograms: implementation issues
• Quantization
• Grids: fast but applicable only with few dimensions
• Clustering: slower but can quantize data in higher dimensions
• Few bins: need less data, coarser representation
• Many bins: need more data, finer representation

• Matching
• Histogram intersection or Euclidean may be faster
• Chi-squared often works better
• Earth mover’s distance is good when nearby bins represent similar values
What kind of things do we compute histograms of?

• Color

L*a*b* color space HSV color space


• Texture (filter banks or HOG over regions)
What kind of things do we compute histograms of?
• Histograms of descriptors

SIFT – [Lowe IJCV 2004]

• “Bag of visual words”


Analogy to documents
Document 1: Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.

Words: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

Document 2: China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.

Words: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value

ICCV 2005 short course, L. Fei-Fei


Bag of visual words

• Image patches

• BoW histogram

• Codewords
Image categorization with bag of words

Training
1. Extract keypoints and descriptors for all training images
2. Cluster descriptors
3. Quantize descriptors using cluster centers to get “visual words”
4. Represent each image by normalized counts of “visual words”
5. Train classifier on labeled examples using histogram values as features

Testing
6. Extract keypoints/descriptors and quantize into visual words
7. Compute visual word histogram
8. Compute label or confidence using classifier
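The steps above can be sketched end to end with NumPy. This is an illustrative outline under loud assumptions: random vectors stand in for real keypoint descriptors such as SIFT, the clustering is a plain k-means written for the example, and the helper names (`kmeans`, `quantize`, `bow_histogram`) are hypothetical. The classifier in steps 5 and 8 is left out.

```python
import numpy as np

def quantize(descriptors, centers):
    """Steps 3 and 6: assign each descriptor to its nearest center (its visual word)."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(data, k, iters=20, seed=0):
    """Step 2: plain Lloyd's k-means; returns the k cluster centers (the codebook)."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = quantize(data, centers)
        for c in range(k):
            members = data[labels == c]
            if len(members) > 0:  # keep the old center if a cluster becomes empty
                centers[c] = members.mean(axis=0)
    return centers

def bow_histogram(descriptors, centers):
    """Steps 4 and 7: normalized counts of visual words for one image."""
    words = quantize(descriptors, centers)
    counts = np.bincount(words, minlength=len(centers)).astype(float)
    return counts / counts.sum()

# Step 1 stand-in: 6 "training images", each with 50 random 8-D descriptors.
rng = np.random.default_rng(1)
train_descriptors = [rng.normal(size=(50, 8)) for _ in range(6)]

codebook = kmeans(np.vstack(train_descriptors), k=5)
train_features = np.array([bow_histogram(d, codebook) for d in train_descriptors])
# Step 5 would train any classifier on train_features and the image labels.

# Testing: the same codebook is reused for a new image.
test_feature = bow_histogram(rng.normal(size=(40, 8)), codebook)
# Step 8 would pass test_feature to the trained classifier.
```

The codebook is built once from training descriptors and then reused unchanged at test time, so training and test histograms share the same bins.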
Bag of visual words image classification

[Chatfield et al. BMVC 2011]


Feature encoding
• Hard/soft assignment to clusters

Histogram encoding Kernel codebook encoding

Locality constrained encoding Fisher encoding


[Chatfield et al. BMVC 2011]
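The difference between hard and soft assignment can be shown on a toy codebook. The sketch below is an assumption-laden illustration: the soft variant weights codewords with a Gaussian kernel on squared distance, in the spirit of kernel codebook encoding, and the `gamma` parameter and function names are hypothetical.

```python
import numpy as np

def squared_distances(descriptors, centers):
    """Pairwise squared Euclidean distances, shape (num_descriptors, num_centers)."""
    return ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

def hard_encoding(descriptors, centers):
    """Histogram encoding: each descriptor votes for its single nearest codeword."""
    words = squared_distances(descriptors, centers).argmin(axis=1)
    counts = np.bincount(words, minlength=len(centers)).astype(float)
    return counts / counts.sum()

def soft_encoding(descriptors, centers, gamma=1.0):
    """Soft assignment: each descriptor spreads its vote with a Gaussian kernel."""
    d2 = squared_distances(descriptors, centers)
    # Subtracting the row minimum leaves the normalized weights unchanged
    # and avoids underflow for large distances.
    weights = np.exp(-gamma * (d2 - d2.min(axis=1, keepdims=True)))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights.mean(axis=0)

centers = np.array([[0.0, 0.0], [4.0, 0.0]])
descriptors = np.array([[0.1, 0.0], [1.9, 0.0], [3.9, 0.0]])

hard = hard_encoding(descriptors, centers)
soft = soft_encoding(descriptors, centers, gamma=1.0)
```

The middle descriptor sits almost halfway between the two codewords. Hard assignment gives its whole vote to the first codeword, while soft assignment splits the vote, so the encoding changes smoothly when a descriptor moves across a cluster boundary.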
Fisher vector encoding
• Fit Gaussian Mixture Models

• Posterior probability

• First and second order differences to cluster k

[Perronnin et al. ECCV 2010]
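The three bullets can be written out as follows. This is a sketch of the commonly used formulation, assuming a GMM with K components, mixture weights $\pi_k$, means $\mu_k$, and diagonal covariances with standard deviations $\sigma_k$, fitted to descriptors $x_1, \dots, x_N$. Divisions and squares are element-wise, and the notation is chosen here rather than taken from the slide.

```latex
% Posterior probability of component k for descriptor x_i
q_{ik} = \frac{\pi_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k)}
              {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)}

% First-order differences to cluster k
u_k = \frac{1}{N \sqrt{\pi_k}} \sum_{i=1}^{N} q_{ik} \, \frac{x_i - \mu_k}{\sigma_k}

% Second-order differences to cluster k
v_k = \frac{1}{N \sqrt{2 \pi_k}} \sum_{i=1}^{N} q_{ik}
      \left[ \left( \frac{x_i - \mu_k}{\sigma_k} \right)^{2} - 1 \right]
```

The encoding is the concatenation of $u_k$ and $v_k$ over all K components, so for D-dimensional descriptors it has 2KD entries.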


Performance comparisons

• Fisher vector encoding outperforms others


• Higher-order statistics help

[Chatfield et al. BMVC 2011]


But what about spatial layout?

All of these images have the same color histogram


Spatial pyramid

Compute histogram in each spatial bin


Spatial pyramid

High number of features – PCA to reduce dimensionality

[Lazebnik et al. CVPR 2006]
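Computing a histogram in each spatial bin can be sketched as below. The input is assumed to be a 2D map of visual-word indices (one per image location), the function name is hypothetical, and the per-level weights used in the original spatial pyramid matching kernel are omitted for brevity.

```python
import numpy as np

def spatial_pyramid(word_map, vocab_size, levels=2):
    """Concatenate visual-word histograms over 1x1, 2x2, 4x4, ... grids.

    word_map: 2D integer array with a visual-word index per location.
    """
    height, width = word_map.shape
    features = []
    for level in range(levels + 1):
        cells = 2 ** level
        row_edges = np.linspace(0, height, cells + 1).astype(int)
        col_edges = np.linspace(0, width, cells + 1).astype(int)
        for r in range(cells):
            for c in range(cells):
                cell = word_map[row_edges[r]:row_edges[r + 1],
                                col_edges[c]:col_edges[c + 1]]
                features.append(np.bincount(cell.ravel(), minlength=vocab_size))
    feature = np.concatenate(features).astype(float)
    return feature / feature.sum()

# Two layouts with identical global histograms but different spatial arrangement.
top_bottom = np.array([[0, 0], [1, 1]])
left_right = top_bottom.T

pyr_a = spatial_pyramid(top_bottom, vocab_size=2, levels=1)
pyr_b = spatial_pyramid(left_right, vocab_size=2, levels=1)

# A larger example: 8x8 map, 4 visual words, 3 pyramid levels.
rng = np.random.default_rng(0)
pyr_big = spatial_pyramid(rng.integers(0, 4, size=(8, 8)), vocab_size=4, levels=2)
```

The level-0 part of `pyr_a` and `pyr_b` is identical, but the level-1 cells differ, which is the spatial information a global histogram discards. The feature length grows as vocab_size × (1 + 4 + 16 + …), which is why dimensionality reduction is often applied afterwards.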


Pooling

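• Average/max pooling
[Figure: pooled feature = avg/max over local features. Source: Unsupervised Feature Learning and Deep Learning]

• Second-order pooling
[Figure: pooled feature = avg/max over second-order local features]
[Joao et al. PAMI 2014]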
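The pooling operators can be sketched on a small matrix of local features (one row per feature). The second-order variant here averages the outer products of each feature with itself. Further steps used in the cited work, such as a matrix logarithm, are left out, and the function names are hypothetical.

```python
import numpy as np

def average_pool(features):
    """Mean of each feature dimension over all local features."""
    return features.mean(axis=0)

def max_pool(features):
    """Maximum of each feature dimension over all local features."""
    return features.max(axis=0)

def second_order_average_pool(features):
    """Average of the outer products x x^T; returns a d x d matrix."""
    return features.T @ features / len(features)

features = np.array([[1.0, 0.0],
                     [3.0, 2.0]])

avg = average_pool(features)
mx = max_pool(features)
second = second_order_average_pool(features)
```

For d-dimensional local features, average and max pooling return d values, while second-order pooling returns a symmetric d × d matrix that also captures how feature dimensions co-vary.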
2012 ImageNet 1K
(Fall 2012)
[Figure: bar chart of error (axis 0 to 40) per entry: ISI, Oxford, XRCE/INRIA, U. of Amsterdam, LEAR-XRCE, SuperVision]
Shallow vs. deep learning
• Engineered vs. learned features
[Figure: shallow pipeline: Image → Feature extraction → Pooling → Classifier → Label; deep pipeline: Image → stacked Convolution layers → Dense layers → Label]
Gradient-Based Learning Applied to Document Recognition, LeCun, Bottou, Bengio and Haffner, Proc. of the IEEE, 1998
Imagenet Classification with Deep Convolutional Neural Networks, Krizhevsky, Sutskever, and Hinton, NIPS 2012
Slide Credit: L. Zitnick
[Figure annotation between the 1998 and 2012 papers: GPUs + Data*]
* Rectified activations and dropout
Slide Credit: L. Zitnick
Convolutional activation features

[Donahue et al. ICML 2013]

CNN Features off-the-shelf: an Astounding Baseline for Recognition
[Razavian et al. 2014]
Region representation
• Segment the image into superpixels
• Use features to represent each image segment

Joseph Tighe and Svetlana Lazebnik


Region representation
• Color, texture, BoW
• Only computed within the local region

• Shape of regions

• Position in the image


Working with regions
• Spatial support is important – multiple segmentations

Geometric context [Hoiem et al. ICCV 2005]


• Spatial consistency – MRF smoothing
Things to remember

• Visual categorization helps transfer knowledge

• Image features
• Coverage, concision, directness
• Color, gradients, textures, motion, descriptors
• Histogram, feature encoding, and pooling
• CNN as features

• Image/region categorization
Next lecture –
Foundations of Deep Learning