IT5409 Ch7 Part1 Object Detection v2
Object Detection
• Problem: detecting and localizing generic objects from various categories, such as cars, people, etc.
• Challenges:
‒ illumination,
‒ viewpoint,
‒ deformations,
‒ intra-class variability
Window-based generic object detection
Basic pipeline
Generic category recognition: basic framework
• Build/train object model
‒ Choose a representation
‒ Learn or fit parameters of the model / classifier
Window-based models
Building an object model
• Given the representation, train a binary classifier
[Figure: a car/non-car classifier labels each input window “Yes, a car.” or “No, not a car.”]
Window-based models
Generating and scoring candidates
• But what if we were looking for buses?
[Figure: sliding the car classifier over a bus image yields “No bus found!”; a bus-specific model yields “Bus found”]
Multi-scale sliding window
• Work with multiple size windows
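A minimal sketch (not from the slides) of this multi-scale search, assuming a grayscale NumPy image and nearest-neighbour resizing; the downstream window classifier is left abstract:

```python
import numpy as np
from itertools import product

def sliding_windows(image, window=(64, 64), step=16, scales=(1.0, 0.75, 0.5)):
    """Yield (x, y, scale, patch) for a fixed-size window slid over
    several rescaled copies of the image."""
    h, w = image.shape[:2]
    for s in scales:
        # Nearest-neighbour resize via index sampling (a real pipeline
        # would use cv2.resize or skimage.transform.rescale).
        ys = np.minimum((np.arange(int(h * s)) / s).astype(int), h - 1)
        xs = np.minimum((np.arange(int(w * s)) / s).astype(int), w - 1)
        scaled = image[ys][:, xs]
        for y, x in product(range(0, scaled.shape[0] - window[1] + 1, step),
                            range(0, scaled.shape[1] - window[0] + 1, step)):
            # Map the window corner back to original-image coordinates.
            yield int(x / s), int(y / s), s, scaled[y:y + window[1],
                                                    x:x + window[0]]
```

A detector would then keep the windows whose classifier score exceeds a threshold, e.g. `[(x, y, s) for x, y, s, p in sliding_windows(img) if classify(p)]`, where `classify` is a trained car/non-car model.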
Window-based object detection: recap
Training:
1. Obtain training data
2. Define features
3. Define classifier
[Figure: feature extraction on each window feeds a car/non-car classifier]
Features:
• HOG
• Haar features, …
Discriminative classifier construction:
• Nearest neighbor (10⁶ examples), neural networks, …
Boosting classifiers
Boosting intuition
[Figure sequence: fit weak classifier 1; increase the weights of its misclassified examples; fit weak classifier 2 on the reweighted data; increase weights again; fit weak classifier 3; the final classifier is a combination of the weak classifiers]
Boosting: training
• Initially, weight each training example equally
• In each boosting round:
‒ Find the weak learner that achieves the lowest weighted training error
‒ Raise weights of training examples misclassified by the current weak learner
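Below is a minimal sketch (not from the slides) of this training loop; `find_best_weak_learner` is a hypothetical helper returning the weak classifier with lowest weighted error under the current weights:

```python
import numpy as np

def adaboost(X, y, find_best_weak_learner, rounds=50):
    """Toy AdaBoost loop following the recipe above.
    X: (n, d) features; y: labels in {-1, +1};
    find_best_weak_learner(X, y, w) -> (predict_fn, weighted_error)."""
    n = len(y)
    w = np.full(n, 1.0 / n)                     # weight each example equally
    ensemble = []
    for _ in range(rounds):
        h, err = find_best_weak_learner(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # vote of this round's learner
        w *= np.exp(-alpha * y * h(X))          # raise weights of mistakes
        w /= w.sum()
        ensemble.append((alpha, h))
    # Final classifier: sign of the weighted combination of weak learners.
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in ensemble))
```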
Viola-Jones face detector
Main idea:
‒ Represent local texture with efficiently computable “rectangular” features within the window of interest
‒ Select discriminative features to be weak classifiers
‒ Use a boosted combination of them as the final classifier
‒ Form a cascade of such classifiers, rejecting clear negatives quickly
Viola-Jones detector: features
• “Rectangular” filters: the feature output is the difference between the sums of adjacent regions
• Efficiently computable with the integral image: any rectangular sum can be computed in constant time
[Figure: integral image; the value at (x, y) is the sum of all pixels above and to the left of (x, y)]
Slide credit: Lana Lazebnik
Computing the integral image
• Cumulative row sum: s(x, y) = s(x−1, y) + i(x, y)
• Integral image: ii(x, y) = ii(x, y−1) + s(x, y)
Slide credit: Lana Lazebnik
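A small NumPy sketch of the integral image, using a zero-padded, exclusive convention so that rectangle sums need no boundary checks (a choice of this sketch, not mandated by the slides):

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[:y, :x] (exclusive), via cumulative sums."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle [x, x+w) x [y, y+h), in O(1):
    four array lookups, independent of the rectangle size."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]
```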
Viola-Jones detector: features
• “Rectangular” filters, computed in constant time with the integral image (as above)
• Avoid scaling images → scale the features directly, for the same cost
Viola-Jones detector: features
• Considering all possible filter parameters (position, scale, and type), there are 180,000+ possible features associated with each 24 × 24 window
• Which subset of these features should we use to determine if a window has a face?
• Use AdaBoost both to select the informative features and to form the classifier
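As a hedged illustration, a two-rectangle feature of this kind can be evaluated in constant time with the `rect_sum` helper sketched above (left region minus the horizontally adjacent right region):

```python
def two_rect_feature(ii, x, y, w, h):
    """Haar-like two-rectangle feature on an integral image ii:
    difference between the sums of two adjacent w x h regions."""
    left = rect_sum(ii, x, y, w, h)
    right = rect_sum(ii, x + w, y, w, h)
    return left - right
```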
Viola-Jones detector: AdaBoost
• Want to select the single rectangle feature and threshold that best separates positive (faces) and negative (non-faces) training examples, in terms of weighted error
• Evaluate the weighted error for each feature; pick the best
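A brute-force sketch of this selection step (the original paper uses a faster sorted scan); `feature_values` is assumed to hold each rectangle feature's response on each training window:

```python
import numpy as np

def best_stump(feature_values, y, w):
    """Pick the (feature, threshold, polarity) with lowest weighted error.
    feature_values: (n_features, n_examples); y in {-1, +1}; w: weights."""
    best = (None, None, None, np.inf)
    for j, f in enumerate(feature_values):
        for thresh in np.unique(f):
            for polarity in (+1, -1):
                pred = np.where(polarity * (f - thresh) >= 0, 1, -1)
                err = w[pred != y].sum()      # weighted training error
                if err < best[3]:
                    best = (j, thresh, polarity, err)
    return best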
• Even if the filters are fast to compute, each new image has a lot of possible windows to search
Cascading classifiers for detection
• Chain classifiers from cheap to expensive: the early stages use very few features and reject most clear negatives, so the expensive stages run on only a few windows
Viola-Jones detector: summary
[Figure: face/non-face training data → train cascade of classifiers with AdaBoost → selected features, thresholds, and weights → apply the cascade to each window of a new image]
P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2), 2004.
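A minimal sketch of cascade evaluation at detection time:

```python
def cascade_detect(features, stages):
    """stages: list of (strong_classifier, stage_threshold) pairs,
    ordered from cheapest to most expensive. A window survives only
    if every stage accepts it; clear negatives exit at the first stages."""
    for classify, thresh in stages:
        if classify(features) < thresh:
            return False   # rejected early; later stages never run
    return True
```

In practice, OpenCV ships trained Viola-Jones cascades that can be applied directly ("photo.jpg" is a placeholder path):

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```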
Viola-Jones Face Detector: Results
[Figures: example face detections]
Detecting profile faces? Can we use the same detector?
[Figure: profile-face detection results]
Slide credit: Kristen Grauman
Consumer application: iPhoto
http://www.apple.com/ilife/iphoto/
Slide credit: Lana Lazebnik
Consumer application: iPhoto
http://www.maclife.com/article/news/iphotos_faces_recognizes_cats
Privacy Gift Shop – CV Dazzle
• http://www.wired.com/2015/06/facebook-can-recognize-even-dont-show-face/
• Wired, June 15, 2015
Boosting: pros and cons
• Advantages of boosting
‒ Integrates classification with feature selection
‒ Complexity of training is linear in the number of training examples
‒ Flexibility in the choice of weak learners, boosting scheme
‒ Testing is fast
‒ Easy to implement
• Disadvantages
‒ Needs many training examples
‒ Other discriminative models may outperform in practice (SVMs, CNNs, …)
• especially for many-class problems
SVM + HOG for human detection as a case study
Linear classifiers
• Find a linear function to separate positive and negative examples:
x_i positive: x_i · w + b ≥ 0
x_i negative: x_i · w + b < 0
• Which line is best?
Support Vector Machines (SVMs)
• Discriminative classifier based on the optimal separating line (for the 2D case)
Support vector machines
• Want the line that maximizes the margin:
x_i positive (y_i = 1): x_i · w + b ≥ 1
x_i negative (y_i = −1): x_i · w + b ≤ −1
• For the support vectors (the points on the margin), x_i · w + b = ±1, and the margin width is 2/‖w‖
Finding the maximum margin line
1. Maximize the margin 2/‖w‖
2. Correctly classify all training data points:
x_i positive (y_i = 1): x_i · w + b ≥ 1
x_i negative (y_i = −1): x_i · w + b ≤ −1
• Equivalent quadratic optimization problem:
Minimize (1/2) wᵀw subject to y_i (w · x_i + b) ≥ 1
C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 1998.
Finding the maximum margin line
• Solution: w = Σ_i α_i y_i x_i (the learned weight vector is a combination of the support vectors, the x_i with α_i > 0)
C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 1998.
Finding the maximum margin line
• Solution: w = Σ_i α_i y_i x_i, with b = y_i − w · x_i (for any support vector)
w · x + b = Σ_i α_i y_i (x_i · x) + b
• Classification function:
f(x) = sign(w · x + b) = sign(Σ_i α_i y_i (x_i · x) + b)
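A one-function NumPy sketch of this dual-form classification function (array names are illustrative, not from the slides):

```python
import numpy as np

def svm_decision(x, alpha, y_sv, sv, b):
    """f(x) = sign(sum_i alpha_i y_i <x_i, x> + b).
    sv: (n_sv, d) support vectors; alpha, y_sv: their multipliers/labels.
    Only the support vectors (alpha_i > 0) contribute to the sum."""
    return np.sign((alpha * y_sv) @ (sv @ x) + b)
```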
Person detection with HOG and linear SVMs
• Histogram of oriented gradients (HOG): map each grid cell in the input window to a histogram counting the gradients per orientation
• Train a linear SVM using a training set of pedestrian vs. non-pedestrian windows
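A hedged sketch of this pipeline with scikit-image and scikit-learn; `pos_windows` and `neg_windows` are hypothetical collections of 128×64 grayscale training windows:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(window):
    """Map a 128x64 grayscale window to a HOG descriptor:
    per-cell orientation histograms with block normalization."""
    return hog(window, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

X = np.array([hog_features(w) for w in list(pos_windows) + list(neg_windows)])
y = np.array([1] * len(pos_windows) + [0] * len(neg_windows))
clf = LinearSVC().fit(X, y)   # linear SVM: pedestrian vs. non-pedestrian
```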
Window-based detection: strengths
• Sliding window detection and global appearance descriptors:
‒ Simple detection protocol to implement
‒ Good feature choices critical
‒ Past successes for certain classes
Object proposals
Main idea:
• Learn to generate category-independent regions/boxes that have object-like properties
• Let the object detector search over “proposals” instead of exhaustive sliding windows
[Figure: objectness cues, e.g. multi-scale saliency and color contrast]
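A sketch of the resulting detection loop; `propose` and `classify` are hypothetical stand-ins for a proposal generator and a trained window scorer:

```python
def detect_with_proposals(image, propose, classify, threshold=0.5):
    """Score candidate boxes from a proposal generator instead of
    enumerating every sliding-window position and scale."""
    detections = []
    for (x, y, w, h) in propose(image):          # category-independent boxes
        score = classify(image[y:y + h, x:x + w])
        if score > threshold:
            detections.append((x, y, w, h, score))
    return detections
```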
Deformable Part Model (DPM)
• References:
‒ P. F. Felzenszwalb and D. P. Huttenlocher. Pictorial Structures for Object Recognition. IJCV, 2005. https://www.cs.cornell.edu/~dph/papers/pict-struct-ijcv.pdf
‒ P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627–1645, 2010.
Object detection: Evaluation
Object Detection Benchmarks
• PASCAL VOC Challenge
• ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
‒ 200 categories for detection
How do we evaluate object detection?
[Figure: predicted boxes vs. ground-truth boxes]
• True positive: the overlap (IoU) of the prediction with the ground truth is MORE than a threshold value (e.g., 0.5)
• False positive: the overlap of the prediction with the ground truth is LESS than the threshold value
• False negative: a ground-truth object that our model doesn’t find
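The overlap used above is the intersection over union (IoU); a small sketch with boxes given as (x1, y1, x2, y2):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A prediction counts as a true positive when iou(pred, gt) > 0.5.
```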
precision = TP / (TP + FP)
recall = TP / (TP + FN)
How do we evaluate object detection?
[Figure: predictions vs. ground truth]
• True positive: 1, false positive: 2, false negative: 1
• So precision = 1 / (1 + 2) = 1/3 and recall = 1 / (1 + 1) = 1/2
Precision versus recall
• Precision: how many of the object detections are correct?
precision = TP / (TP + FP)
• Recall: how many of the ground-truth objects can the model detect? Also called the True Positive Rate (TPR).
recall = TP / (TP + FN)
• In reality, our model makes many predictions with varying confidence scores between 0 and 1
[Figure: scored predictions vs. ground truth]
• Sweeping a threshold on the score trades precision against recall
Precision – recall curve (PR curve)
[Figure: PR curves of competing models; which model is the best?]
• AP: the metric calculates the average precision (AP), i.e. the area under the precision–recall curve, for each class individually, across all of the IoU thresholds
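A sketch of a VOC-style AP computation at a single IoU threshold (the precision-envelope smoothing step is omitted for brevity); `is_tp` marks which predictions matched a ground-truth box:

```python
import numpy as np

def average_precision(scores, is_tp, n_ground_truth):
    """Area under the precision-recall curve for one class."""
    order = np.argsort(-np.asarray(scores))      # rank by confidence
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    precision = cum_tp / np.arange(1, len(tp) + 1)
    recall = cum_tp / n_ground_truth
    return np.trapz(precision, recall)           # integrate over recall
```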
Summary
• Object recognition as a classification task
‒ Boosting (face detection example)
‒ Support vector machines and HOG (human detection example)
‒ Sliding window search paradigm
• Pros and cons
• Speed-up with the attentional cascade
• Object proposals, proposal regions as an alternative
References
Most of these slides were adapted from the course materials of Kristen Grauman and Lana Lazebnik.
Thank you!