Object detection
CV3DST | Prof. Leal-Taixé 1
Task definition
• Object detection problem: for each object, predict a bounding box (x, y, w, h) plus a class label
A bit of history
Traditional object detection methods
• 1. Template matching + sliding window
– For every position in the image, evaluate how strongly the image pixels under the window correlate with the template
– LOW correlation: the window does not match the template
– HIGH correlation: the window matches the template, so the object is detected at that position
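A minimal sketch of template matching with a sliding window, assuming grayscale images stored as nested Python lists and normalized cross-correlation as the similarity score. The slides do not fix a particular correlation measure, and the names `ncc` and `match_template` are illustrative.

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation of two equally sized 2D lists, in [-1, 1]."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p, mean_t = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mean_p) ** 2 for a in p) *
                    sum((b - mean_t) ** 2 for b in t))
    # A flat patch or template has zero variance; treat it as "no correlation".
    return num / den if den > 0 else 0.0

def match_template(image, template):
    """Slide the template over every position and return the best (x, y) and score."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best_score, best_pos = -2.0, None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (x, y)
    return best_pos, best_score

# Toy example: the template appears exactly once in the image.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 1, 0],
    [0, 0, 1, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 1], [1, 9]]
position, score = match_template(image, template)
print(position, score)
```

The loop visits every window position at a single scale and aspect ratio, so the cost grows quickly once several scales and aspect ratios have to be searched.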
Traditional object detection methods
• Problems of 1. Template matching + sliding window
– Occlusions: we need to see the WHOLE object
– This works to detect a given instance of an object but not a class of objects: appearance, shape, and pose changes lead to LOW correlation
– Objects have an unknown position, scale, and aspect ratio; a sliding window searches this space inefficiently
Traditional object detection methods
• 2. Feature extraction + classification
Viola-Jones detector
• 2. Feature extraction + classification
– Learning multiple weak learners to build a strong
classifier
– That is, make many small decisions and combine them
for a stronger final decision
Viola and Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
Viola-Jones detector
• 2. Feature extraction + classification
Haar features
Viola and Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
Viola-Jones detector
• 2. Feature extraction + classification
– Step 1: Select your Haar-like features
– Step 2: Integral image for fast feature evaluation
• I can evaluate which parts of the image have the highest cross-correlation with my feature (template)
– Step 3: AdaBoost to select the weak learners
• I cannot possibly evaluate all features at test time for all image locations
• Learn the best set of weak learners
• Our final classifier is the linear combination of all weak learners
Viola and Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
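Step 2 can be illustrated with a small sketch. It assumes a grayscale image as a nested list and a single two-rectangle Haar-like feature (left half minus right half). The function names are illustrative and not taken from the paper.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect_horizontal(ii, x, y, w, h):
    """Two-rectangle feature: left half minus right half (w assumed even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# Toy image with a dark left half and a bright right half (a vertical edge).
img = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [1, 1, 5, 5],
]
ii = integral_image(img)
total = rect_sum(ii, 0, 0, 4, 3)
edge = haar_two_rect_horizontal(ii, 0, 0, 4, 3)
print(total, edge)  # 36 -24
```

Once the integral image is built, every rectangle sum costs four lookups regardless of the rectangle size, which is what makes evaluating many Haar-like features affordable.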
Histogram of Oriented Gradients
• 2. Feature extraction + classification
Gradient: blue arrows show the gradient, i.e., the direction of greatest change of the image.
Average gradient image over training samples → gradients provide shape information. Let us create a descriptor that exploits that.
Dalal and Triggs. Histogram of oriented gradients for human detection. CVPR 2005.
Histogram of Oriented Gradients
• 2. Feature extraction + classification
HOG descriptor → Histogram of oriented gradients.
Compute gradients on a dense grid of cells and, for each cell, create a histogram of the gradient directions.
Dalal and Triggs. Histogram of oriented gradients for human detection. CVPR 2005.
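A minimal sketch of the per-cell histogram, assuming central-difference gradients, 9 unsigned orientation bins over 0 to 180 degrees, and magnitude-weighted votes. It leaves out the bin interpolation and block normalization of the full descriptor, and `cell_histogram` is an illustrative name.

```python
import math

def cell_histogram(cell, n_bins=9):
    """Histogram of gradient directions for one cell, weighted by gradient magnitude."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    # Border pixels are skipped because central differences need both neighbors.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

# Toy cell with a vertical edge: intensity changes only along x,
# so all gradient votes fall into the first bin (around 0 degrees).
cell = [[0, 0, 10, 10] for _ in range(4)]
hist = cell_histogram(cell)
print(hist)
```

Concatenating such histograms over all cells of a detection window gives the feature vector that is passed to the classifier.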
Histogram of Oriented Gradients
• 2. Feature extraction + classification
– Step 1: Choose your training set of images that contain
the object you want to detect.
– Step 2: Choose a set of images that do NOT contain that
object.
– Step 3: Extract HOG features on both sets.
– Step 4: Train an SVM classifier on the two sets to detect
whether a feature vector represents the object of interest
or not (0/1 classification).
Dalal and Triggs. Histogram of oriented gradients for human detection. CVPR 2005.
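Steps 3 and 4 can be sketched as follows. This is only an illustration under several assumptions: the 2D vectors stand in for real HOG features, labels are encoded as +1/-1 instead of 1/0, and the linear SVM is trained with plain subgradient descent on the regularized hinge loss, which is not necessarily the solver used in the paper.

```python
def train_linear_svm(features, labels, epochs=20, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularized hinge loss (labels are +1 or -1)."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Sample violates the margin: hinge term plus regularizer.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Margin satisfied: only the regularizer shrinks w.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(w, b, x):
    """Return +1 (object) or -1 (not the object) for one feature vector."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Placeholder feature vectors; in practice these would be HOG descriptors.
positives = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0]]        # images containing the object
negatives = [[-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]  # images without the object
features = positives + negatives
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(features, labels)
predictions = [predict(w, b, x) for x in features]
print(predictions)
```

At test time the same `predict` call would be applied to the feature vector of every candidate window.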
Histogram of Oriented Gradients
• 2. Feature extraction + classification
HOG features weighted by the positive SVM weights – the ones
used for the pedestrian object classifier.
Dalal and Triggs. Histogram of oriented gradients for human detection. CVPR 2005.
Deformable Part Model
• Also based on HOG features, but detects individual body parts → more robust to different body poses
Felzenszwalb et al. A discriminatively trained, multiscale, deformable part model. CVPR 2008.
How to move towards general object detection?
What defines an object?
• We need a generic, class-agnostic objectness measure: how likely it is for an image region to contain an object (some regions are very likely to be an object, others only maybe)
• Using this measure yields a number of candidate object proposals or regions of interest (RoIs) on which to focus
• Object proposals + classifier
Object proposal methods
• Selective search: van de Sande et al. Segmentation
as selective search for object recognition. ICCV 2011.
• Edge boxes: Zitnick and Dollar. Edge boxes: locating
object proposals from edges. ECCV 2014.
Do we want all proposals?
• Many boxes trying to explain one object
• We need a method to keep only the “best” boxes
Non-Maximum Suppression (NMS)
• Start with an anchor box i
• For every other box j:
– If i and j overlap, discard box i if its score is lower than the score of j
• Overlap = to be defined; score = depends on the task
Region overlap
• We measure region overlap with the Intersection
over Union (IoU) or Jaccard Index:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
[Figure: two overlapping regions A and B, with their intersection and their union highlighted.]
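For axis-aligned boxes the IoU can be computed in closed form. The sketch below assumes boxes given as (x, y, w, h) with (x, y) the top-left corner. The slides do not state which corner or center convention is used, so this convention is an assumption.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x, y, w, h) boxes, (x, y) = top-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 2, 2)))  # intersection 1, union 7
print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
print(iou((0, 0, 2, 2), (5, 5, 2, 2)))  # disjoint boxes
```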
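With overlap defined as IoU, NMS can be sketched in its common greedy form: visit boxes in order of decreasing score and keep a box only if it does not overlap too much with any box already kept. The (x, y, w, h) top-left convention and the default threshold of 0.5 are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x, y, w, h) boxes, (x, y) = top-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Box i is discarded if a higher-scoring kept box overlaps it too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Two boxes explain the same object; a third box covers a different object.
boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 10, 10)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The single `iou_threshold` controls how aggressively neighbors are suppressed, which is the trade-off discussed next.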
NMS: the problem
• Choosing a narrow threshold: false positives remain around the ground truth positions → low precision
• Choosing a wider threshold: a ground truth position is missed (false negative), while a false positive can still remain → low recall
Hosang, Benenson and Schiele. A Convnet for Non-Maximum Suppression. 2015.
Non-Maximum Suppression (NMS)
• NMS will be used at test time. Most detection
methods (even Deep Learning ones) use NMS!
Learning-based detectors
Types of object detectors
• One-stage detectors
Image → Feature extraction → Classification: class score (cat, dog, person)
Image → Feature extraction → Localization: bounding box (x, y, w, h)
• Two-stage detectors
Image → Feature extraction → Extraction of object proposals → Classification: class score (cat, dog, person)
Image → Feature extraction → Extraction of object proposals → Localization: refine bounding box (Δx, Δy, Δw, Δh)
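The difference between predicting a box directly and refining a proposal can be made concrete with a small sketch. The slides do not specify how (Δx, Δy, Δw, Δh) is applied; the code assumes the parameterization commonly used in the R-CNN family, with (x, y) the box center, offsets scaled by the proposal size, and log-space scaling of width and height. `apply_deltas` is an illustrative name.

```python
import math

def apply_deltas(box, deltas):
    """Refine a proposal (x, y, w, h), with (x, y) the center, by (dx, dy, dw, dh)."""
    x, y, w, h = box
    dx, dy, dw, dh = deltas
    return (
        x + dx * w,        # shift the center by a fraction of the width
        y + dy * h,        # shift the center by a fraction of the height
        w * math.exp(dw),  # rescale the width (dw = 0 leaves it unchanged)
        h * math.exp(dh),  # rescale the height
    )

proposal = (50.0, 40.0, 20.0, 10.0)
refined = apply_deltas(proposal, (0.1, -0.2, 0.0, 0.0))
unchanged = apply_deltas(proposal, (0.0, 0.0, 0.0, 0.0))
print(refined)
```

A one-stage detector would output (x, y, w, h) in one pass, whereas a two-stage detector first produces `proposal` and then predicts the deltas for it.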
Types of object detectors
• One-stage detectors
– YOLO, SSD, RetinaNet
– CenterNet, CornerNet, ExtremeNet
• Two-stage detectors
– R-CNN, Fast R-CNN, Faster R-CNN
– SPP-Net, R-FCN, FPN