Object Detection
Biplab Banerjee
The object detection problem
Datasets
• Face detection
• One category: face
• Frontal faces
• Fairly rigid, unoccluded
1990’s
Human Face Detection in Visual Scenes. H. Rowley, S. Baluja, T. Kanade. 1995.
Pedestrians
• One category: pedestrians
• Slight pose variations and small distortions
• Partial occlusions
2000’s
Histograms of Oriented Gradients for Human Detection. N. Dalal and B. Triggs. CVPR 2005.
PASCAL VOC
• 20 categories
• 10K images
• Large pose variations, heavy occlusions
• Generic scenes
• Cleaned up performance metric
2007 - 2012
COCO
• 80 diverse categories
• 100K images
• Heavy occlusions, many objects per image, large scale variations
Why is detection hard(er)?
• Precise localization
• Counting
• Small objects
Object Detection
(Figure labels: deer, cat)
Object Detection as Classification
CNN: deer? cat? background?
Object Detection as Classification with Sliding Window
CNN: deer? cat? background?
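The sliding-window formulation can be sketched as follows: enumerate fixed-size windows at a given stride and pass each crop to the classifier. This is only a minimal illustration; the function name `sliding_windows`, the 480 x 640 image, the 64 x 64 window, and the stride of 8 are hypothetical choices, not values from the lecture.

```python
def sliding_windows(img_h, img_w, win_h, win_w, stride):
    """Yield (x1, y1, x2, y2) for every window that fits fully inside the image."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)


# Hypothetical setting: a single window size on a 480 x 640 image.
windows = list(sliding_windows(480, 640, 64, 64, 8))
num_windows = len(windows)  # each window would need one classifier evaluation
```

In this setting there are 53 x 73 = 3869 windows for one scale and one aspect ratio; every additional scale or aspect ratio adds a comparable number of CNN evaluations.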
Problems with sliding window approach
What makes for effective detection proposals? J. Hosang, R. Benenson, P. Dollar, B. Schiele. In TPAMI
What do we do with proposals?
• Each proposal is a group of pixels
• Take the tight-fitting box around it and classify it
• Can leverage any image classification approach
Horse
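As a rough sketch of the "tight-fitting box" step, assuming a proposal is given as a boolean pixel mask (the mask representation and the helper name `tight_box` are assumptions made for illustration), the box is simply the extent of the mask's pixels:

```python
import numpy as np


def tight_box(mask):
    """Return the tightest (x1, y1, x2, y2) box around the True pixels of a 2-D mask.

    Coordinates are inclusive pixel indices.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("empty proposal mask")
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Hypothetical proposal: a small blob plus one stray pixel.
mask = np.zeros((6, 8), dtype=bool)
mask[2:5, 1:4] = True
mask[3, 5] = True
box = tight_box(mask)
```

The resulting box can then be cropped and handed to any image classifier.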
Proposal methods results (figure panels: VOC 2007, VOC 2010)
Classification + Regression
R-CNN: Regions with CNN features
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. R. Girshick, J. Donahue, T. Darrell, J. Malik. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
Slide credit: Ross Girshick
R-CNN at test time: Step 2
a. Crop
b. Scale (anisotropic) to 227 x 227
c. Forward propagate
Output: “fc7” features
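The crop and anisotropic scaling can be sketched as below. This is a simplified illustration: it uses nearest-neighbour sampling in plain NumPy (an assumption made to keep the example self-contained; real pipelines typically use an image library with smoother interpolation), and `crop_and_warp` is a hypothetical helper name.

```python
import numpy as np


def crop_and_warp(image, box, size=227):
    """Crop box = (x1, y1, x2, y2), with x2 and y2 exclusive, and resize to size x size.

    The aspect ratio is ignored, so the two axes are scaled independently.
    """
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    h, w = crop.shape[:2]
    # Independent source indices per axis give the anisotropic scaling.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return crop[rows[:, None], cols[None, :]]


# Hypothetical 300 x 400 RGB image and a 200 x 100 proposal.
image = np.random.default_rng(0).integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
patch = crop_and_warp(image, (50, 40, 250, 140))  # shape (227, 227, 3)
```

The warped patch is what would be forward propagated through the CNN to obtain the feature vector.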
R-CNN at test time: Step 3
person? 1.6
...
horse? -0.3
...
Bounding-box regression
• Linear regression on CNN features
• Maps the original proposal to the predicted object bounding box
How do we deal with multiple detections on the same object?
(Figure: overlapping detections with scores 0.9 and 0.8)
Other details - Non-max suppression
• Go down the list of detections starting from highest scoring
• Eliminate any detection that overlaps highly with a higher scoring detection
• Separate, heuristic step
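The procedure in the bullets above can be sketched as a greedy loop. "Overlaps highly" is interpreted here as intersection-over-union above a threshold; the threshold of 0.5 and the function names are assumptions for illustration, not values given in the lecture.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-max suppression; returns kept indices, highest score first."""
    order = np.argsort(-scores)          # go down the list from the highest score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]  # drop boxes overlapping the kept one
    return keep


# Two overlapping detections (scores 0.9 and 0.8) and one separate detection.
boxes = np.array([[10, 10, 110, 110], [20, 20, 120, 120], [200, 200, 300, 300]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
```

Here the 0.8 detection has IoU of about 0.68 with the 0.9 detection and is suppressed, so the kept indices are 0 and 2. In practice the step is usually run separately per class.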
Selective search
Fine-tune the CNN
Bounding box regressor
Normalized difference between predicted and true box
Learnable parameter
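A sketch of the normalized differences, assuming the slide refers to the parameterization used in the R-CNN paper: center offsets are divided by the proposal's width and height, and size changes are expressed as log ratios. In R-CNN these targets are predicted by a linear function of the CNN features whose weights are the learnable parameters; the helper names and example boxes below are hypothetical.

```python
import numpy as np


def regression_targets(proposal, gt):
    """Targets (tx, ty, tw, th) for boxes given as (center_x, center_y, width, height)."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return np.array([
        (gx - px) / pw,      # horizontal shift, normalized by proposal width
        (gy - py) / ph,      # vertical shift, normalized by proposal height
        np.log(gw / pw),     # log-scale change in width
        np.log(gh / ph),     # log-scale change in height
    ])


def apply_deltas(proposal, deltas):
    """Invert the parameterization: turn predicted deltas back into a box."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return np.array([px + pw * dx, py + ph * dy, pw * np.exp(dw), ph * np.exp(dh)])


proposal = (100.0, 100.0, 50.0, 80.0)      # hypothetical proposal
gt = (110.0, 96.0, 60.0, 80.0)             # hypothetical ground-truth box
t = regression_targets(proposal, gt)
recovered = apply_deltas(proposal, t)      # equals gt up to rounding
```

Normalizing by the proposal size makes the targets comparable across small and large boxes.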
Fast R-CNN
Fast R-CNN: a closer look
Time comparison
Two issues
• How to find the location in the feature maps for a given RoI
• How to reshape the RoIs in the feature maps so they can be fed to the fc layers
Transform the original RoI onto the feature maps
Problems
• The conversion may introduce quantization error.
• Remember each box is represented by (x, y, w, h)
• Since VGG reduces the feature map to 1/16 of the original image size, x/16 and y/16 may be fractional.
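The quantization problem can be illustrated with a small sketch. It assumes a stride of 16, as in the VGG case above, and rounds every coordinate down; the helper name `project_roi` and the example box are hypothetical, and real implementations differ in exactly how they round.

```python
def project_roi(x, y, w, h, stride=16):
    """Map an image-space box (x, y, w, h) to feature-map coordinates.

    Returns the exact (fractional) box and the rounded-down (quantized) box.
    """
    exact = tuple(v / stride for v in (x, y, w, h))
    quantized = tuple(int(v // stride) for v in (x, y, w, h))
    return exact, quantized


# Hypothetical RoI in image coordinates.
exact, quantized = project_roi(296, 192, 200, 145)
# exact     -> (18.5, 12.0, 12.5, 9.0625)
# quantized -> (18, 12, 12, 9)
```

The gap between the exact and quantized values is the misalignment that gets scaled back up by the stride when mapped to the image.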
Green – displacement
Blue – loss of information
RoIPool and RoIAlign
RoIPool
Quantization twice
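A minimal single-channel sketch of RoIPool that marks the two quantization steps: once when the RoI is snapped to whole feature-map cells, and again when the snapped region is divided into output bins. The floor/ceil rounding, the 2 x 2 output size, and the name `roi_pool` are simplifying assumptions for illustration; library implementations differ in detail.

```python
import math

import numpy as np


def roi_pool(feature_map, roi, out_size=2, stride=16):
    """Max-pool an image-space RoI (x1, y1, x2, y2) to an out_size x out_size grid."""
    # Quantization 1: snap the RoI to whole feature-map cells.
    x1 = math.floor(roi[0] / stride)
    y1 = math.floor(roi[1] / stride)
    x2 = max(x1 + 1, math.ceil(roi[2] / stride))
    y2 = max(y1 + 1, math.ceil(roi[3] / stride))
    region = feature_map[y1:y2, x1:x2]
    h, w = region.shape

    out = np.empty((out_size, out_size), dtype=feature_map.dtype)
    for i in range(out_size):
        # Quantization 2: bin boundaries are rounded to whole cells.
        r0 = (i * h) // out_size
        r1 = max(r0 + 1, math.ceil((i + 1) * h / out_size))
        for j in range(out_size):
            c0 = (j * w) // out_size
            c1 = max(c0 + 1, math.ceil((j + 1) * w / out_size))
            out[i, j] = region[r0:r1, c0:c1].max()
    return out


# Hypothetical 8 x 8 feature map whose value at (row, col) is 8 * row + col.
feature_map = np.arange(64, dtype=float).reshape(8, 8)
pooled = roi_pool(feature_map, (20, 20, 100, 90))
```

RoIAlign avoids both roundings by keeping fractional coordinates and sampling the feature map with bilinear interpolation.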
Faster R-CNN
Can we get rid of the ad hoc proposal generation technique?
Region proposal network
RPN
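The slides do not spell out how the RPN enumerates candidate boxes. In the Faster R-CNN paper, it scores and refines a fixed set of reference boxes ("anchors") placed at every feature-map location, using 3 scales and 3 aspect ratios. The sketch below generates such anchors; the stride of 16, the centering convention, and the name `make_anchors` are assumptions made for illustration.

```python
import numpy as np


def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as an (feat_h * feat_w * 9, 4) array of (x1, y1, x2, y2)."""
    # One (width, height) pair per scale/ratio; ratio = height / width, area = scale^2.
    shapes = np.array([(s / np.sqrt(r), s * np.sqrt(r)) for s in scales for r in ratios])

    # Anchor centers: the middle of each feature-map cell, in image coordinates.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)
    centers = np.stack([cx.ravel(), cy.ravel()], axis=1)       # (N, 2)

    half = shapes / 2.0                                         # (9, 2)
    top_left = centers[:, None, :] - half[None, :, :]           # (N, 9, 2)
    bottom_right = centers[:, None, :] + half[None, :, :]       # (N, 9, 2)
    return np.concatenate([top_left, bottom_right], axis=2).reshape(-1, 4)


# Hypothetical 38 x 50 feature map (roughly a 600 x 800 image at stride 16).
anchors = make_anchors(38, 50)   # 38 * 50 * 9 = 17100 anchors
```

The RPN then predicts, for each anchor, an objectness score and box offsets, replacing the external proposal step.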
Faster R-CNN training
Mask R-CNN