Object Detection

The document discusses object detection and summarizes the evolution of datasets and methods over time. It describes early datasets focused on single categories such as faces and pedestrians, then broader datasets such as PASCAL VOC and COCO, which cover more categories and harder challenges. It reviews techniques including sliding windows, object proposals generated by segmentation, and the R-CNN approach, which combines region proposals with CNN features for classification and bounding box regression.

Uploaded by

Atul Verma

Object detection

Biplab Banerjee
The object detection problem
Datasets

• Face detection
• One category: face
• Frontal faces
• Fairly rigid, unoccluded

1990’s
Human Face Detection in Visual Scenes. H. Rowley, S. Baluja, T. Kanade. 1995.
Pedestrians

• One category: pedestrians
• Slight pose variations and small distortions
• Partial occlusions

2000's
Histograms of Oriented Gradients for Human Detection. N. Dalal and B. Triggs. CVPR 2005.
PASCAL VOC

• 20 categories
• 10K images
• Large pose variations, heavy occlusions
• Generic scenes
• Cleaned up performance metric

2007 - 2012
COCO

• 80 diverse categories
• 100K images
• Heavy occlusions, many objects per image, large scale variations

2014 -
Evaluation metric
Matching detections to ground truth
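The slide names the matching step without spelling it out. A minimal sketch of one common greedy variant follows, assuming boxes are (x1, y1, x2, y2) tuples and an IoU threshold of 0.5; the function names `iou` and `match_detections` are illustrative, not taken from the slides, and benchmark toolkits differ in details.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(detections, ground_truth, iou_threshold=0.5):
    """Label each detection True (true positive) or False (false positive).

    detections: list of (score, box); ground_truth: list of boxes.
    Each ground-truth box is matched at most once, so a duplicate
    detection of an already-matched object counts as a false positive.
    """
    order = sorted(range(len(detections)),
                   key=lambda i: detections[i][0], reverse=True)
    used = set()
    labels = [False] * len(detections)
    for i in order:
        _, box = detections[i]
        best_j, best_iou = None, iou_threshold
        for j, gt in enumerate(ground_truth):
            if j in used:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            used.add(best_j)
            labels[i] = True
    return labels
```

The resulting True/False labels, ordered by score, are the input needed to build a precision-recall curve.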
Why is detection hard(er)?

• Precise localization
• Much larger impact of pose
• Occlusion makes localization difficult
• Counting
• Small objects
Object Detection
[Figure: example image with two detected objects labeled "deer" and "cat"]
Object Detection as Classification
[Figure: a CNN classifies an image window as deer, cat, or background]
Object Detection as Classification with Sliding Window
[Figure: the same CNN applied at each sliding-window position: deer, cat, or background]
Problems with sliding window approach

1. Fine-tune the CNN with this new training data

2. Pass the sliding windows through the CNN for binary classification
3. Huge computational cost! Can we do better?
Dealing with scale
• Use same window size, but run on image pyramid
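To make the cost of this scheme concrete, here is a small sketch that enumerates window positions over an image pyramid. The window size (64), stride (16), and pyramid scale factor (1.5) are illustrative assumptions, not values from the slides.

```python
def pyramid_sizes(width, height, window=64, scale=1.5):
    """Yield (w, h) for each pyramid level until the window no longer fits."""
    w, h = float(width), float(height)
    while w >= window and h >= window:
        yield int(w), int(h)
        w, h = w / scale, h / scale


def sliding_windows(width, height, window=64, stride=16):
    """Yield the (x, y) top-left corner of each window position."""
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            yield x, y


def count_windows(width, height, window=64, stride=16, scale=1.5):
    """Total number of fixed-size windows across all pyramid levels."""
    return sum(
        sum(1 for _ in sliding_windows(w, h, window, stride))
        for w, h in pyramid_sizes(width, height, window, scale)
    )
```

Each counted window would need its own CNN forward pass, which is where the computational cost of the sliding-window approach comes from.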
Scanning window results on PASCAL

Method                                   VOC 2007   VOC 2010
DPM v5 (Girshick et al. 2011)            33.7%      29.6%
UVA sel. search (Uijlings et al. 2013)   -          35.1%
Regionlets (Wang et al. 2013)            41.7%      39.7%
SegDPM (Fidler et al. 2013)              -          40.4%
R-CNN                                    54.2%      50.2%
R-CNN + bbox regression                  58.5%      53.7%

metric: mean average precision (higher is better)

Slide credit: Ross Girshick
Idea 2: Object proposals
• Use segmentation to produce ~5K candidates

Selective Search for Object Recognition

J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, A. W. M. Smeulders
In International Journal of Computer Vision 2013.
• Many different segmentation algorithms (k-means on color, k-means on color+position, N-cuts, ...)
• Many hyperparameters (number of clusters, weights on edges)
• Try everything!
• Every cluster is a candidate object
• Thousands of segmentations -> thousands of candidate objects
• Tens of ways of generating candidates ("proposals")
• What fraction of ground truth objects have proposals near them?

What makes for effective detection proposals? J. Hosang, R. Benenson, P. Dollar, B. Schiele. In TPAMI.
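The question of what fraction of ground truth objects have a proposal near them can be read as a recall measurement. A minimal sketch, assuming "near" means IoU at or above a threshold (0.5 here, an assumption) and boxes are (x1, y1, x2, y2); the function names are illustrative:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def proposal_recall(ground_truth, proposals, iou_threshold=0.5):
    """Fraction of ground-truth boxes covered by at least one proposal."""
    if not ground_truth:
        return 0.0
    covered = sum(
        1 for gt in ground_truth
        if any(iou(gt, p) >= iou_threshold for p in proposals)
    )
    return covered / len(ground_truth)
```

Raising the threshold makes the measure stricter: a proposal method can have high recall at IoU 0.5 and much lower recall at 0.9.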
What do we do with proposals?
• Each proposal is a group of pixels
• Take the tight-fitting box around it and classify it
• Can leverage any image classification approach
[Figure: a proposal box classified as "Horse"]
Proposal methods results

Method                                   VOC 2007   VOC 2010
DPM v5 (Girshick et al. 2011)            33.7%      29.6%
UVA sel. search (Uijlings et al. 2013)   -          35.1%
Regionlets (Wang et al. 2013)            41.7%      39.7%
SegDPM (Fidler et al. 2013)              -          40.4%
R-CNN                                    54.2%      50.2%
R-CNN + bbox regression                  58.5%      53.7%

metric: mean average precision (higher is better)
So, we do this
A better approach

Classification + Regression
R-CNN: Regions with CNN features

1. Input image
2. Extract region proposals (~2k / image)
3. Compute CNN features
4. Classify regions (linear SVM)

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
R. Girshick, J. Donahue, T. Darrell, J. Malik
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014
Slide credit: Ross Girshick
R-CNN at test time: Step 2 (compute CNN features)

a. Crop
b. Scale (anisotropic) to 227 x 227
c. Forward propagate
Output: "fc7" features
R-CNN at test time: Step 3 (classify regions)

• Warped proposal -> 4096-dimensional fc7 feature vector -> linear classifiers (SVM or softmax)
• Example scores: person? 1.6, horse? -0.3
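Scoring a region with per-class linear classifiers amounts to one dot product per class. The sketch below uses a 3-dimensional feature instead of the 4096-dimensional fc7 vector, and the weights are hypothetical values chosen only so that the scores reproduce the slide's example (1.6 and -0.3); the function name is illustrative.

```python
def class_scores(feature, weights, biases):
    """Score one feature vector with one linear classifier per class.

    feature: list of floats (4096 values for fc7; shorter here for illustration).
    weights: dict mapping class name -> weight vector of the same length.
    biases: dict mapping class name -> float.
    """
    return {
        name: sum(w_i * f_i for w_i, f_i in zip(w, feature)) + biases[name]
        for name, w in weights.items()
    }


# Hypothetical feature and weights, for illustration only.
feature = [1.0, 2.0, 0.5]
weights = {"person": [0.5, 0.5, 0.2], "horse": [-0.5, 0.1, 0.0]}
biases = {"person": 0.0, "horse": 0.0}
scores = class_scores(feature, weights, biases)
```

A positive score favors the class and a negative score argues against it, so this region would be labeled as a person rather than a horse.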
Step 4: Object proposal refinement

• Bounding-box regression: linear regression on CNN features
• Maps the original proposal to the predicted object bounding box
R-CNN results on PASCAL

Method                                   VOC 2007   VOC 2010
DPM v5 (Girshick et al. 2011)            33.7%      29.6%
UVA sel. search (Uijlings et al. 2013)   -          35.1%
Regionlets (Wang et al. 2013)            41.7%      39.7%
SegDPM (Fidler et al. 2013)              -          40.4%
R-CNN                                    54.2%      50.2%
R-CNN + bbox regression                  58.5%      53.7%

metric: mean average precision (higher is better)
Training R-CNN
• Train convolutional network on ImageNet classification
• Fine-tune on detection
• Classification problem!
• Proposals with IoU > 50% with a ground-truth box are positives
• Sample fixed proportion of positives in each batch because of imbalance
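The labeling and sampling steps above can be sketched as follows. The batch size of 128 and the 25% positive fraction are assumptions for illustration (the slides only say "fixed proportion"), and the function names are hypothetical.

```python
import random


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals, ground_truth, iou_threshold=0.5):
    """Split proposals into positives (IoU > threshold with any ground-truth box) and negatives."""
    positives, negatives = [], []
    for p in proposals:
        best = max((iou(p, gt) for gt in ground_truth), default=0.0)
        (positives if best > iou_threshold else negatives).append(p)
    return positives, negatives


def sample_batch(positives, negatives, batch_size=128,
                 positive_fraction=0.25, rng=random):
    """Draw a batch with a fixed share of positives to counter class imbalance."""
    n_pos = min(len(positives), int(batch_size * positive_fraction))
    n_neg = min(len(negatives), batch_size - n_pos)
    return rng.sample(positives, n_pos), rng.sample(negatives, n_neg)
```

Without the fixed proportion, a uniformly sampled batch would be dominated by background proposals, since most of the roughly 2k proposals per image do not overlap any object.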
Other details - Non-max suppression
• How do we deal with multiple detections on the same object?
[Figure: two overlapping detections on one object, with scores 0.9 and 0.8]
• Go down the list of detections starting from highest scoring
• Eliminate any detection that overlaps highly with a higher scoring detection
• Separate, heuristic step
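The procedure above can be sketched in a few lines. This version follows the usual convention of comparing each candidate only against detections that have already been kept; the 0.5 overlap threshold and the (x1, y1, x2, y2) box format are assumptions, and the function names are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the highest-scoring detection of each overlapping group.

    detections: list of (score, box). Returns kept detections, highest score first.
    """
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Keep the box only if it does not overlap highly with any kept box.
        if all(iou(box, kept_box) <= iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept
```

With the two overlapping detections from the figure (scores 0.9 and 0.8), only the 0.9 detection would survive, provided their IoU exceeds the threshold.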
Selective search
Fine-tune the CNN
Bounding box regressor
• Normalized difference between predicted and true box
• Learnable parameter
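One way to make "normalized difference" concrete is the parameterization commonly used with R-CNN-style regressors: center offsets divided by the proposal size, and log ratios for width and height. The slide's own formula did not survive extraction, so treat the sketch below as an assumption; boxes are (cx, cy, w, h) and the function names are illustrative.

```python
import math


def encode(proposal, target):
    """Regression targets that map a proposal onto a ground-truth box.

    Boxes are (cx, cy, w, h). Offsets are normalized by the proposal size
    so that the targets do not depend on the absolute box scale.
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = target
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))


def decode(proposal, deltas):
    """Apply predicted deltas to a proposal to get the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + pw * tx, py + ph * ty,
            pw * math.exp(tw), ph * math.exp(th))
```

During training the regressor learns to predict the `encode` output from the CNN features of the proposal; at test time `decode` turns the prediction back into a box.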
Fast R-CNN
Fast R-CNN: a closer look
Time comparison
Two issues
• How to find the location in the feature maps for a given RoI
• How to reshape the RoIs in the feature maps so they can be fed to the FC layers
Transform the original RoI into feature maps
Problems
• The conversion may introduce quantization error.
• Remember each box is represented by (x, y, w, h)
• Since the feature map in VGG is 1/16th of the original image size, x/16 and y/16 may be fractions.
[Figure legend: green = displacement, blue = loss of information]
RoIPool and RoIAlign
RoIPool
Quantization twice
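A small numeric sketch of the two quantization steps, assuming a stride of 16 as in the VGG example above. The RoI coordinates and the simple integer bin split are illustrative assumptions; actual RoIPool implementations differ in how they round bin boundaries.

```python
def project_roi(x, y, w, h, stride=16):
    """Map an image-space box onto the feature map without rounding."""
    return x / stride, y / stride, w / stride, h / stride


def quantize_roi(x, y, w, h, stride=16):
    """First quantization: round the projected coordinates down to integers."""
    return x // stride, y // stride, w // stride, h // stride


def pool_bin_edges(length, bins):
    """Second quantization: split `length` feature cells into integer-sized bins."""
    return [(i * length) // bins for i in range(bins + 1)]


exact = project_roi(296, 192, 200, 145)     # fractional feature-map coordinates
rounded = quantize_roi(296, 192, 200, 145)  # integer coordinates after rounding down
edges = pool_bin_edges(rounded[3], 2)       # uneven bins along the height
```

Here the exact height on the feature map is 9.0625 cells but the quantized RoI covers 9, and splitting 9 cells into 2 bins gives bins of 4 and 5 cells. RoIAlign avoids both roundings by sampling the feature map at fractional positions.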
Faster R-CNN
Can we get rid of proposal generation by an ad-hoc technique?
Region proposal network (RPN)
Faster R-CNN training
Mask R-CNN

• Faster R-CNN detector + FCN segmentation
• Binary segmentation inside each bounding box
YOLO
• Extremely fast (45 frames per second)

• Reason globally on the entire image

• Can learn generalizable representations

Stages
• Each cell predicts B=2 boxes (x, y, w, h) and a confidence for each box: P(Object)
• The cell that contains an object's center is responsible for predicting that detection (in the example figure, the dog)
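To illustrate cell responsibility, the sketch below finds the grid cell that contains a box center and computes the size of the per-image output. The 7 x 7 grid, 20 classes, and 5 values per box (x, y, w, h, confidence) are assumptions for illustration; the slides only state B=2.

```python
def responsible_cell(cx, cy, image_w, image_h, grid=7):
    """Return (row, col) of the grid cell containing the object center."""
    col = min(int(cx / image_w * grid), grid - 1)
    row = min(int(cy / image_h * grid), grid - 1)
    return row, col


def output_size(grid=7, boxes_per_cell=2, num_classes=20):
    """Number of values predicted per image: each cell outputs
    boxes_per_cell * (x, y, w, h, confidence) plus class probabilities."""
    return grid * grid * (boxes_per_cell * 5 + num_classes)
```

Because every cell predicts its boxes in a single forward pass over the whole image, there is no per-proposal computation, which is consistent with the speed noted above.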
Comparisons
Evaluations
Precision and recall
• True positive
• True negative
• False positive
• False negative
Some interpretations
PR curve, for different confidence thresholds
Average Precision and mean AP (for N classes)

If all points are taken, AP is also called the AUC (area under the PR curve)
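A minimal sketch of these metrics, assuming each detection has already been labeled as a true or false positive and using all-points interpolation of the PR curve (benchmarks differ in the interpolation they use). The function names are illustrative.

```python
def precision_recall_curve(scored_labels, num_ground_truth):
    """Precision and recall after each detection, in descending score order.

    scored_labels: list of (score, is_true_positive).
    """
    tp = fp = 0
    precisions, recalls = [], []
    for _, is_tp in sorted(scored_labels, key=lambda s: s[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))      # TP / (TP + FP)
        recalls.append(tp / num_ground_truth)  # TP / (TP + FN)
    return precisions, recalls


def average_precision(precisions, recalls):
    """Area under the PR curve using all points."""
    # Interpolate: make precision non-increasing as recall grows.
    interp = list(precisions)
    for i in range(len(interp) - 2, -1, -1):
        interp[i] = max(interp[i], interp[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(interp, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values (dict: class name -> AP)."""
    return sum(ap_per_class.values()) / len(ap_per_class)
```

Lowering the confidence threshold corresponds to moving further down the sorted list: recall can only grow, while precision may rise or fall.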
