Object Detection1

Uploaded by

singhkirti61634
Copyright © All Rights Reserved

• Object Detection
• The RCNN Object Detector (2014)
• The Fast RCNN Object Detector (2015)
• The Faster RCNN Object Detector (2015)
• YOLO (CVPR 2016)
• SSD (ECCV 2016)

Object Detection

(Figure: example image with objects labeled "deer" and "cat".)
Object Detection

Fully Connected (4096 to k): Class Scores
• Deer: 0.9
• Cat: 0.05
• Umbrella: 0.01

Fully Connected (4096 to 4): Box Coordinates (x, y, w, h)
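
The two fully connected heads above can be sketched as follows. This is a minimal illustration, not the slide author's implementation: the random weights, the softmax over class scores, and the helper names (`linear`, `softmax`, `make_layer`) are all assumptions, and the feature vector is a random stand-in for real CNN output.

```python
import math
import random


def linear(weights, bias, x):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 (assumed normalization)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_layer(n_out, n_in, rng):
    """Hypothetical helper: small random weights and zero biases."""
    weights = [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
FEATURE_DIM = 4096
CLASSES = ["deer", "cat", "umbrella"]  # k = 3 in this toy example

cls_w, cls_b = make_layer(len(CLASSES), FEATURE_DIM, rng)  # 4096 to k
box_w, box_b = make_layer(4, FEATURE_DIM, rng)             # 4096 to 4

features = [rng.random() for _ in range(FEATURE_DIM)]  # stand-in for CNN features
class_scores = softmax(linear(cls_w, cls_b, features))
box = linear(box_w, box_b, features)                   # (x, y, w, h)
```

Both heads read the same 4096-dimensional vector, so this design yields exactly one class distribution and one box per image.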
Object Detection

4096-dimensional features:
• Deer: (x, y, w, h)
• Cat: (x, y, w, h)
Object Detection

4096-dimensional features:
• Penguin: (x, y, w, h)
• Penguin: (x, y, w, h)
• Penguin: (x, y, w, h)
• Penguin: (x, y, w, h)

Object Detection as Classification

CNN: deer? cat? background?
Object Detection as Classification with Sliding Window

CNN: deer? cat? background?
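
A rough sketch of the sliding-window idea: enumerate fixed-size windows over the image and classify each crop. The window size, stride, image size, and the `classify_window` stub are illustrative assumptions, not values from the slides.

```python
def sliding_windows(image_w, image_h, win_w, win_h, stride):
    """Yield (x, y, w, h) for every window that fits fully inside the image."""
    for y in range(0, image_h - win_h + 1, stride):
        for x in range(0, image_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


def classify_window(window):
    """Hypothetical stand-in for the CNN classifier (deer? cat? background?)."""
    return "background"


# One window size and one stride on a 640 x 480 image.
windows = list(sliding_windows(640, 480, 128, 128, 32))
labels = [classify_window(w) for w in windows]
print(len(windows))  # 204
```

Even this single scale requires 204 CNN evaluations. Covering several window sizes and aspect ratios multiplies that number, which is the cost that box proposals aim to reduce.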
Object Detection as Classification with Box Proposals
RCNN

https://people.eecs.berkeley.edu/~rbg/papers/r-cnn-cvpr.pdf
Rich feature hierarchies for accurate object detection and semantic segmentation.
Girshick et al. CVPR 2014.
RCNN
First stage: generate category-independent region proposals.
• 2000 region proposals for every image

Selective Search: combines the strength of both an exhaustive search and segmentation.
Uijlings et al. IJCV 2013.
RCNN
First stage: generate category-independent region proposals.
• 2000 region proposals for every image

Second stage: extract a fixed-length feature vector from each region.
• a 4096-dimensional feature vector from each region proposal
• Arbitrary rectangles? A fixed-size input? Each region is warped to 227 x 227.
• CNN: 5 conv layers + 2 fully connected layers produce the feature vector.
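
The warp step can be illustrated with a small nearest-neighbor resize that ignores aspect ratio. This is a simplified sketch: the function name `warp_region`, the toy grayscale image, and the sampling scheme are assumptions, and the context padding that the R-CNN paper adds around each box is omitted.

```python
def warp_region(image, box, out_size=227):
    """Crop box = (x, y, w, h) from `image` (a list of pixel rows) and
    stretch it to out_size x out_size with nearest-neighbor sampling."""
    x, y, w, h = box
    warped = []
    for row in range(out_size):
        src_y = y + row * h // out_size  # source row inside the box
        warped.append(
            [image[src_y][x + col * w // out_size] for col in range(out_size)]
        )
    return warped


# Toy 100 x 60 grayscale image; any rectangular proposal maps to 227 x 227.
image = [[(r * 100 + c) % 256 for c in range(100)] for r in range(60)]
patch = warp_region(image, (10, 5, 40, 20))
```

Whatever the shape of the proposal, the CNN always receives a 227 x 227 input, which is what makes a fixed-length feature vector possible.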
RCNN
First stage: generate category-independent region proposals.
• 2000 region proposals for every image

Second stage: extract a fixed-length feature vector from each region.
• a 4096-dimensional feature vector from each region proposal

Third stage: a set of class-specific linear SVMs.
• object category and location
• feature vector to linear SVM: people? horse? background?
• proposal location to bounding box regression: x, y, w, h
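
Bounding box regression refines a proposal toward the true box. The sketch below uses the parameterization from the R-CNN paper (center offsets scaled by proposal size, log-space width and height); the function names `encode` and `decode` and the example boxes are illustrative assumptions. Boxes here are (center x, center y, width, height).

```python
import math


def encode(proposal, target):
    """Regression targets that map `proposal` onto `target`."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = target
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def decode(proposal, deltas):
    """Apply predicted deltas to a proposal to get the refined box."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (px + pw * dx, py + ph * dy, pw * math.exp(dw), ph * math.exp(dh))


proposal = (100.0, 80.0, 50.0, 40.0)
ground_truth = (110.0, 78.0, 60.0, 44.0)
deltas = encode(proposal, ground_truth)
refined = decode(proposal, deltas)  # recovers ground_truth up to rounding
```

Scaling the offsets by the proposal size keeps the targets comparable across small and large boxes.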
RCNN:
• Simple and scalable.
• Improves mAP.
• A multistage pipeline.
• Training is expensive in space and time (features are extracted from each region proposal in each image and written to disk).
• Object detection is slow.

Fast-RCNN:
• ?
Fast-RCNN

Idea: No need to recompute features for every box independently.

https://arxiv.org/abs/1504.08083
Fast R-CNN. Girshick. ICCV 2015.
Fast-RCNN

• Process the whole image with several convolutional (conv) and max pooling layers to produce a conv feature map.
• A region of interest (RoI) pooling layer extracts a fixed-length feature vector from the region feature map.
• FC + softmax: K + 1 categories.
• FC + regressor: four real-valued numbers for each of the K object classes.
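
RoI pooling can be sketched as max pooling over a grid of bins laid on top of the region, so that any region size yields the same output size. This is a single-channel, pure-Python illustration; the bin-boundary rounding (floor for the start, ceiling for the end) and the 2 x 2 output are assumptions chosen for readability.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region roi = (x0, y0, x1, y1), inclusive feature-map
    coordinates, into a fixed out_h x out_w grid."""
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0 + 1, x1 - x0 + 1
    pooled = []
    for i in range(out_h):
        top = y0 + (i * roi_h) // out_h
        bottom = y0 + ceil_div((i + 1) * roi_h, out_h)  # exclusive
        row = []
        for j in range(out_w):
            left = x0 + (j * roi_w) // out_w
            right = x0 + ceil_div((j + 1) * roi_w, out_w)  # exclusive
            row.append(max(feature_map[y][x]
                           for y in range(top, bottom)
                           for x in range(left, right)))
        pooled.append(row)
    return pooled


# Toy 6 x 8 feature map whose value at (row r, col c) is r * 8 + c.
feature_map = [[r * 8 + c for c in range(8)] for r in range(6)]
pooled = roi_max_pool(feature_map, (1, 1, 5, 4), 2, 2)
print(pooled)  # [[19, 21], [35, 37]]
```

Because the conv feature map is computed once per image, only this cheap pooling step is repeated for each region.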


RCNN vs Fast-RCNN

Figure adapted from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf


RCNN:
• Simple and scalable.
• Improves mAP.
• A multistage pipeline.
• Training is expensive in space and time (features are extracted from each region proposal in each image and written to disk).
• Object detection is slow.

Fast-RCNN:
• Higher mAP.
• Single stage, end-to-end training.
• No disk storage is required for feature caching.
• Proposals are the computational bottleneck in detection systems.

Faster-RCNN:
• ?
Faster-RCNN

Idea: Integrate the Bounding Box Proposals as part of the CNN predictions.

https://arxiv.org/abs/1506.01497
Ren et al. NIPS 2015.
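Faster-RCNN
Region Proposal Networks (RPN):

• Shared conv layers produce the feature map, which is also used by Fast-RCNN.
• An n x n conv layer runs as a sliding window (n x n) over the feature map.
• k anchor boxes at each sliding-window position.
• cls layer (1x1 conv layer): 2k scores (object or not object).
• reg layer (1x1 conv layer): 4k coordinates (bounding box proposal).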
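
To make the k anchors concrete, the sketch below places anchors at every feature-map position. The three scales, three aspect ratios (k = 9), and stride of 16 follow the defaults reported in the Faster R-CNN paper; the function name `make_anchors`, the toy feature-map size, and the center placement at the middle of each cell are assumptions.

```python
def make_anchors(feat_w, feat_h, stride, scales, ratios):
    """Return (cx, cy, w, h) anchors in image coordinates, with
    k = len(scales) * len(ratios) anchors per feature-map position."""
    anchors = []
    for fy in range(feat_h):
        for fx in range(feat_w):
            cx = (fx + 0.5) * stride
            cy = (fy + 0.5) * stride
            for scale in scales:
                for ratio in ratios:  # ratio = height / width
                    w = scale / ratio ** 0.5
                    h = scale * ratio ** 0.5  # w * h == scale ** 2
                    anchors.append((cx, cy, w, h))
    return anchors


scales = (128, 256, 512)
ratios = (0.5, 1.0, 2.0)
k = len(scales) * len(ratios)          # 9 anchors per position
anchors = make_anchors(4, 3, 16, scales, ratios)

cls_outputs_per_position = 2 * k       # object / not object for each anchor
reg_outputs_per_position = 4 * k       # box coordinates for each anchor
```

The cls and reg layers predict relative to these fixed reference boxes, so the network only has to score and adjust anchors rather than generate boxes from scratch.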

RCNN:
• Simple and scalable.
• Improves mAP.
• A multistage pipeline.
• Training is expensive in space and time (features are extracted from each region proposal in each image and written to disk).
• Object detection is slow.

Fast-RCNN:
• Higher mAP.
• Single stage, end-to-end training.
• No disk storage is required for feature caching.
• Proposals are the computational bottleneck in detection systems.

Faster-RCNN:
• Compute proposals with a deep convolutional neural network: the Region Proposal Network (RPN).
• Merge RPN and Fast R-CNN into a single network, enabling nearly cost-free region proposals.
YOLO: You Only Look Once

Idea: No bounding box proposal. A single regression problem, straight from image pixels to bounding box coordinates and class probabilities.

• extremely fast
• reasons globally
• learns generalizable representations

https://arxiv.org/abs/1506.02640
Redmon et al. CVPR 2016.
YOLO: You Only Look Once

Divide the image into 7x7 cells.
• Each cell trains a detector.
• The detector needs to predict the object's class distribution.
• The detector has 2 bounding-box predictors to predict bounding boxes and confidence scores.
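
The grid layout implies a fixed-size output tensor. The sketch below computes its shape and finds which cell an object falls into. The 7x7 grid and 2 box predictors come from the slide. The 20 classes, the 448 x 448 input, and the rule that the cell containing the object's center is responsible for it are taken from the YOLO paper and are assumptions here. The function names are hypothetical.

```python
S, B, C = 7, 2, 20  # grid size, boxes per cell, classes (C = 20 is an assumption)


def output_shape(s, b, c):
    """Each cell predicts b boxes of (x, y, w, h, confidence) plus c class
    probabilities."""
    return (s, s, b * 5 + c)


def responsible_cell(cx, cy, image_w, image_h, s):
    """Grid cell (row, col) that contains the object center (cx, cy)."""
    col = min(s - 1, int(cx / image_w * s))
    row = min(s - 1, int(cy / image_h * s))
    return row, col


shape = output_shape(S, B, C)
cell = responsible_cell(224, 100, 448, 448, S)
print(shape, cell)  # (7, 7, 30) (1, 3)
```

The whole detection output is one tensor produced in a single forward pass, which is where the speed comes from.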
Non-Max Suppression
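
A minimal sketch of greedy non-max suppression: keep the highest-scoring box, drop any remaining box that overlaps a kept box too much, and repeat. The corner-format boxes (x1, y1, x2, y2), the 0.5 IoU threshold, and the example boxes are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the distant third box survives.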
Questions?
