
CS7015 (Deep Learning) : Lecture 12

Object Detection: R-CNN, Fast R-CNN, Faster R-CNN, You Only Look Once
(YOLO)

Mitesh M. Khapra

Department of Computer Science and Engineering

Indian Institute of Technology Madras

1/47
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 12
Acknowledgements
Some images borrowed from Ross Girshick’s original slides on RCNN, Fast RCNN, etc.
Some ideas borrowed from Kaustav Kundu’s presentation "Deep Object Detection"

Module 12.1 : Introduction to object detection

So far we have looked at image classification
We will now move on to another image processing task: object detection

Task      Image classification    Object detection
Output    Car                     Car, exact bounding box containing car

Let us see a typical pipeline for object detection
[Figure: region proposals → feature extraction (x1, x2, ..., xd) → classifier (person / flag / ball / none)]
It starts with a region proposal stage where we identify potential regions which may contain objects
We could think of these regions as mini-images

[Figure: region proposals → feature extraction (x1, x2, ..., xd) → bounding box regression, mapping each proposed box (w, h) to a corrected box (w∗, h∗)]
In addition we would also like to correct the proposed bounding boxes
This is posed as a regression problem (for example, we would like to predict w∗, h∗ from the proposed w and h)

Let us see how these three components have evolved over time
[Figure: timeline: Pre 2012 → RCNN → Fast RCNN → Faster RCNN]
Pre 2012
Region Proposals: Propose all possible regions in the image of varying sizes (almost brute force)
Feature Extraction: Use handcrafted features (SIFT, HOG)
Classifier: Train a linear classifier using these features
We will now see three algorithms that progressively improve these components

Module 12.2 : RCNN model for object detection

[Figure: RCNN pipeline: input → region proposals → feature extraction → classifier and bounding box regression]
Selective Search for region proposals
Does hierarchical clustering at different scales
For example the figures from left to right show clusters of increasing sizes
Such a hierarchical clustering is important as we may find different objects at different scales

Proposed regions are cropped to form mini images
Each mini image is scaled to match the CNN’s (feature extractor) input size

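The crop-and-scale step can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `crop_and_resize`, the box format `(x0, y0, x1, y1)`, the 224 × 224 target size and the nearest-neighbour sampling are all choices made here, not details given in the slides.

```python
import numpy as np

def crop_and_resize(image, box, out_size=224):
    """Crop box = (x0, y0, x1, y1) from an H x W x C image and warp it
    to out_size x out_size with nearest-neighbour sampling (assumed)."""
    x0, y0, x1, y1 = box
    crop = image[y0:y1, x0:x1]
    h, w = crop.shape[:2]
    # For every output row/column, pick the source row/column it maps to.
    rows = (np.arange(out_size) * h / out_size).astype(int)
    cols = (np.arange(out_size) * w / out_size).astype(int)
    return crop[rows][:, cols]

# Toy image and a hypothetical proposal box.
image = np.arange(100 * 120 * 3).reshape(100, 120, 3)
mini_image = crop_and_resize(image, (10, 20, 70, 50))
```

Note that the aspect ratio of the proposal is not preserved: every proposal is warped to the same square input, whatever its original shape.
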
For feature extraction any CNN trained for Image Classification can be used (AlexNet/VGGNet etc.)
Outputs from the fc7 layer are taken as features
The CNN is fine tuned using ground truth (cropped) object images

Linear models (SVMs) are used for classification (1 model per class)

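A rough sketch of what "one linear model per class" means at test time: each class has its own weight vector, and a region is scored against all of them. The weights below are made-up toy values, the feature size is tiny (fc7 features would be 4096-dimensional), and treating "no positive score" as background is an assumption of this sketch.

```python
import numpy as np

num_classes, feat_dim = 3, 8
# Hypothetical weights: one row (one linear model) per class.
W = np.eye(num_classes, feat_dim)
b = np.zeros(num_classes)

def classify(z):
    """Score feature vector z with every per-class linear model.
    Returns the best class index, or None if no model fires."""
    scores = W @ z + b
    best = int(np.argmax(scores))
    label = best if scores[best] > 0 else None
    return label, scores
```
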
The proposed regions may not be perfect
[Figure: proposed box with center (x, y), width w and height h; true box with center (x∗, y∗), width w∗ and height h∗]
We want to learn four regression models which will learn to predict x∗, y∗, w∗, h∗
We will see their respective objective functions
z : features from pool5 layer of the network
For x∗ the objective is (w_1 is the weight vector of this regressor, w is the width of the proposed box)
$$\min_{w_1} \sum_{i=1}^{N} \left( \frac{x^* - x}{w} - w_1^T z \right)^2$$

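The objective for x∗ is an ordinary least squares problem on the normalised offset (x∗ − x)/w, so it can be solved in closed form. A minimal sketch follows; the function names are hypothetical and the small ridge term `lam` is added here only for numerical stability, it is not part of the objective on the slide.

```python
import numpy as np

def fit_x_regressor(Z, x_prop, w_prop, x_true, lam=1e-3):
    """Z: N x d matrix of features z (one row per proposal).
    Minimises sum_i ((x* - x)/w - w1^T z)^2 + lam * ||w1||^2."""
    t = (x_true - x_prop) / w_prop          # regression targets
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ t)

def predict_x(w1, z, x_prop, w_prop):
    """Invert the normalisation to get the corrected center x*."""
    return x_prop + w_prop * (w1 @ z)
```

The other three regressors (for y∗, w∗, h∗) would be fitted the same way with their own targets.
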
What are the parameters of this model?
[Figure: RCNN pipeline annotated with W_CONV (feature extraction), W_classifier (classifier) and W_regression (bounding box regression)]
W_CONV is taken as it is from a CNN trained for Image classification (say on ImageNet)
W_CONV is then fine tuned using ground truth (cropped) object images
W_classifier is learned using ground truth (cropped) object images
W_regression is learned using ground truth bounding boxes

What is the computational cost for processing one image at test time?
Inference Time = Proposal Time + # Proposals × Convolution Time + # Proposals × Classification Time + # Proposals × Regression Time

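The formula is easy to evaluate directly. The timings below are invented purely to show how the per-proposal terms dominate; they are not measurements from the lecture.

```python
def inference_time(proposal_time, num_proposals, conv_time, cls_time, reg_time):
    """Per-image inference time following the formula above.
    All times are in seconds; the last three are per proposal."""
    return proposal_time + num_proposals * (conv_time + cls_time + reg_time)

# Hypothetical numbers: 2 s for proposals, 2000 proposals,
# 10 ms per CNN pass, 1 ms each for classification and regression.
total = inference_time(2.0, 2000, 0.010, 0.001, 0.001)
```

With these made-up values the CNN passes alone account for 20 of the roughly 26 seconds, which is why sharing the convolution across proposals is the natural next step.
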
On average selective search gives 2K region proposals
Each of these passes through the CNN for feature extraction
Followed by classification and regression
Source: Ross Girshick

No joint learning
Use ad hoc training objectives
Fine tune network with softmax classifier (log loss)
Train post-hoc linear SVMs (hinge loss)
Train post-hoc bounding-box regressors (squared loss)
Training (≈ 3 days) and testing (47s per image) are slow (using VGG-Net)
Takes a lot of disk space
Source: Ross Girshick

RCNN
Region Proposals: Selective Search
Feature Extraction: CNNs
Classifier: Linear

Module 12.3 : Fast RCNN model for object detection

Suppose we apply a 3 × 3 kernel on an image
What is the region of influence of each pixel in the resulting output?
Each pixel contributes to a 3 × 3 region of the output (so a 3 × 3 patch of pixels contributes to a 5 × 5 region)
Suppose we again apply a 3 × 3 kernel on this output
What is the region of influence of the original pixel from the input? (a 5 × 5 region; a 7 × 7 region for the 3 × 3 patch)

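The growth of the region of influence can be checked numerically. The sketch below assumes stride 1, zero padding that keeps the spatial size, and an all-ones kernel so that any influence shows up as a non-zero output; the function names are made up for this illustration. Under these assumptions a single pixel influences 3 × 3 and then 5 × 5 outputs, while a 3 × 3 input patch influences 5 × 5 and then 7 × 7 outputs.

```python
import numpy as np

def influence_size(patch, kernel=3, layers=1):
    """Closed form: each stride-1 conv layer adds (kernel - 1) to the side."""
    return patch + layers * (kernel - 1)

def influence_size_brute(patch, kernel=3, layers=1, n=21):
    """Mark a patch x patch block in an n x n image, apply all-ones
    kernels, and measure the side of the non-zero region."""
    x = np.zeros((n, n))
    c, half = n // 2, patch // 2
    x[c - half:c - half + patch, c - half:c - half + patch] = 1
    for _ in range(layers):
        padded = np.pad(x, kernel // 2)
        # Sum of shifted copies == convolution with an all-ones kernel.
        x = sum(padded[i:i + n, j:j + n]
                for i in range(kernel) for j in range(kernel))
    rows = np.where(x.sum(axis=1) > 0)[0]
    return int(rows[-1] - rows[0] + 1)
```
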
[Figure: VGGNet-16 architecture: 224 × 224 input; conv blocks with 64, 128, 256, 512 and 512 channels, each followed by maxpool (spatial size 224 → 112 → 56 → 28 → 14 → 7); fc 4096, fc 4096, fc 1000, softmax]

Using this idea we could get a bounding box’s region of influence on any layer in the CNN
The projected Region of Interest (RoI) may be of different sizes
Divide each RoI into an H × W grid of k = H × W (roughly) equally sized regions and do max pooling in each of those regions to construct a k dimensional vector
Connect the k dimensional vector to a fully connected layer
This max pooling operation is called RoI pooling
Source: Ross Girshick

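A minimal sketch of RoI pooling on a single feature map, assuming the RoI is given in feature-map coordinates as `(r0, c0, r1, c1)` with exclusive ends and is at least one cell in each direction. The bin boundaries are obtained by simple rounding down, which is one of several reasonable choices and not necessarily the one used in the original implementation.

```python
import numpy as np

def roi_max_pool(feature_map, roi, H=7, W=7):
    """Max-pool an arbitrary-sized RoI of a 2-D feature map into H x W bins."""
    r0, c0, r1, c1 = roi
    region = feature_map[r0:r1, c0:c1]
    h, w = region.shape
    row_edges = np.linspace(0, h, H + 1).astype(int)
    col_edges = np.linspace(0, w, W + 1).astype(int)
    out = np.empty((H, W), dtype=feature_map.dtype)
    for i in range(H):
        rs = row_edges[i]
        re = max(row_edges[i + 1], rs + 1)   # never an empty bin
        for j in range(W):
            cs = col_edges[j]
            ce = max(col_edges[j + 1], cs + 1)
            out[i, j] = region[rs:re, cs:ce].max()
    return out
```

Whatever the RoI size, the output is always H × W, so flattening it gives the fixed k = H × W dimensional vector that the fully connected layer expects.
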
Once we have the FC layer it gives us the representation of this region proposal
We can then add a softmax layer on top of it to compute a probability distribution over the possible object classes
Similarly we can add a regression layer on top of it to predict the new bounding box (w∗, h∗, x∗, y∗)
Source: Ross Girshick

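The two heads share the same region representation and differ only in their output layer. A small sketch, with hypothetical parameter names (`W_cls`, `b_cls`, `W_reg`, `b_reg`) and a single box regressor shared across classes as a simplifying assumption:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())      # subtract max for numerical stability
    return e / e.sum()

def detection_heads(rep, W_cls, b_cls, W_reg, b_reg):
    """rep: FC-layer representation of one region proposal.
    Returns class probabilities and a predicted box (w*, h*, x*, y*)."""
    class_probs = softmax(W_cls @ rep + b_cls)
    box = W_reg @ rep + b_reg
    return class_probs, box
```
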
Recall that the last pooling layer of VGGNet-16 results in an output of size 512 × 7 × 7
We replace the last max pooling layer by a RoI pooling layer
We set H = W = 7 and divide each of these RoIs into (k = 49) regions
We do this for every feature map resulting in an output of size 512 × 49
This output is of the same size as the output of the original max pooling layer
It is thus compatible with the dimensions of the weight matrix connecting the original pooling layer to the first fully connected layer

Fast RCNN
Region Proposals: Selective Search
Feature Extraction: CNN
Classifier: CNN

Module 12.4 : Faster RCNN model for object detection

So far the region proposals were being made using the Selective Search algorithm
Idea: Can we use a CNN for making region proposals also?
How? Well it’s slightly tricky
We will illustrate this using VGGNet
[Figure: image → conv layers → feature maps → Region Proposal Network → proposals → RoI pooling → classifier]

Consider the output of the last convolutional layer of VGGNet
Now consider one cell in one of the 512 feature maps
If we apply a 3 × 3 kernel around this cell then we will get a 1D representation for this cell
If we repeat this for all the 512 feature maps then we will get a 512 dimensional representation for this position
We use this process to get a 512 dimensional representation for each of the w × h positions
[Figure: w × h × 512 feature maps, with a 512 dimensional vector (x1, x2, ..., x512) at each position]

We now consider k bounding boxes (called anchor boxes) of different sizes & aspect ratios
We are interested in the following two questions:
Given the 512d representation of a position, what is the probability that a given anchor box centered at this position contains an object? (Classification)
How do you predict the true bounding box from this anchor box? (Regression)

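Anchor boxes at one position can be generated by combining a few scales with a few aspect ratios. The sketch below uses 3 scales and 3 ratios (so k = 9); these particular values, the function name and the corner-coordinate output format are illustrative choices, not something fixed by the slides.

```python
import itertools
import math

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) boxes (x0, y0, x1, y1)
    centered at (cx, cy). Each box has area scale^2 and width/height = ratio."""
    anchors = []
    for s, r in itertools.product(scales, ratios):
        w = s * math.sqrt(r)
        h = s / math.sqrt(r)
        anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

The same k anchors are reused at every one of the w × h positions, only the center changes.
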
We train a classification model and a regression model to address these two questions
How do we get the ground truth data?
What is the objective function used for training?

Consider a ground truth object and its corresponding bounding box
Consider the projection of this image onto the conv5 layer
Consider one such cell in the output
This cell corresponds to a patch in the original image
Consider the center of this patch
We consider anchor boxes of different sizes
For each of these anchor boxes, we would want the classifier to predict 1 if this anchor box has a reasonable overlap (IoU > 0.7) with the true bounding box

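The overlap test relies on intersection over union (IoU). A short sketch, assuming boxes are given as corner coordinates `(x0, y0, x1, y1)`; the helper name `anchor_label` is hypothetical, and only the "IoU > 0.7 gives label 1" rule from the slide is implemented.

```python
def iou(a, b):
    """Intersection over union of two boxes (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

def anchor_label(anchor, true_box, threshold=0.7):
    """Classification target for one anchor: 1 if it overlaps enough."""
    return 1 if iou(anchor, true_box) > threshold else 0
```
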
The full network is trained using the following objective.
$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \frac{\lambda}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$$
p_i^∗ = 1 if anchor box contains ground truth object, 0 otherwise
p_i = predicted probability of anchor box containing an object
N_cls = batch-size
N_reg = batch-size × k
k = number of anchor boxes

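The objective can be written out directly in NumPy. The slide does not spell out L_cls and L_reg; this sketch assumes a binary log loss for L_cls and a smooth L1 loss for L_reg, and takes t_i and t_i^∗ to be 4-dimensional box parameter vectors. The function names, the default `lam` and the small `eps` are illustrative.

```python
import numpy as np

def smooth_l1(d):
    """Smooth L1 (assumed choice for L_reg), applied elementwise."""
    a = np.abs(d)
    return np.where(a < 1.0, 0.5 * a ** 2, a - 0.5)

def rpn_loss(p, p_star, t, t_star, n_cls, n_reg, lam=1.0, eps=1e-12):
    """p, p_star: shape (A,); t, t_star: shape (A, 4) for A anchors."""
    l_cls = -(p_star * np.log(p + eps) + (1 - p_star) * np.log(1 - p + eps))
    l_reg = smooth_l1(t - t_star).sum(axis=1)
    # The factor p_star switches the regression term off for negative anchors.
    return l_cls.sum() / n_cls + lam * (p_star * l_reg).sum() / n_reg
```
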
So far we have seen a CNN based approach for region proposals instead of using selective search
We can now take these region proposals and then add fast RCNN on top of them to predict the class of the object
And regress the proposed bounding box
[Figure: input → conv → max-pool → 512d representation (x1, x2, ..., x512) → classification and regression → region proposals → Fast RCNN]

But the fast RCNN would again use a VGG Net
Can’t we use a single VGG Net and share the parameters of RPN and RCNN?
Yes, we can
In practice, we use a 4 step alternating training process

Faster RCNN: Training
1. Fine-tune RPN using a pre-trained ImageNet network
2. Fine-tune fast RCNN from a pre-trained ImageNet network using bounding boxes from step 1
3. Keeping common convolutional layer parameters fixed from step 2, fine-tune RPN (post conv5 layers)
4. Keeping common convolutional layer parameters fixed from step 3, fine-tune fc layers of fast RCNN

Faster RCNN and RPN are the basis of several 1st place entries in the ILSVRC and COCO tracks on:
ImageNet detection
COCO segmentation
ImageNet localization
COCO detection

Faster RCNN
Region Proposals: CNN
Feature Extraction: CNN
Classifier: CNN

[Figure: Object Detection Performance. Source: Ross Girshick]

Module 12.5 : YOLO model for object detection

The approaches that we have seen so far are two stage approaches
They involve a region proposal stage and then a classification stage
Can we have an end-to-end architecture which does both proposal and classification simultaneously?
This is the idea behind YOLO (You Only Look Once).
[Figure: two stage pipeline: image → conv layers → feature maps → Region Proposal Network → proposals → RoI pooling → classifier]

Divide an image into S × S grids (S = 7)
For each such cell we are interested in predicting 5 + k quantities
Probability (confidence) that this cell is indeed contained in a true bounding box
Width of the bounding box
Height of the bounding box
Center (x, y) of the bounding box
Probability of the object in the bounding box belonging to each of the k classes (k values)
The output layer thus contains S × S × (5 + k) elements
[Figure: S × S grid on input → bounding boxes + confidence and class probability map → final detections; each cell outputs (c, w, h, x, y, P(cow), P(dog), P(truck), ...)]

How do we interpret this S × S × (5 + k) dimensional output?
For each cell, we are computing a bounding box, its confidence and the object in it
We then retain the most confident bounding boxes and the corresponding object label
[Figure: input image → S × S grid on input → bounding boxes & confidence and class probability map → final detections]

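Reading the output tensor can be sketched as follows, assuming each cell stores its values in the order (c, w, h, x, y, class probabilities). The 0.5 confidence threshold and the function name are assumptions made for this sketch; the slides only say that the most confident boxes are retained.

```python
import numpy as np

def confident_detections(output, threshold=0.5):
    """output: array of shape (S, S, 5 + k).
    Returns (confidence, (x, y, w, h), class index), most confident first."""
    detections = []
    S = output.shape[0]
    for row in range(S):
        for col in range(S):
            c, w, h, x, y = output[row, col, :5]
            if c >= threshold:
                label = int(np.argmax(output[row, col, 5:]))
                box = (float(x), float(y), float(w), float(h))
                detections.append((float(c), box, label))
    return sorted(detections, key=lambda d: d[0], reverse=True)
```

For S = 7 and, say, k = 3 classes, the input to this function would be a 7 × 7 × 8 array, i.e. 392 numbers per image.
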
How do we train this network?
Consider a cell such that the center of the true bounding box lies in it
The network is initialized randomly and it will predict some values for c, w, h, x, y & ℓ
[Figure: predicted vector (ĉ, ŵ, ĥ, x̂, ŷ, ℓ̂1, ℓ̂2, ..., ℓ̂k) against the true vector (c, w, h, x, y, ℓ1, ..., ℓk) for this cell]
We can then compute the following losses
$$(x - \hat{x})^2$$
$$(y - \hat{y})^2$$
$$(\sqrt{w} - \sqrt{\hat{w}})^2$$
$$(\sqrt{h} - \sqrt{\hat{h}})^2$$
$$(1 - \hat{c})^2$$
$$\sum_{i=1}^{k} (\ell_i - \hat{\ell}_i)^2$$
And train the network to minimize the sum of these losses

Now consider a grid cell which does not contain any object
For this cell we do not care about the predictions w, h, x, y & ℓ
But we want the confidence to be low
So we minimize only the following loss
$$(0 - \hat{c})^2$$

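The two cases (a cell containing the center of a true box, and a cell containing no object) can be combined into one per-cell loss. This is a plain, unweighted sketch: the dictionary layout and function name are made up, widths and heights are assumed non-negative so the square roots are defined, and no weighting between the terms is applied.

```python
import math

def cell_loss(pred, target=None):
    """pred / target: dicts with keys c, w, h, x, y and 'classes' (list of k values).
    target=None means the cell contains no object."""
    if target is None:
        return (0.0 - pred["c"]) ** 2
    loss = (target["x"] - pred["x"]) ** 2 + (target["y"] - pred["y"]) ** 2
    loss += (math.sqrt(target["w"]) - math.sqrt(pred["w"])) ** 2
    loss += (math.sqrt(target["h"]) - math.sqrt(pred["h"])) ** 2
    loss += (1.0 - pred["c"]) ** 2
    loss += sum((l - l_hat) ** 2
                for l, l_hat in zip(target["classes"], pred["classes"]))
    return loss
```

The training objective is then the sum of `cell_loss` over all S × S cells of an image.
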
Method         Pascal 2007 mAP    Speed (FPS)    Time per image
DPM v5         33.7               0.07           14 sec
RCNN           66.0               0.05           20 sec
Fast RCNN      70.0               0.5            2 sec
Faster RCNN    73.2               7              140 msec
YOLO           69.0               45             22 msec
