
Yolo Family

The document discusses various object detection methods including R-CNN, Fast R-CNN, and Faster R-CNN, highlighting their processes of generating region proposals and classifying objects. It explains the use of region proposal networks (RPN) in Faster R-CNN for improved efficiency and performance, and compares the training functions for each method. Additionally, it touches on the COCO dataset and the grid cell approach for predicting object locations within images.


https://youtu.be/zgbPj4lSc58?list=PL1u-h-YIOL0sZJsku-vq7cUGbqDEeDK0a

https://in.mathworks.com/help/vision/ug/getting-started-with-r-cnn-fast-r-cnn-and-faster-r-cnn.html
R-CNN, Fast R-CNN: 2-stage networks

1. They generate the bounding boxes first, and
2. apply classification next.

In the figure, the image is given as input to a CNN backbone, which computes the feature map.

In the 1st stage, the boxes are predicted by the RPN (region proposal network). At this
point the network does not know the category of these boxes.
The Faster R-CNN detector adds the RPN to generate region proposals directly in the
network instead of using an external algorithm like Edge Boxes. The RPN uses anchor
boxes for object detection.

In the 2nd stage, the same feature map is reused: Region of Interest Pooling extracts
the features of each proposed box so that the box can be classified.

R-CNN

The R-CNN detector [2] first generates region proposals using an algorithm such as Edge
Boxes[1]. The proposal regions are cropped out of the image and resized. Then, the CNN
classifies the cropped and resized regions. Finally, the region proposal bounding boxes are
refined by a support vector machine (SVM) that is trained using CNN features.

Use the trainRCNNObjectDetector function to train an R-CNN object detector. The function
returns an rcnnObjectDetector object that detects objects in an image.

Fast R-CNN

As in the R-CNN detector, the Fast R-CNN [3] detector also uses an algorithm like Edge
Boxes to generate region proposals. Unlike the R-CNN detector, which crops and resizes
region proposals, the Fast R-CNN detector processes the entire image. Whereas an R-CNN
detector must classify each region, Fast R-CNN pools CNN features corresponding to each
region proposal. Fast R-CNN is more efficient than R-CNN, because in the Fast R-CNN
detector, the computations for overlapping regions are shared.

Use the trainFastRCNNObjectDetector function to train a Fast R-CNN object detector. The
function returns a fastRCNNObjectDetector that detects objects from an image.

Faster R-CNN
The Faster R-CNN[4] detector adds a region proposal network (RPN) to generate region
proposals directly in the network instead of using an external algorithm like Edge Boxes. The
RPN uses Anchor Boxes for Object Detection. Generating region proposals in the network is
faster and better tuned to your data.

Use the trainFasterRCNNObjectDetector function to train a Faster R-CNN object detector.
The function returns a fasterRCNNObjectDetector that detects objects from an image.

Comparison of R-CNN Object Detectors


This family of object detectors uses region proposals to detect objects within images. The
number of proposed regions dictates the time it takes to detect objects in an image. The
Fast R-CNN and Faster R-CNN detectors are designed to improve detection performance with
a large number of regions.

R-CNN Detector                   Description

trainRCNNObjectDetector          ● Slow training and detection
                                 ● Allows custom region proposal

trainFastRCNNObjectDetector      ● Allows custom region proposal

trainFasterRCNNObjectDetector    ● Optimal run-time performance
                                 ● Does not support a custom region proposal

Region of Interest Pooling, or RoIPool, is an operation for extracting a
small feature map (e.g., 7 × 7) from each RoI in detection and
segmentation tasks. Features are extracted from each candidate box;
in models like Fast R-CNN, they are then classified and used for
bounding box regression.
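
The pooling step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used by any particular library: the function name roi_max_pool, the (row_start, col_start, row_end, col_end) RoI format with exclusive ends, and the 2 × 2 output size are all assumptions chosen to keep the example small.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool one RoI of a 2D feature map into an out_size x out_size grid.

    feature_map: 2D list of numbers.
    roi: (r0, c0, r1, c1) in feature-map coordinates, end-exclusive (assumed format).
    """
    r0, c0, r1, c1 = roi
    h, w = r1 - r0, c1 - c0
    if h < 1 or w < 1:
        raise ValueError("RoI must cover at least one feature-map cell")
    pooled = []
    for i in range(out_size):
        # Bin edges: floor for the start, ceil for the end, so no bin is empty.
        rs = r0 + (i * h) // out_size
        re = r0 + ceil_div((i + 1) * h, out_size)
        row_out = []
        for j in range(out_size):
            cs = c0 + (j * w) // out_size
            ce = c0 + ceil_div((j + 1) * w, out_size)
            row_out.append(max(feature_map[r][c]
                               for r in range(rs, re)
                               for c in range(cs, ce)))
        pooled.append(row_out)
    return pooled

# A 4 x 4 feature map holding the values 0..15, pooled over the whole map.
fm = [[4 * r + c for c in range(4)] for r in range(4)]
print(roi_max_pool(fm, (0, 0, 4, 4)))  # [[5, 7], [13, 15]]
```

Whatever the size of the RoI, the output always has the same fixed shape, which is what lets the following classification layers accept every proposal.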

COCO Dataset: A Step-by-Step Guide to Loading and Visualizing with Custom Code

Exploring the COCO (Common Objects in Context) dataset can be a valuable
learning experience. The dataset is designed for object detection, segmentation,
and captioning models, making it a popular choice for developers and
researchers alike.
The COCO dataset contains 330K images and 2.5 million object instances,
making it a valuable resource for developing and testing computer vision
algorithms. For further information on the COCO dataset, please visit its official
website at http://cocodataset.org/.
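The input image size is 416 x 416.

Scale 1: detection happens at the 82nd layer with a stride of 32.

416/32 = 13, so the feature map is 13 x 13.

This feature map is then passed on to upsampling.

Scale 2: detection happens at the 94th layer with a stride of 16.

416/16 = 26, so the feature map is 26 x 26.

Scale 3: detection uses a stride of 8.

416/8 = 52, so the feature map is 52 x 52.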
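
The stride arithmetic for the three scales can be checked with a short sketch. The function name grid_sizes is made up for illustration; the input size and strides are the ones given in these notes.

```python
def grid_sizes(input_size, strides=(32, 16, 8)):
    """Return the side length of the detection grid at each stride."""
    sizes = []
    for stride in strides:
        if input_size % stride != 0:
            raise ValueError(
                f"input size {input_size} is not divisible by stride {stride}")
        sizes.append(input_size // stride)
    return sizes

print(grid_sizes(416))  # [13, 26, 52]
```

A smaller stride gives a finer grid, so the 52 x 52 scale is the one best suited to small objects.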


Reframe object detection as a single-stage regression problem.
Take an image of size 480 x 640.

It is resized to 448 x 448.

The image is divided into a 7 x 7 grid of cells (as given in the paper).

Each grid cell covers 64 x 64 pixels (448/7 = 64).

Each cell is responsible for predicting one object.

In this image 2 objects are present: 1. person 2. horse

Which grid cell is responsible for detecting each object?
The grid cell into which the centre of an object falls is responsible for
detecting that object.
The blue circle marks the grid cell containing the centre of the person object,
and the red circle marks the grid cell containing the centre of the horse object.
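
The rule for picking the responsible cell can be written as a small sketch. It assumes that the person's centre is at pixel (200, 311) in the resized 448 x 448 image, which is how the first two ground truth numbers quoted further down in these notes are read here; the function name and the row-major, 1-based cell numbering are also assumptions.

```python
IMAGE_SIZE = 448
GRID = 7
CELL = IMAGE_SIZE // GRID  # 64 pixels per cell

def responsible_cell(cx, cy, cell=CELL, grid=GRID):
    """Return the (row, col) of the grid cell that contains the centre (cx, cy)."""
    # Clamp so that a centre exactly on the right or bottom border stays in the grid.
    col = min(int(cx // cell), grid - 1)
    row = min(int(cy // cell), grid - 1)
    return row, col

row, col = responsible_cell(200, 311)
print(row, col)              # 4 3
print(row * GRID + col + 1)  # 32 (row-major, 1-based cell number)
```

Under these assumptions the person lands in cell 32, which agrees with the cell labelled A32 later in the notes.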

How is the responsibility predicted?

How are the targets calculated for training?

The image is divided into a 7 x 7 grid of cells. Each cell is responsible for
one prediction.
Each cell has its own targets.
Targets are the values that we compare with the network's prediction.

Targets are what the network needs to predict.

During training, the network's predictions are pushed to be close to the targets.
(200, 311, 142, 250) are the ground truth values for the person
class.
Delta X and Delta Y are measured relative to the grid cell that contains the
centre point, so they are divided by the cell size (64).

Width and height are relative to the whole image, so they are divided by 448.

In this way the absolute values are converted into relative values.
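
As a worked example, the conversion can be sketched for the person box. The notes do not say how the four ground truth numbers are ordered, so this sketch assumes (centre x, centre y, width, height) in pixels of the 448 x 448 image; the function name encode_box is also hypothetical.

```python
def encode_box(cx, cy, w, h, image_size=448, grid=7):
    """Convert an absolute box into (row, col, dx, dy, rel_w, rel_h).

    dx, dy: offset of the centre inside its grid cell, in units of the cell size.
    rel_w, rel_h: width and height as a fraction of the whole image.
    """
    cell = image_size / grid                 # 64.0 for a 448 image and a 7 x 7 grid
    col = min(int(cx // cell), grid - 1)
    row = min(int(cy // cell), grid - 1)
    dx = (cx - col * cell) / cell
    dy = (cy - row * cell) / cell
    return row, col, dx, dy, w / image_size, h / image_size

row, col, dx, dy, rel_w, rel_h = encode_box(200, 311, 142, 250)
print(row, col)                                   # 4 3
print(round(dx, 3), round(dy, 3))                 # 0.125 0.859
print(round(rel_w, 3), round(rel_h, 3))           # 0.317 0.558
```

All four target values fall between 0 and 1, which keeps the regression targets on a comparable scale.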
The targets are calculated the same way for all the grid cells.

Some grid cells do not contain objects; only 2 grid cells contain
objects.
From A1 to A49 (7 x 7), only A11 and A32 contain objects.
The targets of the remaining cells are zeros.
The Pascal VOC data has 20 classes.

For the class that is present we put 1, otherwise 0, using label
or one-hot encoding.
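
A minimal sketch of how such a target grid could be filled is given below. The per-cell layout [dx, dy, w, h, confidence, 20 class values], the alphabetical class order, and the helper names are assumptions made for illustration; the person values reuse the example box, read as (centre x, centre y, width, height).

```python
# The 20 Pascal VOC classes, in the commonly used alphabetical order (assumed).
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

def one_hot(class_name):
    """Return a 20-element vector with 1.0 at the index of class_name."""
    vec = [0.0] * len(VOC_CLASSES)
    vec[VOC_CLASSES.index(class_name)] = 1.0
    return vec

def empty_targets(grid=7):
    """Build a grid x grid list of all-zero target vectors."""
    depth = 5 + len(VOC_CLASSES)
    return [[[0.0] * depth for _ in range(grid)] for _ in range(grid)]

def place_object(targets, row, col, box, class_name):
    """Write one object's targets into the cell responsible for it."""
    dx, dy, rel_w, rel_h = box
    targets[row][col] = [dx, dy, rel_w, rel_h, 1.0] + one_hot(class_name)

targets = empty_targets()
place_object(targets, 4, 3, (0.125, 0.859375, 142 / 448, 250 / 448), "person")

occupied = sum(1 for grid_row in targets for cell in grid_row if cell[4] == 1.0)
print(occupied)                      # 1
print(targets[4][3][5:].index(1.0))  # 14, the index of "person"
```

Every cell that is never passed to place_object keeps its all-zero vector, matching the statement that the remaining cells are zeros.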
How can we get the person's box values back from the deltas?
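
Going back from the relative values to pixels is the inverse of the encoding. The sketch below assumes the same conventions as the rest of these notes (448 x 448 image, 7 x 7 grid, deltas relative to the cell, width and height relative to the image) and reads the person box as (centre x, centre y, width, height); the function name decode_box is hypothetical.

```python
def decode_box(row, col, dx, dy, rel_w, rel_h, image_size=448, grid=7):
    """Convert cell-relative targets back into an absolute (cx, cy, w, h) box."""
    cell = image_size / grid
    cx = (col + dx) * cell   # cell origin plus the offset inside the cell
    cy = (row + dy) * cell
    return cx, cy, rel_w * image_size, rel_h * image_size

box = decode_box(4, 3, 0.125, 0.859375, 142 / 448, 250 / 448)
print([round(v, 3) for v in box])  # [200.0, 311.0, 142.0, 250.0]
```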
The backbone is inspired by the GoogLeNet model:
24 convolution layers, interleaved with max pool layers
Finally 2 fully connected layers

The convolution layers are used for generating feature maps.


If the box contains an object, then we need to compute 3 losses:

1. Bounding box loss 2. Objectness confidence loss 3. Classification loss

The whole problem is framed as a regression problem, so each loss term is a squared error.

p_i(c): class prediction vector for cell i (the target uses one-hot encoding)

No object: only the confidence score loss is computed.
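
How the loss terms combine can be sketched as follows. This is a simplified illustration, not the full loss from the paper: it assumes one box per cell, a per-cell layout of [dx, dy, w, h, confidence, class values], and a confidence target of exactly 1 for cells with an object. The weights 5 and 0.5 are the coordinate and no-object weights from the YOLO paper; the function names are hypothetical.

```python
import math

LAMBDA_COORD = 5.0   # weight of the bounding box loss
LAMBDA_NOOBJ = 0.5   # weight of the confidence loss for cells without an object

def cell_loss(pred, target):
    """Squared-error loss for one grid cell."""
    conf_err = (pred[4] - target[4]) ** 2
    if target[4] != 1.0:
        # No object: only the confidence score loss is counted.
        return LAMBDA_NOOBJ * conf_err
    # Bounding box loss; square roots reduce the dominance of large boxes.
    box_err = (pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2
    box_err += (math.sqrt(max(pred[2], 0.0)) - math.sqrt(target[2])) ** 2
    box_err += (math.sqrt(max(pred[3], 0.0)) - math.sqrt(target[3])) ** 2
    # Classification loss against the one-hot class vector.
    class_err = sum((p - t) ** 2 for p, t in zip(pred[5:], target[5:]))
    return LAMBDA_COORD * box_err + conf_err + class_err

def yolo_loss(preds, targets):
    """Sum the per-cell losses over a flat list of cells."""
    return sum(cell_loss(p, t) for p, t in zip(preds, targets))

# Two cells with 2 classes each: one holds an object, one is empty.
object_cell = [0.125, 0.859375, 0.25, 0.49, 1.0, 0.0, 1.0]
empty_cell = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
wrong_empty = [0.0, 0.0, 0.0, 0.0, 0.8, 0.0, 0.0]

print(yolo_loss([object_cell, empty_cell], [object_cell, empty_cell]))  # 0.0
```

A confident prediction in an empty cell, such as wrong_empty above, is penalised only through its confidence value and with the smaller weight, so the many empty cells do not overwhelm the few cells that contain objects.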
