Deep Learning Algorithms For Object Detection

RCNN uses selective search to extract regions of interest from images, runs each region through a CNN to extract features, and uses SVM and regression models to classify regions and adjust bounding boxes. This makes it slow, taking around 50 seconds per image. Fast RCNN improves speed by running the CNN once per image to extract all regions of interest simultaneously. However, it still relies on selective search for region proposals. Faster RCNN introduces a region proposal network that generates object proposals from the CNN feature map, further improving speed and reducing reliance on selective search. It takes around 0.2 seconds per image.

Uploaded by

Vaijayanthi

DEEP LEARNING ALGORITHMS FOR OBJECT DETECTION

Looking for a lost room key in an untidy and messy house?

✔ A simple computer algorithm could locate your keys in a matter of milliseconds
✔ That is the power of object detection algorithms
✔ In short, these are powerful deep learning algorithms.
Table of Contents
1. A Simple Way of Solving an Object Detection Task (using Deep Learning)
2. RCNN
3. Fast RCNN
4. Faster RCNN
5. Summary of the Algorithms covered
6. YOLO
OBJECT DETECTION

Detecting the objects in an image along with their location, typically using a bounding box.
1. A Simple Way of Solving an Object Detection Task using CNN
1. First we take an image as input
2. Then we divide the image into various regions
3. We then consider each region as a separate image
4. Pass all these regions (images) to the CNN and classify them into various classes
5. Once we have assigned each region to its corresponding class, we can combine all these regions to get the original image with the detected objects
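The five steps above can be sketched in a few lines. This is only an illustration: `split_into_regions` and `classify_region` are hypothetical names, and `classify_region` is a brightness threshold standing in for a trained CNN classifier, which the slides do not specify.

```python
import numpy as np

def split_into_regions(image, rows, cols):
    """Split an H x W x C image into rows * cols non-overlapping regions.

    Returns a list of (box, crop) pairs with box = (x1, y1, x2, y2).
    """
    height, width = image.shape[:2]
    y_edges = np.linspace(0, height, rows + 1, dtype=int)
    x_edges = np.linspace(0, width, cols + 1, dtype=int)
    regions = []
    for r in range(rows):
        for c in range(cols):
            x1, x2 = int(x_edges[c]), int(x_edges[c + 1])
            y1, y2 = int(y_edges[r]), int(y_edges[r + 1])
            regions.append(((x1, y1, x2, y2), image[y1:y2, x1:x2]))
    return regions

def classify_region(crop):
    """Hypothetical stand-in for the CNN: calls bright regions 'object'."""
    return "object" if crop.mean() > 0.5 else "background"

# Toy input: a dark image with a bright "object" in the top-right quadrant.
image = np.zeros((60, 80, 3))
image[0:30, 40:80] = 1.0

# Steps 2-5: divide, treat each region as an image, classify, then collect.
detections = [(box, classify_region(crop))
              for box, crop in split_into_regions(image, rows=2, cols=2)]
```

With a 2 x 2 grid the object happens to fill exactly one region; an object that straddles region borders would not be localised this cleanly.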
PROBLEM
• Objects in the image can have different aspect ratios and spatial locations (an object might cover most of the image, or only a small percentage of it).
• So we would require a very large number of regions
• This needs a huge amount of computational time

To solve this problem and reduce the number of regions, we can use region-based CNN (RCNN).
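To get a feel for how quickly the number of regions grows, the sketch below counts sliding-window positions over several window sizes. The image size, scales, aspect ratios and stride are illustrative assumptions, not values from the slides.

```python
def count_windows(image_w, image_h, window_sizes, stride):
    """Count sliding-window positions over all (width, height) window sizes."""
    total = 0
    for w, h in window_sizes:
        if w > image_w or h > image_h:
            continue  # this window does not fit in the image
        n_x = (image_w - w) // stride + 1
        n_y = (image_h - h) // stride + 1
        total += n_x * n_y
    return total

# Assumed setup: 3 scales x 3 aspect ratios (width / height), stride of 8 pixels.
scales = [64, 128, 256]
aspect_ratios = [0.5, 1.0, 2.0]
window_sizes = [(int(s * r ** 0.5), int(s / r ** 0.5))
                for s in scales for r in aspect_ratios]

n = count_windows(640, 480, window_sizes, stride=8)
print(f"{n} candidate regions for one 640 x 480 image")
```

Even this modest setup gives far more candidates than the roughly 2,000 regions that RCNN works with, and each candidate would need its own pass through the CNN.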
RCNN - Region-Based Convolutional Neural Network
• Instead of working on a massive number of regions, the RCNN
algorithm proposes a bunch of boxes in the image and checks if
any of these boxes contain any object.
• First an image is taken as input
• Then, we get the Regions of Interest (ROI) using some
proposal method (selective search):
• RCNN uses selective search to extract these boxes from an
image (these boxes are called regions) because it is fast and
has a very high recall.

• Selective Search is a region proposal algorithm used in object detection. It is designed to be fast with a very high recall. It is based on computing a hierarchical grouping of similar regions based on color, texture, size and shape compatibility.
• Selective Search starts by over-segmenting the image based on the intensity of the pixels, using a segmentation method

[Figure: input image and output image after over-segmentation]
• The Selective Search algorithm takes these over-segments as initial input and performs the following steps
1. Add all bounding boxes corresponding to segmented parts to the list of region proposals
2. Group adjacent segments based on similarity (Selective Search uses 4 similarity measures based on color, texture, size and shape compatibility)
3. Go to step 1

• At each iteration, larger segments are formed and added to the list of
region proposals. Hence we create region proposals from smaller
segments to larger segments in a bottom-up approach. This is what
we mean by computing “hierarchical” segmentations
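The bottom-up loop can be sketched on toy data. This is a simplified illustration, not the real Selective Search: segments are given by hand as bounding boxes with a mean grey level, and similarity is reduced to color difference only (the real algorithm combines the four measures listed above). All names are hypothetical. Instead of re-adding every box on each pass, the sketch appends only the newly merged box, which gives the same proposal list without duplicates.

```python
from itertools import combinations

def union_box(a, b):
    """Smallest box (x1, y1, x2, y2) that encloses boxes a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def touches(a, b):
    """True if the two boxes overlap or share an edge (adjacency test)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def hierarchical_proposals(segments):
    """segments: dicts with 'box', 'color' (mean grey level) and 'size' (pixels)."""
    regions = [dict(s) for s in segments]
    proposals = [r["box"] for r in regions]          # step 1: initial boxes
    while len(regions) > 1:
        pairs = [(i, j) for i, j in combinations(range(len(regions)), 2)
                 if touches(regions[i]["box"], regions[j]["box"])]
        if not pairs:
            break
        # step 2: merge the most similar adjacent pair (smallest color gap)
        i, j = min(pairs, key=lambda p: abs(regions[p[0]]["color"]
                                            - regions[p[1]]["color"]))
        a, b = regions[i], regions[j]
        size = a["size"] + b["size"]
        merged = {
            "box": union_box(a["box"], b["box"]),
            "color": (a["color"] * a["size"] + b["color"] * b["size"]) / size,
            "size": size,
        }
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged["box"])              # step 3: repeat with new box
    return proposals

segments = [
    {"box": (0, 0, 10, 10), "color": 0.10, "size": 100},
    {"box": (10, 0, 20, 10), "color": 0.15, "size": 100},
    {"box": (0, 10, 20, 20), "color": 0.90, "size": 200},
]
proposals = hierarchical_proposals(segments)
```

The two similar segments in the top row merge first, and the merged region then joins the bottom segment, so the proposal list grows from small boxes to the box covering everything.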
This image shows the initial, middle and last step of the
hierarchical segmentation process
• All these regions are then warped to a fixed size, as required by the CNN, and each region is passed to the ConvNet

Here the image is warped to have a fixed size.
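A minimal sketch of the warping step, assuming a 227 x 227 CNN input and plain nearest-neighbour sampling. The function name, the target size and the sampling method are assumptions made for illustration; the slides only say that each region is warped to a fixed size.

```python
import numpy as np

def warp_region(image, box, out_size=(227, 227)):
    """Crop box = (x1, y1, x2, y2) and resize it to out_size = (height, width).

    Uses nearest-neighbour sampling and ignores the aspect ratio of the box,
    so the content is stretched or squashed to fit.
    """
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    out_h, out_w = out_size
    # For every output row/column, pick the source row/column it maps to.
    rows = np.arange(out_h) * crop.shape[0] // out_h
    cols = np.arange(out_w) * crop.shape[1] // out_w
    return crop[rows][:, cols]

image = np.random.rand(100, 150, 3)
warped = warp_region(image, box=(10, 20, 110, 60))   # a 100 x 40 pixel region
print(warped.shape)
```

Because every proposal ends up with the same shape, the regions can be stacked and fed to a network with a fixed input size.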
• CNN then extracts features for each region and SVMs
are used to divide these regions into different classes:
• Finally, a bounding box regression (Bbox reg) is used to
predict the bounding boxes for each identified region:
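The bounding box regression step can be illustrated as follows. The sketch assumes the regressor outputs four offsets (dx, dy, dw, dh): a shift of the box centre relative to the box size, and a log-scale change of width and height. This parameterization and the function name are assumptions here, since the slides do not spell them out.

```python
import math

def apply_box_deltas(proposal, deltas):
    """Adjust proposal = (x1, y1, x2, y2) with deltas = (dx, dy, dw, dh)."""
    x1, y1, x2, y2 = proposal
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = deltas
    new_cx = cx + dx * w          # shift the centre by a fraction of the size
    new_cy = cy + dy * h
    new_w = w * math.exp(dw)      # scale width and height
    new_h = h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)

# Zero offsets leave the box unchanged; dx = 0.1 moves it right by 10% of its width.
unchanged = apply_box_deltas((0, 0, 10, 10), (0, 0, 0, 0))
shifted = apply_box_deltas((0, 0, 10, 10), (0.1, 0, 0, 0))
```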
Summary of RCNN
• Extracting 2,000 regions for each image based on selective search
• Extracting features using CNN for every image region. Suppose we have N images, then the CNN has to extract N*2,000 feature vectors
• The entire process of object detection using RCNN has three models:
• CNN for feature extraction
• Linear SVM classifier for identifying objects
• Regression model for tightening the bounding boxes.
• All these processes combine to make RCNN very slow.
Problems with RCNN
• Training an RCNN model is expensive and slow
• It takes around 40-50 seconds to make predictions for each new image, which makes the model cumbersome and impractical to use on a very large dataset
Fast RCNN
To reduce the computational time:
• Instead of running a CNN 2,000 times per image, we can run it just once per image and get all the regions of interest (regions containing some object).
• First an image is taken as input
• This image is passed to a ConvNet, which produces a feature map for the whole image; the regions of interest (still proposed by selective search) are located on this feature map:
• Then we apply the RoI pooling layer on the extracted regions of interest to make sure all the regions are of the same size:
• Finally, these regions are passed on to a fully connected network which classifies them using a softmax layer and, at the same time, returns the bounding boxes using a linear regression layer:
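RoI pooling can be sketched as max pooling over a fixed grid of bins laid over each region of the feature map. The version below is a simplified single-channel illustration with hypothetical names; it assumes the RoI is already given in feature-map cells, with x2 and y2 exclusive.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size=(2, 2)):
    """Max-pool the roi = (x1, y1, x2, y2) of an H x W feature map to out_size."""
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    out_h, out_w = out_size
    # Split the region into out_h x out_w bins of roughly equal size.
    y_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    x_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    pooled = np.empty(out_size, dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Guard against empty bins when the region is smaller than the grid.
            y_lo, y_hi = y_edges[i], max(y_edges[i + 1], y_edges[i] + 1)
            x_lo, x_hi = x_edges[j], max(x_edges[j + 1], x_edges[j] + 1)
            pooled[i, j] = region[y_lo:y_hi, x_lo:x_hi].max()
    return pooled

feature_map = np.arange(36).reshape(6, 6)
small = roi_max_pool(feature_map, (0, 0, 4, 4))   # a 4 x 4 region
large = roi_max_pool(feature_map, (1, 1, 6, 4))   # a 5 x 3 region
```

Both regions come out as 2 x 2 arrays even though they started with different sizes, which is what lets the fully connected layers that follow use a fixed input size.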
This is how Fast RCNN resolves two major issues of RCNN
• Passing the whole image through the ConvNet once, instead of passing 2,000 regions per image
• Using one instead of three different models for extracting features, classification and generating bounding boxes.
Problems with Fast RCNN
• It also uses selective search as a proposal method to find the Regions of Interest, which is a slow and time-consuming process
• It takes around 2 seconds per image to detect objects, which is much better than RCNN. But when we consider large real-life datasets, even Fast RCNN doesn't look so fast anymore.
Faster RCNN
To reduce the computational time
• Faster RCNN uses “Region Proposal Network”, aka RPN. RPN takes
image feature maps as an input and generates a set of object
proposals, each with an objectness score as output.
The below steps are typically followed in a Faster RCNN approach:
• We take an image as input and pass it to the ConvNet which returns the
feature map for that image.
• Region proposal network is applied on these feature maps. This returns the
object proposals along with their objectness score.
• A RoI pooling layer is applied on these proposals to bring down all the
proposals to the same size.
• Finally, the proposals are passed to a fully connected layer which has a
softmax layer and a linear regression layer at its top, to classify and output
the bounding boxes for objects.
[Figure: Faster RCNN architecture with the Region Proposal Network. Annotations recovered from the diagram:]
• Detection head: a fully connected layer feeding a softmax classifier (linear + softmax) and bounding box regressors (linear).
• RoI pooling is applied on the proposals to bring them to the same size.
• Objectness score: determines the probability of a proposal containing the target object. This output has 2k scores that estimate the probability of object or not object for each proposal.
• Box regression: regresses the coordinates of the proposal. This output has 4k outputs encoding the coordinates of k boxes.
• For the ZF model (an extension of AlexNet) the dimension is 256-d.
• Anchor: centre point of …
• Here the developer has chosen 3 scales and 3 aspect ratios, so a total of 9 proposals are possible for each pixel of the feature map, i.e. k = 9 = number of anchors. For the whole feature map of width W and height H, the number of anchors is W*H*k.
• Faster RCNN takes the feature maps from CNN and passes them on to
the Region Proposal Network. RPN uses a sliding window over these
feature maps, and at each window, it generates k Anchor boxes of
different shapes and sizes:

• Anchor boxes are fixed-size bounding boxes that are placed throughout the image and have different shapes and sizes.

• For each anchor, RPN predicts two things:

1. The first is the probability that an anchor is an object (it does not consider which class the object belongs to)
2. The second is the bounding box regressor for adjusting the anchors to better fit the object
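Anchor generation can be sketched as below: k = 9 anchor shapes (3 scales x 3 aspect ratios) are centred on every position of the feature map, giving W*H*k anchors in total. The concrete scales (128, 256, 512 pixels), the ratios and the stride of 16 pixels between feature-map positions are assumed values for illustration, and the function name is hypothetical.

```python
import numpy as np

def generate_anchors(feat_w, feat_h, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return a (feat_w * feat_h * k, 4) array of (x1, y1, x2, y2) anchors."""
    shapes = []
    for s in scales:
        for r in ratios:                 # r = height / width, area stays s * s
            shapes.append((s / np.sqrt(r), s * np.sqrt(r)))
    shapes = np.array(shapes)            # (k, 2) widths and heights

    # Centre of every feature-map position, expressed in image pixels.
    xs = (np.arange(feat_w) + 0.5) * stride
    ys = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(xs, ys)
    centres = np.stack([cx.ravel(), cy.ravel()], axis=1)    # (W*H, 2)

    half = shapes / 2
    top_left = centres[:, None, :] - half[None, :, :]       # (W*H, k, 2)
    bottom_right = centres[:, None, :] + half[None, :, :]
    return np.concatenate([top_left, bottom_right], axis=2).reshape(-1, 4)

anchors = generate_anchors(feat_w=4, feat_h=3)
print(anchors.shape)   # one row per anchor: W * H * k rows
```

For each of these anchors the RPN would then output an objectness score and four box adjustments, which is where the 2k and 4k output sizes per position come from.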
• We now have proposals (bounding boxes) of different shapes and sizes, which are passed on to the RoI pooling layer. At this point the proposals have no classes assigned to them. The RoI pooling layer takes each proposal, crops the corresponding area of the feature map, and extracts a fixed-size feature map for each proposal:

• Then these feature maps are passed to a fully connected layer which has a softmax and a linear regression layer. It finally classifies the object and predicts the bounding boxes for the identified objects.
• All of the object detection algorithms we have discussed so far use
regions to identify the objects. The network does not look at the
complete image in one go, but focuses on parts of the image
sequentially. This creates two complications:
• The algorithm requires many passes through a single image to extract all the
objects
• As there are different systems working one after the other, the performance
of the systems further ahead depends on how the previous systems
performed
5. Summary of the Algorithms covered

| Algorithm | Features | Prediction time / image | Limitations |
|---|---|---|---|
| CNN | Divides the image into multiple regions and then classifies each region into various classes. | – | Needs a lot of regions to predict accurately and hence high computation time. |
| RCNN | Uses selective search to generate regions. Extracts around 2000 regions from each image. | 40-50 seconds | High computation time as each region is passed to the CNN separately. It also uses three different models for making predictions. |
| Fast RCNN | Each image is passed only once to the CNN and feature maps are extracted. The regions proposed by selective search are mapped onto these feature maps to generate predictions. Combines all the three models used in RCNN together. | 2 seconds | Selective search is slow and hence computation time is still high. |
| Faster RCNN | Replaces the selective search method with a region proposal network, which makes the algorithm much faster. | 0.2 seconds | Object proposal takes time and, as there are different systems working one after the other, the performance of each system depends on how the previous system has performed. |
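To put the per-image times in perspective, the short calculation below estimates the total prediction time for a dataset. The dataset size of 10,000 images is an assumption, and 45 seconds is used as the midpoint of the 40-50 second range quoted for RCNN.

```python
# Approximate prediction time per image, in seconds, from the table above.
seconds_per_image = {"RCNN": 45.0, "Fast RCNN": 2.0, "Faster RCNN": 0.2}

def hours_for_dataset(n_images):
    """Total prediction time in hours for each algorithm."""
    return {name: n_images * t / 3600 for name, t in seconds_per_image.items()}

estimate = hours_for_dataset(10_000)
for name, hours in estimate.items():
    print(f"{name}: {hours:.1f} hours")
```

For 10,000 images this works out to about 125 hours for RCNN, under 6 hours for Fast RCNN and roughly half an hour for Faster RCNN.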
6. YOLO (You Only Look Once)
1. Take an image as input
2. Split it into an SxS grid
3. Within each grid cell, take m bounding boxes
4. The network outputs a class probability and offset values for each bounding box
5. Bounding boxes with a class probability above a threshold value are selected and used to locate the object within the image
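The selection step can be sketched as follows. The output layout used here is a simplifying assumption for illustration, not the exact YOLO tensor: each of the m boxes in a grid cell holds an (x, y) offset inside the cell, a width and height relative to the image, and then one probability per class. The function name is hypothetical.

```python
import numpy as np

def decode_grid(pred, image_w, image_h, threshold=0.5):
    """pred has shape (S, S, m, 4 + C): x_off, y_off, w, h, class probabilities.

    Returns a list of (class_id, score, (x1, y1, x2, y2)) in image pixels.
    """
    S = pred.shape[0]
    detections = []
    for row in range(S):
        for col in range(S):
            for box in pred[row, col]:
                x_off, y_off, w, h = (float(v) for v in box[:4])
                class_probs = box[4:]
                class_id = int(np.argmax(class_probs))
                score = float(class_probs[class_id])
                if score > threshold:
                    # Offsets are relative to the cell; convert to image pixels.
                    cx = (col + x_off) / S * image_w
                    cy = (row + y_off) / S * image_h
                    bw, bh = w * image_w, h * image_h
                    detections.append((class_id, score,
                                       (cx - bw / 2, cy - bh / 2,
                                        cx + bw / 2, cy + bh / 2)))
    return detections

# Toy prediction: S = 2, m = 1, C = 2, with one confident box in the bottom-left cell.
pred = np.zeros((2, 2, 1, 6))
pred[1, 0, 0] = [0.5, 0.5, 0.5, 0.5, 0.1, 0.9]
detections = decode_grid(pred, image_w=100, image_h=100)
```

The whole grid is decoded from a single network output, which is why no separate region proposal stage is needed.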
Positive:
YOLO is much faster (around 45 frames per second) than the region-based object detection algorithms above.

Limitation:
The limitation of the YOLO algorithm is that it struggles with small objects within the image; for example, it might have difficulties in detecting a flock of birds. This is due to the spatial constraints of the algorithm.
