Deep Learning
Dr. Sanjeev Sharma
Object Detection

• In image classification, we assume there is only one main target object in the image, and the model's sole focus is to identify its category.
• In many situations, however, we are interested in multiple targets in the image. We want not only to classify them but also to obtain their specific positions in the image. In computer vision, we refer to such tasks as object detection.
• The figure illustrates the difference between image classification and object detection tasks.
• Object detection is widely used in many fields. For example, in self-driving technology, we need to plan routes by identifying the locations of vehicles, pedestrians, roads, and obstacles in a captured video image.
• Robots often perform this type of task to detect targets of interest, and systems in the security field need to detect abnormal targets, such as intruders or bombs.

General object detection framework

• Typically, an object detection framework has four components:
• Region proposal: An algorithm or a DL model is used to generate regions of interest (RoIs) to be further processed by the system. These are regions that the network believes might contain an object; the output is a large number of bounding boxes, each of which has an objectness score. Boxes with high objectness scores are then passed along the network layers for further processing.
• Feature extraction and network predictions: Visual features are extracted for each of the bounding boxes and evaluated to determine whether, and which, objects are present in the proposals (for example, by an object classification component).
• Non-maximum suppression (NMS): At this point, the model has likely found multiple bounding boxes for the same object. NMS helps avoid repeated detection of the same instance by combining overlapping boxes into a single bounding box for each object.
• Evaluation metrics: As with the accuracy, precision, and recall metrics in image classification tasks (see chapter 4), object detection systems have their own metrics to evaluate detection performance. In this section, we explain the most popular ones: mean average precision (mAP), the precision-recall curve (PR curve), and intersection over union (IoU).

Region proposals

• RoIs are regions that the system believes have a high likelihood of containing an object; that likelihood is called the objectness score (see figure).
• Regions with high objectness scores are passed to the next steps; regions with low scores are abandoned.
• The important thing to note is that this step produces a large number (thousands) of bounding boxes to be further analyzed and classified by the network. During this step, the network classifies each region as foreground (object) or background (no object) based on its objectness score: if the score is above a certain threshold, the region is considered foreground and pushed forward in the network. This threshold is configurable based on your problem; a minimal sketch of this filtering step follows below.
• If the threshold is too low, your network will exhaustively generate all possible proposals, and you will have a better chance of detecting all objects in the image. On the flip side, this is very computationally expensive and will slow down detection.
• The trade-off with generating region proposals is therefore the number of regions versus computational complexity, and the right approach is to use problem-specific information to reduce the number of RoIs.
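To make the thresholding step concrete, here is a minimal sketch in Python. The (x, y, w, h) box format, the example scores, and the 0.5 default threshold are illustrative assumptions, not values from the slides:

def filter_proposals(boxes, objectness_scores, threshold=0.5):
    """Keep only proposals whose objectness score clears the threshold.
    boxes: list of (x, y, w, h) tuples; objectness_scores: floats in [0, 1]."""
    foreground = []
    for box, score in zip(boxes, objectness_scores):
        if score >= threshold:  # foreground (object) candidate
            foreground.append((box, score))
    return foreground  # low-scoring regions are treated as background

# Example: three candidate regions; only the two confident ones survive.
boxes = [(10, 20, 50, 80), (12, 22, 48, 78), (200, 100, 30, 30)]
scores = [0.92, 0.85, 0.10]
print(filter_proposals(boxes, scores))

Raising the threshold keeps fewer regions and speeds up detection at the risk of missing objects, which is exactly the trade-off described above.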
Network predictions

• This component is the pretrained CNN used for feature extraction: it extracts features from the input image that are representative of the task at hand and uses those features to determine the class of each region. In object detection frameworks, people typically use pretrained image classification models to extract visual features, as these tend to generalize fairly well. For example, a model trained on the MS COCO or ImageNet dataset is able to extract fairly generic features.
• In this step, the network analyzes all the regions identified as having a high likelihood of containing an object and makes two predictions for each region:
• Bounding-box prediction: The coordinates that locate the box surrounding the object, represented as the tuple (x, y, w, h), where x and y are the coordinates of the center point of the bounding box and w and h are its width and height.
• Class prediction: The classic softmax function that predicts the class probability for each object.
• Since thousands of regions are proposed, each object will always have multiple bounding boxes surrounding it with the correct classification. For example, take a look at the image of the dog in figure 7.3. The network clearly found the object (dog) and classified it successfully, but the detection fired a total of five times because the dog was present in five of the RoIs produced in the previous step: hence the five bounding boxes around the dog in the figure.
• Although the detector located the dog in the image and classified it correctly, this is not exactly what we need. For most problems, we want just one bounding box per object, ideally the one that fits the object best. What if we are building a system to count dogs in an image? Our current system would count five dogs. We don't want that. This is where the non-maximum suppression technique comes in handy.

Non-maximum suppression (NMS)

• As you can see in figure 7.4, one of the problems of an object detection algorithm is that it may find multiple detections of the same object: instead of creating one bounding box around the object, it draws several boxes for the same object.
• NMS is a technique that makes sure the detection algorithm detects each object only once. As the name implies, NMS looks at all the boxes surrounding an object, finds the box that has the maximum prediction probability, and suppresses (eliminates) the others.
• The general idea of NMS is to reduce the number of candidate boxes to a single bounding box for each object. For example, if the object in the frame is fairly large and more than 2,000 object proposals have been generated, it is quite likely that many of them overlap significantly with each other and with the object. A minimal greedy NMS sketch appears below, after the FPS discussion.

Object-detector evaluation metrics

FRAMES PER SECOND (FPS) TO MEASURE DETECTION SPEED
• The most common metric used to measure detection speed is the number of frames per second (FPS). For example, Faster R-CNN operates at only 7 FPS, whereas SSD operates at 59 FPS. In benchmarking experiments, the authors of a paper state their network results as "Network X achieves mAP of Y% at Z FPS," where X is the network name, Y is the mAP percentage, and Z is the FPS.
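Here is the greedy NMS sketch referenced above: keep the highest-scoring box, discard any remaining box that overlaps it beyond an IoU threshold, and repeat. The corner-format (x1, y1, x2, y2) boxes, the example scores, and the 0.5 threshold are illustrative assumptions:

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) corner format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the best box, drop boxes that overlap it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Five heavily overlapping detections of the same dog collapse to one box.
boxes = [(10, 10, 110, 110), (12, 8, 108, 112), (9, 11, 111, 109),
         (11, 9, 109, 111), (10, 12, 112, 108)]
scores = [0.95, 0.90, 0.88, 0.80, 0.75]
print(nms(boxes, scores))  # -> [0]: only the highest-scoring box survives

Note that real detectors usually apply NMS per class; this sketch treats all boxes as one class for simplicity.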
MEAN AVERAGE PRECISION (mAP) TO MEASURE NETWORK PRECISION
• The most common evaluation metric used in object detection tasks is mean average precision (mAP). It is a percentage from 0 to 100, and higher values are better, but it is not the same as the accuracy metric used in classification.
• To understand how mAP is calculated, you first need to understand intersection over union (IoU) and the precision-recall curve (PR curve). Let's explain IoU and the PR curve and then come back to mAP.

INTERSECTION OVER UNION (IoU)
• This measure evaluates the overlap between two bounding boxes: the ground truth bounding box (B_ground truth) and the predicted bounding box (B_predicted). By applying IoU, we can tell whether a detection is valid (True Positive) or not (False Positive).
• Figure 7.5 illustrates the IoU between a ground truth bounding box and a predicted bounding box.
• The IoU value ranges from 0 (no overlap at all) to 1 (the two bounding boxes overlap each other 100%). The higher the overlap between the two bounding boxes, the better (figure 7.6).
• To calculate the IoU of a prediction, we need the following:
• The ground truth bounding box (B_ground truth): the hand-labeled bounding box created during the labeling process
• The predicted bounding box (B_predicted) from our model
• We calculate IoU by dividing the area of overlap by the area of the union, as in the following equation:

IoU = area(B_predicted ∩ B_ground truth) / area(B_predicted ∪ B_ground truth)

• IoU is used to define a correct prediction: a True Positive is a prediction with an IoU greater than some threshold. This threshold is a tunable value depending on the challenge, but 0.5 is a standard value. For example, some challenges, like Microsoft COCO, use mAP@0.5 (IoU threshold of 0.5) or mAP@0.75 (IoU threshold of 0.75). If the IoU value is above the threshold, the prediction is considered a True Positive (TP); if it is below the threshold, it is considered a False Positive (FP).

PRECISION-RECALL CURVE (PR CURVE)
• With TP and FP defined, we can now calculate the precision and recall of our detector for a given class across the testing dataset. As explained in chapter 4, we calculate precision and recall as follows (recall that FN stands for False Negative):

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)

• A short code sketch at the end of this section, after the R-CNN overview, shows how TP, FP, precision, and recall are computed from IoU-matched detections.

Region-based convolutional neural networks (R-CNNs)
• The R-CNN family of object detection techniques, usually referred to as R-CNNs (short for region-based convolutional neural networks), was developed by Ross Girshick et al. in 2014.
• The R-CNN family expanded to include Fast R-CNN and Faster R-CNN in 2015 and 2016, respectively.

R-CNN
• R-CNN is the least sophisticated region-based architecture in its family, but it is the basis for understanding how the whole family of object-recognition algorithms works.
• It was one of the first large, successful applications of convolutional neural networks to the problem of object detection and localization, and it paved the way for more advanced detection algorithms.
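To tie IoU, TP/FP, precision, and recall together, here is the sketch referenced above. The greedy one-to-one matching, the corner-format boxes, and the 0.5 IoU threshold are illustrative assumptions (real benchmarks such as COCO also rank predictions by confidence when building the PR curve):

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) corner format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def precision_recall(pred_boxes, gt_boxes, iou_threshold=0.5):
    """Greedily match each prediction to an unused ground-truth box;
    matches with IoU >= threshold are TPs, the rest are FPs / FNs."""
    matched, tp = set(), 0
    for pred in pred_boxes:
        best_iou, best_gt = 0.0, None
        for i, gt in enumerate(gt_boxes):
            overlap = iou(pred, gt)
            if i not in matched and overlap > best_iou:
                best_iou, best_gt = overlap, i
        if best_iou >= iou_threshold:
            tp += 1
            matched.add(best_gt)
    fp = len(pred_boxes) - tp  # unmatched predictions
    fn = len(gt_boxes) - tp    # missed ground-truth objects
    precision = tp / (tp + fp) if pred_boxes else 0.0
    recall = tp / (tp + fn) if gt_boxes else 0.0
    return precision, recall

# Two correct detections and one spurious one: precision 2/3, recall 1.0.
preds = [(10, 10, 50, 50), (60, 60, 90, 90), (200, 200, 220, 220)]
gts = [(12, 12, 48, 48), (58, 62, 92, 88)]
print(precision_recall(preds, gts))

Sweeping a confidence threshold over the ranked detections and recomputing these two values produces the PR curve, and the area under that curve, averaged over classes, is the mAP described above.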