
ULC665: Deep Learning

Convolutional Neural Networks (CNN) for Object Detection

(DL, Dr. Ashish Gupta) ULC665 : Introduction 1 / 22


Agenda

■ What is Object Recognition?


■ R-CNN Model Family
■ YOLO Model Family



Key Terms / Tasks

■ Object recognition: a general term describing a collection of related computer vision tasks that involve identifying objects in digital photographs.
• Image classification typically involves predicting the class of one object in an image.
• Object localization refers to identifying the location of one or more objects in an image and drawing a bounding box around their extent.
• Object detection combines these two tasks: it localizes and classifies one or more objects in an image.



Classification

■ Image Classification: Predict the type or class of an object in an image.
• Input: An image with a single object, such as a photograph.
• Output: A class label (e.g. one or more integers that are mapped to class labels).



■ Object Localization: Locate the presence of objects in an image and
indicate their location with a bounding box.
• Input: An image with one or more objects, such as a photograph.
• Output: One or more bounding boxes (e.g. defined by a point,
width, and height).



Object Detection

■ Object Detection: Locate the presence of objects with bounding boxes and predict the type or class of each located object in an image.
• Input: An image with one or more objects.
• Output: Algorithms produce a list of object categories present in
the image along with an axis-aligned bounding box (e.g. defined
by a point, width, and height) indicating the position and scale of
every instance of each object category.
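As a concrete sketch (not from the slides), a detection can be represented as a class label and score plus an axis-aligned box given by a corner point, width, and height. A standard way to compare two such boxes is intersection over union (IoU):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as
    (x, y, w, h), where (x, y) is the top-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Corners of the intersection rectangle
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# A detector's output: one entry per detected object instance
detections = [
    {"label": "dog", "box": (10, 10, 40, 40), "score": 0.92},
    {"label": "cat", "box": (30, 30, 40, 40), "score": 0.81},
]
print(iou(detections[0]["box"], detections[1]["box"]))  # 400/2800 ≈ 0.143
```

The labels and scores above are illustrative; the point is that a detection couples a class with a box, unlike plain classification.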



Object Detection Datasets



Dense prediction tasks: Image segmentation and its types

■ Object segmentation: instances of recognized objects are indicated by highlighting the specific pixels of the object instead of a coarse bounding box.
■ Object proposal models aim to produce a small set, typically a few hundred or a few thousand, of overlapping candidate object bounding boxes or region proposals.
■ Salient object detection: detecting and accurately segmenting the most salient object regions in the image.
■ Fixation prediction models typically try to predict where humans
look, i.e., a small set of fixation points.



■ Semantic segmentation aims at accurately partitioning each object region from the background region, i.e., it not only locates all the target objects but also accurately delineates their boundaries.
■ Instance segmentation aims to detect each object as an individual in the image.
■ Panoptic segmentation has the most demanding goal: it assigns both a semantic label and an instance label to each pixel.
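A toy illustration (not from the slides) of how these label maps differ, on a tiny 2 × 4 "image" containing two separate cat objects:

```python
# Semantic segmentation: one class label per pixel; both cats share "cat".
semantic = [
    ["cat", "cat", "bg",  "cat"],
    ["cat", "bg",  "bg",  "cat"],
]
# Instance segmentation: each object instance gets its own id
# (background pixels carry no instance id).
instance = [
    [1, 1, None, 2],
    [1, None, None, 2],
]
# Panoptic segmentation: every pixel gets BOTH a semantic label and an
# instance id (background counts as "stuff" with no instance).
panoptic = [[(semantic[r][c], instance[r][c]) for c in range(4)]
            for r in range(2)]
print(panoptic[0][0])  # ('cat', 1)
print(panoptic[0][3])  # ('cat', 2): same class, different instance
```

Note how semantic labels alone cannot separate the two cats, while panoptic labels distinguish them and still cover every pixel.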



Overview of Object Recognition Computer Vision Tasks

Object recognition refers to a suite of challenging computer vision tasks of increasing difficulty: image classification, object localization, and object detection.



R-CNN Model Family

■ The R-CNN family of methods refers to
• Regions with CNN Features, or
• Region-Based Convolutional Neural Networks (Ross Girshick, et al., 2014)
■ Includes the techniques
• R-CNN,
• Fast R-CNN, and
• Faster R-CNN,
designed and demonstrated for object localization and object recognition.



R-CNN

■ First large and successful application of CNNs to the problem of object localization, detection, and segmentation.
■ Their proposed R-CNN model is comprised of three modules:
• Region Proposal: Generate and extract category-independent region proposals, e.g. candidate bounding boxes. An algorithm called selective search is used to propose candidate regions or bounding boxes of potential objects in the image.



R-CNN

■ Their proposed R-CNN model is comprised of three modules:
• Region Proposal
• Feature Extractor: Extract features from each candidate region, e.g. using the AlexNet deep CNN. The output of AlexNet is a 4096-element feature vector that is fed to the next stage.
• Classifier: Classify features as one of the known classes, e.g. a linear SVM classifier model.

Downside: it is slow, requiring a CNN-based feature-extraction pass on each of the candidate regions generated by the region proposal algorithm.
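The three-module flow can be sketched as follows. All three components here are stand-ins (a real system would use selective search, an AlexNet-style CNN, and per-class linear SVMs), but the control flow shows why R-CNN is slow: one full feature-extraction pass per region.

```python
def propose_regions(image):
    """Stand-in for selective search: return candidate boxes (x, y, w, h)."""
    return [(0, 0, 32, 32), (16, 16, 32, 32)]

def extract_features(image, box):
    """Stand-in for the CNN: R-CNN produces a 4096-dim feature vector per crop."""
    return [0.0] * 4096

def classify(features):
    """Stand-in for the per-class SVMs: return (label, score)."""
    return ("dog", 0.9)

def rcnn_detect(image):
    detections = []
    # The key cost of R-CNN: the CNN runs once PER proposed region.
    for box in propose_regions(image):
        features = extract_features(image, box)
        label, score = classify(features)
        detections.append({"box": box, "label": label, "score": score})
    return detections

dets = rcnn_detect(image="dummy")
print(len(dets))  # one detection per proposed region
```

With thousands of proposals per image, that per-region loop is exactly what Fast R-CNN removes.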
Fast R-CNN

■ Limitations of R-CNN
• Training is a multi-stage pipeline. Involves the preparation and
operation of three separate models.
• Training is expensive in space and time. Training a deep CNN on
so many region proposals per image is very slow.
• Object detection is slow. Making predictions using a deep CNN on so many region proposals is very slow.
■ Fast R-CNN is proposed as a single model instead of a pipeline to learn
and output regions and classifications directly.



Fast R-CNN

• Input: An image and a set of region proposals.
• The image is passed through a deep CNN; a pre-trained CNN, such as VGG-16, is used for feature extraction.
• The end of the deep CNN is a custom layer called a Region of Interest Pooling layer, or RoI Pooling, that extracts features specific to a given input candidate region.
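A minimal NumPy sketch of RoI max pooling, assuming the region is already given in integer feature-map coordinates (real implementations also quantize fractional coordinates and scale image-space boxes down to the feature map):

```python
import numpy as np

def roi_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool the region roi = (x1, y1, x2, y2), in integer feature-map
    coordinates, into a fixed output_size grid."""
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    h, w = region.shape
    oh, ow = output_size
    out = np.empty(output_size, dtype=feature_map.dtype)
    for i in range(oh):
        for j in range(ow):
            # Split the region into roughly equal bins; take the max of each
            r0, r1 = (i * h) // oh, max(((i + 1) * h) // oh, (i * h) // oh + 1)
            c0, c1 = (j * w) // ow, max(((j + 1) * w) // ow, (j * w) // ow + 1)
            out[i, j] = region[r0:r1, c0:c1].max()
    return out

fmap = np.arange(36, dtype=np.float32).reshape(6, 6)
pooled = roi_pool(fmap, roi=(0, 0, 6, 6))
print(pooled.shape)  # (2, 2), regardless of the input region's size
```

The fixed output size is the point: regions of any shape become a constant-size feature tensor, so the downstream fully connected layers can run on every proposal from a single shared feature map.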



RoI Pooling in Fast R-CNN

The model is significantly faster to train and to make predictions, yet still
requires a set of candidate regions to be proposed along with each input
image.
Faster R-CNN

■ Although it is a single unified model, the architecture is comprised of two modules:
• Region Proposal Network (RPN): A CNN-based architecture to both propose and refine region proposals as part of the training process.
• Fast R-CNN: These regions are then used in concert with a Fast R-CNN model in a single model design, extracting features from the proposed regions and outputting the bounding boxes and class labels.
■ Both modules operate on the same output of a deep CNN.
■ The region proposal network acts as an attention mechanism for the
Fast R-CNN network, informing the second network of where to look
or pay attention.
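The RPN scores and refines a fixed set of anchor boxes placed at every feature-map location. A sketch of anchor generation (the stride, scales, and aspect ratios below are illustrative defaults, not necessarily the paper's exact settings):

```python
def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Place scales x ratios anchor boxes (x, y, w, h) at each cell of a
    feat_h x feat_w feature map, in input-image coordinates."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Anchor center in input-image coordinates
            cx = col * stride + stride // 2
            cy = row * stride + stride // 2
            for s in scales:
                for r in ratios:
                    # Keep area ~ s*s while setting aspect ratio w/h = r
                    w, h = s * r ** 0.5, s / r ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2, w, h))
    return anchors

anchors = make_anchors(feat_h=4, feat_w=4)
print(len(anchors))  # 4 * 4 cells * (3 scales * 3 ratios) = 144 anchors
```

The RPN's outputs are then just per-anchor objectness scores and box offsets, which is what makes proposal generation a learned, nearly free part of the same forward pass.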



Feature Pyramid networks (FPN)

■ Most DL-based detectors run detection only on the feature maps of the network's top layer.
■ Although the features in deeper layers of a CNN are beneficial for category recognition, they are not conducive to localizing objects.
■ FPN
• leverages a ConvNet's pyramidal feature hierarchy, which has semantics from low to high levels, and
• builds a feature pyramid with high-level semantics throughout.
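One top-down merge step of FPN can be sketched as: upsample the coarser, semantically stronger map 2x and add it to the finer map after a lateral projection. Here the "1 x 1 conv" lateral projection is reduced to a scalar weight for illustration; a real FPN learns a 1 x 1 convolution per level.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a 2-D feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fpn_merge(finer, coarser, lateral_weight=1.0):
    """Merge a coarse top-down map into the next finer level."""
    return lateral_weight * finer + upsample2x(coarser)

c4 = np.ones((2, 2))       # coarse, semantically strong map
c3 = np.full((4, 4), 0.5)  # finer, spatially more precise map
p3 = fpn_merge(c3, c4)
print(p3.shape)  # (4, 4): high-level semantics carried down to fine resolution
```

Repeating this step level by level yields a pyramid where every resolution carries high-level semantics, so detection can run on the level best matched to each object's scale.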



You only look once (YOLO)

The approach involves a single neural network trained end-to-end that takes an image as input and directly predicts bounding boxes and a class label for each bounding box.

1. The model works by first splitting the input image into a grid of cells, where each cell is responsible for predicting a bounding box if the center of a bounding box falls within it.
2. Each grid cell predicts a bounding box involving the x, y coordinates, the width and height, and the confidence. A class prediction is also based on each cell.
3. For example, an image may be divided into a 7 × 7 grid and each cell in the grid may predict 2 bounding boxes, resulting in 98 proposed bounding box predictions.
4. The class probability map and the bounding boxes with confidences are then combined into a final set of bounding boxes and class labels.
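The output sizes implied by the 7 × 7 example work out as follows, taking C = 20 classes as in the original YOLO model trained on PASCAL VOC:

```python
# S x S grid; each cell predicts B boxes of (x, y, w, h, confidence)
# plus C class probabilities shared across the cell's boxes.
S, B, C = 7, 2, 20

boxes_per_image = S * S * B
values_per_cell = B * 5 + C
output_size = S * S * values_per_cell

print(boxes_per_image)  # 98 proposed bounding boxes, matching the slide
print(output_size)      # 7 * 7 * 30 = 1470 predicted values per image
```

A single forward pass producing this one fixed-size tensor is what makes YOLO fast compared with the per-region passes of the R-CNN family.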



