Object Detection With Deep Learning_ A Review Summary

Uploaded by

RAFIQ FREELANCING


Object Detection With Deep Learning: A Review Summary

Deep learning-based object detection is an effective and fast way to
predict, recognize, and localize objects in an image. It includes
subtasks such as face detection, pedestrian detection, and skeleton
detection. Object detection provides the data for semantic understanding
of images and videos and supports applications such as image
classification, face recognition, human behavior estimation, and
autonomous driving.

Because object detection builds on neural networks and related learning
systems, progress in those fields will advance neural network algorithms
and will also strongly influence object detection techniques, which can
themselves be regarded as learning systems.

However, object detection, which adds an object localization task to
classification, is difficult because of large variations in viewpoints,
poses, occlusions, and lighting conditions. Much attention has been given
to this field in recent years, and both localization and classification
remain hard problems. The traditional object detection pipeline can be
outlined in three phases: informative region selection, feature
extraction, and classification.

A. Informative Region Selection

Because objects may appear anywhere in the image and at different sizes
and aspect ratios, the whole image is scanned with a multi-scale sliding
window.

Though this exhaustive strategy can find all possible object locations,
it is computationally expensive because of the large number of candidate
windows, and it produces many redundant windows. On the other hand,
limiting the number of sliding window templates may produce
unsatisfactory regions.
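
The multi-scale sliding window scan described above can be sketched as
follows. This is a minimal illustration under stated assumptions, not the
procedure of any particular detector: the function name sliding_windows
and the parameters base_size, scales, aspect_ratios, and stride are
hypothetical and chosen only for the example.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_windows(image_w: int, image_h: int,
                    base_size: int = 64,
                    scales: Tuple[float, ...] = (1.0, 2.0),
                    aspect_ratios: Tuple[float, ...] = (0.5, 1.0, 2.0),
                    stride: int = 32) -> Iterator[Box]:
    """Yield every window that fits fully inside the image.

    All parameter names and default values are illustrative assumptions.
    """
    for scale in scales:
        for ratio in aspect_ratios:
            # ratio = width / height; the area stays near (base_size * scale)^2
            w = int(round(base_size * scale * ratio ** 0.5))
            h = int(round(base_size * scale / ratio ** 0.5))
            # range() is empty when the window is larger than the image
            for y in range(0, image_h - h + 1, stride):
                for x in range(0, image_w - w + 1, stride):
                    yield (x, y, w, h)


# Count the candidate windows for one image at two different strides.
coarse = sum(1 for _ in sliding_windows(640, 480, stride=32))
fine = sum(1 for _ in sliding_windows(640, 480, stride=16))
print(coarse, fine)
```

Reducing the stride or adding scales and aspect ratios quickly multiplies
the number of candidate windows, which is the cost noted above.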

B. Feature Extraction

Feature extraction means extracting visual features of the object to get
a semantic and robust representation. Representative features are the
scale-invariant feature transform (SIFT), histograms of oriented
gradients (HOG), and Haar-like features. These features can produce
representations associated with complex cells in the human brain.
However, manually designing a robust feature descriptor for all kinds of
objects is difficult because of the diversity of appearances,
backgrounds, and illumination conditions.

C. Classification

A classifier is needed to distinguish the target object from other
objects and to make the representation more informative, hierarchical,
and semantic. Support vector machines (SVM), AdaBoost, and the
deformable part-based model (DPM) are common choices. Among these
classifiers, DPM is the most flexible one to apply.

Based on these local feature descriptors and shallow learnable
architectures, state-of-the-art results were obtained on PASCAL visual
object classes (VOC), and real-time embedded systems were built with a
low burden on hardware. However, only limited gains were obtained during
2010-2012, and these approaches have notable limitations, such as
redundant candidate bounding boxes (BBs) and a failure to bridge the
semantic gap. Deep neural networks (DNNs) then obtained significant
gains with regions with convolutional neural network (CNN) features
(R-CNN). DNNs and CNNs behave quite differently from the traditional
approaches.

This summary covers a brief history of deep learning, the basic CNN
structure, generic object detection architectures, and reviews of CNN
applications including salient object detection, face detection, and
pedestrian detection. It closes with some future directions and
concluding remarks.

II. Brief Overview Of Deep Learning

Before the deeper discussion of deep learning-based object detection
approaches, this section presents the history of deep learning and an
introduction to CNNs and their advantages.

A. History: Birth, Decline, and Prosperity

The journey of neural networks started in the 1940s, and they became
popular in the 1980s and 1990s with the backpropagation algorithm
associated with Rumelhart. The original intention was to simulate the
human brain system to solve general learning problems. Neural networks
lost their popularity in the early 2000s because of the lack of
large-scale training data and other limitations. Deep learning regained
its popularity in 2006 with a breakthrough in speech recognition. Its
recovery can be attributed to the following factors:

1) The emergence of large-scale annotated training data, such as
ImageNet, which lets deep models exhibit their huge learning capacity.
2) The quick progress of high-performance parallel computing
systems, such as GPU clusters.
3) Advances in the design of network structures and training
strategies. Data augmentation and dropout reduce overfitting, and
batch normalization (BN) makes DNN training efficient. Network
structures such as AlexNet, OverFeat, GoogLeNet, Visual Geometry
Group (VGG), and Residual Net (ResNet) have been studied
extensively to improve performance.

B. Architecture and Advantages of CNN

A typical CNN architecture is VGG16. Each layer of a CNN is known as a
feature map. The feature map of the input layer is a 3-D matrix of pixel
intensities for the different color channels (e.g., RGB). Transformations
such as filtering and pooling can be applied to feature maps. Filtering
(convolution) convolves a filter matrix of learned weights with the
values of a receptive field.

Pooling, such as max pooling, average pooling, L2 pooling, and local
contrast normalization, summarizes the responses of a receptive field
into one value to create a more robust feature description. VGG16 has 13
convolutional layers, 3 fully connected (FC) layers, 5 max-pooling
layers, and a softmax classification layer.
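
To make the pooling step concrete, the sketch below applies max,
average, and L2 pooling to a single-channel feature map stored as nested
lists. It is a minimal illustration; the helper names pool2d, average,
and l2, and the 2×2 window with stride 2, are assumptions made for the
example rather than settings taken from the review.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def pool2d(feature_map: Matrix,
           reducer: Callable[[Sequence[float]], float],
           size: int = 2, stride: int = 2) -> Matrix:
    """Summarize each size x size receptive field into one value."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled.append([
            reducer([feature_map[r + i][c + j]
                     for i in range(size) for j in range(size)])
            for c in range(0, cols - size + 1, stride)
        ])
    return pooled


def average(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def l2(values: Sequence[float]) -> float:
    # L2 pooling: square root of the sum of squared responses
    return math.sqrt(sum(v * v for v in values))


fm = [[1.0, 3.0, 2.0, 0.0],
      [4.0, 2.0, 1.0, 5.0],
      [0.0, 1.0, 3.0, 2.0],
      [2.0, 6.0, 1.0, 1.0]]

print(pool2d(fm, max))      # [[4.0, 5.0], [6.0, 3.0]]
print(pool2d(fm, average))  # [[2.5, 2.0], [2.25, 1.75]]
print(pool2d(fm, l2))
```

Each 4×4 input becomes a 2×2 output, so pooling also halves the spatial
resolution of the feature map in this configuration.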

The advantages of CNN over traditional methods are:

1) Hierarchical feature representation.
2) Deeper architecture with exponentially increased expressive
capability.
3) The opportunity to jointly optimize several related tasks, such as
classification and bounding box regression in Fast R-CNN.
4) Larger learning capacity and the ability to treat problems as
high-dimensional data transforms.

Because of these advantages, CNNs are widely used in research fields such
as image super-resolution reconstruction, image classification, image
retrieval, face recognition, pedestrian detection, and video analysis.

III. Generic Object Detection:

Generic object detection aims to locate and classify objects in an image
and to label them with rectangular BBs together with a confidence of
existence. Generic object detection frameworks fall into two groups: one
follows the conventional object detection pipeline (e.g., R-CNN,
SPP-net, Fast R-CNN, etc.), and the other treats object detection as a
regression or classification problem (e.g., MultiBox, AttentionNet,
G-CNN, YOLO, SSD, YOLOv2, etc.). The two pipelines are bridged by the
anchors introduced in Faster R-CNN. The key points of both groups are
given below.

A. Region proposal-Based Framework

This is a two-step process that matches the attentional mechanism of the
human brain: it first gives a coarse scan of the whole scene and then
focuses on the regions of interest (RoIs).

1) R-CNN: It improves the quality of the candidate BBs and uses a deep
architecture to extract high-level features. R-CNN was proposed by
Girshick et al. and obtained a mean average precision (mAP) of 53.3% on
PASCAL VOC 2012, an improvement of more than 30% over the previous best
result. The R-CNN flowchart has three stages:

a) Region Proposal Generation
b) CNN-Based Deep Feature Extraction
c) Classification and Localization

Despite its significant improvement over traditional methods, R-CNN has
some disadvantages too:

1) The FC layers require a fixed-size input image (e.g., 227×227).
2) Training is a multistage pipeline.
3) Training is expensive in space, time, and memory storage.
4) The procedure is redundant and time-consuming.

To address these problems, many methods have been proposed, such as
geodesic object proposals and multi-scale combinatorial grouping. In
addition, Zhang et al. introduced a Bayesian optimization-based search
algorithm to address inaccurate localization.

Ouyang et al. proposed the deformable deep CNN (DeepID-Net), which
introduces a novel deformation-constrained pooling (def-pooling) layer
to impose a geometric penalty. Proposal sampling can also be biased to
match the statistics of the ground-truth BBs with K-means clustering.

2) SPP-Net:

FC layers must take a fixed-size input. That is why R-CNN warps or crops
each region proposal to the same size.

Cropping may capture only part of the object, and warping may introduce
unwanted geometric distortion. To solve these problems, He et al. took
the theory of spatial pyramid matching (SPM) into consideration and
proposed a novel CNN architecture named SPP-net. SPM partitions the
image into divisions at several finer to coarser scales and aggregates
quantized local features into mid-level representations. Unlike R-CNN,
SPP-net reuses the feature maps of the fifth conv layer (conv5) to
project region proposals of arbitrary sizes to fixed-length feature
vectors. For conv5 with 256 filters and a three-level pyramid, the SPP
layer output has a dimension of 256 × (1² + 2² + 4²) = 5376.
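
The fixed-length property can be checked with a small sketch of spatial
pyramid pooling. This is an illustration under stated assumptions, not
the SPP-net implementation: the function names spp_vector and
constant_maps are hypothetical, max pooling is assumed inside each bin,
and the bin boundaries use a floor/ceiling split of the feature map.

```python
from typing import List, Sequence

FeatureMaps = List[List[List[float]]]  # channels x height x width


def spp_vector(feature_maps: FeatureMaps,
               levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool every channel into n x n bins for each pyramid level n."""
    out: List[float] = []
    for n in levels:
        for channel in feature_maps:
            h, w = len(channel), len(channel[0])
            for i in range(n):
                # floor for the start, ceiling for the end of each bin
                r0, r1 = (i * h) // n, ((i + 1) * h + n - 1) // n
                for j in range(n):
                    c0, c1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                    out.append(max(channel[r][c]
                                   for r in range(r0, r1)
                                   for c in range(c0, c1)))
    return out


def constant_maps(channels: int, height: int, width: int) -> FeatureMaps:
    """Build placeholder feature maps filled with ones (demo data only)."""
    return [[[1.0] * width for _ in range(height)] for _ in range(channels)]


# Two regions of different spatial size give the same output length.
print(len(spp_vector(constant_maps(256, 13, 13))))  # 5376
print(len(spp_vector(constant_maps(256, 10, 7))))   # 5376
```

Passing levels=(7,) to the same function would give a single-level
pooling on a fixed grid, which is the shape of the RoI pooling special
case; the grid size 7 here is an assumption for illustration.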

SPP-net obtains better results through correct estimation of region
proposals at their corresponding scales, and it improves detection
efficiency.

3) Fast R-CNN:

To reduce the accuracy drop of very deep networks, Girshick introduced a
multitask loss on classification and bounding box regression and proposed
a novel CNN architecture named Fast R-CNN. As in SPP-net, the whole image
is processed with conv layers to produce feature maps. The RoI pooling
layer is a special case of the SPP layer with only one pyramid level.

4) Faster R-CNN: Faster R-CNN introduces a region proposal network (RPN)
in which anchors of three scales and three aspect ratios are adopted.
With Faster R-CNN, region proposal-based CNN architectures for object
detection can be trained in an end-to-end way. However, the alternate
training algorithm is very time-consuming, and the RPN produces
object-like regions (including backgrounds) instead of object instances
and is not skilled in dealing with objects with extreme scales or shapes.

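
The anchor scheme of three scales and three aspect ratios can be sketched
as below, giving nine boxes per location. This is a minimal illustration:
the function name make_anchors, the scale values 128, 256, and 512, and
the ratios 0.5, 1.0, and 2.0 are assumptions for the example and are not
stated in this summary.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def make_anchors(cx: float, cy: float,
                 scales: Sequence[float] = (128, 256, 512),
                 ratios: Sequence[float] = (0.5, 1.0, 2.0)) -> List[Box]:
    """Return len(scales) * len(ratios) anchors centered at (cx, cy).

    The default scales and ratios are illustrative assumptions.
    """
    anchors: List[Box] = []
    for scale in scales:
        for ratio in ratios:
            # ratio = height / width; the area stays at scale * scale
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(300.0, 200.0)
print(len(anchors))  # 9
for box in anchors[:3]:
    print([round(v, 1) for v in box])
```

Repeating this at every position of the conv feature map yields the full
anchor set from which region proposals are regressed and scored.
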
5) R-FCN: Recent state-of-the-art image classification networks, such as
ResNets and GoogLeNets, are fully convolutional. With R-FCN, more
powerful classification networks can be adopted to accomplish object
detection in a fully convolutional architecture by sharing nearly all the
layers. State-of-the-art results are obtained on both the PASCAL VOC
and Microsoft COCO data sets at a test speed of 170 ms per image.

6) FPN: Feature pyramids built upon image pyramids (featurized image
pyramids) have been widely applied in many object detection systems to
improve scale invariance. The feature pyramid network (FPN) has an
architecture with a bottom-up (BU) pathway, a top-down (TD) pathway, and
several lateral connections to combine low-resolution, semantically
strong features with high-resolution, semantically weak features.
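
One top-down merge step can be sketched as follows. This is a simplified
single-channel illustration under stated assumptions: it assumes
nearest-neighbor upsampling by a factor of 2 followed by element-wise
addition with the lateral feature map, and it omits the convolutions that
a real network would apply. The names upsample2x and merge_top_down are
hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def upsample2x(feature_map: Matrix) -> Matrix:
    """Nearest-neighbor upsampling: repeat every value in a 2 x 2 block."""
    out: Matrix = []
    for row in feature_map:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def merge_top_down(coarse: Matrix, lateral: Matrix) -> Matrix:
    """Add the upsampled coarse (semantically strong) map to the lateral map."""
    up = upsample2x(coarse)
    if len(up) != len(lateral) or len(up[0]) != len(lateral[0]):
        raise ValueError("lateral map must be exactly twice the coarse size")
    return [[u + l for u, l in zip(up_row, lat_row)]
            for up_row, lat_row in zip(up, lateral)]


coarse = [[1.0, 2.0],
          [3.0, 4.0]]
lateral = [[10.0] * 4 for _ in range(4)]
for row in merge_top_down(coarse, lateral):
    print(row)
```

Applying this step repeatedly from the coarsest level downward produces
one merged map per pyramid level.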

7) Mask R-CNN: To solve the instance segmentation problem, Mask R-CNN
adds a branch, parallel to the existing branches in Faster R-CNN for
classification and bounding box regression, that predicts segmentation
masks in a pixel-to-pixel manner. Mask R-CNN is simple to implement and
gives good instance segmentation and object detection results.

8) Multitask Learning, Multiscale Representation, and Contextual
Modeling:

Detection from region proposals alone has limits, so it is necessary to
perform object detection with multitask learning, multiscale
representation, and context modeling to combine complementary
information from multiple sources. Multitask learning learns a useful
representation for multiple correlated tasks from the same input.

9) Thinking in Deep Learning-Based Object Detection:

Although many deep learning methods exist, several factors still leave
room for continuous improvement. In particular, there is still a large
imbalance between the number of annotated objects and the number of
background examples.

B. Regression/Classification-Based Framework:

One-step frameworks based on global regression/classification map
directly from image pixels to bounding box coordinates and class
probabilities, which reduces the time expense. The review covers this
family in three parts: 1) pioneer works, 2) YOLO, and 3) SSD.

C. Experimental Evaluation:

The experimental evaluation compares the prominent architectures by
their proposal method, learning method, loss function, programming
language, and platform.

IV. SALIENT OBJECT DETECTION

Visual saliency detection is one of the most critical and challenging
tasks in computer vision. It aims to highlight the most dominant object
regions in an image. Numerous applications incorporate visual saliency to
improve their performance, such as image cropping and segmentation, image
retrieval, and object detection.

There are two branches of approaches in salient object detection, namely,
bottom-up (BU) and top-down (TD). TD saliency can be viewed as a
focus-of-attention mechanism that prunes BU salient points that are
unlikely to be parts of the object. The review discusses this topic in
two parts:

A. Deep Learning in Salient Object Detection

B. Experimental Evaluation

V. FACE DETECTION:

Face detection and pedestrian detection are closely related to generic
object detection and are mainly accomplished with multi-scale adaptation
and multi-feature boosting forests, respectively. Pedestrian and face
images have a regular structure, whereas general objects and scene images
have more complex geometric structures and layouts.

Face detection is an important preprocessing step for face recognition,
face synthesis, and facial expression analysis. It must locate face
regions covering a very large range of scales (30-3000 pts versus
10-1000 pts). The most famous face detector, proposed by Viola and Jones,
trains cascaded classifiers with Haar-like features and AdaBoost,
achieving good performance with real-time efficiency.

However, this detector may degrade significantly in real-world
applications because of the larger visual variations of human faces.
Different from this cascade structure, Felzenszwalb et al. proposed a
deformable part model (DPM) for face detection. However, these
traditional face detection methods have high computational expense and
require large quantities of annotations. In addition, their performance
is greatly limited by manually designed features and shallow
architectures.

A. Deep Learning in Face Detection

Recently, various CNN-based face detection approaches have been
proposed. For example, Yang et al. proposed a novel deep learning-based
face detection framework that collects the responses from local facial
parts (e.g., eyes, nose, and mouth) to address face detection under
severe occlusions and unconstrained pose variations.

Some authors trained CNNs with other complementary tasks, such as 3-D
modeling and face landmarks, in a multitask learning manner.

B. Experimental Evaluation

The FDDB data set has 2845 pictures in which 5171 faces are annotated
with elliptical shapes. Two types of evaluation are used: the discrete
score and the continuous score.
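
The two evaluation scores can be sketched as below, assuming the common
definition in which the continuous score is the overlap ratio
(intersection over union) between a detection and its matched annotation
and the discrete score is 1 when that ratio exceeds 0.5. The sketch uses
axis-aligned boxes rather than ellipses for brevity, which is a
simplifying assumption; the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def continuous_score(detection: Box, annotation: Box) -> float:
    # Assumed definition: the overlap ratio itself
    return iou(detection, annotation)


def discrete_score(detection: Box, annotation: Box,
                   threshold: float = 0.5) -> int:
    # Assumed definition: 1 if the overlap ratio exceeds the threshold
    return 1 if iou(detection, annotation) > threshold else 0


face = (0.0, 0.0, 10.0, 10.0)
print(continuous_score((2.0, 0.0, 12.0, 10.0), face),
      discrete_score((2.0, 0.0, 12.0, 10.0), face))
print(continuous_score((5.0, 0.0, 15.0, 10.0), face),
      discrete_score((5.0, 0.0, 15.0, 10.0), face))
```

The first detection overlaps the annotation by two thirds and counts as a
hit; the second overlaps by one third and does not.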

VI. PEDESTRIAN DETECTION

In recent years, pedestrian detection has been studied extensively. It is
closely related to pedestrian tracking, person re-identification, and
robot navigation. Before the recent progress in deep CNN (DCNN)-based
techniques, some researchers combined boosted decision forests with
hand-crafted features to obtain pedestrian detectors.

A. Deep Learning in Pedestrian Detection

Although DCNNs have shown outstanding performance on generic object
detection, for a long time none of these approaches achieved better
results than the best hand-crafted feature-based method, even when
part-based information and occlusion handling were incorporated.
Zhang et al. attempted to adapt generic Faster R-CNN to pedestrian
detection. Other researchers also endeavored to combine complementary
information from multiple data sources.

Building on Faster R-CNN, Liu et al. proposed multispectral DNNs for
pedestrian detection to combine complementary information from color
and thermal images.

B. Experimental Evaluation

The evaluation is carried out on the most popular Caltech Pedestrian
data set, which was collected from videos of a vehicle driving through
an urban environment and consists of 250000 frames with about 2300
unique pedestrians and 350000 annotated BBs. Performance is measured
with the log-average miss rate (L-AMR).
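
A log-average miss rate can be computed as sketched below. The sketch
assumes the commonly used protocol in which the miss rate is sampled at
nine false-positives-per-image (FPPI) values evenly spaced in log space
between 0.01 and 1, and the samples are combined with a geometric mean;
this protocol and the function name log_average_miss_rate are assumptions
not stated in this summary.

```python
import math
from typing import List, Sequence


def log_average_miss_rate(fppi: Sequence[float],
                          miss_rate: Sequence[float],
                          num_points: int = 9) -> float:
    """Geometric mean of miss rates sampled at log-spaced FPPI references."""
    if len(fppi) != len(miss_rate):
        raise ValueError("fppi and miss_rate must have the same length")
    # Reference points 10^-2 ... 10^0, evenly spaced in log space
    refs = [10 ** (-2 + 2 * k / (num_points - 1)) for k in range(num_points)]
    curve = sorted(zip(fppi, miss_rate))
    sampled: List[float] = []
    for ref in refs:
        # Miss rate at the largest FPPI not exceeding the reference;
        # if the curve has no such point, count everything as missed.
        below = [mr for fp, mr in curve if fp <= ref]
        sampled.append(below[-1] if below else 1.0)
    # Clamp to avoid log(0) for a perfect detector
    logs = [math.log(max(mr, 1e-10)) for mr in sampled]
    return math.exp(sum(logs) / len(logs))


# Hypothetical curve, for illustration only (not measured results).
demo_fppi = [0.005, 0.02, 0.1, 0.5, 1.0]
demo_miss = [0.60, 0.45, 0.30, 0.20, 0.15]
print(round(log_average_miss_rate(demo_fppi, demo_miss), 4))
```

A lower value is better, since it means fewer pedestrians are missed over
the whole range of false positive rates.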

VII. PROMISING FUTURE DIRECTIONS AND TASKS

Despite rapid progress in object detection, there are still many open
issues, especially in small object detection, for example on the COCO
data set and in face detection. To improve the localization accuracy of
small objects, the following aspects need to be researched and developed:

1) Multitask Joint Optimization and Multimodal Information Fusion
2) Scale Adaptation
3) Spatial Correlations and Contextual Modelling

The second direction is to reduce manual labor and achieve real-time
object detection. The following measures can be taken in this regard:

1) Cascade Network
2) Unsupervised and Weakly Supervised Learning
3) Network Optimization

The third direction is to extend typical 2-D object detection to 3-D
object detection and video object detection.

Video Object Detection:

Detection accuracy in videos suffers from degraded object appearances
(e.g., motion blur and video defocus), and the network is usually not
trained end to end. Video object detection therefore needs further study.

VIII. CONCLUSION

Deep learning-based object detection has become a research hotspot in
recent years because of its powerful learning ability and its advantages
in dealing with occlusion, scale transformation, and background
switches. This article summarizes a review of deep learning-based object
detection frameworks that handle different subproblems, such as
occlusion, clutter, and low resolution, with different degrees of
modification on R-CNN.

It also gives a brief description of developments, analysis, and
research directions in neural networks and associated learning systems.
