International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 07 Issue: 08 | Aug 2020 www.irjet.net p-ISSN: 2395-0072

Object Detection using Deep Learning


Sriram S¹, Yogesh K², Santhosh Kumar D³, Gayathri R⁴
¹,²,³Undergraduate, Department of Computer Science and Engineering,
⁴Assistant Professor, Department of Computer Science and Engineering,
Sri Venkateswara College of Engineering, India.
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - Object detection has received much research attention in recent years because of its close relationship with video analysis and image understanding. Traditional object detection methods are based on hand-crafted features and shallow trainable architectures. Their performance easily stagnates even when complex ensembles are constructed that combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid growth of deep learning, more powerful techniques, capable of learning semantic, high-level, and deeper features, have been introduced to solve the problems inherent in conventional architectures. These models differ in network design, training strategy, and optimization.

Key Words: Deep Learning, Object Detection, Network Design.

1. INTRODUCTION

To gain a full understanding of an image, we should concentrate not only on classifying different images but also on estimating precisely the categories and positions of the objects that each image contains. This task is object detection, which is normally composed of different subtasks such as face detection, pedestrian detection, and skeleton detection. As one of the fundamental problems of computer vision, object detection provides valuable information for the semantic understanding of images, and it is used in many applications, including image classification, human behaviour analysis, autonomous driving, and face recognition. Meanwhile, progress in these areas, inherited from neural networks and related learning systems, should improve neural network algorithms and also have a significant impact on object detection techniques, which can themselves be considered learning systems. However, due to broad variations in position, pose, occlusion, and lighting conditions, it is challenging to perform object detection accurately together with the additional task of localizing the object.

1.1 Informative region selection

As different objects can appear at any image location and have various aspect ratios or sizes, scanning the entire image with a multi-scale sliding window is a reasonable choice. Although this exhaustive strategy can evaluate all possible locations of objects, its drawbacks are also evident. It is computationally costly due to the large number of candidate windows, and it generates too many redundant windows. However, if only a fixed number of sliding window templates is used, unsatisfactory regions may be produced.

1.2 Feature Extraction

Object recognition is a general term for a set of related computer vision tasks that involve identifying objects in digital images. Image classification involves predicting the class of one object in an image. Object localization refers to locating one or more objects in an image and drawing a bounding box around their extent. Object detection combines these two tasks: it localizes one or more objects in an image and classifies them. When a user or practitioner refers to "object recognition", they often mean "object detection".

Object detection is a challenging and exciting task in computer vision. Detection can be hard because variations in orientation, lighting, background, and occlusion can produce entirely different images of exactly the same object. With the development of deep learning and neural networks, we can now address these problems without devising a separate heuristic for each of them. We developed and trained a Faster R-CNN model on the TensorFlow deep learning platform. Faster R-CNN is a region-based neural network for object detection; it first uses a Region Proposal Network (RPN) to generate proposals for detection [1]-[6].

2. LITERATURE SURVEY

2.1 Improving object detection with deep convolutional networks via Bayesian optimization and structured prediction

According to Y. Zhang, K. Sohn, R. Villegas, G. Pan, and H. Lee, object detection systems based on deep convolutional neural networks (CNNs) have recently made ground-breaking progress on many object detection benchmarks. Although the features learned by these high-capacity neural networks are discriminative for categorization, a major source of detection error is still inaccurate localization. Building on high-capacity CNN architectures, they address the localization problem by 1) using
© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 2278

a Bayesian optimization search algorithm that sequentially proposes candidate regions for an object bounding box, and 2) training the CNN with a structured loss that explicitly penalizes localization inaccuracy [1].

2.2 Subcategory-aware convolutional neural networks for object proposals and detection

According to P. Druzhkov and V. Kustikova, in CNN-based object detection methods, region proposal becomes a bottleneck when objects exhibit large variation in scale, occlusion, or truncation. Moreover, these methods concentrate primarily on 2D object detection and cannot estimate detailed object properties. The paper proposes subcategory-aware CNNs for object detection. It introduces a new region proposal network that uses subcategory information to guide the proposal generation process, and a new detection network for joint detection and subcategory classification. By using subcategories related to object pose, it achieves state-of-the-art performance on both detection and pose estimation on widely used benchmarks [2].

2.3 Low-complexity approximate convolutional neural networks

P. F. Felzenszwalb, D. McAllester, R. B. Girshick, and D. Ramanan consider the problem of detecting and localizing generic objects from categories such as people or cars in static images. This is a difficult problem because objects in such categories can vary greatly in appearance. Variations arise not only from changes in lighting and viewpoint but also from non-rigid deformations and intraclass variability in shape and other visual properties. For example, people wear different clothes and take a variety of poses, while cars come in various shapes and colors [3].

2.4 Object detection via a multi-region and semantic segmentation-aware CNN model

S. Gidaris and N. Komodakis propose an object detection method that relies on a multi-region deep convolutional neural network (CNN) that also encodes semantic segmentation-aware features. The resulting CNN-based representation aims to capture a diverse set of discriminative appearance factors and exhibits localization sensitivity, which is important for locating objects precisely. They exploit these properties of the recognition module by integrating it into an iterative localization system that alternates between scoring a box proposal and refining its position with a deep CNN regression model. Thanks to the efficient use of these modules, objects are detected with very high localization accuracy [6].

2.5 Face detection using deep learning: an improved faster RCNN approach

X. Sun, P. Wu, and S. C. Hoi tuned several key hyper-parameters of the Faster RCNN architecture and found that the most crucial one appears to be the number of anchors in the RPN. Standard Faster RCNN uses nine anchors, which sometimes fails to recall small objects. In face detection, however, small faces are fairly common, especially in unconstrained settings. Therefore, instead of using the default setting, they add a size group of 64 × 64, increasing the number of anchors to 12. They extend the state-of-the-art Faster RCNN framework for generic object detection and propose several effective strategies for improving it on face detection tasks, including feature concatenation, multi-scale training, hard negative mining, and configuration of anchor sizes for the RPN [5].

2.6 Imagenet classification with deep convolutional neural networks

A. Krizhevsky, I. Sutskever, and G. E. Hinton describe a large neural network that consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. They assess what the network has learned by computing its top-5 predictions on test images; the network can identify even objects that are off-center, such as the mite in the top-left of one test image. Another way to probe the network's visual knowledge is to consider the feature activations that an image induces at the last, 4096-dimensional hidden layer. Computing similarity with the Euclidean distance between two 4096-dimensional, real-valued vectors is inefficient, but it could be made efficient by training an auto-encoder to compress these vectors to short binary codes. The results show that a large, deep convolutional neural network can achieve record-breaking results on a highly challenging dataset, and that the network's performance degrades if a single convolutional layer is removed [4].

3. PROPOSED WORK

The proposed system is designed for tracking multiple moving objects in real time. Rectangular bounding boxes are used to label the objects. The first stage of the system is the collection of images and the generation of the training and test datasets. The CNN model comprises convolutional and pooling layers together with a region proposal network (RPN) for region generation. Feature maps are generated from the input image and fed into the RoI layer along with the generated regions. The output of the system labels the objects in the test image and marks them with rectangular boxes. The system also labels overlapping objects based on the regions mapped onto the image.
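
The region generation step described above starts from a fixed set of anchor boxes, and Section 2.5 notes that the default RPN uses nine anchors while adding a 64 × 64 size group raises this to 12. The sketch below is one way to see where those counts come from: it enumerates one anchor per combination of base size and aspect ratio. The function name make_anchors, the aspect ratios (0.5, 1, 2), and the base sizes 128, 256, and 512 are assumptions taken from commonly cited Faster R-CNN defaults; they are not values stated in this paper.

```python
import math


def make_anchors(sizes, ratios):
    """Return one (width, height) pair per (size, ratio) combination.

    Each anchor keeps the area size * size while its height/width
    ratio equals `ratio`. All names and values are illustrative.
    """
    anchors = []
    for size in sizes:
        area = float(size * size)
        for ratio in ratios:
            width = math.sqrt(area / ratio)
            height = width * ratio
            anchors.append((width, height))
    return anchors


# Assumed aspect ratios (height / width); not taken from this paper.
ratios = (0.5, 1.0, 2.0)

# Assumed default size groups: 3 sizes x 3 ratios = 9 anchors.
default_anchors = make_anchors((128, 256, 512), ratios)

# Adding a 64 x 64 size group: 4 sizes x 3 ratios = 12 anchors.
face_anchors = make_anchors((64, 128, 256, 512), ratios)

print(len(default_anchors), len(face_anchors))
```

Under these assumptions the anchor count is simply the number of size groups times the number of aspect ratios, which is why one extra size group adds three anchors.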

Fig.1 Architecture Diagram

3.1 Modules

The structure of the overall system can be defined using modular design. Modularity is commonly described as the extent to which a system's components may be separated and recombined. The system consists of the following modules:

a. Generation of Training and Testing Dataset
b. Generation of Training Model
c. Testing the Model

3.1.1 Generation of the Training and Testing Dataset

The Faster R-CNN model is configured with an input dimension of 600 x 1024. The maximum number of detections per class is set to 70. The training record is generated from the training dataset; the maximum number of classes detected is around 27. The label file for the classes is generated, and the training model is built from the labeled records. Training the model produces a protobuf file.

3.1.2 Generation of Training Model

The images are collected from the COCO repository. The gathered images are labeled using the LabelImg tool. The labeled images are then converted to a CSV file using the XML to CSV tool. 80% of the dataset is used for training, and the remaining 20% is used for testing to check the accuracy.

Chart 1. Classification Loss

Chart 1 shows the loss incurred when classifying the image category over the available image data. Ideally, the loss should be low for a well-trained CNN model.

Chart 2. Training Iteration

global_step/sec: global_step is the current iteration in training the object detection model, and global_step/sec is the number of batches processed per second.

3.1.3 Testing the Model

The trained model is loaded from the protobuf ('.pb') file, and the labels of the training images are loaded to provide the labels for the test image. The test image is converted into feature maps for detecting the objects in the image. The feature maps are then combined with the region proposal network to generate feature maps of equal dimension. The objects present in the image are labelled with boxes for live object tracking.

3.2 Model Implementation

3.2.1 Faster regional convolutional Neural network

Faster R-CNN (frcnn for short) improves further on Fast R-CNN: the selective search step is replaced by a Region Proposal Network (RPN). R-CNN is the starting point for Faster R-CNN. It uses selective search to find the regions of interest and passes them to a convolutional neural network. The Fast R-CNN procedure is similar to the R-CNN algorithm, but instead of feeding the region proposals to the CNN, we feed the input image to the CNN to generate a convolutional feature map. From the convolutional feature map, we identify the region proposals and warp them into squares, and by using an RoI pooling layer, we reshape them to a fixed size so that they can be fed into a fully connected layer. From the RoI feature vector, we use a softmax layer to predict the class of the proposed region and the offset values for the bounding box. "Fast R-CNN" is faster than R-CNN because the 2000 region proposals do not have to be fed to the convolutional neural network each time. Instead, the convolution operation is performed
only once per image, and a feature map is produced from it.

3.2.2 Softmax

The softmax classifier provides "probabilities" for each class, unlike the SVM, which computes uncalibrated and hard-to-interpret scores for all classes. For example, given an image, the SVM classifier might give the scores [12.5, 0.8, -23.0] for the classes "rat", "cat", and "ship". The softmax classifier can instead report the probabilities of the three labels as, for instance, [0.9, 0.09, 0.01], which lets you interpret its confidence in each class. In both cases, we compute the same score vector f (e.g., by matrix multiplication). The difference is in the interpretation of the scores in f. The SVM interprets them as class scores, and its loss function encourages the correct class (class 2 in this example) to have a score higher by a margin than the other class scores. The softmax classifier instead interprets the scores as (unnormalized) log probabilities for each class and then encourages the (normalized) log probability of the correct class to be high (equivalently, its negative to be low). The final loss for this example is 1.58 for the SVM and 1.04 for the softmax classifier, but note that these numbers are not comparable; they are only meaningful in relation to losses computed with the same classifier and the same data.

3.3 Data Evaluation Module

3.3.1 Train-Test Split

The dataset is split into two parts before modelling: the training dataset and the testing dataset. The model is trained using the training dataset, and to check that it is being trained appropriately, it is then tested using the testing dataset.

Here, we have split the data in the ratio 70:30, i.e. 70% of the dataset is used as the training dataset and 30% as the testing dataset. This split is done with the scikit-learn function sklearn.model_selection.train_test_split.

3.3.2 Metrics employed for analysis

In this research, the metrics used for assessing the model are the confusion matrix, accuracy, and classification reports from sklearn.metrics.

A confusion matrix records the number of correct and incorrect predictions made by the classification model compared with the actual outcomes (target values) in the data. The matrix is NxN, where N is the number of target values (classes). The performance of such models is commonly evaluated using the data in the matrix. For two classes, it is represented in eqn.(1).

$$\begin{bmatrix} TN & FP \\ FN & TP \end{bmatrix} \tag{1}$$

Where TN is True Negative, FP is False Positive, FN is False Negative, and TP is True Positive.

Classification Accuracy: The proportion of the total number of predictions that were correct. It is represented in eqn.(2).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$

Positive Predictive Value or Precision: The proportion of predicted positive cases that are correctly identified, i.e. TP / (TP + FP).

F1 Score: The F1 score combines recall and precision with respect to a specific positive class, where recall is TP / (TP + FN). It can be interpreted as the harmonic mean of recall and precision, and it reaches its best value at 1 and its worst at 0. It is represented in eqn.(3).

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}$$

4. CONCLUSION

In this paper, an accurate and efficient object detection system has been developed which achieves comparable metrics using Faster R-CNN. The project uses recent techniques from the fields of computer vision and deep learning. A custom dataset was created using labelImg, and the evaluation was consistent. The system could be employed in real-time applications that require object detection as a pre-processing step in their pipeline. An important extension would be to train the system on video sequences for use in tracking applications. Adding a temporally consistent network would enable smoother detection and would be more effective than per-frame detection.

5. FUTURE WORK

Manually drawing large numbers of bounding boxes is a very time-consuming process. To relieve this burden, semantic priors, unsupervised object discovery, multiple instance learning, and deep neural network prediction can be integrated to make the best use of image-level supervision, assigning object category tags to the corresponding object regions and refining object boundaries. Furthermore, the model can be loaded onto Android so that objects are detected through the mobile camera and their names are spoken by the voice assistant API in the app, which would help blind people navigate.
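
As a supplementary illustration of the accuracy, precision, and F1 score described in Section 3.3.2, the following minimal sketch computes them from binary labels in plain Python. The function names confusion_counts and summarize are hypothetical, and the label lists are made-up values for illustration only; they are not results from this paper, which reports using sklearn.metrics rather than hand-written code.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TN, FP, FN, TP for a binary classification problem."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tn, fp, fn, tp


def summarize(tn, fp, fn, tp):
    """Derive accuracy, precision, recall and F1 from the four counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only (not data from this paper).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = summarize(*counts)
print(counts, metrics)
```

The guards against zero denominators are a design choice of this sketch: when no positive predictions or no positive labels exist, precision, recall, and F1 fall back to 0.0 instead of raising an error.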

REFERENCES

[1] Y. Zhang, K. Sohn, R. Villegas, G. Pan, and H. Lee, “Improving object detection with deep convolutional networks via bayesian optimization and structured prediction,” in CVPR, 2019.
[2] P. Druzhkov and V. Kustikova, “Subcategory-aware convolutional neural networks for object proposals and detection,” in WACV, 2018.
[3] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan,
[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in NIPS, 2012.
[5] X. Sun, P. Wu, and S. C. Hoi, “Face detection using deep learning: An improved faster rcnn approach,” arXiv:1701.08289, 2017.
[6] S. Gidaris and N. Komodakis, “Object detection via a multi-region and semantic segmentation-aware cnn model,” in CVPR, 2015.
