
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/370988885

Object Detection using You Only Look Once (YOLO) Algorithm in Convolution Neural Network (CNN)

Conference Paper · April 2023
DOI: 10.1109/I2CT57861.2023.10126213

All content following this page was uploaded by Jayshree Das on 04 July 2023.


2023 IEEE 8th International Conference for Convergence in Technology (I2CT)
Pune, India. Apr 7-9, 2023

Object Detection using You Only Look Once (YOLO) Algorithm in Convolution Neural Network (CNN)

Meghana Pulipalupula, Srija Patlola, Mahesh Nayaki, Manoj Yadlapati, Jayshree Das, B.R. Sanjeeva Reddy
Department of Electronics and Communication Engineering
B.V. Raju Institute of Technology, Medak, India
[email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract— A computer vision technique that can recognize and locate objects in an image or video is a useful tool. To meet this demand, many algorithms are available, including the Region-based Convolution Neural Network (RCNN) and the Fast Region-based Convolution Neural Network (FRCNN). In this work, however, the You Only Look Once (YOLO) V3 technique is proposed for object detection. It can locate and identify different objects instantly. Object detection in YOLO is performed as a regression problem, and the algorithm outputs the class probabilities of the detected objects. Both Python and OpenCV are used to implement the work. The observed results show higher object detection accuracy with the YOLO algorithm.

Keywords— Computer Vision, Neural Network, OpenCV, You Only Look Once

I. INTRODUCTION

There are several methods for object detection, including RCNN and Fast RCNN. Although these techniques have overcome the constraints of limited data and modelling in object detection, they cannot find objects in a single algorithm run. While other algorithms may require many runs to detect an item, YOLO can do so in only one: as the name You Only Look Once implies, the approach requires just a single forward propagation through a neural network to identify objects [1].

Autonomous driving also relies heavily on object detection. Many automakers employ it in conjunction with image recognition software so that the AI sensors in their vehicles can operate safely, identify traffic, produce 3D maps, and navigate without a driver [2].

II. VARIOUS ALGORITHMS

A. Region based Convolutional Neural Network (RCNN)

RCNN is a machine learning model used in image processing and computer vision. Its primary objective is to identify the objects in a picture by drawing borders around them [3]. Its disadvantage is the use of three separate models: a CNN for feature extraction, a linear SVM for object identification, and a regression model for the bounding boxes. Using so many models lengthens the runtime; the outcome takes about 45-50 seconds to acquire.

B. Fast-RCNN

Fast RCNN leverages prior work and a deep CNN to finish this task quickly with effective results. It consists of a CNN whose final pooling layer is replaced by an "ROI pooling" layer and whose final FC layer is replaced by two branches: a (K + 1)-category softmax branch and a category-specific bounding-box regression branch. The strategy is comparable to the RCNN algorithm; however, the input picture is sent to the CNN to create a convolutional feature map rather than feeding it the region proposals. The region proposals are then located from the convolutional feature map, warped into squares using an RoI pooling layer, and reshaped into a fixed size to be input into a fully connected layer [4].

C. Single Shot Detection (SSD)

SSD is a single-shot detector: it forecasts the bounding boxes and the classes in a single pass without the need for a dedicated region proposal network. SSD adds two features to increase accuracy: small convolutional filters to predict object classes and offsets for default bounding boxes. Although its precision is said to be state-of-the-art, the complete process moves at only 7 fps, much less than what real-time processing demands. By removing the requirement for the region proposal network, SSD accelerates the process, and several adjustments, such as multi-scale features and default boxes, compensate for the drop in accuracy [5].

D. You Only Look Once (YOLO)

The YOLO algorithm uses convolutional neural networks (CNN) to detect objects. As the name suggests, it requires only a single forward propagation to detect objects, which means detection is completed in a single run. The CNN is used to predict different class probabilities and bounding boxes simultaneously. YOLO works by dividing the input image into grids; the size of each grid is S x S.

979-8-3503-3401-2/23/$31.00 ©2023 IEEE


The grids created from an input image are displayed in the following figure. To forecast the height, width, centre, and class of objects, YOLO uses single-box regression, and the likelihood of an object appearing in each bounding box is also predicted. In object detection, Intersection Over Union (IOU) measures the overlap between boxes; YOLO uses IOU to produce an output box that properly encircles the items [6].
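As an illustrative sketch of the two ideas above, the snippet below maps an object centre to its responsible S x S grid cell and computes IOU between two boxes given as (x1, y1, x2, y2) corners. This is an assumed helper written for this discussion, not code from the paper, and S = 7 is only an example grid size.

```python
def grid_cell(cx, cy, img_w, img_h, S=7):
    # Map a pixel-space object centre (cx, cy) to the (row, col) of the
    # S x S grid cell responsible for it; S = 7 is an assumed example.
    col = min(int(cx * S / img_w), S - 1)
    row = min(int(cy * S / img_h), S - 1)
    return row, col

def iou(a, b):
    # Corners of the intersection rectangle of boxes a and b.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)  # 0 when boxes are disjoint
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0
```

An IOU of 1.0 means the predicted and ground-truth boxes coincide exactly; thresholding this value is how overlapping predictions are judged.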
III. ALGORITHM DEVELOPMENT
This section gives detailed information on the software and hardware components that have been used for detecting various classes of objects.
A. Hardware Used
A laptop with a usable camera and a GPU is needed for the detection. The camera captures the video that is taken as input, and the GPU is needed for the image processing to be performed efficiently.
B. Software Used
1) Anaconda: Anaconda is a distribution of Python and R for scientific computing, with the goal of streamlining package management and deployment.
2) NumPy: NumPy is used to perform many mathematical operations. The library provides high-level mathematical functions that operate on arrays and matrices, backed by strong data structures.
3) OpenCV: A library that allows image processing and computer vision to be carried out. The features it provides include tracking, face detection and object detection.

C. Training and Testing of Dataset
For an object to be detected, its object class needs to be trained and tested. In this work the dataset used is predefined. To detect an object class that is not present in the predefined classes, a dataset of images of that object is collected. These images are stored and labelled using labelling software; along with each saved image, a text file is created containing the corner coordinates of the box used to detect the object. The accuracy of the algorithm depends on the number of images collected in the dataset.

If the images are plentiful, the accuracy of the algorithm increases; if there are few images in the dataset, the accuracy of detection is not up to the mark. After labelling the images, the dataset needs to be trained. A separate dataset, containing fewer images than the training dataset, is used for testing; the accuracy will be false or biased if the same dataset is used for both training and testing [7].

D. Object Detection

Fig. 1. Block Diagram of Object Detection

An input image is converted into 1024 x 1024 and divided into boxes called grids, then image processing is performed on the image. The object will be detected if its class is present in the dataset. After the object is detected, a bounding box is created around it and the confidence level is displayed. The system is provided a picture, and with the aid of OpenCV, the input is processed to perform object detection and identification [8].

Analysis: The classification of the item is carried out based on coco names, where COCO is a predefined dataset that includes the information of several classes; once the object has been detected, it is categorized based on these classes.

E. Equations

b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w * e^(t_w)
b_h = p_h * e^(t_h)

where:
(b_x, b_y) is the centre position of the bounding box in the image;
b_w is the bounding box width;
b_h is the bounding box height;
(c_x, c_y) is the top-left corner of the anchor box's cell;
(p_w, p_h) are the anchor box prior width and height.

A bounding box in the YOLO algorithm is denoted by four variables [y_center, x_center, height, width]. The centre of the bounding box is located at x_center and y_center, which are normalized coordinates: the pixel values of x and y at the centre of the bounding box are normalized by dividing x by the image's width and y by its height. The dimensions of the bounding box are represented by its height and width, which are likewise normalized.

Pr(Class_i | Object) * Pr(Object) * IOU_pred^truth = Pr(Class_i) * IOU_pred^truth

For each bounding box, the confidence score can be taken as the output of the neural network. The final output is based on the confidence levels of the boxes having the highest confidence [9].

F. Object Detection using YOLO Algorithm
There are two sorts of YOLO weights that are used for detection: YOLO weights and YOLO tiny weights. Because it

has more characteristics and is easier to identify, YOLO weights are employed in this work. Then, after receiving a picture as input, the image is divided into 1024x1024 squares. Bounding boxes are constructed for the image, and a confidence level is assigned to each item. The object will be recognized only if it is present in the training and test datasets. The result is a box encircling the object on the image, together with the confidence. Here, the mean Average Precision (mAP) is used to express the confidence level of the detection on a scale of 0-1.

Fig. 2. Process of Object Detection using YOLO Algorithm

G. Comparison of the detection models

TABLE I. COMPARISON OF DIFFERENT ALGORITHMS

Model     mAP   FPS
SSD500    46.5  19
FRCNN     59.1  6
RFCN      51.9  12
YOLOV3    60.6  20

This comparison table of several algorithms shows that YOLO V3 outperforms the others. It is the best because precision and frames per second are the major parameters for detection to be efficient [10].

IV. REAL TIME OBJECT DETECTION

Once the detection criteria were satisfied, the code was used in a real-time environment for detecting several classes. However, this technique is restricted to a few classes.

A. Observations
• The YOLO V3 weights are important, since the accuracy also depends on the number of characteristics the image is subjected to.
• The size of the picture affects how quickly objects are detected. The conversion process takes less time if the image's converted size is small; the time required for detection increases if the image is large when being converted.

B. Results

Fig. 3. Bird Class detection
Bird class is detected with a confidence level of 0.87.

Fig. 4. Person Class Detection
Person class is detected with a 0.69 confidence level.

Fig. 5. Person and umbrella class detection
Person and umbrella classes are detected with a confidence level that is almost 1.
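As an illustration of how the normalized box coordinates of Section III-E and the per-class confidence scores behind these results are recovered from raw network output, here is a minimal sketch. It is an assumed helper, not the authors' code: each output row is taken as [x_center, y_center, width, height, confidence, class scores...] with coordinates normalized to [0, 1], as produced by YOLO-style models (with OpenCV, such rows come from net.forward() after cv2.dnn.blobFromImage; a plain NumPy array stands in here).

```python
import numpy as np

def parse_detections(rows, img_w, img_h, conf_threshold=0.5):
    """Filter YOLO-style output rows and de-normalize their boxes.

    Each row: [x_center, y_center, width, height, confidence, class scores...],
    coordinates normalized to [0, 1]. Returns (class_id, score, box) tuples,
    where box is (x, y, w, h) in pixels with a top-left origin.
    """
    results = []
    for row in rows:
        scores = row[5:]
        class_id = int(np.argmax(scores))
        # Final score = objectness confidence times the best class score.
        confidence = row[4] * scores[class_id]
        if confidence < conf_threshold:
            continue
        # De-normalize: multiply by image width/height (Section III-E).
        cx, cy = row[0] * img_w, row[1] * img_h
        w, h = row[2] * img_w, row[3] * img_h
        # Convert centre format to a top-left (x, y, w, h) box.
        box = (int(cx - w / 2), int(cy - h / 2), int(w), int(h))
        results.append((class_id, float(confidence), box))
    return results
```

In a full pipeline, a non-maximum-suppression step (e.g. cv2.dnn.NMSBoxes) would then discard overlapping boxes; the 0.5 threshold here is an illustrative assumption, not a value from the paper.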

Fig. 6. Apple class detection
Apple class is detected with different confidence levels.

Fig. 7. Fire Hydrant Class Detection
Fire hydrant class is detected with a confidence level of 1.

V. CONCLUSION
In comparison to Fast R-CNN, RetinaNet and other object identification methods, this method yields better results for object detection. The observation is that the accuracy of detection depends on the quality of the image, irrespective of whether it is a colour or a black-and-white image.

Using this algorithm any class can be detected; the only thing that needs to be done is to train and test on a dataset in which images of that class are available. A limitation of this detection is that it cannot detect small objects unless similar objects are used to train the algorithm.

VI. FUTURE SCOPE
The use of weapons for illicit purposes is growing nowadays; one such activity is mass shooting. By employing YOLO it can be possible to rapidly locate the weapons. Future work on this effort will expand object detection into specialized domains such as the autonomous field, the military field, etc. The results of this study can be utilized to enhance the accuracy of detecting distinct types of weapons, such as rifles and handguns, and can also help in preventing the mass shootings that have been happening in the recent past.

REFERENCES
[1] Zihong Shi, "Object Detection Models and Research Directions." IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE), 2021.
[2] Abhishek Sarda, Shubhra Dixit, Anupama Bhan, "Object Detection for Autonomous Driving using YOLO algorithm." 2021.
[3] Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik, "Region-based convolutional networks for accurate object detection and segmentation." IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, January 2016.
[4] Ross Girshick, "Fast R-CNN." IEEE International Conference on Computer Vision, 2015.
[5] Qianjun Shuai, Xingwen Wu, "Object detection system based on SSD algorithm." IEEE International Carnahan Conference on Security Technology, October 2020.
[6] Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, "You Only Look Once: Unified, Real-Time Object Detection." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[7] Muhammed Kürşad Uçar, Majid Nour, Hatem Sindi, Kemal Polat, "The Effect of Training and Testing Process on Machine Learning in Biomedical Datasets." May 2020.
[8] Y. Amit, P. Felzenszwalb, R. Girshick, "Object Detection." 2020.
[9] Zhong-Qiu Zhao, Peng Zheng, Shou-Tao Xu, Xindong Wu, "Object Detection with Deep Learning: A Review." IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 11, January 2019.
[10] Harsh Jain, Aditya Vikram, Mohana, Ankit Kashyap, Ayush Jain, "Weapon Detection using Artificial Intelligence and Deep Learning for Security Applications." IEEE International Conference on Electronics and Sustainable Communication Systems (ICESC), 2020.
