Enhancing Real-Time Object Detection With YOLO Algorithm
School of Computer Science and Engineering, VIT-AP University, Amaravati, Andhra Pradesh, India
Abstract
This paper introduces YOLO (You Only Look Once), a leading approach to object detection. Real-time detection plays a significant role in domains such as video surveillance, computer vision, autonomous driving, and robotics. The YOLO algorithm has emerged as a popular and well-structured solution for real-time object detection because it detects objects in a single pass through the neural network. This article aims to give a thorough understanding of the YOLO algorithm, its architecture, and its impact on real-time object detection. YOLO frames object detection as a regression problem to spatially separated bounding boxes and their associated class probabilities. Because tasks such as recognition, detection, and localization find widespread applicability in real-world scenarios, object detection is a crucial subfield of computer vision. The algorithm detects objects in real time using convolutional neural networks (CNNs). Overall, this paper serves as a comprehensive guide to real-time object detection with the You Only Look Once (YOLO) algorithm. By examining its architecture, variants, and implementation details, the reader can gain an understanding of YOLO's capabilities.
Copyright © 2023 G. Lavanya et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA
4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the
original work is properly cited.
doi: 10.4108/eetiot.4541
YOLO's unified detection approach includes grid division, anchor boxes, prediction generation, and non-maximum suppression (NMS, which eliminates redundant bounding boxes). First, the model divides the input image containing the object into an S x S grid, and each grid cell predicts bounding boxes together with a confidence score for each box. The confidence score reflects how confident the model is that the box contains an object and how accurate it thinks the predicted box is [4]. Figure 1 represents the grid division and the cell parameters, and Figure 2 represents image and object classification for a single-object image.

YOLO is a single-stage object detection model [5]. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Once an image is fed to the network, YOLO processes it and outputs the objects it detects. The model has difficulty detecting small objects, particularly small objects that appear in groups. Object detection algorithms must predict not only the object class but also its location accurately, and they must be very fast to meet the demands of real-time video processing.
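The confidence score and the NMS step described above both rest on intersection over union (IoU). The sketch below is a minimal illustration, assuming boxes given as (x1, y1, x2, y2) pixel corners, the confidence definition Pr(Object) x IoU from the original YOLO paper [5], and a greedy NMS with an IoU threshold of 0.5. The function names `iou`, `confidence_target`, and `nms` are hypothetical and are not taken from any YOLO code base.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corners in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def confidence_target(p_object: float, pred: Box, truth: Box) -> float:
    """YOLOv1-style confidence: Pr(Object) * IoU(predicted box, ground truth)."""
    return p_object * iou(pred, truth)


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy non-maximum suppression; returns the indices of the kept boxes."""
    # Visit boxes from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


if __name__ == "__main__":
    boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # [0, 2]
```

In the usage example the first two boxes overlap with an IoU of about 0.82, so the lower-scoring one is suppressed, while the distant third box survives.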
(a) How it works when there are multiple objects.

YOLOv2 removes the fully connected layers and introduces anchor boxes for bounding-box prediction, along with features such as multi-scale training and a higher input resolution. In general, earlier object detection models performed poorly and had low precision on tiny objects. Whereas YOLOv1 predicts bounding-box coordinates directly with fully connected layers on top of the convolutional feature extractor, YOLOv2 predicts boxes relative to anchor boxes [6]. As shown in Figure 3, O-YOLO-v2 (Optimized YOLOv2), a deep learning model based on YOLOv2, addresses the low performance and precision on tiny objects [7].
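The anchor-box prediction described above can be illustrated with the box parameterization from the YOLO9000 paper [31]: for a grid cell at offset (c_x, c_y) and an anchor of size (p_w, p_h), the network outputs t_x, t_y, t_w, t_h, and the box is b_x = sigmoid(t_x) + c_x, b_y = sigmoid(t_y) + c_y, b_w = p_w * exp(t_w), b_h = p_h * exp(t_h), all in grid units. The sketch below decodes one such prediction into pixels. The function name, the anchor values, and the stride of 32 pixels per cell (a 416 x 416 input on a 13 x 13 grid) are illustrative assumptions.

```python
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_yolov2_box(
    t: Tuple[float, float, float, float],  # raw network outputs (t_x, t_y, t_w, t_h)
    cell: Tuple[int, int],                 # (c_x, c_y): column and row of the grid cell
    anchor: Tuple[float, float],           # (p_w, p_h): anchor width/height in grid units
    stride: float,                         # pixels per grid cell (assumed 32 here)
) -> Tuple[float, float, float, float]:
    """Return (center_x, center_y, width, height) of the decoded box in pixels."""
    t_x, t_y, t_w, t_h = t
    c_x, c_y = cell
    p_w, p_h = anchor
    b_x = sigmoid(t_x) + c_x    # sigmoid keeps the center inside its own cell
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * math.exp(t_w)   # width and height rescale the anchor prior
    b_h = p_h * math.exp(t_h)
    return (b_x * stride, b_y * stride, b_w * stride, b_h * stride)


if __name__ == "__main__":
    # Zero offsets give a box centered in cell (3, 4) with exactly the anchor's size.
    print(decode_yolov2_box((0.0, 0.0, 0.0, 0.0), (3, 4), (2.0, 3.0), 32.0))
    # (112.0, 144.0, 64.0, 96.0)
```

With t_w = t_h = 0 the predicted box is exactly the anchor, and because of the sigmoid even a large t_x or t_y cannot move the center out of its cell; the YOLO9000 paper credits this constraint with making the parameterization easier to learn and training more stable.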
Table 1. Comparison of YOLO version improvements

| YOLO version | Year | Improvements | Total no. of layers in the network |
|---|---|---|---|
| YOLOv2 | 2017 | Included higher resolution and anchor boxes | 19 convolutional layers and 5 max-pooling layers (Darknet-19) |
| YOLOv3 | 2018 | Performance on smaller objects | Increased the number of layers to 106 |
| YOLOv4 | 2020 | Optimal speed and accuracy | 53 convolutional layers of certain sizes |
| YOLOv5 | 2020 | Introduced mosaic augmentation | It has 80 classes of 3 parts |
| YOLOv6 | 2021 | Architectural improvements | Not fixed |
| YOLOv7 | 2021 | Infers faster with greater accuracy | Not fixed |
| YOLOv8 | 2023 | Improved adaptive training, customizable architecture, advanced data augmentation | Not fixed |

2. FAST YOLO: A fast you only look once system for real-time embedded object detection in video

Object detection is one of the most challenging tasks in computer vision because it combines image classification, which identifies what the image contains, with localization, which determines where the objects are. Deep neural networks (DNNs) have achieved higher object detection performance than other approaches, and YOLOv2 is one of the state-of-the-art DNN-based object detection techniques in terms of both accuracy and processing speed. Although YOLOv2 achieves real-time performance on a powerful graphics processing unit (GPU), it remains very challenging to use it for real-time object detection in video on embedded devices with limited computational power and memory. Fast YOLO (Fast You Only Look Once) is a framework that accelerates YOLOv2 so that it can detect objects in video in real time on embedded devices [12].

A single convolutional network simultaneously predicts multiple bounding boxes and the class probabilities for those boxes [13]. With no batch processing on a Titan X GPU, the base YOLO network runs at 45 frames per second (fps) and a fast version runs at more than 150 fps, so streaming video can be processed in real time with less than 25 ms of latency [14].

YOLO reasons globally about the image when making predictions. Unlike sliding-window and region-proposal-based methods, YOLO sees the entire image during training and test time, so it implicitly encodes contextual information about object classes as well as their appearance. YOLO also learns generalizable representations of objects: when trained on natural images and tested on artwork, it outperforms leading detection methods such as DPM and R-CNN by a wide margin [15]. Figure 5 visualizes the detection of different objects. YOLO-SA is an improved version of the one-stage detection model YOLOv4 [16].
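The frame rates quoted in this section translate directly into a per-frame time budget: with no batching, a throughput of f fps corresponds to about 1000/f ms of compute per frame, so 45 fps means roughly 22 ms per frame, consistent with the sub-25 ms latency figure. The sketch below performs that conversion and checks whether a detector keeps up with a camera. The helper names and the 30 fps camera are hypothetical, and per-frame compute time is only a lower bound on end-to-end latency, which also includes capture and post-processing.

```python
def frame_time_ms(fps: float) -> float:
    """Per-frame processing time implied by a throughput of `fps` frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def keeps_up(detector_fps: float, stream_fps: float) -> bool:
    """Assumed criterion: the detector keeps up with a video stream if it
    processes each frame within the stream's frame interval."""
    return frame_time_ms(detector_fps) <= frame_time_ms(stream_fps)


if __name__ == "__main__":
    for name, fps in [("Base network", 45.0), ("Fast version", 150.0)]:
        print(f"{name}: {frame_time_ms(fps):.1f} ms per frame")
    # Base network: 22.2 ms per frame
    # Fast version: 6.7 ms per frame
    print(keeps_up(45.0, 30.0))  # True: 22.2 ms fits within a 33.3 ms frame interval
```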
Table 1 represents a comparative analysis of all YOLO versions in terms of year, improvements, and total number of layers in the network.

3. Network Architecture
Network design for object detection has become an increasingly popular and broad research area in the deep learning era. The network architecture mainly consists of three types of layers: convolutional layers, max-pooling layers, and fully connected layers. The YOLO network
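How these layer types produce YOLO's S x S grid can be made concrete with shape arithmetic. The sketch below assumes YOLOv1's published configuration [5] (448 x 448 input, S = 7, B = 2 boxes per cell, C = 20 PASCAL VOC classes, giving a 7 x 7 x 30 output) and the standard output-size formula floor((n + 2p - k) / s) + 1. Only the six stride-2 layers are listed, since stride-1 convolutions (1 x 1, and 3 x 3 with padding 1) keep the spatial size; the padding values and all helper names are illustrative assumptions.

```python
from typing import List, Tuple

Layer = Tuple[str, int, int, int]  # (kind, kernel, stride, padding)


def out_size(n: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


def feature_map_size(n: int, layers: List[Layer]) -> int:
    """Apply each layer's size formula in turn to an n x n input."""
    for _kind, k, s, p in layers:
        n = out_size(n, k, s, p)
    return n


# The stride-2 steps of YOLOv1; stride-1 layers are omitted because they keep the size.
YOLOV1_DOWNSAMPLING: List[Layer] = [
    ("conv", 7, 2, 3),     # 448 -> 224
    ("maxpool", 2, 2, 0),  # 224 -> 112
    ("maxpool", 2, 2, 0),  # 112 -> 56
    ("maxpool", 2, 2, 0),  # 56 -> 28
    ("maxpool", 2, 2, 0),  # 28 -> 14
    ("conv", 3, 2, 1),     # 14 -> 7
]


def output_tensor_shape(S: int, B: int, C: int) -> Tuple[int, int, int]:
    """Each of the S x S cells predicts B boxes (x, y, w, h, confidence) plus C class probabilities."""
    return (S, S, B * 5 + C)


def responsible_cell(cx: float, cy: float, image_size: int, S: int) -> Tuple[int, int]:
    """(row, col) of the grid cell that contains an object's center."""
    cell = image_size / S
    col = min(int(cx // cell), S - 1)
    row = min(int(cy // cell), S - 1)
    return (row, col)


if __name__ == "__main__":
    S = feature_map_size(448, YOLOV1_DOWNSAMPLING)
    print(S)  # 7
    print(output_tensor_shape(S, B=2, C=20))  # (7, 7, 30)
    print(responsible_cell(200.0, 330.0, 448, S))  # (5, 3)
```

In YOLOv1 the final 7 x 7 x 30 tensor is produced by fully connected layers and reshaped, but the arithmetic shows why each output position corresponds to a 64 x 64 pixel cell of the input, and the last helper shows which cell is responsible for an object whose center falls inside it.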
Figure 7. Flow chart of object detection model

5. The YOLO object detection algorithm is crucial for the following reasons:

Speed: Because YOLO predicts objects in real time, it improves detection speed. Compared with other algorithms, YOLO is much faster, running at 45 frames per second. Another key difference is that YOLO sees the complete image at once, which various previous methods do not [28]. The image is passed through the CNN only once at run time. All testing and training parameters are the same for Fast YOLO and YOLO [29].

High accuracy: YOLO has a high prediction capacity and gives strong results with fewer background mistakes. Several heuristics can increase YOLO's accuracy, such as a cosine learning-rate scheduler, data augmentation, synchronized batch normalization, and image mix-up [30]. A higher input resolution improves accuracy and may help the model detect small objects, but it slows down both training and inference.

Learning capabilities: YOLO has a high learning capability, which allows it to learn the patterns of objects and apply them during detection. YOLO performs object detection by dividing an image into N = S x S grid cells of equal size. When trained on the COCO (Common Objects in Context) dataset, the algorithm can detect 80 object classes, including bus, person, car, bicycle, motorbike, aeroplane, truck, train, and boat [31].

6. Yearly Trends
This section organizes the publication data to show the yearly growth of research on the YOLO versions. Table 2 gives the number of research papers on all versions of YOLO (YOLOv1 through YOLOv8). This breakdown shows that the number of publications grew considerably in 2020 and 2021. In addition, YOLOv3, the original YOLO, and YOLOv2 have interested most researchers because of their properties, with time since release acting as a separate factor. The publication counts for YOLOv5, v6, v7, and v8 are low because these versions are recent, and they are expected to grow in future years. Fig 8 represents a graphical view of Table 2. Table 3 represents a comparative analysis of object detection algorithms, including the invention year, the novelty of each algorithm, and recent searches for it.

Table 2. Yearly trends of publication data

| Year | YOLOv3 | YOLOv4 | YOLOv5 | YOLOv6 | YOLOv7 | YOLOv8 | Total |
|---|---|---|---|---|---|---|---|
| 2017 | 10 | 0 | 0 | 0 | 0 | 0 | 10 |
| 2018 | 50 | 19 | 0 | 0 | 0 | 0 | 34 |
| 2019 | 48 | 210 | 0 | 0 | 0 | 0 | 258 |
| 2020 | 36 | 496 | 81 | 13 | 9 | 0 | 635 |
| 2021 | 418 | 734 | 440 | 175 | 23 | 8 | 1798 |
| Total | 529 | 1459 | 521 | 188 | 32 | 8 | 2737 |
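Table 3. Comparative analysis of object detection algorithms

| No. | Algorithm | Invention year | Novelty of algorithm | Recent searches |
|---|---|---|---|---|
| 2 | Fast R-CNN | 2015 | Uses a region-proposal method to create a set of candidate regions | Fast R-CNN GitHub, PyTorch |
| 8 | Single Shot Detector (SSD) | 2016 | Uses MultiBox to detect multiple objects in the given image | SSD single shot, MultiBox detector bibtex |
| 9 | You Only Look Once (YOLO) | 2015 | Increases accuracy in predictions | YOLO latest versions, YOLO full form |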
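Of the accuracy heuristics listed in Section 5, the cosine learning-rate scheduler is the simplest to state: the learning rate follows half a cosine period, lr(t) = lr_min + 0.5 (lr_max - lr_min)(1 + cos(pi t / T)), decaying smoothly from lr_max at step 0 to lr_min at step T. The sketch below is a generic version of that schedule, not the exact training configuration of any YOLO release; `lr_max`, `lr_min`, and the step counts are hypothetical values, and the linear warm-up phase that common variants add is omitted.

```python
import math


def cosine_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Cosine-annealed learning rate: starts at lr_max and decays smoothly to lr_min."""
    if not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps]")
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 100
    for step in (0, 25, 50, 75, 100):
        print(step, round(cosine_lr(step, total, lr_max=0.01), 5))
```

Compared with step decay, the schedule lowers the learning rate gradually, so there are no abrupt jumps in the optimization late in training.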
7. Conclusion
This paper gives an overview of YOLO-based object detection and object classification. Object detection is a significant technique in computer vision, for instance for locating objects in images and videos. Compared with previous classification techniques, YOLO offers several advantages, including real-time processing, simplicity, and effective handling of small objects, because its unified detection approach performs classification and object detection in one pass through the network. However, some versions of YOLO still have unsolved issues with both large and small objects, and the model struggles to align its output boxes precisely with the objects in the image. These issues are expected to be addressed in future work. The YOLO algorithm is based on a regression model: instead of first selecting regions of interest in an image, it predicts all bounding boxes and classes for the entire image in a single pass. Object detection comprises general tasks such as localization, object classification, and segmentation, and in terms of performance the YOLO algorithm is highly effective at detecting objects. We mainly reviewed unified detection, the YOLO versions, applications based on YOLO, network architecture, and comparative analysis. Overall, YOLO-based real-time object detection has revolutionized computer vision applications by providing efficient and accurate solutions.

8. Future Scope
Real-time object detection is a key capability required by most robots and computer vision systems. The field is making great progress in many directions, building on early research in this area. Nevertheless, YOLO-based object detection is still little used in many areas where it could be of great help, and this could improve in the future. In fact, YOLO object detection in images has received considerable attention in pattern recognition and computer vision in recent years. These mechanisms are still being proven and could free people from routine jobs that systems and machines can perform more precisely. Key areas for future exploration are improved accuracy, handling complex scenes, multi-object tracking, and domain-specific object detection.

References
[1] Conference: Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
[2] Review article: W. Zhiqiang, L. Jun. "A review of object detection based on convolutional neural network." 2017 36th Chinese Control Conference (CCC), 2017, pp. 11104-11109. doi: 10.23919/ChiCC.2017.8029130
[3] Journal article: Arya MC, Rawat A. "A review on YOLO (You Look Only One)-an algorithm for real time object detection." J Eng Sci. 2020;11:554-7.
[4] Review paper: Arya, Mukesh Chandra, and Anchal Rawat. "A review on YOLO (You Look Only One)-an algorithm for real time object detection." J Eng Sci 11 (2020): 554-7.
[5] Conference: Redmon et al. "You Only Look Once: Unified, Real-Time Object Detection." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
[6] Review paper: Tsang, Sik-Ho. "Review: YOLOv2 & YOLO9000—You Only Look Once (Object Detection)." (accessed on 24 February 2019) (2019).
[7] Conference: M. Takahashi, Y. Ji, K. Umeda and A. Moro. "Expandable YOLO: 3D Object Detection from RGB-D Images." 2020 21st International Conference on Research and Education in Mechatronics (REM), 2020, pp. 1-5. doi: 10.1109/REM49740.2020.9313886.
[8] Journal article: Ju, M.; Luo, H.; Wang, Z.; Hui, B.; Chang, Z. "The Application of Improved YOLO V3 in Multi-Scale Target Detection." Appl. Sci. 2019, 9, 3775.
[9] Journal article: W. Fang, L. Wang, and P. Ren. "Tinier-YOLO: A Real-Time Object Detection Method for Constrained Environments."
[10] Journal article: Yi, Zhang, Shen Yongliang, and Zhang Jun. "An improved tiny-yolov3 pedestrian detection algorithm." Optik 183 (2019): 17-23.
[11] Journal article: Tian, Yunong, et al. "Apple detection during different growth stages in orchards using the improved YOLO-V3 model." Computers and electronics in agriculture 157 (2019): 417-426.
[12] Article: Shafiee, Mohammad Javad, et al. "Fast YOLO: A fast you only look once system for real-time embedded object detection in video." arXiv preprint arXiv:1709.05943 (2017).
[13] Journal article: George, Jose, Shibon Skaria, and V. V. Varun. "Using YOLO based deep learning network for real time detection and localization of lung nodules from low dose CT scans." Medical Imaging 2018: Computer-Aided Diagnosis. Vol. 10575. SPIE, 2018.
[14] Journal article: Chiang, Holly, Yifan Ge, and Connie Wu. "Multiple Object Recognition with Focusing and Blurring." Lectures from the Course (2016).
[15] Conference: Du, Juan. "Understanding of object detection based on CNN family and YOLO." Journal of Physics: Conference Series. Vol. 1004. No. 1. IOP Publishing, 2018.
[16] Article: Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi (University of Washington, Allen Institute for AI, Facebook AI Research). "You Only Look Once: Unified, Real-Time Object Detection."
[17] Conference: Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. "You only look once: Unified, real-time object detection." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779-788. 2016.
[18] Journal article: Wong, Alexander, et al. "Yolo nano: a highly compact you only look once convolutional neural network for object detection." 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing-NeurIPS Edition (EMC2-NIPS). IEEE, 2019.
[19] Journal article: Wei H, Kehtarnavaz N (2019) Semi-supervised faster RCNN-based person detection and load classification for far field video surveillance. Mach Learn Knowl Extraction 1(3):756–767.
[20] Article: Tsang S-H (2018) Review: Inception-v4 - Evolved From GoogLeNet, Merged with ResNet Idea (Image Classification), Towards Data Science.
[21] Conference: Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
[22] Journal article: H. Deshpande, A. Singh, H. Herunde. "Comparative Analysis on YOLO Object Detection with OpenCV."
[24] Journal article: Liu, Y., Ai, H., & Xu, G. Y. (2001, September). Moving object detection and tracking based on background subtraction. Proc. SPIE 4554, Object detection, classification, and tracking technologies (Vol. 4554, pp. 62-66).
[25] Journal article: Sugandi, B., Kim, H., Tan, J. K., & Ishikawa, S. (2009). Real time tracking and identification of moving persons by using a camera in outdoor environment. International Journal of Innovative Computing, Information and Control, 5, 1179-1188.
[27] Conference: Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779-788.
[28] Article: Long, Xiang, et al. "PP-YOLO: An effective and efficient implementation of object detector." arXiv preprint arXiv:2007.12099 (2020).
[29] Journal article: Zhang, Zhi, et al. "Bag of freebies for training object detection neural networks." arXiv preprint arXiv:1902.04103 (2019).
[30] Conference: Yin, Xuanyu, et al. "YOLO and K-Means Based 3D Object Detection Method on Image and Point Cloud." The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2019. The Japan Society of Mechanical Engineers, 2019.
[31] Conference: Redmon, Joseph, and Ali Farhadi. "YOLO9000: better, faster, stronger." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
[32] Journal article: Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems (pp. 1097-1105).
[33] Ghosh, H., Tusher, M.A., Rahat, I.S., Khasim, S., Mohanty, S.N. (2023). Water Quality Assessment Through Predictive Machine Learning. In: Intelligent Computing and Networking. IC-ICN 2023. Lecture Notes in Networks and Systems, vol 699. Springer, Singapore. https://doi.org/10.1007/978-981-99-3177-4_6
[34] Alenezi, F.; Armghan, A.; Mohanty, S.N.; Jhaveri, R.H.; Tiwari, P. Block-Greedy and CNN Based Underwater Image Dehazing for Novel Depth Estimation and Optimal Ambient Light. Water 2021, 13, 3470. https://doi.org/10.3390/w13233470