
Proceedings of the Second International Conference on Automation, Computing and Renewable Systems (ICACRS-2023)

IEEE Xplore Part Number: CFP23CB5-ART; ISBN: 979-8-3503-4023-5

Enhancing Surveillance Systems with YOLO Algorithm for Real-Time Object Detection and Tracking

DOI: 10.1109/ICACRS58579.2023.10404710
Anish A1, Sharan R2, Ms. A. Hema Malini3, Ms. T. Archana4
1UG Student, 2UG Student, 3Assistant Professor, 4Assistant Professor
Department of ECE, Saveetha Engineering College, Chennai, India
[email protected], [email protected], [email protected], [email protected]

Abstract—A Visually Impaired Person (VIP) is unable to identify objects when they cannot recognize where the object is placed. Researchers are working to enhance object detection and help VIPs. The challenges they face are performing detection under low-resolution images, insufficient sensors, portability, and cost; making a compact device that alerts the user is required. Considering the above-mentioned difficulties, an innovative solution is described in this research work. The growth of image processing and deep learning techniques has simplified the complexity of processing data and provided accurate results within a limited time period. The suggested technique is a deep learning algorithm, the YOLO algorithm, which is combined with the web to predict objects accurately. For this approach, a dataset with a total of 500 images was chosen and trained. The proposed classifier result is satisfactory, achieving an overall accuracy of 94%. Furthermore, this proposed technique provides sufficient output in comparison with several other machine learning and image processing algorithms.

Keywords: Visually Impaired Person (VIP), YOLO Algorithm, Object Detection, Image Processing, Deep Learning.

I. INTRODUCTION

Recently, computational vision and its functionality have been used everywhere, especially in the automobile industry, robotics, the healthcare industry, and surveillance systems. Deep learning has garnered significant attention for its remarkable performance in areas like natural language processing, image classification, and object detection. Market projections indicate substantial growth in the coming years, with easy access to powerful Graphics Processing Units (GPUs) and extensive datasets cited as key drivers [1]. Notably, both of these prerequisites have become readily available in recent times [1].

Object detection relies heavily on image classification and recognition, with numerous datasets at our disposal. Microsoft COCO stands out as a widely utilized benchmark for object detection, providing a vast dataset for image classification [2]. In this research study, the authors performed a comparative analysis of three prominent object detection algorithms: SSD, Faster R-CNN, and YOLO. SSD enhances detection capabilities by adding multiple feature layers to the network's end, facilitating improved object recognition [3]. Faster R-CNN offers a unified, faster, and more accurate approach to object identification through the use of convolutional neural networks. On the other hand, YOLO, designed by Joseph Redmon, presents an end-to-end network for object detection [3].

This study utilizes the Microsoft COCO dataset as a common benchmark and employs consistent evaluation metrics across all three algorithms. This enables a fair comparison of the performance of these algorithms, each of which takes a distinct architectural approach. The results obtained from this comparative analysis offer valuable insights into the unique strengths of each algorithm, allowing us to differentiate their characteristics and determine the most effective object recognition method for specific scenarios.
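Consistent evaluation across detectors generally rests on the intersection-over-union (IoU) between predicted and ground-truth boxes. The paper does not show its metric code, so the following is a minimal illustrative sketch; the (x1, y1, x2, y2) corner format for boxes is an assumption:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width/height clamp to 0 when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Identical boxes overlap perfectly:
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5, which is what makes scores comparable across SSD, Faster R-CNN, and YOLO.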

[Figure: Architecture of the YOLO]

979-8-3503-4023-5/23/$31.00 ©2023 IEEE 1254


Authorized licensed use limited to: SRM Institute of Science and Technology- RamaPuram. Downloaded on January 24,2025 at 03:38:17 UTC from IEEE Xplore. Restrictions apply.

II. RELATED WORKS

Object detection is a crucial research area that leverages the availability of powerful learning tools to explore deeper features. The goal of this section is to consolidate information on the diverse object recognition techniques and classifiers employed by various researchers, facilitating a comparative analysis and practical insights for object detection applications. This work is underpinned by a comprehensive literature review.

Ross Girshick's contributions introduced the Fast R-CNN model, a novel approach to object identification that utilizes Convolutional Neural Networks (CNNs). What sets Fast R-CNN apart is its window extraction algorithm, which differs from the traditional sliding window procedure used in the R-CNN model. Fast R-CNN merges the separate training of deep convolutional networks for feature extraction and Support Vector Machines (SVMs) for classification, efficiently combining feature extraction and classification in a unified framework. Remarkably, Fast R-CNN achieves a training time that is nine times faster than R-CNN. Additionally, the Faster R-CNN model integrates the components of proposal isolation and Fast R-CNN into a network template known as the region proposal network (RPN), achieving accuracy equivalent to that of Fast R-CNN. Collectively, these methods represent a deep learning-based object recognition approach capable of operating at 5–7 frames per second (fps) [4]. This research has provided essential insights into R-CNN, Fast R-CNN, and Faster R-CNN, serving as an inspiration for our model's training.

Kim et al.'s notable work employs a CNN in combination with background subtraction to construct a system for detecting and recognizing movable objects recorded on CCTV cameras. The approach hinges on applying the background subtraction classifier to every frame, which informed a similar architecture utilized in our project [5].

Joseph Redmon and his team introduced YOLO, a convolutional neural network architecture that offers a one-stop solution for frame position prediction and the categorization of multiple candidates. YOLO addresses object detection as a regression problem, streamlining the process from image input to category and position output [6]. The methods used in our YOLO architecture for bounding box recognition and feature extraction were inspired by the techniques outlined in this study.

Tanvir Ahmed and their team introduced an innovative YOLO v1 network model, which involved optimizing the loss function, introducing a new inception model structure, and incorporating specialized pooling pyramid layers. This led to improved performance on the PASCAL VOC dataset [7]. Our project utilized this research as a foundation for applying the YOLO model and its training techniques.

Wei Liu and colleagues presented the Single Shot MultiBox Detector (SSD), a novel approach for image object detection. SSD simplified the process by combining object proposal generation and pixel resampling into a single step [8]. Our project adopted training and model analysis methods inspired by their work.

Another research paper introduced a variation of SSD called Tiny SSD, a compact single-shot detection deep convolutional neural network designed for real-time embedded object identification. Tiny SSD includes enhanced layers and has a small size of 2.3 MB, making it suitable for embedded applications [9]. In our study, we used a similar SSD model for comparative analysis.

III. PROBLEM STATEMENT

Object detection technology has a wide range of applications, such as autonomous driving, aerial object detection, text recognition, surveillance, rescue operations, robotics, facial recognition, pedestrian identification, visual search engines, counting objects of interest, and brand recognition. However, there are several significant challenges to address for its effective implementation.

Variation in Object Occupancy: Objects in images can vary significantly in size, ranging from occupying a majority of the pixels (70% to 80%) to occupying very few pixels (≤10%).

Multiple Object Sizes: Images often contain objects of various sizes, and detecting objects of different scales can be a complex task.

Labelled Data Availability: Training object detection models requires large volumes of labeled data, and obtaining such data can be resource-intensive and time-consuming.

Object detection using machine learning and deep learning algorithms faces several common challenges. Frequently encountered issues include:

1) Multi-Scale Training: Most object recognition systems are designed and trained for specific input resolutions, which causes them to underperform when presented with inputs of varying scales or resolutions.

2) Foreground-Background Class Imbalance: An imbalance among instances of the various object categories can significantly affect the functionality of the suggested approach; some categories may be overrepresented or underrepresented in the training data.

3) Detection of Smaller Objects: Algorithms trained on larger objects tend to perform well on such objects but often perform poorly when detecting relatively smaller objects.

To improve the robustness and applicability of object detection algorithms across various domains and applications, it is crucial to address these challenges. Researchers and engineers are actively working on innovative solutions to tackle these issues and enhance the performance of object detection systems.
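The multi-scale issue above is commonly mitigated by letterboxing each frame into the detector's fixed input size while preserving aspect ratio. This is a general technique rather than one this paper specifies, and the 416 × 416 input size below is an assumption borrowed from common YOLO configurations; the helper only computes the resize and padding parameters:

```python
def letterbox_params(src_w, src_h, dst=416):
    """Scale factor and padding that fit an arbitrary frame into a square
    dst x dst network input while preserving aspect ratio (letterboxing)."""
    scale = min(dst / src_w, dst / src_h)       # shrink to fit the tighter side
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst - new_w) // 2                  # bars left/right
    pad_y = (dst - new_h) // 2                  # bars top/bottom
    return scale, new_w, new_h, pad_x, pad_y

# A 1280x720 surveillance frame mapped into a 416x416 input:
print(letterbox_params(1280, 720))  # (0.325, 416, 234, 0, 91)
```

Because the same scale is applied to both axes, objects keep their shape at every input resolution, which is exactly what a detector trained at one resolution relies on.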


IV. PROPOSED SYSTEM

The block diagram in Figure 1 shows the flow of the proposed study.

Fig.1. Proposed system

It is an innovative technique that performs object classification by reframing the problem as a regression task instead of a classification problem. In the YOLO architecture, a CNN is employed to recognize all bounding boxes and the class probabilities of each object within an image. This unique method allows YOLO to identify objects and their precise positions in a single pass, leading to its name, "You Only Look Once."

The CNN plays a crucial role in feature extraction from visual input, as it efficiently propagates low-level features from initial convolutional layers to deeper layers in a deep CNN. There are several difficulties involved in accurately identifying multiple objects and determining their exact positions within a single visual input. YOLO leverages two key CNN features, parameter sharing and multiple filters, to effectively address them.

In the recognition process, an image or frame is divided into a grid comprising S × S cells. Each grid cell is responsible for recognizing B bounding boxes, including their positions, dimensions, the probability of an object's presence within the cell, and conditional class probabilities. The core principle behind object recognition within a grid cell is that the object's centre should be located within that particular cell. Subsequently, the grid cell is responsible for identifying the object using an appropriate bounding box. To be more specific, YOLO detects a set of parameters for a single bounding box in each grid cell. The first five parameters are specific to that particular bounding box, while the remaining parameters are shared among all bounding boxes within the grid cell, regardless of the number of boxes.

YOLO is an approach based on convolutional neural networks (CNNs), and its performance is evaluated on the PASCAL VOC detection dataset. The YOLO model consists of 24 convolutional layers followed by 2 fully connected layers, highlighting its efficacy in object detection tasks.

V. RESULT AND DISCUSSION

The YOLO structure, which serves as the backbone network for feature extraction, is presented. Detection of objects in a room using live-streaming video frames is done with YOLO. The concept of a region proposal network (RPN) suggests the identification of potentially significant regions within an image or frame by generating a redundant collection of overlapping bounding boxes. These proposed regions serve as candidate areas for further examination. Subsequently, a trained model attempts to classify the object type within each bounding box.

In traditional object detectors, the classifiers often analyze the same portion of an image multiple times, which can be computationally intensive. YOLO, however, distinguishes itself by examining a specific portion of an object just once, unlike other available networks. As an object detection technique, YOLO offers a faster processing speed while maintaining comparable precision.

The implementation of YOLO in object detection is often complemented by tools like OpenCV for displaying the results. By applying YOLOv3, objects within an image or frame can be effectively detected, and their accuracy can be quantified through the use of a confusion matrix. This allows for the evaluation and assessment of the object detection model's performance.

Dataset Collection: In this stage, raw, unprocessed data is chosen as input.

Data Pre-processing: Many sections of the data may be unnecessary or incomplete. Data cleansing is used to manage this; it entails dealing with missing data, noisy data, and so on. This is the most significant module since, without it, the outcome prediction could be inaccurate.

Feature Extraction: This stage extracts the features required to perform classification from the larger set of features in the dataset. The original set of features can describe the majority of the data in the new reduced feature set; from a mixture of the original set, a summarized model of the original features is generated. YOLO extracts both individual and frequency-based behaviour in this module.
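The S × S grid encoding described in Section IV fixes the size of the network's prediction tensor: each cell emits B boxes of 5 values each plus C shared class probabilities. As a numeric check, the settings below (S = 7, B = 2, C = 20) are the classic YOLOv1 configuration on PASCAL VOC from the original YOLO paper, not values stated by this study:

```python
def yolo_output_size(S=7, B=2, C=20):
    """Shape and total size of the YOLOv1 prediction tensor: each of the
    S*S grid cells predicts B boxes (x, y, w, h, objectness = 5 values
    each) plus C class probabilities shared by all boxes in that cell."""
    per_cell = B * 5 + C
    return S, S, per_cell, S * S * per_cell

# Classic YOLOv1 settings on PASCAL VOC (20 classes):
print(yolo_output_size())  # (7, 7, 30, 1470)
```

This is why the final fully connected layer of YOLOv1 has exactly 1470 outputs, reshaped to 7 × 7 × 30 before decoding.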



VI. CONCLUSION

The proposed system has been developed with the aim of improving object detection and identification in videos, and the findings presented in this study demonstrate its enhanced capability in achieving this goal. The research involved a series of experiments that analyzed several approaches to object detection and identification. The study not only establishes a theoretical foundation but also provides practical insights into the efficiency of these methods. As a result, the proposed system offers a comprehensive and well-rounded exploration of object detection techniques, which can be effectively applied in various real-world scenarios.

REFERENCES

[1] A. Tiwari, A. Kumar, and G. M. Saraswat, "Feature extraction for object recognition and image classification," International Journal of Engineering Research & Technology (IJERT), vol. 2, pp. 2278–0181, 2013.
[2] J. Yan, Z. Lei, L. Wen, and S. Z. Li, "The fastest deformable part model for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2497–2504, New York, NY, USA, 2014.
[3] T. Dean, M. A. Ruzon, M. Segal, J. Shlens, S. Vijayanarasimhan, and J. Yagnik, "Fast, accurate detection of 100,000 object classes on a single machine," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1814–1821, New York, NY, USA, 2013.
[4] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004.
[5] C.-J. Du, H.-J. He, and D.-W. Sun, "Object classification methods," in Computer Vision Technology for Food Quality Evaluation, pp. 87–110, Elsevier, Berlin, Germany, 2016.
[6] K. W. Eric, L. Yueping, N. Zhe, Y. Juntao, L. Zuodong, and Z. Xun, "Deep fusion feature based object detection method for high resolution optical remote sensing images," Applied Science, vol. 34, 2019.
[7] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[8] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the International Conference on Computer Vision & Pattern Recognition (CVPR'05), pp. 886–893, Berlin, Germany, 2005.
[9] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788, Las Vegas, NV, USA, 2016.
[10] Y. Zheng, C. Zhu, K. Luu, C. Bhagavatula, T. H. N. Le, and M. Savvides, "Towards a deep learning framework for unconstrained face detection," in Proceedings of the 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), pp. 1–8, IEEE, New York, NY, USA, 2016.
[11] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587, New York, NY, USA, 2014.
[12] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448, Berlin, Germany, 2015.
[13] W. Liu, D. Anguelov, D. Erhan et al., "Single shot multibox detector," European Conference on Computer Vision, vol. 45, pp. 21–37, 2016.
[14] T.-Y. Lin, P. Dollár, R. B. Girshick et al., "Feature pyramid networks for object detection," IEEE CVPR, vol. 43, pp. 936–944, 2017.
[15] W. Liu, D. Anguelov, D. Erhan et al., "SSD: single shot multibox detector," Computer Vision–ECCV 2016, vol. 43, pp. 21–37, 2016.

Fig.2. Objects identified in a room

The experimental analysis depicts and names recognized objects such as bottles, clocks, scissors, books, and a mouse with various probabilities, and repeatedly tracks them with a green bounding box.