Enhancing Surveillance Systems With YOLO Algorithm For Real-Time Object Detection and Tracking
Anish A (1), UG Student
Sharan R (2), UG Student
Ms. A. Hema Malini (3), Assistant Professor
Ms. T. Archana (4), Assistant Professor
Department of ECE, Saveetha Engineering College, Chennai, India
Abstract: A Visually Impaired Person (VIP) cannot identify an object when they cannot tell where it is placed. Researchers are working to enhance object detection in order to assist VIPs. The challenges they face include detection in low-resolution images, insufficient sensors, portability, and cost. A compact device that alerts the user is required. Considering these difficulties, an innovative solution is described in this research work. The growth of image processing and deep learning techniques has reduced the complexity of processing data and made accurate results available within a limited time. The suggested technique is a deep learning algorithm called the YOLO algorithm, which is combined with the web to predict objects accurately. For this approach, a dataset with a total of 500 images was chosen and used for training. The result of the proposed classifier is satisfactory, with an overall accuracy of 94%. Furthermore, the proposed technique performs adequately in comparison with several other machine learning and image processing algorithms.

Keywords: Visually Impaired Person (VIP), YOLO Algorithm, Object Detection, Image Processing, Deep Learning.

I. INTRODUCTION

Recently, computer vision has come into use everywhere, especially in the automobile industry, robotics, the healthcare industry, and surveillance systems. Deep learning has garnered significant attention for its remarkable performance in areas like natural language processing, image classification, and object detection. Market projections indicate substantial growth in the coming years, with easy access to powerful Graphics Processing Units (GPUs) and extensive datasets cited as key drivers [1]. Notably, both of these prerequisites have become readily available in recent times [1].

Object detection relies heavily on image classification and recognition, and numerous datasets are available for it. Microsoft COCO stands out as a widely used benchmark for object detection, providing a vast dataset for image classification [2]. This research study performs a comparative analysis of three prominent object detection algorithms: SSD, Faster R-CNN, and YOLO. SSD enhances detection capabilities by adding multiple feature layers to the end of the network, facilitating improved object recognition [3]. Faster R-CNN offers a unified, faster, and more accurate approach to object identification through the use of convolutional neural networks. YOLO, designed by Joseph Redmon, on the other hand, presents an end-to-end network for object detection [3].

This study uses the Microsoft COCO dataset as a common benchmark and employs consistent evaluation metrics across all three algorithms. This approach enables a fair comparison of the performance of these algorithms, each of which takes a distinct architectural approach. The results of this comparative analysis offer valuable insights into the unique strengths of each algorithm, allowing us to differentiate their characteristics and determine the most effective object recognition method for specific scenarios.
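The evaluation metrics used for the comparison are not spelled out in this section. As an illustrative assumption, detection benchmarks such as Microsoft COCO score a predicted box against a ground-truth box by their intersection over union (IoU), so a minimal Python sketch of that building block is given here. The function name `iou` and the corner-coordinate box layout are choices made for this sketch, not details taken from the study.

```python
from typing import Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(box_a: Box, box_b: Box) -> float:
    """Return the intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (if the boxes overlap at all).
    inter_x_min = max(box_a[0], box_b[0])
    inter_y_min = max(box_a[1], box_b[1])
    inter_x_max = min(box_a[2], box_b[2])
    inter_y_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero so disjoint boxes give no overlap.
    inter_w = max(0.0, inter_x_max - inter_x_min)
    inter_h = max(0.0, inter_y_max - inter_y_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    # Degenerate (zero-area) inputs are treated as having no overlap.
    return intersection / union if union > 0 else 0.0
```

Applying the same IoU threshold to the outputs of SSD, Faster R-CNN, and YOLO is one way to keep such a comparison consistent across detectors with different architectures.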
II. RELATED WORKS

Object detection is a crucial research area that leverages the availability of powerful learning tools to explore deeper features. The goal of this review is to consolidate information on the diverse object recognition techniques and classifiers employed by various researchers, facilitating a comparative analysis and offering practical insights for object detection applications. This work is underpinned by a comprehensive literature review.

Ross Girshick introduced the Fast R-CNN model, an approach to object identification that uses Convolutional Neural Networks (CNN). What sets Fast R-CNN apart is how it extracts features for candidate windows: the image passes through the convolutional network once and per-region features are pooled from the shared feature map, rather than being computed separately for each region as in R-CNN. Fast R-CNN also merges the separate training stages of R-CNN, a deep convolutional network for feature extraction and Support Vector Machines (SVM) for classification, into a unified framework. Remarkably, Fast R-CNN achieves a training time that is nine times faster than R-CNN. Additionally, the Faster R-CNN model integrates proposal generation, performed by a region proposal network (RPN), and Fast R-CNN into a single network. This achieves accuracy equivalent to that of Fast R-CNN. Collectively, these methods represent a deep learning-based object recognition approach capable of operating at 5–7 frames per second (fps) [4]. This research has provided essential insights into R-CNN, Fast R-CNN, and Faster R-CNN, serving as an inspiration for our model's training.

Kim et al.'s notable work employs a CNN in combination with background subtraction to construct a system for detecting and recognizing moving objects recorded by CCTV cameras. The approach hinges on applying background subtraction to every frame, which informed a similar architecture used in our project [5].

Joseph Redmon and his team introduced YOLO, a convolutional neural network architecture that offers a single-network solution for predicting bounding-box positions and classifying multiple candidates. YOLO treats object detection as a regression problem, streamlining the process from image input to category and position output [6]. The methods used in our YOLO architecture for bounding box recognition and feature extraction were inspired by the techniques outlined in this study.

Tanvir Ahmed and colleagues introduced a modified YOLO v1 network model, which involved optimizing the loss function, introducing a new inception model structure, and incorporating specialized pooling pyramid layers. This led to improved performance on the PASCAL VOC dataset [7]. Our project used this research as a foundation for applying the YOLO model and its training techniques.

Wei Liu and colleagues presented the Single Shot MultiBox Detector (SSD), a novel approach to object detection in images. SSD simplified the process by eliminating the separate object proposal generation and pixel resampling stages and performing detection in a single step [8]. Our project adopted training and model analysis methods inspired by their work.

Another research paper introduced a variation of SSD called Tiny SSD, a compact single-shot detection deep convolutional neural network designed for real-time embedded object identification. Tiny SSD includes enhanced layers and has a small size of 2.3 MB, making it suitable for embedded applications [9]. In our study, we used a similar SSD model for comparative analysis.

III. PROBLEM STATEMENT

Object detection technology has a wide range of applications, such as autonomous driving, detecting objects from the air, recognizing text, surveillance, assisting in rescue operations, robotics, facial recognition, identifying pedestrians, creating visual search engines, computing objects of interest, and recognizing brands. However, several significant challenges must be addressed for its effective implementation.

Variation in Object Occupancy: Objects in images can vary significantly in size, from taking up a majority of the pixels (70% to 80%) to occupying very few pixels (≤10%).

Multiple Object Sizes: Images often contain objects of various sizes, and detecting objects at different scales can be a complex task.

Labelled Data Availability: Training object detection models requires large volumes of labelled data, and obtaining such data can be resource-intensive and time-consuming.

Object detection using machine learning and deep learning algorithms faces several common challenges. Some of the frequently encountered issues include:

1) Multi-Scale Training: Most object recognition systems are designed and trained for specific input resolutions, which causes them to underperform when presented with inputs of varying scales or resolutions.

2) Foreground-Background Class Imbalance: An imbalance among the instances of the various object categories can significantly affect the performance of the suggested approach. Some categories may be overrepresented or underrepresented in the training data.

3) Detection of Smaller Objects: Algorithms trained on larger objects tend to perform well on such objects but often perform poorly when detecting relatively small objects.

To improve the robustness and applicability of object detection algorithms across various domains and applications, it is crucial to address these challenges. Researchers and engineers are actively developing innovative solutions to tackle these issues and enhance the performance of object detection systems.
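The variation in object occupancy described in this section can be made concrete by computing the fraction of image pixels covered by a bounding box. The following Python sketch is illustrative only: the function names, the box layout, and the use of the 10% and 70% figures as bucket boundaries are assumptions made for this example, not part of the proposed method.

```python
from typing import Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def occupancy_fraction(box: Box, image_width: int, image_height: int) -> float:
    """Return the fraction of the image area covered by the box."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")

    # Clip the box to the image so out-of-frame parts are not counted.
    x_min = min(max(box[0], 0.0), float(image_width))
    y_min = min(max(box[1], 0.0), float(image_height))
    x_max = min(max(box[2], 0.0), float(image_width))
    y_max = min(max(box[3], 0.0), float(image_height))

    box_area = max(0.0, x_max - x_min) * max(0.0, y_max - y_min)
    return box_area / (image_width * image_height)


def occupancy_bucket(fraction: float) -> str:
    """Label an occupancy fraction using the assumed 10% and 70% boundaries."""
    if fraction <= 0.10:
        return "small"
    if fraction >= 0.70:
        return "large"
    return "medium"
```

Tallying these labels over a set of annotated images gives a quick view of how many objects fall at either extreme, which is where the scale-related challenges listed in this section are most likely to appear.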
REFERENCE