0% found this document useful (0 votes)
8 views6 pages

Enhancing Object Detection and Tracking From Surveillance Video Camera Using YOLOv8

This document discusses advancements in object detection and tracking using the YOLOv8 algorithm in surveillance video systems. It highlights the challenges posed by adversarial attacks and the scarcity of labeled video datasets, while proposing a novel approach to enhance real-time video object identification. The study aims to improve efficiency and accuracy in object detection through a lightweight tracking method that integrates features from trained image detectors.

Uploaded by

tejaswim1070
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views6 pages

Enhancing Object Detection and Tracking From Surveillance Video Camera Using YOLOv8

This document discusses advancements in object detection and tracking using the YOLOv8 algorithm in surveillance video systems. It highlights the challenges posed by adversarial attacks and the scarcity of labeled video datasets, while proposing a novel approach to enhance real-time video object identification. The study aims to improve efficiency and accuracy in object detection through a lightweight tracking method that integrates features from trained image detectors.

Uploaded by

tejaswim1070
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

Enhancing Object Detec on and Tracking From

2023 International Conference on Recent Advances in Information Technology for Sustainable Development (ICRAIS) | 979-8-3503-0663-7/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICRAIS59684.2023.10367122

Surveillance Video Camera Using YOLOv8


Amba Sahithi Bejugam Sai Teja Chikili Vishwas Shastry
Computer Science and Engineering Computer Science and Engineering Computer Science and Engineering
B V Raju Ins tute of Technology B V Raju Ins tute of Technology B V Raju Ins tute of Technology
Narsapur, Medak, India Narsapur, Medak, India Narsapur,Medak,India
sahithiamba [email protected] [email protected] [email protected]
Chedurupally Venugopal CH. Rajyalakshmi
Computer Science and Engineering Computer Science and Engineering
B V Raju Ins tute of Technology B V Raju Ins tute of Technology
Narsapur, Medak, India Narsapur, Medak, India
[email protected] [email protected]
Abstract—In the rapidly evolving landscape of object significantly improved efficiency and precision in tasks that
tracking and detec on techniques, video surveillance systems have once required human interven on. However, video object
seen significant advancements, boos ng their ability to discern
threats and adversarial ac vi es. However, as technology progresses, detec on presents unique challenges. It requires analyzing
malicious actors con nually adapt their strategies to evade detec on. informa on from mul ple frames, demanding substan al
Adversarial a7acks, along with incidents like fire outbreaks, violent computa onal resources and exper se. Compared to image
ac ons, intrusions, and video tampering, pose substan al risks to the
object detec on datasets, video datasets are scarcer and more
integrity and effec veness of these systems. Manipula ng images and
videos can lead to compromised tracking results, either through costly to label due to the high number of frames involved.
altera on or dele on of key frames. To address these evolving Using a labelled dataset of photographs, a deep learning model
challenges, researchers and developers are relessly working to is trained to dis nguish the visual traits specific to each
create more flexible, robust, and resilient object tracking and
detec on systems. While video object detec on is paramount for in- category of images.
depth scene explora on, it has remained rela vely underexplored
due to the scarcity of labelled video datasets. The YOLOv8 algorithm In response to this research gap, our study explores
harnesses the strengths of the YOLOv8 architecture to elevate object an innova ve approach to enhance video object iden fica on
detec on performance. This study focuses on real- me analysis of by augmen ng a trained image object detector. We introduce
surveillance camera-generated video data, presen ng an automated
detec on approach employing smart networks and algorithms. a lightweight tracking method that leverages features obtained
from the image object detector, ensuring minimal speed loss
Index Terms—CNN (Convolu onal Neural Networks), during video object recogni on. This approach not only
YOLOv8 algorithm, Deep Learning, Detec on, Classifica on, improves accuracy but also makes real- me video object
Segmenta on, tracking.
detec on more feasible. By developing more accurate, robust
I. INTRODUCTION and efficient image classifica on systems using suitable object
detec on algorithms, we can harness valuable insights from
Object detec on, object tracking, and object
the vast visual data available globally. Our research addresses
iden fica on cons tute the founda on of computer vision
these challenges and contributes to advancing the fields of
applica ons, encompassing a broad spectrum of domains,
image classifica on, object detec on and video analysis.
including security, healthcare, autonomous systems and
entertainment. Object detec on is the process of iden fying II. LITERATURE SURVEY
and loca ng objects within a series of frames, while object
tracking involves tracing an object’s trajectory across mul ple Saeed Matar Al Jaberi et.al [1] recommended GANN
frames and object iden fica on focuses on determining the neural networks. It u lizes GANNs to generate realis c normal
presence and posi on of objects within each frame. This traffic samples and effec vely discriminate between normal
essen al computer vision task has applica ons ranging from and adversarial samples, thereby reducing false posi ves in
security and surveillance to autonomous vehicles and intrusion detec on systems. MTGAN employs a mul task
augmented reality. Deep learning models, par cularly learning, that simultaneously trains the object detec on
Convolu onal Neural Networks (CNNs), have revolu onized model to detect small objects and generate realis c adversarial
image classifica on, making it a vital tool across various real- perturba ons. MTGAN enhances the robustness of the model
world applica ons. Automa on of image processing tasks has

979-8-3503-0663-7/23/$31.00 © 2023 IEEE


228
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY MADRAS. Downloaded on September 15,2024 at 05:26:53 UTC from IEEE Xplore. Restrictions apply.
against GANN threats and improves small object detec on Ye Lyu et.al [8] described a video object detec on framework
performance. called GOTURN Held. This framework incorporates a
Eunice et.al [2] addressed gloss predic on in word-level sign convolu onal regression tracker that merges siamese network
language recogni on with a focus on enhancing accuracy and features using fully connected layers for bounding box
reducing computa onal complexity. They introduced regression. It also u lizes DCN networks to evaluate the
innova ve techniques such as key frame extrac on, pose integra on of the tracker. The tracker employs ROI-wise
vector augmenta on and YOLOv3 integra on, achieving features for object classifica on and regression. By leveraging
notable improvements in gloss predic on accuracy. Their lightweight HRNet-w32 backbone features from all four stages,
research draws inspira on from their methods to advance the the framework achieves improved object detec on accuracy.
field of con nuous sign language recogni on (CSLR). Viktor The proposed Plug and Play tracker effec vely u lizes deep
Denes Huszar et.al [3] demonstrated that their approach features from image detectors, minimizing addi onal memory
outperforms exis ng methods achieving approximately 2 and me requirements.
higher accuracy. Addi onally, the experiments show that the Abid Mehmood et.al [9] suggested a method for iden fying
approach remains robust encountered during remote server unusual ac vity in crowd footage. It introduces a UAV system
processing applica ons to address the problem of efficient that uses MC-CNN and MIR-OF to es mate crowd density and
violence detec on. Adapted the computa onally lightweight velocity. The gap between synthe c and real-world data is
x3d-m deep learning architecture detec ng violence pa7erns bridged by a C3DGAN. The usage of the FTLE field and a CNN
from videos using two methods TL and FL from the kine cs400 for detec ng divergence behavior is also introduced in this
dataset. This study addresses the need for enhancing object study. For object detec on, faster R-CNN and HLSOF are used.
detec on. The method uses the RGB frames and SG3Is for mo on
Young-Gab Kim et.al [4] presented a system designed to characteris cs, and a modified pre-trained 2D CNN for spa al
enhance public safety and security in smart city environments and temporal streams.
by leveraging advanced computer vision techniques Hyeseung Park et.al [10] approach involved maintaining two
successfully. Iden fies and classifies abnormal objects in video, background models a short-term model to capture recent
u lizes an MSD-CNN implemented on a GPU, and employs changes and a long-term model to represent the sta onary
dynamic programming. The system’s real- me capability background by con nuously comparing the current frame with
allows for proac ve measures to be taken in response to the two background models. The Mask R-CNN can iden fy
poten al threats or suspicious ac vi es thereby improving abandoned objects by detec ng significant differences to
overall security in smart ci es. specify the posi on and area of a candidate sta onary object.
Yung-Yao Chen et.al [5] proposed a pipeline in smart video Alberto Sabater et.al [11] demonstrated the work aims to
surveillance systems for object detec on. The approach is enhance the accuracy and efficiency of detec ng objects in
based on the YOLOV3 algorithm the system employs edge- videos using YOLOv3.The focus lies on techniques such as non-
cloud collabora on where edge devices capture and pre- maximum suppression to eliminate redundant detec ons for
process video streams while the cloud server performs the maintaining temporal consistency and the integra on to
heavy computa onal tasks of object detec on. The pipeline improve detec on results by leveraging these methods. Their
includes video acquisi on edge pre-processing cloud-based research contributes to advancing the field of video object
object detec on using YOLOV3 and result transmission back to detec on by addressing challenges related to robustness and
the edge for further processing which enables efficient and efficiency in the post-processing stage.
real- me object detec on in smart surveillance applica ons. Venkata Prasad.V et.al [12] proposed a GMM model. The main
Beom Kwon et.al [6] suggested an innova ve approach in video approach is allowing the system to learn and adapt to various
surveillance systems for the detec on. They introduced a ac ons performed by objects this enables the detec on of
model based on CNN called Mul -Scale ResBlock (MSRB), specific ac ons or behaviour’s such as running or carrying
which effec vely detects pedestrians across different scales. suspicious objects. By leveraging GMM algorithms, the smart
Moreover, they incorporated domain adap ve techniques to visual surveillance system enhances security measures aids in
enable the detector to adapt to various surveillance anomaly detec on, and ensures efficient monitoring and
environments. The combina on of MSRB and domain response to poten al threats.
adapta on ensures reliable and accurate detec on.
Khayrat et.al [7] effec vely employed deep learning models,
including YOLO, CNN, and LSTM, which mirrors our research
methodologies. Notably, the referenced study reported III. PROBLEM STATEMENT
impressive results, achieving a 90% accuracy rate for smoking
detec on and 93.8% accuracy for playing card iden fica on • Detec on of objects and tracking them from the videos of
using YOLO, while their CNN-LSTM model reached an accuracy surveillance cameras.
of 93.5%. This study serves as a per nent benchmark for our • Deep learning-based models have shown promising
work in the development of surveillance systems for similar results in object detec on, but the development of robust
applica ons.
229
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY MADRAS. Downloaded on September 15,2024 at 05:26:53 UTC from IEEE Xplore. Restrictions apply.
and accurate models requires a large amount of labelled C. Data preprocessing
training data. Data preprocessing is a necessary stage in the
• By current Literature the integra on of object detec on prepara on of YOLO-V8 architecture for object detec on. It
and tracking is done in videos as the object moves. requires many key steps to ensure that the input data is in a
• Previous models based on deep learning algorithms have standard format. Rescaling the images to a consistent
limita ons, and they are not accurate enough. resolu on ensures that the input data has a uniform format.
Furthermore, the objects that need to be tracked are not This step involves resizing the images to a predefined
enough to detect the ac vity of objects. resolu on, such as 512x512 pixels or 640x640 pixels while
maintaining the aspect ra o. Intensity normaliza on is
IV. EXISTING WORK
performed to standardize the pixel values across the images.
The research on object detec on and tracking down This step helps to remove varia ons in brightness and contrast
of videos in surveillance cameras is based on deep-learning between different images.
algorithms. Much work has been done over these years and
D. YOLOv8 Architecture
there are numerous exis ng works of object detec on. Many
researchers developed deep-learning models to improve YOLOv8, short for “You Only Look Once version 8,” is
efficiency. Some image classifica on techniques include a state-of-the-art object detec on model known for its
VGCNet, ResNet and MobileNet. Object detec on algorithms efficiency and accuracy. We have customized and fine-tuned
include the GMM (Gaussian Mixture Model) Algorithm which the YOLOv8 architecture to suit the specific requirements of
has been used to detect the object’s ac on in smart visual object detec on and tracking in surveillance video footage.
surveillance systems. GMM is a technique for modeling the The YOLOv8 architecture consists of numerous layers, designed
background in a video sequence. It works by modeling the pixel to process input images in a single pass while simultaneously
intensi es in the background of a video sequence as a mixture detec ng and tracking objects.
of Gaussian distribu ons. Each Gaussian distribu on The core components of the YOLOv8 architecture include:
represents a par cular colour or intensity value in the 1) Input Layer: The input layer is configured to
background. To link tracked objects using IoU(Intersec on over handle images of a specific dimension, typically resizing the
Union) scores, the first step is to perform object detec on in input images to a standardized size, such as 448x448 pixels.
each video sequence frame. Over the years, YOLOv3 and Preprocessing steps are applied to the input data to prepare it
YOLOv5 have been widely used for their adaptability and for the neural network.
flexibility in object detec on; Regression tracking which uses a 2) Backbone Network: YOLOv8 employs a
correla on filter which is used track the target object over me powerful backbone network, oSen based on well-known
by compu ng the target object features and features in the architectures like Darknet, CSP Darknet or others. This network
subsequent frames of the video. The exis ng work men oned is responsible for feature extrac on from the input images and
above regarding YOLO is lacking with limited interpretability, plays a crucial role in the model’s ability to detect objects
many of the research efforts have also found lower accuracy efficiently.
rates. 3) Feature Pyramid: To enable mul -scale
object detec on, YOLOv8 generates a feature pyramid that
V. METHODS combines features from different levels of the backbone
A. Data Collec on network. This feature pyramid allows the model to detect
Data collec on typically involves gathering a dataset objects of various sizes and scales within the same image.
that consists of different images that consist of labels and 4) Detec on Head: YOLOv8’s detec on head is
different object classes and their annota ons. responsible for predic ng object bounding boxes and their
associated class probabili es. It uses anchor boxes, confidence
B. Data Prepara on scores and bounding box regression to precisely locate and
Training, valida on, and tes ng sets are oSen classify objects in the input image.
generated from the three subsets of the acquired dataset. The 5) Output Layer: The model’s output layer
valida on set is used for hyperparameter tuning and model provides the final predic ons in a format that includes object
selec on, the seTngs that control the training procedure such class predic ons and bounding box coordinates. This
as learning rate, dropout rate—are known as informa on is crucial for iden fying and tracking objects
hyperparameters. The tes ng set is used to assess the final within surveillance video frames.
performance of the trained model, and the training set is used E. Proposed YOLOv8 Architecture
to train the YOLO-V8 architecture which takes the labelled
The proposed YOLOv8 Architecture as shown in Fig.1
datasets to recognize and localize the objects.
consists of four major steps namely Data Acquisi on, Data
Preprocessing, Model Training and Model Evalua on.

In the proposed YOLOv8 implementa on for


surveillance video object detec on and tracking, we’ve
230
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY MADRAS. Downloaded on September 15,2024 at 05:26:53 UTC from IEEE Xplore. Restrictions apply.
introduced custom layers and op miza ons to enhance 3) Loss func on: YOLOV8 uses this loss func on
performance. These customiza ons cater to surveillance that balances the detec on part and segmenta on tasks and
requirements, handling processing steps efficiently. We’ve reduces the false posi ves and false nega ves. The loss
fine-tuned hyperparameters, network architecture and loss func on is computed using the following equa on:
func ons for improved accuracy.
L = Lbox +Lobj +Lcls +Lseg (3)
where Lbox is the loss for bounding box predic on, Lobj is the
loss for an objectness score predic on, Lcls is the loss for class
predic on, and Lseg is the loss for segmenta on mask
predic on.
VI. RESULTS AND DISCUSSION
In this sec on, we provide a comprehensive analysis
of the YOLOv8 model’s performance during training and
valida on, u lizing datasets sourced from the Roboflow
plaXorm for their quality. The training dataset comprises over
4028 images, each with a size of 416x416 pixels and we have
employed preprocessing and data augmenta on techniques to
enhance dataset quality and size. The model is trained with a
Fig. 1. Proposed YOLOv8 Architecture learning rate of 0.001 and a batch size of 64 for 50 epochs
completed in 3.644 hours. One of its notable strengths is its
ability to achieve favourable results. We set the learning rate
Yolov8 is a single-stage model renowned for its ability
at 0.001, a somewhat slow rate. This deliberate choice is ideal
to iden fy objects in a single pass. This efficiency is achieved
for training a substan al language model as it permits the
through a series of convolu onal layers that extract features
model to gradually adapt to new data while mi ga ng the risk
from the input image. Our customized yolov8 model boasts an
of overfiTng. Valida on accuracy and accuracy are pivotal
impressive 268 layers and a total of 68126457 parameters to
metrics for assessing deep learning models in object detec on.
make yolov8 specialized. For our use case, we introduced
They enable quan ta ve evalua ons and comparisons among
several modifica ons to our training dataset including images
different models or varia ons thereof. Our model’s training
related to fire accidents, knives, guns and violent scenes
involves two essen al loss components: “Train/box loss”
allowing the model to become proficient at recognizing these
assessing object localiza on precision and “Train/cls loss”
objects. We employed transfer learning to further enhance the
measuring object classifica on proficiency. Valida on also
model’s performance for op miza on during training. We
employs two crucial losses: “Valid/box loss” evalua ng precise
u lized the Adam op mizer which is par cularly well-suited for
bounding box predic on for unseen data, and “Valid/cls loss”
training large language models. Training includes data
assessing accurate object classifica on. Both valida on losses
augmenta on, specialized loss func ons, and custom datasets.
exhibit a declining trend, indica ng the model’s proficiency in
The training setup uses the Adam op mizer with op mized
generaliza on and precise object detec on, as demonstrated
hyperparameters and a dynamic learning rate schedule, the
in Fig. 4.
model achieves peak proficiency in detec ng and tracking
objects in surveillance videos.
F. Equa ons
1) Objectness score: YOLOV8 predicts
the probability of an object’s presence within each
grid cell using an objectness score. The objectness score is Fig. 2. Detec on of fire
computed using the following equa on:

σ = σ(t0) (1)

2) Class predic on: YOLOV8 performs class


predic on for each object by assigning a probability to
different predefined classes. The class probabili es are
computed using the following equa on:

(2)

Fig. 3. Detec on of violence


231
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY MADRAS. Downloaded on September 15,2024 at 05:26:53 UTC from IEEE Xplore. Restrictions apply.
It is crucial to monitor valida on and training losses
during model training. A consistent decline in valida on and
training losses signifies model improvement while maintaining
a balance between loss minimiza on and preven ng
overfiTng. Both valida on losses decrease, reflec ng the
model’s proficiency in generaliza on and precise predic ons
for object detec on as shown in Fig. 4.

Fig. 7. Represents F1-confidence curve of the model for trained dataset

standard measures such as accuracy, precision, recall, F1-score


and mean average precision (mAP). These metrics collec vely
serve to demonstrate the accuracy and robustness of our
model, in the context of detec ng and tracking objects
relevant to our work. The precision-confidence curves as
shown in Fig.5 and Fig.6 provide insights into the trained
model’s performance on violence and fire datasets. These
curves allow for direct model comparisons and visually
Fig. 4. Represents the loss varia ons with training epochs demonstrate our model’s superior precision across various
confidence thresholds. The F1-confidence curve, shown in Fig.
7, emphasizes the model’s balance between precision and
recall, achieving an F1 score of 0.68 at a confidence threshold
of 0.364. This score reflects the model’s high accuracy and
coverage for dataset classes, indica ng substan al
performance.

Fig. 5. Represents precision-confidence curve of the model for trained violence


dataset

Fig. 8. Detec on of objects in image

The compara ve analysis of object detec on models


reveals key insights into their performance.

As shown in Table I, the proposed YOLOv8 model


Fig. 6. Represents precision-confidence curve of the model for trained fire
dataset
exhibits remarkable precision across all classes, surpassing the
other models in the study. This high precision is par cularly
As shown in Figures 2 and 3, our model exhibits pronounced for the ’Fire’, ’Knife’ and ’Pistol’ classes, indica ng
excep onal performance in accurately detec ng the classes that our model excels in correctly iden fying instances of these
“fire” and “violence”. This remarkable accuracy signifies a objects while minimizing false posi ves. Moreover, when
significant step toward the success of our research. The evalua ng the mean Average Precision (mAP) at an
model’s ability to consistently and accurately iden fy diverse intersec on over a union (IoU) threshold of 0.5 ([email protected]),
classes underscores its versa lity and competence in detec ng YOLOv8 consistently outperforms its counterparts. This metric
various objects or subjects as shown in Fig.8. These outcomes is crucial for assessing the model’s ability to provide accurate
affirm the model’s robustness and suitability for applica ons in object localiza on. Our model achieved an overall [email protected] of
object detec on, emphasizing its high-accuracy results. We 0.706, demonstra ng its excellence in localiza on and overall
have conducted a thorough evalua on of our research work, accuracy. In contrast, YOLOv7 and YOLOv5, although
offering a comprehensive assessment of our system’s compe ve in some aspects, fall behind YOLOv8 in terms of
performance. Our evalua on employs a range of quan ta ve precision and [email protected], par cularly for the ’Pistol’ class.
metrics designed to gauge the effec veness of our approach. These results highlight the superior object detec on
These metrics include capabili es of our YOLOv8 model, making it the top choice
232
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY MADRAS. Downloaded on September 15,2024 at 05:26:53 UTC from IEEE Xplore. Restrictions apply.
Model Class Precision Recall [email protected] suitable for a wide range of applica ons, par cularly in video
YOLOv8 All 0.721 0.651 0.706 surveillance, where rapid processing is essen al. Our findings
Fire 0.858 0.665 0.774 underscore the superiority of the YOLOv8 model architecture
Knife 0.818 0.665 0.774 over its predecessors, demonstra ng marked improvements in
Pistol 0.778 0.778 0.814 performance metrics. Moreover, our chosen model
Violence 0.801 0.76 0.797 architecture is highly customizable, enabling training on
YOLOv7 All 0.528 0.564 0.512 bespoke datasets tailored to specific object detec on tasks.
Fire 0.822 0.662 0.782 This adaptability paves the way for future research to explore
Knife 0.588 0.746 0.669 further enhancements and alterna ve algorithms, ul mately
Pistol 0.543 0.527 0.546 advancing the robustness and versa lity of object detec on
Violence 0.822 0.662 0.782 systems. However, it’s crucial to acknowledge the poten al
YOLOv5 All 0.626 0.534 0.553 limita ons and challenges associated with real-world
Fire 0.801 0.652 0.765 deployment, such as varia ons in ligh ng condi ons, object
Knife 0.716 0.695 0.740 occlusions, and poten al biases in training data. Future work
Pistol 0.511 0.410 0.398 should focus on addressing these challenges to ensure the
Violence 0.799 0.754 0.795 broader applicability of our approach.
Faster R-CNN All 0.702 0.620 0.695 REFERENCES
Fire 0.811 0.642 0.765
[1] Saeed Matar Al Jaberi, Asma Patel, and Ahmed N. AL-Masri, “Object
Knife 0.811 0.642 0.765 tracking and detection techniques under GANN threats: A systemic review,”
Applied Soft Computing, vol. 139, pp. 110224, Mar 2023.
Pistol 0.525 0.502 0.536
Violence 0.781 0.732 0.788 [2] J. Eunice, A. J, Y. Sei, and D. J. Hemanth, “Sign2Pose: A Pose-Based
Approach for Gloss Prediction Using a Transformer Model,” Sensors, vol.
Mask R-CNN All 0.731 0.670 0.718 23, no. 5, pp. 2853, Mar 2023.
Fire 0.825 0.690 0.794
Knife 0.825 0.690 0.794 [3] V. D. Huszar, V. K. Adhikarla, I. N´ egyesi, and C. Krasznay, “Toward´ Fast
and Accurate Violence Detection for Automated Video Surveillance
Pistol 0.557 0.518 0.567 Applications,” IEEE Access, vol. 11, pp. 18772-18793, 2023.
Violence 0.812 0.770 0.809
[4] P. Y. Ingle and Y.-G. Kim, “Real-Time Abnormal Object Detection for Video
TABLE I: COMPARISION RESULTS OF YOLOV8 Surveillance in Smart Cities,” Sensors, vol. 22, no. 10, pp. 3862, May 2022.

[5] Yung-Yao Chen, Yu-Hsiu Lin, Yu-Chen Hu, Chih-Hsien Hsia, Yi-An Lian,
and Sin-Ye Jhong, “Distributed Real-Time Object Detection Based on
Edge-Cloud Collaboration for Smart Video Surveillance Applications,” IEEE
for scenarios where precision and localiza on accuracy are Access, vol. 10, pp. 93745-93759, Sep 2022.
paramount. While Faster R-CNN and Mask R-CNN perform
[6] B. Kwon and T. Kim, “Toward an Online Continual Learning Architecture for
admirably, especially for the ’Knife’ class, they exhibit slightly Intrusion Detection of Video Surveillance,” IEEE Access, vol. 10, pp.
lower overall [email protected] and recall compared to YOLOv8. This 89732-89744, Aug 2022.

indicates that our model is be7er suited for our project’s [7] Khayrat et al. “An Intelligent Surveillance System for Detecting Abnormal
objec ves, where high precision and localiza on accuracy are Behaviour’s on Campus using YOLO and CNN-LSTM Networks,” pp. 104-
109, 2022.
essen al for object detec on and tracking. However, the
comprehensive analysis of our YOLOv8 model against other [8] Lyu, Y., Yang, M. Y., Vosselman, G., and Xia, G.-S., “Video object detection
with a convolutional regression tracker”, ISPRS Journal of Photogrammetry
state-of-the-art models underscores its superiority in terms of and Remote Sensing, vol. 176, pp. 139–150, June 2021.
precision, [email protected] and overall accuracy for object detec on [9] A. Mehmood, “Efficient Anomaly Detection in Crowd Videos Using Pre-
Trained 2D Convolutional Neural Networks,” IEEE Access, vol. 9, pp.
in our project. This performance makes YOLOv8 an ideal choice 138283-138295, Oct 2021.
for applica ons requiring high-precision object detec on, such
as surveillance systems and security applica ons. [10] H. Park, S. Park and Y. Joo, “Detection of Abandoned and Stolen Objects
Based on Dual Background Model and Mask R-CNN,” IEEE Access, vol. 8,
pp. 80010-80019, Apr 2020.
VII. CONCLUSIONS
In conclusion, our study has explored the u liza on of [11] A. Sabater, L. Montesano and A. C. Murillo, “Robust and efficient
postprocessing for video object detection,” 2020 IEEE/RSJ International
the YOLOv8 model architecture in combina on with Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV,
USA, vol. 1, pp. 10536-10542, Sep 2020.
Convolu onal Neural Networks (CNNs) for object detec on,
which has yielded notable advantages in terms of [12] Venkata Prasad V, Chandra Sekhar Rayi, Cheggoju Naveen and Vishal R.
computa onal efficiency and performance. This research satpute, “Object’s Action Detection using GMM Algorithm for Smart Visual
Surveillance System,” Procedia Computer Science, vol. 133, pp.
represents a significant step forward, building upon the legacy 276–283, 2018.
of YOLO, a real- me object detec on algorithm renowned for
its speed and accuracy. By harnessing the power of YOLOv8 and
CNNs, we have addressed various aspects of object detec on
in our work. One key highlight of this approach is its
excep onal speed and real- me performance, making it highly
233
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY MADRAS. Downloaded on September 15,2024 at 05:26:53 UTC from IEEE Xplore. Restrictions apply.

You might also like