
Proceedings of the 5th International Conference on Inventive Research in Computing Applications (ICIRCA 2023)
IEEE Xplore Part Number: CFP23N67-ART; ISBN: 979-8-3503-2142-5
DOI: 10.1109/ICIRCA57980.2023.10220923

YOLO-Based Video Processing for CCTV Surveillance

Dr. Sujata Terdal, Department of Computer Science and Engineering, P.D.A. College of Engineering, Kalaburagi, India, [email protected]
Dr. Sayyada F, Department of Computer Science and Engineering, P.D.A. College of Engineering, Kalaburagi, India, [email protected]
Ameena Fatima, B.E. Final Year, Department of Computer Science and Engineering, P.D.A. College of Engineering, Kalaburagi, India, [email protected]
Amulya Reddy, B.E. Final Year, Department of Computer Science and Engineering, P.D.A. College of Engineering, Kalaburagi, India, [email protected]
Manasi Koppal, B.E. Final Year, Department of Computer Science and Engineering, P.D.A. College of Engineering, Kalaburagi, India, [email protected]

Abstract— A visual surveillance system is primarily employed for the analysis and interpretation of object behaviours within a given video scene. A surveillance system is important because it acts as a deterrent to crime, helping prevent theft, vandalism, and assault, and it enhances public safety by monitoring public spaces and allowing for a timely response to emergencies. The performance of a surveillance system using traditional cameras is vastly superior during the daytime as compared to night time; this is primarily due to the availability of natural light during the day, whereas at night, in the absence of ambient light, it is difficult to detect objects. The developed system detects anomalous activity during the day as well as at night and alerts the authorized user. The proposed system captures video using a CCTV camera and converts the video into frames; these frames undergo pre-processing, and objects are then detected using the YOLO algorithm. In the event of any anomalies or suspicious activities detected within the surveillance system, an alert message is promptly sent to the concerned user or designated authority. The results of the system demonstrate that the accuracy of the proposed methods reaches 98.3%.

Keywords— You Only Look Once, Pre-processing, Anomalous, Alert Message, Object Detection, Surveillance System

I. INTRODUCTION

Object detection is a computer vision technology that identifies and localizes objects within digital images and videos. [1] It plays a crucial role in real-time systems where multiple objects need to be located simultaneously.

Imagine a scenario where a person is walking through a public place or within the surveillance area, carrying a concealed object that may pose a potential threat to others. In such a situation, an advanced surveillance system equipped with object detection capabilities instantly recognizes the presence of the dangerous object.

As soon as the system detects the concealed object, it immediately sends an alert message to the authorities responsible for maintaining public safety. This real-time alert allows the authorities to take prompt action and intervene before any harm or dangerous situation can arise.

By leveraging the power of a surveillance system, potential threats can be proactively prevented and a secure environment created in public spaces. In order to generate an alert message, the harmful objects first need to be detected, and this is done using different object detection methods.

Classification-based algorithms, such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), belong to the first category. These algorithms are slower because they require running predictions for every selected region of an image. This region-based approach can be time-consuming, as it involves analyzing numerous regions individually to determine the presence of objects.

On the other hand, regression-based algorithms [9], exemplified by the You Only Look Once (YOLO) method, fall into the second category. YOLO takes a different approach by predicting both the classes and bounding boxes for the entire image in a single pass of the algorithm. Unlike region-based algorithms, YOLO does not require pre-selecting regions of interest in the image. Instead, it directly predicts the classes and bounding boxes for all objects simultaneously using a single neural network. This enables the system to detect multiple objects efficiently and rapidly.

The YOLO algorithm offers superior speed compared to classification-based algorithms. By eliminating the need for region selection and employing a single-pass approach, YOLO achieves real-time object detection capabilities while maintaining accuracy. This makes it well-suited for various applications, including visual surveillance systems, where fast and accurate object detection is essential for effective monitoring and analysis.
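As a concrete illustration of this single-pass behaviour, the minimal sketch below runs a pretrained YOLOv3 network on one image through OpenCV's DNN module. The configuration and weight file names, input size, and confidence threshold are illustrative assumptions; the paper does not prescribe this particular implementation.

```python
import cv2
import numpy as np

# Assumed file names for a pretrained YOLOv3 model in Darknet format.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
layer_names = net.getUnconnectedOutLayersNames()  # the YOLO output layers

image = cv2.imread("frame.jpg")
h, w = image.shape[:2]

# One forward pass over the whole image: no region proposals are generated.
blob = cv2.dnn.blobFromImage(image, scalefactor=1 / 255.0, size=(416, 416),
                             swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(layer_names)

# Each detection row is [cx, cy, bw, bh, objectness, class scores...],
# all predicted simultaneously for the full image.
for output in outputs:
    for detection in output:
        scores = detection[5:]
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > 0.5:
            cx, cy, bw, bh = detection[:4] * np.array([w, h, w, h])
            print(class_id, confidence, (cx, cy, bw, bh))
```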


II. LITERATURE SURVEY

Farhan Asif Chowdhury introduced a system for Image Classification and Object Detection using CNN [10], which leverages an existing collection of low- and high-resolution traffic sign images to carry out the classification and detection process. In this work, relatively good classification results were obtained even with a tiny dataset; nevertheless, detection accuracy was poor and significantly influenced by the size of the training dataset and the number of training iterations. Single Shot MultiBox Detector (SSD) and Faster R-CNN architectures were employed for the object detection task, while the author used a modified version of AlexNet for classification.

Sandeep Kumar, Aman Balyan, and Manvi Chawla introduced a system for Object Detection and Recognition in Images, and the model used is EasyNet [11]. The global context informs the EasyNet model's predictions because it examines the entire image at test time. The model calculates scores based on whether an object falls under a specific category at the time of prediction. A single network evaluation is used to produce predictions. Regression to spatially separated bounding boxes and related class probabilities is used here to solve the challenge of object detection. Object detection is done by first acquiring the image, then enhancing the image, followed by feature extraction, segmentation, object detection, and finally the representation and description of the detected object.

Devashish Lohani, Carlos Crispim-Junior, Quentin Barthélemy, Sarah Bertrand, Lionel Robinault, and Laure Tougne Rodet introduced Perimeter Intrusion Detection (PID) by Video Surveillance [12]. PID seeks to find an unauthorised object in a secured outdoor area at a specific time. On the basis of this definition, the authors review the available techniques, data sets, and evaluation procedures. The work also offers an appropriate evaluation technique for use in practical situations. Finally, they used a variety of evaluation techniques and criteria to assess the performance of the current systems on the datasets that were made accessible. The PID methods that can be used are pre-processing, detection, tracking, joint detection and tracking, post-processing, and alarm.

Dr. Pawan Mishra and Gyanendra Saroha introduced the work A Study on Video Surveillance System for Object Detection and Tracking [13]. The main purpose of a study on a video surveillance system for object detection and tracking is to analyse and clarify object behaviours. It includes both static and moving object identification, as well as video tracking to comprehend scene events. Finding the numerous approaches to static and moving object recognition and moving object tracking is the main goal of this survey research. Any video scene has objects that object detection techniques can identify. There are numerous categories of observed items, including people, moving objects, trees, and clouds. In higher-level applications, object tracking is utilised to determine the location of available items and their shape within each frame. Point tracking, silhouette tracking, and kernel tracking are the different tracking methods described by the authors.

Priya Kumari, Sonali Mitra, and their team introduced a system for YOLO Algorithm Based Real-Time Object Detection [14]. The main objective of real-time object detection is to locate an object in a given image with accuracy, and then label it with the relevant category. The machine learning model in this system was trained using the You Only Look Once (YOLO) approach for real-time object recognition. YOLO is a neural network for real-time object detection, and the COCO dataset is used to train the algorithm to recognise various things in a picture. 90% accuracy in real-time object detection is achieved after training with this technique.

III. METHODOLOGY

Video surveillance systems play a crucial role in maintaining security and overseeing operations in various environments. They serve as a cornerstone in ensuring the safety and monitoring of activities across a wide range of settings. The proposed system involves video acquisition, pre-processing, segmentation, object recognition, and generation of an alert message in case of any unusual activity.

Figure 1: Workflow of the proposed system

Video acquisition is the initial step in the video processing workflow, as shown in Figure 1, and involves obtaining a video for further analysis or manipulation. It can be accomplished through two primary methods: reading a pre-existing video file or capturing a video using a camera or other recording devices.

Converting a video into frames involves a straightforward process. First, the video file is opened using a video processing library or framework. The library reads the video file and provides access to its frames. Then, frame extraction begins, where each frame is extracted individually from the video file. The desired frame rate is controlled by selecting a specific number of frames per second (fps) to extract.
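The fragment below is a minimal sketch of this frame-extraction step using OpenCV; the target frame rate, file names, and output directory are illustrative assumptions rather than values taken from the paper.

```python
import cv2

def extract_frames(video_path, out_dir, target_fps=5):
    """Read a video file and save frames at roughly `target_fps` frames per second."""
    cap = cv2.VideoCapture(video_path)              # also accepts 0 for a live CCTV/webcam feed
    source_fps = cap.get(cv2.CAP_PROP_FPS) or target_fps
    step = max(1, round(source_fps / target_fps))   # keep every `step`-th frame

    index, saved = 0, 0
    while True:
        ok, frame = cap.read()                      # returns False when the video ends
        if not ok:
            break
        if index % step == 0:
            cv2.imwrite(f"{out_dir}/frame_{saved:06d}.jpg", frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# Example with hypothetical paths:
# extract_frames("cctv_clip.mp4", "frames", target_fps=5)
```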


Pre-processing is a crucial step in video processing that involves a series of operations to transform and enhance the video frames before they are fed into algorithms or further analysis. In the proposed system, the video frames undergo several pre-processing steps to enhance their quality and facilitate better analysis. These steps involve:

• Denoising: Noise can degrade the quality of video frames, affecting the accuracy of subsequent processing tasks. A Gaussian blur filter can be used to remove the noise [8].

• Conversion to Grayscale: Converting to grayscale reduces the data dimensionality and removes color-related variations, focusing solely on the intensity or structural information in the image.

• Image Resizing and Cropping: Video frames may need to be resized or cropped to a specific resolution or aspect ratio to standardize the input size for further processing or to match the requirements of the subsequent algorithms.
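The following is a minimal sketch of these pre-processing steps with OpenCV; the kernel size and target resolution are assumptions for illustration and are not specified in the paper.

```python
import cv2

def preprocess_frame(frame, size=(416, 416)):
    """Denoise, convert to grayscale, and resize a frame before detection."""
    denoised = cv2.GaussianBlur(frame, (5, 5), 0)        # Gaussian blur to suppress noise
    gray = cv2.cvtColor(denoised, cv2.COLOR_BGR2GRAY)    # drop colour, keep intensity/structure
    resized = cv2.resize(gray, size)                     # standardize the input resolution
    return resized

# Example with a hypothetical frame file:
# frame = cv2.imread("frames/frame_000001.jpg")
# processed = preprocess_frame(frame)
```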
Once the frame is converted into a grayscale image, segmentation techniques are applied to partition the grayscale frames into meaningful regions or segments based on their visual characteristics or properties. Segmentation in video processing refers to the process of partitioning a video sequence into meaningful and semantically coherent regions or segments based on their visual properties. The goal of video segmentation is to identify and separate different objects, regions, or motions within the video frames to enable further analysis, understanding, and manipulation of the video content. The proposed system uses a form of segmentation known as instance segmentation. Instance segmentation aims to identify and delineate individual objects within an image or video frame, assigning unique labels to each distinct object instance.

Object recognition is a computer vision methodology that focuses on identifying objects within video content. A popular technique for detecting objects in videos involves breaking down the video into individual frames and applying an image recognition algorithm to each frame. One widely used algorithm for this purpose is YOLO (You Only Look Once) [3]. By utilizing YOLO or similar algorithms, the system can efficiently and accurately identify objects in video frames.

The YOLO algorithm is an object detection method that relies on regression techniques [6]. In YOLO, the primary objective is to detect objects within an image, and this task is framed as a regression problem. The model is trained to directly estimate the bounding box coordinates and class probabilities for each object found in the image. By leveraging regression techniques, YOLO enables the accurate prediction of object locations and classes, offering an efficient approach to object detection. This is in contrast to other approaches that involve multiple stages, such as region proposal and classification.

A. Object Detection

Object detection involves the following steps:

Step 1: Dataset preparation, which involves downloading the COCO dataset, consisting of labeled images with bounding boxes and class labels [15].

Step 2: The next step is label conversion, which converts the annotation format of the COCO dataset to the YOLO format. The YOLO format includes the object's class label and the normalized coordinates of the bounding box (center x, center y, width, height) relative to the image size.

Step 3: After label conversion, a YOLO variant [4] (e.g., YOLOv3, YOLOv4) is chosen and the network architecture is constructed. The proposed system chooses YOLOv3 because of its multi-scale detection, powerful architecture, and efficient prediction mechanism; it offers an effective solution for object detection in surveillance systems.

Step 4: The YOLO network is initialized with pretrained weights.

Step 5: The next step is training the YOLO network on the COCO training set. This involves feeding the training images through the network, computing the loss between the predicted bounding boxes and the ground truth, and updating the network parameters using backpropagation and gradient descent. Hyperparameters such as the learning rate, batch size, and number of iterations or epochs are adjusted.

Step 6: The next step involves periodically evaluating the trained model on the COCO validation set. Metrics such as mean average precision (mAP) are calculated to assess the model's performance on object detection.

Step 7: After training, the final YOLO model is evaluated on the COCO test set, generating bounding boxes, class labels, and confidence scores for the detected objects.

Step 8: Post-processing steps are applied to the model's output to refine the object detections. This typically involves techniques like non-maximum suppression (NMS) to remove duplicate or overlapping detections and filtering based on confidence scores or predefined thresholds.
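As a minimal sketch of this post-processing step, the snippet below filters raw detections by confidence and applies OpenCV's built-in non-maximum suppression; the thresholds and the example boxes are illustrative assumptions.

```python
import cv2
import numpy as np

def postprocess(boxes, confidences, score_threshold=0.5, nms_threshold=0.4):
    """Keep confident, non-overlapping detections.

    boxes:       list of [x, y, w, h] in pixels (top-left corner plus size)
    confidences: list of detection confidence scores, one per box
    """
    # cv2.dnn.NMSBoxes discards boxes below score_threshold and suppresses
    # overlapping boxes whose IoU exceeds nms_threshold, returning kept indices.
    kept = cv2.dnn.NMSBoxes(boxes, confidences, score_threshold, nms_threshold)
    kept = np.array(kept).reshape(-1)
    return [boxes[int(i)] for i in kept]

# Example with hypothetical detections (two overlapping boxes of the same object):
# final = postprocess([[10, 10, 100, 200], [12, 14, 98, 195]], [0.9, 0.6])
```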


Alert Message: Once the objects or activities are detected, the system compares them against predefined criteria or patterns to determine if they constitute unusual or suspicious behaviour. If the detected objects or activities deviate significantly from the expected norms, an alert is generated. The generated alert serves as a notification to relevant personnel, such as security personnel or system administrators, about the unusual activity.
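The snippet below is a minimal sketch of this alert logic: detected class labels are checked against a list of objects treated as hazardous, and a notification hook is invoked. The hazardous-object list and the notify_user function are hypothetical placeholders; the paper does not specify the exact classes it monitors or how the message and alarm are delivered.

```python
# Hypothetical set of object classes treated as hazardous.
HAZARDOUS_CLASSES = {"knife", "scissors", "gun"}

def notify_user(message):
    # Placeholder for the actual delivery channel (SMS, e-mail, app notification, alarm).
    print("ALERT:", message)

def check_detections(detected_labels):
    """Raise an alert if any detected object is on the hazardous list."""
    found = HAZARDOUS_CLASSES.intersection(detected_labels)
    if found:
        notify_user(f"Hazardous object(s) detected: {', '.join(sorted(found))}")
    return found

# Example with hypothetical detections from one frame:
# check_detections(["person", "scissors"])
```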

IV. WHY CNN?

Convolutional Neural Networks (CNNs) are specifically designed for image recognition tasks and have several advantages over normal deep neural networks (DNNs) when it comes to processing visual data. While DNNs can also be trained on image data, they are generally less efficient and effective compared to CNNs because they lack the specific architectural design tailored for image recognition. CNNs' ability to exploit local connectivity, parameter sharing, hierarchical feature learning, translation invariance, and weight regularization makes them more suitable for processing images and achieving high accuracy in image recognition tasks.

V. IMPLEMENTATION AND RESULT

Ensuring security and monitoring activities in diverse environments heavily rely on video surveillance systems. The proposed system encompasses essential components, including video acquisition, pre-processing, segmentation, object recognition, and the generation of alert messages when detecting any suspicious or abnormal activities. Each stage contributes significantly to the system's capability to identify potential threats and promptly alert the appropriate authorities or personnel. The CCTV surveillance system provides the user with two options, namely Read Video and Capture Video, as shown in Figure 2.

Figure 2: Main Page

The next step involves converting the video into consecutive frames. These frames then undergo various pre-processing steps.

Figure 3: The Converted Frame

In the pre-processing stage, several steps are performed to prepare the image or frame for object detection. The first step is denoising, which involves removing any unwanted noise or artifacts from the image. This helps to improve the overall quality and clarity of the image. Next, the brightness and contrast of the image are adjusted to enhance the visibility of objects and improve their distinguishability from the background. To further refine the image and reduce noise, Gaussian blur is applied, and the image is transformed into a grayscale representation, as seen in Figure 4.

Figure 4: The Grayscale Image

By following this sequential pre-processing pipeline, the image or frame is optimized for object detection, which is done using YOLO. YOLO creates a grid of cells from the input image, with each cell responsible for determining the bounding boxes and class probabilities for possible objects. YOLO performs detection in a single pass. It predicts bounding boxes by regressing their coordinates relative to the cell, along with associated confidence scores for object presence.

Figure 5: The Enhanced Image

Figure 6: Object Detected

If any hazardous objects are detected, the user is alerted with a message along with an alarm.

The proposed system was executed 50 times for the recognition of objects, and it was found that 40 times the result was correct and 10 times the object was wrongly identified [2].

Here are some of the precision and recall values of the objects identified by the system.

Precision = True Positives / (True Positives + False Positives)
Recall = True Positives / (True Positives + False Negatives)
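As a small illustration of how these metrics are computed, the sketch below evaluates precision and recall from per-class detection counts; the counts used in the example are hypothetical and are not taken from the paper's experiments.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Standard precision and recall from detection counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Hypothetical counts for one object class: 9 correct detections,
# 1 spurious detection, and no missed objects.
p, r = precision_recall(true_positives=9, false_positives=1, false_negatives=0)
print(f"precision={p:.2f} recall={r:.2f}")   # precision=0.90 recall=1.00
```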


Table 1: Precision and Recall

Object in the frame   Precision   Recall
Scissors              0.77        0.87
Person                1.00        1.00
Bottle                0.90        1.00
Motorbike             0.75        0.75
Cell Phone            1.00        0.80

VI. CONCLUSION

While it may be challenging to envision a future where surveillance of monitored spaces is entirely automated, it is evident that there is a pressing need to enhance existing surveillance technology with more effective tools to support human operators. Fortunately, the growing accessibility of affordable computing resources, advanced video infrastructure, and improved video analysis technologies will pave the way for the emergence of smart surveillance systems. These systems have the potential to completely replace conventional surveillance systems by offering enhanced capabilities and more efficient monitoring.

REFERENCES

[1]. Geethapriya. S, N. Duraimurugan, S.P. Chokkalingam, "Real-Time Object Detection with Yolo", IJEAT, ISSN: 2249-8958, Volume-8, Issue-3S, February 2019.

[2]. How Compute Accuracy For Object Detection works, https://pro.arcgis.com/en/pro-app/latest/tool-reference/image-analyst/how-compute-accuracy-for-object-detection-works.

[3]. https://www.geeksforgeeks.org/yolo-you-only-look-once-real-time-object-detection/amp/

[4]. https://towardsdatascience.com/yolo-v3-object-detection-53fb7d3bfe6b

[5]. https://link.springer.com/article/10.1007/s11042-022-13644-y

[6]. https://neptune.ai/blog/object-detection-algorithms-and-libraries

[7]. https://www.datacamp.com/blog/yolo-object-detection-explained

[8]. https://pyimagesearch.com/2021/04/28/opencv-smoothing-and-blurring/

[9]. Akansha Bathija, Prof. Grishma Sharma, "Visual Object Detection and Tracking using YOLO and SORT", Vol. 8, Issue 11, November 2019.

[10]. Farhan Asif Chowdhury, "Image Classification and Object Detection using CNN: A Comparative Study using Traffic Sign Imagery", The University of New Mexico, Albuquerque, USA.

[11]. Sandeep Kumar, Aman Balyan, Manvi Chawla, "Object Detection and Recognition in Images", Computer Science & Engineering Department, Maharaja Surajmal Institute of Technology, New Delhi, India, 2017.

[12]. Devashish Lohani, Carlos Crispim-Junior, Quentin Barthélemy, Lionel Robinault, "Perimeter Intrusion Detection by Video Surveillance: A Survey", 2022.

[13]. Dr. Pawan Mishra, Gyanendra Saroha, "A Study on Video Surveillance System for Object Detection and Tracking", IEEE, 2016.

[14]. Priya Kumari, Sonali Mitra, Suparna Biswas, Sunipa Roy, Sayan Roy Chaudhuri, Antara Ghosal, Palasri Dhar, Anurima Majumder, "YOLO Algorithm Based Real-Time Object Detection", Volume 8, Issue 1.

[15]. COCO Dataset, https://cocodataset.org/#home
