
2023 Third International Conference on Smart Technologies, Communication and Robotics (STCR)

Real-Time Weapons Detection System using Computer Vision

DOI: 10.1109/STCR59085.2023.10396960

Pranav Nale
Department of Information Technology, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, 412115, Maharashtra, India
[email protected]

Shilpa Gite
Department of Artificial Intelligence and Machine Learning, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, 412115, Maharashtra, India
[email protected]

Deepak Dharrao
Department of Computer Science and Engineering, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, 412115, Maharashtra, India
[email protected]

Abstract—The growing use of Closed-Circuit Television (CCTV) systems in modern security applications has driven the need for automated surveillance through computer vision. The primary aim is to reduce human intervention while enhancing early threat detection and real-time security assessments. Although advanced surveillance technologies have facilitated monitoring, constant human oversight remains challenging. This has prompted a quest for models capable of identifying unlawful activities with minimal human involvement. Real-time weapon detection, despite advancements in deep learning algorithms and dedicated CCTV cameras, remains a formidable challenge, especially with varying angles and potential obstructions. Existing detection systems are often expensive and require specialized tools, necessitating a cost-effective and reliable alternative that minimizes false positives. This research focuses on creating a secure environment by utilizing real-time resources and deep-learning algorithms for identifying dangerous weapons. Without a predefined dataset for real-time detection, the researchers compiled one from diverse sources, including camera shots, internet images, movie data, YouTube CCTV recordings, and Roboflow Computer Vision Datasets. The proposed weapon detection system employs a hybrid model of Detectron2 and YOLOv7, emphasizing precision and recall in object detection, particularly in challenging conditions like low-light environments. This research contributes to developing an effective, reliable real-time weapon detection system tailored for diverse scenarios.

Keywords—Weapons, Detection System, YOLOv7, Detectron2, Real-Time, Object Detection, Gun Detection, Real-Time Object Detection

I. INTRODUCTION

Global crime rates have increased as a result of the growing use of pistols in gruesome crimes. For a country to grow, the rule of law must be preserved. The prevalence of gun-related criminality is a major concern in many parts of the world [1], the majority of them countries where owning a gun is legal. Even false and untrue news can do harm once it spreads quickly worldwide through the media, particularly social media. If a person is in a situation with a weapon, they may become irrational and act violently, because people may be brainwashed.

Installing surveillance cameras that can automatically identify firearms and trigger an alarm to notify the operators or security personnel is a solution to the aforementioned dilemma [2, 3]. However, there has not been much research on algorithms for detecting weapons in real time using surveillance cameras, and related studies frequently consider hidden weapon detection, typically utilizing X-ray or millimeter-wave pictures with conventional machine learning approaches [4]. Convolutional neural networks (CNNs), in particular, have produced ground-breaking results in object identification and categorization during the past few years [5], and have so far produced the best results in common image processing problems, including grouping, detection, and localization. A CNN automatically learns features from the data rather than relying on manually chosen ones.

The goal of this research is to make object detection models more accurate using well-labelled, cutting-edge datasets; to enhance the current weapon detection system using real-time detection with a combination of video and photo detection; and to run a firearm detection analysis on real-time video, as existing studies typically perform evaluations on fictitious datasets [6].

II. LITERATURE REVIEW

The modern world is very concerned with security and safety. A nation's ability to attract foreign investment and tourism depends on how safe and secure its environment is. The most recent numbers show increased civilian gun ownership [7]: there are 71.1 million gun owners in India, 49.7 million in China, 43.9 million in Pakistan, and so on [8], which indicates that risks from firearms are growing internationally. Real-time object recognition and categorization became a challenge as a result of significant advancements in the field of CCTV, processing technology, and deep learning models [9]. There has only been a small amount of research in this area, and most of it has been focused on detecting concealed weapons.

Concealed weapon detection (CWD) was initially derived from imaging technologies such as millimeter waves and was used for baggage checking and various airport security purposes before being employed for weapon detection. For finding concealed weapons at airports and other secure areas of the body,

979-8-3503-7086-7/23/$31.00 ©2023 IEEE

Authorized licensed use limited to: VIT University. Downloaded on December 01,2024 at 06:28:42 UTC from IEEE Xplore. Restrictions apply.

Sheen et al. introduced the CWD approach, utilizing three-dimensional millimeter (mm) wave imaging technology [10]. Z. Xue et al. proposed an alternative CWD method based on a multi-stage decomposition method that fuses color visual images with infrared (IR) pictures [11]. Meanwhile, R. Blum et al. presented a CWD methodology that combines visual and IR or millimeter-wave images; this approach incorporates a multiple-resolution mosaic capability, emphasizing the concealed weapon in the target image [12]. These diverse methods highlight the evolving landscape of concealed weapon detection, exploring various imaging technologies and fusion techniques for enhanced accuracy and efficiency.

E. M. Upadhyay proposed an image-fusion-based CWD method. When the scene's picture was present above and beneath exposed areas, they employed IR image and visual fusion to discover hidden weaponry [13]. They used a homomorphic filter at various exposure levels, which they then applied to visible and IR images. The current methods achieve high precision by combining different extractors and detectors, either by using simple methods such as boundary detection, pattern matching, and simple intensity descriptors, or by using trickier methods such as cascade classifiers with boosting [14]. Rohith Vajhala published a technique for handgun detection in CCTV systems. For classification, they combined backpropagation artificial neural networks with HOG as a feature extractor [15]. The detection was carried out under various conditions, first with a weapon alone and then with HOG and background-subtraction techniques for people in front of the target object, with a claimed accuracy of 83%.

III. DATASET CONSTRUCTION AND PRE-PROCESSING

A. Weapon Dataset Classes

• Reason for choosing the Pistol Class

We chose the short handheld weapons in the pistol class based on our research and analysis after studying several CCTV films of robberies and shooting incidents. We came to the conclusion that revolvers or pistols were utilized in virtually all of those incidents. Fig. 1 displays a few real-time samples taken from the pistol class dataset.

The dataset for this class consists of image samples of the following weapons:

• Pistol
• Revolver
• Short handheld firearms

Figure 1 Dataset Samples of Pistol Class, including Pistol and other short-handled weapons

B. Real-Time Detection Dataset

Binary classification for a real-world scenario is the focus of this study; therefore, two classes were created, with the pistol class including photographs of pistols and revolvers, while additional classes exist to reduce confusion while training the model and to decrease the chances of false positives.

C. Data Pre-Processing

The effectiveness of a Machine Learning (ML) model for a given task is influenced by a variety of factors, and the representation and quality of the data are crucial from the start. Finding a good representation during the training stage is more difficult if there is a lot of redundant, irrelevant, or noisy data. Data preparation and filtering stages significantly slow down processing time for ML problems. Data cleansing, standardization, processing, extraction, and feature selection are all part of the pre-processing process. The obtained dataset underwent pre-processing to create the final training dataset.

Pre-processing is extremely important for better training of a model. Making the dataset images the same size or resolution is the first step, which matters for improved model training. The next step is to apply mean normalization. Drawing bounding boxes on these photos, also known as annotation, localization, or labelling, is the next stage. Each image in the data has a labelled bounding box; the labelled object's width, height, and x and y coordinates are recorded in XML, CSV, or text format. The four primary phases in data preparation are as follows:

• Image Scaling
• Data Augmentation
• Image Labeling
• Image Filtering using OpenCV (RGB to Grayscale, Rotation and Perspective)

IV. METHODOLOGY

A. Object Labelling

This dataset served as the starting point for this project, as it is a hybrid of manually scouted class images and a pre-labelled dataset from Roboflow. There were 2693 total photos in this collection: 2080 were classified as pistols from the pre-labelled Roboflow dataset, and 613 were manually classified as pistol/short firearms using the offline labelling tool Labelme.

Those 613 images were labelled using the dynamic polygonal labelling tool in Labelme, which intricately separates the subject from the rest of the background, making it better for training and testing in later phases.

Figure 2 Polygonal Labelling method in Labelme
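For detector training, polygonal labels such as these are commonly reduced to bounding boxes. The sketch below illustrates that conversion under stated assumptions: the normalized YOLO-style output format and the sample polygon coordinates are illustrative, not taken from the paper.

```python
def polygon_to_yolo_bbox(points, img_w, img_h):
    """Reduce a Labelme-style polygon [(x, y), ...] to a normalized
    (x_center, y_center, width, height) box with values in [0, 1]."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    return (
        (x_min + x_max) / 2 / img_w,  # x_center
        (y_min + y_max) / 2 / img_h,  # y_center
        (x_max - x_min) / img_w,      # width
        (y_max - y_min) / img_h,      # height
    )

# A hypothetical polygon traced around a pistol in a 640x480 frame.
box = polygon_to_yolo_bbox([(100, 200), (220, 190), (230, 260), (110, 270)], 640, 480)
```

The tight axis-aligned box of the polygon is taken and normalized by the image dimensions, matching the width/height/x/y records in text format mentioned above.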


B. Object Recognition

Object recognition is a technique for identifying the actual class or category that an image belongs to by maximizing the likelihood of that particular class. This is carried out quickly using CNNs, which are frequently used as a backbone in cutting-edge classification and detection algorithms.

As shown in Fig. 3, the classification of images and the localization of objects fall under recognition, and combined classification and localization are used to detect objects. A quick summary of object categorization, localization, and detection follows.

Figure 3 Object Classification and Localization

Image Classification: The feature maps are obtained by applying a kernel/filter to the entire picture in the classification model. The model then predicts the label based on the likelihood of the extracted features.

Object Localization: By providing the height and width that go along with the item's coordinates, this technique produces the precise location of an object within an image.

Object Detection: The characteristics of the aforementioned algorithms are used in this work. The detection technique provides the class name and the enclosing box's x and y coordinates along with its width and height.

Figure 4 Object Detection using YOLOv7

Figure 5 Instance Segmentation using YOLOv7

To produce the box at our chosen threshold, non-max suppression is utilised, as shown in Fig. 4 and 5. The following attributes can be seen in the output:

• Bounding Box
• Probability

Object detection tends to be a very CPU- and GPU-heavy task; hence, in the past, object detection was highly limited due to a lack of data and poor computing power. As time went on, however, computing power rose, and the field transitioned from CPUs to GPUs. Originally intended for gaming and enhancing the graphics quality of computers, GPUs are now widely employed for deep learning. Competitions such as ImageNet began, comprising around 1000 classes.

C. Classification and Detection Approach

The following classifiers and object detectors are used in this research work:

• Detectron2
• YOLOv5
• YOLOv7
• Faster RCNN-Inception ResNetV2

Detectron2: One of the most potent deep learning toolboxes for image identification is Detectron2. Instance segmentation, person keypoint detection, panoptic segmentation, object detection, and other tasks may be easily switched between because of its versatile architecture [16], [17]. Popular datasets, including COCO, Cityscapes, LVIS, and PascalVOC, are supported natively, in addition to various Faster/Mask R-CNN backbone combinations (ResNet + FPN, C4, Dilated-C). Additionally, it offers baselines with pre-trained weights that are ready for use. The architecture of Detectron2 is shown in Figure 6.

Figure 6 Architecture of Detectron2

The architecture of this network consists of three blocks, namely:

• Backbone Network: extracts feature maps at different scales from the input picture.
• Region Proposal Network: extracts object regions from the multi-scale features.
• Box Head: warps and crops the feature maps into a number of fixed-size features using proposal boxes, in order to acquire precise box positions and classification results.

YOLOv7: The newest member of the YOLO (You Only Look Once) family of models is the v7 version. YOLO models are single-stage object detectors. In a YOLO model, image frames are processed by a backbone [18], [19]. These features are merged and blended in the neck, where they

are subsequently transmitted to the network's head. The locations and types of items around which bounding boxes should be created are predicted by YOLO, and YOLO conducts post-processing via non-maximum suppression
(NMS) to arrive at its final prediction [20].
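Greedy NMS of this kind can be sketched as follows. This is a minimal NumPy illustration of the general technique, not the authors' implementation, and the IoU threshold of 0.5 is an assumed value.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list[int]:
    """Greedy non-maximum suppression over (x1, y1, x2, y2) boxes; returns kept indices."""
    order = scores.argsort()[::-1]          # highest-confidence detections first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of box i with all remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]     # drop boxes overlapping the kept one
    return keep

# Two overlapping hypothetical 'pistol' detections plus one distant box.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
```

The highest-scoring box is kept, any remaining box overlapping it beyond the IoU threshold is discarded, and the process repeats until no candidates remain.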
The industry has seen an increase in the number of YOLO models. As developers learn about YOLO and machine learning, they can quickly catch up thanks to the compact architecture, while practitioners can power their applications with minimal hardware thanks to the real-time inference performance. The architecture of YOLOv7 is shown in Figure 7. The YOLOv7 developers sought to advance object identification by creating a network architecture that outperformed competitors in predicting bounding boxes at comparable inference speeds; to achieve this, they made critical adjustments to the YOLO network and training procedures.

Figure 7 Architecture of YOLOv7

In essence, YOLOv7 represents a significant stride in enhancing the accuracy of bounding box predictions, underscoring the importance of refined network architecture and meticulous training procedures.

D. Training Mechanism

The overall approach taken in training and optimization is shown in Fig. 9. It starts with issue definition, locating the necessary dataset, and applying pre-processing techniques; training and assessment on the dataset are the next steps. Depending on the accuracy of the evaluation, we keep the resulting weights as a classifier; however, if the accuracy is inadequate, the backpropagation procedure and gradient descent technique are applied again.

Figure 9 Training Flow Diagram

Figure 8 Evaluation of YOLOv7 with its peer networks

E. Confusion Object Inclusion (using YOLOv7)

We have designed the problem to decrease the frequency of false positives and negatives. The weapon class covers all handheld weapons, revolvers, and other firearms, which helps train the model to enhance accuracy in low light and at unfavourable angles and provides a trustworthy real-time solution. The model aims to reduce confusion with items such as mobile phones, metal detectors, selfie sticks, and purses, which are sometimes mistaken for pistols.

V. DATA AND RESULTS

We have identified firearms in real-time streams that were of poor quality, dark, and low in frames per second. Most previous work focused on recognizing high-quality photos and videos; since those models were developed using good-quality datasets, real-time recognition of low-resolution objects is not achievable with them. Following model training and testing on the datasets listed in Table 1, the outcomes are examined.

The outcomes for the various approaches are assessed as stated in the methodology section. Because pistols and revolvers were utilized in 97% of the robbery incidents, our key issue statement is real-time detection. Consequently, various outcomes for the YOLOv7 technique are assessed here.

A. Dataset Experimentation Results

For the highest-performing model, mean average precision (mAP), along with the traditional metrics of F1-score and

frames per second, were used to compare the performance of the various models. These terms are derived using equations (1), (2), and (3) below, where TP, FP, and FN denote true positives, false positives, and false negatives; the F1 score is the harmonic mean of precision and recall.

Precision = TP / (TP + FP)  (1)

Recall = TP / (TP + FN)  (2)

F1-score = 2 × (Precision × Recall) / (Precision + Recall)  (3)

The model of our method that performs best overall is YOLOv7. Fig. 10 displays the YOLOv7 performance graph for loss and mean average precision (mAP) on a validation dataset.
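Equations (1)-(3) can be computed directly from raw detection counts; the TP/FP/FN values below are hypothetical, chosen only to illustrate the arithmetic.

```python
# Hypothetical validation counts: true positives, false positives, false negatives.
tp, fp, fn = 90, 8, 12

precision = tp / (tp + fp)                          # Eq. (1)
recall = tp / (tp + fn)                             # Eq. (2)
f1 = 2 * precision * recall / (precision + recall)  # Eq. (3), harmonic mean
```

mAP is then obtained by averaging the per-class average precision values, as described in the text.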
Figure 10 Precision & Recall Metrics

We can observe how smoothly the model loss curve converges to the optimal level, producing a very strong loss score of 0.84 and a mAP of 91.73%. The average precision values for the relevant class are averaged to provide the mAP.

Figure 11 F1 Score & Confidence Metrics

Figure 12 Confusion Matrix for trained dataset

B. Detection Results - Pistol Class in Images

In this section, we present the output images showing detection of the pistol. Figure 13 shows pistol detection in a blurred image, and Figure 14 shows pistol detection in varied lighting conditions.

Figure 13 Pistol image detection in blurred image

Figure 14 Pistol image detection in varied lighting conditions

C. Detection Results - Pistol Class in Video

In this section, we present the output results of pistol detection from video frames. Figure 15 shows pistol detection in a pre-fed video, and Figure 16 shows pistol detection in CCTV footage.

Figure 15 Pistol detection in pre-fed video

Figure 16 Pistol detection in CCTV footage

D. Detection Results - Pistol Class in Real-Time

In this section, we present the output results of pistol detection in real time. We captured two outputs from two different directions, as shown in Figure 17 (Pistol detection in Real-Time, Right Angle) and Figure 18 (Pistol detection in Real-Time, Left Angle). We may infer from earlier trials that


the idea made a difference in the modern weapon detection systems.

Figure 17 Pistol detection in Real-Time (Right Angle)

Figure 18 Pistol detection in Real-Time (Left Angle)

E. Discussion

We may infer from the earlier trials that the idea made a difference in modern weapon detection systems.

Fig. 10, 11, and 12 show the various results and metrics of experimentation on the dataset, namely precision, recall, and F1-score for evaluation.

Figures 13, 14, 15, and 16 show the detection of various firearms in pre-fed footage in low light and at unfavorable camera angles.

For the real-time situation, the object detection model YOLOv7 performed admirably in terms of speed and detection precision, as Fig. 17 and 18 show.

Findings indicate that the optimum approach is to train initially on synthetic pictures and then on actual photos for fine-tuning.

VI. CONCLUSION & FUTURE WORK

This study proposes an improved real-time automatic weapon detection system for monitoring and command applications, addressing challenges related to distance-dependent accuracy. The research aims to enhance security, promoting economic benefits by attracting security-conscious investors and visitors. Object detection algorithms utilizing a Region of Interest (ROI) outperformed those without, with the YOLOv7 model, trained on a new database, demonstrating exceptional results. It achieved a mean average precision (mAP) of 87.3%, an F1-score of 91%, and a confidence score of nearly 98%, surpassing previous real-time studies.

The researchers prioritized real-time weapon detection with minimized false positives and negatives, utilizing a new training database and the latest deep learning model. Future work focuses on further reducing false positives and negatives, possibly expanding to more classes. The study suggests integrating object identifiers with movement and 3-D position approximation for enhanced recall and accuracy. Recommendations include limiting the identification of common items and triggering alarms only for successive frames with identified weapons, contributing to ongoing efforts for a more accurate and reliable real-time weapon detection system.

REFERENCES

[1] Bhatti, M.T., Khan, M.G., Aslam, M., Fiaz, M.J.: Weapon Detection in Real-Time CCTV Videos Using Deep Learning. IEEE Access 9, 34366–34382 (2021)
[2] Olmos, R., Tabik, S., Herrera, F.: Automatic handgun detection alarm in videos using deep learning. Neurocomputing 275, 66–72 (2018)
[3] Xiao, Z., Lu, X., Yan, J., Wu, L., Ren, L.: Automatic detection of concealed pistols using passive millimeter wave imaging. Proc. IEEE Int. Conf. Imag. Syst. Techn. (IST), pp. 1–4 (2015)
[4] González, J.L.S., Zaccaro, C., Álvarez García, J.A., Morillo, L.M.S., Caparrini, F.S.: Real-time gun detection in CCTV: An open problem. Neural Netw. 132, 297–308 (2020)
[5] de Azevedo Kanehisa, R.F., de Almeida Neto, A.: Firearm Detection using Convolutional Neural Networks. In: ICAART (2), pp. 707–714 (2019)
[6] Yadav, P., Gupta, N., Sharma, P.K.: A comprehensive study towards high-level approaches for weapon detection using classical machine learning and deep learning methods. Expert Systems with Applications, p. 118698 (2022)
[7] Karp, A.: Estimating global civilian-held firearms numbers (2018)
[8] Reid, A.J.: The gun problem (2022)
[9] Darker, I.T., Kuo, P., Yang, M.Y., Blechko, A., Grecos, C., Makris, D.: Automation of the CCTV-mediated detection of individuals illegally carrying firearms: Combining psychological and technological approaches. Proc. SPIE 7341 (2009)
[10] Sheen, D.M., McMakin, D.L., Hall, T.E.: Three-dimensional millimeter-wave imaging for concealed weapon detection. IEEE Trans. Microw. Theory Techn. 49(9), 1581–1592 (2001)
[11] Xue, Z., Blum, R.S., Li, Y.: Fusion of visual and IR images for concealed weapon detection. In: Proceedings of the Fifth International Conference on Information Fusion (FUSION 2002), vol. 2, pp. 1198–1205 (2002)
[12] Blum, R., Xue, Z., Liu, Z., Forsyth, D.S.: Multisensor concealed weapon detection by using a multiresolution mosaic approach. In: IEEE 60th Vehicular Technology Conference (VTC2004-Fall), vol. 7, pp. 4597–4601 (2004)
[13] Upadhyay, E.M., Rana, N.K.: Exposure fusion for concealed weapon detection. Proc. 2nd Int. Conf. Devices Circuits Syst. (ICDCS), pp. 1–6 (2014)
[14] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998)
[15] Vajhala, R., Maddineni, R., Yeruva, P.R.: Weapon detection in surveillance camera images (2016)
[16] Pradhan, A., Niaz, S.Y., Pradhan, M.P., Pradhan, R.: Detection and recognition of texts features from a topographic map using deep learning. Suranaree Journal of Science & Technology 29(5) (2022)
[17] Hung, C.P., Choi, J., Gutstein, S.M., Jaswa, M.S., Rexwinkle, J.T.: Soldier-led adaptation of autonomous agents (SLA3). In: Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications III, vol. 11746, pp. 743–754 (2021)
[18] Doan, T.S., Nguyen, T.K.T., Vo, T.A.: Weapon Detection with YOLO Model Version 5, 7, 8 (2023)
[19] Kumar, S., Kumar, C.: Deep Learning based Target detection and Recognition using YOLO V5 algorithms from UAVs surveillance feeds. In: 2023 International Conference for Advancement in Technology (ICONAT), pp. 1–5 (2023)
[20] Li, P., Che, C.: SeMo-YOLO: a multiscale object detection network in satellite remote sensing images. In: 2021 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2021)

