Journal of Image Processing and Artificial Intelligence
e-ISSN: 2581-3803
Volume-9, Issue-3 (September-December, 2023)
www.matjournals.com
https://doi.org/10.46610/JOIPAI.2023.v09i03.002

Machine Learning Software for the Detection of Violence from CCTV Live Footage

Adupa Nithin Sai¹, Kowdodi Siva Prasad²*

¹ Undergraduate Student, Department of Computer Science and Engineering (AIML), Hyderabad Institute of Technology and Management, Hyderabad, Telangana, India
² Professor, Department of Mechanical Engineering, Hyderabad Institute of Technology and Management, Hyderabad, Telangana, India
* Corresponding Author: [email protected]

Received Date: September 20, 2023 Published Date: September 30, 2023

ABSTRACT

Automated CCTV analytics have become a potent resource for boosting public safety. This paper introduces a software program with sophisticated video analytics capabilities that can instantly identify incidents and notify concerned agencies such as the police and medical staff. The software processes video in three different ways: live streaming from external links, video uploaded from a computer, and real-time video from security cameras. Using past data for analysis, it scans CCTV feeds from public spaces for instances of violent crime, theft, burglary, and other crimes. The software's key features are real-time CCTV stream analysis, instantaneous alarm and report generation, and database updates accessible to the nearest police station for prompt response. By enabling proactive incident detection and real-time response, the software has the potential to significantly improve public safety measures, law enforcement operations, and overall security.

Keywords- Automated, Past data, Public safety, Real-time CCTV stream analysis, Real-time video, Report generating, Security cameras, Video analytics, Video uploaded

INTRODUCTION

Modern society places a high importance on public safety, and the application of security measures has been revolutionized by the employment of cutting-edge technologies. A software program that uses automated CCTV analytics to improve public safety is one innovative option. The software program can analyse CCTV feeds in real time using cutting-edge video analytics to detect crimes such as robbery, street crime, violence, and more [1]. Its capacity to function in three different video processing modes (uploading footage, live streaming through third-party links, and analysing real-time video from CCTV cameras) sets it apart from other similar products. The software program relies on historical data for analysis, which it continually updates and refines, enabling it to detect problems more accurately. Key aspects of the software program include real-time analysis of live CCTV feeds, the creation of warnings and reports, and the upkeep of a sizable database for prompt law enforcement response. This makes it possible to respond quickly and coordinate with pertinent agencies, such as the police and medical staff, ensuring prompt intervention in urgent situations.

The software program harnesses AI and ML techniques to instantly analyse huge
volumes of CCTV video data to spot patterns, abnormalities, and possible problems using complex algorithms and neural networks. The software program continuously learns from historical data, increasing its precision and efficacy in recognizing problems and responding to shifting settings and scenarios [2]. This AI- and ML-powered analysis enables the software program to recognize issues such as abandoned objects, suspicious activities, overcrowding, and odd behaviour, and provides vital insights for law enforcement agencies to take prompt and informed action. Its capabilities also draw on big data analytics: by analysing massive amounts of video data from several CCTV cameras, it can detect trends, patterns, and correlations that may not be obvious to human operators.

The software program represents a ground-breaking approach to enhancing public safety through automated CCTV analytics, fusing advanced technologies such as AI, ML, computer vision, and big data analytics.

LITERATURE REVIEW

• Video-based Violence Detection: To identify violent events, video-based violence detection systems use computer vision techniques to analyze the visual aspects of video frames, such as motion, colour, texture, and shape. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are two deep learning-based techniques that have been proposed for video-based violence detection. One recent work presented a two-stage technique for violence detection in which a CNN first extracts spatiotemporal features from video frames, which are subsequently input into an RNN for classification. On a benchmark dataset, the system attained a high accuracy rate of 95%, confirming the usefulness of the proposed approach [3].
• Non-Violence Detection: Non-violence detection is equally vital in applications such as crowd management and surveillance, where distinguishing between normal and deviant behaviour is required. Several studies have proposed deep learning-based non-violence detection systems. In one recent study, a CNN is used to extract spatiotemporal features from video frames, which are then fed into a support vector machine (SVM) for classification. On a benchmark dataset, the system attained an accuracy rate of 92%, demonstrating the usefulness of the suggested approach [4].
• Frame-Based Violence Detection: Frame-based violence detection systems identify violent events by analyzing individual frames in a video clip. One recent study offered a frame-based strategy that extracts spatiotemporal features from video frames using the Inflated 3D Convolutional Neural Network (I3D CNN). The extracted features are subsequently classified using a Long Short-Term Memory (LSTM) network. On a benchmark dataset, the suggested system attained an accuracy rate of 94.6%, exceeding existing state-of-the-art algorithms [5].
• Deep Learning and Transfer Learning for Violence Detection: Another recent work presented a violence detection system that uses a CNN-based architecture with transfer learning and achieves an accuracy rate of 95.26% [6].
• Multi-Modal Deep Learning for Violence Detection: A recent study proposes a multi-modal deep learning strategy for violence detection that incorporates visual and audio information from video material. The study extracted spatiotemporal and auditory features using a combination of CNN and LSTM networks and reported an accuracy rate of 92.3%. The suggested approach outperformed the use of solely visual or solely audio features, indicating the efficacy of multi-modal deep learning for violence detection [7].
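The accuracy rates quoted in the studies above are the fraction of clips that a classifier labels correctly. As a minimal sketch of how such a figure is computed, the Python snippet below derives accuracy and recall from confusion-matrix counts. The function names and the counts are hypothetical and chosen only for illustration; they are not taken from any of the cited studies.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all clips that were classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    return (tp + tn) / total


def recall(tp, fn):
    """Fraction of truly violent clips that the classifier flagged."""
    if tp + fn == 0:
        raise ValueError("at least one positive sample is required")
    return tp / (tp + fn)


# Hypothetical test set of 1,000 clips, 500 of them violent.
counts = {"tp": 460, "tn": 490, "fp": 10, "fn": 40}

acc = accuracy(**counts)                  # (460 + 490) / 1000
rec = recall(counts["tp"], counts["fn"])  # 460 / 500

if __name__ == "__main__":
    print(f"accuracy = {acc:.3f}, recall = {rec:.3f}")
```

Recall is reported next to accuracy here because, on the same hypothetical data, a classifier can reach 95% accuracy while still missing 8% of the violent clips, which may matter more than the headline figure in a surveillance setting.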


METHODOLOGY

Figure 1: Working methodology.

CHRONOLOGICAL SEQUENCE

The methodology's sequential processes are detailed in Fig. 1, as is the information flow between the steps of each activity during the analysis of CCTV video footage such as that shown in Fig. 2.

• Data Collection: The initial phase in the process is to collect CCTV video feeds from public locations, including live streaming from external links, video supplied from a computer, and real-time footage from security cameras. The information gathered should encompass a wide range of scenarios, such as diverse places, periods, and types of crime, including violent crime, theft, burglary, and other crimes.
• Data Preprocessing: To ensure the quality and usability of the acquired data for analysis, it must be pre-processed. This could include tasks such as data cleaning, normalization, and integration. To minimize bias in the analysis, any irrelevant or redundant data should be deleted, and missing data should be handled carefully.
• Feature Extraction: This process extracts useful features or characteristics from CCTV video streams to be used as inputs for the video analytics routine. Object detection, motion detection, and facial recognition techniques may be used to identify items, activities, and individuals in video streams.
• Threshold Layered Approach: In addition to AI and ML algorithms, the software program may include a threshold-layered approach to improve the identification of patterns, abnormalities, and potential problems in real-time video inputs. The approach sets specified thresholds or limits for specific metrics or features collected from video inputs. These thresholds function as filters or layers, allowing or blocking specific inputs based on their levels. Inputs that exceed or fall below these limits are marked as potentially problematic or abnormal.
• AI and ML Analysis: The extracted features are then fed into the AI and ML algorithms included within the software program. These algorithms use complex models and neural networks to spot patterns, abnormalities, and potential problems in real-time video inputs. They continually learn from prior data to enhance their accuracy and efficacy in detecting problems.
• Alarm and Report Generation: When the software program detects a potential
concern, such as a violent crime or suspicious activity, it generates alerts and reports to contact relevant agencies, such as law enforcement and medical personnel. To enable rapid response, alarms and reports should be created in real time and provide complete information about the detected issue, including the location, time, and kind of crime.
• Database Maintenance: The software program should keep a thorough database to save and update the past data used for analysis. To improve the quality and effectiveness of the analysis, the database should be periodically updated with fresh data. To ensure data security and privacy, proper data management strategies such as data encryption and access control should be applied.
• Evaluation and Improvement: The software program's performance should be examined regularly to determine its effectiveness in improving public safety. This could include comparing detected incidents with actual crime records and monitoring the reaction time of the relevant agencies. Based on the evaluation results, adjustments should be made to improve its capabilities.
• Integration with Existing Law Enforcement Operations: The software program should be integrated with existing law enforcement operations to allow for a quick and coordinated reaction to suspected incidents. This may entail establishing communication lines with police and medical personnel for instant notification, as well as offering real-time access to the software program's alerts and reports to allow for quick decision-making.

RESULTS AND DISCUSSION

A CNN architecture for violence detection that uses mixed (inception) blocks and down-sampling layers results in 7 different layers to build a working model. Details are given below.

• Determine the Right Input Shape: Depending on the modality of the violence detection task, define the shape and size of the input data, which could be pictures, audio frames, or video frames. If video frames are used as input, for example, the shape could be (num_frames, height, width, channels), where num_frames is the number of frames in a video, height and width are the spatial dimensions of each frame, and channels is the number of colour channels (e.g., 3 for RGB images).
• Down-Sampling via Convolution: Convolutional layers with appropriate filters, activation functions, and pooling operations are used to down-sample the input data spatially and sometimes temporally, depending on the modality. This can help capture local information and reduce the complexity of the supplied data.
• Mixed (Inception) Blocks: Inception blocks are a multi-branch architecture that concatenates the outputs of several convolutional operations with differing filter sizes. These mixed blocks can help capture features at multiple scales and improve the network's representation capability.
• Down-Sampling (spatial and temporal): Continue using down-sampling layers with larger strides, such as pooling or convolutional layers, to reduce the spatial and temporal dimensions of the data.
• Classifying: Finally, flatten the data, add fully connected layers, and classify using softmax or another appropriate activation function. The number of units in the output layer should equal the number of classes in the violence detection task.
• Create the Model: Using a deep learning framework such as TensorFlow or PyTorch, build the CNN architecture by stacking the layers described above. Define the proper hyperparameters for training the model, such as learning rate, batch size, and loss function.
• Load Weights: If pre-trained weights are available, they can be loaded into the model to initialize the weights of the convolutional layers. This can help the model perform better, especially when training data is scarce.
• Violence Detection Model: Once the CNN architecture has been developed, trained, and loaded with appropriate weights, it can be used as a violence detection model that takes new data as input and predicts whether or not it contains violence [2].
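To make the input-shape and down-sampling steps above concrete, the following sketch traces how a (num_frames, height, width) clip shrinks as it passes through a few 3D convolution and pooling layers. It uses the standard output-size formula, floor((size + 2·padding − kernel) / stride) + 1, applied per dimension. The layer list, kernel sizes, strides, and the 64 × 224 × 224 input are assumptions chosen for illustration and are not the configuration reported in the paper; the channel dimension is omitted because it is set by the number of filters rather than by this formula.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of one dimension after a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def downsample(shape, kernel, stride, padding):
    """Apply one 3D layer to a (num_frames, height, width) shape."""
    return tuple(
        conv_out(s, k, st, p)
        for s, k, st, p in zip(shape, kernel, stride, padding)
    )


# Hypothetical layer stack: (name, kernel, stride, padding), each tuple
# ordered as (time, height, width). Illustrative values only.
LAYERS = [
    ("conv 7x7x7", (7, 7, 7), (2, 2, 2), (3, 3, 3)),
    ("spatial pool 1x3x3", (1, 3, 3), (1, 2, 2), (0, 1, 1)),
    ("pool 3x3x3", (3, 3, 3), (2, 2, 2), (1, 1, 1)),
]


def trace_shapes(input_shape, layers=LAYERS):
    """Return the (frames, height, width) shape before and after each layer."""
    shapes = [input_shape]
    for _name, kernel, stride, padding in layers:
        shapes.append(downsample(shapes[-1], kernel, stride, padding))
    return shapes


if __name__ == "__main__":
    # Assumed input clip: 64 frames of 224x224 pixels.
    for shape in trace_shapes((64, 224, 224)):
        print(shape)
```

Tracing shapes this way before building the network in a framework is a cheap check that the flattened feature size feeding the fully connected layers is what the classifier expects.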


Figure 2: CCTV footage for analysis.
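The final classification step, combined with the threshold-layered idea from the methodology, might look like the sketch below. It converts the two raw output scores of a model into probabilities with softmax and then passes the violence probability through two layered thresholds. The class order, the threshold values (0.5 and 0.8), and the action names are hypothetical; the paper does not state the values it uses.

```python
import math

# Assumed class order for the model's two output units.
CLASSES = ("non-violence", "violence")


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, review_threshold=0.5, alert_threshold=0.8):
    """Return (violence probability, action) for one clip.

    Layered thresholds: below review_threshold the clip is ignored,
    between the two thresholds it is queued for human review, and at or
    above alert_threshold an alert is raised.
    """
    p_violence = softmax(logits)[CLASSES.index("violence")]
    if p_violence >= alert_threshold:
        action = "alert"
    elif p_violence >= review_threshold:
        action = "review"
    else:
        action = "ignore"
    return p_violence, action


if __name__ == "__main__":
    # Made-up logits standing in for model outputs on three clips.
    for logits in ([0.0, 3.0], [0.0, 0.5], [2.0, 0.0]):
        p, action = classify(logits)
        print(f"p(violence) = {p:.3f} -> {action}")
```

Using two thresholds instead of one is a design choice that trades a small review workload for fewer false alarms sent to responders; the right values would have to be tuned on labelled footage.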

PARAMETERS USED

• Frames Per Second (FPS): The rate at which video frames are collected, which determines the video's temporal resolution. Higher frame rates can record more detailed motion data, but they may also raise computational needs.
• Audio Characteristics: Pitch, intensity, and spectral features taken from the video's soundtrack can provide additional information for violence identification. Aggressive or violent behaviour frequently produces distinct sounds, such as yells, shouts, or weapon hits, which may be symptomatic of violence.
• Image Pre-Processing: Enhance the relevant features of the acquired images by applying image scaling, normalization, filtering, or denoising procedures. This can help increase the accuracy and reliability of metal object detection.
• Thresholding: Use thresholding techniques to identify regions in the extracted features that may contain blood. Depending on the characteristics of the blood in the video frames, this can be accomplished using approaches such as pixel intensity thresholding or motion-based thresholding.
• Object Detection: Object detection in OpenCV (cv2) is based on visual characteristics and supports real-time, accurate detection of objects, including metal objects.
• Human Movement Detection: Detecting the number of humans present, their movements, and the distance between them helps detect violence more accurately.

TECHNOLOGIES USED

• Backend Technologies: Python, Neural Networks, and OpenCV
• Frontend Development: HTML, CSS, JavaScript
• Technologies for the Framework
FastAPI: FastAPI is a modern web framework for creating Python APIs. It is well known for its speed and simplicity, making it a popular choice for creating backend APIs. FastAPI can be used to expose violence and non-violence detection models as APIs, allowing front-end applications or other services to access them.
Bootstrap: Bootstrap is a popular front-end framework that offers pre-designed UI components and styles for creating responsive web applications. It can be used to build user-friendly interfaces for presenting the results of violence and non-violence detection, such as displaying video streams, visualizing identified objects or actions, and allowing for user interactivity.
• Algorithm Used to Detect Human Motion

Working of Algorithm: I3D (Inflated 3D ConvNet) is a well-known deep learning method for video-based action recognition, which includes human motion detection. It is a 3D extension of 2D Convolutional Neural Networks (CNNs), capturing both spatial and temporal characteristics from video input.
Preparing the Video Data: The first step in using the I3D algorithm is to prepare the video data. Videos are typically encoded as frame sequences, with each frame containing a picture. The frames are scaled to a constant resolution before being transformed into an RGB (Red, Green, and Blue) image sequence.
Feature Extraction: The I3D algorithm extracts features from video data using 3D convolutions. 3D convolutions work in the same way as classic 2D convolutions in image processing, except that they operate on 3D volumes rather than 2D images. To capture spatial characteristics from different scales of motion patterns, the I3D model employs a sequence of 3D convolutional layers with varying kernel sizes [1].
Temporal Modeling: The I3D model incorporates temporal features in addition to spatial features by modeling motion patterns across frames. This is accomplished by employing 3D convolutional layers with a larger temporal kernel size, allowing the model to capture the dynamics of motion over time. This aids in the detection of human motion patterns such as walking, running, and punching, which often span numerous frames in a video. The inflation strategy, which allows the model to use pre-trained 2D CNNs, is a significant innovation of the I3D algorithm. By duplicating the learned weights along the temporal dimension, the 2D CNNs are "inflated" to 3D, allowing the model to transfer knowledge learnt from large-scale image datasets to video-based action recognition tasks.
Classification: After extracting spatial and temporal information, the I3D model performs classification using fully connected layers. The features are flattened and passed to fully connected layers, whose outputs go through a softmax activation function to generate probabilities for the various action classes. The identified human motion is the action class with the highest probability.
Training and Optimization: The I3D model is trained using labelled video datasets, with each video tagged with an action label. To minimize prediction error, the model's weights are optimized during training using techniques such as gradient descent. To attain optimal performance, the model's hyperparameters, such as learning rate, batch size, and regularization, are tuned [1].

CONCLUSION

Building on the technologies described above, and on properties such as hardware independence and browser independence, it is possible to produce software solutions capable of effectively reducing violence in modern society. We can expand accessibility and reach a larger audience by designing software that can run on a variety of hardware devices and web browsers without limitations. This allows for the deployment of violence prevention programs, instructional resources, and intervention tools across several platforms, allowing them to reach more individuals independent of their device or browser preferences. Through early detection, intervention, and preventive initiatives, such software can empower individuals, communities, and organizations to proactively address and lessen violence. However, while developing and deploying such software, it is critical to examine ethical and privacy issues, assure inclusivity, and prioritize user safety. Technology may play a critical role in creating a safer and more peaceful society with careful planning, collaboration and
invention.

REFERENCES

1. Google-deepmind/Kinetics-i3d, "I3D models trained on Kinetics", [Online] Available at: https://github.com/google-deepmind/kinetics-i3d
2. I Kennedy Ihianle, A O. Nwajana, S Henry Ebenuwa, et al (2020). A deep learning approach for human activities recognition from multimodal sensing devices, IEEE Access, 8, 179028-179038, Available at: https://doi.org/10.1109/ACCESS.2020.3027979
3. İ Üstek, J Desai, I López Torrecillas, et al (2023). Two-stage violence detection using ViTPose and classification models at smart airports, arXiv, Available at: https://doi.org/10.48550/arXiv.2308.16325
4. G Mehdi, N Ali, S Hussain, et al (2019). Design and fabrication of automatic single axis solar tracker for solar panel. 2019 2nd International Conference on Computing, Mathematics and Engineering Technologies (iCoMET). IEEE, Available at: https://doi.org/10.1109/ICOMET.2019.8673496
5. N Honarjoo, A Abdari and A Mansouri (2021). Violence detection using pre-trained models. 2021 5th International Conference on Pattern Recognition and Image Analysis (IPRIA). IEEE, Available at: https://doi.org/10.1109/IPRIA53572.2021.9483558
6. P Sernani, N Falcionelli, S Tomassini, et al (2021). Deep learning for automatic violence detection: Tests on the AIRTLab dataset, IEEE Access, 9, 160580-160595, Available at: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9627980
7. B Peixoto, B Lavi, P Bestagini, et al (2020). Multimodal violence detection in videos. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, Available at: https://doi.org/10.1109/ICASSP40776.2020.9054018

CITE THIS ARTICLE

Adupa Nithin Sai and Kowdodi Siva Prasad, Machine Learning Software for the Detection of
Violence from CCTV Live Footage, Journal of Image Processing and Artificial Intelligence,
9(3), 12-18.

18 Page 12-18 © MAT Journals 2023. All Rights Reserved
