

Violence Detection in Real Life Videos using Deep Learning

Bhaktram Jain, Aniket Paul, P Supraja
Department of Networking and Communications, SRM Institute of Science and Technology, Kattankulathur, India
[email protected], [email protected], [email protected]

Abstract — These days, it is essential to avoid or identify violence as soon as possible because it is spreading in an unpredictable way. Violence must be identified in real-time films taken by numerous surveillance cameras at all times and in all locations, which makes it difficult to perform. It might be time-consuming to continuously monitor CCTV surveillance cameras, thus it is essential to automatically spot any unusual activity. As soon as violent behaviours take place, the system should be able to make a trustworthy real-time detection and notify the appropriate authorities. The dataset contains both violence and non-violence videos from real-life situations. We present a methodology for detecting violence that uses a network similar to the U-NET with the encoder MobileNetV2 to extract spatial features before moving on to an LSTM block for the extraction of temporal features and binary classification. The results of the trial revealed that the precision is 94% and the accuracy is 95.69% utilising a dataset based on real-life situations. The recommended model uses minimal computer resources while still producing useful results.

Keywords — Violence Recognition, Intelligent Video Monitoring, U-NET, LSTM, Computer Vision.

I. INTRODUCTION

The prevalence of violence in daily life has long been considered one of the main problems. Any society's peace and harmony are quickly destroyed by it. There was a significant decrease in crime between 2014 and 2017; however, it began to climb once more in 2017. In 2017–18, there was a rise of 6.79%. Numerous variables contribute to violent behaviour in public places. The root causes of violence include societal and economic instability, together with personal greed, anger, and hatred. To address this issue, violence that is either anticipated or unanticipated has to be discovered early and stopped as soon as is practical.

Human behaviour has lately been studied using deep learning and computer vision. Signal analysis allows one to quickly identify a pattern that results from illegal thought. It hasn't been done yet since it isn't technologically feasible. With the use of deep learning based computer vision, we can now quickly identify aggressive behaviour in public places. Currently, most companies and organizations use CCTV systems. To understand violence and put a stop to its harmful consequences, the government or policy-making authority may find it useful to use effective violence detection tools. All humans and members of society want safe streets, neighbourhoods, and workplaces. Explicit feature engineering is not a part of deep learning, and machine learning cannot compete with it.

II. THE CHALLENGES OF VIOLENCE DETECTION IN VIDEOS

Violence detection has garnered interest from a wide range of people and has made significant advancements, but the detection algorithms are still in their infancy, and none currently works in every situation. Additionally, there are still a lot of significant issues in this subject that need to be resolved. The following factors primarily highlight the research's challenges:

A. Changes in the Background Environment
The main challenge in a variety of computer vision applications might be attributed to the influence of variables like the backdrop environment. The variety of viewpoints is primarily present: different two-dimensional representations can be produced from the same movement when it is seen from various angles. The reciprocal occlusion between individuals, and between individuals and the background, makes feature extraction challenging.

Other impacting elements include dynamic environmental changes, crowded backdrops, adjustments to the lighting, and low-resolution photos.

B. Differences Between Classes and within Classes
Even the same action can have distinct expressions for the majority of acts. Running, for instance, can be done in a variety of background settings, at various speeds ranging from slow to fast, and with various step lengths. Some non-periodic activities, such as running during a fight or an escape, which are visibly different from regular periodic running, have effects similar to other actions. It is evident that there are numerous action types and variations, which makes the study of behaviour recognition quite challenging. Additionally, various people's performance of the same action can vary substantially.

C. The Complexity of the Target Subject
The behaviour can be classified as simple solitary behaviour, interactive behaviour, or group behaviour depending on how many subjects are involved. Simple action recognition often concerns the behaviour of a single person, such as sprinting, jumping, and waving. Hugging and touching entail the interaction of two different people, which are slightly more complex behaviours. Fighting and other violent behaviours frequently involve two or more persons. The complexity of the interactions between several subjects likewise becomes more complicated.

D. Complex Movement Patterns
The speed and direction of free movement are typically the only two factors that determine the movement style. Given that violent behaviour frequently takes place in a small amount of time and that the violent subject moves at a rapid pace, which also causes rapid changes in direction and confusion in the activities of various subjects, modelling violent behaviour's motion patterns is extremely challenging. In response to the aforementioned issues and challenges with violence detection, we review numerous efficient violence detection algorithms and research current approaches. Below, these techniques will be thoroughly explained.

III. RELATED WORK

Security camera footage is examined employing violence identification techniques derived from computer vision. Over the past several years, a number of sites have been equipped with these cameras and other surveillance equipment for the purpose of crime prevention, to keep an eye on people's movements in areas like schools, hospitals, banks, marketplaces, and streets. Identifying whether a person's behaviour is acceptable or questionable is a component of behaviour analysis.

The importance of violence detection in video cannot be overstated because of the wide range of applications it has, including improving citizen security, preventing youngsters from acting violently, identifying threats, reducing first responders' response times, etc. Therefore, it is crucial to examine violent behaviours in people utilising surveillance footage. As a result, this section addresses the various approaches and strategies employed in earlier studies, with a particular emphasis on how to spot violent behaviour in surveillance footage. The detection of violent actions from video collections has recently benefited significantly from the introduction of various artificial intelligence techniques. To further explain the advancement, we divided this section into two pieces to better reflect our study on the diagnosis of violent behaviour.

In a noteworthy work by Deniz et al. [1], rapid movement styles were created by applying random transformations to the load spectrum of subsequent video frames; these were subsequently utilised as the major ingredient of the model to pinpoint violent behaviour. Studies show a 12% improvement in accuracy when compared to state-of-the-art techniques for detecting violent situations.

Using temporal and multi-modal data, Penet et al. [2] investigated various Bayesian learning techniques for violence detection.

Suarez et al. [3] investigated the effectiveness of machine learning classifiers for identifying brief fights based on three video datasets. In a separate work, Fu et al. [4] presented a minimal-computation method for automatically detecting fights; optical flow and BoW techniques were used as the foundation for the study's two feature extraction models.

Using spatial-temporal characteristics, a three-stage deep learning violence identification system was recently suggested by Ullah et al. [5]. Three publicly accessible datasets were analysed, and experimental results indicate that the Hockey Fights dataset produced the best results in terms of accuracy.

For the purpose of identifying aggressiveness in video, a CNN-LSTM combination has also been explored. The works by Sumon et al. [6] and [7,8,9] use an LSTM derivative to categorise the gathered traits as violent or non-violent. Localization of the spatiotemporal information included in the video enables local motion analysis by combining CNN and LSTM.

IV. PROPOSED MODEL

Figure 1. Proposed model architecture diagram

Reducing computing difficulty for implementation on low-end devices is the suggested model's main goal, while retaining performance comparable to cutting-edge violence detection techniques. The proposed model first employs an extractor of spatial and temporal features, then categorization. The U-NET design, seen in Figure 2, comprises 23 convolutional layers, works with very few training samples, and offers improved performance for segmentation tasks. It permits the simultaneous use of global location and context.

Figure 2. U-Net and MobileNet V2 network model

As a result, the second stage receives a fresh queue of frame characteristics from which to extract temporal information. The sequential information between successive video slices is obtained using LSTM. Using this information, violent or non-violent incidents are classified using a two-layer classifier with a dense layer. Figure 1 depicts the intended model's architecture. The predictor for the method we employed was MobileNet V2, a straightforward modern classifier for extracting spatial features.

Table 1. TOTAL PARAMETERS

The binary cross-entropy loss function was chosen as the error function in this study because it successfully performs the task of binary classification for the dataset containing videos. The equation is shown below:
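The standard binary cross-entropy loss over N training samples (N is introduced here only for the averaging and is not named in the paper) can be written in LaTeX as:

\[
\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i \log(\hat{y}_i) + (1 - y_i)\,\log(1 - \hat{y}_i) \,\Big]
\]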

In this instance, y_i stands for the label or class, and ŷ_i stands for the predicted probability of the data.

In summary, MobileNet V2 has improved accuracy while using fewer computations and learning parameters. As seen in Figure 2, we added MobileNet V2 to the feature extractor that resembles U-Net [10,11]. The encoder used by the model was previously trained on the ImageNet dataset. As a result of the frames' unlabeled spatial data, training efficacy is increased. Most information about violence is transitory and visible in action rather than in still images. The locations where the scenes in the security camera footage take place also vary greatly. The difficulty of connecting the development of these traits into aggressive behaviour over time is decreased by providing a feature extractor that is effective and efficient before training.
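To make the pipeline concrete, the sketch below assembles a per-frame MobileNetV2 feature extractor pretrained on ImageNet, an LSTM over the frame sequence, and a two-layer dense classifier with a sigmoid output trained with binary cross-entropy. It is a minimal sketch, not the paper's exact configuration: the frame resolution, sequence length, and layer widths are assumptions, and the MobileNetV2 encoder is used here with simple global pooling rather than the full U-Net-style extractor described above.

```python
# Hypothetical sketch of the spatial + temporal pipeline: a MobileNetV2 encoder
# applied per frame, an LSTM over the frame sequence, and a 2-layer dense classifier.
# Frame size, sequence length, and layer widths are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN, H, W = 16, 128, 128  # assumed frames per clip and frame resolution

# Spatial feature extractor: MobileNetV2 pretrained on ImageNet, pooled per frame.
backbone = tf.keras.applications.MobileNetV2(
    include_top=False, weights="imagenet", input_shape=(H, W, 3), pooling="avg")
backbone.trainable = False  # keep the pretrained encoder frozen (assumption)

inputs = layers.Input(shape=(SEQ_LEN, H, W, 3))
x = layers.TimeDistributed(backbone)(inputs)        # (batch, SEQ_LEN, 1280) features
x = layers.LSTM(64)(x)                              # temporal features over the frame queue
x = layers.Dense(32, activation="relu")(x)          # first layer of the classifier
outputs = layers.Dense(1, activation="sigmoid")(x)  # violent vs. non-violent

model = models.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="binary_crossentropy",           # error function named in the paper
              metrics=["accuracy", tf.keras.metrics.Precision()])
model.summary()
```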

V. EXPERIMENTS AND RESULTS

A. Dataset

The Real-Life Violence Situations dataset, which contains 2000 video clips, is one of several accessible datasets, and it is the one used in this study. Security cameras in diverse real-world scenarios gathered one thousand violent and one thousand non-violent films. Frequent clips and equal amounts of violence were chosen for the training approach. The model's input is the frame-by-frame data obtained from the videos.
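As a rough sketch of this frame-by-frame input preparation (the file path, frame count, and resolution below are assumptions, not the paper's stated preprocessing), clips can be sampled with OpenCV as follows:

```python
# Hypothetical frame-sampling sketch for video clips from the dataset.
# Paths, sequence length, and resolution are assumptions for illustration.
import cv2
import numpy as np

def sample_frames(video_path, seq_len=16, size=(128, 128)):
    """Read a clip and return `seq_len` evenly spaced, resized, normalised RGB frames."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), seq_len).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.cvtColor(cv2.resize(frame, size), cv2.COLOR_BGR2RGB)
        frames.append(frame.astype(np.float32) / 255.0)  # scale pixels to [0, 1]
    cap.release()
    if not frames:
        raise ValueError(f"could not read frames from {video_path}")
    while len(frames) < seq_len:          # pad short clips by repeating the last frame
        frames.append(frames[-1])
    return np.stack(frames)               # shape: (seq_len, H, W, 3)

# Example (hypothetical path): clip = sample_frames("Violence/V_1.mp4")  # label 1 = violent
```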

B. Results

In this part, we look at how well the suggested model performs when identifying and categorising violent and non-violent films. The trials showed that the suggested model performs well and is relatively simple and quick.

The suggested violence detection model's performance was validated using averages of the following evaluation measures:

Figure 3. Accuracy of MobileNetV2 during training and validation in real world violent situations

The indicated design is operationally lightweight and nevertheless produces good results, as evidenced by trials utilising a complex real-life violent scenarios dataset that exhibited an accuracy of 95.69% and a precision of 94%. Figure 4 shows an analysis and evaluation of the performance of our suggested model over the dataset using the evaluation measures.
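As an illustration of how the confusion matrix, classification report, accuracy, and precision of Figure 4 can be derived from model predictions (the labels and scores below are placeholders, not the paper's results), a scikit-learn sketch:

```python
# Hypothetical evaluation sketch: metrics from predictions on held-out clips.
# `y_true` and `y_prob` are placeholders; in practice they come from model.predict().
import numpy as np
from sklearn.metrics import (confusion_matrix, classification_report,
                             accuracy_score, precision_score)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                   # ground truth (1 = violent)
y_prob = np.array([0.9, 0.2, 0.8, 0.4, 0.1, 0.3, 0.7, 0.6])   # sigmoid outputs
y_pred = (y_prob >= 0.5).astype(int)                          # threshold the probabilities

print(confusion_matrix(y_true, y_pred))                       # rows: actual, cols: predicted
print(classification_report(y_true, y_pred, target_names=["non-violent", "violent"]))
print("accuracy:", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
```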

Figure 4. Confusion Matrix and Classification report

VI. CONCLUSION

This study presents a brand-new and useful technique for detecting abusive tendencies in actual surveillance film. The suggested model uses a network similar to the U-NET with the encoder MobileNetV2 to extract spatial features before moving on to an LSTM block for the extraction of temporal features and binary classification. The design of the model makes it algorithmically light and speedy. Using a dataset of real-world violent incidents, five folds of cross-validation were carried out. The usefulness of our proposed system may potentially be improved by utilising technologies of other smart grid installations, which will speed up the reaction time of larger systems operated by people inside dwellings and other intricate structures (such as factories, mines, parking lots, and shopping malls), as suggested by Wei et al. [12,13].

The results of the experiments revealed an average precision of 94% and accuracy of 95.69%. The proposed model, despite being small and computationally cheap, obtained high accuracy. Our concept is useful for edge devices or time-sensitive applications. Such technologies can be used in CCTV surveillance of public locations to safeguard citizens [14]. The presence of weapons and other violent items could be analysed to evaluate the level of violence. Even though low-light footage can be challenging to classify, a variety of image recognition techniques for identifying violent incidents in dim or dark conditions can be explored.

ACKNOWLEDGMENTS

We would like to express our profound gratitude to Dr. Supraja P., an Associate Professor in the SRM Institute of Science and Technology's Department of Networking and Communication, for providing us with the opportunity to work on our project under her direction. She encouraged us to go into the academic fields that piqued our interest while giving us the freedom to do so.

REFERENCES

[1] O. Deniz, I. Serrano, G. Bueno and T.-K. Kim, "Fast violence detection in video," 2014 International Conference on Computer Vision Theory and Applications (VISAPP), 2014, pp. 478-485.
[2] C. Penet, C.-H. Demarty, G. Gravier and P. Gros, "Multimodal information fusion and temporal integration for violence detection in movies," 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 2393-2396, doi: 10.1109/ICASSP.2012.6288397.
[3] I. S. Gracia, O. D. Suarez, G. B. Garcia and K. Kim, "Fast Fight Detection," PLOS ONE, vol. 10, no. 4, e0120448, 2015, doi: 10.1371/journal.pone.0120448.
[4] E. Y. Fu, H. V. Leong, G. Ngai and S. Chan, "Automatic Fight Detection in Surveillance Videos," in Proceedings of the 14th International Conference on Advances in Mobile Computing and Multi Media (MoMM '16), Association for Computing Machinery, New York, NY, USA, 2016, pp. 225-234, doi: 10.1145/3007120.3007129.
[5] F. U. M. Ullah, A. Ullah, K. Muhammad, I. U. Haq and S. W. Baik, "Violence Detection Using Spatiotemporal Features with 3D Convolutional Neural Network," Sensors, vol. 19, no. 11, p. 2472, May 2019, doi: 10.3390/s19112472.
[6] S. A. Sumon, R. Goni, N. B. Hashem, T. Shahria and R. M. Rahman, "Violence detection by pretrained modules with different deep learning approaches," Vietnam Journal of Computer Science, vol. 7, no. 1, pp. 19-40, 2020.
[7] M. M. Soliman, M. H. Kamal, M. A. El-Massih Nashed, Y. M. Mostafa, B. S. Chawky and D. Khattab, "Violence recognition from videos using deep learning techniques," in 2019 Ninth International Conference on Intelligent Computing and Information Systems (ICICIS), IEEE, 2019, pp. 80-85.
[8] U. M. Butt, S. Letchmunan, F. H. Hassan, S. Zia and A. Baqir, "Detecting video surveillance using VGG19 convolutional neural networks," International Journal of Advanced Computer Science and Applications, vol. 11, no. 2, 2020.
[9] S. Sudhakaran and O. Lanz, "Learning to detect violent videos using convolutional long short-term memory," 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2017, pp. 1-6, doi: 10.1109/AVSS.2017.8078468.
[10] R. Vijeikis, V. Raudonis and G. Dervinis, "Efficient Violence Detection in Surveillance," Sensors, vol. 22, no. 6, 2022, doi: 10.3390/s22062216.
[11] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov and L.-C. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510-4520.
[12] W. Wei, X. Xia, M. Wozniak, X. Fan, R. Damaševičius and Y. Li, "Multi-sink distributed power control algorithm for cyber-physical-systems in coal mine tunnels," Computer Networks, vol. 161, pp. 210-219, 2019.
[13] W. Wei, H. Song, W. Li, P. Shen and A. Vasilakos, "Gradient-driven parking navigation using a continuous information potential based on wireless sensor network," Information Sciences, vol. 408, pp. 100-114, 2017.
[14] M. Patel, "Real-Time Violence Detection Using CNN-LSTM," arXiv preprint arXiv:2107.07

