Video Based Fight Detection Using Deep Learning
Project Report
Submitted by:
Dawa (02190109), Kuenzang Lhaden (02190117), Sangay Thinley (02190126), Sonam Drukpa (02180319)
Project Guide: Ms. Karma Kelzang Eudon
Co-guide: Mr. Duk Bdr Powdyel
CERTIFICATE
This is to certify that the B.E. project titled “Video-Based Fight Detection using Deep Learning”, which is being submitted by Mr. Dawa (02190109), Ms. Kuenzang Lhaden (02190117), Mr. Sangay Thinley (02190126) and Mr. Sonam Drukpa (02180319), students of the final year of B.E. in Electronics and Communication Engineering, during the academic year 2019-2023, in partial fulfilment of the requirements for the award of “Bachelor of Engineering in Electronics and Communication Engineering”, is a record of the students' work carried out at the College of Science and Technology, Phuentsholing, under my supervision and guidance.
Acknowledgement

Firstly, we express our gratitude to the College of Science and Technology and the Royal University of Bhutan for giving us an incredibly valuable opportunity to gain hands-on experience through practical experiments and theoretical knowledge. We owe the accomplishment of our project's objectives to the unwavering guidance and support provided by our esteemed mentors. We would like to extend our gratitude to Madam Karma Kelzang Eudon, Lecturer in the Electronics and Communication Engineering Department (ECED), and Sir Duk Bdr. Powdyel, Assistant Lecturer in the same department. Their constant assistance and invaluable support throughout the project were instrumental in our success.
We would also like to extend our appreciation to the Electronics and Communication
Engineering Department, as well as the members of the review panel. Their guidance,
constructive feedback, and valuable suggestions played a significant role in shaping our
project and enabling its successful completion.
We extend our deep appreciation to Mr. Kuenzang Thinley, the project coordinator, for his consistent reminders, timely recommendations, and provision of all essential resources.
Finally, we express our gratitude and appreciation to the CST FabLab for their invaluable
assistance in fabricating the system case using 3D printing technology.
Group Members:
Dawa (02190109)
Kuenzang Lhaden (02190117)
Sangay Thinley (02190126)
Sonam Drukpa (02180319)
Abstract
The project “Video-Based Fight Detection using Deep Learning” aimed to address the
limitations of traditional surveillance systems in identifying and preventing violent
incidents in real-time. The current reliance on human operators to monitor multiple cameras
has proven to be inefficient and error-prone, resulting in the possibility of missing crucial
instances of violent behavior. This project proposed the use of deep learning techniques, specifically the Long-term Recurrent Convolutional Network (LRCN) model, to automate the detection of fights in video surveillance, providing an intelligent system that enhances public safety and security. By leveraging the LRCN model, which combines the power of convolutional neural networks (CNNs) for spatial analysis and recurrent neural networks (RNNs) for temporal modeling, the system can analyze video
footage and identify patterns of behavior associated with fighting, even in crowded or busy
environments. The project also included implementing an alert mechanism using a GSM
module to notify the relevant authorities upon detecting a fight. Additionally, an on-site alarm was incorporated to provide immediate audible alerts when a fight is detected. By
automating the detection process, this project can make a valuable contribution to public
safety efforts, enabling timely intervention and proactive measures to combat violence and
maintain security.
Nomenclature / Terminology
1. Deep Learning Model: A computational model that utilizes deep neural networks to learn patterns from data and make predictions.
2. LRCN (Long-term Recurrent Convolutional Network): A deep learning architecture that is a fusion of the Convolutional Neural Networks (CNNs) and the Long Short-Term Memory (LSTM) networks.
3. Fight Detection: The automatic identification of physical altercations or fights in video footage.
4. Video Classification: The task of assigning a video to a category, such as "fight" or "non-fight".
5. Convolutional Neural Network (CNN): A neural network architecture used for analyzing and extracting features from visual data, such as images or videos.
6. Prediction: The output or inference made by the deep learning model regarding the class of the input video.
7. Training: The process of optimizing the parameters and weights of the model using a labeled dataset.
8. Dataset: A collection of labeled examples used for training and evaluating the deep learning model.
9. Loss Function: A function that measures the difference between predicted outputs and the true label in the training process.
10. Activation Function: A function applied to the output of a layer in a neural network to make the model more flexible and better at learning. It adds curves and bends to the data, allowing the model to capture more complex patterns and relationships. This helps the neural network understand and process information in a way that is closer to how humans do.
11. Hyperparameters: Configurable settings and parameters that are not learned by the model during training, but are set by the user to control the learning process.
12. Epoch: One complete iteration over the entire training dataset during the training phase.
13. Batch Size: The number of training samples grouped together and processed at one time.
14. Overfitting: A situation in which the deep learning model performs well on the training data but poorly on unseen data.
15. Evaluation Metrics: The metrics employed to evaluate the model's performance, such as accuracy.
16. LSTM: Short for Long Short-Term Memory, a specific architecture within recurrent neural networks designed to capture long-term dependencies in sequential data.
17. Kernel: A small matrix used for convolutional operations in a CNN.
Abbreviations – Epithets
Abbreviation Description
AT Attention
DL Deep Learning
FC Fully Connected
IP Internet Protocol
ML Machine Learning
OS Operating System
Pi Raspberry Pi
List of Figures

List of Tables
Contents
Acknowledgement i
Abstract ii
Nomenclature / Terminology iii
Abbreviations – Epithets v
List of Figures vi
List of Tables vii
CHAPTER 1: INTRODUCTION 1
1.1 Introduction 1
1.2 Problem Statement: 1
1.3 Motivation and Need of the Project 2
1.4 Aim 2
1.5 Project Objectives 2
CHAPTER 2: LITERATURE SURVEY 3
2.1 Introduction 3
2.1.1 Related Work: 3
2.2 Artificial Intelligence (AI) 5
2.2.1 Deep Learning (DL) 5
2.2.2 Convolutional Neural Network (CNN) 6
2.2.3 Layers in CNN 7
2.2.4 Recurrent Neural Networks (RNN) 10
2.2.5 LSTM 11
CHAPTER 3: PROJECT METHODOLOGY 12
3.1 Introduction 12
3.2 Project Methodology 13
CHAPTER 4: DESIGN OF THE PROPOSED SYSTEM 15
4.1 Introduction 15
4.2 System Architecture 15
4.2.1 Long-term Recurrent Convolutional Network-LRCN 16
4.3 Hardware Requirements 17
4.3.1 Raspberry Pi 4B 18
4.3.2 Pi Camera 19
4.3.3 GSM Module 19
4.4 Software Requirement 20
4.4 Development Process and Implementation Details 22
4.4.1 System Flowchart 22
4.4.2 Development Process 23
4.4.3 Implementation 25
4.5 Case Design 27
CHAPTER 5: RESULTS AND ANALYSIS 29
5.1 Introduction 29
5.2 Statistics of Datasets 29
5.3 Comparative Study of Models 30
5.4 Model Evaluation 31
5.5 Hyper-parameter Tuning Analysis 31
5.6 System Testing Result 33
5.7 System Performance Analysis 33
5.8 System Performance and Reliability for the GSM communication 35
5.9 Cost Analysis 36
CHAPTER 6: CONCLUSION AND FUTURE WORK 38
6.1 Conclusion 38
6.2 Future Work and Recommendation 38
REFERENCES 40
CHAPTER 1: INTRODUCTION
1.1 Introduction
Closed Circuit Television (CCTV) is mainly utilized for observation and monitoring in order to combat crime. Its primary objective is to reduce criminal activity and social misconduct while also enhancing security. A CCTV system consists of remotely mounted cameras operating without human presence and an operator. The cameras record video footage and send it to a central monitoring station, where the operator watches a television screen to detect any suspicious activities or gather evidence. Nevertheless, the operator's capacity to detect suspicious behavior is restricted by the attention they can dedicate to each video feed displayed on the screen. Given the limited ratio of operators to screens, it is impractical for a CCTV operator to consistently and fully focus on every video feed, thereby increasing the risk of overlooking abnormal activities.
After considering video processing as a potential solution to the problem, it was determined that utilizing deep learning for video classification and recognition would be a more effective approach. Video-based fight detection using deep
learning is an emerging technology that aims to enhance security surveillance by detecting
and alerting security personnel to potential altercations or violent behavior. Traditional
surveillance systems rely on human operators to monitor video feeds, which can be tedious
and error-prone, especially in busy environments. However, with deep learning algorithms
and computer vision techniques, it is now possible to automate the detection of violent
behavior in real-time, providing a more efficient and reliable security solution. This
technology utilizes complex neural networks to analyze video footage and detect patterns
of behavior associated with fighting. By integrating this technology with existing
surveillance systems, security personnel can quickly identify potential threats and take
action to prevent violent incidents from occurring.
In general, the implementation of deep learning for video-based fight detection shows great
promise as a technology that can greatly enhance public safety and security. It has the
potential to become an indispensable tool in combating crime and violence.
1.2 Problem Statement
Advancements in technology do not ensure the prevention of crime around the world. The rise in physical altercations and violent incidents in public spaces has become a major concern for law enforcement agencies and public safety organizations worldwide.
Traditional surveillance systems, which rely on human operators to monitor multiple
cameras, have proven to be inefficient in identifying and alerting authorities to potential
fights or acts of violence in real-time. According to the Statistical Yearbook by the Royal Bhutan Police, battery accounted for 81% of all crimes committed against persons in 2020 and 78% in 2022. In 2022, 4,327 persons were arrested for various crimes; of these, 1,038 were arrested for battery. Similarly, 995 individuals in 2020 and 981 in 2019 were arrested for battery (RBP STATISTICAL, 2022). Hence, in
order to enhance the effectiveness of CCTV monitoring and surveillance, it is necessary to
automate the process of detecting suspicious activity in video surveillance. To that end, this project aims at the automatic, video-based detection of fights using deep learning technology.
1.4 Aim
To design a fight detection and alert system using deep learning.
CHAPTER 2: LITERATURE SURVEY
2.1 Introduction
Given the increasing emphasis on safety and security, the exploration of intelligent systems
to identify violent behavior has become a critical domain of investigation. In this review of
relevant literature, we will examine the latest progress in utilizing deep learning techniques
for fight detection in security surveillance.
Several researchers have studied the problem of fight detection using different approaches.
Some of the notable works in this area are discussed below:
One suggested approach involved employing quadcopter surveillance and video streaming to identify anomalies within received video streams using deep learning models. The researchers made adjustments to the widely recognized Faster R-CNN algorithm to streamline the initial feature extraction process and facilitate rapid learning. They assessed the performance of four distinct CNNs, namely GoogleNet, ResNet-18, ResNet-50, and SqueezeNet, for detecting relevant objects in surveillance images. The Faster R-CNN algorithm based on ResNet-50 attained the highest average accuracy, establishing itself as a solution for threat detection. The system achieved an average accuracy of 79% across all categories.
“Human Violence Detection Using Deep Learning Techniques” by Arun Akash et al.
(2022)
The detection of moving objects from CCTV footage was regarded as a highly impactful computer vision task. This research employed deep learning techniques as a computer
vision methodology to forecast and identify actions and attributes from videos. The study
utilized the Inception-v3 and Yolo-v5 models to identify instances of violence, count the
number of individuals involved, and detect the presence of weapons in a specific situation.
These deep learning models were employed to create a video detection system as part of
the research. The results of the study indicated that the proposed model achieved a 74%
accuracy rate.
The study compared the performance of the MobileNet model proposed in that research with the AlexNet, VGG-16, and GoogleNet Convolutional Neural Network (CNN) models. Simulations were conducted in Python, and
accuracy and loss values were assessed for each model. AlexNet demonstrated an accuracy
of 88.99% and a loss of 2.480%. VGG-16 achieved an accuracy of 96.49% with a loss of
0.1669%, while GoogleNet achieved 94.99% accuracy and a loss of 2.92416%. In contrast,
the proposed MobileNet model achieved an accuracy of 96.66% and a loss of 0.1329%.
When applied to the hockey fight dataset, the MobileNet model proposed in this study
exhibited exceptional performance in terms of accuracy, loss, and computational time.
“Real-Time Violent Action Recognition Using Key Frames Extraction and Deep
Learning” by Ahmed et al. (2021)
The objective of the research was to investigate the application of convolutional neural
networks (CNNs) and Inception V4 for the detection and recognition of violence in video
data. The proposed framework involved extracting key frames to eliminate redundant
consecutive frames, thereby reducing the training data size and computational
requirements. For feature selection and classification, a sequential CNN with a single
kernel size was utilized, while the Inception V4 CNN employed multiple kernels across
different layers of its architecture. The study performed empirical analysis using four
standard datasets that encompassed various activities. The results showcased that the
proposed approach achieved a 98% accuracy, significantly reduced computational costs,
and surpassed existing techniques in violence detection and recognition.
This section encompasses the concepts, applications, and various related terms associated
with deep learning and artificial intelligence (AI).
2.2 Artificial Intelligence (AI)
John McCarthy first put forward the concept of Artificial Intelligence (AI) in 1955. AI
is described as the science and engineering of creating intelligent machines. AI is known
for its ability to provide reliability, cost-effectiveness, and solutions to complex problems
while also preventing data loss. In today's world, artificial intelligence (AI) is being used
in many different areas, such as business and engineering. One useful technique in AI is reinforcement learning, which involves learning by trial and error in real-life situations to see what works and what does not. This helps make AI applications more dependable and reliable.
Figure 2.1 shows the relationship between deep learning and machine learning. It shows
how these two areas of AI are related to each other.
2.2.1 Deep Learning (DL)
Deep learning is a subset of machine learning that focuses on training artificial neural networks (ANNs) to learn features and perform tasks directly from data. These ANNs are designed to imitate the way our brains work (LeCun et al., 2015). An artificial neural network (ANN) is composed of interconnected nodes known as neurons, which analyze and acquire knowledge from input data.
Figure 2.2: Fully connected artificial neural network
Figure 2.2 shows a fully connected ANN. A fully connected deep neural network comprises an input layer and consecutive hidden layers. In this architecture, each
neuron in the hidden layers receives input from either the preceding layer or the input layer.
The output from one neuron in a layer serves as input for the neurons in the subsequent
layer, continuing this pattern until the final layer produces the network's output. By
applying a series of nonlinear transformations, the layers of a neural network modify the
input data, enabling the network to comprehend intricate representations of the data.
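As a small illustration of the fully connected architecture described above, the following sketch builds a tiny ANN in Keras (one of the libraries listed later in this report); the layer sizes and the 16-feature input are placeholder values chosen only for the example.

```python
# Minimal sketch of a fully connected ANN in Keras (placeholder sizes, not project code).
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(32, activation="relu", input_shape=(16,)),  # hidden layer 1: nonlinear transformation
    Dense(32, activation="relu"),                     # hidden layer 2
    Dense(2, activation="softmax"),                   # output layer, e.g. two classes
])
model.summary()  # prints the layer-by-layer structure and parameter counts
```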
2.2.2 Convolutional Neural Network (CNN)
A widely used type of DNN is the convolutional neural network (CNN), which can process data with a known grid-like topology and is particularly popular for computer vision tasks such as object classification. Unlike other neural networks, CNNs do not require manual feature extraction, as they automate the feature extraction process (Goodfellow et al., 2016).
A convolutional neural network (CNN) comprises various layers, including convolution, pooling, and fully connected layers. The network implements a backpropagation algorithm to learn spatial hierarchies of features autonomously and adjust to new data (Patil & Rane, 2021). The system comprises three primary components: an input layer, a feature extraction module, and a classification module. The feature extraction component is composed of multiple layers that perform operations such as convolution, pooling, and the ReLU function. These operations help identify and differentiate various features within the input images during the network's training process. The latter stages of the network include the fully connected layer and the output layer, which play a crucial role in classifying the input images. The CNN architecture is shown in Figure 2.3.
2.2.3 Layers in CNN

Convolutional Layer
In a Convolutional Neural Network (CNN), the initial layer is responsible for extracting
diverse features from input images. This is accomplished through a mathematical operation
called convolution, wherein the input image is convolved with a filter of a specific size
denoted as KxK. By sliding the filter across the input image, the dot product is computed
between the filter and corresponding portions of the image, based on the filter size (KxK).
The resulting output is referred to as a feature map, which contains information about
various aspects of the image, such as its edges and corners. Afterwards, this feature map is
passed on to subsequent layers to learn additional features of the input image.
Once the convolution operation is applied to the input, the resulting output is transmitted
to the subsequent layer in the Convolutional Neural Network (CNN). The convolutional
layers in CNN have a vital role in preserving the spatial relationship between the pixels in
the input image, ensuring the integrity of the image's spatial information as it progresses
through the network.
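The convolution operation described above can be illustrated with a minimal Keras sketch; the 3x3 kernel, 16 filters, and 64x64 input resolution here are assumptions for illustration rather than the project's exact settings.

```python
# Sketch of a single convolutional layer: 16 filters of size 3x3 slid over a 64x64 RGB image.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D

model = Sequential([
    Conv2D(filters=16, kernel_size=(3, 3), activation="relu",
           padding="same", input_shape=(64, 64, 3)),
])
print(model.output_shape)  # (None, 64, 64, 16): one feature map per filter
```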
Pooling Layer
Following the Convolutional Layer is a Pooling Layer that serves the purpose of down
sampling the feature map after convolution and decreasing the computational requirements.
The reduction in connections between the layers is accomplished to minimize the
complexity, and the pooling layer performs its operations on each feature map
independently. Max Pooling selects the largest element within a specific region of the
feature map, while Average Pooling computes the average of the elements in that region.
Likewise, Sum Pooling calculates the total sum of the elements within the defined region.
The Pooling Layer serves as a connecting link between the Convolutional Layer and the
FC (Fully Connected) Layer.
Figure 2.5: Illustration of max pooling
Figure 2.5 illustrates max pooling. Max pooling is a method used in deep learning and convolutional neural networks (CNNs) to reduce the data size. It works by dividing
the input into smaller sections and choosing the biggest value from each section. By picking
the biggest value, max pooling keeps the most important features while making the data
smaller. This helps make the computations faster, capture strong characteristics, and ensure
that the network can recognize objects regardless of their position in an image.
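A short sketch of the max pooling operation, assuming a dummy feature map: with a 2x2 pool, each spatial dimension of the feature map is halved.

```python
# Max pooling sketch: a 2x2 pool halves each spatial dimension of a dummy feature map.
import numpy as np
from tensorflow.keras.layers import MaxPooling2D

feature_map = np.random.rand(1, 64, 64, 16).astype("float32")  # dummy CNN output
pooled = MaxPooling2D(pool_size=(2, 2))(feature_map)
print(pooled.shape)  # (1, 32, 32, 16)
```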
Fully Connected (FC) Layer
The FC layer in a CNN is made up of neurons, weights, and biases and functions as a
connection between two distinct layers. In a typical CNN design, the FC layers are
positioned towards the end, just before the output layer. The input image, after going
through preceding layers, is flattened and transmitted to the FC layer. Subsequently, it
traverses a series of FC layers where mathematical operations are commonly performed.
This stage marks the initiation of the classification process. Connecting two fully connected
layers is often preferred over a single connected layer since it tends to yield better results,
reducing the reliance on human supervision in CNNs.
Dropout
Connecting all the features to the FC layer can lead to overfitting in the training dataset,
where the model performs well on the training data but poorly on unseen data. To mitigate
this issue, a dropout layer can be utilized. During training, the dropout layer randomly
excludes certain neurons from the neural network, effectively reducing the model's size.
For example, if the dropout value is set at 0.3, 30% of the nodes are randomly eliminated
from the neural network. This technique improves the performance of the machine learning
model by streamlining the network and preventing overfitting.
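The dropout behaviour described above can be seen in a small sketch; the 0.3 rate matches the example in the text, while the input values are arbitrary.

```python
# Dropout sketch: with rate=0.3, roughly 30% of activations are zeroed during training
# (the remaining values are scaled up by 1/(1-rate)); at inference nothing is dropped.
import numpy as np
from tensorflow.keras.layers import Dropout

layer = Dropout(rate=0.3)
activations = np.ones((1, 10), dtype="float32")
print(layer(activations, training=True).numpy())   # some entries become 0
print(layer(activations, training=False).numpy())  # unchanged at inference time
```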
Activation Functions
To make a CNN model work well, it is necessary to choose the right activation function.
An activation function helps the model learn complicated relationships between different
parts of the network. It decides which information should be used to make predictions and
which should not. There are different types of activation functions, and each has its own
purpose. Some are better for binary classification, while others are better for multi-class
classification. The activation function uses math to decide which information is important
for making predictions.
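A brief sketch of common activation functions, using arbitrary example values: ReLU is typically used inside the network, sigmoid for binary outputs, and softmax for multi-class outputs.

```python
# Sketch of common activation functions applied to arbitrary example values.
import tensorflow as tf

x = tf.constant([-2.0, 0.0, 3.0])
print(tf.keras.activations.relu(x).numpy())                         # [0. 0. 3.]: negatives clipped
print(tf.keras.activations.sigmoid(x).numpy())                      # squashed into (0, 1)
print(tf.keras.activations.softmax(tf.reshape(x, (1, 3))).numpy())  # probabilities summing to 1
```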
2.2.4 Recurrent Neural Networks (RNN)
Recurrent Neural Networks (RNNs) are a kind of neural network best suited for working with data that comes in a sequence, such as natural language or time series
data. The main idea behind RNNs is that they have the ability to store and recall past inputs,
which they can then use to make predictions. RNNs use previous outputs as new inputs and
have hidden states that help them remember previous information.
Figure 2.6 represents an RNN. An RNN has a repeating module that can process a sequence
of inputs one by one. At each step, the module takes in an input and a hidden state from the
previous step, and it calculates a new hidden state and an output. The hidden state keeps
track of information from previous inputs, allowing the network to remember patterns in
the sequence. This process repeats for each step in the sequence, producing a sequence of
outputs. Different types of RNNs have additional mechanisms to help them remember long-
term dependencies and avoid problems with training.
The memory of an RNN, also known as the hidden state, maintains all the information that
has been processed up to a certain time step.
2.2.5 LSTM
LSTM is an abbreviation for Long Short-Term Memory. It is a type of neural network that helps computers understand sequences of information, like sentences or music. It is special
because it can remember important things from the past and decide which things to keep or
forget. This helps the computer make better predictions about what comes next in the
sequence. Think of it like a person trying to remember a long story - they only remember
the important parts and forget the less important details, which helps them understand the
story better. Similarly, LSTMs help computers understand and remember important details
in sequences, which is useful for many different applications.
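A minimal LSTM sketch in Keras: the layer reads a sequence of feature vectors and carries an internal memory across time steps. The sequence length of 20 and feature size of 8 are placeholder values.

```python
# LSTM sketch: one summary vector is produced for a whole sequence of feature vectors.
import numpy as np
from tensorflow.keras.layers import LSTM

sequence = np.random.rand(1, 20, 8).astype("float32")  # (batch, time steps, features)
output = LSTM(units=32, return_sequences=False)(sequence)
print(output.shape)  # (1, 32): the memory-based summary of the 20 time steps
```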
CHAPTER 3: PROJECT METHODOLOGY
3.1 Introduction
The chapter includes an in-depth explanation of the methodology chosen for the project. It
introduces the specific approach or framework that was used to manage the project from
start to finish. The project methodology is depicted in Figure 3.1 and includes the following
steps: problem statement, literature review on machine learning and video classification,
design of block diagram and flowchart, model training in Google Colab, testing of the
trained network, deployment of the system, hardware implementation and analysis, and
documentation.
Figure 3.1: Project methodology (Problem Statement → Literature Review → System Design → Model Training → Hardware Implementation → Documentation)
3.2 Project Methodology
The methodology for the project is divided into several stages, each of which includes various activities. The project began by identifying and clearly defining a problem
statement, which served as the foundation for the project's focus and objectives. This
problem statement precisely outlined the specific issue or challenge that the project aimed
to address, providing a clear direction for the project's activities.
To gain a comprehensive understanding of the field and relevant techniques, a thorough
literature review was conducted. This review focused specifically on machine learning,
deep learning, and convolutional neural networks, with a particular emphasis on various
video classification papers. Through this literature review, the project team gained valuable
insights, acquired knowledge about existing approaches, and identified gaps or areas where
the project could contribute.
Next, the project moved into the design phase. During this phase, a block diagram and
flowchart were created to visually outline the structure and sequence of operations within
the system. This includes determining the arrangement and connections between the
Raspberry Pi camera, the GSM module, and the video classification model, outlining how
they interact with each other. These visual representations played a vital role in conceptualizing and planning the architecture of the project, helping the team understand the
various components and their interactions.
Model training encompassed the creation and refinement of a video classification model designed to identify fight scenes. This process entailed choosing a suitable deep learning model, such as a Convolutional Neural Network or a Recurrent Neural Network, developing the model's structure, curating the dataset for training, and fine-tuning the model's parameters through iterative optimization. The model was then trained using labeled videos containing both fight and non-fight scenes from the dataset. Google Colab, a popular platform for machine learning development, was utilized for implementing and training the model.
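The Colab training step can be illustrated with the hedged sketch below, which uses randomly generated dummy clips and a deliberately tiny stand-in model; the actual LRCN architecture, dataset, and hyper-parameter values used in the project are described in Chapters 4 and 5.

```python
# Hedged illustration of the training step with dummy data and a tiny stand-in model.
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.utils import to_categorical

X_train = np.random.rand(8, 20, 64, 64, 3).astype("float32")   # 8 dummy clips of 20 frames each
y_train = to_categorical(np.random.randint(0, 2, size=8), 2)   # dummy fight / non-fight labels

model = Sequential([
    Flatten(input_shape=(20, 64, 64, 3)),        # stand-in for the real LRCN layers
    Dense(2, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=2, batch_size=4, validation_split=0.25, shuffle=True)
model.save("fight_detection_model.h5")           # saved in .h5 format, as noted in Chapter 4
```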
Following the training phase, the output or the trained network underwent thorough testing
to assess its accuracy, efficiency, and overall performance. The testing phase involved
evaluating the model against various test datasets or real-world scenarios to measure its
effectiveness and reliability in different contexts.
To ensure the practicality and viability of the solution, the project included deploying the
system in a real-world setting. This deployment allowed the team to observe and validate
the system's functionality and performance in practical scenarios, ensuring that it aligned
with the intended objectives and requirements.
CHAPTER 4: DESIGN OF THE PROPOSED SYSTEM
4.1 Introduction
This chapter presents the system architecture and the components used in the project. In the subsequent sections, the detailed implementation and integration of each component are discussed, including software and hardware requirements, as well as the challenges and considerations encountered during the development process.

4.2 System Architecture

The deep learning project focused on fight and non-fight video detection and utilized a
GSM module, Pi camera, Raspberry Pi, and speaker for alerts. The system implemented
the Long-term Recurrent Convolutional Network (LRCN) model to successfully train and
classify videos into two categories: "fight" and "non-fight". The Raspberry Pi serves as the
core processing unit, while the Pi camera captures high-resolution video footage. The GSM
module enables communication for alerts, and the speaker provides audible notifications.
4.2.1 Long-term Recurrent Convolutional Network-LRCN
Jeff Donahue and colleagues introduced the Long-term Recurrent Convolutional Network,
or LRCN, in 2016 (Donahue et al., 2017). LRCN is particularly useful for tasks that require
large-scale visual understanding, such as action recognition, image captioning, and video
classification. The LRCN combines both the Convolutional Neural Networks (CNNs) and
the Long Short-Term Memory (LSTM) networks. Its purpose is to analyze video frames
and capture both the spatial and temporal information, which makes it ideal for analyzing
videos.
In the LRCN architecture, the CNN component acts as a visual feature extractor. It applies
convolutional layers to the input video frames, detecting visual patterns and features.
Techniques like batch normalization and max pooling are often utilized to enhance
performance and reduce overfitting. The CNN produces a sequence of high-level visual
features as its output. The LSTM component takes the sequence of visual features generated
by the CNN and analyzes them to capture the temporal dynamics within the video. LSTM
cells have a memory state that enables them to retain information over time, making them capable of learning long-term dependencies. The LSTM processes the sequence of visual
features, updating its memory state, and generating an output at each time step.
To obtain a final prediction or classification, the LRCN architecture typically incorporates
a fully connected layer on top of the LSTM. This layer maps the LSTM outputs to the
desired output classes, such as “fight” and “non-fight”, utilizing appropriate activation
functions. The LRCN architecture combines the strengths of CNNs in extracting
meaningful visual features and the sequential modeling capabilities of LSTMs. This
combination allows the model to comprehend both the static visual content and the
temporal evolution of videos, making it suitable for various tasks like action recognition,
video captioning, and video classification.
For this project, the LRCN (Long-term Recurrent Convolutional Network) model was found
to be highly suitable for the task of classifying videos into two categories: "fight" and "non-
fight." The LRCN architecture provided a robust framework for effectively analyzing both
visual and temporal information in videos. The CNN component of the LRCN model
extracted crucial visual features from video frames, capturing significant visual cues. On
the other hand, the LSTM component effectively modeled the temporal dependencies
between these frames, facilitating the recognition of patterns specific to fight or non-fight
sequences.
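The following is a minimal sketch of an LRCN-style model in Keras, consistent with the description above: a small CNN applied to every frame through TimeDistributed wrappers, followed by an LSTM and a softmax classifier. The layer sizes, the 20-frame sequence length, and the 64x64 frame size are assumptions for illustration, not necessarily the exact architecture that was deployed.

```python
# Sketch of an LRCN-style network in Keras (illustrative sizes, not the deployed model).
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (TimeDistributed, Conv2D, MaxPooling2D,
                                     Dropout, Flatten, LSTM, Dense)

SEQUENCE_LENGTH, HEIGHT, WIDTH = 20, 64, 64      # assumed clip length and frame size
CLASSES = ["non-fight", "fight"]

model = Sequential([
    # CNN part: extract spatial features from each frame independently
    TimeDistributed(Conv2D(16, (3, 3), padding="same", activation="relu"),
                    input_shape=(SEQUENCE_LENGTH, HEIGHT, WIDTH, 3)),
    TimeDistributed(MaxPooling2D((4, 4))),
    TimeDistributed(Dropout(0.25)),
    TimeDistributed(Conv2D(32, (3, 3), padding="same", activation="relu")),
    TimeDistributed(MaxPooling2D((4, 4))),
    TimeDistributed(Flatten()),
    # LSTM part: model the temporal order of the per-frame feature vectors
    LSTM(32),
    # Fully connected classifier: map the LSTM output to the two classes
    Dense(len(CLASSES), activation="softmax"),
])
model.summary()
```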
4.3 Hardware Requirements

Materials Specifications
4.3.1 Raspberry Pi 4B
The Raspberry Pi 4 Model B (Pi4B) stands as the latest iteration in the Raspberry Pi series,
offering significant improvements in performance, memory capacity, and connectivity.
Released by the Raspberry Pi Foundation in June 2019, The Pi4B is a cost-effective,
compact, and highly capable single-board computer that has gained significant popularity
for its power and versatility.
With its GPIO pins, Raspberry Pi 4B enables hardware interfacing and expansion, making
it ideal for IoT projects and prototyping. It supports various operating systems, with
Raspberry Pi OS (formerly Raspbian) being the recommended and officially supported
option. Additionally, it is compatible with popular Linux distributions such as Ubuntu and
third-party operating systems tailored for specific use cases.
To supply power to the Pi4B, a reliable USB-C power supply capable of delivering 5V at
3A is required. Nevertheless, if the USB devices connected to the Pi4B consume less than
500mA, a 5V, 2.5A power supply is adequate.
4.3.2 Pi Camera
The Pi Camera module is designed specifically for Raspberry Pi boards, including the
Raspberry Pi 4B, to provide a dedicated camera functionality. It provides a compact and
user-friendly solution for capturing images and videos.
For this project, the Pi camera is used to capture video footage for analysis and
classification. It is connected to a Raspberry Pi board and configured to capture live video
streams or record video clips. This involves initializing the camera module, setting up
parameters such as resolution and frame rate, and starting the video capture.
Specifications:
Model : 5 Megapixel OmniVision OV5647 Camera Module
Picture resolution : 2592 by 1944 Pixels
Size : 25mm by 23mm by 8mm
Weight : 3 grams
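Capturing frames from the camera for the model can be sketched with OpenCV as below; this assumes the camera is exposed as a standard video device (index 0), and the resolution and frame-size values are illustrative.

```python
# Sketch of grabbing and preprocessing a frame from the camera with OpenCV.
import cv2

capture = cv2.VideoCapture(0)                    # assumes the camera appears as device 0
capture.set(cv2.CAP_PROP_FRAME_WIDTH, 640)       # requested capture width (illustrative)
capture.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)      # requested capture height (illustrative)

ok, frame = capture.read()                       # one BGR frame from the live stream
if ok:
    frame = cv2.resize(frame, (64, 64)) / 255.0  # resize and normalize for the model
capture.release()
```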
4.3.3 GSM Module
A GSM module is a compact electronic device that incorporates GSM functionality into a
device or system, enabling it to connect to GSM networks and facilitate wireless data
transfer. It comprises essential components such as a GSM modem, an antenna for signal
reception and transmission, a SIM card slot for authentication and identification, and
interface circuitry. The primary purpose of GSM modules is to establish connectivity and
enable communication over GSM networks, allowing devices to send and receive data,
make voice calls, and exchange SMS messages. These modules are known for their
lightweight design, user-friendly operation, and low power consumption.
The SIM900 GSM module with an SMA antenna has been chosen for use. This module
enables GSM/GPRS communication across four frequency bands: 850 MHz, 900 MHz,
1800 MHz, and 1900 MHz. It offers reliable voice call support, SMS messaging
capabilities, TCP/IP connectivity, and an AT command interface for easy control and
configuration. The SMA antenna connector allows for improved signal reception, making
it suitable for applications requiring robust GSM communication in various locations.
In this project, the primary application of the GSM module is to enable mobile communication, facilitating voice calls and text messages (SMS) to mobile phones and smartphones. It plays a vital role in establishing connectivity, ensuring continuous communication between the system and the relevant authorities during emergencies.
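Sending an alert SMS through the SIM900 module can be sketched with standard GSM AT commands over a serial link (using pyserial); the serial port name, recipient number, and message text below are placeholders, and the exact commands and timings used in the project may differ.

```python
# Hedged sketch: sending an alert SMS with standard GSM AT commands over a serial link.
import time
import serial  # pyserial

gsm = serial.Serial("/dev/ttyS0", baudrate=9600, timeout=1)  # assumed UART port for the SIM900

def send_command(command, delay=1.0):
    """Write one AT command (CR/LF terminated) and return whatever the module replies."""
    gsm.write((command + "\r\n").encode())
    time.sleep(delay)
    return gsm.read_all().decode(errors="ignore")

send_command("AT")                        # check that the module responds
send_command("AT+CMGF=1")                 # switch the module to SMS text mode
send_command('AT+CMGS="+97512345678"')    # placeholder recipient number
gsm.write(b"Fight detected at the monitored location.\x1a")  # message body, Ctrl+Z terminates it
time.sleep(3)                             # give the module time to transmit
gsm.close()
```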
The software packages required for this project are listed in Table 4.2.

Table 4.2: Software requirements
No.  Software      Version
2    VNC Viewer    7.1.0
3    PuTTY         0.76
5    Fritzing      0.9.10
6    Python        3.7.9
7    TensorFlow    2.11
8    NumPy         1.23.5
9    Keras         2.12.0
10   OpenCV        4.7.0
11   AT commands   2.2.0
13   Thonny        3.2
14   Minicom       3.0
Model training was done on Google Colab since it provides the use of free GPU and
preinstalled libraries. VNC stands for Virtual Network Computing; the VNC Viewer is used to enable the sharing of a remote desktop over a network connection. When both a laptop and
Pi4B are connected to the same network, the laptop can be utilized to control the Pi4B.
PuTTY is a Windows-based software that serves as an implementation of SSH and Telnet
protocols. It is employed to establish a connection between a laptop and Pi4B using SSH,
enabling communication and remote access between the two devices.
The Raspberry Pi Imager is a software tool that allows users to easily install operating
systems on Raspberry Pi devices. It provides a user-friendly interface to select and write
different operating system images onto an SD card or other storage media. This enables
users to quickly set up their Raspberry Pi with the desired operating system without the
need for complex manual installation procedures. The Fritzing app is a software tool that
assists in the design and documentation of electronic circuits. OpenCV is a programming
function library focused on real-time computer vision and is utilized for image processing.
TensorFlow is a free, open-source library used for numerical computation and large-scale machine learning tasks. Additionally, it can be used to train and run deep neural
networks (DNNs). Python is the language used in TensorFlow due to its simplicity in
learning and implementation processes. NumPy, another Python library, is utilized for
working with arrays. Keras is a Python-based open-source software library used to build
artificial neural networks, acting as a bridge between users and the TensorFlow library.
Thonny, on the other hand, is an integrated development environment for Python
programming.
Raspbian Bullseye was selected as the operating system. Raspbian Bullseye is an operating system specifically built for Raspberry Pi devices, utilizing a 64-bit architecture. It is an upgraded version of the Raspbian OS, tailored to provide improved performance and compatibility with newer Raspberry Pi models.
The AT command set is a set of commands utilized for managing and establishing
communication with devices that are compatible with AT commands. Minicom, meanwhile, is terminal emulation software that enables users to interact with devices over
a serial connection, such as modems, routers, or embedded systems.
4.4.1 System Flowchart

The flowchart guided the development and implementation of the project. It enabled efficient decision-making and helped streamline the implementation of the various components, such as the Pi camera, Raspberry Pi, and GSM module. The main steps are described below, and a minimal code sketch of the resulting detection loop is given after the list.
Real-Time Video Input: The flowchart begins with the Pi camera capturing real-time
videos. This ensures that the system is continuously monitoring the surroundings for
any potential fight incidents.
Video Processing: The Raspberry Pi then processes the captured videos. Using
appropriate algorithms, the system analyzes the video frames to determine whether they
contain a fight or not.
Fight Detection: Based on the video processing results, the system identifies whether
the video depicts a fight or a non-fight scenario. If it is determined to be a fight video,
the flow moves to the next step.
GSM Activation: Upon detecting a fight video, the flowchart triggers the activation of
the GSM module. The GSM module allows the Raspberry Pi to communicate via
cellular networks. This activation enables the system to send notifications or alerts
regarding the detected fight, providing a means to inform relevant authorities about the
incident.
Speaker Activation: Along with the GSM activation, the flowchart includes the
activation of a speaker. This can be utilized to emit an audible alert or warning in the
vicinity, notifying nearby individuals or authorities about the ongoing fight and
possibly deterring further escalation.
Continuous Monitoring: If a fight is not detected in the current video, the flow returns
to the beginning, and the camera continues capturing videos for further analysis. This
ensures that the system remains vigilant and continues to monitor the environment until
a fight video is detected.
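The flow described above could be sketched roughly as follows; the model file name is an assumption, and send_gsm_alert() and play_alarm() are hypothetical stand-ins for the project's GSM and speaker routines.

```python
# Rough sketch of the detection loop: capture, preprocess, classify, alert.
from collections import deque
import cv2
import numpy as np
from tensorflow.keras.models import load_model

SEQUENCE_LENGTH, SIZE = 20, 64
CLASSES = ["non-fight", "fight"]

def send_gsm_alert():                       # hypothetical stand-in for the GSM routine
    print("ALERT: notifying authorities via GSM")

def play_alarm():                           # hypothetical stand-in for the speaker routine
    print("ALERT: sounding local alarm")

model = load_model("fight_detection_lrcn.h5")   # assumed name of the trained model file
frames = deque(maxlen=SEQUENCE_LENGTH)          # rolling window of the most recent frames
capture = cv2.VideoCapture(0)

while True:
    ok, frame = capture.read()
    if not ok:
        break
    frames.append(cv2.resize(frame, (SIZE, SIZE)) / 255.0)
    if len(frames) == SEQUENCE_LENGTH:
        probabilities = model.predict(np.expand_dims(np.array(frames), axis=0))[0]
        if CLASSES[int(np.argmax(probabilities))] == "fight":
            send_gsm_alert()
            play_alarm()
```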
4.4.2 Development Process
The development process began with collecting and labeling a dataset of fight and non-
fight videos. Colab, a cloud-based Jupyter notebook environment, was used for training the
deep learning model due to its computational resources and pre-installed libraries. Once
trained, the model was deployed and executed on the Raspberry Pi.
After conducting thorough research on existing methods and technologies, these were the
key aspects of development:
Data Collection: Gathered a diverse dataset of videos containing both fight and non-
fight scenarios. This dataset served as the foundation for training and evaluating the detection model. A collection of 900 videos was obtained, with 450 videos in each
category: fight and non-fight.
Model Training: To train video datasets using LRCN in Colab, we set up our
environment, installed necessary libraries, and preprocessed the data. Next, we defined
the model architecture, created data generators, and compiled and trained the model.
After training, we evaluated the model's performance and fine-tuned it if necessary to
improve results.
Model Evaluation: This process included assessing the performances of the trained
model using evaluation metrics like accuracy. Next, the validation stage involves
testing the model's capability to correctly detect videos that depict fights and non-fights.
The following constraints were encountered during development:
Limited GPU resources: While Google Colab provides access to GPUs, the allocated resources are limited. Training a video dataset using LRCN is computationally intensive and time-consuming.
Limited computational power: Running complex deep learning models like LRCN was
resource-intensive, which led to slower performance.
Limited dataset availability: Developing a robust deep learning model for our project
required a diverse and sufficiently large dataset.
Model optimization for real-time processing: Real-time processing of video streams
from a Pi Camera requires careful optimization of the LRCN model to efficiently handle
the incoming video frames in real-time.
Apart from the constraints mentioned above, there were additional challenges to consider,
such as managing dependencies, handling connectivity issues, addressing interruptions in
Colab sessions, ensuring proper version control, and dealing with the labor-intensive task
of labeling and annotation.
To overcome these issues, we meticulously fine-tuned the training parameters by tweaking hyper-parameters such as epochs, batch size, and regularization techniques to achieve
optimal performance. Additionally, we spent a lot of time and effort assembling a larger
and more diversified dataset, ensuring it covered a variety of important occurrences, edge cases, and potential challenges. By using this technique, we were able to train the
model on a sizable and varied set of data, increasing its ability to generalize and handle a
range of situations.
4.4.3 Implementation
The implementation involved installing the necessary software libraries and dependencies
on the Raspberry Pi, ensuring compatibility and smooth execution. Integration with the Pi
camera, GSM module, and speaker required establishing appropriate connections and
interfaces based on hardware specifications.
Figure 4.7: Circuit design for hardware implementation
The above figure 4.7 depicts the circuit design done using the Fritzing software for
hardware implementation of the project.
1. Downloaded the trained model files from Colab to the local machine in .h5 format.
2. Transferred the model files to the Raspberry Pi.
We had to conduct rigorous testing of the integrated system to verify its performance and
reliability by iterating and refining the algorithms. Following the system implementation,
video frames captured by the Pi camera undergo preprocessing before being inputted into
the deep learning model. The model's output activates alerts and notifications through the
GSM module and speaker, according to predetermined conditions.
4.5 Case Design
For the compact enclosure of the system, a case was designed. After specifying the dimensions and shape, the design was done in Fusion 360 and 3D printed in the FabLab. The case was designed to make the system compact and portable while providing access to the necessary ports and functionality. The dimensions of the case are 10 cm by 10 cm by 11 cm.
Figure 4.10: System case
CHAPTER 5: RESULTS AND ANALYSIS
5.1 Introduction
This chapter comprises the results obtained from training and testing the models, which
were conducted to replicate a real scenario. Additionally, the analysis of the system through
testing in various scenarios and cases is included. The chapter also covers a discussion on
the performance of the model and the methods used to improve its performance.
Furthermore, the results of the system's performance are analyzed using various
parameters. The chapter concludes with a discussion on the cost analysis of the system.
5.2 Statistics of Datasets
The pie chart below represents the datasets collected for training the model. The datasets were collected from various sources such as GitHub, Kaggle, and custom-made videos.
The dataset statistics indicate that it contains 900 video samples, with an equal distribution
of 450 fight videos and 450 non-fight videos. Each video sample has an average duration
of 10 seconds, resulting in a substantial amount of data for training and testing the fight scene detection system. The dataset was split into training and testing sets in an 80:20 ratio.
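Preparing the clips and performing the 80:20 split could be sketched as below, assuming scikit-learn is available and that the videos are stored in "fight" and "non_fight" folders; the folder layout, 20-frame sequence length, and 64x64 frame size are assumptions for illustration.

```python
# Sketch of dataset preparation and the 80:20 split (assumed folder layout and sizes).
import os
import cv2
import numpy as np
from sklearn.model_selection import train_test_split

SEQUENCE_LENGTH, SIZE = 20, 64

def extract_frames(video_path):
    """Return SEQUENCE_LENGTH evenly spaced, resized, normalized frames from one clip."""
    video = cv2.VideoCapture(video_path)
    total = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
    step = max(total // SEQUENCE_LENGTH, 1)
    frames = []
    for i in range(SEQUENCE_LENGTH):
        video.set(cv2.CAP_PROP_POS_FRAMES, i * step)
        ok, frame = video.read()
        if not ok:
            break
        frames.append(cv2.resize(frame, (SIZE, SIZE)) / 255.0)
    video.release()
    return frames

features, labels = [], []
for label, folder in enumerate(["non_fight", "fight"]):           # assumed folder names
    for name in os.listdir(os.path.join("dataset", folder)):
        clip = extract_frames(os.path.join("dataset", folder, name))
        if len(clip) == SEQUENCE_LENGTH:
            features.append(clip)
            labels.append(label)

X_train, X_test, y_train, y_test = train_test_split(
    np.array(features), np.array(labels), test_size=0.2, shuffle=True)   # 80:20 split
```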
5.3 Comparative Study of Models
The table below provides a comparative analysis of five different models namely LRCN,
CNN+RNN, MobileNetV2, ConvLSTM, and MobileNet+LSTM. These models were evaluated based on the accuracy obtained during model training and on the real-time processing time per batch measured when they were tested on the system.
These results indicate that the LRCN and MobileNetV2 models achieved the highest
accuracy while maintaining real-time processing capabilities. However, the LRCN model was deployed in the system, considering its trade-off between accuracy and processing time.
5.4 Model Evaluation
The model evaluation was based on the accuracy of the LRCN model deployed in the system to detect fights.
The above graph represents the accuracy of the trained LRCN model. The accuracy obtained while training the model was 88.12%, which shows that the model has high accuracy.
5.5 Hyper-parameter Tuning Analysis
Hyper-parameter tuning was carried out for the LRCN (Long-term Recurrent Convolutional Network) architecture. The hyper-parameters that were tuned include the number of frames per sequence and the number of epochs for training.
Frame Sequence Length   Epochs   Accuracy   Processing Time per Batch
20                      45       88.12%     25 ms
25                      50       86.23%     27 ms
From the analysis, it can be observed that the LRCN model achieves its highest accuracy of 88.12% when trained with a 20-frame sequence for 45 epochs. The real-time testing results show that the processing time per batch ranges from 23 ms to 27 ms, indicating that the model can process video frames in real time.
Therefore, it shows that by tuning the hyper-parameters of the LRCN model, it was possible
to optimize its performance and achieve high accuracy in detecting fights. Furthermore,
these findings provide insights into the ideal configuration for the fight detection system,
enabling effective deployment and real-time monitoring.
5.6 System Testing Result
To evaluate the system accuracy, the system was tested on real-time input video. The following figures show the results of the system testing.
The detection accuracy, or confidence score, for non-fight incidents is 94%, while for fight incidents it is 99.6%. These results indicate that the model was effective in accurately identifying and detecting fights.
5.7 System Performance Analysis
The system performance is affected by various parameters such as the camera angle, the range of the camera, and the light intensity of the scene being captured and analyzed. The analysis of the system performance was based on these parameters, and system testing was conducted for these parameters in different scenarios. The table below shows the experimental data collected during the system analysis.
Table 5.3: Data obtained from system analysis
Parameter        Performance Time (ms)   Accuracy (%)
Medium (4-6 m)   26                      89.7
Long (7-10 m)    30                      82.1
Narrow (2°)      30                      83.2
Figure 5.8: Response time versus various parameters
The table presents an overview of the system's performance based on different parameters,
including light intensity, distance, and camera angle. It demonstrates that as the light
intensity increases from low to medium and high, the system's accuracy improves to 91.6%
and 95.8%, respectively, from an initial value of 85.2%. Similarly, as the distance increases
from short to medium and long, the accuracy decreases to 89.7% and 82.1%, respectively,
from an initial value of 93.4%. The camera angle also influences accuracy, with wider
angles (180°) achieving a higher accuracy of 92.7% compared to moderate angles (88.4%)
and narrow angles (83.2%). The corresponding performance times in milliseconds indicate
the speed at which the system processes the data. These findings emphasize the significance
of considering environmental factors and parameter settings for optimizing the
performance of the fight detection system.
5.8 System Performance and Reliability for the GSM Communication
The table below presents experimental results for violence detection, specifically focusing on communication delay and response time. These results offer valuable insights into the performance of the communication system, highlighting the delay in transmitting information and the time required to generate a response.
Table 5.4: Reliability of GSM communication
Test No.   Communication Delay   Response Time
1          10                    2
2          20                    3
3          15                    2.5
4          12                    2.2
5          18                    3.1
6          25                    2.8
The recorded values for communication delay and response time reflect the performance
and reliability of the system in transmitting alerts and receiving timely responses. A lower
communication delay signifies efficient communication between the GSM module and
relevant authorities, enabling faster alert transmission. Similarly, a faster response time
demonstrates the responsiveness and effectiveness of the concerned authorities in
addressing detected fight scenes.
Analyzing the system based on communication delay and response time provides valuable
insights into its performance and reliability in facilitating timely communication and
response to fight scenes. By optimizing the system to minimize communication delays and
improve response times, it can significantly enhance public safety by enabling prompt
actions from the authorities.
5.9 Cost Analysis

Table 5.5: Cost analysis
Item      Quantity   Cost
Speaker   1          Nu. 1,000
SD Card   1          Nu. 999
The cost analysis above includes the prices for all the materials required for the project.
The Raspberry Pi processor, Pi Camera, Speaker, GSM Module, SD Card, Connecting
Wires, and Power Adapter have been listed along with their respective quantities and costs.
The total cost of all the materials combined is Nu. 18,995.
CHAPTER 6: CONCLUSION AND FUTURE WORK
6.1 Conclusion
The primary objective of this project was to employ a deep learning model to develop a
fight detection system. By putting in place a reliable system using a Raspberry Pi 4B, a Pi
camera, a speaker, and a GSM module, the goal was accomplished. In order to categorize
the incoming video feeds from the camera as fight or non-fight, the Long-term Recurrent
Convolutional Networks (LRCN) architecture, which combines the Convolutional Neural
Networks (CNN) and the Recurrent Neural Networks (RNN) was trained and deployed.
A custom dataset along with datasets from Kaggle and GitHub was utilized for this. The system efficiently detects fights by analyzing video feeds in real-time and is designed to be installed in various locations. Upon detection, it automatically places a call to the police, sending them the particular location, and plays an alert at the scene of the incident.
The project focused on creating a reliable and effective fight detection system and assessing
its performance in practical situations. By providing an automated and proactive method for recognizing and responding to altercations in public places, the system is an effective tool for lowering crime and increasing public safety and security. It has the advantage of real-time fight detection over conventional CCTV systems, removing the need for constant human monitoring and speeding up response times.
6.2 Future Work and Recommendation
Future work for improving the system's performance includes addressing challenges related to lighting conditions, camera angles, distance variations, and the use of multiple cameras. To improve classification in varied lighting conditions, methods such as image enhancement, adaptive thresholding, and dynamic lighting adjustment can be explored. Preprocessing video frames to equalize lighting can help maintain accuracy across different lighting scenarios. Enhancing performance across various camera angles involves training the model on
a diverse dataset with videos captured from different viewpoints. This enables the system
to handle varying camera angles and classify conflicts more precisely. Additionally,
investigating changes to the dataset or synthesizing different camera angles can further
improve the system's ability to handle different viewpoints.
To address differences in distance from the camera, future research can focus on developing
techniques to calculate distances between objects or people in video frames. This can
enhance the system's accuracy in scenarios where fights occur at various distances. By
utilizing distance estimation algorithms, the system can adjust its categorization strategy
based on the proximity of the fight, improving overall performance and accuracy. Another
potential avenue for improvement is using multiple cameras to capture the scene from
different angles. This approach provides a more comprehensive view of the incident,
enhancing the accuracy of fight detection and classification.
REFERENCES
Ahmed, M., Ramzan, M., Khan, H. U., Iqbal, S., Khan, M. A., Choi, J. I., Nam, Y., &
Kadry, S. (2021). Real-time violent action recognition using key frames extraction and
deep learning. Computers, Materials and Continua, 69(2), 2217–2230.
https://doi.org/10.32604/cmc.2021.018103
Akti, S., Tataroglu, G. A., & Ekenel, H. K. (2019). Vision-based Fight Detection from
Surveillance Cameras. 2019 9th International Conference on Image Processing
Theory, Tools and Applications, IPTA 2019.
https://doi.org/10.1109/IPTA.2019.8936070
Arun Akash, S. A., Sri Skandha Moorthy, R., Esha, K., & Nathiya, N. (2022). Human
Violence Detection Using Deep Learning Techniques. Journal of Physics: Conference
Series, 2318(1). https://doi.org/10.1088/1742-6596/2318/1/012003
Bhagya Divya, P., Shalini, S., Deepa, R., & Reddy, B. S. (2017). Inspection of Suspicious
Human Activity in the Crowdsourced Areas Captured in Surveillance Cameras.
International Research Journal of Engineering and Technology. www.irjet.net
Donahue, J., Hendricks, L. A., Rohrbach, M., Venugopalan, S., Guadarrama, S., Saenko,
K., & Darrell, T. (2017). Long-Term Recurrent Convolutional Networks for Visual
Recognition and Description. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 39(4), 677–691. https://doi.org/10.1109/TPAMI.2016.2599174
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
http://www.deeplearningbook.org
Iqbal, M. J., Iqbal, M. M., Ahmad, I., Alassafi, M. O., Alfakeeh, A. S., & Alhomoud, A.
(2021). Real-Time Surveillance Using Deep Learning. Security and Communication
Networks, 2021. https://doi.org/10.1155/2021/6184756
Irfanullah, Hussain, T., Iqbal, A., Yang, B., & Hussain, A. (2022). Real time violence
detection in surveillance videos using Convolutional Neural Networks. Multimedia
Tools and Applications, 81(26), 38151–38173. https://doi.org/10.1007/s11042-022-13169-4
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
https://doi.org/10.1038/nature14539
Lim, F. J. (2019). Smart Security Camera Using Machine Learning. January, 54.
Patil, A., & Rane, M. (2021). Convolutional Neural Networks: An Overview and Its
Applications in Pattern Recognition. Smart Innovation, Systems and Technologies,
195, 21–30. https://doi.org/10.1007/978-981-15-7078-0_3
Shiranthika, C., Premakumara, N., Chiu, H. L., Samani, H., Shyalika, C., & Yang, C. Y.
(2020). Human Activity Recognition Using CNN & LSTM. Proceedings of ICITR
2020 - 5th International Conference on Information Technology Research: Towards
the New Digital Enlightenment, January.
https://doi.org/10.1109/ICITR51448.2020.9310792
Tiwari, R. K., & Verma, G. K. (2015). A Computer Vision based Framework for Visual
Gun Detection Using Harris Interest Point Detector. Procedia Computer Science, 54,
703–712. https://doi.org/10.1016/j.procs.2015.06.083
Velasco-Mata, A., Ruiz-Santaquiteria, J., Vallez, N., & Deniz, O. (2021). Using human
pose information for handgun detection. Neural Computing and Applications, 33(24),
17273–17286. https://doi.org/10.1007/s00521-021-06317-8
Zong Chen, D. J. I. (2020). Smart Security System for Suspicious Activity Detection in
Volatile Areas. Journal of Information Technology and Digital World, 02(01), 64–72.
https://doi.org/10.36548/jitdw.2020.1.006