CHAPTER 1
INTRODUCTION
Several real-time sensing applications are being developed, particularly in the fitness
and health-tracking fields. To better understand human behaviour, these applications use the
mobile sensors built into smartphones to identify human activity. The HAR system developed
here trains a supervised learning model, an LSTM-2D CNN, on data obtained from the
accelerometer sensor and uses its predictions to detect six fundamental human activities:
walking, standing, walking upstairs, walking downstairs, lying and sitting.
Many useful mobile applications have taken advantage of wearable sensors, including
abnormal-driving detection, healthcare systems for remotely monitoring elderly persons, sport
performance tracking, and mobile assistance systems for individuals with vision problems.
Because of improvements in healthcare, the proportion of elderly individuals in the global
population is higher than it has ever been. As a result, there is a growing need for social
support of the physical and emotional health of those who live alone. There are many reasons
to believe that machine learning and AI will be able to detect such activities automatically.
For seniors who want to age in place, activity recognition (AR) might be used to keep
track of their well-being, detect any worrying changes in routine, and notify responders right
away in case of an emergency. Depending on the hardware used to gather data, activity
recognition may be split into three categories: camera video, wearable technology, and binary
sensors. Due to concerns about privacy invasion and practical issues, such as discomfort from
the device and higher maintenance requirements, cameras and wearable technology are less than
ideal solutions. This research developed a device-free, privacy-protecting way to investigate
data-driven AR based on deep learning. The binary sensor-based method provides a solution to
the problem of long-term activity monitoring in the real world.
The representation and extraction of features are necessary for the AR process to be
complete. In order to effectively classify and identify actions that are frequently conflated,
such as standing, sitting, lying down and walking, this study set out to extract meta-actions
by evaluating the causal influence between sets of sensor activations. Each person's activities
reflect their unique values, customs and routines, which makes human activity highly variable.
Even if the activity areas are comparable, a user's habits and lifestyle may influence the
specific sequence or characteristics of sensor activation in any given activity. This variation
may be described as a causality between sensors.
Furthermore, machine learning approaches can be used to improve a wearable activity detection
model's ability to evaluate a variety of activities more accurately. Standard machine learning
approaches, however, generally rely on heuristic, manual feature extraction and are therefore
limited by human domain knowledge. As a result, the classification accuracy and other evaluation
metrics of systems built on standard machine learning are constrained. Deep learning (DL)-based
techniques are used in this study to overcome these limitations.
1.2.4 Gyroscope:
A gyroscope sensor is a tool that can measure and keep track of an object's rotation and
angular velocity. While accelerometers can only monitor linear motion, gyroscopes can measure
the tilt and lateral orientation of an object. The terms "angular rate sensor" and "angular
velocity sensor" are also used to refer to gyroscope sensors. These sensors are used in
situations where it is challenging for humans to determine an object's orientation.
1.2.5 Pedometer:
A pedometer is a device that tracks and counts the steps a person takes while walking.
Pedometers are increasingly used by fitness enthusiasts for fitness-related activities.
Since the majority of modern smartphones include an integrated accelerometer, we were able to
implement pedometer functionality on a smartphone in this research. We used this smartphone-based
pedometer in our project to avoid the cost of dedicated trackers such as Fitbit devices, which
currently cost between 5k and 6k.
1.2.6 Magnetic Field Sensors:
By detecting the planar magnetic field, the magnetic field sensor can identify the
direction and strength of the magnetic field. It is frequently used in conventional compass or
map navigation to help mobile phone users obtain accurate positioning.
The magnetic field sensor may be used to measure the mobile phone's magnetic field
intensity along the x, y and z directions. If the phone is rotated so that the value in only one
direction is non-zero, that direction indicates south. Numerous mobile phone compass applications
use the information from this sensor.
Deep learning is performed by artificial neural networks, which include numerous layers.
Such networks include deep neural networks (DNNs), where each layer is capable of carrying
out complicated operations like representation and abstraction to make sense of text, sound, and
image data. Deep learning, often regarded as the machine learning area with the greatest rate of
growth, is being employed by more and more firms to develop novel economic models.
Similar to how the human brain is composed of neurons, neural networks are composed of
layers of nodes. Nodes in one layer are connected to nodes in neighbouring layers. The more
layers a network has, the deeper it is considered to be. In the human brain, a single neuron
receives hundreds of impulses from other neurons. In an artificial neural network, signals
travel between nodes and are assigned corresponding weights. A node with a higher weight will
have a greater impact on the nodes in the layer below it. The weighted inputs are combined to
create an output in the final layer. Deep learning algorithms need advanced hardware because
they process a lot of data and perform several intricate mathematical calculations. However,
training a neural network can be challenging even with such sophisticated hardware.
Large data sets are fed into deep learning systems because they need a lot of information
to produce correct results. While processing the data, artificial neural networks classify it
using the answers received from a series of binary true-or-false questions involving extremely
complicated mathematical calculations.
For human activity recognition, deep learning algorithms are used to obtain better results.
Several deep learning algorithms are available for activity recognition: the Convolutional
Neural Network (CNN), the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM),
Radial Basis Function Networks (RBFN) and Multilayer Perceptrons (MLP). Among these models,
LSTM-2D CNN is the best for activity recognition.
Due to its unique memory cells, LSTM outperforms convolutional neural networks at
extracting features from sequence data. In order to extract the temporal characteristics of the
sequence data more effectively, the input data in this work first passes through two layers of
LSTMs, with 32 memory cells per layer. To control the functioning of each memory cell, the
inputs are passed through different gates: input gates, forget gates and output gates.
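A minimal Keras sketch of these two stacked LSTM layers is given below; the input shape (128 time steps of 9 sensor channels) is an illustrative assumption rather than a value taken from this report.

from tensorflow.keras import layers, models

# Sketch of the two stacked LSTM layers described above (32 memory cells each).
# The window length (128) and channel count (9) are illustrative assumptions.
lstm_extractor = models.Sequential([
    layers.LSTM(32, return_sequences=True, input_shape=(128, 9)),
    layers.LSTM(32, return_sequences=True),
])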
CNNs have driven many of the advances in deep learning. A CNN is a deep learning
algorithm that can assign importance to parts of an image, process images and differentiate
them from one another. CNNs differ from ordinary neural networks in that they are far more
efficient for this task. At its core, a convolutional neural network is simply the application
of filters to an input, which results in activations. CNNs are the branch of deep learning that
deals primarily with image recognition and image processing.
As the proposed model deals with human activity recognition using hybrid deep learning
networks, this approach was chosen for efficient and accurate results. Such networks are designed
to classify specific arrays and images and contain multiple layers, including hidden layers.
This technique is therefore well suited to the task and has proven efficient in several previous
models.
A general CNN architecture consists of basic layers: a convolutional layer, a max
pooling layer, dropout, a fully connected layer and activation functions. This basic structure
can be modified by adding or removing layers. Our model is constructed with four convolutional
2D layers, batch normalization, max pooling layers, a flatten layer and a dense layer stacked on
top of one another. It is trained end to end on a dataset of six activities: sitting, standing,
walking, lying, walking upstairs and walking downstairs.
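A minimal Keras sketch of this convolutional stack is shown below; the filter counts, kernel sizes and input shape are illustrative assumptions rather than the exact values used in this work.

from tensorflow.keras import layers, models

# Illustrative sketch of the stacked CNN classifier described above: four Conv2D
# layers with batch normalization and max pooling, then a flatten layer and a
# dense softmax layer over the six activity classes.
cnn_classifier = models.Sequential([
    layers.Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=(128, 9, 1)),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=(2, 1)),
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=(2, 1)),
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.BatchNormalization(),
    layers.Conv2D(128, (3, 3), padding='same', activation='relu'),
    layers.Flatten(),
    layers.Dense(6, activation='softmax'),  # six activity classes
])
cnn_classifier.compile(optimizer='adam',
                       loss='categorical_crossentropy',
                       metrics=['accuracy'])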
1.5.1 Convolution 2D Layer
The convolutional layer is a major building block of a CNN and is generally used at
the beginning, as the first layer. The convolution layer extracts high-level features, such as
edges, from the input images, and the convolved images are dimensionally reduced or increased.
The purpose of the convolution layer is to construct outputs for the given inputs. The difference
between a normal (1D) convolution layer and a convolution 2D layer is that the latter takes
two-dimensional inputs, whereas a 1D convolution layer takes linear inputs. It also decreases the
image size. Four convolution 2D layers are used to meet the requirements of this experiment with
the dataset.
In deep learning, the batch normalization layer standardizes the inputs to each layer
within a batch. The dropout rate is closely tied to batch normalization: without a regularizer,
batch normalization is less effective and the model is more likely to show poor performance and
outcomes.
With the use of batch normalization, the number of epochs in the training period may be
decreased while the learning process of the deep network is stabilized. Because it also acts as
a regularizer, it can decrease the internal covariate shift in the layers as well as instability
between the layers, so the overfitting problem becomes less severe as the network grows deeper.
Batch normalization is usually beneficial in combination with other regularization strategies.
Batch normalization is a technique for speeding up and stabilizing deep neural networks
by adding additional normalization layers. It also enables each layer to learn more
independently, continuously normalizing the output of the preceding layer before it is passed to
the next neural network layer.
The max pooling layer is used to reduce the dimensions of the feature maps; it summarizes the
feature maps and reduces the number of parameters. It also reduces learning time and
computational cost. Down-sampling in this way is very useful for avoiding the overfitting problem
commonly seen in CNNs, and it makes the model invariant to certain distortions.
1.5.6 Activation Functions
The activation function applied in this architecture is ReLU. The rectified linear unit
(ReLU) is an activation function that is applied alongside the convolutional layers during
training.
ReLU outputs the input directly when it is positive and zero otherwise, i.e. f(x) = max(0, x).
The vanishing gradient problem is the main disadvantage of activation functions such as the
hyperbolic tangent and sigmoid. This problem can be avoided by using the ReLU function, letting
models learn faster and perform better. ReLU is the default activation function used in most
CNNs and multilayer perceptrons. In a CNN, the ReLU layer is applied after the convolution layer
and before the max pooling layer. The major purpose of using this function is to increase the
non-linearity in the images being processed for training. In the CNN, the ReLU function is
applied to the convolved image.
Transfer learning focuses on storing the knowledge gained while solving one problem so
that it can be applied to a completely different problem, which is why it is also treated as a
research problem in its own right. For example, the knowledge gained while learning image
recognition of two-wheelers can be reused for image recognition of four-wheelers. Transfer
learning is a technique of using pre-trained models for other algorithms and functions: the
knowledge stored in a previous model is reused, with some modifications, in the current model.
Transfer learning combined with Python is the preferred approach for the LSTM-2D
CNN technique. The major advantage of transfer learning is that it helps avoid the overfitting
problem that commonly occurs in image processing and convolutional neural networks.
When modelling a second task, transfer learning improves performance and keeps training progress
stable. Knowledge from previously learned tasks improves the new model and helps in building the
algorithm seamlessly.
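As a simple sketch of this idea in Keras, a previously trained model can be frozen and reused as a feature extractor while only a new classification head is trained; the file name and layer sizes below are placeholders, not artefacts of this project.

from tensorflow.keras import layers, models

# Hypothetical example: reuse a model trained on an earlier task as a frozen base.
base_model = models.load_model('pretrained_har_model.h5')  # placeholder path
base_model.trainable = False                               # keep the stored knowledge fixed

transfer_model = models.Sequential([
    base_model,
    layers.Dense(64, activation='relu'),
    layers.Dense(6, activation='softmax'),  # new task: six activity classes
])
transfer_model.compile(optimizer='adam',
                       loss='categorical_crossentropy',
                       metrics=['accuracy'])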
1.4 Motivation
Human activity recognition is an important and challenging research area with many applications
in healthcare, smart environments, and surveillance and security. It is a field that deals with
this issue through the integration of sensing and reasoning, in order to deliver context-aware
data that can be employed to provide personalized support in many applications.
As a simple example, imagine a smart home equipped with ambient sensors able to detect
people's presence and the activation of household appliances.
Before delving into the project work, we must first understand the technical aspects,
system requirements, and organization of the project report, which are detailed further below.
The project report is divided into five chapters and references.
CHAPTER 2
LITERATURE SURVEY
The literature study is primarily performed to evaluate the history of the present project,
which aids in identifying flaws in the current system and guidelines for resolving unresolved
problems. The following work discusses the project's history as well as the challenges and
limitations that led to the proposal of remedies and the work of this project.
I. N. Yulita et al. note that human activity recognition (HAR) is a rapidly expanding area of
study with several uses. A wearable-based HAR system called the Magnetic Induction-based Human
Activity Recognition System (MI-HAR) has been proposed for collecting human movements and
identifying activities based on the gathered data. The study mainly concentrated on the
performance examination of several machine learning classifiers using artificial MI-motion data
(signals based on magnetic induction). Its primary goal was to assess how well six popular
classifiers perform in HAR applications. Additionally, the classification performance obtained
from MI-motion data was compared with results from similar research employing accelerometer
data. According to the findings, Random Forest had the best performance on MI-motion data,
scoring 91.5%.
Smartphone and smartwatch sensors may be used to extract information about the user's
context, notably their activities. Machine learning algorithms can classify human behaviours
using raw data gathered from the sensors. Studies that concentrate on identifying such
activities typically employ motion sensors like the accelerometer and gyroscope. M. C. Sorkun
et al. investigated the effectiveness of activity categorization when various sensors are
applied individually or collectively. Numerous features are extracted from raw data using a
dataset gathered from fifteen people performing six distinct activities, and supervised machine
learning techniques are then used to train and validate the findings. Performance is analysed
using five distinct classifiers and several validation techniques.
Yu Zhao et al. proposed to use residual bidirectional long short-term memory (LSTM)
cells in a deep network design. One advantage of the new network is that a bidirectional link may
combine the forward state of positive time and the reverse state of negative time (backward
state). Second, residual connections between stacked cells act as gradient highways, allowing
them to transmit underlying data straight to the top layer and therefore circumvent the gradient
vanishing problem. In general, the suggested network displays improvements on the spatial
(deeply stacked residual connections) and temporal (using bidirectional cells) dimensions,
aiming to increase the recognition rate. The accuracy was improved by 4.78% and 3.68%,
respectively, when evaluated using the Opportunity data set and the public domain UCI data set,
in comparison to earlier results. The public domain UCI data set's confusion matrix was then
analysed.
Sakorn Mekruksavanich et al. proposed a HAR framework for smartphone sensor data
based on time-series domains of Long Short-Term Memory (LSTM) networks. To examine the
effects of using various types of smartphone sensor data, four baseline LSTM networks are
compared. Additionally, a 4-layer CNN-LSTM hybrid network is suggested to enhance
recognition performance. On the public smartphone-based UCI-HAR dataset, the HAR
technique is assessed using several configurations of sample generation methods (OW and
NOW) and validation protocols (10-fold and LOSO cross-validation). Additionally, Bayesian
optimization methods are employed in that study since they are useful for fine-tuning the
hyperparameters of each LSTM network. Compared to earlier state-of-the-art methods, the
experimental findings show that the proposed 4-layer CNN-LSTM network performs well in
activity recognition, increasing the average accuracy by up to 2.24%.
Pei Tang et al. applied the minimum redundancy maximum relevance (mRMR) measure to
recognize human activity in smart home environments. The mRMR algorithm (under the D-R and
D/R criteria) was used to select features and build various feature subsets from observed
motion sensor events. The chosen feature subsets were then assessed, and two probabilistic
algorithms, the hidden Markov model (HMM) and the naive Bayes (NB) classifier, were used to
compare activity identification accuracy rates. The experimental results demonstrate that not
all features are helpful for recognizing human activity, and different feature subsets provide
varied accuracy rates. Additionally, even the same feature subset has a different impact on the
accuracy rate for different activity classifiers. It is crucial for researchers working on human
activity recognition to take into account both the relation between features and actions and the
redundancy among features. Generally speaking, both the maximal relevance measure and the mRMR
method can be used for feature selection and contribute positively to activity recognition.
Hong Yang et al. presented a prediction model based on multi-task learning for
forecasting the daily activity category and its occurrence time mutually and iteratively. First,
a feature space of everyday activity is formed by pre-processing raw sensor signals. A
convolutional neural network (CNN) and bidirectional long short-term memory (Bi-LSTM) units
are then combined in a simultaneous multi-task learning model that serves as the forecasting
model. Finally, the suggested model is assessed on five different datasets. According to the
experimental findings, this model outperforms the most recent single-task learning models in
accuracy by at least 2.22% and in the NMAE, NRMSE and R2 metrics by at least 1.542%,
7.79% and 1.69%, respectively. The average accuracy is 84%.
Smart Homes are typically seen as the ultimate answer to all livability issues, particularly those
involving the care of the elderly and disabled, energy conservation, etc. The secret to home
automation in smart homes is human activity recognition, which enables the smart services to
operate automatically in accordance with human thought.
Although a lot of recent research has been done in this area, much of it can only identify
default actions, which is probably not what smart home services need. Furthermore, because of
insufficient scalability, such research cannot be used outside of the lab. Yegang Du et al.
addressed this problem and proposed a novel framework to not only identify but also anticipate
human behaviour. The framework has three stages: activity prediction in advance, activity
recognition during the activity, and activity recognition after the activity. The hardware cost
of the framework, which uses passive RFID tags, is also low enough for it to be widely
deployed. Additionally, the experimental results show that the framework is highly scalable and
achieves good performance in both activity detection and prediction.
Agarwal et al. proposed a lightweight deep learning model for HAR and deployed it on a
Raspberry Pi 3. A shallow RNN combined with the LSTM algorithm was used to build this model.
Although only one dataset with six activities was evaluated, the recommended model is fairly
accurate and has a straightforward design, though this does not demonstrate how well it may
generalise. In the study [16], recurrent neural networks, neural networks, and a deep learning
combination of Inception modules and recurrent layers (the InnoHAR model) are used to categorise
activities. The authors used separable convolution in place of traditional convolution, which
proved effective for its intended use in the model settings. The results are good; however, the
model was slow to converge during the learning phase.
Davide Buffelli and Fabio Vandin proposed an innovative deep learning framework,
TrASenD, based entirely on an attention mechanism, which outperforms the previous state of the
art. The proposed attention-based architecture improves average accuracy by more than 7% over
the previous best-performing model. They also consider the problem of adapting HAR deep
learning models, which is essential in many applications. The average accuracy is 84%.
CHAPTER 3
PROPOSED METHODOLOGY
This chapter describes the methodology followed in the project. In the first stage we
work with a dataset that is used to train a weapon detection model. The YOLO (You Only Look
Once) series is one of the most advanced families of object detection models. In contrast to
other region-proposal-based techniques, it divides the input image into an S x S grid and then
predicts the class probabilities and bounding boxes for an object whose centre falls into a grid
cell.
The suggested system's main goal is to detect weapons in CCTV footage using a deep
learning technique. These techniques make use of labelled data to train and validate the
classifier. Finally, the developed model is used to detect weapons in CCTV footage. The main
steps are:
1. Collect the data needed to train and test the model.
2. Preprocess the data to eliminate unnecessary information and split it into training and test
sets.
3. Determine the structure of the learned function and the associated learning algorithm.
4. Design the model.
5. Evaluate the accuracy, precision, recall and F1 score of the estimator.
YOLO, which stands for "You Only Look Once", is an object detection technique that
divides images into a grid. Each grid cell is responsible for detecting the objects whose
centres fall inside it. Due to its efficiency and precision, YOLO is among the most well-known
object detection algorithms. In this thesis, the YOLOv5 algorithm is used for weapon detection.
YOLOv5's architecture contains three components, as shown in Figure 3.1: the Model
Backbone, the Model Neck and the Model Head. The Model Backbone's main goal is to take a source
image and extract meaningful, significant features from it. In YOLOv5, Cross Stage Partial
Networks (CSPDarknet) serve as the backbone for extracting detailed information from the images.
The Model Neck's main objective is to produce feature pyramids, which generally help models
scale well across images and make it possible to detect the same object at various scales and
sizes; feature pyramid models also perform well on unseen data. Other models, such as the
Feature Pyramid Network (FPN), BiFPN and the Path Aggregation Network (PANet), can be used for
the feature pyramid stage. The final detection stage is carried out by the Model Head, which
applies anchor boxes to the features and generates the final output vectors with bounding boxes,
objectness scores and class confidence scores.
We will now discuss the software that we used. In deep learning it is typical to use
Python as the primary programming language; that is the first tool we use, and for YOLOv5,
PyTorch is used.
3.3.1 Python
Python offers a large number of libraries, many of which are related to AI and
machine learning. TensorFlow (a neural network framework), Scikit-Learn (for data mining, data
analysis and machine learning) and others are among the most popular, and the list goes on.
Python also offers a simple OpenCV interface. Python's popularity stems from its powerful yet
simple implementations: with other languages, students and researchers must first master the
language before attempting ML or AI with it, whereas Python is not like this. TensorFlow is one
of the most crucial Python libraries that we will use.
3.3.2 PyTorch
PyTorch, a machine learning framework based on the Torch library, was created by Meta AI and is
now part of the Linux Foundation. It is used for applications such as computer vision and
natural language processing. It is open-source software, available for free under a modified
BSD licence. PyTorch features a C++ interface, even though the Python interface is more refined
and the main focus of development.
Many deep learning applications, including Tesla Autopilot, Uber's Pyro, Hugging Face's
Transformers, PyTorch Lightning and Catalyst, are built on top of PyTorch. PyTorch offers the
following two high-level features: tensor computing (like NumPy) with strong acceleration via
graphics processing units (GPUs), and deep neural networks built on a tape-based automatic
differentiation system.
3.3.3 Tensorflow
3.3.4 Keras
Keras is a Python-based open-source neural network library. It can run on top of
TensorFlow as well as alternative backends. It is user-friendly, modular and extensible, with
the goal of enabling rapid experimentation with deep neural networks. It does not itself perform
low-level operations such as tensor products and convolutions; that work is delegated to the
backend (such as TensorFlow, which does the job perfectly).
3.3.5 Numpy, Matplotlib and Scikit-Learn
There are other additional libraries that we may import and use. Numpy, as you may or
may not be aware, is one of the most popular Machine Learning libraries. Numpy includes
support for huge, multi-dimensional arrays and matrices, as well as a wide set of high-level
mathematical functions for working with these arrays.
Matplotlib is one of the most popular and capable frameworks for data visualisation.
Matplotlib is a Python 2D plotting package that generates high-quality figures in a range of
hardcopy and interactive formats across platforms. Matplotlib is a Python library that may be
used in Python scripts, Python and IPython shells, Jupyter notebooks, web application servers,
and four graphical user interface toolkits. With just a few lines of code, you could make plots,
histograms, power spectra, bar charts, error charts, scatterplots, and so on.
The major reason we use Google Colaboratory for this work is that it provides access to a
powerful graphics card, which allows deep neural networks to run faster. The NVIDIA Tesla K80
is used as the graphics card, and it allows code to be executed continuously for up to twelve
hours.
3.4 Hardware Requirements
In deep learning, the hardware determines performance through its computational
parameters. The following hardware was considered for implementing the system.
RAM : 8 GB or higher
In this work the YOLOv5 algorithm, which is implemented in PyTorch, is used for object
detection. The YOLOv5 repository can be downloaded from https://github.com/ultralytics/yolov5,
the repository's official home page. It is used to train the model on particular object classes
in images and videos. Once training is over, the system detects any weapons present in images
and videos, depending on the confidence value.
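For reference, the repository and its dependencies can be set up with the commands documented on that page; the following is only a sketch of the workflow:

git clone https://github.com/ultralytics/yolov5
cd yolov5
pip install -r requirements.txt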
The learning rate of a system can be described as the parameter that controls how much
the model is modified in response to the errors observed each time the weights are updated. The
choice of an optimal learning rate is crucial for a model, as illustrated in Figure 3.2.
Figure 3.2 Effect of Learning Rate on Deep Learning
Too small a learning rate can result in a very time-consuming training process, while
too large a learning rate can make the process unstable or cause it to move too quickly. As a
result, when training the model and obtaining the final result, our model adjusted the learning
rate, starting from an initial learning rate of 0.001.
The majority of annotation platforms can output one text file per image in the YOLO
labelling format. For each object in the image, a bounding-box (BBox) annotation appears on a
separate line of that text file. The annotations are normalized to the size of the image and
therefore range from 0 to 1. The format in which they are given is as follows:
< object-class-ID> <X center> <Y center> <Box width> <Box height>
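For example, a hypothetical label line for an object of class 0 would look like the following, with all values invented purely for illustration:

0 0.416 0.530 0.210 0.340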
Three YAML files that are included with the repo contain the configurations for the
training. Depending on the work, we will modify these files to meet our specific requirements.
The dataset parameters are described in the data-configurations file. The paths to the
training, validation and test (optional) datasets, the number of classes (nc), and the class
names in the same order as their indices must all be added to this file, since we are training
on our own custom dataset.
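A minimal sketch of such a data-configurations file for the three weapon classes used here might look as follows; the paths are placeholders that depend on where the dataset is stored:

train: ../weapon_dataset/images/train  # placeholder path
val: ../weapon_dataset/images/val      # placeholder path
nc: 3
names: ['pistol', 'knife', 'axe']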
The model architecture is determined by the model-configurations file. The P5 models that
Ultralytics supports include the following YOLOv5 architectures: YOLOv5n (nano), YOLOv5s
(small), YOLOv5m (medium), YOLOv5l (large), and YOLOv5x (extra large). These architectures work
well for training with 640x640 pixel images. An additional series, known as P6, is designed for
training with larger 1280x1280 images (YOLOv5n6, YOLOv5s6, YOLOv5m6, YOLOv5l6, YOLOv5x6). P6
models include an additional output layer for the detection of larger objects. They benefit the
most from training at higher resolution and deliver superior results. For each of the
aforementioned architectures, Ultralytics offers built-in model configuration files in the
'models' directory. If you are training a model from scratch, select the model-configurations
YAML file for the architecture you want (here, "YOLOv5s6.yaml"), then change the number of
classes (nc) parameter to reflect the number of classes present in your custom data.
The learning rate, momentum, losses and augmentations are all defined in the
hyperparameters-configurations file along with other training-related hyperparameters. The
directory "data/hyp/hyp.scratch.yaml" contains a default hyperparameters file provided by
Ultralytics. For the most part, starting your training with the default hyperparameters is
advised to establish a performance baseline.
3.5.4 Training
The model will perform best when it is trained entirely from scratch with a sufficiently
large dataset. In this thesis the dataset contains 2536 images with three classes. By giving the
weights argument an empty string (''), the weights are initialised at random. For training, we
provide the number of epochs, the batch size, the dataset path, the initial weights and the
image size. The script performs the training and then reports the results as accuracy, precision
and recall scores. In this thesis the number of epochs is 50, the batch size is 24 and the image
size is 640 x 640.
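With these settings, a representative training invocation would look like the following; the name of the data-configurations file (data.yaml) is assumed, and the empty weights string requests random initialisation:

python train.py --img 640 --batch 24 --epochs 50 --data data.yaml --cfg yolov5s6.yaml --weights ''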
3.5.5 Validation
The validation script is used to assess our model. The 'task' option controls whether
performance is evaluated on the training, validation or test split of the dataset.
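A sketch of such an evaluation, assuming the best weights were saved under the default runs directory, is:

python val.py --weights runs/train/exp/weights/best.pt --data data.yaml --img 640 --task test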
It is anticipated that transfer learning will lead to better outcomes than training from scratch.
Ultralytics' default models were pre-trained on the COCO dataset, although models pre-trained on
other datasets (VOC, Argoverse, VisDrone, GlobalWheat, xView, Objects365, SKU-110K) are also
supported. COCO is an object detection dataset that contains pictures of everyday scenes and has
80 classes. By giving the name of the pre-trained COCO model to the 'weights' argument, our
model is initialized with weights from that model, and the pre-trained model is downloaded
automatically.
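For instance, training could be started from the COCO-pretrained YOLOv5s6 checkpoint instead of random weights (a sketch, with the same illustrative file names as above):

python train.py --img 640 --batch 24 --epochs 50 --data data.yaml --weights yolov5s6.pt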
3.5.7 Feature Extraction
The backbone layer, which acts as a feature extractor, and the head layer, which computes
the output predictions, make up the two fundamental components of a model. To further
compensate for a small dataset size, we’ll utilize the same backbone as the pretrained COCO
model, and simply train the model’s head. The "freeze" parameter will fix the 12 layers that make
up the YOLOv5s6 backbone.
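A sketch of this feature-extraction setup, freezing the first 12 layers (the backbone) while training only the head, might be:

python train.py --img 640 --batch 24 --epochs 50 --data data.yaml --weights yolov5s6.pt --freeze 12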
The last potential training phase, called fine-tuning, consists of unfreezing the entire
model obtained previously and retraining it on our data at a very low learning rate. By gradually
adapting the pretrained features to the new data, this has the potential to produce significant
improvements. The learning rate is adjusted in the hyperparameters-configurations file. We use
the hyperparameters from the built-in "hyp.finetune.yaml" file, which specify a much lower
learning rate than the default. The weights saved in the previous stage are used as the initial
weights.
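As an illustrative sketch, fine-tuning then resumes from the weights saved in the previous stage with the low-learning-rate hyperparameters file referenced above; the paths are assumptions that depend on the repository version and run directory:

python train.py --img 640 --batch 24 --epochs 50 --data data.yaml --weights runs/train/exp/weights/best.pt --hyp data/hyp/hyp.finetune.yaml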
3.7 Summary
This chapter focused on a few key requirements, such as the system architecture of the
proposed system, the dataset used for the project, the hardware and software requirements,
training, validation, transfer learning and fine-tuning.
CHAPTER 4
Experimental Evaluation
4.1 Dataset
The dataset used for this study is a weapon dataset with 2536 images, containing
images of the knife, pistol and axe classes. The dataset consists of two parts: images and
labels. The information needed to detect weapons was gathered from publicly accessible websites,
CCTV videos on YouTube, GitHub repositories, and the imfdb.org online library of movie firearms.
Noisy data was removed from the dataset using image restoration, and the images were resized
according to the YOLO format. The dataset is divided into training and testing sets, with 80%
used for training and 20% for testing. In YOLOv5 the dataset consists of images and labels,
where the labels contain the bounding box coordinates. YOLOv5 also uses a YAML file, to which
the dataset information such as the number of classes, the class names and the dataset path is
provided.
In this thesis, four evaluation metrics are mainly used to assess weapon detection in CCTV
footage: accuracy, precision, recall and F1-score.
Accuracy = (TP + TN) / (TP + FP + TN + FN)    (1)
Precision = TP / (TP + FP)    (2)
Recall = TP / (TP + FN)    (3)
F1 Score = (2 × Precision × Recall) / (Precision + Recall)    (4)
All of the experiments in this thesis were performed using 4 GB of RAM, a 5th-generation
Intel Core i5 CPU, and a Google Colaboratory GPU with 4 GB of memory. The YOLOv5 system was
trained for 50 epochs with a batch size of 24 and a learning rate of 0.001 to identify weapons
in videos and images.
The mean average precision of the weapon detection system using the YOLOv5
algorithm is shown in Figure 4.1. It is clear that the accuracy improves over training; the
system's accuracy was close to 96.6%. Figure 4.2 shows the system's precision; the model
achieved 98% precision in our work. Figure 4.3 depicts the system recall, which is almost 95.7%.
Figure 4.2 Precision
Figures 4.4, 4.5 and 4.6 show the training box, class and object losses. The box loss
decreases as the number of epochs increases, there is no class loss in the model, and the object
loss also decreases with increasing epochs.
Figure 4.5 Training class loss
Figures 4.7, 4.8 and 4.9 show the validation box, class and object losses. The box loss
decreases and remains bounded as the number of epochs increases, there is no class loss in
validation, and the object loss also decreases with increasing epochs. The model is therefore
fit for detecting weapons in images and videos.
Figure 4.7 Validating box loss
Figure 4.9 Validating object loss
Figure 4.10 Output for Video
Figure 4.12 Output for Axe
Figure 4.10 shows the output obtained when a video input is processed, with a confidence
of 91%. Figure 4.11 shows the output of pistol detection in images with a confidence of 95%.
Figure 4.12 shows the output of axe detection in images with a confidence of 77%. Figure 4.13
shows the output of knife detection in images with a confidence of 79%.
4.5 Summary
This chapter provides a comprehensive, step-by-step examination of the results obtained
on the specified dataset. The experimental findings of the deep learning technique, the YOLOv5
algorithm, are shown in the figures above; they demonstrate strong performance metrics.
CHAPTER 5
REFERENCES
[9] T. S. S. Hashmi, N. U. Haq, M. M. Fraz and M. Shahzad, "Application of Deep Learning
for Weapons Detection in Surveillance Videos," 2021 International Conference on Digital
Futures and Transformative Technologies (ICoDT2), 2021, pp. 1-6, doi:
10.1109/ICoDT252288.2021.9441523.
[10] Singh, T. Anand, S. Sharma and P. Singh, "IoT Based Weapons Detection System for
Surveillance and Security Using YOLOV4," 2021 6th International Conference on
Communication and Electronics Systems (ICCES), 2021, pp. 488-493, doi:
10.1109/ICCES51350.2021.9489224.
[11] K. Ding, X. Li, W. Guo and L. Wu, "Improved object detection algorithm for drone-
captured dataset based on yolov5," 2022 2nd International Conference on Consumer
Electronics and Computer Engineering (ICCECE), 2022, pp. 895-899, doi:
10.1109/ICCECE54139.2022.9712813.
[12] L. Xiaomeng, F. Jun and C. Peng, "Vehicle Detection in Traffic Monitoring Scenes
Based on Improved YOLOV5s," 2022 International Conference on Computer Engineering
and Artificial Intelligence (ICCEAI), 2022, pp. 467-471, doi:
10.1109/ICCEAI55464.2022.00103.
[13] M. Jindal, N. Raj, P. Saranya and S. V, "Aircraft Detection from Remote Sensing Images
using YOLOV5 Architecture," 2022 6th International Conference on Devices, Circuits
and Systems (ICDCS), 2022, pp. 332-336, doi: 10.1109/ICDCS54290.2022.9780777.
[14] J. Zhou, M. Yan, C. Luo and X. Xing, "Underwater Sonar Target Detection Based on
YOLOv5," 2021 International Conference on Electronic Information Engineering and
Computer Science (EIECS), 2021, pp. 729-732, doi: 10.1109/EIECS53707.2021.9588050.
[15] Kisaezehra, M. U. Farooq, M. A. Bhutto and A. K. Kazi, "Real-time safety helmet
detection using yolov5 at construction sites," Intelligent Automation & Soft Computing,
vol. 36, no.1, pp. 911–927, 2023.
[16] Z. Li, J. Song, K. Qiao, C. Li, Y. Zhang and Z. Li, "Research on efficient feature
extraction: Improving YOLOv5 backbone for facial expression detection in live streaming
scenes," Frontiers in Computational Neuroscience, vol. 16, 980063, 2022, doi:
10.3389/fncom.2022.980063.
[17] W. Liu, Y. Hu and D. Fan, "Safety Helmet Wearing Recognition Based on Improved
YOLOv5," 2022 11th International Conference of Information and Communication
Technology (ICTech)), 2022, pp. 466-470, doi: 10.1109/ICTech55460.2022.00099.
CONFERENCE CERTIFICATE