
Volume 8, Issue 4, April – 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Abnormal Vehicle Behavior Detection using Deep Learning and Computer Vision

1. Susmitha John, Computer Science and Engineering (of KTU), St. Joseph's College of Engineering and Technology (of KTU), Kottayam, India
2. Bino Thomas, Computer Science and Engineering (of KTU), St. Joseph's College of Engineering and Technology (of KTU), Kottayam, India

Abstract:- In the modern era, the use of video surveillance has increased, which in turn increases the volume of data. Video surveillance is widely used in both public and private areas to improve the security and safety of people. Hence, it is important to identify and analyse video from different angles so as to extract the most important information from it. A video may contain both usual and unusual events, and users mostly need to find the unusual events that may affect their security. To differentiate the two kinds of events, we consider a special scenario related to vehicles. Vehicles on the road can move in different ways: they can follow or violate traffic rules, make illegal U-turns, be involved in accidents, etc. In this paper, the unusual event considered is an accident on the road. The technologies used are deep learning and computer vision. The neural network selected is DenseNet, a convolutional neural network. The peculiarity of the DenseNet architecture is that each layer in the network is connected to every other layer. For each layer, the feature maps of all the preceding layers are used as inputs, and its own feature maps are used as inputs for each subsequent layer. The deployment of DenseNet along with computer vision increases the accuracy of the system.

Keywords: Deep Learning, Computer Vision, Segmentation, Tracking.

I. INTRODUCTION

The increase in population also increases the need for the safety and security of human beings in public and private areas. The use of video surveillance has become a major concern of everyday life. As a consequence, cameras have been deployed almost everywhere. Video surveillance is widely used in smart cities, smart offices, etc. Such videos are analyzed and studied with different technologies to extract important information. This is currently a well-researched area with many applications. The most attractive areas include activity recognition from video surveillance systems. The main focus is on understanding the activities involved, in order to detect and classify the targets of interest and analyze the activities included in the data. The detection and reporting of situations of special interest from a video is a vital step where unexpected things may happen. In such cases, a video surveillance system that can easily interpret the scenes and automatically recognize abnormal behaviors can play an important role in data analytics. The system would then notify operators or users accordingly. This technology includes detecting, tracking and counting all the movable objects in a video, analyzing their behaviors, and responding to them accordingly. The most challenging part is detecting abnormal events in a video and reporting them to the responsible authority. Abnormal behavior is difficult to explain, but it can easily be noticed when it happens. Abnormal behavior is a psychological term for actions that differ from what is considered normal in a particular society, culture or other environment. This definition of abnormal behavior is functional and useful for many purposes. However, most definitions of abnormal behavior also take into account that, from a psychological point of view, mental illness, pain, and stress often play a major role in behavioral patterns. Abnormal events include unwanted or unpredicted situations such as road accidents, traffic violations, etc.

Video from a surveillance system can be monitored and analyzed to detect objects, which has several applications. Enhancements in video surveillance systems also allow videos to be edited and stored more efficiently. The processing and analysis of such video is of great importance, since it contains much valuable information that can be used to find different activities in the video. Current video surveillance can use many interesting technologies such as computer vision and deep learning.

• Objective and Scope:
Capturing video and processing it for further analysis to extract important features is a challenging task. The data must be processed according to the area of interest, because the whole data is not needed. We have to simplify and change the representation of an image into something that is more meaningful and easier to analyze. An abnormal behavior detection framework based on a deep learning algorithm is used. The objectives of the proposed system are:

• Developing a system for detecting abnormal vehicle behavior.
• Performing detection with a specialized framework in which both neural network and computer vision technologies are used.

IJISRT23APR1240 www.ijisrt.com 892


• Achieving high accuracy with a less complex structure.

The normal movement of a vehicle is considered a normal event, whereas any accident situation is considered an abnormal event. Abnormal events are always something unexpected compared with the normal situation. The use of a neural network helps us to differentiate between these two cases.

The scope of the system includes deployment in different areas such as smart cities. Also, the use of computer vision helps computers gain a high-level understanding of videos and images.

II. LITERATURE SURVEY

• Unsupervised Anomaly Detection
The significance of anomaly identification in traffic footage for intelligent transportation systems has recently attracted more attention. The paper [2] introduces a fast unsupervised anomaly detection system that consists of three modules: pre-processing, candidate selection, and backtracking anomaly detection. The outputs of the pre-processing module include stationary objects found in the videos. The candidate selection module then removes incorrectly classified stationary objects using a nearest neighbour method and employs K-means clustering to discover probable anomalous regions. The backtracking anomaly detection algorithm then computes a similarity statistic and determines the onset time of the anomaly.

• Temporal Segmentation
The paper [3] introduces temporal segmentation and keyframe selection techniques for user-generated video. Because user-generated video is rarely arranged in shots and user interests are typically revealed through camera movements, a temporal segmentation technique is proposed that partitions the video based on a categorization of camera motion. Motion-related mid-level information is fed to a Hierarchical Hidden Markov Model (HHMM), which generates a user-meaningful temporal segmentation of the user-generated video. A keyframe selection approach is also suggested that chooses a single keyframe for camera motion patterns with fixed content, such as zoom, still, or shake, and a group of keyframes for translation patterns with dynamic content.

• YOLO
Intelligent real-time systems that can identify unusual vehicle actions may notify law enforcement and transportation organisations of potential offenders and help prevent traffic accidents. The study [4] addresses this issue by creating an application for identifying anomalous driving behaviour from traffic video content. Real videos from traffic cameras are used for evaluation in order to find halted cars and other potential anomalies in driving behaviour. The suggested algorithm for detecting aberrant vehicle behaviour consists of the following steps:

• Step 1: Use real-time traffic video sources to identify vehicles.
• Step 2: Use the identified vehicles to extract vehicle features.
• Step 3: Use the tracked vehicles to detect traffic anomalies.

• Deep Spatio-Temporal Representation
A novel approach for the automatic detection of traffic accidents in video surveillance was put forth in [7]. Instead of using conventional hand-crafted features, the proposed approach automatically learns the feature representation from spatiotemporal patterns of raw pixel intensity values. A vehicle crash is defined as an exceptional occurrence or event. The suggested system uses denoising autoencoders that have been trained on videos of normal traffic to extract a deep representation. The likelihood of the deep representation and the reconstruction error are used to calculate the probability of an accident. A one-class support vector machine is used to train an unsupervised model for the likelihood of the deep representation. Additionally, the intersection points of the vehicles' trajectories are used to lower the false-alarm rate and boost overall system reliability.

• Adaptive Video-Based Algorithm
A vision-based method for detecting traffic accidents on highways and expressways is presented in [8]. The approach is based on an adaptive traffic motion flow modelling technique that uses Farneback Optical Flow for motion detection and a statistical heuristic approach for accident identification. The algorithm was applied to a collection of videos of traffic and accidents on highways. The results demonstrate the effectiveness and applicability of the suggested approach when just 240 frames are used to describe traffic movements. This approach avoids the need for a sizable database, given the absence of suitable and widely available accident video benchmarks.

• SVM
The paper [9] proposes to produce, in an automated fashion, statistics that are important for regulators and policy makers. These statistics include lane usage monitoring, vehicle counting, vehicle speed estimation from video and classification of vehicle type. The vital part of such a system is to detect and classify the vehicles in traffic videos. For this purpose, two models are implemented: the first is a Mixture of Gaussians with SVM system, and the second is based on Faster RCNN, a recently developed and popular deep learning architecture for detecting objects in images. In the experiments, Faster RCNN performs better than the Mixture of Gaussians in detecting vehicles that are static, overlapping or in other situations such as night-time conditions. Faster RCNN also performs better than the SVM in classifying vehicle types based on appearance.

• Cooperative Vehicle Infrastructure System
In [10], a method for the automatic detection of car accidents is proposed, based on Cooperative Vehicle Infrastructure Systems (CVIS) and machine vision. Firstly, the CAD-CVIS dataset is established.

CAD-CVIS is a novel image dataset intended to increase the accuracy of accident detection based on smart roadside devices in CVIS. In particular, CAD-CVIS covers a variety of accident types, weather conditions and accident locations, which helps to improve the self-adaptability of accident detection methods across different traffic situations. Secondly, a deep neural network model, YOLO-CA, is developed based on CAD-CVIS and deep learning algorithms to detect accidents. To enhance the detection of small objects, the model uses Multi-Scale Feature Fusion (MSFF) and a loss function with dynamic weights.

• Trajectory Tracking Based Method
Here, abnormal vehicle behavior is detected by tracking trajectories. The complete procedure is divided into three main steps: target detection and vehicle tracking, analysis of vehicle trajectories, and vehicle behavior analysis [11]. Firstly, a three-frame differencing method is used to obtain the initial target location, and an improved tracking algorithm based on the Kalman predictor is proposed; then, an adaptive segmented linear fitting algorithm is proposed to fit the vehicle trajectories. To establish the abnormal vehicle behavior detection model, two parameters are used: the velocity variation rate and the direction variation rate.

• Deep Learning Based Methods
A deep learning-based method is proposed for detecting salient regions in videos [15]. It mainly addresses two important issues: first, training a deep video saliency model in the absence of sufficiently large, pixel-wise annotated video data; and second, fast video saliency training and detection. The proposed system mainly consists of two modules, one capturing spatial and the other temporal saliency information. The dynamic saliency model, which explicitly incorporates saliency estimates from the static saliency model, directly produces spatiotemporal saliency inference without time-consuming optical flow computation. A novel data augmentation approach simulates training video data from existing annotated image datasets, which enables the network to learn diverse saliency information and prevents overfitting with the limited number of training videos. Using both the synthetic video data and real videos, the deep video saliency model efficiently learns both spatial and temporal saliency cues, thus producing an accurate spatiotemporal saliency estimate.

Abnormal vehicle behavior analysis is a challenging field in surveillance video, mainly because of the huge variation among anomaly cases and the high complexity of video surveillance. The study [12] proposes a novel smart vehicle behavior analysis framework based on a digital twin. The first step is to detect vehicles using deep learning; Kalman filtering and feature matching are then used to track the vehicles. After that, the tracked vehicles are mapped to a digital twin virtual scene. The behavior of the vehicles is tested according to the modified detection conditions set up in the scene. This process is repeated, and the stored behavior data can be used to reconstruct the scene again for a secondary analysis.

In the work [1], the assumption is made that visual elements occurring in a temporal sequence reflect the occurrence of traffic accidents. Video segmentation is divided into two categories, spatial and temporal. To locate objects spatially in different frames, spatial segmentation classifies the objects of interest in the video. The segmentation helps in identifying and tracking the objects in the video. The proposed model architecture extracts the visual features and then identifies temporal patterns. A public dataset is used in the training phase, where the visual and temporal features are learned using convolutional and recurrent networks.

In computer vision, deep architectures with convolutional structures have been found to be highly efficient and are frequently used. For deep learning algorithms, Graphics Processing Units (GPUs) are found to be more effective because of their high processing power. The availability of large amounts of data has also made it possible to train deep neural networks efficiently and without delay. The main aim of the paper [13] is to perform a systematic study exploring the prevailing research on implementations of computer vision approaches based on deep learning algorithms and Convolutional Neural Networks (CNN). A total of 119 papers were selected and classified according to field of interest, network type, learning paradigm, research type and contribution type. The study reveals that this field is a promising and trending area for research. To explore a computer vision task, human pose estimation in video frames was chosen. After the study, three research directions were proposed: improving the existing CNN implementations, using Recurrent Neural Networks (RNNs) for human pose estimation, and finally relying on unsupervised learning models to train neural networks.

A technique for learning anomalies by utilizing both normal and anomalous videos is suggested in [5]. To avoid annotating the anomalous segments or clips in training videos, which is quite time-consuming, anomalies are learned through a deep multiple-instance ranking framework that uses weakly labelled training videos, meaning that the training labels (anomalous or normal) are at the video level rather than the clip level. The method uses multiple instance learning (MIL), treating normal and anomalous videos as bags and video segments as instances, to automatically learn a deep anomaly ranking model that predicts higher anomaly scores for anomalous video segments. In order to localize anomalies more accurately during training, sparsity and temporal smoothness constraints are also integrated into the ranking loss function.

The contribution of [14] is three-fold. Firstly, a two-stream ConvNet architecture is proposed.

This architecture includes spatial and temporal networks. Secondly, it is demonstrated that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data. Finally, multitask learning applied to two different action classification datasets is used to increase the amount of training data and improve performance. The architecture is trained and evaluated on the standard video action benchmarks UCF-101 and HMDB-51. The use of convolutional networks shows good performance in video classification.

In the research [6], an integrated two-stream convolutional network framework is developed that can identify vehicles in traffic surveillance video in real time and track them while also identifying serious accidents. The two-stream model comprises a spatial stream network for object detection and a temporal stream network that uses motion features for multiple-object tracking. By combining motion and appearance features from these two networks, near-accidents can be detected. Furthermore, it is shown on a range of videos gathered from fisheye and overhead cameras that the methods can be used in real time, and even at a frame rate faster than that of the video.

A residual learning framework [16] is presented to ease the training of networks that are considerably deeper than those used previously. Instead of learning unreferenced functions, the layers are explicitly reformulated as learning residual functions with reference to the layer inputs. Comprehensive empirical evidence shows that these residual networks are easier to optimize and can gain accuracy from considerably increased depth. The residual networks are evaluated on the ImageNet dataset with a depth of up to 152 layers, which is 8 times deeper than VGG nets while still having lower complexity.

The Residual Attention Network [17] is a convolutional neural network using an attention mechanism, which can be incorporated into state-of-the-art feed-forward network architectures in an end-to-end training fashion. The Residual Attention Network is built by stacking Attention Modules that generate attention-aware features. The attention-aware features from different modules change adaptively as the layers get deeper. Inside each Attention Module, a bottom-up top-down feedforward structure combines the feedforward and feedback attention processes into a single feedforward process. Importantly, attention residual learning is proposed to train very deep Residual Attention Networks that can easily be scaled up to hundreds of layers.

III. METHODOLOGY

• Proposed System
The proposed methodology is mainly based on computer vision and deep learning techniques used in video analytics. For this purpose, video data containing normal and abnormal cases of vehicles is used. The captured video is segmented into frames using computer vision techniques, for which OpenCV is used. For further data processing, a deep learning neural network is used. Deep learning identifies the features by itself, making detection easier and more effective. Within deep learning, the convolutional neural network, a type of artificial neural network, is widely used for object classification and recognition. Recent studies have shown that a convolutional network can be more accurate and efficient to train if it contains shorter connections between layers close to the input and layers close to the output. This gave rise to a new neural network, the Dense Convolutional Network (DenseNet). DenseNet is used for abnormal vehicle detection in the proposed system. The frames obtained from the videos are used for training the neural network. The dataset is divided into three sets: a training set, a testing set and a validation set. The data is passed through the different layers of the network for feature extraction and classification. The integration of computer vision with the neural network increases system performance.

• Abnormal Vehicle Behavior Detection Framework
Abnormal behavior detection involves several steps. It starts with a video sequence obtained from video surveillance. The neural network cannot directly handle video data, so before the input is forwarded to the DenseNet, the first step is to convert the video sequence into the corresponding frames. Computer vision techniques support the pre-processing of the data using the OpenCV library, which includes functions that can be used for pre-processing. The pre-processing includes resizing, noise removal, etc. The pre-processed image is given as input to the network. During the training phase, the model is trained on training data labelled as accident and no accident. In this phase the system extracts all the relevant features by itself. In the testing phase, some data is given as input to the trained model to predict the output. If it predicts the output correctly, we can conclude that the system is accurate and efficient.

• Computer Vision
Computer vision enables the system to study and understand images and to derive important features from them. Images can be further identified and processed using computer vision. OpenCV is an image processing library that contains programming functions. OpenCV stands for Open-Source Computer Vision Library; it facilitates research in the computer vision domain and provides strong support for advanced CPU-based projects. It is freely available for both commercial and academic purposes. OpenCV supports programming languages such as Python, C++ and Java as well as commonly used operating systems such as Linux, Windows, iOS, Mac OS, and Android. In real-time applications computational efficiency is an important factor, and OpenCV was designed mainly for this purpose. OpenCV makes use of multi-core processing. OpenCV also works with a wide range of deep learning libraries such as PyTorch, which provide easy implementation of neural networks with good support for data processing. The video can be pre-processed using the OpenCV library, which offers many functions for different purposes.
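The conversion of a video sequence into frames, described above as the first step of the detection framework, might be sketched as follows with OpenCV's Python bindings. This is only a minimal sketch under stated assumptions: the function names `iter_frames` and `extract_frames`, the sampling interval `every_n`, the output file naming and the 224x224 target size are illustrative choices, not details taken from the proposed system.

```python
import os


def iter_frames(capture, every_n=1):
    """Yield (index, frame) pairs from an object with a cv2.VideoCapture-style read()."""
    index = 0
    while True:
        ok, frame = capture.read()  # ok becomes False once the stream is exhausted
        if not ok:
            break
        if index % every_n == 0:
            yield index, frame
        index += 1


def extract_frames(video_path, out_dir, every_n=1, size=(224, 224)):
    """Save every `every_n`-th frame of a video as a resized JPEG and return the count."""
    import cv2  # imported here so iter_frames stays usable without OpenCV installed

    capture = cv2.VideoCapture(video_path)
    if not capture.isOpened():
        raise IOError(f"cannot open video: {video_path}")
    os.makedirs(out_dir, exist_ok=True)
    saved = 0
    try:
        for index, frame in iter_frames(capture, every_n):
            resized = cv2.resize(frame, size)  # assumed network input size
            cv2.imwrite(os.path.join(out_dir, f"frame_{index:06d}.jpg"), resized)
            saved += 1
    finally:
        capture.release()
    return saved
```

A call such as `extract_frames("traffic.mp4", "frames", every_n=5)` (hypothetical file and directory names) would write every fifth frame of the video into the `frames` directory.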

Our main aim is to convert the video sequence into frames with the help of OpenCV.

Fig 1 Proposed System Framework

• Pre-processing
The pre-processing step includes a normalization technique. Normalization refers to normalizing the data dimensions so that they are of approximately the same scale. For image data there are two common ways of achieving this normalization. One is to divide each dimension by its standard deviation, once it has been zero-centered.

The pre-processing of images can also be done with a normalization technique that rescales each dimension so that the minimum and maximum values along the dimension fall in a particular range. The purpose of the pre-processing step is to bring the different values of the input image to a similar scale. If the input values already have similar scales, no such pre-processing is required; it only makes sense when the input features have different scales but should be of approximately equal importance to the learning algorithm. For images, the relative scales of the pixels are already approximately equal, since pixel values range from 0 to 255, so it is not strictly necessary to perform this additional pre-processing step.

A Convolutional Neural Network is a deep learning algorithm that takes an image as input and assigns weights and biases to different aspects of the objects in it, in order to differentiate one from another. It is a type of artificial neural network used in image processing and recognition that is designed to process pixel data. Convolutional Neural Networks are used in image classification, where the network recognizes valuable features by identifying different objects in the images. A Convolutional Neural Network requires only a few pre-processing steps.

The vanishing gradient problem occurs in traditional convolutional neural networks as the layers get deeper, and it is a problem that has to be overcome. As a solution, the Dense Convolutional Network (DenseNet) was developed, in which each layer is connected to every other layer in a feed-forward fashion. A traditional convolutional network with $L$ layers has $L$ connections, one between each layer and its succeeding layer, whereas a DenseNet has a total of $L(L+1)/2$ direct connections. For each layer, the feature maps of all the preceding layers are used as inputs, and its own feature maps are used as inputs into all subsequent layers. Concatenation is used for this. Each layer receives the collective knowledge of all the preceding layers.

The neural network can be thinner and more compact, because each layer receives the feature maps of all preceding layers, that is, the number of channels can be fewer. The growth rate $k$ is the number of additional channels contributed by each layer. The network therefore has higher computational efficiency and memory efficiency. DenseNets have several compelling advantages: they reduce the vanishing-gradient problem, encourage feature reuse, substantially reduce the number of parameters and strengthen feature propagation.

In the image below, consider a convolutional network in which a single image $x_0$ is given as the input. The network comprises $L$ layers, each of which implements a non-linear transformation $H_\ell(\cdot)$, where $H_\ell(\cdot)$ is a composite function of three operations: Batch Normalization (BN), rectified linear units (ReLU) and Convolution (Conv). The output of the $\ell$-th layer is denoted $x_\ell$. The components of the DenseNet include [19]: Connectivity, DenseBlocks, Growth Rate and Bottleneck layers.

Fig 2 Dense Block

• Connectivity
The feature maps from all the preceding layers are combined by concatenation, instead of the summation used in residual networks, and the result of the concatenation is used as the input of each layer. Consequently, DenseNet needs fewer parameters than a traditional Convolutional Neural Network, because redundant feature maps do not have to be relearned, which allows feature reuse. The $\ell$-th layer therefore receives the feature maps of all the preceding layers, $x_0, \ldots, x_{\ell-1}$, as input:

$x_\ell = H_\ell([x_0, x_1, \ldots, x_{\ell-1}])$    (1)

where $[x_0, x_1, \ldots, x_{\ell-1}]$ denotes the concatenation of the feature maps produced in the preceding layers $0, \ldots, \ell-1$. For ease of implementation, the multiple inputs of $H_\ell$ are concatenated into a single tensor.

• Dense Blocks
When the size of the feature maps changes, the concatenation operation cannot be used. To obtain higher computational speed, down-sampling must be done in layers that reduce the size of the feature maps.
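The connectivity of Eq. (1), and the reason why concatenation needs feature maps of equal size, can be illustrated by tracking tensor shapes only. The sketch below is a plain-Python illustration rather than the network used in this work: shapes are written as (channels, height, width), the helper names are hypothetical, and the values of 16 initial channels, growth rate 12 and 32x32 feature maps are example assumptions.

```python
def concat_channels(shapes):
    """Return the (C, H, W) shape produced by channel-wise concatenation."""
    sizes = {(h, w) for _, h, w in shapes}
    if len(sizes) != 1:
        # Feature maps with different spatial sizes cannot be concatenated.
        raise ValueError(f"cannot concatenate feature maps of sizes {sorted(sizes)}")
    h, w = next(iter(sizes))
    return (sum(c for c, _, _ in shapes), h, w)


def dense_block_input_shapes(k0, growth_rate, num_layers, size):
    """Input shape seen by each layer H_l when every layer adds `growth_rate` channels."""
    h, w = size
    outputs = [(k0, h, w)]  # x_0: the input of the block
    inputs = []
    for _ in range(num_layers):
        inputs.append(concat_channels(outputs))  # [x_0, ..., x_{l-1}]
        outputs.append((growth_rate, h, w))      # x_l = H_l(...) contributes k channels
    return inputs


def direct_connections(num_layers):
    """Layer l receives l inputs (x_0 .. x_{l-1}); summing over l = 1..L gives L(L+1)/2."""
    return num_layers * (num_layers + 1) // 2


if __name__ == "__main__":
    for layer, shape in enumerate(dense_block_input_shapes(16, 12, 4, (32, 32)), start=1):
        print(f"layer {layer} input shape: {shape}")
    print("direct connections for L=4:", direct_connections(4))
```

Under these assumptions the number of input channels grows by the growth rate with every layer, while the height and width stay fixed inside the block; calling `concat_channels` on maps of different spatial size raises an error, which mirrors the restriction stated above.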

maps by reduction in dimensionality, which is considered an essential step of a Convolutional Neural Network.

To enable this functionality, the DenseNet is divided into DenseBlocks. Within each DenseBlock the dimensionality of the feature maps is constant, but the number of filters used changes. Transition Layers are the layers in between blocks, which reduce the number of channels to half of the existing channels.

Fig 3 Propagation of Input through DenseNet

The figure above shows a deep DenseNet with three dense blocks. Down-sampling (i.e. changing the feature-map size) is performed through convolution and pooling operations in the transition layers, that is, the layers between two adjacent blocks. To enable feature concatenation, the feature-map size is kept the same within a dense block, which is considered an advantage of this neural network.

The convolutional layer is the first step in extracting useful or important information from images. With the help of convolution, image features are learned from small squares of input data, which conserves the relationship between the pixels of the frames or images. Mathematically, convolution is an operation on two inputs, a matrix and a kernel, where the matrix is a part of the image.

When the given image is too large, pooling layers reduce the number of parameters, which is considered their main job. Spatial pooling, also termed down-sampling or sub-sampling, helps in maintaining the most relevant information while diminishing the dimensionality of each feature map.

• Growth Rate
The features can be considered as a global state of the neural network. Each dense layer adds $k$ features on top of the existing features, so the number of feature maps increases as the input propagates through the dense layers. The growth rate of the network is referred to as $k$. This parameter controls the amount of information added in each layer of the neural network. If $k$ feature maps are produced by each $H_\ell$ function, then the $\ell$th layer has

$$k_\ell = k_0 + k(\ell - 1) \qquad (2)$$

input feature-maps, where $k_0$ is defined as the number of channels in the input layer. DenseNet has very thin layers when compared with existing neural network architectures.

• Bottleneck Layers
When there are many layers, the number of inputs can also be quite high, even though each layer produces only $k$ output feature-maps. Therefore, a bottleneck layer, a 1x1 convolution layer, is introduced before each 3x3 convolution, which can improve the speed of computations and the efficiency of the network.

The deeper layers reuse the features already extracted, spreading their weights over all layers within the dense block and the transition layer [20]. Since the output of the transition layers contains many redundant features, the layers of the second and third dense blocks assign the least weight to the outputs of the transition layers. As the model becomes deeper, more high-level and relevant features are generated; although the final layers use weights across the entire dense block, these weights appear to be concentrated towards the last feature maps.

The DenseNet used in this experiment has dense blocks that each have an equal number of layers. Before entering the first dense block, a convolution with 16 (or twice the growth rate for DenseNet) output channels is performed on the input images. For convolutional layers with kernel size 3x3, each side of the inputs is zero-padded by one pixel to keep the feature-map size fixed. A 1x1 convolution followed by 2x2 average pooling is used as the transition layer between two contiguous dense blocks. At the end of the last dense block, a global average pooling is performed and then a softmax classifier is attached. The feature-map sizes in the three dense blocks are 32x32, 16x16, and 8x8, respectively.

A DenseNet structure with 4 dense blocks on 224x224 input images is used. The initial convolution layer comprises 2k convolutions of size 7x7 with stride 2; the number of feature-maps in all other layers also follows from setting k.

The main advantages of using DenseNet include:

• Parameter efficiency: In DenseNet only a limited number of parameters is added in each layer, that is, only 12 kernels are learned per layer.
• Implicit deep supervision: The gradient flow through the network is improved, that is, the feature maps in each layer have direct access to the loss function and its gradient.

• Dependencies

• Anaconda
Anaconda is an open-source software and environment management system used for data analytics, data processing, etc. Anaconda runs on Linux, Windows and macOS. It can be used to install, update and run packages easily, and it can switch between local environments on the computer.

• OpenCV
OpenCV is an open-source library mainly used for image processing, computer vision, and machine learning tasks. It plays an important role in real-time operation on data, which has great impact in today’s systems. By using OpenCV, the image and video data can be

processed for various applications such as recognition of human handwriting, objects or faces. When it is integrated with libraries such as NumPy, Python becomes more capable of processing the array structures of OpenCV for analysis.

To identify an image pattern and its corresponding features, a vector space is used and mathematical operations are performed on these features. The main purpose of computer vision is to understand the content of the given images. It extracts the necessary description from the pictures, which may be an object, a three-dimensional model, a text description and so on.

• Visual Studio Code
Visual Studio Code is a powerful yet lightweight source code editor that runs on the desktop. It is available for Linux, macOS and Windows. It contains built-in support for Node.js, JavaScript and TypeScript, and it has a rich ecosystem of extensions for other programming languages. Visual Studio (VS) Code is used to find and correct coding errors in cloud and web-based applications. VS Code is an open-source code editor.

• PyTorch
PyTorch is an open-source machine learning library for Python, developed using Torch. It is free and was developed by the AI Research lab of Facebook. It mainly focuses on natural language processing, computer vision and deep learning, among several other applications. Using PyTorch, a programmer can easily build complex neural networks, since its core data structure, the Tensor, is a multi-dimensional array similar to NumPy arrays.

Features like flexibility, speed, and ease of use make PyTorch frequently used in current industry and in research. PyTorch runs projects quickly, which makes it one of the top deep learning tools. PyTorch is one of the best open-source libraries for image classification, object detection and many other applications. The version of PyTorch used in this work is PyTorch 1.0.1. Using PyTorch, a programmer can process images and videos to develop a highly accurate and precise computer vision model.

IV. RESULTS

The proposed methodology for detecting abnormal vehicle behavior can process the data in an efficient way using deep learning and computer vision. The results demonstrate the efficiency of the system. Compared with the existing system, the usage of DenseNet makes the framework more effective since it reduces the number of parameters considered. To verify the robustness of the proposed system, videos of different situations taken from the Internet are used. The system correctly identified the unusual and usual events in an efficient way using the DenseNet with fewer parameters, and classified the events into accident and no accident. The detection framework classifies each frame from a video with an accuracy of 97 percent. The obtained output is shown below. Fig. 4 shows the abnormal behavior of the vehicle where the accident has occurred. In Fig. 5, the normal behavior of the vehicle is detected.

Fig 4 Abnormal Behavior

Fig 5 Normal Behavior

Fig 6 Testing Result

V. CONCLUSION

In the smart security field, abnormal behavior detection from videos is a trending and vast research area. A variety of definitions can be given to abnormal behavior, depending on the surveillance video objects and surveillance scenes. Among the different kinds of abnormal behavior, this research mainly focuses on the detection of abnormal behavior among vehicles. For the abnormal behavior detection, a deep learning algorithm-based framework is used. First, the input video is preprocessed using the OpenCV computer vision library. When the preprocessed data is loaded into DenseNet, it is processed through the different layers. The network is trained using the dataset and detects whether the frames are abnormal or normal. The number of parameters is reduced with the help of DenseNet, which in turn increases the performance of the system. The results show that the accuracy has reached a better level.

• Scope for Further Work
The proposed system can be implemented in smart cities and intelligent traffic systems. The implementation of

such a system will help in identifying accidents quickly and taking immediate action, so as to reduce the death rate and increase human security. In future, the system can be trained on a larger amount of data, thereby increasing the system accuracy to the next level. By training with a variety of data, the system will be able to classify a wide range of data into the correct class.

REFERENCES

[1]. Robles-Serrano, Sergio, German Sanchez-Torres, and John Branch-Bedoya. 2021. “Automatic Detection of Traffic Accidents from Video Using Deep Learning Techniques”. Computers 10, no. 11: 148. https://doi.org/10.3390/computers10110148.
[2]. Doshi, Keval and Y. Yilmaz. “Fast Unsupervised Anomaly Detection in Traffic Videos”. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2020): 2658-2664.
[3]. Iván González-Díaz, Tomás Martínez-Cortés, Ascensión Gallardo-Antolín, Fernando Díaz-de-María, “Temporal segmentation and keyframe selection methods for user-generated video search-based annotation”, Expert Systems with Applications, Volume 42, Issue 1, 2015, Pages 488-502, ISSN 0957-4174, https://doi.org/10.1016/j.eswa.2014.08.001.
[4]. Wang, C., Musaev, A., Sheinidashtegol, P., Atkison, T. (2019). “Towards Detection of Abnormal Vehicle Behavior Using Traffic Cameras”. In: Chen, K., Seshadri, S., Zhang, LJ. (eds) Big Data – BigData 2019. BIGDATA 2019. Lecture Notes in Computer Science, vol 11514. Springer, Cham. https://doi.org/10.1007/978-3-030-23551-2_9.
[5]. W. Sultani, C. Chen and M. Shah, “Real-World Anomaly Detection in Surveillance Videos”, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 6479-6488, doi: 10.1109/CVPR.2018.00678.
[6]. Huang, Xiaohui & He, Pan & Rangarajan, Anand & Ranka, Sanjay. (2019). “Intelligent Intersection: Two-Stream Convolutional Networks for Real-time Near Accident Detection in Traffic Video”.
[7]. D. Singh and C. K. Mohan, “Deep Spatio-Temporal Representation for Detection of Road Accidents Using Stacked Autoencoder”, in IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 3, pp. 879-887, March 2019, doi: 10.1109/TITS.2018.2835308.
[8]. Maaloul, B.; Taleb-Ahmed, A.; Niar, S.; Harb, N.; Valderrama, C. “Adaptive video-based algorithm for accident detection on highways”. In Proceedings of the 2017 12th IEEE International Symposium on Industrial Embedded Systems, SIES 2017, Toulouse, France, 14–16 June 2017.
[9]. Arinaldi, A.; Pradana, J.A.; Gurusinga, A.A. “Detection and Classification of Vehicles for Traffic Video Analytics”. Procedia Comput. Sci. 2018, 144, 259–268.
[10]. D. Tian, C. Zhang, X. Duan and X. Wang, “An Automatic Car Accident Detection Method Based on Cooperative Vehicle Infrastructure Systems”, in IEEE Access, vol. 7, pp. 127453-127463, 2019, doi: 10.1109/ACCESS.2019.2939532.
[11]. Enyuan Jiang, Xuejun Wang. 2015. “Abnormal Vehicle behavior detection based on tracking trajectory”. In Journal of Computer and Communications. http://dx.doi.org/10.4236/jcc.2015.
[12]. Li, L., Hu, Z. & Yang, X. “Intelligent Analysis of Abnormal Vehicle Behavior Based on a Digital Twin”. J. Shanghai Jiaotong Univ. (Sci.) 26, 587–597 (2021). https://doi.org/10.1007/s12204-021-2348-7
[13]. Nishani, E.; Cico, B. “Computer vision approaches based on deep learning and neural networks: Deep neural networks for video analysis of human pose estimation”. In Proceedings of the 2017 6th Mediterranean Conference on Embedded Computing (MECO), Bar, Montenegro, 11–15 June 2017; pp. 11–14.
[14]. Qiao, Han & Liu, Shuang & Xu, Qingzhen & Liu, Shouqiang & Yang, Wanggan. (2021). “Two-Stream Convolutional Neural Network for Video Action Recognition”. KSII Transactions on Internet and Information Systems. Vol. 15. 3668-3683. 10.3837/tiis.2021.10.011.
[15]. W. Wang, J. Shen and L. Shao, “Video Salient Object Detection via Fully Convolutional Networks”, in IEEE Transactions on Image Processing, vol. 27, no. 1, pp. 38-49, Jan. 2018, doi: 10.1109/TIP.2017.2754941.
[16]. K. He, X. Zhang, S. Ren and J. Sun, “Deep Residual Learning for Image Recognition”, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.
[17]. Wang, Fei & Jiang, Mengqing & Qian, Chen & Yang, Shuo & Li, Cheng & Zhang, Honggang & Wang, Xiaogang & Tang, Xiaoou. (2017). “Residual Attention Network for Image Classification”. 6450-6458. 10.1109/CVPR.2017.683.
[18]. Gao Huang, Zhuang Liu and Laurens van der Maaten. Jan 2018. “Densely Connected Convolutional Networks”.
[19]. OpenGenus IQ: Computing Expertise & Legacy. Architecture of DenseNet-121 [Online]. https://iq.opengenus.org/architecture-of-densenet121.
[20]. Noh, Kyoung & Choi, Jiho & Hong, Jin & Park, Kang. (2020). “Finger-Vein Recognition Based on Densely Connected Convolutional Network Using Score-Level Fusion With Shape and Texture Images”. IEEE Access. PP. 1-1. 10.1109/ACCESS.2020.2996646.
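
The channel bookkeeping described under Dense Blocks, Growth Rate and Bottleneck Layers can be checked with a few lines of arithmetic. The sketch below is only an illustration in plain Python and is not the authors' implementation. The function names are hypothetical, and the example values (k0 = 16, k = 12, six layers per block) are assumptions: 16 and 12 appear in the text as the initial convolution width and the number of kernels learned per layer, while the block depth is not stated there.

```python
def direct_connections(num_layers):
    """Direct connections among L densely connected layers: L(L+1)/2."""
    return num_layers * (num_layers + 1) // 2


def input_channels(layer_index, k0, growth_rate):
    """Input feature-maps seen by layer l (1-based): k0 + k * (l - 1), as in Eq. (2)."""
    return k0 + growth_rate * (layer_index - 1)


def block_output_channels(num_layers, k0, growth_rate):
    """Channels leaving a dense block: the block input plus k new maps per layer."""
    return k0 + growth_rate * num_layers


def transition_channels(channels):
    """Transition layer as described in the text: halve the number of channels."""
    return channels // 2


if __name__ == "__main__":
    k0, k, layers = 16, 12, 6  # illustrative values only (assumed block depth)
    for l in range(1, layers + 1):
        print(f"layer {l}: {input_channels(l, k0, k)} input feature-maps")
    out = block_output_channels(layers, k0, k)
    print("block output channels:", out)
    print("after transition:", transition_channels(out))
    print("direct connections:", direct_connections(layers))
```

With these assumed values the input to each layer grows linearly with depth, which is why a 1x1 bottleneck before each 3x3 convolution becomes useful in deeper blocks.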

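
The feature-map sizes quoted in the DenseNet description (32x32, 16x16 and 8x8) follow from the standard output-size formula for convolution and pooling windows. The plain-Python sketch below is a hedged illustration of that arithmetic, not the code used in the paper. The helper names are hypothetical, and it assumes stride-1 3x3 convolutions with one pixel of zero padding, 1x1 transition convolutions, and 2x2 average pooling with stride 2; the 32x32 starting size is taken from the first block size quoted in the text.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def dense_block_sizes(input_size, num_blocks):
    """Feature-map side length inside each dense block, with a transition between blocks."""
    sizes = []
    size = input_size
    for block in range(num_blocks):
        # 3x3 convolution, zero-padded by one pixel: the size is unchanged inside the block.
        size = output_size(size, kernel=3, stride=1, padding=1)
        sizes.append(size)
        if block < num_blocks - 1:
            # Transition layer: 1x1 convolution (size unchanged), then 2x2 average pooling.
            size = output_size(size, kernel=1)
            size = output_size(size, kernel=2, stride=2)
    return sizes


if __name__ == "__main__":
    print(dense_block_sizes(32, 3))
```

Because the padded 3x3 convolutions leave the size unchanged, concatenation inside a block is always possible, and only the transition layers halve the side length.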