Abnormal Vehicle Behavior Detection Using Deep Learning and Computer Vision
ISSN No:-2456-2165
Abstract:- In the modern era, the use of video surveillance has increased, which in turn increases the volume of data. Video surveillance is widely used in both public and private areas to improve the security and safety of people. Hence, it is important to identify and analyse the video from different angles so as to extract the most important information from it. A video may contain both usual and unusual events; mostly, users need to find the unusual events that may affect their security. To differentiate the two kinds of event, we consider a special scenario related to vehicles. Vehicles on the road can move in different ways: they can follow traffic rules or violate them, for example through illegal U-turns or accidents. In this paper, the unusual event considered is an accident on the road. The technologies used are deep learning and computer vision. The neural network selected is DenseNet, a convolutional neural network. The peculiarity of the DenseNet architecture is that each layer in the network is connected to every other layer. For each layer, the feature maps of all the preceding layers are used as inputs, and its own feature maps are used as input for each subsequent layer. The deployment of DenseNet along with computer vision increases the accuracy of the system.

Keywords: Deep Learning, Computer Vision, Segmentation, Tracking.

I. INTRODUCTION

The increase in population also increases the need for safety and security of human beings in public and private areas. Video surveillance has become a major concern of everyday life. As a consequence, cameras have been deployed almost everywhere. Video surveillance is widely used in smart cities, smart offices, etc. Such videos are analyzed and studied through different technologies to extract important information. It is currently a well-researched area and has many applications. The most attractive areas include activity recognition from the video surveillance system. The main focus is on understanding the activities involved, for the detection and classification of the targets of interest, and on analyzing the activities included in the data. The detection and reporting of situations of special interest in a video, where unexpected things may happen, is a vital step. In such cases, a video surveillance system that can easily interpret scenes and recognize abnormal behaviors automatically can play an important role in data analytics. The system would then notify operators or users accordingly. This technology includes detecting, tracking and counting all the movable objects in a video, analyzing their behaviors, and responding to them accordingly. The most challenging part is detecting abnormal events in a video and reporting them to the responsible authority. Abnormal behavior is difficult to explain but can be easily noticed when it happens. Abnormal behavior is a psychological term for actions that are different from what is considered normal in a particular society, culture, or any other environment. This definition of abnormal behavior is functional and useful for many purposes. However, most definitions of abnormal behavior also take into account that, from a psychological point of view, mental illness, pain, and stress often play a major role in behavioral patterns. Abnormal events include undesirable or unpredicted events like road accidents, traffic violations, etc.

Video from a surveillance system can be monitored and analyzed to detect objects, which has several applications. Enhancements in video surveillance systems also allow videos to be edited and stored more efficiently. The processing and analysis of such video is of great importance. It contains much valuable information that can be used to identify different activities in the video. Current video surveillance can use many interesting technologies like computer vision and deep learning.

Objective and Scope:
Capturing video and processing it for further analysis to extract important features is a challenging task. We need to process only the data in the area of interest, because the whole data is not needed. We have to simplify and change the representation of an image into something that is more meaningful and easier to analyze. An abnormal behavior detection framework based on a deep learning algorithm is used. The objectives of the proposed system are:

Developing a system for detecting abnormal vehicle behavior.
Detection is done using a specialized framework in which both neural network and computer vision technologies are used.

The vanishing gradient problem in traditional convolutional neural networks occurs as the layers get deeper, and it is a problem that needs to be overcome. As a solution, the Dense Convolutional Network (DenseNet) was developed, in which each layer is connected to every other layer in a feed-forward fashion. A traditional convolutional network with L layers has L connections, that is, one connection between each layer and its succeeding layer. In DenseNet there is a total of L(L+1)/2 direct connections. For each layer, the feature-maps of all the preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. Concatenation is

Where $[x_0, x_1, \ldots, x_{\ell-1}]$ represents the concatenation of the feature-maps, that is, the outputs produced in all the preceding layers $0, \ldots, \ell-1$. The multiple inputs of $H_\ell$ are concatenated into a single tensor to make the implementation easy.

Dense Blocks
When the size of the feature maps changes, the concatenation operation is not possible. To obtain higher computational speed, down-sampling must be done in layers which help in reducing the size of the feature
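
The connection counts quoted in the DenseNet description can be checked with a few lines of Python. This is a minimal sketch, not the authors' code: the function names `plain_connections`, `dense_connections` and `dense_inputs` are hypothetical and introduced only for illustration. It counts one connection per layer for a traditional network, L(L+1)/2 for a densely connected one, and lists which earlier outputs are concatenated as the input of a given layer.

```python
from typing import List


def plain_connections(num_layers: int) -> int:
    """Traditional CNN: one connection between each layer and its successor."""
    return num_layers


def dense_connections(num_layers: int) -> int:
    """DenseNet: layer l receives l inputs (x0 .. x(l-1)); sum over l = 1..L."""
    return sum(range(1, num_layers + 1))  # equals L * (L + 1) // 2


def dense_inputs(layer: int) -> List[str]:
    """Names of the feature-maps concatenated as input to layer `layer` (1-based)."""
    return [f"x{i}" for i in range(layer)]


# The summed count agrees with the closed form L(L+1)/2.
for L in (1, 4, 5):
    assert dense_connections(L) == L * (L + 1) // 2

print(plain_connections(5), dense_connections(5))  # 5 15
print(dense_inputs(3))  # ['x0', 'x1', 'x2']
```

For a five-layer block the dense pattern gives 15 direct connections instead of 5, which is where the improved gradient flow described in the text comes from.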
The above image shows a deep DenseNet with three dense blocks. Down-sampling (i.e. changing the feature-map size) is performed through convolution and pooling operations in the transition layers, that is, the layers between two adjacent blocks. To enable feature concatenation, the size of the feature maps is kept the same within a dense block, which is considered an advantage of this neural network.

The first step in extracting useful information from images is the convolutional layer. Image features are learned from small squares of input data, and convolution preserves the relationship between the pixels of the frames or images. It is implemented as a mathematical operation that takes two inputs, a matrix and a kernel. The matrix is a part of the image.

When the given image is too large, pooling layers reduce the number of parameters, which is their main job. Spatial pooling, also termed down-sampling or sub-sampling, retains the most relevant information while reducing the dimensionality of each feature map.

Growth Rate
The features can be considered as a global state of the neural network. Each dense layer adds $k$ features on top of the existing features, so the number of feature maps grows after propagation through each layer. The growth rate of the network is referred to as $k$. This parameter controls the amount of information added in each layer of the neural network. If $k$ feature maps are produced by each $H_\ell$ function, then the $\ell$-th layer has

$$k_\ell = k_0 + k \times (\ell - 1) \qquad (2)$$

input feature-maps, where $k_0$ is the number of channels in the input layer. DenseNet has very thin layers compared with existing neural network architectures.

Bottleneck Layers
In case of more layers, the number of inputs can also be quite high, even though each layer produces only k output

The DenseNet used in this experiment has dense blocks that each have an equal number of layers. Before entering the first dense block, a convolution with 16 (or twice the growth rate for DenseNet) output channels is performed on the input images. For convolutional layers with kernel size 3x3, each side of the inputs is zero-padded by one pixel to keep the feature-map size fixed. A 1x1 convolution followed by 2x2 average pooling is used as the transition layer between two contiguous dense blocks. At the end of the last dense block, global average pooling is performed and then a softmax classifier is attached. The feature-map sizes in the three dense blocks are 32x32, 16x16, and 8x8, respectively.

A DenseNet structure with 4 dense blocks on 224x224 input images is used. The initial convolution layer comprises 2k convolutions of size 7x7 with stride 2; the number of feature-maps in all other layers also follows from setting k.

The main advantages of using DenseNet include:

Parameter efficiency – In DenseNet only a limited number of parameters is added in each layer, that is, only 12 kernels are learned per layer.
Implicit deep supervision – The gradient flow through the network is improved, that is, the feature maps in each layer have direct access to the loss function and its gradient.

Dependencies

Anaconda
Anaconda is an open-source software and environment management system used for data analytics, data processing, etc. Anaconda runs on Linux, Windows and macOS. Anaconda can be used for installing, running and updating packages easily. It can switch between local environments on the computer.

OpenCV
OpenCV is an open-source library. OpenCV is mainly used for image processing, computer vision, and machine learning tasks. It plays an important role in real-time operation on data, which has great impact in today's systems. By using OpenCV, the image and video data can be
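
The growth-rate formula (2) and the feature-map sizes quoted for the three dense blocks can be reproduced with simple arithmetic. The sketch below is illustrative only and is not the authors' implementation: the helper names are hypothetical, k = 12 is taken from the "12 kernels per layer" remark, k0 = 16 from the initial convolution, and the strides (1 for the convolutions, 2 for the 2x2 average pooling) are assumptions, since the text does not state them.

```python
def input_feature_maps(layer: int, k0: int, k: int) -> int:
    """Equation (2): number of input feature-maps of layer `layer` (1-based)."""
    return k0 + k * (layer - 1)


def output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


# Input width of the first four layers with k0 = 16 and k = 12.
print([input_feature_maps(l, k0=16, k=12) for l in range(1, 5)])  # [16, 28, 40, 52]

# A 3x3 convolution with one pixel of zero-padding keeps the size fixed.
assert output_size(32, kernel=3, stride=1, padding=1) == 32

# Each transition layer: 1x1 convolution (size unchanged), then 2x2 average pooling.
sizes = [32]
for _ in range(2):
    after_conv = output_size(sizes[-1], kernel=1)
    sizes.append(output_size(after_conv, kernel=2, stride=2))
print(sizes)  # [32, 16, 8]
```

Under these assumptions the two transition layers halve the spatial size twice, which matches the 32x32, 16x16 and 8x8 feature-map sizes given for the three dense blocks.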
Features like flexibility, speed, and ease of use mean that PyTorch is frequently used in industry and in research. PyTorch can run projects quickly, which makes it one of the top deep learning tools. PyTorch is one of the best open-source libraries for image classification, object detection and many other applications. The version of PyTorch used in this work is PyTorch 1.0.1. Using PyTorch, a programmer can process images and videos to develop a highly accurate and precise computer vision model.

IV. RESULTS

The proposed methodology for detecting abnormal vehicle behavior can process the data efficiently using deep learning and computer vision. The results indicate that the system is efficient. Compared with the existing system, the use of DenseNet makes the framework more effective since it reduces the number of parameters considered. To verify the robustness of the proposed system, videos of different situations taken from the Internet are used. The system correctly identified the unusual and usual events efficiently, using DenseNet with fewer parameters, and classified the events into accident and no accident. The detection framework classifies each frame from a video with an accuracy of 97 percent. The

Fig 6 Testing Result

V. CONCLUSION

In the smart security field, abnormal behavior detection from videos is a trending and vast research area. Abnormal behavior can be defined in a variety of ways, depending on the objects and scenes in the surveillance video. Among the different abnormal behaviors, this research focuses on the detection of abnormal behavior among vehicles. For abnormal behavior detection, a framework based on a deep learning algorithm is used. First, the input video is preprocessed using the OpenCV computer vision library. When the preprocessed data is loaded into DenseNet, it is processed through the different layers. The network is trained using the dataset and detects whether the frames are abnormal or normal. The number of parameters is reduced with the help of DenseNet, which in turn increases the performance of the system. The results show that the accuracy has improved.

Scope for Further Work
The proposed system can be implemented in smart cities and intelligent traffic systems. The implementation of