VDNet An Edge Vision-Based Surveillance System For Violence Detection

The document presents VD-Net, an AI-based framework for detecting violent behavior in surveillance systems, leveraging lightweight temporal convolutional networks and IIoT for real-time analysis. It highlights the limitations of existing methods and proposes a novel approach using bottleneck layers to enhance accuracy and reduce latency in violence detection. The system is designed for both indoor and outdoor environments, demonstrating improved performance over state-of-the-art techniques.

Uploaded by

Manju Nath


ABSTRACT

The automation of surveillance systems, driven by the rapid development of
computer vision technology, has significantly enhanced the analysis of surveillance
videos, particularly the recognition of human activity, including behavior analysis
and violence detection, thereby bolstering public and industrial security. Despite
these advancements, detecting and analyzing violent actions remains challenging,
especially for real-time surveillance systems with limited computing power. We
propose an artificial intelligence-based framework called VD-Net (Violence
Detection Network), enabled by the Intelligent Internet-of-Things (IIoT), to detect
violent behavior in public and private spaces. The model uses lightweight special
task temporal convolutional network (ST-TCN) blocks and several bottleneck layers
to focus on salient features in the input sequence. The learned features are then
passed to a classifier that discriminates between violent and nonviolent actions.
Additionally, our system is designed to trigger an alert when violence is detected,
which is then communicated to the relevant departments. We evaluated the
robustness of our system on surveillance and non-surveillance datasets and obtained
a 1-4% improvement over state-of-the-art (SoTA) accuracy.
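
As a rough illustration of the two building blocks named in the abstract, the sketch below applies a bottleneck projection (a 1x1 convolution that reduces the number of feature channels per frame) followed by a causal temporal convolution over the frame sequence. It is a minimal sketch under stated assumptions, not the authors' ST-TCN: the class name, method names, weights, kernel, and ReLU activation are all hypothetical choices made for illustration.

```java
import java.util.Arrays;

/** Illustrative only: bottleneck projection followed by a causal temporal convolution. */
final class TemporalBlockSketch {

    /** Bottleneck: maps each frame's input channels to fewer output channels (1x1 convolution + ReLU). */
    static double[][] bottleneck(double[][] seq, double[][] weights) {
        int outChannels = weights.length;
        double[][] out = new double[seq.length][outChannels];
        for (int t = 0; t < seq.length; t++) {
            for (int o = 0; o < outChannels; o++) {
                double sum = 0.0;
                for (int c = 0; c < seq[t].length; c++) {
                    sum += weights[o][c] * seq[t][c];
                }
                out[t][o] = Math.max(0.0, sum); // ReLU (assumed activation)
            }
        }
        return out;
    }

    /** Causal convolution per channel: output at time t uses frames t, t-d, t-2d, ... (zero-padded). */
    static double[][] causalConv(double[][] seq, double[] kernel, int dilation) {
        int channels = seq.length == 0 ? 0 : seq[0].length;
        double[][] out = new double[seq.length][channels];
        for (int t = 0; t < seq.length; t++) {
            for (int c = 0; c < channels; c++) {
                double sum = 0.0;
                for (int k = 0; k < kernel.length; k++) {
                    int src = t - k * dilation;
                    if (src >= 0) {
                        sum += kernel[k] * seq[src][c];
                    }
                }
                out[t][c] = sum;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input: 3 frames with 4 features each.
        double[][] frames = { {1, 2, 3, 4}, {2, 2, 2, 2}, {0, 1, 0, 1} };
        // Hypothetical bottleneck weights that keep channels 0 and 3 only.
        double[][] w = { {1, 0, 0, 0}, {0, 0, 0, 1} };
        double[][] reduced = bottleneck(frames, w);
        double[][] mixed = causalConv(reduced, new double[] {0.5, 0.5}, 1);
        System.out.println(Arrays.deepToString(mixed));
        // prints [[0.5, 2.0], [1.5, 3.0], [1.0, 1.5]]
    }
}
```

In a trained network the weights and kernel would be learned, and several such blocks with increasing dilation would typically be stacked so that later layers see longer stretches of the video.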

EXISTING SYSTEM

One of the early works in this area [8] developed a machine learning-based approach
to detect violent movie scenes. The authors used a set of visual and audio features to
classify scenes as violent or nonviolent and achieved 85% accuracy on the movie
fight dataset. Similarly, the authors of [9] presented a machine learning-based
approach to detect violent events in surveillance videos using handcrafted features,
such as motion and texture. Furthermore, the authors of [10] introduced violence
detection for social media, using a set of visual and audio features to classify
actions. In addition, [11] and [12] proposed new machine learning-based approaches
to detect violent movie events.

Similarly, a conventional method was proposed in [13], using motion cues derived
from optical flow on RGB frames and incorporating appearance as low-level
features. After eliminating redundant information, the system built a bag-of-words
(BoW) representation. Similarly, [14] developed a system to identify violence in
crowded settings based on background motion correction, appearance, and long-term
dependencies. To demonstrate how violent events are related to scene-scale spatial
events, they used late fusion and BoW. Another approach [15] developed a new local
descriptor to manage and reduce the coefficient reconstruction error and presented a
sparse-based model for classification. Furthermore, [16] incorporated pixel-based
analysis and object trajectory results to monitor object speed, direction, and smaller
movements. However, these methods proved cumbersome in practice because they
rely on handcrafted feature engineering. The following paragraphs give an overview
of more advanced techniques.

Recently, deep learning techniques have become more popular for detecting
violence. An early work in this area [17] presented a deep learning-based method to
identify instances of violent behavior in surveillance videos. The authors used a
two-stream convolutional neural network (CNN) architecture to extract
spatiotemporal features from the videos and achieved 89% classification accuracy
on the surveillance dataset. Similarly, the authors of [18] developed a deep
learning-based model for violence detection in social media, capturing visual
features and temporal dependencies with a long short-term memory (LSTM) network,
and achieved 94.9% accuracy on the same dataset. Moreover, the authors of [19] and
[20] introduced deep learning approaches for detecting violent events in urban
surveillance videos. Furthermore, [21] and [22] presented new methods to detect
abnormal events in movie datasets.

Recent studies increasingly address computer vision challenges with deep learning,
and such techniques are also being applied to violence detection. For instance, a
method in [7] represents a frame in a sequence using critical information provided
by Hough's feature. Liu et al. [23] utilized a 3D CNN to identify violent scenes in
video, applying sampling as a pre-processing step. The researchers developed a deep
learning-based model for detecting violent scenes using transfer learning
techniques, while [24] introduced a Spark framework for detecting violent scenes
with a bidirectional LSTM. Similarly, [25] introduced the idea of aggregating
ensembles, and [26] employed a combination of a 3D CNN and a support vector
machine (SVM) to identify violent actions in videos. However, a comprehensive
literature analysis indicates that many existing methods must be revised to overcome
several limitations and challenges. These include inadequate integration with
state-of-the-art IoT devices, heavy reliance on end-to-end pre-trained models,
failure to incorporate cloud-based concepts, and the use of handcrafted features.
Disadvantages
• Existing systems do not explore the implementation of a Bottleneck Transformer
Network (BTNET).
• Existing systems do not implement a Spatial Task Temporal Convolution Network
(ST-TCN).
PROPOSED SYSTEM

• Traditional monitoring systems often suffer wiring failures during installation,
which slows response times and increases the processing burden on authorities. To
tackle this challenge, we propose an AI-driven framework for violence detection that
leverages the Internet-of-Things (IoT) to connect devices for the smooth exchange of
information. Moreover, we develop a cloud-based system that enables comprehensive
investigations of violent incidents in public and private settings with fast
processing.

• Processing surveillance data requires an intelligent edge-based mechanism that
extracts useful information during analysis. To address this, we investigate the use
of IoT and introduce a lightweight system that can be deployed on embedded devices.
Instead of sending all frames, our system identifies the frames containing critical
violent activity and transmits only those over the network for detailed investigation
in the cloud. This approach streamlines processing and avoids transmitting
uninformative frames.
• Traditional IoT methods often use handcrafted features or clustering algorithms,
which may not capture long-term dependencies and therefore reduce accuracy. To
overcome this issue, we propose a bottleneck layer in VD-Net that encodes spatial and
temporal correlations and analyzes local motion between frames. Additionally, the
cloud server receives the feature vectors and sends them to an attention unit that
identifies salient cues. This is the first time bottleneck layers are used for violence
detection, significantly improving accuracy with reduced latency for real-time
applications.
• To the best of our knowledge, this article represents the first use of a bottleneck
layer to learn salient cues of violent activity in an IIoT network. The module extracts
information from the input layer and determines whether a scene is violent or
nonviolent in a public or private environment. We evaluated the proposed VD-Net
on publicly available datasets, where it outperformed SoTA approaches.
Additionally, the system can be used for indoor and outdoor surveillance in
IIoT-based systems.
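
The attention unit mentioned above can be pictured as a weighted pooling of per-frame feature vectors, where frames that score higher against a query vector contribute more to the final clip descriptor. The following is a minimal sketch assuming a simple dot-product score and softmax weighting; the source does not specify the attention mechanism, so the class name, method names, and query vector are hypothetical.

```java
/** Illustrative only: softmax attention pooling over per-frame feature vectors. */
final class AttentionPoolingSketch {

    /** Numerically stable softmax. */
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double[] out = new double[scores.length];
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            out[i] = Math.exp(scores[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    /** Scores each frame by its dot product with the query, then normalizes with softmax. */
    static double[] attentionWeights(double[][] features, double[] query) {
        double[] scores = new double[features.length];
        for (int t = 0; t < features.length; t++) {
            for (int c = 0; c < query.length; c++) {
                scores[t] += features[t][c] * query[c];
            }
        }
        return softmax(scores);
    }

    /** Weighted sum of frame features: frames with larger weights dominate the result. */
    static double[] pool(double[][] features, double[] weights) {
        int dim = features.length == 0 ? 0 : features[0].length;
        double[] pooled = new double[dim];
        for (int t = 0; t < features.length; t++) {
            for (int c = 0; c < dim; c++) {
                pooled[c] += weights[t] * features[t][c];
            }
        }
        return pooled;
    }

    public static void main(String[] args) {
        // Hypothetical features for 3 frames; the second frame aligns best with the query.
        double[][] features = { {0.1, 0.0}, {2.0, 1.0}, {0.2, 0.1} };
        double[] query = { 1.0, 1.0 }; // would be learned in a real model
        double[] weights = attentionWeights(features, query);
        double[] clipDescriptor = pool(features, weights);
        System.out.println(java.util.Arrays.toString(weights));
        System.out.println(java.util.Arrays.toString(clipDescriptor));
    }
}
```

The pooled vector would then be passed to a classifier; in a trained system the query (or a small scoring network) is learned jointly with the rest of the model.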

Advantages

The proposed system is an advanced IIoT-based VD-Net that covers indoor and
outdoor activities with low latency in real time. The basic steps of the proposed
framework are listed below. The first step trains VD-Net offline for violent action
detection and handles data acquisition using resource-constrained vision sensors.
The second stage is a screening process that collects critical information, such as
identifying people or suspicious activities in the scene. If the subjects and actions
identified in the second stage are violent or suspicious, an alert is generated, and
the violent frames are sent to the next step for a thorough investigation before the
final violence detection phase.
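
The staged flow described above (screening, alert, hand-off for investigation, final detection) can be sketched as follows. This is a minimal sketch under stated assumptions: the class name, the predicate-based stand-ins for the screening step and the final model, the thresholds, and the camera identifiers are all hypothetical and are not taken from the source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Illustrative only: edge screening, alerting, and hand-off for final detection. */
final class ScreeningPipelineSketch {
    private final Predicate<double[]> screener;   // stand-in for the screening stage
    private final Predicate<double[]> finalModel; // stand-in for the final violence detector
    final List<String> alerts = new ArrayList<>();
    final List<double[]> forwarded = new ArrayList<>();

    ScreeningPipelineSketch(Predicate<double[]> screener, Predicate<double[]> finalModel) {
        this.screener = screener;
        this.finalModel = finalModel;
    }

    /** Screens one frame; flagged frames raise an alert and are queued for investigation. */
    boolean screen(String cameraId, double[] frame) {
        if (!screener.test(frame)) {
            return false; // unflagged frames are not forwarded
        }
        alerts.add("Possible violence on " + cameraId);
        forwarded.add(frame);
        return true;
    }

    /** Final detection phase over the forwarded frames; returns how many were confirmed. */
    int confirm() {
        int confirmed = 0;
        for (double[] frame : forwarded) {
            if (finalModel.test(frame)) {
                confirmed++;
            }
        }
        return confirmed;
    }

    public static void main(String[] args) {
        ScreeningPipelineSketch pipeline = new ScreeningPipelineSketch(
            f -> f[0] > 0.5,   // hypothetical screening score threshold
            f -> f[0] > 0.8);  // hypothetical stricter final threshold
        pipeline.screen("cam-1", new double[] {0.2});
        pipeline.screen("cam-1", new double[] {0.6});
        pipeline.screen("cam-2", new double[] {0.9});
        System.out.println(pipeline.alerts.size() + " alerts, " + pipeline.confirm() + " confirmed");
        // prints "2 alerts, 1 confirmed"
    }
}
```

Keeping the screening step cheap and the final model stricter reflects the idea that only flagged frames need to leave the edge device for detailed analysis.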

SYSTEM REQUIREMENTS

H/W System Configuration:
➢ Processor - Pentium IV
➢ RAM - 4 GB (min)
➢ Hard Disk - 20 GB
➢ Keyboard - Standard Windows Keyboard
➢ Mouse - Two or Three Button Mouse
➢ Monitor - SVGA

Software Requirements:
➢ Operating System - Windows XP
➢ Coding Language - Java/J2EE (JSP, Servlet)
➢ Front End - J2EE
➢ Back End - MySQL
