
ANOMALY DETECTION IN SURVEILLANCE VIDEO

(for the partial fulfillment of Bachelor of Technology Degree in Computer Science &

Engineering)

Submitted by

AMAN GUSAIN

GOURAV SINGH

MRIDUL GUSAIN

SHIVAM NEGI

Under the guidance of

Dr. Ashok Kumar Sahoo

PROFESSOR

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

GRAPHIC ERA HILL UNIVERSITY


May, 2023
CERTIFICATE

This is to certify that the thesis titled “Anomaly Detection in Surveillance Video”
submitted by Aman Gusain, Gourav Singh, Mridul Gusain, Shivam Negi, to Graphic
Era Hill University for the award of the degree of Bachelor of Technology, is a genuine
record of the research work carried out by them under our guidance. This work, in whole or
in part, has not previously been submitted to any other educational institution or university
for the award of any other degree or diploma.

Dr. Ashok Kumar Sahoo


(Professor)
GEHU, Dehradun
Place: Dehradun
Date: 10-5-2023
ACKNOWLEDGEMENT

We would like to extend our sincere appreciation to all those who have contributed to the
completion of this report on Anomaly Detection in Surveillance Video. First and foremost,
we would like to express our heartfelt thanks to Dr. Ashok Kumar Sahoo, our mentor, for his
invaluable guidance, support, and insightful suggestions. His expertise in artificial
intelligence and computer vision has played a vital role in shaping the direction and depth of
this report. We are truly
grateful for his patience and wisdom. We would also like to acknowledge the experts and
practitioners who generously shared their knowledge and experiences with us. Their
contributions have provided us with a comprehensive understanding of the subject matter
and enriched the content of this report. Furthermore, we would like to thank our peers
and classmates for their valuable discussions and feedback, which have helped refine and
strengthen our ideas. Their collaboration has been instrumental in shaping the final
outcome of this report. We would also like to express our appreciation to the faculty
members and staff of Graphic Era Hill University for their support and provision of
necessary resources throughout this project. Finally, we extend our heartfelt gratitude to
our families and friends for their unwavering belief in us and constant encouragement
throughout this endeavor. Their love and support have been the driving force behind our
accomplishments.
Thank you

Aman Gusain Gourav Singh


Roll no.1918186 Roll no.1918346

Mridul Gusain Shivam Negi


Roll no.1918490 Roll no.1918693

ABSTRACT

The world is growing rapidly, and it is becoming proportionally harder to manage and
secure areas manually. That is where the concept of surveillance comes in: it is now more
effective and efficient to equip surveillance cameras with a system that alerts the right
authority when something irregular is captured. An Anomaly Detection System [5] does just
that: it observes the footage and, when a deviation occurs, performs the encoded task, such
as alerting the relevant authorities. Video surveillance [6] cameras are used everywhere to
observe and secure administrations, institutions, organizations and pathways. The crucial
feature that a security application should provide is real-time anomaly detection. Video
anomalies can be defined as deviations from the normal patterns in the reference dataset [7].
The system should detect violent activities, traffic violations and other out-of-the-ordinary
events. Anomaly detection is a demanding and tedious activity, as the nature of the footage
depends on various natural variables such as human errors, behaviors and instincts.

Video surveillance systems and anomaly detection systems have become the first line of
defense against various threats as well as accidents. With such systems in place, we no
longer need to worry about someone slacking off on the job or other human errors. Even the
time taken by manual, non-automated procedures to contact and alert the authorities about a
threat or an accident is now reduced to a matter of milliseconds.

TABLE OF CONTENTS

ACKNOWLEDGEMENT i
ABSTRACT ii
LIST OF TABLES iv
LIST OF FIGURES v
ABBREVIATIONS vi
NOTATIONS vii
1. INTRODUCTION 7
2. PREMISE 8
2.1 Object Detection 8

2.2 YOLO Intro 9

2.3 YOLO Architecture 11

3. REQUIREMENT ANALYSIS 16
3.1 Libraries and Tools 16

3.2 Function Definition 20

4. PROJECT DESIGN 23
4.1 Fire Detection 23

4.2 Crash detection 25

5. RESULT 28

6. CONCLUSION AND FUTURE SCOPE 31

A. APPENDIX A (CODE) 32
B. REFERENCES 41

LIST OF FIGURES

1. Class Diagram of Crash Detection ..... 25
2. Figure of Crash Detection ..... 28
3. Figure of Fire Detection in a Room ..... 29
4. Figure of Fire Spreading Around the Room ..... 30
5. Figure of Result Graph ..... 30

ABBREVIATIONS

GEHU Graphic Era Hill University

YOLO You Only Look Once

OpenCV Open-Source Computer Vision Library

NumPy Numerical Python

2D Two Dimensional

CPU Central Processing Unit

GPU Graphics Processing Unit

HTML Hypertext Markup Language

NOTATIONS

^ Power

* Multiplication

- Subtraction

/ Divide

1. CHAPTER 1

INTRODUCTION

Anomaly detection can be defined as identifying deviations in one or more datasets. It can
use both supervised and unsupervised learning to its advantage.
Nowadays, nearly every infrastructure and public place is equipped with surveillance
cameras to substantially improve security; for example, if you visit any administration,
institution or organization, you will come across multiple such tools in use. Even though
installing surveillance cameras helps monitor these places, a large amount of human
intervention is still required to operate the cameras and observe the footage they provide.
Although these anomalies occur rarely, they have a great social and economic impact.
Therefore, to eliminate the labor and time wasted on constantly watching surveillance
feeds, an automated anomaly detection system is strongly preferred. The goal of such an
automated system is to report and alert on such unusual occurrences quickly and precisely.
The range of patterns such anomalies can take is vast and complex, which motivates making
the system independent of prior information about such events. Therefore, the system
should operate under minimal supervision.
2. CHAPTER 2
PREMISE

2.1 OBJECT DETECTION


Object detection is a fundamental task in computer vision that plays a crucial role in
various real-world applications. It involves the identification and localization of objects
of interest within images or videos. By accurately detecting and localizing objects,
computer vision systems can enable tasks such as autonomous driving, surveillance
systems, robotics, image and video analysis, and augmented reality. Object detection is a
challenging problem due to variations in object appearance, scale, occlusions, and
cluttered backgrounds. Over the years, several advanced algorithms and techniques have
been developed to address these challenges and improve the accuracy and efficiency of
object detection systems. These advancements have significantly contributed to the
progress of computer vision and its widespread adoption in numerous industries and
domains. Object detection enables machines to understand and interpret visual data,
empowering them to interact with and respond to the surrounding environment.
The importance of object detection arises from its ability to provide detailed information
about objects present in an image or video. It goes beyond simple classification, which
only assigns a label to the entire image, and instead provides precise localization by
identifying the bounding boxes around individual objects. This fine-grained
understanding of object location and recognition enables a wide range of applications that
require object-level analysis and decision-making.
Object detection finds applications in various domains:
Autonomous Driving: Object detection is crucial for enabling vehicles to perceive their
surroundings. It allows autonomous cars to detect and track pedestrians, vehicles, traffic
signs, and obstacles, ensuring safe and reliable navigation on the roads.
Surveillance Systems: Object detection forms the basis of surveillance systems, enabling
the identification and tracking of people, objects, or specific events in security footage. It
assists in monitoring public spaces, airports, shopping malls, and other areas requiring
enhanced security measures.
Robotics: Object detection is essential for robots to interact with their environment
effectively. Robots can identify and manipulate objects, navigate through cluttered
spaces, and perform tasks such as picking and placing objects in industrial and domestic
settings.
Image and Video Analysis: Object detection allows for automatic tagging and content-
based retrieval of images and videos. It facilitates organizing large image and video
datasets, enabling efficient search and retrieval of specific objects or scenes.
Augmented Reality: Object detection is a critical component of augmented reality
applications. By detecting and tracking objects in real-time, virtual content can be
accurately overlaid onto the real-world environment, enhancing user experiences and
enabling interactive virtual interactions.
The ability to accurately detect and localize objects within images and videos is a
challenging task due to variations in object appearance, scale, pose, occlusion, lighting
conditions, and complex backgrounds. Traditional object detection methods typically
involve multi-stage pipelines that include object proposal generation, feature extraction,
classification, and post-processing steps. However, these methods often suffer from slow
processing times, making them unsuitable for real-time applications.
In this context, YOLO (You Only Look Once) emerged as a revolutionary object
detection algorithm that provides real-time detection with impressive accuracy. It
simplifies the object detection pipeline by directly predicting bounding boxes and class
probabilities in a single pass, significantly improving speed and efficiency.
The subsequent sections will delve deeper into the architecture, training process,
evolution, and applications of YOLO, providing a comprehensive understanding of this
groundbreaking object detection algorithm.

2.2 Introduction to YOLO (You Only Look Once)


YOLO, which stands for "You Only Look Once," is a groundbreaking object detection
algorithm that has revolutionized the field of computer vision. It offers a fast and accurate
approach to object detection, overcoming many limitations of traditional methods.
YOLO's key innovation is its capacity to perform object detection in real-time, making it
highly efficient and suitable for a wide range of applications.
Unlike traditional object detection methods that involve complex multi-stage pipelines,
YOLO takes a different approach. It formulates object detection as a single regression
problem, predicting the class probabilities and the bounding box coordinates in a single
pass.
The main advantages of YOLO can be summarized as follows:
Simplicity: YOLO simplifies the object detection process by eliminating the need for
complex region proposal mechanisms and post-processing steps. Instead, it directly
predicts bounding boxes and class probabilities, making the overall architecture more
streamlined and efficient.
Real-time Performance: YOLO's architecture allows it to achieve real-time object
detection, capable of processing images or videos at impressive speeds.
Contextual Information: YOLO considers the global context of the entire image while
making predictions. By analyzing the entire image in one pass, YOLO can capture spatial
relationships between objects and better understand the overall scene context.
High Accuracy: Despite its real-time capabilities, YOLO maintains a high level of
accuracy in object detection. Its architecture effectively incorporates both low-level and
high-level features, giving it the ability to capture fine-grained details and contextual
information.
Generalization: YOLO demonstrates excellent generalization capabilities across
different object categories and varying object sizes. It can detect objects of different
scales and shapes without relying on multiple network branches or image pyramid
representations.
YOLO has seen several iterations, with each version introducing improvements and
refinements. YOLOv1 was the initial release, which laid the foundation for subsequent
versions. YOLOv2, also known as YOLO9000, extended the capabilities of YOLO to
detect a vast number of object classes by leveraging additional labeled data. YOLOv3
further improved accuracy and introduced the concept of feature pyramid networks
(FPN) to handle objects at different scales.
The evolution of YOLO demonstrates the ongoing research and innovation in the field of
object detection, aiming to strike a balance between speed and accuracy. Researchers
continue to explore advancements in network architectures, training strategies, and post-
processing techniques to further enhance the performance of YOLO and its variants.
With its simplicity, real-time performance, and high accuracy, YOLO has become an
influential object detection algorithm, finding applications in autonomous vehicles,
surveillance systems, robotics, augmented reality, and more. Its impact on computer
vision research and practical implementations continues to grow, propelling the field of
object detection forward.

2.3 The YOLO Architecture:

YOLO consists of two main components: the backbone network and the detection layer.
Each of these components, along with the associated formulas, is described in detail below:
Backbone Network:
The backbone network, typically based on the Darknet architecture, serves as a feature
extractor. It processes the input image and produces feature maps that capture relevant
information about objects.
Let's denote the input image as I, and the backbone network as f. The output feature maps
from the backbone network can be represented as F = f(I).

Grid System and Anchor Boxes:


YOLO divides the input image into a grid of cells and associates each cell with a fixed
number of anchor boxes. The grid system enables precise localization, and the anchor
boxes represent prior knowledge about object shapes and sizes.
Assume the input image is divided into an S x S grid, and there are B anchor boxes
associated with each cell. For each anchor box, we have the width (w_b) and height (h_b)
defined.
Bounding Box and Class Predictions:
For each cell, YOLO predicts bounding boxes and class probabilities.
The bounding box predictions (b_x, b_y, b_w, b_h) for each anchor box can be
calculated using formulas:
b_x = (sigmoid(t_x) + i) / S,
b_y = (sigmoid(t_y) + j) / S,
b_w = (exp(t_w) * w_b) / S,
b_h = (exp(t_h) * h_b) / S.
Here, (i, j) represents the index of the cell in the grid, sigmoid is the sigmoid activation
function, and exp represents the exponential function.
YOLO also predicts the probabilities of different object classes for each cell. Let's
assume there are C object classes in total. The network outputs a probability distribution
(p_1, p_2, ..., p_C) representing the confidence scores for each class.
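As a concrete illustration, a minimal Python sketch of this decoding step might look as follows; the variable names mirror the notation above, and the numeric values in the example call are arbitrary.

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def decode_box(t_x, t_y, t_w, t_h, i, j, w_b, h_b, S):
    # Decode raw network outputs (t_x, t_y, t_w, t_h) for an anchor box of
    # size (w_b, h_b) assigned to grid cell (i, j) of an S x S grid, using the
    # formulas above. The result is expressed relative to the image size.
    b_x = (sigmoid(t_x) + i) / S
    b_y = (sigmoid(t_y) + j) / S
    b_w = (np.exp(t_w) * w_b) / S
    b_h = (np.exp(t_h) * h_b) / S
    return b_x, b_y, b_w, b_h

# Example: cell (3, 5) of a 13 x 13 grid with an anchor of size 2.5 x 4.0
print(decode_box(0.2, -0.1, 0.4, 0.1, 3, 5, 2.5, 4.0, 13))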
Final Predictions and Non-Maximum Suppression (NMS):
The final predictions in YOLO are obtained by combining the bounding box predictions
and class probabilities.
For each cell and anchor box, the confidence score (c) is calculated as the product of the
class probability and the objectness score, which represents the probability of an object
being present.
The final bounding box predictions (x, y, w, h) are obtained by multiplying the relative
coordinates (b_x, b_y, b_w, b_h) with the width and height of the input image.
The final predicted class probabilities are obtained by multiplying the confidence score
with the class probabilities.
During inference, the redundant bounding box predictions can cause problems so NMS is
applied to retain only the most confident and accurate detections.
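In the project code this suppression step is handled by OpenCV's cv2.dnn.NMSBoxes (see Appendix A); the sketch below is only a minimal NumPy illustration of the greedy idea, assuming boxes are given as (x1, y1, x2, y2) corner coordinates.

import numpy as np

def iou(box_a, box_b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_threshold=0.5):
    # Greedy non-maximum suppression: keep the highest-scoring box and drop
    # any remaining box that overlaps it by more than iou_threshold.
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(best)
        order = np.array([k for k in order[1:]
                          if iou(boxes[best], boxes[k]) < iou_threshold])
    return keep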
Conclusion:
The YOLO architecture, with its streamlined design and efficient prediction process, has
revolutionized object detection. By directly predicting bounding box coordinates and
class probabilities in a single pass, YOLO achieves real-time performance without
compromising accuracy. Its simplicity, real-time capabilities, and accuracy make it
suitable for a wide range of applications, including video analysis, autonomous driving,
robotics, and more.
YOLO has evolved through multiple versions, with each iteration introducing
improvements and refinements to enhance its performance. Researchers continue to
explore variations of YOLO, incorporating advanced network architectures and training
strategies to push the boundaries of object detection.
Overall, YOLO's architecture and formulas demonstrate its ability to overcome the
limitations of traditional object detection methods, offering a faster and more efficient
approach to real-time object detection tasks.

YOLO Training:
Training and inference are essential stages in utilizing the YOLO (You Only Look Once)
object detection algorithm. In this section, we will explore the training process of YOLO,
including loss computation and backpropagation. We will also cover the inference phase,
where object detection is performed using the trained model.
Training Process:
Data Preparation:
To train YOLO, you need a labeled dataset with bounding box annotations and
corresponding object classes. Each annotation consists of the object's bounding box
coordinates (x, y, width, height) and the class label.
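As an illustration of what such annotations can look like, the sketch below parses the plain-text label format used by Darknet/YOLOv5-style datasets (one line per object, coordinates normalised to the image size); the file name and values here are hypothetical.

def read_labels(label_path):
    # Each line has the form: <class_id> <x_center> <y_center> <width> <height>
    # with the box coordinates normalised by the image width and height, e.g.
    # "0 0.51 0.43 0.22 0.35" is a class-0 box centred at (0.51, 0.43).
    annotations = []
    with open(label_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            class_id, x_c, y_c, w, h = line.split()
            annotations.append((int(class_id), float(x_c), float(y_c),
                                float(w), float(h)))
    return annotations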
Loss Computation:
YOLO utilizes a combination of localization loss, confidence loss, and class loss to train
the model. These losses are computed for each bounding box prediction and contribute to
the overall loss function.
a. Localization Loss (L_coord):
The localization loss measures the discrepancy between the predicted bounding box
coordinates and the ground truth coordinates. It is computed using the mean squared error
(MSE) loss or a similar regression loss function. The formula for the localization loss is:

L_coord = λ_coord * [(x - x')^2 + (y - y')^2 + (sqrt(w) - sqrt(w'))^2 + (sqrt(h) - sqrt(h'))^2],

where (x, y, w, h) are the predicted bounding box coordinates, (x', y', w', h') are the
ground truth coordinates, and λ_coord is a coefficient that balances the influence of the
localization loss.
b. Confidence Loss (L_obj and L_noobj):
The confidence loss evaluates the objectness confidence of predicted bounding boxes. It
consists of two components: L_obj, which measures the confidence for boxes containing
objects, and L_noobj, which measures the confidence for background boxes.
L_obj is computed as the binary cross-entropy loss between the predicted objectness
score (c) and the indicator function (I_obj) that indicates whether an object is present in
the ground truth box. The formula for L_obj is:
L_obj = -[I_obj * log(c) + (1 - I_obj) * log(1 - c)].
L_noobj is computed similarly but considers the confidence for background boxes where
no object is present:
L_noobj = -[I_noobj * log(c) + (1 - I_noobj) * log(1 - c)],
where I_noobj is the indicator function for background boxes.
c. Class Loss (L_class):
The class loss measures the discrepancy between the predicted class probabilities and the
ground truth class labels. It is computed using the categorical cross-entropy loss. The
formula for the class loss is:
L_class = -λ_class * [I_obj * (p' * log(p) + (1 - p') * log(1 - p))],
where p is the predicted class probability, p' is the ground truth class probability, and
λ_class is a coefficient that balances the influence of the class loss.
The overall loss function for YOLO training is given by:
Loss = L_coord + L_obj + L_noobj + L_class.
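The following NumPy sketch is a simplified, single-box illustration of these terms: it folds L_obj and L_noobj into one binary cross-entropy expression and treats the class prediction as a single probability. The λ values and the exact weighting are illustrative, not the ones used by the actual YOLO implementations.

import numpy as np

def yolo_box_loss(pred, truth, obj_present, lambda_coord=5.0, lambda_class=1.0):
    # pred  = (x, y, w, h, c, p): predicted box, objectness score, class probability
    # truth = (x', y', w', h', p'): ground-truth box and class probability
    # obj_present is 1.0 if an object is assigned to this box, else 0.0
    x, y, w, h, c, p = pred
    xt, yt, wt, ht, pt = truth

    # Localization loss (only counted when an object is present)
    l_coord = lambda_coord * obj_present * (
        (x - xt) ** 2 + (y - yt) ** 2 +
        (np.sqrt(w) - np.sqrt(wt)) ** 2 + (np.sqrt(h) - np.sqrt(ht)) ** 2)

    # Confidence loss: binary cross-entropy on the objectness score
    l_obj = -(obj_present * np.log(c) + (1 - obj_present) * np.log(1 - c))

    # Class loss: cross-entropy on the class probability, only for object boxes
    l_class = -lambda_class * obj_present * (
        pt * np.log(p) + (1 - pt) * np.log(1 - p))

    return l_coord + l_obj + l_class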
Backpropagation and Parameter Updates:
Once the loss is computed, backpropagation is performed to calculate the gradients of the
model parameters with respect to the loss. The gradients are then used to update the
model weights using optimization algorithms like stochastic gradient descent (SGD),
Adam, or RMSprop.
Iterative Training:
YOLO is typically trained iteratively over multiple epochs, where each epoch involves
passing the training dataset through the network, computing the loss, performing
backpropagation, and updating the model parameters. The model gradually learns to
improve its object detection capabilities during these iterations.
Inference Process:
After training, the YOLO model can be used for object detection on new unseen images.
The inference process involves the following steps:
Image Preprocessing:
The input image is preprocessed to match the network's input requirements, such as
resizing, normalization, and format conversion.
Forward Pass:
The preprocessed image is passed through the trained YOLO network. The network
performs a forward pass, extracting features, predicting bounding boxes, and class
probabilities for objects present in the image.
Non-Maximum Suppression (NMS):
To eliminate redundant detections, a post-processing step called non-maximum
suppression (NMS) is applied. NMS removes overlapping bounding boxes based on their
confidence scores, retaining only the most confident and accurate detections.
Output Visualization:
Finally, the remaining bounding boxes and their associated class labels and probabilities
are visualized on the input image, providing the detected objects' locations and
categories.
By following these steps, YOLO can efficiently detect objects in real-time with high
accuracy, making it a popular choice for various computer vision applications.
It's important to note that the specific formulas and techniques mentioned above are
based on the original YOLO algorithm (YOLOv1). Newer versions, such as YOLOv2,
YOLOv3, YOLOv4, and YOLOv5, introduce variations and improvements to the
architecture, loss functions, and training strategies, enhancing the overall performance of
YOLO.

3. CHAPTER 3
REQUIREMENT ANALYSIS

3.1 Libraries and Tools:


The code uses the following libraries and tools:
OpenCV:
OpenCV is one of the most popular open-source libraries for tasks related to image
processing and computer vision. It provides a large collection of functions for the user to
apply in their work.
Here are some key features and functionalities of OpenCV:
Image and Video I/O: OpenCV provides read and write functions for images and videos in
many different formats. It includes functions for loading images from files, capturing video
streams from cameras, and saving processed images or videos.
Image Processing: OpenCV offers a comprehensive set of image processing functions,
including filtering, transformations, color space conversions, thresholding, edge
detection, morphological operations, and more. These functions allow you to enhance,
manipulate, and analyze images.
Feature Detection and Extraction: OpenCV provides algorithms for detecting and
extracting various image features, such as corners (e.g., Harris Corner Detector, Shi-
Tomasi Corner Detector), blobs (e.g., Difference of Gaussians, Laplacian of Gaussians),
and edges (e.g., Canny Edge Detector). These features are essential for tasks like object
recognition, tracking, and matching.
Object Detection and Tracking: OpenCV includes pre-trained models and functions for
object detection and tracking. It supports popular object detection algorithms like Haar
cascades and HOG (Histogram of Oriented Gradients) for detecting objects in images or
videos. Additionally, OpenCV integrates with deep learning frameworks like TensorFlow
and PyTorch, allowing you to utilize powerful object detection models such as YOLO,
SSD, and Faster R-CNN.
Camera Calibration and 3D Vision: OpenCV provides tools for camera calibration,
allowing you to estimate camera parameters and correct lens distortion. It also offers
functions for 3D reconstruction, stereo vision, and depth estimation using multiple
cameras.
Machine Learning and Computer Vision Algorithms: OpenCV incorporates various
machine learning algorithms and computer vision techniques, including support vector
machines (SVM), k-nearest neighbors (KNN), decision trees, clustering algorithms, and
more. These algorithms can be used for tasks like classification, clustering, and
segmentation.
GUI and User Interface: OpenCV includes graphical user interface (GUI) functions for
creating windows, displaying images and videos, handling mouse and keyboard events,
and drawing annotations on images.
Integration and Language Support: OpenCV is written in C++ but provides APIs and
bindings for multiple programming languages, including Python, Java, and MATLAB.
This allows developers to use OpenCV functionalities in their preferred programming
languages.
OpenCV is widely used in various domains, including robotics, augmented reality,
medical imaging, surveillance, and automation. Its extensive collection of functions and
algorithms make it a versatile tool for computer vision tasks.
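As a minimal illustration of the video I/O pattern this project relies on (the same pattern appears in the crash detection code in Appendix A), assuming a local file named video.mp4:

import cv2

cap = cv2.VideoCapture("video.mp4")   # or cv2.VideoCapture(0) for a webcam
while True:
    ret, frame = cap.read()
    if not ret:                        # end of stream or read error
        break
    cv2.imshow("Frame", frame)
    if cv2.waitKey(1) == 27:           # Esc key exits
        break
cap.release()
cv2.destroyAllWindows()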
NumPy:
NumPy is a Python library for working with single- and multi-dimensional arrays, and it
provides many functions to manipulate them. It offers features such as:
ndarray: A powerful N-dimensional array object for storing and manipulating large
datasets.
Array operations: Mathematical functions and operators to perform element-wise
computations on arrays.
Broadcasting: A mechanism to handle arithmetic operations between arrays of different
shapes.
Linear algebra operations: Functions for matrix operations, solving linear equations,
eigenvalues, and more.
Random number generation: Tools for generating random numbers and arrays with
various distributions.
Indexing and slicing: Accessing and manipulating specific elements or subarrays of an
array.
Integration with other libraries: Seamless integration with scientific computing libraries
like SciPy and Matplotlib.
NumPy is widely used in fields like data analysis, machine learning, signal processing,
and simulation. Its efficient implementation and optimized C code make it significantly
faster than using Python's built-in lists for numerical computations. To use NumPy, you
import the library with the statement import numpy as np.
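A small illustration of arrays and broadcasting, computing the same kind of pairwise distance matrix the crash detection module builds between vehicle positions in consecutive frames (the coordinates here are made up):

import numpy as np

positions1 = np.array([[100, 200], [150, 220]], dtype=float)
positions2 = np.array([[102, 198], [400, 300]], dtype=float)

# Broadcasting expands the two sets of points into a (2, 2, 2) difference array,
# from which all pairwise Euclidean distances follow in one vectorised step.
diff = positions1[:, None, :] - positions2[None, :, :]
cost_matrix = np.sqrt((diff ** 2).sum(axis=-1))
print(cost_matrix)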
argparse:
argparse is a Python module that provides the ability to create command-line interfaces. It
gives us a simple way to define and parse arguments without much complexity.
Key features of argparse include:
Argument Parsing: argparse allows you to define the arguments your program expects
and parse them from the command line. You can specify the argument type, default
values, help messages, and more.
Positional Arguments: argparse supports defining positional arguments, which are
mandatory arguments specified without a preceding flag or option. These arguments are
typically used to specify input files, directories, or other required parameters.
Optional Arguments: argparse allows you to define optional arguments, which are
specified with flags or options. These arguments are typically used to modify the
behavior of the program or provide additional configuration options.
Argument Types and Validation: argparse provides built-in support for common
argument types, such as strings, integers, floating-point numbers, and file paths. It also
allows you to define custom argument types and perform validation on the input values.
Help Messages: argparse automatically generates help messages based on the argument
definitions, making it easy for users to understand the available options and their usage.
This helps create user-friendly command-line interfaces.
Subcommands: argparse supports defining subcommands, which allow you to create
more complex command-line interfaces with different sets of arguments and behaviors
for each subcommand. This is useful for building modular command-line tools.
Error Handling: argparse handles error cases, such as missing arguments or invalid
values, and provides informative error messages to guide users in correcting their input.
Integration with Python's sys.argv: argparse integrates seamlessly with the sys.argv list. It
parses the arguments and extracts the relevant values for easy use in your code.
argparse simplifies the process of building command-line interfaces, providing a
standardized and user-friendly way to handle arguments. It is part of Python's standard
library, so no additional installation is required. By using argparse, you can create robust
and user-friendly command-line programs with minimal effort.
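A minimal sketch of how the detection scripts in Appendix A use argparse to expose a --webcam flag and a --video_path option:

import argparse

parser = argparse.ArgumentParser(description="Crash detection demo")
parser.add_argument('--webcam', help="True/False", default=False)
parser.add_argument('--video_path', help="Path of video file", default="video.mp4")
args = parser.parse_args()

if args.webcam:
    print("Using webcam as the video source")
else:
    print(f"Using video file: {args.video_path}")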
math:
To use mathematical functions in a Python program, we need a library; the built-in math
module is the one that provides them.
With the math module, you can perform basic arithmetic operations, such as finding the
square root, raising a number to a power, and obtaining the absolute value of a number.
Additionally, it offers trigonometric functions like sine, cosine, and tangent, as well as
logarithmic and exponential functions. These functions are particularly useful for
scientific calculations, engineering applications, and mathematical modeling.
The math module also includes functions for rounding numbers, obtaining the greatest
common divisor (GCD) between two integers, and calculating factorials. It provides
constants like pi and e, which are frequently used in mathematical computations.
Furthermore, the math module supports advanced mathematical operations, such as
hyperbolic functions, error functions, and complex number operations. It also provides
tools for working with floating-point numbers, handling infinity and NaN (Not a
Number) values, and converting between degrees and radians.
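In this project the math module is used mainly for the Euclidean distance between vehicle centres; a tiny illustration (with made-up points) follows.

import math

pt1, pt2 = (120, 80), (124, 83)
# Straight-line distance between the two points; math.hypot computes the same value.
distance = math.sqrt((pt1[0] - pt2[0]) ** 2 + (pt1[1] - pt2[1]) ** 2)
assert math.isclose(distance, math.hypot(pt1[0] - pt2[0], pt1[1] - pt2[1]))
print(distance)  # 5.0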

scipy.optimize.linear_sum_assignment:
The linear_sum_assignment function in the scipy.optimize module is a powerful tool for
solving the linear sum assignment problem, also known as the assignment problem. This
problem involves finding the optimal assignment of a set of tasks to a set of agents, with
each agent having a specific cost or benefit associated with performing each task. The
goal is to minimize the total cost or maximize the total benefit of the assignment.
The linear_sum_assignment function implements the Hungarian algorithm, which
efficiently solves the assignment problem by finding the optimal assignment in
polynomial time. It takes as input a cost matrix, where each element represents the cost or
benefit of assigning a particular task to a specific agent. The cost matrix can be a square
matrix or a rectangular matrix.
The linear_sum_assignment function returns two arrays: the row indices and the column
indices that represent the optimal assignment. The row indices correspond to the tasks,
and the column indices correspond to the agents. Each element in the row indices array
indicates which agent is assigned to the corresponding task.
By using the linear_sum_assignment function, you can easily solve various optimization
problems that involve task assignment, such as job scheduling, resource allocation, and
matching problems. It provides an efficient and reliable solution for finding the optimal
assignment based on the given costs or benefits.
Overall, the linear_sum_assignment function in scipy.optimize is a valuable tool for
solving the assignment problem. It simplifies the process of finding the optimal
assignment by implementing the Hungarian algorithm, allowing you to efficiently
allocate tasks to agents and optimize the overall cost or benefit of the assignment.
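A small, self-contained illustration with a made-up 2 x 2 cost matrix, mirroring how the crash detection module matches vehicle positions between consecutive frames:

import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows are vehicle positions in the previous frame, columns are positions in the
# current frame, and entries are their pairwise distances.
cost_matrix = np.array([
    [ 2.0, 40.0],
    [35.0,  3.0],
])
row_indices, col_indices = linear_sum_assignment(cost_matrix)
for r, c in zip(row_indices, col_indices):
    print(f"previous vehicle {r} -> current vehicle {c}, distance {cost_matrix[r, c]}")
# The minimum-cost matching pairs vehicle 0 with 0 and vehicle 1 with 1.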

3.2 Function Definitions:


The code defines the following functions:
load_yolo
This function loads the YOLOv4-tiny[8] object detection model from a pre-trained
weights file and configuration file, and returns the network and output layer names.

start_webcam
This function starts the webcam and returns a VideoCapture object.
start_video
This function loads a video file and returns a VideoCapture object.
euclidean_distance
This function calculates the Euclidean distance[9] between two points.
calculate_cost_matrix
This function calculates the cost matrix for linear assignment optimization. It takes two
lists of positions as input, and returns a 2D NumPy array of distances between each pair
of positions.
detect_crash
This is the main function that performs crash detection. It takes the network, output
layers, and a VideoCapture object as input. It reads frames from the video stream, detects
vehicles in each frame using the YOLOv4-tiny model, and keeps track of their positions.
If the system detects a crash, it saves a frame of the video as an image.
torch.cuda.get_device_properties():
Returns a named tuple containing the properties of the specified GPU device.
os.chdir():
Changes the current working directory to the specified path.
glob.glob():
Returns a list of file paths that match a specified pattern.
cv2.VideoCapture():
Initializes a video capture object to read frames from a video file.
matplotlib.animation.FuncAnimation():
Creates a new animation object that repeatedly calls a function to update the plot with
new data.
Image(filename=imageName, width=400):
Displays an image file with the specified filename and width.
plot_results():
Plots the training losses and performance metrics from a CSV file.
detect.py:
A script that runs object detection on images or videos using a trained YOLOv5[11]
model.
create_animation():
Creates a Matplotlib animation object from a list of images.
display():
Displays a Matplotlib figure or image object.
Image() (from IPython.display):
Displays an image file or object.

4. CHAPTER 4
PROJECT DESIGN

First, the user chooses between the two modules, crash detection or fire detection, and then
supplies the path of the video to the respective module for detection.
4.1 Fire Detection
1. Import necessary libraries for the YOLOv5 model.
2. Set the current working directory to the yolov5 folder.
3. Check the torch version and device (CPU or GPU) being used.
4. Train the YOLOv5 model using the train.py script with the following parameters:
- Image size of 640x640 pixels.
- Batch size of 16.
- Train for 3 epochs.
- Use the fire_config.yaml file for data configuration.
- Start with the pre-trained yolov5s.pt weights.
- Use one worker for data loading.
5. Plot the training losses and performance metrics using the plot_results function from
the utils.plots library.
6. Predict on a set of images using the detect.py script with the following parameters:
- Use the best.pt weights generated during training.
- Image size of 640x640 pixels.
- Confidence threshold of 0.25.
- Source images are located in the ../datasets/fire/val/images/ directory.
7. Display the resulting detections on the first 3 test images using the display and Image
functions.
8. Predict on a video using the detect.py script with the following parameters:
- Use the best.pt weights generated during training.
- Image size of 640x640 pixels.
- Confidence threshold of 0.25.
- The source video is located in the runs/detect/exp2/ directory.
9. Read in the frames of the resulting video using OpenCV and store them in a list.
10. Create an animation of the resulting frames using the create_animation function,
which uses Matplotlib [11] and HTML5 for the animation.
11. Display the animation of the resulting frames.
12. Visualize the feature map of an image using the detect.py script with the following
parameters:
- Use the best.pt weights generated during training.
- Image size of 640x640 pixels.
- Confidence threshold of 0.25.
- Source image to visualize is defined as a variable called image_path.
- Add the --visualize flag to the detect.py script to generate the feature map.
13. Display the resulting feature map image using the display and Image functions.
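The commands behind steps 4, 6 and 8 above can be condensed into the following sketch (the same calls appear, via IPython, in Appendix A); paths such as ../fire_config.yaml, ../datasets/fire/val/images/ and ../input.mp4 are assumed from the project layout and must exist locally.

import subprocess

# Step 4: train YOLOv5 on the fire dataset, run from inside the yolov5 folder.
subprocess.run("python train.py --img 640 --batch 16 --epochs 3 "
               "--data ../fire_config.yaml --weights yolov5s.pt --workers 1",
               shell=True, check=True)

# Step 6: run detection on the validation images with the best weights.
subprocess.run("python detect.py --weights runs/train/exp/weights/best.pt "
               "--img 640 --conf 0.25 --source ../datasets/fire/val/images/",
               shell=True, check=True)

# Step 8: run detection on a video file.
subprocess.run("python detect.py --weights runs/train/exp/weights/best.pt "
               "--img 640 --conf 0.25 --source ../input.mp4",
               shell=True, check=True)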

This is a Python script designed to train a YOLOv5 model for object detection on images
and videos using the PyTorch[12] deep learning framework. The model is trained on a
custom dataset of fire images and videos.
The script begins by importing necessary libraries and setting the working directory. It
then trains the YOLOv5 model using the "train.py" script, passing in arguments such as
image size, batch size, number of epochs, dataset configuration file, pre-trained weights,
and number of workers.
After training the model, the script plots the training losses and performance metrics
using the "plot_results" function from the "utils.plots" module.
The script performs object detection on images using the trained model, passing in
arguments such as the path to the saved model weights, image size, confidence threshold,
and source directory of the images. It also displays the inference results on some test
images.

Next, the script performs object detection on a video file, passing in the path to the saved
model weights, image size, confidence threshold, and source file of the video. It then
reads the frames of the video and stores them as a list of images.
Finally, the script visualizes the feature map of a single image by running object
detection on it and generating an output file of the feature map. It then displays the
feature map image.
Overall, this script demonstrates how to train and evaluate a custom YOLOv5 model for
object detection on images and videos, as well as how to visualize the feature map of an
image using the trained model.

4.2 Crash Detection

Class Diagram 1

1. Load the YOLOv4-tiny model and its output layer names.


2. If using a webcam, start the webcam; if using a video file, load the video file.
3. Read frames from the video stream.
4. Detect vehicles in each frame using the YOLOv4-tiny model.
5. Keep track of the position of the vehicles in each frame.
6. Calculate the cost matrix for linear assignment optimization using the position of the
vehicles in the current frame and the previous frame.
7. Find the optimal pairs of vehicle positions between the current frame and the previous
frame using linear assignment optimization.

8. Calculate the distance between each pair of vehicle positions.
9. If the distance between a pair of vehicles is below a certain threshold, and their
relative velocity is above a certain threshold, trigger a crash detection.
10. Save a frame of the video as an image if a crash is detected.
11. Repeat steps 3-10 until the end of the video stream is reached.
12. Release the video stream and close all windows.

The code is a Python script for detecting crashes between vehicles in a video stream or
webcam feed using YOLOv4 object detection and the Hungarian algorithm [13] for
vehicle tracking. The main function detect_crash() reads in frames from the video stream,
applies YOLOv4 object detection to detect vehicles, and tracks the position of each
detected vehicle using the Hungarian algorithm. If a crash is detected between two
vehicles, the function saves the frame as an image file and outputs a message indicating
that a crash has been detected.
The script uses several Python packages, including OpenCV, NumPy, argparse, math,
and scipy.optimize. It also requires two pre-trained files, yolov4-tiny.weights and
yolov4-tiny.cfg, which are used by the YOLOv4 object detection model.
The script takes two optional arguments, --webcam and --video_path. If --webcam is set
to True, the script will use the computer's default webcam as the video source. If --
webcam is set to False (the default value), the script will use the video file specified in --
video_path as the video source.
The load_yolo() function loads the YOLOv4 object detection model from the pre-trained
files and returns the model and a list of its output layer names. The start_webcam() and
start_video() functions create a VideoCapture object for reading frames from the webcam
or video file, respectively.
The euclidean_distance() function is responsible for calculating the Euclidean distance
between two points in a two-dimensional space. It is utilized in the object detection code
to measure the distance between the positions of vehicles in two consecutive frames.
The Euclidean distance is a straight-line distance between two points in a Cartesian
coordinate system. The calculate_cost_matrix() function utilizes the euclidean_distance()
function to construct a cost matrix for the Hungarian algorithm. The cost matrix
represents the pairwise distances between the positions of vehicles in two consecutive
frames. Each element of the matrix corresponds to the cost or distance between two
positions.
The calculate_cost_matrix() function takes two lists of positions: positions1 and
positions2. It iterates over each pair of positions and calculates the Euclidean distance
between them using the euclidean_distance() function. The distances are then stored in a
two-dimensional numpy array, representing the cost matrix.
The cost matrix is later used by the Hungarian algorithm (implemented in
linear_sum_assignment()) to find the optimal assignment of vehicles between frames
based on minimizing the overall cost. This assignment helps determine the
correspondence between vehicles in different frames, facilitating the detection of
potential collisions or crashes.
By utilizing the Euclidean distance and the cost matrix, the object detection code can
effectively track and analyze the movements of vehicles over time, enabling the detection
of potential crashes or collisions between them.

The detect_crash() function applies YOLOv4 object detection to each frame in the video
stream, tracks the positions of the detected vehicles using the Hungarian algorithm, and
saves the frame as an image file if a crash is detected. The function uses OpenCV to read
and display the video stream.
The script outputs a message indicating that the crash detection has started and displays
the video stream. If a crash is detected, the script outputs a message indicating that a
crash has been detected and saves the frame as an image file in a subdirectory named
crash_frames.
Overall, the code provides a simple but effective way to detect crashes between vehicles
in a video stream or webcam feed using YOLOv4 object detection and the Hungarian
algorithm.

5. CHAPTER 5
RESULT
The Crash Detection Module is an important part of the project that helps to detect
potential crashes and save snapshots of the moments leading up to the accident. The
functionality of this module can vary depending on the specific arguments used in the
project. It is essential to specify the video path when building the Crash Detection
Module, as failing to do so can result in a black screen and an immediate exit.
In situations where an accident does occur, the module will save a snapshot of the
incident, along with a warning message displayed on the terminal. These snapshots
provide valuable information about the events leading up to the crash and can be used for
further analysis and investigation.

Figure 1 Crash Detected

The snapshot above shows that an accident occurred, and the module saves a snapshot of the
incident.
The Fire Detection Module is the other part of the project; it helps the user quickly identify
fires so that preventive or relief measures can be taken.

Figure 2 Fire Detection in room

The Fire Detection Module processes the frames of the video and draws a bounding box
wherever it detects fire, using the create_animation function to render the output. The
YOLOv5 model was trained for 10 epochs with a batch size of 16, an image size of 640 x 640
pixels, and a confidence threshold of 0.25, using one worker for data loading. As a result,
the time spent on training the model was about 10 hours. When the model was tested on an
mp4 video, it showed bounding boxes around the fire in the video.

Figure 3 Fire Spreading in a room

Figure 4 Result Graph

The first three columns of the result graph show the YOLOv5 loss components: the box
loss, the objectness loss and the classification loss. The rightmost columns show precision
and recall, which indicate how well the algorithm predicts the objects. They show that the
classes used, such as fire and weapons, are accurately recognized during the training
process. The model is suitable for detecting accidental fires and wildfires; it performs well
in open environments, and the presence of fire in a frame can be used to classify that frame
as an anomaly.

6. CHAPTER 6
CONCLUSION AND FUTURE SCOPE
The code provides an efficient and effective crash detection system that uses object
detection and linear assignment optimization. The system can be run on a webcam stream
or a video file, making it suitable for a wide range of applications. The code can be
further optimized and improved by adjusting the detection thresholds and fine-tuning the
YOLOv4-tiny model for specific use cases. In the future, we can also build a frontend that
combines the two modules using multiprocessing to achieve better results. Despite training
the model for only a few epochs, it was observed that the model still performed fairly well.
During the evaluation of the trained model, an issue was
identified where the model incorrectly predicted red pandas as fire. This misclassification
could be attributed to the limited number of negative samples in the training dataset. To
address this problem and enhance the model's performance, a potential solution is to
incorporate additional images that contain non-labeled fire objects as negative samples.
By including such images, the model can learn to differentiate between fire and other
objects that may share similar visual characteristics. The authors of the YOLOv5 model
recommend utilizing approximately 0-10% of the dataset as background images to
mitigate false positive predictions.
By incorporating more diverse and representative negative samples, the model can better
understand the distinguishing features of fire and improve its ability to accurately detect
and classify fire objects. This refinement process helps to enhance the overall
performance and reliability of the object detection system.

APPENDIX A

(Code)

import cv2

import numpy as np

import argparse

import math

from scipy.optimize import linear_sum_assignment

import os

parser = argparse.ArgumentParser()

parser.add_argument('--webcam', help="True/False", default=False)

parser.add_argument('--video_path', help="Path of video file", default="video.mp4")

args = parser.parse_args()

def load_yolo():

net = cv2.dnn.readNet("yolov4-tiny.weights", "yolov4-tiny.cfg")

layer_names = net.getLayerNames()

unconnected_out_layers = net.getUnconnectedOutLayers()

output_layers = [layer_names[i - 1] for i in unconnected_out_layers.flatten()]

return net, output_layers

def start_webcam():

cap = cv2.VideoCapture(0)

return cap

def start_video(video_path):

cap = cv2.VideoCapture(video_path)

return cap

def euclidean_distance(pt1, pt2):

return math.sqrt((pt1[0] - pt2[0]) ** 2 + (pt1[1] - pt2[1]) ** 2)

def calculate_cost_matrix(positions1, positions2):

cost_matrix = np.zeros((len(positions1), len(positions2)))

for i, pos1 in enumerate(positions1):

for j, pos2 in enumerate(positions2):

cost_matrix[i, j] = euclidean_distance(pos1, pos2)

return cost_matrix

def detect_crash(net, output_layers, cap):

vehicle_positions = []

prev_vehicle_positions = []

frame_count = 0

crash_count = 0

if not os.path.exists("crash_frames"):

os.makedirs("crash_frames")

while True:

ret, frame = cap.read()

if not ret:

break

blob = cv2.dnn.blobFromImage(

frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)

net.setInput(blob)

outs = net.forward(output_layers)

conf_threshold = 0.25 ### Adjust as per your requirement

class_ids = []

confidences = []

boxes = []

for out in outs:

for detection in out:

scores = detection[5:]

class_id = np.argmax(scores)

confidence = scores[class_id]

if confidence > conf_threshold and class_id == 2:

center_x, center_y, w, h = (detection[0:4] * np.array(

[frame.shape[1], frame.shape[0], frame.shape[1],


frame.shape[0]])).astype('int')

x, y = int(center_x - w / 2), int(center_y - h / 2)

boxes.append([x, y, w, h])

confidences.append(float(confidence))

class_ids.append(class_id)

indices = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, 0.2)

vehicle_positions = [

(x + w // 2, y + h // 2) for i in np.array(indices).flatten() for x, y, w, h in


[boxes[i]]]

if frame_count > 0 and vehicle_positions and prev_vehicle_positions:

cost_matrix = calculate_cost_matrix(

prev_vehicle_positions, vehicle_positions)

row_indices, col_indices = linear_sum_assignment(cost_matrix)

for row, col in zip(row_indices, col_indices):

prev_pos = prev_vehicle_positions[row]

current_pos = vehicle_positions[col]

distance = euclidean_distance(prev_pos, current_pos)

if distance < 10: ### Adjust as per your requirement

velocity = cost_matrix[row, col]

if velocity > 9: ### Adjust as per your requirement

print("Crash detected")

cv2.putText(frame, "Crash Detected", (50, 50),


cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 4)

# Save the frame as an image file

crash_count += 1

crash_frame_path = f"crash_frames/crash_{crash_count:04d}.png"

cv2.imwrite(crash_frame_path, frame)

print(f"Crash frame saved to {crash_frame_path}")

break
prev_vehicle_positions = vehicle_positions

frame_count += 1

cv2.imshow("Frame", frame)

if cv2.waitKey(1) == 27:

break

cap.release()

cv2.destroyAllWindows()

if __name__ == '__main__':

webcam = args.webcam

video_path = args.video_path

net, output_layers = load_yolo()

if webcam:

print('---- Starting Web Cam crash detection ----')

cap = start_webcam()

else:

print('---- Starting Video crash detection ----')

cap = start_video(video_path)

detect_crash(net, output_layers, cap)

cv2.destroyAllWindows()

Weapon Detection

import cv2

import numpy as np

import argparse

import os

parser = argparse.ArgumentParser()

parser.add_argument('--webcam', help="True/False", default=False)

parser.add_argument('--play_video', help="True/False", default=False)

parser.add_argument('--video_path', help="Path of video file",


default="videos/fire1.mp4")

parser.add_argument('--verbose', help="To print statements", default=True)

args = parser.parse_args()
def load_yolo():

net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")

classes = []

with open("obj.names", "r") as f:

classes = [line.strip() for line in f.readlines()]

layers_names = net.getLayerNames()

output_layers = [layers_names[i-1] for i in net.getUnconnectedOutLayers()]

colors = np.random.uniform(0, 255, size=(len(classes), 3))

return net, classes, colors, output_layers

def load_image(img_path):

img = cv2.imread(img_path)

img = cv2.resize(img, None, fx=0.4, fy=0.4)

height, width, channels = img.shape

return img, height, width, channels

def start_webcam():

cap = cv2.VideoCapture(0)

return cap

def display_blob(blob):

for b in blob:

for n, imgb in enumerate(b):

cv2.imshow(str(n), imgb)

def detect_objects(img, net, output):

blob = cv2.dnn.blobFromImage(img, scalefactor=0.00392, size=(320, 320),


mean=(0, 0, 0), swapRB=True, crop=False)

net.setInput(blob)

outputs = net.forward(output)

return blob, outputs

def get_box_dimensions(outputs, height, width):

boxes = []

confs = []

class_ids = []

for output in outputs:

for detect in output:


scores = detect[5:]

class_id = np.argmax(scores)

conf = scores[class_id]

if conf > 0.3:

center_x = int(detect[0] * width)

center_y = int(detect[1] * height)

w = int(detect[2] * width)

h = int(detect[3] * height)

x = int(center_x - w/2)

y = int(center_y - h / 2)

boxes.append([x, y, w, h])

confs.append(float(conf))

class_ids.append(class_id)

return boxes, confs, class_ids

def draw_labels(boxes, confs, colors, class_ids, classes, img, weapon_count):

# Create the crash_frames folder if it doesn't exist

if not os.path.exists("weapon_frames"):

os.makedirs("weapon_frames")

indexes = cv2.dnn.NMSBoxes(boxes, confs, 0.5, 0.4)

font = cv2.FONT_HERSHEY_PLAIN

for i in range(len(boxes)):

if i in indexes:

x, y, w, h = boxes[i]

label = str(classes[class_ids[i]])

color = colors[i % len(colors)]

cv2.rectangle(img, (x, y), (x+w, y+h), color, 5)

cv2.putText(img, label, (x, y - 5), font, 2, color, 2)

if "Gun" in label or "Rifle" in label:

print("Weapon detected")

weapon_count += 1

weapon_frame_path = f"weapon_frames/weapon_{weapon_count:04d}.png"

cv2.imwrite(weapon_frame_path, img)

print(f"Weapon frame saved to {weapon_frame_path}")

img = cv2.resize(img, (800, 600))

cv2.imshow("Image", img)

return weapon_count

def webcam_detect():

model, classes, colors, output_layers = load_yolo()

cap = start_webcam()

weapon_count = 0 # Initialize weapon_count outside the loop

while True:

_, frame = cap.read()

height, width, channels = frame.shape

blob, outputs = detect_objects(frame, model, output_layers)

boxes, confs, class_ids = get_box_dimensions(outputs, height, width)

weapon_count = draw_labels(boxes, confs, colors, class_ids, classes, frame,


weapon_count)

key = cv2.waitKey(1)

if key == 27:

break

cap.release()

def start_video(video_path):

model, classes, colors, output_layers = load_yolo()

cap = cv2.VideoCapture(video_path)

weapon_count = 0 # Initialize weapon_count outside the loop

while True:
ret, frame = cap.read()

if not ret:

break

height, width, channels = frame.shape

blob, outputs = detect_objects(frame, model, output_layers)

boxes, confs, class_ids = get_box_dimensions(outputs, height, width)

weapon_count = draw_labels(boxes, confs, colors, class_ids, classes, frame,


weapon_count)

key = cv2.waitKey(1)

if key & 0xFF == ord('q'):

break

cap.release()

if __name__ == '__main__':

webcam = args.webcam

video_play = args.play_video

if webcam:

if args.verbose:

print('---- Starting Web Cam object detection ----')


webcam_detect()

if video_play:

video_path = args.video_path

if args.verbose:

print(f'---- Starting Video object detection on {video_path} ----')

start_video(video_path)

cv2.destroyAllWindows()

Fire Detection

import torch

import os

import glob

from IPython.display import Image, display

get_ipython().run_line_magic('cd', 'yolov5')

print(f"Setup complete. Using torch {torch.__version__}


({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")

get_ipython().system('python train.py --img 640 --batch 16 --epochs 3 --data


../fire_config.yaml --weights yolov5s.pt --workers 1')

from utils.plots import plot_results

plot_results('runs/train/exp/results.csv')

get_ipython().system('python detect.py --weights runs/train/exp/weights/best.pt --img


640 --conf 0.25 --source ../datasets/fire/val/images/')

images = glob.glob('runs/detect/exp/*.jpg')

for imageName in images[:3]: #assuming JPG

display(Image(filename=imageName, width=400))

get_ipython().system('python detect.py --weights runs/train/exp/weights/best.pt --img


640 --conf 0.25 --source ../input.mp4')

import cv2

vidcap = cv2.VideoCapture('runs/detect/exp2/input.mp4')

success,image = vidcap.read()

images = []

while success:

success,image = vidcap.read()

if success:

image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

images.append(image)

from matplotlib import animation, rc

import matplotlib.pyplot as plt

rc('animation', html='jshtml')

def create_animation(ims):

fig = plt.figure(figsize=(9, 9))

plt.axis('off')

im = plt.imshow(ims[0])

def animate_func(i):

im.set_array(ims[i])

return [im]

return animation.FuncAnimation(fig, animate_func, frames = len(ims), interval =


1000//12)

create_animation(images)

image_path = "../datasets/fire/val/images/004dec94c5de631f.jpg"

display(Image(filename=image_path, width=400))

get_ipython().system('python detect.py --weights runs/train/exp/weights/best.pt --img


640 --conf 0.25 --source {image_path} --visualize')

display(Image(filename="runs/detect/exp3/004dec94c5de631f/
stage23_C3_features.png")

References

[1] YOLO Real-Time Object Detection: https://pjreddie.com/darknet/yolo/

[2] Grace Karimi, Introduction to YOLO Algorithm for Object Detection.

[3] Geethapriya. S, N. Duraimurugan, S.P. Chokkalingam (2019). "Real-Time Object Detection with Yolo". International Journal of Engineering and Advanced Technology (IJEAT).

[4] Nicholas Renotte, Deep Drowsiness Detection using YOLO: https://www.youtube.com/watch?v=tFNJGim3FXw

[5] What is Anomaly Detection?: https://aws.amazon.com/what-is/anomaly-detection/

[6] Video Surveillance - an overview: https://www.sciencedirect.com/topics/computer-science/video-surveillance

[7] Fire and Smoke Dataset, used in combination with the Fire and Gun Dataset: https://www.kaggle.com/datasets/dataclusterlabs/fire-and-smoke-dataset

[8] Fire and Gun Dataset: https://www.kaggle.com/datasets/atulyakumar98/fire-and-gun-dataset

[9] Euclidean Distance:

[10] YOLOv4: https://blog.paperspace.com/how-to-train-scaled-yolov4-object-detection/

[11] YOLOv5: https://towardsdatascience.com/how-to-train-a-custom-object-detection-model-with-yolo-v5-917e9ce13208

[12] PyTorch: https://www.youtube.com/watch?v=V_xro1bcAuA&ab_channel=freeCodeCamp.org

[13] Hungarian Algorithm: https://brilliant.org/wiki/hungarian-matching/

[14] Fire and smoke detection system using Jetson Nano & YOLOv5 with an image dataset from Getty Images: https://github.com/nikhilgawai/Fire_Detection

[15] wildfire-smoke-detection-research: early wildfire smoke detection.

[16] Fire Detection using CCTV images — Monk Library Application (Keras classifier on Kaggle datasets, MobileNet-v2, DenseNet121): https://pub.towardsai.net/fire-detection-using-cctv-images-monk-library-application-242df1fca2b9

[17] Early Fire Detection System using deep learning and OpenCV: customized InceptionV3 and CNN architectures for indoor and outdoor fire detection; 980 images for training and 239 images for validation, training accuracy of 98.04 and validation accuracy of 96.43; OpenCV used for live detection on webcam: https://towardsdatascience.com/early-fire-detection-system-using-deep-learning-and-opencv-6cb60260d54a
