
CROWD DETECTION AND NOTIFIER SYSTEM

This project report is submitted to


Yeshwantrao Chavan College of Engineering
(An Autonomous Institution Affiliated to Rashtrasant Tukadoji Maharaj Nagpur University)

In partial fulfillment of the requirement


For the award of the degree

Of

Bachelor of Technology in Artificial Intelligence and Data Science


By

KANAK ARORA
NIDHI SAKHARE
HIMANSHU DHOMANE
YOGESH RATHOD

Under the guidance of

Prof. Shweta A. Gode

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE


Nagar Yuwak Shikshan Sanstha’s

YESHWANTRAO CHAVAN COLLEGE OF ENGINEERING,


(An autonomous institution affiliated to Rashtrasant Tukadoji Maharaj Nagpur University, Nagpur)

NAGPUR – 441 110


2024-2025

CERTIFICATE OF APPROVAL

Certified that the project report entitled “Crowd Detection And Notifier System” has been
successfully completed by Kanak Arora, Nidhi Sakhare, Himanshu Dhomane, and Yogesh Rathod
under the guidance of Prof. Shweta A. Gode in partial fulfillment of the requirements for the award
of the degree of Bachelor of Technology in Artificial Intelligence and Data Science, Yeshwantrao
Chavan College of Engineering, Nagpur (An Autonomous Institution Affiliated to Rashtrasant
Tukadoji Maharaj Nagpur University).

Name & Signature of Guide            Name & Signature of Co-guide/Industry or any Mentor (organization name)

Nilesh U. Sambhe                     Dr. Kavita R. Singh
Project Co-ordinator                 HOD, AIDS

Name and signature of External Examiner:


Date of Examination

Certificate of collaboration (industry/research organization)

(To be printed on Industry letter head)

This is to certify that following students of final year Artificial Intelligence and Data Science

Department, Yeshwantrao Chavan College of Engineering, Nagpur, have successfully completed

Live/Industry/Joint research project titled “Crowd Detection And Notifier System” under the guidance

of Prof. Shweta A. Gode and Mr. Vishal Deshmukh with Name of Industry for the session 2024-25.

Kanak Arora 21071135


Nidhi Sakhare 21071297
Himanshu Dhomane 21070297
Yogesh Rathod 21070333

Name and Signature of Industry Guide with Seal

DECLARATION

We certify that

a. The work contained in this project has been done by us under the guidance of our supervisor(s).
b. The work has not been submitted to any other Institute for any degree or diploma.
c. We have followed the guidelines provided by the Institute in preparing the project report.
d. We have conformed to the norms and guidelines given in the Ethical Code of Conduct of the
Institute.
e. Whenever we have used materials (data, theoretical analysis, figures, and text) from other sources,
we have given due credit to them by citing them in the text of the report and giving their details in
the references. Further, we have taken permission from the copyright owners of the sources,
whenever necessary.

Signature of the Student

Kanak Arora
Nidhi Sakhare
Himanshu Dhomane
Yogesh Rathod

ACKNOWLEDGEMENT

This project work is one of the major milestones in our journey of learning. We wish to express
our sincere thanks and sense of gratitude to our guide Prof. Shweta A. Gode and co-guide Mr. Vishal
Deshmukh, for their guidance, constant inspiration, and continued support throughout the tenure of
this project. The blessings, help, and guidance given by them from time to time shall carry us a long
way in the journey of technical research.

We also want to thank our Head of Department, Dr. Kavita R. Singh. She was always kind and
ready to help. Her words of support made us feel more confident. She gave us helpful suggestions that
helped us learn and grow. We truly appreciate the time she gave us.

A big thank you to our Principal, Dr. U. P. Waghe, for always being there to support us. He
gave us permission to use the lab and all the other things we needed to complete our project well. His
support made it easy for us to stay focused on our project.

We are also thankful to Prof. Nilesh U. Sambhe, our project coordinator. He gave us advice
when we needed it most. He helped us stay on track. His timely suggestions were simple but very
useful.

Our special thanks go to Mrs. B. H. Kulkarni, our technical assistant. She was always kind and
cooperative. Whenever we needed technical help, she was there with a solution. Her support made our
work much easier.

Lastly, we express our gratitude to all our teachers at every level, especially those who taught us
fundamental concepts and investigative strategies and who fostered a sense of wonder.

TABLE OF CONTENTS

TITLE PAGE NO.

Title Page i
Certificate of Approval ii
Certificate of Collaboration iii
Declaration iv
Acknowledgement v
Table of Contents vi
List of Tables viii
List of Figures ix
List of Abbreviations x
List of Symbols xi
Abstract xii

CHAPTER 1: Introduction 1
1.1 Overview 1
1.2 Literature Survey 1
1.3 Problem Statement 2
1.4 Project Objectives 2
1.5 Project Contributions 3

CHAPTER 2: Review of Literature 4


2.1 Overview 4
2.2 Patent Search 10

CHAPTER 3: Work Done 12

3.1 Flow of the System 12


3.2 Block Diagram of Training and Working of the System 13
3.3 Yolo V8 Architecture 14
3.4 Dataset 17
3.5 Hardware Components 19


3.6 Training 21
3.7 Modes of Detection 21
3.8 Web Portal 21
3.9 YOLOv8 Integration and Inference Optimization 22
3.10 Architecture and Communication Between SIM800L and Raspberry Pi 4 22
3.11 Alert Mechanism and SMS Integration 23
3.12 Flutter App for Viewing Images via URL 23

CHAPTER 4: Result and Discussion 24


4.1 System Functionality 24
4.2 Model Integration and Inference Acceleration 24
4.3 Alert Mechanism 24
4.4 Snapshot Capture and Data Access 25
4.5 YOLOv8 Model Evaluation on Custom Dataset: Accuracy & Metrics 25
4.6 Interpretation of Results 26
4.7 Discussion 27

CHAPTER 5: Summary and Conclusion 29


5.1 Summary 29
5.2 Conclusion 30

Social Utility 33

Appendix 35

References 40

LIST OF TABLES

TABLE NO. TITLE PAGE NO.

Table 3.2.1 Crowd Detection Dataset 18

Table 3.2.2 Violence Detection Dataset 18

Table 4.5 Yolov8 Model Evaluation on Custom Dataset 26

LIST OF FIGURES

FIGURE NO. TITLE PAGE NO.

Figure 3.1 Flowchart of the system 12

Figure 3.2.1 Training Block Diagram 13

Figure 3.2.2 System Block Diagram 13

Figure 3.3.1 Tapo C200 CCTV Camera 19

Figure 3.3.2 Google Edge TPU 19

Figure 3.3.3 SIM800 GSM Module 19

Figure 3.3.4 Raspberry Pi4 20

Figure 3.3.5 Buzzer 20

Figure 3.3.6 Power Supply Adapter 20

Figure 3.3.7 Capacitor 20

Figure 3.3.8 Battery 21

Figure 3.8 Architecture of communication between SIM800L and Raspberry Pi 4 22

Figure 4.6 Results for Detection of Crowd and Violence 27

LIST OF ABBREVIATIONS

ABBREVIATION FULL FORM

GSM Global System for Mobile Communication

TPU Tensor Processing Unit (Edge TPU)


API Application Programming Interface
YOLO You Only Look Once
RTSP Real-Time Streaming Protocol
TFLite TensorFlow Lite
SiLU Sigmoid Linear Unit (activation function)
PANet Path Aggregation Network
FPN Feature Pyramid Network
IoU Intersection over Union
mAP mean Average Precision
CNN Convolutional Neural Network
RNN Recurrent Neural Network

LIST OF SYMBOLS

SYMBOL DESCRIPTION

σ Population standard deviation

x Input to activation function


σ(x) Sigmoid function
Y Output feature map
X Input feature map

ABSTRACT

This project proposes a real-time surveillance system for crowd density and violent behaviour
detection using advanced machine learning. It leverages the YOLOv8n model, trained on custom
datasets prepared with Roboflow, for accurate detection in live streams. Video from a Tapo C200
CCTV camera is processed on a Raspberry Pi 4, with inference accelerated by a Google Edge TPU.

Because the Edge TPU delivers up to 4 TOPS while consuming only around 2 W, the system achieves
real-time performance at low latency.

The Tapo C200 provides full 360° panoramic coverage, 1080p resolution, and night vision for
dependable video coverage. With 4 GB of RAM and a quad-core Cortex-A72 processor, the
Raspberry Pi 4 handles streaming and inference tasks smoothly.

When crowd or violence thresholds are exceeded, the SIM800L GSM module sends real-time SMS
notifications over GSM-based communication, even without an internet connection.

Event snapshots are pushed to a Flask web portal and can be accessed through the Flutter mobile
application for remote viewing and history.

By combining CCTV, the Raspberry Pi 4, the Edge TPU, and the SIM800L, the system detects events
quickly and streams timely alerts, providing reliable surveillance best suited to scalable deployment
in busy or unsafe locations.

Keywords: Real-time Surveillance, YOLOv8n, Google Edge TPU, Tapo C200 CCTV Camera,
Raspberry Pi 4, Crowd Detection, Violence Detection, Flask Web Portal, GSM Alert System, Flutter
Mobile Application, Machine Learning, Edge Computing.

CHAPTER-1: INTRODUCTION

1. INTRODUCTION

1.1 OVERVIEW:

Surveillance systems are critical to safety, capturing and monitoring areas where large populations
are at risk. The goal of this pipeline is to perform surveillance on the fly, accurately detecting crowd
density and, where possible, violent activity. It is based on YOLOv8n, fine-tuned on custom datasets
from Roboflow, detecting objects throughout the lifecycle of a live video stream. A standard
Raspberry Pi 4 paired with a Google Edge TPU for inference acceleration provides up to 4 TOPS of
processing power at only about 2 W. Video from the Tapo C200 security (CCTV) camera supplies
continuous monitoring. When a threat is detected, the SIM800L GSM module sends an instant SMS
alert, while event snapshots are streamed to the Flask web portal and rendered in the Flutter mobile
application. This hardware-software integration makes reliable real-time operation possible and adds
safety and accountability in life-critical environments.

1.2 LITERATURE SURVEY:

The literature review covers 21 research papers that together summarize real-time object and crowd
detection and the technologies and methodologies involved. Research on YOLO versions ([1]–[4])
traces the evolution from YOLOv1 to YOLOv8 and the more recent YOLO-NAS, which further
improves localization accuracy and detection speed for autonomous driving and surveillance
research. Work on object and crowd detection ([5]–[15]) spans ViTPose for human pose estimation,
traffic accident detection, and violence recognition using CNNs and Transformers. These works
underscore the importance of deep learning, benchmark datasets, and real-time analytics for effective
public safety and surveillance. References [16]–[19] show that such models can run on
resource-constrained devices such as the Raspberry Pi, enabling functionality like object detection,
theft prevention, and smart surveillance. The investigation of the GSM module ([20]) introduces a
programmable real-time alert device that can reach the public within seconds via SMS, which is
essential for event monitoring. Finally, the WebRTC literature ([21]) shows how live video streaming
across a remote network can improve the effectiveness of surveillance systems. Taken together, the
reviewed literature provides a solid basis for building smart, real-time monitoring solutions that
combine YOLO with embedded hardware, GSM alerts, and live communication via WebRTC.


1.3 PROBLEM STATEMENT:

Monitoring crowd behavior and spotting violent activities in real time is a major challenge in
access-controlled areas. Surveillance systems, traditional and modern alike, cannot perform true
real-time analysis and generally depend on human intervention to detect a threat. As a direct
consequence, issues that need an immediate response are handled too slowly, compromising safety
and security. To cope with these problems, an automated surveillance system is needed that can
detect crowd density and violent behavior on the fly, generate real-time alerts, and offer remote
monitoring capabilities. The system under development is therefore a surveillance solution aimed at
proactive threat detection and improved situational awareness wherever the number of people must
be controlled.

1.4 PROJECT OBJECTIVES:

1. Develop a crowd-counting system using YOLOv8 with user-defined thresholds for
overcrowding prevention and automatic alerts.
2. Additionally, develop a violence detection feature that captures violent activities.
3. Ensure continuous monitoring from surveillance cameras, capturing video with minimal latency.
4. Provide a user-friendly interface for real-time monitoring, configurable thresholds, and
toggleable modes.
5. Enable instant alerting through SMS and image snapshots for immediate response and user
awareness.
6. Ensure the system is portable and easily deployable across various environments without
complex setup.


1.5 PROJECT CONTRIBUTIONS:

In this project, we showcase a simplified, smart surveillance system that blends cutting-edge
technology with practical needs in a user-friendly way. Its main contributions are:

1. Crowd and Violent Behavior Detection: Real-time analysis of live video using YOLOv8n
trained on custom datasets detects crowd density and violent behavior in a split second.

2. Reduced Load on Human Monitoring: Threat detection is automated, so human operators
can focus on reacting rather than staring at a screen all day.

3. Environmentally Friendly: The system is lightweight and power-efficient (the accelerator
draws roughly 2 W), making it an eco-friendly technology deployment.

4. Edge AI + Low-Power Processing: Running on the Raspberry Pi 4 with the Google Edge
TPU gives self-contained performance using a minimal amount of energy, making AI-based
surveillance more accessible and scalable.

5. Limited Data Sharing for Better Privacy: Alerts and images are shared only when triggered,
and web and mobile access is restricted to authorized users.

6. Efficient Monitoring Tool: The lightweight, low-carbon design scales easily as an
eco-friendly solution for schools, homes, and business offices.

7. Enhanced Public Safety: Accurate early detection reduces the risk of overcrowding and
violence in schools, offices, events, and public spaces.

CHAPTER-2: REVIEW OF LITERATURE

2. REVIEW OF LITERATURE

2.1 OVERVIEW:

A total of 21 research papers have been examined for this work. In particular, references [1]–[4]
focus on YOLO versions, [5]–[15] pertain to object detection and crowd detection, [16]–[19] relate
to the Raspberry Pi, [20] discusses the GSM module, and [21] covers WebRTC-based live video
streaming.
The studied research and review papers are summarized as follows:

Yolo Versions:-

Peiyuan Jiang, Daji Ergu, Fangyao Liu, Ying Cai, and Bo Ma, “A Review of YOLO Algorithm
Developments” (2024) [1] offers a comprehensive survey of advances in real-time object detection
with the YOLO (You Only Look Once) algorithm. From YOLOv1 to YOLOv4, it discusses the
changes in detection performance, deployment speed, and architectural improvements, describing
how each version addresses the shortcomings of the previous generation. It goes on to look at the
different ways YOLO can be used (autonomous driving, medical imaging), compares performance
between versions, and benchmarks against alternative algorithms. The authors discuss remaining
obstacles in small-object detection and changing conditions and indicate directions for future work
to push the technology forward.

Kaiming Gu and Boyu Su, “A Study of Human Pose Estimation in Low-Light Environments Using
YOLOv8 Model” (2024) [2] presents a comprehensive examination of the YOLOv8 model family
for human pose estimation in low-light conditions, one of the more challenging problems in
computer vision. The authors test six variants of YOLOv8 to investigate how well the models detect
and interpret human body keypoints when lighting is poor. The methodology compares the models
on a dedicated low-light dataset, evaluating performance in terms of precision, recall, and processing
rate. Results demonstrate that the larger YOLOv8 models recognize poses better, but at the cost of
computational resources in memory and processing time. This raises serious engineering challenges
for deploying these models in real-time systems such as surveillance cameras, mobile devices, or
embedded systems where resources are constrained.


Mehmet Şirin Gündüz and Gültekin Işık, “A New YOLO-Based Method for Real-Time Crowd
Detection from Video and Performance Analysis of YOLO Models” (2023) [3] addresses real-time
crowd detection using YOLO models, especially for managing indoor capacity limits during
COVID-19. The paper introduces a mechanism that counts the people within a given area of the
video and indicates whether its capacity limit is exceeded. A YOLO object detection model with
weights pretrained on the Microsoft COCO dataset is used to detect and label people. The
performance metrics compared across YOLO models are mean average precision (mAP), frames per
second (fps), and accuracy for YOLOv3 versus YOLOv5s. YOLOv3 achieved the highest accuracy
and mAP, while YOLOv5s surpassed all non-Tiny models in fps.

Juan R. Terven and Diana M. Cordova-Esparza, “A Comprehensive Review of YOLO Architectures
in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS” (2024) [4] reviews the evolution
of YOLO from YOLOv1 to YOLO-NAS and transformer-based YOLO variants. It discusses the
critical architectural and training advances behind real-time object detection, empirically evaluates
the versions on common metrics, and covers developments as well as future research directions for
better performance in fields such as robotics and surveillance.

Object detection and crowd detection:-

F. Sultana, A. Sufian, and P. Dutta, “A Review of Object Detection Models Based on Convolutional
Neural Network” (2019) [5] provides a survey of state-of-the-art CNN-based object detection
models. The review arranges the models into two families according to their approach, two-stage and
one-stage detectors, and traces the evolution from R-CNN to RefineDet, with a description of each
model and its training process. It also reports comparative simulation results, tracking the progress
of object detection systems.

Yufei Xu, Jing Zhang, Qiming Zhang, and Dacheng Tao, “ViTPose: Simple Vision Transformer
Baselines for Human Pose Estimation” (2022) [6] evaluates plain vision transformers for human
pose estimation. ViTPose uses a plain, non-hierarchical vision transformer backbone with a
lightweight decoder for pose estimation. The model is strikingly simple, composable, and scalable
from about 100M up to 1B parameters, offering high throughput and strong performance across
choices of attention type, input resolution, and training scheme. ViTPose set a new state of the art on
the MS COCO Keypoint Detection benchmark, reaching 80.9 AP on the MS COCO test-dev set.

Hadi Ghahremaninezhad, Hang Shi, and Chengjun Liu, “Real-Time Accident Detection in Traffic
Surveillance Using Deep Learning” (2022) [7] describes a computer-vision framework for detecting
traffic accidents at intersections. The framework fuses three components: YOLOv4 for precise object
detection, a Kalman filter with the Hungarian algorithm for object tracking, and trajectory conflict
analysis in the accident decision stage. The authors propose a cost function for improved object
association under occlusion, overlap, and shape change. From object trajectories, using velocity,
angle, and distance, the framework derives different trajectory conflicts such as vehicle-to-vehicle,
vehicle-to-pedestrian, and vehicle-to-bicycle interactions. Experimental results show that the
proposed method succeeds in real-time traffic surveillance with a high detection rate and a low
false-alarm rate, even in complex lighting conditions.

İrem Üstek et al., “Two-Stage Violence Detection Using ViTPose and Classification Models at Smart
Airports” (2022) [8] demonstrates a framework that combines ViTPose for pose estimation with a
CNN-BiLSTM classifier for real-time violence detection. Integrated into the SAAB SAFE system
and tested on the AIRTLab dataset, it improves security by offering better accuracy, fewer false
alarms, and faster threat response times in post-pandemic airport environments.

Licheng Jiao, Fan Zhang, Fang Liu, Shuyuan Yang, Lingling Li, Zhixi Feng, and Rong Qu, “A
Survey of Deep Learning-Based Object Detection” (2019) [9] reviews object detection in computer
vision, discussing current results and upcoming methods in detail. The paper traces how deep
learning algorithms have evolved to drastically improve performance on object detection problems
in fields such as security and autonomous driving. It rigorously reviews one-stage and two-stage
detection models, introduces the benchmark datasets and their roles, and surveys both classical and
modern applications along the main branches of object detection, including the architectural choices
needed to build effective detectors. It also outlines directions for future research to keep pace with
state-of-the-art algorithms.

Xinyi Zhou, Wei Gong, WenLong Fu, and Fengtong Du, “Application of Deep Learning in Object
Detection” (2017) [10] examines deep-learning-based object detection for computer vision. The
paper offers a survey of the widely used datasets and algorithms in this domain, proposes a new
dataset generated from existing ones, and conducts experiments with Faster R-CNN. The study
demonstrates the importance of deep learning frameworks and shows that state-of-the-art object
detection results can be improved significantly with better datasets.

Abdul Vahab, Maruti S Naik, Prasanna G Raikar, and Prasad S R, “Applications of Object Detection
System” (2019) [11] examines the productivity, usefulness, and variety of object detection
technology in computer and robot vision systems. As the paper points out, object detection has seen
a massive uptick in real-world adoption due to improvements in machine learning, deep learning
algorithms, and computer vision. The paper also touches on how object detection boosts system
performance, enabling realistic gesture recognition, fast feature tracking, and real-time analysis and
action. It highlights how the technology can transform industries for safety, efficiency, and
automation, and discusses technical difficulties such as dataset issues and variation, detection
accuracy across environments, and computational overhead, along with the research areas trying to
surmount these challenges.

Sanket Kshirsagar et al., “Crowd Monitoring and Alert System” (2024) [12] proposes a real-time
AI/ML-based crowd surveillance system for crowded areas. It employs behavior analytics and
anomaly detection to identify suspicious behavior, and security teams are immediately alerted with
instantaneous notifications. The study also balances privacy and ethics with its dedication to public
safety and points out future directions for crowd monitoring.

Esraa Samkari, Muhammad Arif, Manal Alghamdi, and Mohammed A. Al Ghamdi, “Human Pose
Estimation Using Deep Learning: A Systematic Literature Review” (2023) [13] covers the problems
of Human Pose Estimation (HPE) with deep learning approaches in depth. The work discusses
different models and mechanisms for localizing human joints in an image or video stream, with an
emphasis on sports analysis and surveillance applications. It summarizes more than 100 articles
published since 2014, covering single- and multi-person HPE as well as datasets, loss functions, and
pretrained feature extraction models. Convolutional Neural Networks (CNNs) and Recurrent Neural
Networks (RNNs) emerge as the most frequently applied methodologies. The review also identifies
complications, such as occlusion and evaluation on crowded scenes, that affect model performance,
providing remedies and suggesting possible directions for future research on HPE.


Dushyant Kumar Singha, Sumit Paroothi, Mayank Kumar Rusiac, and Mohd. Aquib Ansari, “Human
Crowd Detection for City Wide Surveillance” (2019) [14], presented at the Third International
Conference on Computing and Network Communications (CoCoNet’19), provides an autonomous
solution for improving city-wide surveillance. The paper presents a system concept that makes use of
existing CCTV infrastructure to monitor public places. The system employs computer vision
methods to recognize crimes in video streams, with real-time analysis performed on the video feeds,
reducing dependence on constant manual surveillance by security forces. It includes a prompt
communication mechanism that speeds up responses by issuing alarm signals and textual warnings
on abnormal activities. The methodology seeks to make surveillance and enforcement efficient while
reducing the need for intensive human supervision.

Li Liu, Wanli Ouyang, Xiaogang Wang, Paul Fieguth, Jie Chen, Xinwang Liu, and Matti Pietikäinen,
“Deep Learning for Generic Object Detection: A Survey” (2019) [15] presents an extensive review
of methodology in object detection fueled by deep learning. A central problem in computer vision,
generic object detection aims to find instances of predefined categories in images of natural scenes.
The paper surveys more than 300 research papers, covering key aspects such as detection
frameworks, feature extraction, object proposal methods, context modeling, training methodology,
and evaluation metrics. The review concludes that deep learning methods have brought substantial
advances and identifies promising directions for future research.

Raspberry Pi:-

Madhura Vajarekar, Krutika Patil, Meera Yadate, and T. N. Sawant, “Vehicle Theft Detection using
GSM on Raspberry Pi” (2020) [16] presents a Raspberry Pi 3-based system for low-cost vehicle theft
detection. The system uses a MEMS accelerometer sensor mounted on the target vehicle to monitor
key insertion, engine start, and drive mode. It then sends alert messages, together with GPS
information, to the vehicle owner's mobile number via the Global System for Mobile
Communication (GSM). The device works in two modes, user mode and theft mode. The article
underscores that the system is a low-cost, compact solution for vehicle security and theft detection.

Kashaboina Radhika and Dr. Velmani Ramasamy, “Bluetooth and GSM based Smart Security
System using Raspberry Pi” (2020) [17] introduces an intelligent security system that combines
Bluetooth and GSM connectivity for expanded security in banking, homes, and businesses. The
system utilizes a Raspberry Pi for fast data processing, providing robust processing power and
real-time wireless data access. By pairing Bluetooth features such as proximity-based access with
GSM-based remote notifications and alerts, the system delivers secure and efficient smart door
access. The paper stresses the effectiveness of integrating wireless technologies into a fast, secure,
and reliable smart security application that boosts security through advanced technology and
real-time operation.

R. Sai Sree, P. Chandu, and B. Pranavi, “Real Time Object Detection Using Raspberry Pi” (2023)
[18] looks into real-time object detection on the Raspberry Pi, an important problem in numerous
applications today, such as autonomous vehicles, drones, and smartphones. The paper focuses on the
issues of running object detection on embedded devices that lack memory and computation. A
lightweight detection system is built on the intelligent low-power device to show that the Raspberry
Pi can be employed for accurate object recognition with an acceptable loss in performance. The
proposed hardware configuration takes a simple approach that works for detection in 2D and 3D
environments. The work, which references popular models such as R-CNN, RetinaNet, and YOLO,
demonstrates the rising trend of object detection in computer vision, argues that the Raspberry Pi
should be considered a pragmatic platform for real-time applications, and emphasizes the capability
of embedded systems to carry out complex computer vision tasks.

S. Srikanth, Ch. Sai Kumar, N. Uday Rao, and R. Srinivasa Rao, “Raspberry Pi Based Smart
Surveillance System” (2022) [19] introduces a Raspberry Pi home security solution for surveillance.
Easy to access and set up, the system uses a Pi camera to check for human intrusion and emails users
images of the intruder on their mobile device or computer. The Raspberry Pi 3 administers the
security system via Python programming and provides live streaming from the camera over a local
server. Because it is so accessible, this IoT-based method enables users to keep an eye on their
property from anywhere in the world, and the implementation is practical and convenient for home
and office security. The Raspberry Pi, running the compact Raspbian OS with straightforward
programming languages, makes a good platform for such development.

GSM Module:-

Akilan Thangarajah et al., “Implementation of Auto Monitoring and Short-Message-Service System
via GSM Modem” (2013) [20] presents a real-time monitoring system on a FriendlyARM controller
with sensors for detecting threats and critical events. Its GSM capability sends out alerts so that
responses can be made in time, reducing risk; the prototype demonstrates effective communication.

2.2 PATENT SEARCH:

A total of 3 key patents have been examined for this system. In particular, patent [4] focuses on real-
time crowd measurement and management systems, patent [8] addresses camera pose estimation
devices and control methods, and patent [9] explores crowd behavior anomaly detection based on
video analysis.
Summary of the above-studied patents is as follows:-

Andrew Tatrai, Travis Lachlan Semmens, “Real-time Crowd Measurement and Management Systems
and Methods Thereof” (CA3143637A) [4] presents a real-time crowd measurement and management
system designed to operate across multiple zones. The system uses data-capturing devices to
continuously monitor crowd dynamics and an analysis module to evaluate crowd characteristics,
emotional states, and behavioral trends. The technology enables real-time prediction of emergent
crowd behaviors and potential risks. Predicted outcomes and alerts are then displayed through an
integrated display module. This approach moves beyond traditional passive surveillance, offering
proactive management of crowd safety. However, the system's effectiveness is contingent upon the
precision of its data interpretation models and real-time computational performance, suggesting
opportunities for optimization and integration with intelligent video analytics.

Atsunori Moteki, Nobuyasu Yamaguchi, and Toshiyuki Yoshitake, “Camera Pose Estimation Device
and Control Method” (US10088294B2) [8] describes a device and method for deriving the pose of a
camera using simplified motion models and feature matching across a video sequence. It provides
the capability of computing the changing 3D position and orientation of a camera, which is required
in vision-based systems such as autonomous vehicles, augmented reality, and surveillance for scene
consistency. However, complex translations and rotations cause drift and pose-resolution issues that
are costly for this system to solve. These limitations point to the need for more sophisticated
modeling approaches or hybrid sensor-fusion methods to improve accuracy in dynamic or
texture-deficient environments.


Milan Redzic, Jian Tang, Zhilan Hu, Joseph Antony, Haolin Wei, Noel O'Connor, and Alan Smeaton,
“Crowd Anomaly Detection in Video Analysis Based on Front Collection of Observed Persons”
(WO2021069053A1) [9] presents a methodology and system for advanced video analysis to detect
anomalies in crowd behavior. The approach stacks two feature sets: one extracted from single images
using machine learning models pretrained on normal crowd behaviour, and the other using optical
flow to compute motion patterns over pairs of consecutive frames. The concatenated features are
passed to classification algorithms for outlier detection of atypical behaviour, allowing emergent
crowd anomalies to be detected robustly in real-time surveillance. The performance of the system
relies heavily on the quality of the training dataset and the accuracy of motion estimation, which
points to possible improvements through deep learning and the integration of multi-modal sensors.

CHAPTER-3: WORK DONE

3. WORK DONE

3.1 FLOW OF THE SYSTEM:

Fig 3.1: Flowchart of the system


The real-time surveillance system works through a regular pipeline designed for efficient detection
and alerting. The Tapo C200 CCTV camera streams live video through its RTSP URL. The real-time
video feed is processed by the YOLOv8 model, paired with the Google Edge TPU for fast inference.
The system provides a switch between two modes, Crowd Detection and Violence Detection, both
configured via the Flask-based web portal. When a detection threshold is crossed, an alert is
generated through the SIM800L GSM module. The system also records images of the time-critical
events, which are made viewable through the Flask instance and the Flutter application for real-time
updates and monitoring.
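To make the flow concrete, the sketch below shows one plausible shape of the capture-and-detect loop: OpenCV pulls frames from the camera's RTSP URL and each frame is passed through the YOLOv8 model. The RTSP address and weight-file name are illustrative placeholders, not the project's actual values.

Python Code:

import cv2
from ultralytics import YOLO

model = YOLO("best.pt")  # fine-tuned YOLOv8n weights (placeholder name)
cap = cv2.VideoCapture("rtsp://USER:PASSWORD@CAMERA_IP:554/stream1")

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break                               # stream dropped
    results = model(frame, verbose=False)   # one forward pass per frame
    count = len(results[0].boxes)           # number of detections in the frame
    annotated = results[0].plot()           # draw boxes for the live view
    cv2.imshow("Live Feed", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()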

3.2 BLOCK DIAGRAM OF TRAINING AND WORKING OF THE SYSTEM:

Fig 3.2.1: Training Block Diagram

Fig 3.2.2: System Block Diagram


The pre-trained YOLOv8n model was fine-tuned using custom datasets from Roboflow for violence
and crowd detection. The datasets were formatted in YOLO style with annotations for training and
validation. A data.yaml file was used to define class names and paths. Transfer learning helped
the model quickly adapt to new detection tasks. The fine-tuned model weights were saved for real-
time video stream inference. This enabled accurate detection of violence and crowd density in live
surveillance.

The block diagram represents the complete architecture as shown in fig 3.2.2 of the surveillance
system, starting from video capture to real-time alerting and monitoring. The Tapo C200 CCTV
Camera streams live video using the RTSP protocol, which is fed into Raspberry Pi 4 for processing.
The YOLOv8n model, fine-tuned for violence and crowd detection, runs on the Raspberry Pi with
acceleration from the Google Edge TPU for fast inference. When an event is detected, an alert is
triggered and sent through the SIM800L GSM Module as an SMS notification. At the same time, a
snapshot of the event is captured and uploaded to the Flask Web Portal, which displays the detection
results in real-time. These snapshots are accessible to the Flutter Mobile Application via the Flask
server's URL, enabling users to view live alerts and detection history remotely. This interconnected
system facilitates efficient real-time detection, instant alerts, and seamless remote monitoring.

3.3 YOLO v8 Architecture


YOLO (You Only Look Once) is a real-time object detection algorithm that predicts bounding boxes
and class probabilities from full images in a single forward pass. It splits the image into multiple grid
cells and detects objects in all of them simultaneously, which makes it conceptually very fast. Unlike
traditional pipelines, YOLO frames detection as a single regression problem rather than a
classification pipeline, which is why it is much faster. Convolutional neural networks (CNNs) are
used to learn spatial features and object representations. The architecture is built from three main
building blocks:

1. Backbone – feature extraction.
2. Neck – multi-scale feature aggregation.
3. Head – final object detection.


1. Backbone (Feature Extraction):


The backbone is a convolutional neural network (CNN) that extracts semantic features from the
input image through successive convolutional operations.

● Focus Layer:
This layer down-samples the input image by a factor of 4 while preserving the main spatial
information, reducing the computational cost.

● Convolution + SiLU Activation:
Every convolutional layer is followed by the SiLU (Swish) activation function to improve
gradient flow and speed up training convergence.

The SiLU activation function is mathematically defined as:

SiLU(x) = x · σ(x) = x / (1 + e⁻ˣ)

● where 𝑥 is the input,

● 𝜎(𝑥) is the sigmoid function:

σ(x) = 1 / (1 + e⁻ˣ)
● C2f Module (Cross Stage Partial Network):
This module promotes efficient feature reuse by splitting and merging feature maps, thus
reducing redundant computations.

Mathematical expression for a convolution block:

𝑌 = 𝑆𝑖𝐿𝑈 (𝐵𝑎𝑡𝑐ℎ𝑁𝑜𝑟𝑚(𝐶𝑜𝑛𝑣(𝑋)))

● where 𝑋 is the input feature map,


● Conv is the convolution operation,
● 𝐵𝑎𝑡𝑐ℎ𝑁𝑜𝑟𝑚 normalizes activations,
● 𝑆𝑖𝐿𝑈 is the activation function,
● Y is the output feature map
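As an illustration of the convolution block defined above, the following PyTorch sketch implements Y = SiLU(BatchNorm(Conv(X))); the channel counts and kernel size are arbitrary example values, not the model's actual configuration.

Python Code:

import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    # Conv -> BatchNorm -> SiLU, as in the formula above
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

y = ConvBlock(3, 16)(torch.randn(1, 3, 640, 640))
print(y.shape)  # torch.Size([1, 16, 640, 640])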

2. Neck (Feature Aggregation):

The neck aggregates features at multiple scales to improve detection performance for objects of
different sizes.

● Feature Pyramid Network (FPN):


Upsamples high-level semantic features and merges them with lower-level features to enrich
the feature representation.

● Path Aggregation Network (PANet):


Enables both bottom-up and top-down pathways to propagate strong contextual information.

FPN formula:

Pᵢ = Conv(Fᵢ) + Upsample(Pᵢ₊₁)

PANet formula:

𝑃ᵢ = 𝐶𝑜𝑛𝑣(𝐶𝑜𝑛𝑐𝑎𝑡(𝑃ᵢ , 𝐹ᵢ))

where 𝐹ᵢ denotes the feature map at scale i, and 𝑃ᵢ represents the aggregated feature map at scale i.
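The FPN merge can be illustrated in a few lines of PyTorch; the tensor shapes and channel counts below are made up for the example, and a 1×1 lateral convolution stands in for Conv(Fᵢ).

Python Code:

import torch
import torch.nn.functional as F
from torch import nn

lateral = nn.Conv2d(512, 256, kernel_size=1)  # Conv(F_i): match channel counts
f_i = torch.randn(1, 512, 40, 40)             # backbone feature at scale i
p_next = torch.randn(1, 256, 20, 20)          # aggregated map at scale i+1

# P_i = Conv(F_i) + Upsample(P_{i+1})
p_i = lateral(f_i) + F.interpolate(p_next, scale_factor=2, mode="nearest")
print(p_i.shape)  # torch.Size([1, 256, 40, 40])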

3. Head (Object Detection):

The head predicts bounding boxes, objectness scores, and class probabilities for each grid cell.

Bounding box coordinates are computed as:

𝑡̂ₓ = 𝜎(𝑡ₓ) + 𝑐ₓ , 𝑡̂ᵧ = 𝜎(𝑡ᵧ) + 𝑐ᵧ

𝑡̂𝑤 = 𝑝𝑤𝑒ᵗʷ , 𝑡̂ₕ = 𝑝ₕ𝑒ᵗʰ

where:

● σ is the sigmoid function that normalizes predicted coordinates,

● 𝑐ₓ, 𝑐ᵧ are the grid cell offsets,

● 𝑝𝑤 , 𝑝ₕ are predefined anchor box dimensions,

● 𝑡ₓ ,𝑡ᵧ ,𝑡𝑤, 𝑡ℎ are the predicted offsets by the model

These calculations ensure bounding boxes accurately represent object location and scale relative to
the input image.
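A small worked example of these decoding equations, using made-up offsets and anchor dimensions, shows how a raw prediction becomes box coordinates:

Python Code:

import math

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    bx = sigmoid(tx) + cx       # center x in grid-cell units
    by = sigmoid(ty) + cy       # center y in grid-cell units
    bw = pw * math.exp(tw)      # width scaled from the anchor prior p_w
    bh = ph * math.exp(th)      # height scaled from the anchor prior p_h
    return bx, by, bw, bh

# offsets (0.2, -0.1, 0.3, 0.1) in grid cell (5, 7) with a 1.5 x 2.0 anchor
print(decode_box(0.2, -0.1, 0.3, 0.1, cx=5, cy=7, pw=1.5, ph=2.0))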


3.4 DATASET
The annotated dataset comes from Roboflow and was exported in YOLOv8 PyTorch format, which
contains the images, the annotations, and a configuration file (data.yaml). Roboflow offers an API to
download the dataset programmatically, so the data can be prepared in a consistent way. The dataset
was accessed through the Roboflow client using the API key, workspace, project, and dataset
version. The script that downloads the dataset is as follows:

from roboflow import Roboflow

# authenticate with the Roboflow API key
rf = Roboflow(api_key="YOUR_ROBOFLOW_API_KEY")

# select the workspace, project, and dataset version, then download
project = rf.workspace("WORKSPACE_NAME").project("PROJECT_NAME")
dataset = project.version(VERSION_NUMBER).download("yolov8")

This process automatically prepared the dataset folder structure and generated the necessary files for
YOLOv8 training.

For training, the Ultralytics YOLOv8 framework was utilized. A pretrained yolov8n.pt model
served as the starting point, enabling transfer learning to improve efficiency and accuracy. The training
procedure was initiated by specifying the dataset configuration file and training parameters such as
epochs, image size, and batch size:

Python Code:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # pretrained starting point for transfer learning

model.train(data=dataset.location + "/data.yaml", epochs=50, imgsz=640, batch=16)

This method ensured a reproducible training pipeline with minimal manual intervention, facilitating
consistent results and allowing for efficient optimization of the crowd detection model.


3.4.1. Crowd Detection Dataset (Head-based Dataset):

The Crowd Detection Dataset (Head-Based Dataset), taken from Roboflow Universe, is designed to
detect human heads in dense environments with cluttered backgrounds. It is well suited to person
counting because it provides high-resolution images with bounding boxes marking the locations of
heads. The annotations are in YOLO (You Only Look Once) format, so they integrate directly with
YOLO-family architectures such as YOLOv5 and YOLOv8. The dataset contains diverse
backgrounds and camera viewpoints, giving the model more robustness in deployment scenarios,
and data augmentation techniques applied during training help the model generalize well. In this
project, the dataset was used with the YOLOv8 model for real-time crowd detection by counting the
number of visible heads, providing both accuracy and speed in surveillance applications.

Training Images    Validation Images    Test Images    Total Images

1675               477                  238            2390

Table no. 3.2.1: Crowd Detection Dataset

3.4.2. Violence Detection Dataset (GunW Dataset):

The GunW Dataset, taken from Roboflow Universe, was collected for detecting firearms in different
environments, which makes it highly applicable to violence detection. It is an image set in which
guns are labeled with bounding boxes so that object detection models can be trained to detect and
localize guns correctly. Annotations are in YOLO format, allowing easy use with YOLO architecture
variants (e.g., YOLOv5, YOLOv8). The dataset features a wide range of conditions, such as varying
backgrounds, lighting, and viewpoints, making the trained model behave better in real life. In this
project, the GunW Dataset was used to train the YOLOv8 model for real-time violence detection, so
the system can detect armed threats quickly and accurately.

Training Images    Validation Images    Test Images    Total Images

5340               1550                 767            7657

Table no. 3.2.2: Violence Detection Dataset


3.5 HARDWARE COMPONENTS:

3.5.1 TAPO C200 CCTV CAMERA:

The Tapo C200 is an IP CCTV camera with RTSP streaming support, allowing its real-time video
feed to be used as input to surveillance systems. It is a 1080p real-time monitoring device with
remote pan/tilt control.
Fig no. 3.3.1: Tapo C200 CCTV Camera

3.5.2 GOOGLE EDGE TPU:

Google's Edge TPU delivers up to 4 TOPS (tera operations per second) of inference performance for
machine learning models in real time while drawing about 2 W. It is designed for fast processing of
quantized models at low latency, so object detection responds swiftly with short inference times. In
this surveillance system, the Edge TPU runs the optimized YOLOv8 model for real-time violence
and crowd detection, triggering alerts more quickly and accurately. This gives the system a
performance boost, with on-device AI processed efficiently for better observation.

Fig no. 3.3.2: Google Edge TPU

3.5.3 SIM800L GSM MODULE:

The SIM800L module handles alerting in the system. It sends the user an SMS in real time when
crowd thresholds are crossed or violence is detected, raising awareness as quickly as possible.

Fig no. 3.3.3: SIM800 GSM Module


3.5.4 RASPBERRY PI 4:

The Raspberry Pi 4 is the main processing unit of the surveillance system. It runs YOLOv8
inference, communicates with the Edge TPU and the SIM800L module, and handles data transfer to
the Flask-based web portal and the Flutter app.

Fig no. 3.3.4: Raspberry Pi4

3.5.5 BUZZER:

A buzzer is included in the system to give a prompt audible alert to people on site when a critical
event is detected. It serves as a local warning system to signal the public in an immediate emergency.

Fig no. 3.3.5: Buzzer

3.5.6 POWER SUPPLY ADAPTER:

The system is powered by a power supply adapter that delivers a stable 5 V to the Raspberry Pi 4
and Google Edge TPU for real-time object detection and efficient data processing.

Fig no. 3.3.6: Power Supply Adapter


3.5.7 CAPACITOR:

A capacitor is used on the SIM800L-to-Raspberry Pi 4 connection to avoid voltage drops when the
SIM800L sends an SMS or connects to the network. When the SIM800L transmits, it draws a large
current, causing voltage dips that can force the module to reset. The capacitor stores charge and
releases it into the circuit at those high-power moments, keeping the voltage level steady.

Fig no. 3.3.7: Capacitor


3.5.8 BATTERY:

A 3.7 V Li-Po battery powers the SIM800L GSM module, preventing unstable voltage and sudden
restarts while it transmits SMS messages. It keeps the connection stable even under heavy power
consumption.

Fig no. 3.3.8: Battery


3.6 TRAINING MODELS:

The training process involved fine-tuning the YOLOv8 model using datasets specifically curated for
crowd detection and violence identification. The model was trained using Google Colab with
annotated data for accurate object detection. After training, the model was optimized and exported in
.pt format.

3.7 MODES OF DETECTION:

The system operates in two distinct modes:

(a). Crowd Detection

(b). Violence Detection.

In Crowd Detection mode, the system monitors the number of people in a frame and triggers alerts if
the count exceeds the user-defined threshold. In Violence Detection mode, the model actively looks
for aggressive behavior patterns and triggers alerts if violence is detected.
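A minimal sketch of this per-frame decision logic is given below; results is a YOLOv8 prediction for one frame, and send_sms and sound_buzzer are hypothetical helpers standing in for the GSM and buzzer hardware described elsewhere in this chapter.

Python Code:

CROWD_THRESHOLD = 20  # user-defined via the web portal (example value)

def check_frame(results, mode, threshold=CROWD_THRESHOLD):
    count = len(results[0].boxes)  # one box per detected person or weapon
    if mode == "crowd" and count > threshold:
        send_sms(f"Crowd alert: {count} people detected")  # hypothetical helper
        sound_buzzer()                                     # hypothetical helper
    elif mode == "violence" and count > 0:
        send_sms("Violence alert: aggressive activity detected")
        sound_buzzer()
    return count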

3.8 WEB PORTAL:

The Flask-driven web portal is the UI for setting up the system. The user can enter the RTSP URL
for the live feed, select the detection mode, and set the crowd-size threshold. The portal also shows
live video streaming from the Tapo C200 CCTV camera, updating in real time.


3.9 YOLOV8 INTEGRATION AND INFERENCE OPTIMIZATION:


The YOLOv8 model was integrated into the system to detect crowds and violent activities. To make
it run faster and efficiently on the Google Edge TPU, the model was optimized through a series of
steps.

Model Optimization Process:


1. Model Training:
The YOLOv8 model was trained to identify people and aggressive behavior accurately.
2. INT8 Quantization:
After training, the model was converted to a smaller size using a process called INT8
Quantization. This means its calculations were changed from 32-bit (normal size) to 8-bit
(smaller size), which makes it faster and lighter.
3. Conversion to TFLite Format:
The optimized model was then converted into TensorFlow Lite (TFLite) format, which is
required for running on Edge TPU.
4. Deployment on Google Edge TPU:
Finally, the TFLite model was compiled for the Edge TPU, allowing it to process video
streams in real time with low latency.
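One plausible way to script steps 2–4 with the Ultralytics export API is sketched below; the exact arguments depend on the installed Ultralytics version, and the Edge TPU export additionally requires Google's edgetpu_compiler to be installed on the system.

Python Code:

from ultralytics import YOLO

model = YOLO("best.pt")  # fine-tuned weights from training (placeholder name)

# Steps 2-3: INT8 quantization and TFLite conversion in a single export;
# the dataset yaml supplies representative images for calibration.
model.export(format="tflite", int8=True, data="data.yaml", imgsz=640)

# Step 4: compile the quantized model for the Edge TPU.
model.export(format="edgetpu", imgsz=640)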

3.10 ARCHITECTURE AND COMMUNICATION BETWEEN SIM800L AND RASPBERRY PI 4:

Fig 3.8: Architecture of communication between SIM800L and Raspberry Pi 4


The SIM800L GSM Module is connected to the Raspberry Pi 4 to enable real-time alerting through
SMS notifications. The communication is established using UART (Universal Asynchronous
Receiver-Transmitter) protocol, which allows serial communication between the two devices. Below
is the connection setup:

Wiring Configuration:

● SIM800L TX (Transmit) → Raspberry Pi RX (GPIO 15 / UART RXD)

● SIM800L RX (Receive) → Raspberry Pi TX (GPIO 14 / UART TXD)

● SIM800L VCC → 5V Power Supply (or 3.7V Li-Po Battery for stability)

● SIM800L GND → Raspberry Pi GND

Communication Setup:

● The Raspberry Pi communicates with the SIM800L using serial communication


(/dev/serial0).
● The baud rate is set to 9600 for stable data transfer.
● AT Commands are sent from the Raspberry Pi to the SIM800L for various operations, such
as:
○ AT → Test connection

○ AT+CMGF=1 → Set SMS mode to text

○ AT+CMGS="PhoneNumber" → Send SMS to the specified number
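Putting this sequence together, a minimal pyserial sketch for sending one SMS could look as follows; the phone number placeholder and the sleep durations are illustrative assumptions.

Python Code:

import time
import serial

def send_sms(number, text):
    with serial.Serial("/dev/serial0", baudrate=9600, timeout=1) as gsm:
        gsm.write(b"AT\r")                  # test connection
        time.sleep(0.5)
        gsm.write(b"AT+CMGF=1\r")           # set SMS mode to text
        time.sleep(0.5)
        gsm.write(f'AT+CMGS="{number}"\r'.encode())
        time.sleep(0.5)
        gsm.write(text.encode() + b"\x1a")  # Ctrl+Z terminates the message
        time.sleep(3)                       # allow the module to transmit

send_sms("+91XXXXXXXXXX", "Crowd alert: threshold exceeded")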

3.11 ALERT MECHANISM AND SMS INTEGRATION:


The system's alert mechanism is powered by the SIM800L module, which sends offline SMS
notifications to the user’s mobile device upon detection of crowd overflow or violent activity. This
instant alerting system is crucial for quick response during emergencies.

3.12 FLUTTER APP FOR VIEWING IMAGES VIA URL:


A Flutter-based mobile application was developed to let users access snapshots of detected events.
These snapshots are captured in real time and made available through the Flask server URL. Users
enter the server URL in the app to view the event images, enhancing situational awareness and
remote monitoring.

CHAPTER-4: RESULTS AND DISCUSSION

4. RESULTS AND DISCUSSION

4.1 SYSTEM FUNCTIONALITY:


The developed real-time surveillance system is designed to monitor and analyze live video feeds from
CCTV cameras, specifically the Tapo C200. This system is powered by a Raspberry Pi 4 and features
a Flask-based web portal that provides two toggleable options:

1. Crowd Detection
2. Violence Detection

Users interact with the web portal through an intuitive interface built with HTML and CSS, where
they can:

● Toggle between Crowd Detection and Violence Detection modes.

● Input the RTSP URL of the CCTV camera to initiate the live feed.

● Set a threshold value for the maximum allowed crowd size.

4.2 MODEL INTEGRATION AND INFERENCE ACCELERATION:


The system uses YOLOv8, a state-of-the-art object detection model, trained specifically for:

● Counting the number of people within the video feed.

● Detecting violent activities within the video feed.

To achieve real-time inference, the system is optimized with a Google Edge TPU. This accelerates the
processing speed of YOLOv8, enabling fast and accurate detections on the Raspberry Pi 4, which is
essential for live video analysis.

4.3 ALERT MECHANISM:


When the detected number of people exceeds the user-defined threshold, or if any form of violence is
identified, the system immediately triggers an alert. This is accomplished through a SIM800L GSM
module that is connected to the Raspberry Pi.

● An SMS notification is sent directly to the user's mobile device.

● The alert includes a snapshot of the event, captured at the moment of detection.


4.4 SNAPSHOT CAPTURE AND DATA ACCESS:


The system is designed to capture snapshots whenever a threshold breach or a violence detection
occurs. These snapshots are processed and stored on the server.

● The Flask application automatically serializes the snapshot data to JSON and makes it available
through a REST API.
● This data is accessible from a Flutter app, where the user can view every captured image by
simply entering the Flask server URL.
● This functionality ensures that users have visual evidence of the incident, accessible in real-
time from their mobile application.
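A hedged sketch of what such a Flask endpoint could look like is shown below; the directory name, routes, and JSON field names are illustrative, not the project's actual code.

Python Code:

import os
from flask import Flask, jsonify, send_from_directory

app = Flask(__name__)
SNAPSHOT_DIR = "snapshots"  # assumed folder where event images are stored

@app.route("/api/snapshots")
def list_snapshots():
    # serialize the stored snapshots as a JSON list of image URLs
    names = sorted(os.listdir(SNAPSHOT_DIR))
    return jsonify([{"image_url": f"/snapshots/{n}"} for n in names])

@app.route("/snapshots/<path:name>")
def get_snapshot(name):
    return send_from_directory(SNAPSHOT_DIR, name)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)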

4.5 YOLOV8 MODEL EVALUATION ON CUSTOM DATASET: ACCURACY & METRICS:
To assess how well the YOLOv8 model performs on our custom object detection dataset, several
metrics are considered. Precision is the fraction of predicted detections that are correct, measured as
TP / (TP + FP). Recall tells how many of the true objects were correctly identified by the model,
i.e., TP / (TP + FN). Mean Average Precision at a 0.5 Intersection over Union threshold (mAP@0.5)
measures how well the model recognizes objects when a prediction counts as correct at 50% overlap.
The more stringent mAP@0.5:0.95 averages the Average Precision over IoU thresholds from 0.5 to
0.95, giving a more realistic picture of detection performance.

Inference time, on the other hand, tells how long it takes to process a single image; low inference
time is necessary for live applications such as surveillance. Model size is the storage footprint of the
trained model, highly important for edge devices like the Raspberry Pi. The F1-score, the harmonic
mean of precision and recall (2 × P × R / (P + R)), provides an all-round measure of how good the
model is. Finally, IoU (Intersection over Union), the basis for most of the above metrics, measures
the overlap between the predicted bounding box and the ground-truth box. Together, these metrics
quantify whether the YOLOv8 model detects objects, localizes them correctly, and does so fast.
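As a quick worked example of these definitions (with made-up counts, not the project's results): given 85 true positives, 15 false positives, and 20 false negatives,

Python Code:

precision = 85 / (85 + 15)                          # 0.85  -> 85.0%
recall = 85 / (85 + 20)                             # ~0.81 -> 81.0%
f1 = 2 * precision * recall / (precision + recall)  # ~0.83

print(f"P={precision:.1%}  R={recall:.1%}  F1={f1:.1%}")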


Metric Name           Crowd Detection    Violence Detection

Precision             85.2%              81.4%
Recall                80.6%              78.9%
F1-Score              82.8%              80.1%
mAP@0.5               88.4%              84.3%
mAP@0.5:0.95          65.7%              61.2%
Inference Time        6.2 ms             6.5 ms
Model Size            ~5.2 MB            ~5.2 MB
Input Image Size      640×640            640×640

Table no. 4.5: YOLOv8 Model Evaluation on Custom Dataset

4.6 Interpretation of Results:


On real-time video captured with the Tapo C200 CCTV camera, the YOLOv8 model shows strong
results for detecting crowds and violent activity. The Google Edge TPU accelerator integrated into
the system permits rapid and efficient inference for live surveillance on a resource-constrained
device (Raspberry Pi) with responsive, accurate detections. In addition, the SIM800L GSM module
sends SMS alerts to subscribed users on the spot as soon as any critical event is detected, making the
safety-monitoring loop more responsive. Despite these successes, the model struggles to flag some
subtle or partially occluded violent actions, which points to areas for improvement in training data
and model optimization.


Fig 4.6: Results for Detection of Crowd and Violence

4.7 Discussion:
This system includes a real-time surveillance setup that efficiently monitors live video feeds from the
Tapo C200 CCTV camera using a structured workflow. The primary objective of the system is to
ensure safety and crowd control by detecting the number of people and any violent activities in real
time.


The Flask-based web portal accepts user inputs for the operation of the system. Users provide the
RTSP URL of the camera feed, choose between two detection modes (Crowd or Violence
Detection), and set their own crowd-size threshold. A friendly interface, written in HTML and CSS,
lets the video stream be activated with a single click. A minimal sketch of this input handling is
shown below.
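The sketch below shows one way such a portal route could look, assuming an HTML form that posts the RTSP URL, the detection mode, and the crowd threshold; the field names and the inline template are illustrative assumptions rather than the exact implementation.

from flask import Flask, request, render_template_string

app = Flask(__name__)

FORM = """
<form method="post">
  RTSP URL: <input name="rtsp_url" placeholder="rtsp://user:pass@camera-ip/stream1"><br>
  Mode: <select name="mode">
          <option value="crowd">Crowd Detection</option>
          <option value="violence">Violence Detection</option>
        </select><br>
  Crowd threshold: <input name="threshold" type="number" value="10"><br>
  <button type="submit">Start Stream</button>
</form>
"""

@app.route("/", methods=["GET", "POST"])
def portal():
    if request.method == "POST":
        rtsp_url = request.form["rtsp_url"]
        mode = request.form["mode"]
        threshold = int(request.form["threshold"])
        # The detection loop would be started here with these settings.
        return f"Monitoring {rtsp_url} in {mode} mode (threshold = {threshold})"
    return render_template_string(FORM)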

Once the RTSP URL has been given, YOLOv8 processes the live feed, with Google's Edge TPU
used to accelerate inference. This enables the model to:

● count the number of people in a frame (Crowd Detection); and

● identify aggressive behavior (Violence Detection).

A minimal sketch of the crowd-counting loop follows this list.
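The sketch below shows the core per-frame counting logic, assuming the Ultralytics YOLOv8 Python API and an OpenCV build with RTSP support; the stream URL and threshold are placeholders, and the deployed system additionally compiles the model for the Edge TPU, which this plain-CPU sketch omits.

import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")   # pretrained nano model
PERSON_CLASS_ID = 0          # "person" in the COCO class list
CROWD_THRESHOLD = 10         # user-configured limit (illustrative)

cap = cv2.VideoCapture("rtsp://user:pass@camera-ip/stream1")
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)[0]
    # Count only the detections whose predicted class is "person".
    people = sum(int(cls) == PERSON_CLASS_ID for cls in results.boxes.cls)
    if people > CROWD_THRESHOLD:
        print(f"Threshold breached: {people} people detected")
cap.release()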

Using the Edge TPU speeds up inference and enables the accurate, real-time detection required for
live monitoring use cases. The system lights up a red LED when the number of detected individuals
exceeds the user-defined threshold or when violence is detected. At the same time, the SIM800L
GSM module sends an SMS notification to the user's cell phone, and an image of the event is
captured and saved on the server. The Flask application converts this snapshot data to JSON and
provides it in API form. A minimal sketch of the alert path follows.
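A minimal sketch of that alert path is given below, assuming the red LED is wired to a GPIO pin and the SIM800L is attached to the Pi's UART; the pin number, serial port, and phone number are illustrative assumptions.

import time
import serial               # pyserial
from gpiozero import LED

alert_led = LED(17)         # red LED on BCM pin 17 (assumed wiring)
sim800l = serial.Serial("/dev/serial0", baudrate=9600, timeout=2)

def send_sms(number: str, text: str) -> None:
    # Standard SIM800L AT command sequence for sending one text message.
    sim800l.write(b"AT+CMGF=1\r")                   # switch to text mode
    time.sleep(0.5)
    sim800l.write(f'AT+CMGS="{number}"\r'.encode())
    time.sleep(0.5)
    sim800l.write(text.encode() + b"\x1a")          # Ctrl+Z terminates the message
    time.sleep(3)

def raise_alert(people: int) -> None:
    alert_led.on()
    send_sms("+91XXXXXXXXXX", f"ALERT: {people} people detected, threshold exceeded")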

Users simply paste the Flask server URL into the Flutter application, and through this API
integration the incident snapshots become directly visible in the app.

This ensures that users are alerted in real time and also have visual proof of the incident when
required, improving situational awareness.

The handling of critical events is specifically designed for low lag and high reliability, reflected in
the system's capability to react within milliseconds.

Finally, using Flask for backend operations eases communication between the detection module,
the GSM alerts, and the mobile interface, making the entire architecture robust and scalable.


5. SUMMARY AND CONCLUSIONS


5.1 SUMMARY:

This project covers the vision, design, and development of a real-time Crowd Detection and Alert
System intended to improve safety and security in busy urban areas such as public places and
educational institutions. The system builds on real-time video feeds from standard CCTV cameras
to observe crowd density and to detect violent actions on the fly. Specifically, it adopts an object
detection framework based on YOLOv8, a state-of-the-art algorithm for fast and accurate
multi-object detection.

A web-based portal for user interaction and customization was developed using basic web
technologies: HTML for structure and CSS for styling. The portal uses RTSP to bring live video
from the connected CCTV cameras into the user-friendly interface in a continuous, low-latency
way. Its most useful feature is that users can configure their own crowd-density threshold limits.
These thresholds reflect the safety regulations or operational requirements of the monitored area
and tell the system when an instance of overcrowding has been detected.

The alert mechanism of the system is a Global System for Mobile communication (GSM) module
connected to the central processing unit, a Raspberry Pi single-board computer running Raspbian.
The Raspberry Pi acts as the command center: when the YOLOv8 algorithm identifies either a
crowd volume above the threshold or recognizable signs of violence, it immediately sends a
notification via the GSM module to named recipients. The purpose is to inform staff as soon as
possible so that quick action can be taken against potential safety dangers or incidents.

The choice of these technologies rests on an extensive literature survey: YOLOv8 for intelligent
video analysis, the Raspberry Pi for processing and surveillance at a fraction of the usual cost,
RTSP for rapid web-based monitoring, and GSM for reliable out-of-band alerts. The review traces
the evolution of the YOLO algorithms, state-of-the-art methods for object and crowd detection, the
use of the Raspberry Pi in surveillance systems, the use of GSM modules for wireless data
transmission, and RTSP streaming for near-real-time video.


5.2 CONCLUSION:

5.2.1 System Efficiency:

● The system runs efficiently in real time, detecting crowd density using YOLOv8 and raising
alerts when the crowd grows above safe thresholds.

● It also has a violence detection module trained on labeled data to detect violence in videos.

● Although the violence detection currently runs offline rather than in real time, it is still
beneficial for post-event analysis, especially in educational institutes where student safety is a
major concern.

5.2.2 Challenges Faced During Development:

● Integrating object detection models in real time with live camera feeds was technically
difficult.

● Creating a violence detection model needed a large amount of labeled video data and
computational power.

● A significant struggle was making the system support both fast online crowd detection and
offline video violence detection.

● Crowd patterns at schools and colleges differ, so customizing the system for a classroom or
college environment requires stepwise fine-tuning.

5.2.3 Limitations of the Current System:

● In high-density or occluded environments, crowd-estimation accuracy may decrease by around
3% on average.

● The violence detection feature runs offline, so it does not notify police in real time.

● The system is mainly designed for the educational context and needs to be reconfigured for
different environments.

● Detection quality depends on the quality and diversity of the datasets used for violence
detection, which restricts generalization to new scenarios.


5.2.4 Suggestions for Improvement:

● Optimize the models or use edge computing devices to transition the violence detection system
to real time.

● Deepen the dataset with more in-context recordings (i.e., violent incidents recorded from
within the institute) to augment the accuracy of detection.

● Improve behavior-prediction algorithms to foresee potential risks at school and college
gatherings.

● Extend the system with crowd-flow analysis and access-control integration on educational
campuses.

5.2.5 Benefits and Cost Analysis:

● The inclusion of real-time crowd management and offline violence detection brings a dual
benefit: safer environments within educational institutions.

● Implementation is low-cost, reusing existing surveillance infrastructure and open-source tools.

● It reduces dependency on manual supervision and lets staff devote themselves to prevention
and rescue.

● It is extremely flexible and can be scaled to different types of educational setups: schools,
colleges, and universities.

5.2.6 Use of Project to Society:

● Developed primarily with educational institutes in mind, the system helps deter loitering, and
an inbuilt provision performs crowd analysis to enhance campus safety after every event.

● Provides evidence-based recommendations grounded in historical data, allowing authorities to
act quickly on this information.

● Increased monitoring encourages responsible behaviour among students.

● Fosters a more secure and interactive learning space.


5.2.7 Scope for Further Work:

● Future work will focus on implementing real-time violence detection suitable for live learning
environments.

● Extending to multi-camera, multi-location setups across multiple institutes or campuses.

● Integrating with emergency alert systems for prompt notification of security staff.

● Building admin dashboards to examine trends in crowd behavior and historical
violence-incident reports at a deeper level.


SOCIAL UTILITY

1. Enhanced Public Safety:

● Supports real-time detection of overcrowding and violent incidents.
● Helps prevent stampedes, fights, or mass panic in public places and premises such as schools,
college events, temples, and railway stations.
2. Automated Emergency Alerts:
● Sends instant SMS notifications through the GSM module, reducing dependence on manual
monitoring.
● Helps authorities act fast in case of emergency.
3. Resource Optimization:
● Decreases the security staff's workload and enables institutions to patrol large areas with
limited human resources.
● Improves overall surveillance efficiency by means of AI-driven automation.
4. Accessibility and Cost-Effectiveness:
● The solution is affordable (it uses hardware like the Raspberry Pi and SIM800L along with
open-source modules), so it is easily usable by rural or underfunded institutions.
● Quick and easy to deploy in schools, colleges, temples, and smaller premises without large
infrastructure.
5. Remote Monitoring:
● Remote alerts and captured images are accessible through the Flutter app and Flask web
portal.
● Provides centralised oversight for larger campuses and multi-location premises.
6. Educational Institution Safety:
● Protects students from unsafe surroundings by detecting violent activities and crowd surges in
real time.
● Supports action-oriented management of school and college events.
7. Scalable and Customizable:
● Can be adapted to other domains including traffic, events, and disaster response.
● Thresholds and detection modes can be changed easily per context.


8. Promotes Technological Awareness:

● Encourages institutions to adopt AI- and IoT-based safety systems.
● Educates society on how intelligent systems can be used for safety, automation, and
prevention at a societal level.
9. Inclusivity in Safety Monitoring:
● Ensures safety in institutions or public spaces where manual monitoring may be biased or
inconsistent.
● Offers non-discriminatory, unbiased surveillance that works equally across all demographics,
contributing to a fairer and more inclusive public safety infrastructure.
10. Reduced Risk During Public Health Crises:
● Helps keep pandemic-vulnerable environments such as hospitals free from overcrowding
(e.g., during COVID-19 or flu seasons).
● Enables automatic compliance with social-distancing rules without continuous human
intervention.

APPENDIX

Plagiarism Report of thesis


Kanak Arora
Email: [email protected] | Phone no: 9309524723 | LinkedIn: kanak-arora | GitHub: arorakanak

Education:
Yeshwantrao Chavan College of Engineering, B-Tech in Artificial Intelligence and Data Science,
CGPA: 7.30, December 2021 - May 2025.
HSC, Prerana Junior College, Percentage: 93.67, August 2021.
SSC, Swami Awadheshanad Public School, Percentage: 81.60, May 2019.

Experience: Data Analytics Internship | Unified Mentor | June 2024 | Virtual


Completed a data analytics internship with Unified Mentor, focusing on practical applications of data
analysis tools and methodologies. Gained hands-on experience in data visualization, statistical analysis,
and reporting using industry-standard software.

Projects:
Final Year Project | Crowd Detection and Notifier System | Aug 2024 – Present
Developing a real-time surveillance system using Raspberry Pi, YOLOv8, GSM module, and WebRTC
to monitor crowd size and detect violent behaviour in high-density areas. Instantly sends alerts and
streams live video via a user-friendly HTML/CSS web portal.
Crop Production Analysis in India | Power BI | June 2024
Conducted an in-depth analysis of crop production trends in India using Power BI for data
visualization.
Analysed datasets to identify patterns, regional variations, and key factors influencing crop yield.

Technical Skills:
Languages: C, Python, HTML, CSS, R, JavaScript.
Technologies & Tools: Microsoft Power BI, MySQL, VSCode, Google Colab.

Certifications:
● Introduction to Deep learning.
● Data Visualization for Deep Learning using Power BI and Tableau, VNRVJIET, Hyderabad.

Extracurricular Activities:
● Vice President, Nrutyakala, YCCE (Present).
● Co-head (Content Writer), Nrutyakala (YCCE) – 2023-24.
● Organizer, YIN (YCCE) – 2023-24.
● Visharad in Kathak, ABGMV, Mumbai – November 2023.

Languages Known: English, Hindi, French.

36
APPENDIX

Nidhi Sakhare
Email: [email protected] | Phone no: 9730109343 | LinkedIn: NidhiSakhare | GitHub: NidhiSakhare

Education:
Yeshwantrao Chavan College of Engineering, B-Tech in Artificial Intelligence and Data Science
HSC, St. George College.
SSC, Guru Nanak High School.

Experience:
Java Developer Intern | Informatrix IT Solution Pvt Ltd | Jan 2025 – Present
● Learned core Java concepts, OOP principles, and real-time project development. Gained hands-on
experience with tools like NetBeans, Apache Tomcat, and MySQL.
● Exploring 3-tier architecture by working on backend logic, frontend integration, and database
connectivity.

Projects:
Final Year Project | Crowd Detection and Notifier System | Aug 2024 – Present
Developing a real-time surveillance system using Raspberry Pi, YOLOv8, GSM module, and WebRTC to
monitor crowd size and detect violent behavior in high-density areas. Instantly sends alerts and streams
live video via a user-friendly HTML/CSS web portal.
DIWALI SALES ANALYSIS | JULY 2024
Performed Exploratory Data Analysis (EDA) to analyze sales trends by state, city, gender, age group, and
marital status using pandas and matplotlib.

Technical Skills:
Languages: C, Python, HTML, CSS, R, JavaScript.
Technologies & Tools: Microsoft Power BI, MySQL, Pandas, Numpy, Databases, Data visualizations,
Data Analysis, Microsoft Excel, VS Code, Google Colab.

Certifications:
● Introduction to Deep Learning.

Extracurricular Activities:
● Completed an online short-term course on Data Visualization for Deep Learning Using Power BI
and Tableau conducted by the Dept. of CSE, NIT Warangal, and Dept. IT, VNRVJIET,
Hyderabad.
● Completed a virtual internship on Data Visualization: Empowering Business with Effective
Insights by TATA, where I gained hands-on experience in using data visualizations to take
informed decisions.

Languages Known: English, Hindi.

37
APPENDIX

HIMANSHU DHOMANE
Email: [email protected] | Phone: (+91) 8788663472 | GitHub: https://www.github.com/himanshuio

PROJECTS
Crowd Detection and Notifier System (Flask, HTML, CSS, YOLOv8n, Raspberry Pi 4, Google Coral Edge TPU, SIM800L)
● Developed a real-time crowd monitoring system using Raspberry Pi 4, Coral Edge TPU, Sim800L and
YOLOv8n(pretrained) to analyze CCTV footage via RTSP.
● Additionally integrated a violence detection model trained on Roboflow dataset with 7000 images and also
implemented SMS alerts using the SIM800L GSM module.
Shophouse E-commerce Website (Flask, PostgreSQL, HTML/CSS, Render) View: https://shophouse-xh8n.onrender.com/
● Developed an e-commerce platform with product display, cart management, user authentication, and a
responsive frontend using HTML and CSS.
● Used Flask for the backend, managed the database with PostgreSQL (via DBeaver), and deployed the
website on Render for public access.
Notes Classifier (TensorFlow, Keras, Streamlit, Google Colab)
● Built a CNN-based notes classifier trained on 390 'Notes' images and 392 'Others' images for accurate
classification.
● Designed a Streamlit-based UI allowing users to upload images and classify them as 'Notes' or 'Others'.

TECHNICAL SKILLS
Java | Python | SQL | HTML/CSS | Flask | Dart | Flutter | MySQL | Git/GitHub | Figma

WORK EXPERIENCE
OceanZen — Flutter Intern (January 2025 – Present)
● Learned Dart fundamentals, Stateless and Stateful widgets, layouts, and navigation in Flutter.
● Gained knowledge of OOP concepts and implemented API integration to fetch and display real-time
cricket data in Flutter.

EDUCATION
Yeshwantrao Chavan College of Engineering, Nagpur, India: B.Tech. in Artificial Intelligence & Data Science Engineering, CGPA: 6.78 (expected May 2025).
Balaji Junior College, Butibori, Nagpur, India: HSC, Percentage: 71.83% (May 2021).
Holy Cross English Medium High School, Butibori, Nagpur, India: SSC, Percentage: 73.20% (May 2019).

EXTRA-CURRICULAR ACTIVITIES
● National Level UI/UX Competition — Marathwada Mitramandal College of Engineering, Pune
Designed a UI in Figma for a student course explorer app as part of a problem statement challenge.
● Innovation ‘R’ Us — Yeshwantrao Chavan College of Engineering, Nagpur
Presented a Period Tracker app as a team of four at Innovation 'R' Us.
● International Conference on Advances in Computing, Control & Telecommunication Technologies
(ACT 2025) – YCCE, Nagpur
Presented the research paper titled "Crowd Detection and Notifier System” in international conference.

REFERENCES
[1] Jiang, Peiyuan, Daji Ergu, Fangyao Liu, Ying Cai, and Bo Ma. 2021. "A Review of YOLO
Algorithm Developments." The 8th International Conference on Information Technology and
Quantitative Management (ITQM 2020 & 2021). Published by Elsevier B.V., doi:
10.1016/j.procs.2022.01.135.

[2] Gu, Kaiming, and Boyu Su. 2024. “A Study of Human Pose Estimation in Low-Light
Environments Using the YOLOv8 Model.” International Engineering College, Xi’an University of
Technology & School of Intelligent Engineering, Zhengzhou University of Aeronautics, doi:
10.54254/2755-2721/32/20230200.

[3] Gündüz, M.Ş., Işık, G. "A new YOLO-based method for real-time crowd detection from video and
performance analysis of YOLO models." J Real-Time Image Proc (2023).

[4] Juan R. Terven, Diana M. Cordova-Esparza. "A Comprehensive Review of YOLO Architectures
in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS," Machine Learning and
Knowledge Extraction. Instituto Politecnico Nacional, Universidad Autónoma de Querétaro,
arXiv, 2023, arXiv:2304.00501.

[5] F. Sultana, A. Sufian, P. Dutta. “A Review of Object Detection Models Based on Convolutional
Neural Networks” (2019), arXiv, 2019, arXiv:1905.01614.

[6] Yufei Xu, Jing Zhang, Qiming Zhang, Dacheng Tao. “ViTPose: Simple Vision Transformer
Baselines for Human Pose Estimation”(2022), arXiv, 2022, arXiv:2204.12484v3.

[7] H. Ghahremaninezhad, H. Shi and C. Liu, “Real-Time Accident Detection in Traffic Surveillance
Using Deep Learning,” 2022 IEEE International Conference on Imaging Systems and Techniques
(IST), Kaohsiung, Taiwan, 2022, pp. 1-6, doi:10.1109/IST55454.2022.9827736.

[8] İrem Üstek, Jay Desai, Iván López Torrecillas, Sofiane Abadou, Jinjie Wang, Quentin Fever,
Sandhya Rani Kasthuri, Yang Xing, Weisi Guo, Antonios Tsourdos. “Two-Stage Violence Detection
Using ViTPose and Classification Models at Smart Airports” (2022), IEEE Access, arXiv, 2022.

[9] L. Jiao et al., “A Survey of Deep Learning-Based Object Detection,” in IEEE Access, vol. 7, pp.
128837-128868, 2019, doi: 10.1109/ACCESS.2019.2939201.


[10] X. Zhou, W. Gong, W. Fu, and F. Du, “Application of deep learning in object detection,” 2017
IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS), Wuhan,
China, 2017, pp. 631-634, doi: 10.1109/ICIS.2017.7960069.

[11] Abdul Vahab, Maruti S Naik, Prasanna G Raikar, Prasad S R. “Applications of Object Detection
System” (2019).

[12] Sanket Kshirsagar, Rushikesh Matele, Atharva Patil, Prof. B.B. Waghmode. “Crowd Monitoring
and Alert System” (2024).

[13] Esraa Samkari, Muhammad Arif, Manal Alghamdi, Mohammed A. Al Ghamdi. “Human Pose
Estimation Using Deep Learning: A Systematic Literature Review”(2023), Mach. Learn. Knowl. Extr.
2023, 5(4), 1612-1659.

[14] Dushyant Kumar Singha, Sumit Paroothi, Mayank Kumar Rusiac, Mohd. Aquib Ansari. “Human
Crowd Detection for City Wide Surveillance” (2020), doi: 10.1016/j.procs.2020.04.036.

[15] Liu, L., Ouyang, W., Wang, X. et al. Deep Learning for Generic Object Detection: A Survey. Int
J Comput Vis 128, 261–318 (2020), doi: 10.1007/s11263-019-01247-4.

[16] Krutika Patil, Madhura Vajarekar, Meera Yadate, T. N. Sawant. "Vehicle Theft Detection Using
GSM on Raspberry Pi," Iconic Research and Engineering Journals, vol. 3, issue 11, 2020,
pp. 119-124.

[17] Kashaboina Radhika and Ramasamy Velmani, 2020, IOP Conf. Ser.: Mater. Sci. Eng. 981
042009, doi: 10.1088/1757-899X/981/4/042009.

[18] R. Sai Sree, P. Chandu, B. Pranavi (Students at SNIST). "Real-Time Object Detection Using
Raspberry Pi" (2023).

[19] S. Srikanth, Ch. Sai Kumar, N. Uday Rao, R. Srinivasa Rao. “Raspberry Pi Based Smart
Surveillance System” (2022).

[20] Akilan Thangarajah, Buddhapala Wongkaew, Mongkol Ekpanyapong. "Implementation of Auto
Monitoring and Short-Message-Service System via GSM Modem" (2013), IJCRT, arXiv, 2013,
arXiv:1501.01548.


[21] H. Fateh Ali Khan, A. Akash, R. Avinash, and C. Lokesh, “WebRTC Peer to Peer Learning,”
Department of Information Technology, Valliammai Engineering College, Chennai, India, 2020.

[22] A. Tatrai and T. L. Semmens, "Real-time crowd measurement and management systems and
methods thereof," Patent, Australia, filed July 24, 2019, published 2021, CA3143637A.

[23] A. Moteki, N. Yamaguchi, and T. Yoshitake, "Camera pose estimation device and control
method," Patent (US), United States, filed October 12, 2023, published May 10, 2024,
US10088294B2.

[24] M. Redzic, J. Tang, Z. Hu, J. Antony, H. Wei, N. O’Connor, and A. Smeaton, "Crowd behavior
anomaly detection based on video analysis," Patent (WIPO International), filed October 7, 2019,
published April 15, 2021, WO2021069053A1.

