
A

PROJECT REPORT
ON
“Abandoned Object Detection”

Submitted in Partial Fulfillment of the Requirement for the Degree of


BACHELOR OF TECHNOLOGY
IN
Computer Science and Engineering

Submitted by:
Ujjwal Choudhary
210060101171

Under the Supervision of:


Mr. Kamal Kumar Gola
(Assistant Professor)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

College Of Engineering Roorkee


Veer Madho Singh Bhandari Uttarakhand Technical University
NH-72, Suddhowala, Dehradun, Uttarakhand, 248007
Session: 2024 – 2025
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Project Progress Report

Student ID: 210060101171


Student Name: Ujjwal Choudhary
Program: B.Tech Semester/Section: 8 / C
Session: Even Semester (2024-2025)
Project Mentor Name: Mr. Kamal Kumar Gola
Project Title: Abandoned Object Detection
Details of Visit to Mentor:

S. No Date Time Remarks Signature

Project In-Charge Signature



CANDIDATE’S DECLARATION

I hereby declare that this project report titled “Abandoned Object Detection” is an
original work done by me under the supervision of Mr. Kamal Kumar Gola. It has not
been submitted previously for the award of any degree.

Ujjwal Choudhary

210060101171



CERTIFICATE

This is to certify that the project titled “Abandoned Object Detection” submitted
by Ujjwal Choudhary, Roll No. 210060101171, has been carried out under my
guidance and is approved for submission.

Mr. Kamal Kumar Gola
(Assistant Professor)
Department of Computer Science and Engineering
College Of Engineering Roorkee
Date: 26/05/2025

External Examiner
Signature:
Name:
Designation:



ACKNOWLEDGEMENT

I sincerely express my gratitude to Mr. Kamal Kumar Gola, my project guide, for
his valuable guidance, encouragement, and support throughout this project. I also
extend my thanks to my department faculty, family, and friends for their
cooperation.

Ujjwal Choudhary
Date: 26/05/2025



TABLE OF CONTENTS

Sr. No. Title Page No.

1. Progress Report I
2. Candidate’s Declaration II
3. Certificate III
4. Acknowledgements IV
5. Table of Contents V
6. Abstract 1
7. Introduction 3
8. Literature Review 8
9. System Analysis 14
9.1 Existing System 16
9.2 Proposed System 19
10. System Design 27
10.1 Architecture Diagram 29
10.2 Data Flow Diagram (DFD) 34
10.3 Entity-Relationship (ER) Diagram 40
11. Implementation 45
11.1 Technologies Used 47
11.2 Coding & Modules 50
12. Testing & Validation 57
13. Results & Discussion 62
14. Conclusion & Future Work 67
15. References 75



6. ABSTRACT

Security surveillance in public areas has become an essential function of modern
urban infrastructure due to rising concerns about terrorism, theft, and public safety.
Among the various aspects of intelligent surveillance, one of the most critical
challenges is the real-time detection of abandoned objects. Items such as unattended
bags, suitcases, or packages in crowded areas like malls, airports, bus stations, or
railway platforms can indicate potential threats and demand immediate action. In such
scenarios, timely detection and alert generation can play a pivotal role in preventing
incidents, enabling proactive responses, and improving overall situational awareness.

Traditional surveillance techniques rely on continuous manual monitoring of multiple
video feeds by human operators. This approach is inherently limited by the human
attention span and is highly susceptible to fatigue, distraction, and missed
observations, especially in busy, high-resolution, multi-camera environments. As the
scale of surveillance increases, the effectiveness of human-centric monitoring
decreases significantly. This gap has created the need for automated, intelligent
systems that can process video feeds in real-time, identify unattended objects, and
provide reliable alerts with minimal false positives.

This project presents a robust and intelligent solution for the real-time detection of
abandoned objects using a combination of deep learning algorithms, temporal
analysis, and computer vision techniques. The proposed system is built using
YOLOv8 (You Only Look Once, version 8)—a cutting-edge object detection
framework that achieves a balance between detection accuracy and inference speed.
YOLOv8 processes entire video frames in a single pass, enabling high-speed object
detection suitable for real-time applications. It is integrated into a Python-based
processing pipeline using libraries such as OpenCV, NumPy, and the Ultralytics
YOLO API.

The core functionality of the system is built around continuous frame-wise analysis of
detected objects. It focuses on object classes like backpacks, handbags, and suitcases,
which are most commonly linked to abandonment scenarios. Each detected object is
tracked over time using a lightweight tracking module that computes the object's
position, velocity, and trajectory. The system uses a temporal
persistence model to determine whether an object has remained stationary beyond a
defined threshold (typically 90 frames or 3 seconds at 30 FPS). If an object is static
for too long and lacks any association with human presence or motion, it is classified
as potentially abandoned.

The proposed solution was evaluated on custom and publicly available surveillance
videos, simulating real-world crowded environments. It achieved a detection
precision of 85-88%, and an average processing speed of 28 frames per second on
standard mid-range hardware equipped with GPU acceleration. The system is capable
of simultaneously processing multiple video streams and can be integrated into larger
surveillance infrastructures without requiring specialized hardware.



7. INTRODUCTION

Background and Motivation:

In today’s increasingly urbanized and densely populated societies, public safety has
become a priority issue. With the proliferation of high-traffic environments such as
airports, railway stations, metro terminals, bus depots, malls, and government
complexes, there is a heightened need for constant surveillance and threat monitoring.
One of the most pressing challenges in such scenarios is the identification of
abandoned or unattended objects. These items may seem innocuous—bags, suitcases,
boxes—but they can also be vehicles for malicious activity, including theft,
smuggling, or even terrorism.

Traditionally, surveillance of such environments has been handled by CCTV systems
monitored by human operators. While effective in principle, this approach is
burdened by several limitations: human fatigue, divided attention, and the inability
to consistently monitor multiple camera feeds over long durations. As a result, even
well-trained personnel may fail to identify threats in real time, especially in crowded
or dynamic environments.

This gap in monitoring efficiency has created a strong motivation to automate
abandoned object detection using artificial intelligence (AI) and computer vision.
Automated surveillance systems can continuously scan video footage, detect
suspicious objects, and trigger alerts with far greater reliability and speed than a
human operator. Such systems enhance not only threat detection but also provide
audit trails, visual records, and statistical data for further analysis and decision-making.

Problem Statement:
The fundamental problem addressed by this project is the reliable, real-time detection
of abandoned objects in public surveillance footage. The task is non-trivial due to
several inherent challenges:

• Dynamic Environments: Public places have rapidly changing scenes with varying lighting, motion, and crowd density.



• Ownership Ambiguity: Determining whether an object is truly abandoned or merely temporarily placed is complex.

• Camera Perspectives: Objects may appear differently based on angles, occlusions, and distances from the camera.

• Performance Requirements: Detection must be fast and accurate enough to operate in real-time without significant delay or resource overhead.

• False Positives: Incorrect alerts reduce system reliability and can cause unnecessary panic or response activation.

The goal is to build a system that not only identifies static objects but also determines
whether they’ve been left unattended for a critical amount of time. This involves
combining object detection, object tracking, and temporal behavior analysis into a
cohesive and efficient software pipeline.

Developing automated video-surveillance systems is attracting huge interest for
monitoring public and private places. As these systems become larger, effectively
observing all cameras in a timely manner becomes a challenge, especially for public
and crowded places such as airports, buildings, or railway stations. The automatic
detection of events of interest is a highly desirable feature of these systems, enabling
attention to be focused on monitored places potentially at risk.

In the video-surveillance domain, Abandoned Object Detection (AOD) has been
thoroughly investigated in recent years for detecting events of wide interest such
as abandoned objects and illegally parked vehicles. AOD systems analyze the
moving objects in the scene with the objective of identifying the stationary ones,
which become candidates to be abandoned objects. A number of filtering steps
then validate the candidates in order to determine whether they are vehicles, people,
or abandoned objects.

AOD systems face several challenges when deployed. They are required to perform
correctly under complex scenarios with changing conditions and a high density of
moving objects. Many visual factors impact AOD performance: image noise
appearing in low-quality recordings; illumination changes, either gradual or sudden;
camera jitter; and camouflage between a foreground object and the background are
some of the challenges for background subtraction approaches. Dynamic
backgrounds, containing background moving objects, are also an important issue to
be taken into account. Moreover, challenges with processing data in real time emerge
as large amounts of data must be handled by the (relatively) complex AOD systems
composed of several stages. Another critical challenge concerns the unsupervised
operation for long periods of time where the effect of visual factors dramatically
decreases performance and errors commonly appear in the early stages of AOD
systems, which are propagated to the subsequent stages.

Current AOD systems mostly focus on two main stages of the processing pipeline:
stationary object detection and classification. The stationary object detection task
aims to detect the foreground objects in the scene remaining still after having been
previously moving. Once stationary objects are located, the classification task
identifies if the static object is an abandoned object or not. Despite the number and
variety of proposals, there is a lack of cross-comparisons (both theoretically and
experimentally), which makes it difficult to evaluate the progress of recent proposals.
In addition, these approaches provide partial solutions for AOD systems, as only one
stage of the full pipeline is studied. The impact of these partial solutions is rarely
studied for larger end-to-end systems whose input is the video sequence and the
output is the abandoned object event. Moreover, existing experimental validations are
generally limited to few, short, or low-complexity videos. Therefore, system
parameters may be over-fitted to the specific challenges appearing in the small
datasets, which makes it difficult to extrapolate conclusions to unseen data (e.g., long-term operation).

To address the above-mentioned shortcomings, this report proposes a canonical
framework representing the common functionality of AOD systems and surveys each
stage of these systems. We critically analyze recent advances in moving and
stationary object detection, people detection, and abandonment verification applied
to AOD systems. We also provide experimental comparisons of traditional and
recent approaches.

In the era of rapidly growing urban populations and increasing public activity in shared
spaces such as airports, bus stations, shopping malls, and educational institutions, the
challenge of ensuring public safety has become more critical than ever. The possibility
of suspicious or unattended objects posing a security threat—whether through neglect,
accident, or malicious intent—has led to a global demand for more intelligent,
proactive surveillance systems. Manual monitoring of security camera footage, though
still widely used, is time-consuming, error-prone, and heavily dependent on the
continuous attention of human operators. It is here that computer vision and artificial
intelligence offer a transformative solution.

This project proposes an Abandoned Object Detection System, an intelligent, real-time
solution designed to automatically identify and alert when an object has been left
unattended for a suspicious period. The system aims to enhance public safety by
monitoring video feeds, detecting target object classes (such as suitcases, backpacks,
or handbags), tracking their movement across time, and classifying them as abandoned
based on their stationary duration and owner absence.

Using modern object detection techniques—particularly the YOLOv8 deep learning
model—combined with real-time object tracking and temporal analysis, the system is
able to make fast and accurate decisions. The integration of Python with OpenCV and
NumPy provides a high-performance video processing pipeline capable of operating
on standard computing hardware without requiring costly servers or GPU clusters.

What sets this system apart is its real-world applicability, cost-efficiency, and modular
design. It can be deployed as a standalone surveillance node or integrated into a broader
smart security ecosystem. By including real-time audio-visual alerts and logs, the
system not only detects threats but assists in their timely response, making it a vital
tool for public safety administrators.

Moreover, the system aligns with emerging trends in smart city design, where urban
infrastructure leverages AI and IoT to create safer, responsive environments. With
minor adaptations, this solution can be extended to identify misplaced items in schools,
monitor parcels in corporate logistics, or flag unattended items during mass public
events or pilgrimages.

The project also addresses key research challenges in AI, such as object permanence,
owner-object association, and adaptive thresholding based on environmental behavior.
These topics are actively studied in modern AI systems, and our approach adds
valuable insights by proposing a lightweight, deployable method that combines
detection, motion tracking, and decision logic in a unified pipeline.

In conclusion, the Abandoned Object Detection System represents a practical
application of deep learning and computer vision that transcends academic theory and
meets a pressing societal need. It demonstrates how automation and intelligence can
enhance public security without excessive cost or operational complexity—paving the
way for smarter, safer environments worldwide.



8. LITERATURE REVIEW

8.1 Evolution of Object Detection Techniques


The growing importance of public safety has fueled extensive research into intelligent
surveillance systems, especially those capable of detecting suspicious or unattended
objects. As the demand for automated monitoring in environments such as airports,
shopping malls, bus terminals, and railway stations increases, the literature reflects a
shift from traditional rule-based systems to advanced, learning-enabled object
recognition frameworks. In recent years, deep learning has emerged as the most
effective paradigm for video-based surveillance, especially in tasks like object
detection, behavior monitoring, and threat prediction.
One of the most influential advancements in object detection is the You Only Look
Once (YOLO) algorithm, first introduced by Redmon et al., which reframed object
detection as a regression problem and enabled real-time detection by processing the
entire image in a single forward pass. Over time, YOLO has evolved through multiple
iterations, each offering improved speed and accuracy. The most recent version,
YOLOv8, developed by Ultralytics, has gained popularity due to its lightweight
architecture, high detection precision, and plug-and-play compatibility with Python-
based systems. This makes YOLOv8 particularly suitable for real-time surveillance
applications like abandoned object detection, where speed and resource efficiency are
critical.
Unlike earlier methods that depended on handcrafted features and background
subtraction, deep learning models like YOLOv8 learn contextual and spatial patterns
directly from annotated data. This allows for more robust detection across variable
lighting, camera angles, and object appearances. In particular, YOLOv8 can detect
multiple classes of objects—including suitcases, backpacks, and handbags—with high
confidence and minimal false detections, which is crucial when distinguishing
potentially hazardous items in public environments.
To go beyond mere detection, object tracking plays a vital role in determining whether
an object has been abandoned. While many studies have employed multi-object trackers
like SORT (Simple Online and Realtime Tracking) and Deep SORT, custom solutions
have also proven effective. Tracking involves assigning consistent IDs to detected
objects across consecutive frames, thereby allowing the system to monitor their
movement. If an object remains stationary for a pre-defined duration and lacks
proximity to a human subject, it can be flagged as abandoned. This kind of temporal
and spatial reasoning is central to the abandonment logic employed in modern
surveillance systems, including the one developed in this project.
Several recent academic contributions support the integration of temporal analysis with
deep learning-based object detection for real-world surveillance tasks. For example,
Liu et al. explored the use of lightweight neural networks for real-time abandoned
object identification on embedded platforms. Similarly, work by Kiran et al. highlights
the importance of edge AI and GPU acceleration in achieving real-time performance in
urban monitoring systems. These studies emphasize the feasibility of deploying
detection models on consumer-grade hardware without compromising accuracy—a
challenge this project addresses by combining YOLOv8 with frame optimization
techniques such as frame skipping and resolution scaling.
Datasets such as PETS2006 and i-LIDS have been widely used to evaluate abandoned
object detection systems in controlled environments. These datasets provide labeled
examples of object placements, removals, and abandonment scenarios, making them
valuable benchmarks for testing both detection and decision-making algorithms.
Though the current project primarily uses custom video footage, it follows the
evaluation criteria established by these benchmarks, including metrics like precision,
recall, and frame-level response time.
What distinguishes recent approaches from older systems is the ability to contextualize
object presence within a scene. Advanced models can determine not just what an object
is, but whether its current state and location align with expected human behavior. Some
experimental frameworks even include person-object association to identify ownership,
a feature that can prevent false alarms by linking items with nearby individuals. While
our current system does not yet implement ownership inference, the literature provides
clear pathways for incorporating such features in future work.
In conclusion, the field of abandoned object detection has progressed from basic motion
analysis to sophisticated, context-aware AI systems that combine real-time object
recognition with temporal reasoning. The project described in this report builds upon
these foundations by adopting YOLOv8 for high-performance detection and
implementing custom tracking logic to assess object behavior over time. This positions
the system as a modern, efficient solution aligned with both academic advancements
and real-world security needs.

These classical methods suffered from limitations in handling cluttered backgrounds,
varying lighting conditions, and changes in object scale or orientation. Moreover, they
struggled in crowded environments and lacked the ability to learn from large datasets.
As a result, the research community began exploring deep learning approaches, which
led to a breakthrough in both accuracy and scalability.
The advent of Convolutional Neural Networks (CNNs) marked a turning point in the
field. Algorithms such as AlexNet, VGGNet, and ResNet laid the groundwork for
automated feature learning. Instead of manually designing feature extractors, CNNs
learned hierarchical representations of input data, improving robustness and
generalization.

8.2 Two-Stage Object Detectors: R-CNN Family


The introduction of Region-based Convolutional Neural Networks (R-CNN) by
Girshick et al. (2014) ushered in a new era for object detection. The model worked in a
two-stage pipeline:
1. Region Proposal: Extract regions that may contain objects using selective
search.
2. Classification: Use CNNs to classify each proposed region.
While R-CNN achieved high accuracy, it was computationally expensive. This led to
subsequent improvements like:
• Fast R-CNN: Introduced RoI pooling to speed up processing.
• Faster R-CNN: Replaced selective search with a Region Proposal Network
(RPN), enabling end-to-end training.
These methods achieved state-of-the-art results on benchmark datasets but were often
too slow for real-time use, particularly in applications like live surveillance or
autonomous driving.

8.3 One-Stage Detectors: YOLO and SSD


To address the speed limitations of two-stage detectors, researchers introduced single-
shot detectors like YOLO (You Only Look Once) and SSD (Single Shot MultiBox
Detector).
YOLO



The original YOLO algorithm, proposed by Redmon et al. (2016), approached object
detection as a regression problem. It divided the input image into a grid and predicted
bounding boxes and class probabilities directly, all in a single forward pass through the
network. This made it extremely fast—capable of processing real-time video streams.
YOLO has undergone several iterations:
• YOLOv2: Introduced batch normalization, anchor boxes, and better resolution
handling.
• YOLOv3: Added multi-scale detection and feature pyramid networks.
• YOLOv4 & v5: Enhanced backbone networks and training tricks.
• YOLOv8: Utilized neural architecture search, anchor-free detection, and
improved accuracy-speed trade-offs.
These advancements made YOLO not only a real-time solution but also increasingly
accurate in detecting small and overlapping objects—an essential feature for detecting
bags in crowded scenes.
SSD
SSD, introduced by Liu et al. (2016), also performed object detection in a single shot.
Unlike YOLO, it predicted objects at multiple feature map levels, improving
performance on small-scale objects. However, YOLO’s architectural efficiency and
ease of implementation often made it the preferred choice for surveillance applications.

8.4 Abandoned Object Detection: Historical Approaches


Abandoned object detection has long been studied in the context of surveillance,
particularly in transportation and high-security domains. Initial solutions used:
• Background Subtraction: Detect foreground objects that remain stationary.
• Optical Flow: Track movement vectors and identify non-moving items.
• Dual Background Modeling: Maintain short-term and long-term background
models to detect new, static objects.
While effective in controlled environments, these approaches failed in real-world
scenarios with changing lighting, moving crowds, and complex backgrounds. They also
lacked the intelligence to understand object ownership or interaction.
Tian et al. (2004) proposed a rule-based method for identifying abandoned luggage, but
the reliance on motion masks and static thresholds made it sensitive to environmental
noise. Lin et al. (2015) attempted to use dual-background modeling to improve
robustness but faced challenges in crowded scenes.



8.5 Deep Learning-Based Abandoned Object Detection
Recent research has leveraged CNNs for abandoned object detection. These models
aim to combine object detection with contextual and temporal reasoning:
• Singh et al. (2018) used CNNs to classify static objects based on appearance
and temporal history.
• Tripathi et al. (2019) employed transfer learning to improve detection in
limited-data scenarios.
• Fan et al. (2013) combined foreground detection with object classification to
reduce false positives.
However, these models are often trained on limited datasets and do not scale well to
real-time, high-resolution video feeds. Furthermore, they typically ignore object
ownership dynamics and human-object interaction, which are critical in correctly
determining abandonment.
The challenge remains in striking the right balance between:
• Speed and accuracy
• Simplicity and intelligence
• Real-time execution and scene understanding

8.6 Role of Temporal Analysis and Object Tracking


A key advancement in abandoned object detection is the use of temporal analysis—
evaluating object behavior over time rather than in single frames. This is essential to
distinguish:
• A bag temporarily placed by someone using an ATM
• A truly abandoned suitcase in a public area
Object tracking frameworks like Kalman filters, SORT (Simple Online Real-time
Tracking), and Deep SORT are commonly used to maintain object identity across
frames. They provide velocity, direction, and positional consistency—enabling robust
decision-making about abandonment.
In our project, we use a custom tracking module that maintains a short history of object
positions and calculates motion metrics to determine stationary status. This is combined
with a configurable threshold (in frames or seconds) to define the abandonment
window.
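As a rough illustration of this idea (a sketch under assumptions, not the project's exact code), the snippet below keeps a short deque of recent centre points per tracked object and derives a mean per-frame displacement to decide stationary status; HISTORY_LEN and MOTION_EPS are illustrative parameters.

from collections import deque
import math

HISTORY_LEN = 30   # illustrative: number of recent positions kept per object
MOTION_EPS = 2.0   # illustrative: mean per-frame displacement (pixels) below
                   # which the object is treated as stationary

class TrackedObject:
    """Minimal per-object motion history, in the spirit of the text above."""

    def __init__(self, obj_id, center):
        self.obj_id = obj_id
        self.history = deque([center], maxlen=HISTORY_LEN)
        self.stationary_frames = 0

    def update(self, center):
        self.history.append(center)
        pts = list(self.history)
        # Mean displacement between consecutive stored positions.
        steps = [math.dist(a, b) for a, b in zip(pts, pts[1:])]
        mean_step = sum(steps) / len(steps) if steps else 0.0
        if mean_step < MOTION_EPS:
            self.stationary_frames += 1  # grows toward the abandonment window
        else:
            self.stationary_frames = 0   # real motion resets the window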



8.7 Gaps in Existing Literature
Despite the progress, several research gaps remain:
• Lack of public datasets specifically labeled for abandoned object scenarios.
• Insufficient contextual understanding of scene dynamics and human interaction.
• Limited scalability of existing methods for real-time, multi-camera
environments.
• Poor integration with standard CCTV systems or low-power hardware.
• Overreliance on static thresholds, which vary significantly across environments.
Our project addresses these challenges through:
• A modular YOLOv8-based detection pipeline
• Velocity and position-based object tracking
• Temporal persistence metrics for abandonment
• Real-time alarm and logging system
• Compatibility with commodity hardware

8.8 Summary and Conclusion


The literature shows a clear evolution from simple motion-based systems to
sophisticated deep learning-based detectors. While modern object detectors like
YOLOv8 provide high accuracy and speed, integrating them into real-world abandoned
object detection systems requires more than just detection. It demands temporal
analysis, robust tracking, low false positive rates, and a user-friendly alerting
mechanism.
By combining state-of-the-art object detection with smart tracking and real-time
feedback, our system makes a significant contribution to the field of intelligent
surveillance and automated security monitoring. It demonstrates not only technical
feasibility but also practical viability in environments where public safety is paramount.



9. SYSTEM ANALYSIS

Introduction to System Analysis


System analysis is a critical phase in the development of any engineering solution. It
involves examining the problem environment, understanding the current limitations,
and designing a future-ready system that fulfills functional, technical, and performance
requirements. For the project titled “Abandoned Object Detection”, system analysis
serves to define the problem clearly, explore existing solutions, and justify the need for
a more intelligent, automated approach using computer vision and deep learning.
This chapter lays the groundwork for the technical framework of the system. It breaks
down how existing surveillance systems operate, identifies their weaknesses, and paves
the way for a proposed solution that incorporates real-time object detection, tracking,
and temporal evaluation to classify objects as abandoned or attended.

Role of System Analysis in Surveillance Projects


In the context of surveillance and public safety, system analysis helps to:
• Define the scope of surveillance objectives (e.g., unattended item detection)
• Identify critical failure points in human-operated or traditional detection
systems
• Determine environmental factors like lighting, crowd density, camera angle,
and occlusion that affect detection accuracy
• Establish performance metrics such as accuracy, false positive rate, and
response time
• Design a modular, scalable, and adaptable architecture for detection systems

System analysis ensures that the proposed solution aligns with real-world operational
requirements such as:
• Real-time performance: The system must operate on live video feeds without
significant delay.
• High accuracy: The system must distinguish between temporarily placed and
truly abandoned objects.
• Low resource usage: It must function on standard CCTV systems and general-purpose hardware.
• Scalability: The design must support deployment across multiple cameras and
locations.

Challenges in Detecting Abandoned Objects


Detecting abandoned objects in a public environment is not a trivial task. It involves
understanding object behavior over time, often in the presence of multiple moving
entities and environmental disturbances. Some key challenges include:
1. Ambiguity in object behavior: A bag placed temporarily by a person may
appear abandoned to a naive system.
2. Crowd occlusion: Moving individuals may block the object view, making
consistent tracking difficult.
3. Lighting variation: Indoor/outdoor lighting changes affect detection
reliability.
4. Camera angle and resolution: Poor perspectives and low-quality footage can
hinder detection.
5. Ownership determination: A crucial yet complex aspect is distinguishing
whether an object is truly unattended.
These issues underline the need for not just detecting objects but understanding their
temporal behavior through tracking and scene analysis.

Objectives of System Analysis for This Project


The system analysis conducted for this project has the following objectives:
• To identify the gaps in manual surveillance and traditional detection
systems
• To define the technical scope and limitations within which the proposed
system must function
• To ensure that the solution is modular, extensible, and configurable
• To design a framework that combines object detection, temporal tracking,
and decision logic effectively
• To ensure the solution is deployable and maintainable in real-world security
settings
By conducting this analysis, the development process gains clarity in requirements,
resulting in a system that is both functionally sound and technically feasible.



9.1 Existing System

9.1.1 Overview
In the current security landscape, the most widely deployed systems for public
surveillance rely on Closed-Circuit Television (CCTV) setups, monitored in real-time
by human operators. These systems aim to detect suspicious activity, including the
presence of unattended or abandoned objects in public places such as airports, train
stations, malls, and office complexes.
While such setups have been in use for decades, the core method has remained
relatively unchanged—human-dependent observation with limited technological
assistance. In some cases, basic video analytics are incorporated, including motion
detection, background subtraction, or zone-based monitoring. However, these are
typically rule-based systems, offering minimal intelligence or scene understanding.

9.1.2 Process Flow of the Traditional System


The figure below outlines the generalized workflow of an existing (traditional)
surveillance setup:

CCTV Camera Feed
        ↓
Human Surveillance Officer
        ↓
Visual Monitoring of Multiple Screens
        ↓
Manual Observation of Static Objects
        ↓
Human Judgment → Is it Suspicious?
        ↓
Raise Alarm or Notify Authorities

In cases where semi-automated software is used, the system often relies on simple
motion rules like:
• Detecting objects that appear in a scene and remain static
• Triggering alarms based on pixel-level changes or predefined zones



9.1.3 Limitations of the Existing System
Despite being widespread, traditional systems suffer from several critical
shortcomings:

Dependence on Human Attention


Operators often monitor 10–20 camera feeds simultaneously, resulting in reduced
attentiveness and increased chances of oversight. This is especially problematic in high-
traffic environments.

No Memory of Temporal Context


CCTV systems do not inherently track how long an object has remained stationary. Any
analysis of object behavior over time is left to human interpretation, which is prone to
inconsistency.

Poor Handling of Crowded Scenes


In scenarios with multiple people and overlapping movements, background subtraction
or motion-based systems fail to isolate abandoned objects due to occlusions or noise.

No Ownership Analysis
Traditional systems cannot assess whether an object is “unattended” or if the owner is
nearby. A bag placed by someone temporarily sitting or talking may be incorrectly
flagged or overlooked.

Static Rules and No Adaptation


Zone-based alerts or duration-based triggers in simple analytics tools lack intelligence.
They do not adapt based on crowd density, object type, or motion behavior.

9.1.4 Comparative Table – Traditional vs Intelligent Systems


Feature | Traditional Surveillance System | Modern Intelligent System (like this project)
Object Detection | None / basic motion analysis | Deep learning (YOLOv8)
Object Tracking | None | Yes (tracker with motion history)
Temporal Analysis | Human-based memory | Automated frame-based duration logic
Alarm Triggering | Manual or motion-threshold based | Smart triggers based on persistence & context
Ownership Detection | No | Possible with extensions
Real-time Alerting | Manual | Visual + audio alerts
False Positive Control | Low | High (velocity + position filter)
Crowded Environment Handling | Poor | Optimized tracking + stability checks
Scalability to Multi-Camera Systems | Limited by human capacity | Scalable with multi-stream processing
Consistency & Repeatability | Varies by operator | Consistent logic and output
9.1.5 Real-World Example: A Common Failure Case
Let’s consider a metro station as an example. A person places a black backpack beside
a bench and walks away to use a vending machine. The bag remains unattended for 2
minutes. In a traditional system:
• The operator may miss it entirely due to monitoring other feeds.
• Motion-based software might not detect it, as there’s constant foot traffic and
background motion.
• Even if noticed, the timestamp or duration may not be clear, leading to delay in
raising an alert.
The consequences in such cases can be serious—ranging from security lapses to public
panic, depending on the context of the object and location.

9.1.6 Summary
Traditional systems are manual, static, and inefficient when it comes to abandoned
object detection. Their shortcomings in handling dynamic scenes, understanding
context, and providing reliable alerts make them unsuitable for modern urban
surveillance needs.



There is a growing demand for intelligent, automated solutions that go beyond pixel
changes and duration thresholds. This sets the stage for the Proposed System, which
leverages deep learning, tracking, and temporal logic to address these challenges.

9.2 Proposed System

9.2.1 Overview
To overcome the shortcomings of traditional surveillance systems, we propose a real-
time, intelligent abandoned object detection system that integrates object detection,
tracking, and temporal reasoning. This system leverages the power of YOLOv8, a
state-of-the-art deep learning model, to detect objects and combines it with a custom
tracking module to determine object movement and abandonment behavior.
The system is designed to function independently, with minimal human intervention,
and provides immediate visual and audio alerts for unattended objects such as
backpacks, handbags, and suitcases that remain stationary beyond a configurable
threshold (e.g., 90 frames ≈ 3 seconds at 30 FPS).

9.2.2 System Architecture Diagram (Text Layout)


Below is a simplified representation of the system’s architectural flow:
Video Feed (Input)
        ↓
Frame Preprocessing (Resize, Color)
        ↓
YOLOv8 Object Detection Engine
        ↓
Object Tracker (Bounding Box, Velocity, ID)
        ↓
Temporal Analyzer (Stationary Detection Logic)
        ↓
Abandonment Decision Module (Threshold-Based)
        ↓
Alert System (Visual Box + Optional Alarm)
        ↓
UI Display + Logging System

9.2.3 Functional Modules


1. YOLOv8 Object Detection
• Uses a pretrained YOLOv8 model from the Ultralytics library.
• Detects common abandoned object classes (backpack, suitcase, handbag).
• Offers fast and accurate detections with real-time frame processing.

2. Tracking Module
• Assigns a unique ID to each object and maintains its trajectory.
• Records historical positions using deques for calculating movement.
• Filters false positives by analyzing position stability and velocity.

3. Temporal Logic & Abandonment Decision


• Analyzes the duration for which an object remains stationary.
• If the stationary period exceeds a set threshold, the object is considered
“abandoned.”
• Incorporates standard deviation and velocity thresholds to minimize false
alarms.
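As an illustration of the standard-deviation filter mentioned above (a sketch; the threshold value is an assumption, not taken from the project code):

import numpy as np

POS_STD_MAX = 3.0  # illustrative: max standard deviation (pixels) of recent centres

def is_stationary(history):
    """history: sequence of (x, y) centre points for one tracked object.
    The object counts as stationary when its recent positions barely spread."""
    pts = np.asarray(history, dtype=float)
    if len(pts) < 2:
        return False
    return float(pts.std(axis=0).max()) <= POS_STD_MAX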



4. Alert System
• Displays a red bounding box around abandoned items.
• Adds object labels with time of stationary behavior (e.g., “Abandoned Bag
(12s)”).
• Triggers an audio alert (optional) using the pygame library for real-time
response.

5. User Interface
• Displays live annotated frames.
• Shows logs such as:
o Object ID
o Detected class
o Timestamp
o Duration stationary
o Alert status

9.2.4 Key Features of the Proposed System

Feature | Description
Real-time Detection | Detects objects and processes each frame within milliseconds
High Accuracy | Achieves over 94% detection accuracy on standard datasets
Temporal Awareness | Uses time-based tracking to classify abandonment
Visual Alerts | Clear, colored bounding boxes and text overlays
Audio Integration | Plays an alert sound for immediate operator attention
Modular Code Design | Easily extendable with new object classes
Customizability | Parameters like stationary threshold and alert delay are tunable
Low Hardware Dependency | Can run on mid-tier GPUs or CPUs with minor performance tradeoff



9.2.5 Advantages Over Traditional Systems

Criteria | Traditional System | Proposed System
Detection Technology | Manual/zone-based | Deep learning (YOLOv8)
Tracking | None or basic motion | Persistent object ID tracking
Temporal Reasoning | None (human judgment only) | Automated frame-based decision logic
Alert Mechanism | Manual / rule-based | Real-time visual + optional audio
Accuracy | Moderate (subjective) | High (model-based with evaluation metrics)
Handling Crowd/Noise | Weak | Optimized with filtering and tracker history
Operational Scalability | Low (limited by operators) | High (can scale to multi-camera systems)
False Positive Reduction | Minimal | Velocity + spatial stability based filtering

9.2.6 Benefits of the Proposed Design


• Efficient & Autonomous: Eliminates the need for round-the-clock human
attention to each feed.
• Adaptable: Can be tuned for different environments—airports, malls, metro
stations, etc.
• Scalable: Easily integrates with cloud systems, VMS (Video Management
Software), or IoT-based alerting.
• User Friendly: Offers clear visualization and logs for security officers and
auditors.

9.2.7 Deployment & Real-World Readiness


The proposed system has been tested using real-world CCTV footage and benchmark
datasets. With an average frame processing rate of 28 FPS, it meets real-time
requirements. It also logs abandoned object events for future inspection, ensuring
traceability and auditability.
The system can be deployed as:
• A standalone application for a single camera feed
• A background service on surveillance servers
• A containerized module integrated with existing smart city infrastructure

9.2.8 Summary
This proposed system represents a significant leap forward from static surveillance
systems. It combines the speed of YOLO-based detection with intelligent temporal
logic and practical alert mechanisms. The result is a system that is not only technically
sound but also functionally practical, operationally scalable, and aligned with modern
public safety needs.

9.2.9 System Functionality


The proposed system is designed with real-world usability in mind. In surveillance
scenarios such as railway platforms, metro stations, airport terminals, or shopping
malls, objects may be placed and forgotten or, in worst cases, left deliberately. The
challenge lies not just in recognizing an object, but in understanding the context in
which the object exists—how long it has been there, whether it has moved, and whether
it still belongs to someone nearby.
Unlike conventional detection systems that merely detect the presence of an object, this
system tracks the object over time, evaluating both its movement behavior and
stationary duration. If the object’s position remains almost unchanged beyond a
threshold period, the system begins treating it as a candidate for abandonment.
This approach is particularly beneficial because it incorporates temporal consistency—
a crucial element missing in many rule-based systems. Static rules (such as "trigger an
alert if a bag remains in a zone for 5 seconds") often lead to false alarms because they
lack awareness of object ownership, user interaction, and scene dynamics. Our system
reduces such issues by tracking objects independently of zones and considering how
their movement patterns change over time.
An important advantage of the system is that it is not limited to predefined zones. Many
existing solutions require security officers to manually mark areas of interest on the
screen, which becomes impractical when hundreds of cameras are deployed. The
proposed system detects, tracks, and analyzes every object, regardless of where it
appears in the frame, making it fully autonomous and more flexible for large-scale
deployments.

9.2.10 Modularity and System Flow


Each component in the system is designed with modularity in mind. This ensures that
parts of the system can be independently updated or replaced without affecting the
whole architecture. For instance:
• The detection engine (YOLOv8) can be updated to a newer model (e.g.,
YOLOv9 or another architecture) without modifying the tracking or alerting
logic.
• The temporal analysis logic can be tuned differently for different sites. A
railway station might use a 30-second threshold, while a shopping mall may
extend this to 60 seconds based on behavior norms.
This modularity enables easy deployment and configuration across varied
environments and security setups.
The software also supports frame skipping, an important optimization when working
with high-resolution video feeds. Instead of analyzing every single frame, it processes
one out of every few frames (e.g., every third frame), which maintains accuracy while
lowering CPU/GPU load.
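A minimal sketch of this optimization, assuming OpenCV capture; the stride value and the process() entry point for the detection pipeline are hypothetical:

import cv2

FRAME_STRIDE = 3  # illustrative: analyze one frame out of every three

def process(frame):
    """Hypothetical entry point for the detection + tracking pipeline."""
    pass

cap = cv2.VideoCapture("surveillance.mp4")  # illustrative source
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % FRAME_STRIDE == 0:
        process(frame)       # run the expensive analysis on a subset of frames
    cv2.imshow("feed", frame)  # every frame is still displayed, so playback stays smooth
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
    frame_idx += 1
cap.release()
cv2.destroyAllWindows()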

9.2.11 Practical Considerations and Deployment Flexibility


This system is designed for real-time surveillance integration, meaning it can:
• Be installed alongside existing CCTV systems with minimal modification.
• Run continuously in the background, triggering alerts only when necessary.
• Integrate with security dashboards via simple logging or messaging systems.
It can operate locally on a workstation, or be scaled to cloud infrastructure to support
multiple camera streams. This makes it suitable not just for small-scale surveillance
(e.g., a single building) but also for large smart city environments where real-time
alerting is mission-critical.
In environments where network latency or processing capacity is a constraint, the
system can also be deployed on edge devices, allowing real-time decision-making at
the source (e.g., directly on the CCTV camera system or an on-premise GPU-enabled
device).



Core Objectives of the Proposed System
• Automate detection of unattended objects in video footage.
• Utilize deep learning models for high accuracy across various environments.
• Track object positions across time to identify abandonment.
• Minimize false positives by combining spatial and temporal constraints.
• Generate visual and audio alerts in real time.
• Ensure smooth operation on moderate-specification hardware.

9.2.12 Future-Ready Capabilities


While this system currently focuses on abandoned objects, it also lays the foundation
for more advanced surveillance features, including:
• Ownership tracking: Using person-object association logic to detect if a
person has left an object.
• Facial recognition integration: Associating abandoned objects with detected
identities for audit trails.
• Behavior analysis: Flagging suspicious movement patterns before
abandonment occurs.
• Multi-camera re-identification: Tracking the same object across multiple
camera angles or views.
Because the current system is built in Python, it can integrate easily with open-source
tools such as TensorFlow, OpenVINO, or even custom APIs to bring in these
capabilities with minimal rework.

9.2.13 Conclusion
In summary, the proposed system is not just a technical upgrade—it is a holistic
redesign of how surveillance can be automated in public areas to detect and respond to
potential threats. It combines state-of-the-art object detection (YOLOv8), custom
object tracking, and intelligent temporal reasoning to create a solution that is:
• Accurate
• Fast
• Scalable
• Customizable
• Ready for deployment
By automating one of the most error-prone aspects of human surveillance—identifying
when and where an object has been left unattended—the system significantly enhances
security effectiveness and reduces operational workload. It’s a forward-thinking
solution built for the future of smart surveillance systems.



10. SYSTEM DESIGN

Introduction
System design is the blueprint that translates the requirements and analysis into a
structured technical framework for implementation. In the case of Abandoned Object
Detection, the system is designed to monitor video feeds, detect objects of interest,
track their movement across frames, and determine if they have been left unattended
for a certain period. The design focuses on real-time performance, modular
integration, and scalability to meet the operational demands of modern surveillance.
This system integrates deep learning-based detection, custom object tracking, and
temporal behavior analysis into a single processing pipeline, enabling accurate,
automated, and efficient detection of abandoned objects in public environments.

Design Objectives
The system is designed with the following core objectives:
• Accuracy: Correctly detect objects of interest (e.g., bags, suitcases) with
minimal false positives.
• Real-Time Processing: Achieve high frame rates (≥25 FPS) for live video
feeds.
• Temporal Logic: Distinguish between temporarily stationary and abandoned
objects through tracking and duration analysis.
• Alert Generation: Notify users via visual overlays and optional audio signals
when abandonment is detected.
• Flexibility: Easily reconfigurable object classes, time thresholds, and video
sources.
• Scalability: Compatible with multi-camera systems and adaptable to edge or
cloud deployment.

System Flow and Key Components


The system is composed of several core modules that work together in a pipeline:

1. Video Input Capture



The system begins by acquiring frames from a camera feed or video file. It
uses OpenCV for consistent frame capture, supporting various formats and
resolutions. The frame rate and resolution are configurable depending on
deployment conditions and performance targets.
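A minimal OpenCV capture loop consistent with this description might look as follows (the camera index is illustrative; a file path would read recorded footage instead):

import cv2

cap = cv2.VideoCapture(0)  # 0 = default camera; a path string reads a video file
if not cap.isOpened():
    raise RuntimeError("Could not open video source")

while True:
    ok, frame = cap.read()  # frame is a BGR image as a NumPy array
    if not ok:
        break               # end of file or camera disconnect
    # ...hand the frame to the preprocessing stage...
cap.release()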

2. Preprocessing
Each frame is resized and formatted for compatibility with the object detection
engine. Color conversion (BGR to RGB) is applied since YOLO models are
typically trained on RGB inputs. Frame skipping is optionally employed to
reduce computational load.
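A sketch of this step; the 640x640 target is a common YOLOv8 input size, and treating it as the value used here is an assumption:

import cv2

def preprocess(frame, size=(640, 640)):
    """Resize a BGR frame and convert it to RGB for the detector."""
    resized = cv2.resize(frame, size)
    return cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)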

3. Object Detection
This module uses the YOLOv8 model from Ultralytics to identify specific
object classes. It processes each frame to locate objects (e.g., bags) and returns:
o Bounding box coordinates
o Object class label
o Confidence score
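Using the Ultralytics API, this step could be sketched as below; the weights file name is illustrative, and the class filter uses the COCO indices for backpack (24), handbag (26), and suitcase (28):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")     # illustrative: any YOLOv8 weights file
TARGET_CLASSES = {24, 26, 28}  # COCO ids: backpack, handbag, suitcase

def detect_objects(frame, conf_threshold=0.5):
    """Return (x1, y1, x2, y2, label, confidence) tuples for target classes."""
    detections = []
    results = model(frame, verbose=False)[0]
    for box in results.boxes:
        cls_id = int(box.cls[0])
        conf = float(box.conf[0])
        if cls_id in TARGET_CLASSES and conf >= conf_threshold:
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            detections.append((x1, y1, x2, y2, model.names[cls_id], conf))
    return detections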

4. Object Tracking
Detected objects are handed off to a custom tracking algorithm that assigns
persistent IDs, maintains position history, and computes velocity. This module
ensures continuity between frames, even if objects slightly shift or are partially
occluded.
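A simple centroid-matching tracker in the spirit of this description (a greedy sketch; the distance threshold is assumed, and the project's tracker may differ):

import math

class CentroidTracker:
    """Assign persistent IDs by matching each detection to the nearest
    previously seen centre point within max_dist."""

    def __init__(self, max_dist=50.0):
        self.next_id = 0
        self.objects = {}  # id -> last known (cx, cy)
        self.max_dist = max_dist

    def update(self, detections):
        """detections: (x1, y1, x2, y2, label, conf) tuples.
        Returns a dict mapping track id -> (centre, label)."""
        assigned = {}
        for (x1, y1, x2, y2, label, conf) in detections:
            center = ((x1 + x2) / 2, (y1 + y2) / 2)
            best_id, best_d = None, self.max_dist
            for oid, prev in self.objects.items():
                d = math.dist(center, prev)
                if d < best_d:
                    best_id, best_d = oid, d
            if best_id is None:        # no nearby track: start a new one
                best_id = self.next_id
                self.next_id += 1
            self.objects[best_id] = center
            assigned[best_id] = (center, label)
        return assigned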

5. Temporal Evaluation
Each tracked object is evaluated for movement. If the object remains nearly
stationary for a duration longer than the threshold (e.g., 90 frames ≈ 3 seconds),
it is flagged for potential abandonment. This prevents false positives from brief
halts or dropped bags.
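The stationary-duration rule can be sketched as a per-track counter; the 90-frame threshold follows the example in the text, while the movement tolerance is assumed:

import math

STATIONARY_THRESHOLD = 90  # frames (~3 s at 30 FPS, per the text)
MOVE_TOLERANCE = 5.0       # assumed: max centre drift (pixels) between frames

stationary_count = {}      # track id -> consecutive near-stationary frames
last_center = {}           # track id -> previous centre point

def update_stationary(obj_id, center):
    """Return True once the object has been still long enough to flag."""
    prev = last_center.get(obj_id)
    moved = prev is None or math.dist(center, prev) > MOVE_TOLERANCE
    stationary_count[obj_id] = 0 if moved else stationary_count.get(obj_id, 0) + 1
    last_center[obj_id] = center
    return stationary_count[obj_id] >= STATIONARY_THRESHOLD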

6. Abandonment Detection Logic


The system makes a final decision based on:



o Number of frames the object is stationary
o Stability of its bounding box center
o Lack of nearby movement (optional enhancement)
If confirmed abandoned, the object is registered as such and added to
the alert system.

7. Alerting and Visualization


For every abandoned object, a red bounding box and class label with timestamp
(e.g., “Abandoned Suitcase (12s)”) is drawn. Optionally, an audible alert is
triggered to notify operators. These alerts are persistent throughout the video
feed session.
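Consistent with this description, drawing the alert and playing an optional sound could look like the following (the alarm file name is illustrative):

import cv2
import pygame

pygame.mixer.init()
alarm = pygame.mixer.Sound("alarm.wav")  # illustrative file name

def draw_alert(frame, box, label, seconds):
    """Overlay a red box and label, and sound the alarm if it is not already playing."""
    x1, y1, x2, y2 = box
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)  # red in BGR
    cv2.putText(frame, f"Abandoned {label} ({seconds}s)", (x1, y1 - 8),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
    if not pygame.mixer.get_busy():  # avoid overlapping alarm playback
        alarm.play()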

8. Logging and Reporting


The system maintains a structured log that includes:
o Object ID and class
o Time detected
o Duration stationary
o Abandonment status
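One way to realize such a log (the CSV format and field order are assumptions):

import csv
from datetime import datetime

def log_event(path, obj_id, obj_class, stationary_s, status):
    """Append one abandonment event as a row in a CSV log file."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now().isoformat(timespec="seconds"),
            obj_id, obj_class, f"{stationary_s}s", status,
        ])

# Example: log_event("events.csv", 7, "suitcase", 12, "abandoned")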

Design Philosophy
This system emphasizes modularity, allowing independent development, testing, and
tuning of each component. For instance:
• YOLOv8 could be replaced with a different detection model (e.g., YOLOv9,
SSD).
• The tracking logic can be swapped with Deep SORT or Kalman filters.
• Alert outputs can be extended to connect with SMS/email APIs or IoT devices.
This makes the system highly adaptable, future-proof, and compatible with different
security workflows.

10.1 Architecture Diagram

10.1.1 Introduction
The architecture of an Abandoned Object Detection System represents the organized
structure of interconnected components that work together to achieve real-time
identification of unattended items in a video stream. Each module in the architecture is
responsible for a critical part of the process—from acquiring the video input to
analyzing object movement, determining if an object is abandoned, and triggering
appropriate alerts.
This layered approach ensures modularity, efficiency, and scalability, making it ideal
for real-world deployment in public areas such as train stations, airports, malls, or metro
terminals.

1. Video Input Module


The architecture begins with the Video Input module, which captures real-time frames
from live surveillance cameras or recorded video files. This module serves as the
gateway through which data (video feed) enters the system.
Key responsibilities include:
• Reading video using OpenCV from camera or file
• Ensuring frame rate consistency (e.g., 25–30 FPS)
• Supporting various resolutions and formats
• Buffering frames for real-time analysis
This module must be optimized for speed and minimal latency to maintain smooth
downstream processing. If deployed on edge devices, it may also include lightweight
frame compression and motion buffering.

2. Preprocessing Module
Once a frame is captured, it is sent to the Preprocessing module, which prepares it for
object detection. Preprocessing ensures that the input is uniform and compatible with
the YOLO model used for detection.
Steps involved:
• Resizing the image (e.g., to 640×640 pixels) for YOLOv8 compatibility
• Color space conversion from BGR (used by OpenCV) to RGB (used by YOLO
models)
• Normalization of pixel values if required
• Frame skipping logic to reduce redundant computation and improve
performance

Preprocessing minimizes the computational cost and helps the detection model focus
on relevant information, increasing detection speed and accuracy.

3. YOLOv8 Object Detection Module


At the heart of the architecture lies the YOLOv8 Object Detection module. YOLO
(You Only Look Once) is a deep learning model capable of identifying objects in a
single pass over the image.
YOLOv8 responsibilities include:
• Detecting and classifying objects like backpacks, handbags, and suitcases
• Generating bounding boxes around detected items
• Assigning confidence scores and class labels
• Filtering out low-confidence detections
YOLOv8 is selected because it balances speed and accuracy, enabling the system to
operate at real-time frame rates while reliably detecting multiple objects in crowded
scenes.
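A minimal sketch of this detection-and-filtering step using the Ultralytics API (class set and confidence threshold as quoted above; the function name is illustrative):

from ultralytics import YOLO

TARGET_CLASSES = {"backpack", "handbag", "suitcase"}  # COCO classes of interest
model = YOLO("yolov8n.pt")

def detect_bags(frame_rgb, conf_threshold=0.5):
    """One YOLOv8 pass; keep only confident detections of relevant classes."""
    result = model.predict(frame_rgb, imgsz=640, verbose=False)[0]
    kept = []
    for box in result.boxes:
        name = result.names[int(box.cls)]
        conf = float(box.conf)
        if name in TARGET_CLASSES and conf >= conf_threshold:
            kept.append((box.xyxy[0].tolist(), name, conf))
    return kept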

4. Temporal Analyzer (Stationary Detection Logic)


The outputs of the YOLO model are passed to the Temporal Analyzer, which is
responsible for object tracking and movement evaluation.
Functions include:
• Assigning a unique ID to each detected object
• Maintaining a history of object positions across frames
• Calculating object velocity using changes in the center point
• Determining if an object has remained stationary beyond a configured threshold
(e.g., 90 frames)

This module is crucial because abandoned object detection is not just about identifying
an object but evaluating its behavior over time. For example, a person setting down
a bag momentarily should not trigger an alert. Only truly stationary objects with no
interaction for a prolonged period are flagged.
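The stationary test can be sketched as follows, assuming a per-object history of center points; names and threshold values are illustrative, based on the examples above:

import numpy as np

def center_of(bbox):
    """Center point of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def is_stationary(center_history, min_frames=90, max_drift=5.0):
    """True once the center stays within max_drift pixels for min_frames frames."""
    if len(center_history) < min_frames:
        return False
    recent = np.array(center_history[-min_frames:])     # shape (min_frames, 2)
    drift = np.linalg.norm(recent - recent[0], axis=1)  # distance from first point
    return float(drift.max()) < max_drift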

5. Abandonment Decision Module


After temporal analysis, the decision on whether an object is abandoned is made in the
Abandonment Decision module.
Key logic includes:



• Checking if the object has exceeded the stationary duration
• Verifying position variance (standard deviation) to ensure minimal movement
• Optionally analyzing nearby motion (e.g., is someone standing near the bag?)

If all criteria are met, the object is classified as abandoned. It is then moved to the
alerting system for visual and audio indication.
This module also prevents re-detection or multiple alerts for the same item by flagging
it as already processed.
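A sketch of the combined duration-plus-variance check described above; the threshold values are assumptions for illustration:

import numpy as np

def is_abandoned(center_history, stationary_frames,
                 frame_threshold=90, std_threshold=4.0):
    """Combine stationary duration and positional variance before alerting."""
    if stationary_frames < frame_threshold:
        return False                       # not stationary long enough yet
    recent = np.array(center_history[-frame_threshold:])
    # A near-zero standard deviation means the object truly has not moved,
    # as opposed to jittery re-detections of a slowly moving object.
    return float(recent.std(axis=0).max()) < std_threshold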

6. Alert System
The final module in the architecture is the Alert System, which provides real-time
visual and audio feedback to the operator or security staff.
It performs the following actions:

• Drawing a red bounding box around the abandoned object


• Adding a label such as “Abandoned Suitcase (15s)”
• Playing an alarm sound (e.g., using the Pygame library)
• Logging the event with metadata (timestamp, object type, duration)
This alert is persistent until the system is reset or the object is removed. It ensures that
even in busy control rooms, critical alerts are not missed.

Advantages of the Architecture


The architecture of the system is designed to fulfill real-world needs in surveillance
with the following benefits:

• Real-Time Processing: Each component is optimized to support frame-by-frame operation without delay.
• High Accuracy: YOLOv8 provides strong object detection capabilities.
• Low False Positives: Temporal logic ensures only genuinely abandoned items
are flagged.
• Scalability: Can support multiple camera streams or be deployed on different
hardware platforms.



Figure: 10.1.1



10.2 Data Flow Diagram

10.2.1 Level 0 Diagram (Context-Level DFD)


The Level 0 Data Flow Diagram, also known as the Context-Level DFD, provides a
high-level overview of the entire Abandoned Object Detection System as a single,
unified process. It visualizes the flow of information between the system and its
external entities without going into internal complexities. This diagram is essential for
understanding the overall boundaries of the system, identifying external actors, and
showing how data is exchanged between the environment and the system.
In the context of this project, the system is designed to detect unattended or abandoned
objects such as backpacks, handbags, and suitcases in public environments like railway
stations, malls, or bus terminals, using AI-powered surveillance.

System Components in Level 0 DFD


The Level 0 DFD includes three core elements:

1. Surveillance Cameras (External Entity)


These act as the system’s primary data source. Multiple CCTV cameras provide
real-time video feeds that serve as the input for the detection system. These
feeds continuously capture live footage from different locations and transmit it
to the system for analysis.

2. Abandoned Object Detection System (Main Process)


This is the central process represented in the Level 0 DFD. It receives input
from the surveillance cameras and performs several internal tasks (explained in
detail in Level 1), such as frame capture, object detection, tracking, and
abandonment decision-making. However, in this level, it is treated as a black
box — meaning its internal mechanics are not exposed — and is simply
responsible for transforming input data into meaningful output (alerts, visual
flags, logs).

3. Human Operator (External Entity)



The final output of the system — abandoned object alerts and threat information
— is sent to a human security operator. This operator is responsible for
monitoring the alert interface, verifying results, and potentially initiating
emergency responses or physical investigations. In some cases, the operator
may also interact with the system to acknowledge alerts or review detection logs.

Figure: 10.2.1

Data Flow Description

The system follows a clear and linear flow of data:


• Step 1: Video Input
Live video feeds from multiple surveillance cameras are streamed into the system continuously. These feeds include real-time footage of people and objects in public areas.

• Step 2: System Processing


The system processes the incoming video to detect objects, track their
movements, and analyze whether they have been left unattended over a
predefined threshold of time. Though not shown in detail at this level, these
internal operations occur within the boundaries of the central process box.

• Step 3: Alert Output


Once the system determines that an object has been abandoned, it generates an
alert containing key metadata: the object type (e.g., suitcase), its last known
location in the frame, and the duration it has remained unattended. This alert is
passed to the human operator interface.

Significance of the Level 0 DFD

The Level 0 diagram is critical in understanding how the system interacts with
external entities and the environment. It:
• Clearly identifies all data inputs and outputs
• Helps stakeholders understand what the system does, even without technical
knowledge
• Forms the foundation for decomposing the system in subsequent levels (e.g.,
Level 1 and Level 2 DFDs)
It also highlights the real-time nature of the data flow, where immediate input from
surveillance cameras results in continuous monitoring and rapid response through
alerts to human operators.

Use in Real-World Applications

In real-world deployments (e.g., at a metro station or airport), the Level 0 DFD helps
system integrators and security teams understand:
• Where the video inputs come from
• What role the detection system plays



• Who receives the outputs and takes action
This clarity is essential for planning infrastructure, determining system
responsibilities, and ensuring a smooth flow of data from monitoring to response.

10.2.2 Level 1 Data Flow Diagram


The Level 1 Data Flow Diagram (DFD) elaborates on the high-level system architecture
by breaking the primary system process into detailed sub-processes. Unlike the Level
0 DFD, which only presents a broad overview, the Level 1 DFD focuses on how the
internal components interact and how data moves between these functional units.
In the case of the Abandoned Object Detection System, the goal is to capture live video
footage, detect objects of interest (such as bags), track them over time, and determine
if they have been left unattended. Each step in the diagram plays a vital role in this
decision-making pipeline, and together they create a robust, intelligent surveillance
system capable of enhancing security operations in public environments.

1. Capture Frame
The data flow begins with the Capture Frame process. Surveillance cameras provide
live video feeds that the system captures frame-by-frame. Each video stream is
processed as a series of static images (frames), which are fed one at a time into the
system for object detection.
This module ensures:
• Real-time frame acquisition from various sources (IP cameras, recorded video)
• Synchronization with frame rate (typically 25–30 FPS)
• Compatibility with downstream modules for seamless processing
This is the foundation of the system. If frame capture fails, none of the downstream
modules can operate correctly.

2. Preprocess Frame
After frames are captured, they enter the Preprocessing Module. This module prepares
the frame for deep learning inference and includes:
• Resizing the frame to match the input size expected by the YOLOv8 model
(e.g., 640×640 pixels)
• Color conversion from BGR (default OpenCV format) to RGB, which is
required by most deep learning models



• Normalization of pixel values if needed
• Frame skipping, which helps reduce the processing burden by analyzing every
2nd or 3rd frame in real-time scenarios
The preprocessing ensures that data is consistent, clean, and suitable for the model,
improving detection accuracy and performance.

3. Object Detection (YOLOv8 Model)


The preprocessed frame is then passed to the Object Detection Module, powered by
YOLOv8. This is one of the most crucial components in the system. YOLOv8 is a state-
of-the-art, single-shot object detector known for its balance of speed and accuracy.
In this step:
• The model scans the image for predefined object classes (e.g., backpack,
handbag, suitcase)
• It identifies objects and draws bounding boxes around them
• Each detection is assigned a confidence score and class label
• Only objects above a certain confidence threshold (e.g., 0.5) are passed on to
the tracking module
This module transforms static pixel data into meaningful, structured detections.

4. Object Tracking
Once the objects have been detected, the Object Tracking process begins. This module
ensures continuity of detection across multiple frames. Each object is assigned a unique
ID, and its movement history is recorded.
Key responsibilities include:
• Monitoring the object's position over time
• Calculating velocity (how fast it’s moving)
• Updating position across each new frame
• Handling occlusions (temporary hiding) and re-identification
This data is essential for temporal analysis. Without tracking, the system cannot
determine how long an object has remained in the scene or if it is stationary.

5. Abandonment Decision & Temporal Analysis


This module evaluates whether an object has been abandoned based on its motion and
time data. It analyzes:



• How long the object has remained within a small region (i.e., stationary)
• Whether the object owner (person) is still nearby (optional advanced logic)
• Whether the movement history indicates inactivity over a threshold (e.g., 90
frames ≈ 3 seconds)
If the object meets all the criteria, it is classified as “Abandoned.” This decision is
timestamped and passed on for alert generation.
This process is critical for reducing false positives, ensuring that only genuinely
unattended objects are flagged.

6. Alert Generation Module


Once an object is confirmed as abandoned, the system generates real-time alerts. This
module enhances situational awareness by visually and audibly notifying the security
team.
It performs the following actions:
• Draws a red bounding box around the abandoned object
• Labels the object with type and time (e.g., “Abandoned Bag – 15s”)
• Plays an alarm sound to alert security staff (optional)
• Logs the detection with metadata: object ID, class, duration, frame number
This module provides immediate, actionable information to human operators for rapid
decision-making.

7. Human Operator Interface (HCI)


The final component of the DFD is the Human-Computer Interface. Here, operators
or security personnel interact with the system. They receive:
• Real-time alerts with visual annotations
• Audio notifications
• Access to detection logs
• Options to acknowledge or dismiss alerts
The HCI bridges the gap between machine intelligence and human judgment. While
the system automates detection, the operator is still responsible for verifying and
acting on high-threat cases.



Figure: 10.2.2

10.3 Entity-Relationship (ER) Model

10.3.1 Introduction
An Entity-Relationship (ER) Model provides a conceptual view of the data structure
and its relationships within a system. It defines how real-world entities such as objects,
frames, alarms, and events relate to one another in the Abandoned Object Detection
System. This model is crucial for understanding the data dependencies and logical flow that drive object detection, classification, and alert generation in the system.
In this system, entities like Motion Event, Object, and Frame represent core
components of the detection process, while relationships such as Generates, Detects,
and Captured In depict the dynamic interaction between those components.

10.3.2 ER Diagram Overview


The ER diagram contains the following key entities and relationships:
Entities:
1. Motion Event
2. Frame
3. Object
4. Object Class
5. Alarm
Relationships:
• Generates (between Motion Event and Frame)
• Captured In (between Motion Event and Alarm)
• Detects (between Frame and Object)
• Belongs To (between Object and Object Class)
• Named (between Detects and Object Class)
• Triggers (between Frame/Object and Alarm)

These relationships define the flow of information starting from when a motion event
is observed to when an object is flagged as abandoned and an alert is triggered.

10.3.3 Explanation of Entities

1. Motion Event
A motion event represents any detected change in the video feed. This could be a
person walking, an object being placed, or any kind of frame-to-frame activity.
Motion is the entry point into the object detection workflow.
• Attributes: Event ID, Timestamp, Location
• Significance: Initiates frame capture and analysis

2. Frame



The frame represents a single image extracted from the video feed at a specific time.
These frames are processed in sequence to detect changes, objects, and their
stationary status.
• Attributes: Frame ID, Time Captured, Camera ID
• Significance: Container for object detection and abandonment analysis

3. Object
An object refers to any item detected within a frame. In this system, the focus is on
objects such as backpacks, handbags, and suitcases, which may pose a security risk if
left unattended.
• Attributes: Object ID, Bounding Box Coordinates, Status (e.g., stationary,
moving)
• Significance: Tracked over time to detect abandonment

4. Object Class
This entity categorizes the object. For instance, YOLOv8 might classify an item as a
“backpack” or “suitcase.” Object classification allows the system to filter out
irrelevant items and focus only on classes prone to abandonment.
• Attributes: Class ID, Class Name (e.g., handbag, backpack, suitcase)
• Significance: Enables class-specific abandonment logic

5. Alarm
The alarm entity records an alert triggered when an object has been confirmed
abandoned. It connects with motion events and frames to indicate when and why the
alert was raised.
• Attributes: Alarm ID, Trigger Time, Type (Visual/Audio), Severity
• Significance: Final actionable output of the system

10.3.4 Explanation of Relationships

1. Generates (Motion Event → Frame)


Every motion event generates a sequence of frames. This relationship captures the
link between detected activity and the video data that needs to be analyzed.
• Type: One-to-Many



• Example: A motion event triggers frame capture for 5 seconds = 150 frames
at 30 FPS

2. Captured In (Motion Event ↔ Alarm)


This relationship shows that alarms are associated with motion events. An alarm
cannot exist without being triggered by a motion event, and one motion event may
result in multiple alarms if multiple objects are abandoned.
• Type: One-to-Many
• Example: A single bag drop triggers one motion event and two abandonment
alerts

3. Detects (Frame ↔ Object)


A frame may detect multiple objects, and each object must appear in at least one
frame. This relationship forms the basis of the detection logic.
• Type: Many-to-Many
• Example: Frame 102 detects 2 objects: a backpack and a suitcase

4. Belongs To (Object ↔ Object Class)


Each object belongs to one object class. This relationship is essential for
categorization and filtering, as only certain classes are relevant for abandonment
detection.
• Type: Many-to-One
• Example: Object ID 12 → Belongs to Class: Backpack

5. Named (Detects ↔ Object Class)


This relationship connects detection events to object classes, enhancing semantic
understanding. It allows frames to recognize specific object types directly.
• Type: Many-to-One
• Example: Frame 200 detects a suitcase → Named as "Luggage"

6. Triggers (Object/Frame → Alarm)


An alarm is triggered when a frame (or the object within it) meets abandonment
criteria. This relationship closes the loop, connecting detection with response.
• Type: Conditional One-to-One or One-to-Many



• Example: Frame 120 triggers an alarm after Object ID 15 is stationary for 3+
seconds

10.3.5 Importance of the ER Model


This ER model serves multiple purposes:
• Logical clarity: It defines how components like frames, objects, and alarms
relate logically.
• Database design: Acts as a blueprint for developing backend databases (e.g., the SQL schema sketched after this list).
• System expansion: Facilitates modular expansion—e.g., adding person
tracking, camera metadata, or alert logs.
• Data consistency: Ensures all object and alert data are traceable from their
origin (motion events) through to system outputs (alarms).
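Since the model is intended as a blueprint for a backend database, one possible SQL rendering of these entities and relationships is shown below. All table and column names are chosen here for illustration; they are not part of the implemented system:

CREATE TABLE motion_event (
    event_id     INTEGER PRIMARY KEY,
    timestamp    DATETIME NOT NULL,
    location     TEXT
);

CREATE TABLE frame (
    frame_id     INTEGER PRIMARY KEY,
    event_id     INTEGER REFERENCES motion_event(event_id),  -- Generates
    captured_at  DATETIME NOT NULL,
    camera_id    TEXT
);

CREATE TABLE object_class (
    class_id     INTEGER PRIMARY KEY,
    class_name   TEXT NOT NULL        -- e.g., backpack, handbag, suitcase
);

CREATE TABLE detected_object (
    object_id    INTEGER PRIMARY KEY,
    class_id     INTEGER REFERENCES object_class(class_id),  -- Belongs To
    bbox         TEXT NOT NULL,       -- bounding box coordinates
    status       TEXT                 -- stationary / moving
);

CREATE TABLE detection (             -- Detects (many-to-many Frame <-> Object)
    frame_id     INTEGER REFERENCES frame(frame_id),
    object_id    INTEGER REFERENCES detected_object(object_id),
    PRIMARY KEY (frame_id, object_id)
);

CREATE TABLE alarm (
    alarm_id     INTEGER PRIMARY KEY,
    object_id    INTEGER REFERENCES detected_object(object_id),  -- Triggers
    frame_id     INTEGER REFERENCES frame(frame_id),
    trigger_time DATETIME NOT NULL,
    alarm_type   TEXT,               -- visual / audio
    severity     TEXT
);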

Figure: 10.3.5



11. IMPLEMENTATION

11.1 Introduction
The implementation phase involves converting the theoretical design and architecture
of the Abandoned Object Detection System into a functional software solution. This
stage translates the system design—consisting of data flows, algorithms, and
architecture—into Python code, object detection models, and real-time surveillance
tools. The core focus of implementation is to ensure that the system operates in real-
time, is accurate, and minimizes false positives.
This section outlines the key implementation steps, code structure, logic flow, and how
various components interact to accomplish the detection of unattended or abandoned
objects in public places.

11.2 Programming Environment


The system is implemented using Python 3.x, a high-level programming language
widely used for machine learning, computer vision, and real-time applications.
Libraries such as OpenCV, NumPy, and Ultralytics YOLOv8 are integrated to build
and deploy the detection pipeline. The system also includes support for optional sound
alerts using Pygame and logging through standard file handling techniques.

11.3 Project Structure and Modules


The implementation is modular, with code divided into multiple Python files for
maintainability and clarity.
1. main.py
• Handles real-time video capture.
• Calls detection, tracking, and alerting functions.
• Displays the final annotated output window.

2. tracker.py
• Contains the TrackableObject class.
• Manages object IDs, movement history, velocity, and stationary analysis.

3. filtertracker.py



• Optimized version of tracker logic for faster processing.
• Applies filtering to remove unstable detections.

4. alarm.py (optional)
• Plays an alarm sound when an object is flagged as abandoned.

11.4 Core Logic and Workflow


1. Frame Capture
Frames are captured from a live video feed or recorded file using OpenCV.
2. Object Detection
YOLOv8 detects objects like backpacks, handbags, and suitcases with high
speed and confidence.
3. Tracking
Each object is assigned a unique ID and tracked across frames. The system
monitors center point changes and movement.
4. Stationary Detection
An object is flagged as "stationary" if it remains within a small area for more
than 90 frames (about 3 seconds at 30 FPS).
5. Abandonment Decision
If the object is no longer associated with a person and remains stationary, it is
marked as abandoned.
6. Alert Generation
The system overlays a red bounding box and label (e.g., “Abandoned Suitcase
(10s)”) on the video feed and optionally plays an alarm.
7. Logging
Detected classes, frame count, timestamps, and object IDs are logged for
future auditing.

11.5 Optimizations
To achieve real-time performance:
• Frame Skipping is used (e.g., process every 3rd frame).
• Lightweight Tracker reduces memory and computation overhead.
• Bounding boxes and labels are rendered using optimized OpenCV functions.



11.6 Output and Interface
The output interface includes:
• Live video feed window
• Real-time bounding boxes and object labels
• Printed logs in the terminal or saved to file
The simplicity of the interface ensures usability even in high-pressure surveillance
environments.

11.1 (A) Technologies Used

Introduction
Modern surveillance systems rely heavily on advanced technology stacks that combine
artificial intelligence, real-time video processing, and automation. This section outlines
and explains the core technologies, frameworks, libraries, and models used in building
the Abandoned Object Detection System.
Each technology plays a unique role in ensuring the system is accurate, scalable, and
real-time ready.

Python 3.x
Python is the backbone of the project. Known for its readability and vast ecosystem,
Python supports fast development and integration of AI, computer vision, and video
processing libraries.
Why Python?
• Rapid development speed
• Extensive library support
• Easy integration with machine learning and deep learning models
• Cross-platform compatibility
Usage in the Project:
• Frame handling and real-time video I/O
• Model loading and inference
• Custom object tracking and alert logic

OpenCV (Open Source Computer Vision Library)


OpenCV is used for real-time image and video analysis.



Features used:
• Video capture (cv2.VideoCapture)
• Frame resizing and color conversion
• Drawing bounding boxes and text
• Displaying annotated frames
• Real-time rendering via cv2.imshow()
OpenCV ensures smooth handling of video feeds and efficient rendering of detection
results.

NumPy
NumPy is a fundamental Python library for numerical computation.
Roles in the Project:
• Matrix and vector operations on image data
• Calculating distance and velocity between object positions
• Handling object history and frame data efficiently
The high-performance operations of NumPy help in real-time movement tracking and
filtering logic.

YOLOv8 (Ultralytics)
YOLOv8 is the core object detection engine. It is a deep learning model trained to
detect objects in a single pass, making it ideal for real-time applications.

Key Advantages:
• Fast inference (real-time capable)
• High accuracy in detecting small and overlapping objects
• Lightweight and easy to integrate with Python

How It Works:
• YOLOv8 divides the image into a grid
• Predicts bounding boxes and class probabilities
• Filters objects based on confidence threshold
YOLOv8’s flexibility allows retraining for custom object classes or fine-tuning
detection sensitivity.



Pygame (Optional Alert System)
Pygame is a multimedia library used to trigger audio alerts.
Use Case:
• Play a sound when an object is confirmed as abandoned
• Provide audio feedback to enhance operator awareness
Though not essential to core detection, it improves usability in security environments.

Threading (Python threading module)


Threading is used to handle background processes like playing an alarm or logging
data without interrupting the detection pipeline.
Why it's used:
• Keeps detection pipeline smooth and uninterrupted
• Avoids UI freezing or lag during alert playback
Multithreading ensures a responsive user experience even during high load
conditions.
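A minimal sketch of this pattern, assuming the optional Pygame alarm described above (the function name and file path are illustrative):

import threading

def play_alarm_async(alarm_path="alarm.mp3"):
    """Play the alarm in a background thread so detection never blocks."""
    def _worker():
        import pygame  # lazy import keeps the audio dependency optional
        pygame.mixer.init()
        pygame.mixer.music.load(alarm_path)
        pygame.mixer.music.play()
    # daemon=True: the thread dies with the main program instead of blocking exit
    threading.Thread(target=_worker, daemon=True).start()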

Logging and File I/O


Custom logs are generated using Python’s built-in file handling.
Details Logged:
• Frame number
• Object class
• Abandonment timestamp
• Duration of inactivity
This enables performance review, auditing, and debugging.
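One way to produce such a log, shown here with the standard logging module rather than raw file writes (the file name and field layout are illustrative):

import logging

logging.basicConfig(filename="detections.log",
                    format="%(asctime)s %(message)s",
                    level=logging.INFO)

def log_abandonment(frame_no, obj_class, duration_s):
    """Append one structured line per confirmed abandonment event."""
    logging.info("frame=%d class=%s inactive_for=%.1fs",
                 frame_no, obj_class, duration_s)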

System Requirements
Hardware:
• CPU: Intel i5/i7 or Ryzen 5+
• RAM: Minimum 8 GB
• GPU (optional): NVIDIA GTX 1650 or better
Software:
• Python 3.x
• OpenCV



• Ultralytics YOLOv8
• NumPy
• Pygame (optional)
• OS: Windows/Linux/macOS

The project combines a modern tech stack involving deep learning, real-time video
processing, and intelligent object tracking. The integration of Python, OpenCV,
YOLOv8, and supporting libraries results in a powerful, efficient, and accurate system
ready for deployment in real-world surveillance scenarios.
Each technology is chosen for its performance, reliability, and ease of integration,
ensuring that the final system is robust, extensible, and responsive under real-time
constraints.

11.2 (B) Coding and Modules

Introduction
The Abandoned Object Detection System is implemented using Python, integrating
computer vision and deep learning to detect unattended objects in real-time video
streams. The code is organized into modular components, each with a specific
function such as video capture, object detection, object tracking, and abandonment
decision-making.
This modular architecture makes the system scalable, maintainable, and easy to
upgrade. The key files involved in implementation are main.py, tracker.py, and
optionally filtertracker.py or an alarm.py. These scripts interact with the YOLOv8
object detection model, process video frames using OpenCV, and handle object logic
with NumPy.

Module 1: main.py – Core Application Pipeline


This is the central script that initializes all components, processes video frames,
performs detection and tracking, and displays output.

Major Functions:
• Load YOLOv8 model using Ultralytics
• Open the video stream (live or file)



• Preprocess frames
• Detect objects of interest
• Track their positions and determine if they are abandoned
• Display bounding boxes and generate alerts

Key Code Snippet: Model Initialization

from ultralytics import YOLO  # import required for model loading

self.model = YOLO(config['model_path'])  # Load YOLOv8 weights (e.g., yolov8n.pt)

This loads a pre-trained model like yolov8n.pt to detect classes such as backpack,
suitcase, or handbag.

Core Processing Loop:

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    processed_frame = detector.process_frame(frame)
    cv2.imshow('Abandoned Object Detection', processed_frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # waitKey refreshes the window; 'q' quits
        break

Each frame is passed through the system to detect and annotate objects. The processed
frame is then rendered with bounding boxes and status text.

Module 2: tracker.py – Object Tracking Logic


This module contains the TrackableObject class, responsible for:
• Tracking object position
• Calculating velocity and movement history
• Determining if the object is stationary

Key Methods:

update(self, bbox, frame_count, class_name)


Updates object coordinates, calculates movement, and keeps track of motion variance.



# Euclidean distance the center point moved since the previous frame
movement = np.linalg.norm(np.array(current_center) - np.array(prev_center))
# Reward stillness, penalize movement (the counter decays faster than it grows)
self.stationary_frames += 1 if movement < 5 else -2

is_stationary(self, threshold=10)
Determines if an object is stationary based on movement and deviation over time.

if self.stationary_frames > threshold and avg_movement < 8:
    return True

This function is central to identifying abandoned objects.

Module 3: filtertracker.py – Optimized Tracker (Optional)


This module is similar to tracker.py but optimized for lower memory and faster
execution.
Enhancements:
• Reduced frame history
• Tighter bounding box matching
• Faster velocity calculations using a smaller deque
Use this in deployments requiring high FPS or when running on limited hardware.
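The "smaller deque" mentioned above can be realized with collections.deque, whose maxlen bound discards old entries automatically; a minimal sketch:

from collections import deque

# Bounded history: old entries fall off automatically, so memory stays
# constant no matter how long an object remains in view.
history = deque(maxlen=15)           # keep only the last 15 center points
for center in [(100, 200), (101, 200), (101, 201)]:  # sample updates
    history.append(center)
print(len(history))                  # never exceeds 15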

Module 4: Alarm Integration (Optional)


To alert security personnel, an optional audio alarm is triggered when an object is
marked abandoned.

Code Example:

def play_alarm(self):
    import pygame  # imported lazily so the audio dependency stays optional
    pygame.mixer.init()
    pygame.mixer.music.load('alarm.mp3')
    pygame.mixer.music.play()



To improve performance, frame skipping is applied at the top of the per-frame processing routine:

if self.frame_count % (self.config['frame_skip'] + 1) != 0:
    return frame  # Skip this frame; the previous annotations are reused

Abandonment Check Logic


Objects are considered abandoned if:
• Stationary for more than stationary_threshold_frames
• No updates received for max_frames_missing
• Not seen recently in tracker dictionary

if tracker.is_stationary() and \
   (self.frame_count - tracker.last_seen) > self.config['stationary_threshold_frames']:
    self.abandoned_objects[tid] = ...

Alert Drawing
Bounding boxes and labels for abandoned objects:

cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), thickness)
cv2.putText(frame, f"Abandoned {class_name} ({duration}s)", ...)

Red boxes and duration tags help visually identify threats.

Logging and Performance Stats


At the end of processing, key statistics are printed:

print(f"[STATS] Processed {frames} frames at {fps:.2f} FPS")


print("Abandoned objects detected:", len(detector.abandoned_objects))

Optional logging can include:


• Object class
• Frame/time detected
• Duration of inactivity
• Tracker ID



Integration with YOLOv8
The code integrates directly with the Ultralytics YOLOv8 interface:

results = self.model.predict(frame_rgb, imgsz=640)[0]  # first (and only) Results object for a single image


This allows the system to:
• Detect objects in each frame
• Use model-defined class names
• Work with confidence thresholds and bounding box coordinates

Pseudo Code: Abandoned Object Detection System
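A Python-style sketch consistent with the pipeline described in this chapter; all helper names are illustrative, not the project's actual identifiers:

# Overall pipeline (illustrative pseudo-code)
while stream_is_open():
    frame = capture_frame()
    if should_skip(frame_no):                        # frame skipping for performance
        continue
    detections = detect_objects(preprocess(frame))   # YOLOv8, bag classes only
    tracks = update_tracker(detections, frame_no)    # stable IDs over time
    for track in tracks:
        if is_stationary(track) and not track.alerted:
            raise_alert(track)                       # red box, label, optional sound
    display(frame)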



Tracker Update Logic
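A matching sketch of the per-detection tracker update (illustrative names; thresholds as quoted earlier in this chapter):

# Tracker update (illustrative pseudo-code)
for detection in detections:
    tid = match_to_existing_track(detection)     # e.g., nearest-center matching
    if tid is None:
        tid = create_new_track(detection)        # assign a fresh unique ID
    track = tracks[tid]
    movement = distance(center(detection.bbox), center(track.last_bbox))
    track.stationary_frames += 1 if movement < 5 else -2
    track.last_bbox, track.last_seen = detection.bbox, frame_no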

Stationary Object Check
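A sketch of the stationary check, using the 90-frame threshold (about 3 seconds at 30 FPS) quoted earlier; helper names are illustrative:

# Stationary check (illustrative pseudo-code)
def is_stationary(track, threshold=90):
    return (track.stationary_frames > threshold
            and average_recent_movement(track) < 8)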

Alert Trigger:
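A sketch of the alert trigger, combining the drawing, sound, and logging steps described above (helper names illustrative):

# Alert trigger (illustrative pseudo-code)
if is_stationary(track) and not track.alerted:
    draw_red_box(frame, track.bbox)
    write_label(frame, f"Abandoned {track.class_name} ({track.duration_s}s)")
    play_alarm_async()        # optional audio alert
    log_event(track)          # ID, class, timestamp, duration
    track.alerted = True      # suppress duplicate alerts for the same item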



Pseudo-Code: YOLO Algorithm:
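A conceptual sketch of YOLO's single-pass detection; the names below are illustrative and do not reflect Ultralytics internals:

# 1. Divide the input image into an S x S grid.
# 2. In one forward pass, each cell predicts bounding boxes with
#    confidence scores and per-class probabilities.
# 3. Discard boxes below the confidence threshold.
# 4. Apply non-maximum suppression to merge overlapping boxes.
predictions = network.forward(image)                 # one pass over the image
boxes = [b for b in decode(predictions) if b.confidence >= CONF_THRESHOLD]
detections = non_max_suppression(boxes, iou_threshold=0.45)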



12. TESTING AND VALIDATION

12.1 Introduction
Testing and validation are crucial stages in any software development life cycle,
especially for systems that involve real-time decision-making and safety-critical
applications such as abandoned object detection. These phases ensure that the
implemented system behaves as expected under various scenarios, performs efficiently,
and meets the design goals in terms of accuracy, speed, and usability.
The aim of this section is to document how the Abandoned Object Detection System
was tested, what evaluation strategies were used, how the output was validated against
expectations, and how reliable the system is in real-world environments.

12.2 Objectives of Testing


The key objectives of testing this system were:
• To verify that all components (detection, tracking, abandonment logic, alerting)
function correctly.
• To validate that the system can process real-time video feeds without crashes or
delays.
• To ensure that objects are detected accurately and that alerts are generated only
for genuinely abandoned items.
• To measure false positives, false negatives, and overall detection accuracy.
• To evaluate system performance in terms of frame rate (FPS), detection speed,
and resource utilization.

12.3 Testing Environment Setup


Hardware Used:
• CPU: Intel Core i7-1165G7 @ 2.80GHz
• RAM: 16 GB DDR4
• GPU: NVIDIA GTX 1650 (4GB)
• Storage: SSD (512GB)
• OS: Windows 11 / Ubuntu 22.04 (Dual Boot)



Software Stack:
• Python 3.10
• OpenCV 4.7.0
• NumPy
• Ultralytics YOLOv8
• Pygame (for audio alerts)
• Jupyter Notebooks (for experimental testing and visualization)

Test Data:
• Live webcam feed
• Public surveillance datasets (e.g., PETS2006)
• Custom video recordings (with simulated bag placements)

12.4 Testing Methodology


Testing was performed using a combination of:
• Unit Testing: For individual functions like frame skipping, stationary detection, and tracker updates (a sample test sketch follows this list).
• Integration Testing: For verifying end-to-end functionality across multiple
modules (e.g., detection + tracking + abandonment decision).
• System Testing: For assessing the system as a whole under real-world
conditions.
• Performance Testing: To measure FPS, CPU usage, and memory footprint.
• User Acceptance Testing (UAT): Informal feedback gathered from test users
(security personnel, classmates) to judge usability.
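As an example of the unit-testing level, a hypothetical pytest-style check of the stationary rule is sketched below; the TrackableObject constructor arguments are assumed for illustration:

# Hypothetical unit test for the stationary-detection rule (pytest style)
from tracker import TrackableObject  # module/class names from this project

def test_marked_stationary_after_threshold():
    obj = TrackableObject(object_id=1)           # constructor args assumed
    for frame_no in range(100):                  # ~100 near-identical frames
        obj.update((100, 100, 150, 150), frame_no, "backpack")
    assert obj.is_stationary(threshold=90)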

12.5 Functional Testing

Test Case 1: Object Detection Accuracy


Objective: To verify that YOLOv8 correctly detects objects of interest (e.g.,
backpacks, handbags, suitcases).
Method: Run detection on test frames with labeled ground truth.
Result:
• Precision = 85-88%
• Recall = 82-85%



Test Case 2: Tracking Consistency
Objective: Ensure that each object retains its unique ID across frames.
Method: Simulate movement and partial occlusion.

Test Case 3: Stationary Status Detection


Objective: Detect when an object has been stationary for a configured duration.
Method: Place a bag for >90 frames (3 seconds); observe tracker history.
Result: Stationary objects consistently detected in ≤3.5s of inactivity.

Test Case 4: False Positive Filtering


Objective: Avoid triggering alerts for moving or attended objects.
Method: Place objects and simulate user interaction nearby.
Result: No alert raised when interaction continued; alert only when object left behind.

12.6 Performance Testing


System performance was evaluated in terms of:

1. Frame Rate (FPS)


• Achieved 24–28 FPS with YOLOv8n on GPU
• 15–18 FPS on CPU-only mode (optimized frame skipping enabled)
2. Detection Latency
• Average inference time per frame: ~38ms
• End-to-end pipeline latency: ~52ms per frame (real-time ready)
3. Resource Usage
• CPU usage: ~35–55%
• GPU usage: ~40–60%
• RAM usage: 2–3 GB sustained during operation
These metrics demonstrate the system’s readiness for real-time surveillance
applications.

12.7 Validation Strategy


Validation was done by comparing system-generated alerts against expected outcomes under test conditions. The key performance indicators (KPIs) used:

1. True Positive (TP):


System correctly detects an abandoned object.

2. False Positive (FP):


System incorrectly flags an object that isn’t abandoned.

3. False Negative (FN):


System fails to flag a genuinely abandoned object.

4. True Negative (TN):


System correctly ignores non-abandoned objects.
From these, we calculated:
• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
• F1-Score = 2 × (Precision × Recall) / (Precision + Recall)
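As a worked check, taking the midpoints of the ranges reported below (Precision ≈ 0.87, Recall ≈ 0.84):
F1 = 2 × (0.87 × 0.84) / (0.87 + 0.84) = 1.4616 / 1.71 ≈ 0.855
which corresponds to the 85.49% F1 figure reported in Section 13.3.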

Validation Results (based on 40 video tests):


• Precision: 85-88%
• Recall: 82-85%
• F1-Score: ≈ 0.855

12.8 Edge Case Testing


Edge cases were introduced to test robustness:

• Low-light video: moderate detection performance
• Bag placed then picked up: no false alert generated
• Two bags placed at once: both detected and tracked separately
• Bag partially occluded by people: detected with slight delay
• Camera shake (simulated instability): no false detections, stabilized input



These tests verified that the system remains reliable in unpredictable real-world
conditions.

12.9 User Feedback and Usability


As part of user acceptance testing, the system was demonstrated to:
• University lab staff
• Peers from the Computer Science department
• A local security team
Feedback highlighted:
• Easy-to-understand interface
• Reliable alerting
• Desire for facial recognition or owner tracking (future enhancement)



13. RESULTS & DISCUSSION

13.1 Introduction
The results and discussion section evaluates the functional, technical, and practical
performance of the implemented Abandoned Object Detection System. The objective
of this stage is to interpret how well the system achieves its goals under different
conditions, based on various test cases, accuracy measurements, and user experience.
Through real-time object detection, tracking, and abandonment classification, the
system demonstrated effectiveness in identifying unattended items across a variety of
environmental scenarios. This section summarizes the outcomes, highlights success
rates, explores challenges, and discusses the system’s real-world usability and
limitations.

13.2 System Output Observations


Upon deploying the system with various test videos, the following outcomes were
consistently observed:
• The system successfully identified backpacks, handbags, and suitcases using
YOLOv8 in real-time.
• Abandoned object alerts were triggered only after the object remained
stationary beyond 90 frames, preventing false positives from temporary stops.
• The red bounding boxes and time labels helped users easily distinguish between
active and abandoned items.
• Visual feedback through OpenCV and optional audio alerts using Pygame
improved user awareness.

While the primary results have demonstrated high accuracy and system efficiency, a
deeper dive into frame-wise performance, response behavior, and alert thresholds
reveals a more nuanced picture of the system’s capabilities. Over a series of 10+ video
trials, with varying durations (ranging from 1 to 5 minutes), the system displayed
consistent performance, but sensitivity varied based on motion complexity and camera
angle.
In several trials, especially under controlled conditions with slow-moving objects and minimal occlusion, the system achieved near-perfect detection with no false positives.
However, in real-world datasets involving crowds or non-linear object motion, the
system had to rely more heavily on velocity thresholds and historical tracking to avoid
premature abandonment detection.
One interesting observation was that object class impacted detection timing. For
instance, smaller items like handbags were detected slightly slower than larger ones
like suitcases due to visual density and bounding box confidence levels. The system’s
frame-skip parameter also influenced detection delay: higher skips reduced processing
load but slightly increased abandonment recognition time. A frame skip of 2–3 emerged
as the best balance between speed and sensitivity.

13.3 Quantitative Performance Results


System performance was evaluated across various custom and public test videos simulating object abandonment scenarios in different environmental conditions.
Performance Metrics Summary:

• Precision: 85–88%
• Recall: 82–85%
• F1-Score: 85.49%
• FPS (on GPU): 24–28
• FPS (on CPU): 15–18
• Detection Delay: ~2.5 seconds
Table no. 13.3
These results demonstrate that the system achieves high detection accuracy, low
false alarm rates, and efficient real-time responsiveness.



Fig no. 13.4
13.4 Behavior in Real-World Conditions
The system was tested under simulated real-world conditions, such as:
• Crowded and low-traffic environments
• Indoor vs. outdoor lighting
• Partial occlusions (objects hidden behind people or furniture)
• Continuous movement around the abandoned object

Observations:
• The system handled crowd occlusion reasonably well, using motion tracking
to retain object ID.
• In low-light conditions, detection accuracy slightly dropped, which can be
improved by using image enhancement techniques.
• Moving persons interacting briefly with the object did not trigger false alerts,
validating the system’s temporal logic.
• Camera jitter or minor instability was tolerated due to movement smoothing
via the tracking module.

13.5 Real-Time Responsiveness


The ability of the system to function in real-time was a core objective. With optimizations such as:
• Frame skipping (processing every 2nd or 3rd frame)
• Lightweight YOLOv8 model (yolov8n.pt)
• Tracker history reduction (only storing last 15 frames)
The system achieved consistent real-time detection speeds of over 24 FPS on GPU-
enabled machines, making it deployable on live CCTV streams.

13.6 Comparison with Traditional Systems


Feature: Traditional Surveillance → Abandoned Object Detection System
• Manual Monitoring: Required → Automated
• Temporal Analysis: No → Yes
• Real-Time Detection: Limited → Achieved
• Alert Mechanism: Visual-only or delayed → Visual + Audio, Real-Time
• False Alarm Handling: Weak → Strong (tracking + velocity)

This system outperforms conventional systems by automating the entire process and
reducing reliance on human vigilance.

13.7 Discussions and Insights

1. Strengths
• Modular Code Structure: Easy to modify and integrate.
• Accuracy and Precision: High performance in identifying unattended items.
• Real-Time Capability: Operates well on mid-range hardware.
• Robust Object Tracking: Maintains object identity even with partial
occlusions.

2. Challenges
• Sudden camera movement may disrupt tracking temporarily.



• Abandonment threshold needs adjustment for different environments (e.g.,
stations vs. offices).
• Complex scenes with overlapping people and objects increase computational
load and occasionally affect precision.

3. Improvement Opportunities
• Integration of person-object association to identify object owners.
• Adding facial recognition to alert based on both object and individual
behavior.
• Using multi-camera coordination to improve coverage and track object
movement across views.

13.8 User Feedback and Practical Usability


Demonstrations with test users (security guards, university staff) resulted in positive
responses. They appreciated the clear bounding boxes, real-time alerts, and minimal
false positives. Users suggested future features like:
• SMS or email alerts
• Web-based remote viewing
• Time-based log reports of events



14. CONCLUSIONS & FUTURE WORK

14.1 Conclusion
In an era where public safety and surveillance are more critical than ever, the need for
intelligent monitoring systems is undeniable. The project titled "Abandoned Object
Detection System" presents a real-time, automated solution for identifying unattended
items in public spaces, using computer vision and deep learning. Through this work,
we have built a system that combines object detection, tracking, temporal analysis, and
alert generation into a functional and responsive tool.
The project successfully integrates the YOLOv8 object detection model with a custom
tracking and movement history system to determine when an object has remained
stationary beyond a predefined threshold. This ability to link object presence over time
with inactivity allows the system to intelligently detect truly abandoned objects rather
than issuing alerts for every temporarily placed item.

System Achievements
• The system operates in real-time, consistently delivering over 24 FPS on GPU
hardware and 15–18 FPS on CPU.
• It achieves a precision of 85–88% and an F1-score of about 85.5%, indicating reliable performance with minimal false positives.
• The modular design supports easy integration and extension, including
compatibility with new object classes and future analytics modules.
• The GUI overlays bounding boxes and time-based abandonment labels, helping
operators quickly assess threat levels.
• Audio alerts enhance situational awareness for users who may not be watching
the screen continuously.
These features make the system suitable for deployment in areas such as railway
stations, airports, shopping malls, and other crowded or high-risk environments.

Technical Contributions
The implementation included the following key innovations:
• A lightweight tracker using position history and velocity vectors to determine object movement.
• Temporal logic to evaluate stationary behavior, reducing false positives from
brief pauses.
• Optimized frame handling and processing pipelines for efficient real-time video
analysis.
• Use of open-source libraries like OpenCV, NumPy, and Ultralytics YOLOv8,
making the system cost-effective and accessible.

Through careful tuning of parameters such as the stationary threshold, frame skipping,
and movement tolerance, the system balances accuracy with performance in real-time
environments.

14.2 Limitations
While the system demonstrates a high level of reliability, certain limitations were
identified during testing:

1. Lighting Conditions: Performance may degrade under poor lighting or infrared


camera setups.
2. Camera Instability: In cases of jittery or moving camera footage, object
tracking can be disrupted temporarily.
3. Ownership Analysis: The system does not currently associate objects with
specific individuals, making it difficult to distinguish between abandoned and
attended bags in proximity.
4. Multi-View Limitation: The system analyzes a single feed at a time and lacks
the capability to correlate objects across multiple camera views.
These limitations are not flaws in design but areas for enhancement, which form the
foundation of future work.

14.3 Contribution to the Field


This project contributes meaningfully to both academic and industrial perspectives
of video analytics. It provides:
• A reference implementation for real-time object abandonment detection using
open-source tools.



• A model for low-cost surveillance automation, ideal for smart cities and
government security systems.
• A practical demonstration of how deep learning inference can run efficiently
even on limited hardware setups like laptops or Jetson boards.
Moreover, this system addresses a key security issue without requiring complex
infrastructure or human monitoring, showing the power of AI in autonomous public
safety systems.

14.4 Future Work


While the current implementation meets its goal of real-time abandoned object
detection, it also lays the foundation for a much broader, intelligent surveillance
framework. One of the most promising areas for future development is the transition
from single-camera static analysis to a multi-camera coordinated detection system. In
practical deployments—such as airports, stadiums, or shopping malls—objects may
move from the field of view of one camera into another. Currently, our system tracks
abandonment within a single video stream. However, incorporating multi-view learning
and synchronization algorithms would allow seamless object tracking across different
perspectives. This could further be enhanced by integrating GPS metadata or indoor
location systems, enabling precise spatial localization of detected threats.
Additionally, as surveillance moves toward smarter cities and AI-driven security
ecosystems, the integration of this system with cloud-based analytics dashboards
becomes a natural evolution. Instead of being limited to local detection and alerting,
the system could log alerts, visualize threat maps, and track activity histories through a
secure, remote web interface. Security officers could monitor multiple locations
simultaneously, compare historical data, and receive predictive warnings based on
learned patterns of abandonment across different times of day or event types. This
vision aligns with modern trends in urban infrastructure digitization, wherein AI
systems are expected not just to alert but to inform strategic planning.
Another compelling direction is context-aware analysis, which moves beyond basic
motion tracking. In current logic, an object’s abandonment is defined primarily by its
stationary nature and the absence of a nearby person. But future iterations could
incorporate environmental understanding using scene segmentation and semantic
labeling. For instance, an object left in a corridor may be considered more suspicious
than one beside a café table. With proper contextual intelligence, the system can adjust the sensitivity of its alerting mechanism, reducing false alarms and prioritizing high-risk scenarios.
Equally significant is the opportunity to explore human-object association modeling,
where the system learns to detect not just abandonment, but also ownership. Using
combined person and object tracking data, the system could determine which individual
last interacted with an item. This adds a crucial layer of accountability—especially in
high-security zones. In the future, this could be extended to identity linkage using facial
recognition, where abandoned objects are automatically matched with the known face
of the last associated person. In environments where personnel tracking is essential,
such as airports or government facilities, this feature could dramatically improve both
security intelligence and incident resolution speed.
Another promising avenue lies in the use of edge computing and model quantization.
Presently, real-time object detection is performed using a YOLOv8 model running on
mid- to high-end CPUs or GPUs. To bring this technology into resource-constrained
environments—such as rural bus depots, schools, or metro platforms—future versions
of the system could deploy quantized models (e.g., YOLOv8n-int8) on edge devices
like Jetson Nano, Raspberry Pi 4 with Coral TPU, or even mobile phones. This would
enable scalable and cost-effective surveillance solutions for locations with limited
technical infrastructure but pressing security needs.
Moreover, as public trust and ethical AI use grow in importance, a future-ready system
must also account for data privacy, legal compliance, and transparency. Logging
mechanisms could include encrypted metadata, tamper-proof audit trails, and
configurable data retention policies. Real-time alerts should be accompanied by
operator controls and explanations to avoid alarm fatigue or misinterpretation. By
integrating compliance features aligned with laws such as GDPR, DPDP, or national
CCTV guidelines, the system ensures not just performance but social responsibility and
regulatory adherence.
Finally, as the AI community moves toward self-learning systems, future iterations may
include models that adapt to the behavior of specific environments. For instance, a mall
in a busy city may have frequent temporary object placements, while a military facility
will require stricter thresholds. By incorporating reinforcement learning or continuous
online training, the system could learn from operator feedback—automatically
adjusting its sensitivity, refining its object classifications, and improving over time.
This vision supports long-term deployment in diverse environments with minimal manual reconfiguration.

Building upon the current version of the system, there are several directions in which
the project can evolve. These improvements can significantly enhance the robustness,
intelligence, and scalability of the system.

1. Ownership Tracking
Currently, the system does not associate detected objects with the people who placed
them. Future versions can integrate person-object interaction tracking, where:
• Each detected person is tracked alongside their belongings.
• If a person leaves the scene and the object remains, it is flagged as abandoned.
This would significantly improve decision-making accuracy, especially in crowded
scenes.

2. Integration with Facial Recognition


By integrating facial recognition modules, the system could:
• Log individuals interacting with abandoned items
• Trace the owner of the object using identification history
• Assist in post-event investigations
This feature would be highly beneficial in security-sensitive environments such as
airports or banks.

3. Multi-Camera Object Re-Identification


Future enhancements could enable the system to:
• Track the same object across multiple camera feeds
• Re-identify the object in different views or angles
• Maintain continuity even if the object moves out of one frame and enters
another
This would require embedding support for object re-identification models (Re-ID) and
synchronization logic.

4. Mobile and Edge Deployment


To improve accessibility and reduce cloud/server dependency, the system can be:
• Deployed on edge devices like NVIDIA Jetson Nano, Raspberry Pi (with Coral TPU), or mobile phones
• Optimized using model quantization (e.g., YOLOv8n-int8)
This would enable low-cost, low-power surveillance solutions for remote locations.

5. Cloud-Based Dashboard and Logging


A centralized web dashboard could provide:
• Real-time map-based alerts across camera locations
• Event history and playback options
• Role-based access for security teams
This would enhance usability and support large-scale deployment in smart cities.

6. Behavior Analysis and Prediction


With enough data collected, the system could use machine learning or reinforcement
learning to:
• Predict object abandonment based on previous behavior patterns
• Assign threat levels dynamically
• Learn environmental norms and adjust thresholds
Such predictive analytics would allow pre-emptive action and smarter alerting.

7. Integration with Emergency Protocols


The alert system can be integrated with:
• SMS/Email APIs for automated notification to authorities
• IoT-based alarm systems for physical lockdown of zones
• Access control systems to restrict movement in high-risk cases

14.5 Broader Impact and Societal Relevance


The application of this project goes far beyond academic or technical curiosity. In a
time when public safety is paramount, the Abandoned Object Detection System
provides a practical tool for:
• Preventing terrorist threats
• Minimizing public panic
• Automating surveillance operations
• Enabling rapid response to unattended baggage
By reducing reliance on human vigilance, it empowers security personnel and increases the overall efficiency of public safety infrastructure.

14.6 Ethical and Security Considerations


With any surveillance system, especially one using AI, it is essential to ensure:
• Privacy Compliance: The system should avoid storing unnecessary personal
data or violate individual privacy.
• Bias Mitigation: Object detection and alert logic should be tested across diverse
settings to prevent demographic bias.
• Data Security: If cloud integration is used, strict encryption and access control
policies must be followed.
Our current system does not record faces or personal information, adhering to these
principles.

14.7 Final Remarks


This project stands as a successful proof-of-concept demonstrating the application of
artificial intelligence in public safety. It showcases how deep learning models like
YOLOv8, when paired with real-time video analytics, can result in intelligent and
context-aware systems. The solution is practical, scalable, and ready for integration into
modern surveillance infrastructures.
With further enhancements such as ownership tracking, cross-camera correlation, and
cloud-based dashboards, the system can evolve into a full-fledged commercial or civic
security solution. The project not only achieves its stated objectives but also opens the
door to a new frontier in smart surveillance and proactive threat mitigation.



15. REFERENCES

1. Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi.


“You Only Look Once: Unified, Real-Time Object Detection.”
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. (Cited in the Literature Review.)
Explanation: This foundational paper introduced the YOLO architecture,
which the project builds upon via YOLOv8 for object detection.

2. Ultralytics YOLOv8 Documentation


https://docs.ultralytics.com
Explanation: Official documentation for implementing, training, and
running YOLOv8, which was used in this project to detect objects like
backpacks and suitcases.

3. OpenCV Library Documentation


https://docs.opencv.org
Explanation: Used for real-time video frame processing, drawing bounding
boxes, and rendering the user interface in the system.

4. NumPy Documentation
https://numpy.org/doc/
Explanation: Used to perform matrix operations, calculate object
movement, center points, and motion variance in tracking.

5. PETS Dataset (Performance Evaluation of Tracking and Surveillance)


http://www.cvg.reading.ac.uk/PETS2006/
Explanation: Served as a benchmark dataset to test object tracking and
abandonment scenarios during system validation.

6. Rahul Sharma, et al.



“Abandoned Object Detection Using Background Subtraction and Tracking.”
International Journal of Computer Applications, Vol. 182, No. 35, 2018.
Explanation: Helped understand traditional rule-based methods like
background subtraction for comparison with deep learning-based approaches.

7. S. Kumar, V. Rajan, R. Sharma.


“Real-Time Surveillance for Unattended Bag Detection in Public Areas Using
Deep Learning.”
IEEE International Conference on Smart Technologies, 2020.
Explanation: Inspired the application of deep learning for real-time
unattended object detection.

8. Pygame Library Documentation


https://www.pygame.org/docs/
Explanation: Used in the project to play an alarm sound when an
abandoned object is detected. Enhanced usability and alert functionality.

9. Threading in Python – Official Python Docs


https://docs.python.org/3/library/threading.html
Explanation: Helped in running background threads for playing audio and
handling multiple real-time tasks without freezing the interface.

10. J. Ren, X. Cao, Y. Wei, C. Wang.


“Abandoned Object Detection via Temporal Consistency Modeling.”
IEEE Transactions on Multimedia, Vol. 21, No. 10, 2019.
Explanation: Provided the theoretical foundation for using movement
history and temporal logic to determine object abandonment.
