
Received 30 January 2024, accepted 19 February 2024, date of publication 23 February 2024, date of current version 11 March 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3369233

Abandoned Object Detection and Classification Using Deep Embedded Vision

ARBAB MUHAMMAD QASIM1, NAVEED ABBAS1, AMJID ALI1, AND BANDAR ALI AL-RAMI AL-GHAMDI2
1 Department of Computer Science, Islamia College University Peshawar, Peshawar, Khyber Pakhtunkhwa 25120, Pakistan
2 Faculty of Computer Studies, Arab Open University, Riyadh 11681, Saudi Arabia
Corresponding author: Naveed Abbas ([email protected])
This work was supported by the Faculty of Computer Studies, Arab Open University, Riyadh, Saudi Arabia, under
Grant AOURG-2023-010.

ABSTRACT One indispensable element within security systems deployed at public venues such as airports,
bus stops, train stations, and marketplaces is video surveillance. The evolution of more robust and efficient
automated technological solutions for video surveillance is imperative. In light of the escalating global
threat of terrorist attacks in recent years, any unattended object in public areas is treated as potentially
suspicious. Ensuring the protection of individuals in these public spaces necessitates the implementation of
safety measures. The intricacies of surveillance recordings introduce challenges when it comes to identifying
abandoned or removed objects, owing to factors like occlusion, abrupt lighting changes, and other variables.
This paper proposes a novel two-stage method for identifying and locating stationary objects in public
settings. The first stage uses a sequential model to capture temporal features and detect potential abandoned
objects within the monitored area. When the sequential model detects such an object, it triggers a subsequent
phase. The second stage uses the YOLOv8l model to precisely locate the detected objects. YOLOv8l is
renowned for its ability to accurately pinpoint object locations within the surveillance scene. The proposed
method achieves remarkable accuracy rates of 99.20% for the classifier and 99.70% for the localizer on the combined PETS 2006 and ABODA datasets, effectively localizing the target object. This achievement not only underscores the
model’s precision in accurately pinpointing the object’s position within the given context but also establishes
its superiority over other existing models. By integrating these two stages, our method provides an effective
solution for enhancing the detection of abandoned objects in public spaces, contributing to improved security
and safety measures.

INDEX TERMS Abandoned object localization, stationary object detection, embedded vision, abandoned
object, video-surveillance.

(The associate editor coordinating the review of this manuscript and approving it for publication was Zhongyi Guo.)

I. INTRODUCTION
Among the various applications of computer vision, video surveillance is attracting the attention of researchers, who actively search for methods to detect and track objects in videos [1]. In real-time applications, intelligent video surveillance systems are being rapidly developed to automate surveillance [2]. A smart video surveillance system autonomously identifies specific occurrences like trespassing, lingering, and abandoned objects [3], [4], [5], [6]. Researchers have explored object detection in videos by approaching it as the recognition of objects within each frame, essentially treating each frame as a standalone image [7]. Video surveillance mainly consists of object detection and tracking, in applications such as automated driving and intelligent robotics [8]. In the realm of video surveillance, there is a focus on dynamic environments to track cars and various real-world objects [9]. In computer vision applications, object tracking and object detection in video surveillance progress hand in hand [10]. Object detection involves classifying and locating objects or instances of interest within a suspicious frame, whereas object tracking is the process of recognizing the trajectory across consecutive
frames [11]. Conversely, the static detection and segmentation of objects in videos remain a challenging and actively researched area [10]. Object segmentation and static object analysis involve the identification, tracking, and assessment of an object's presence [12]. The most recent advancements in video surveillance technology allow it to automatically identify abandoned objects in public areas and illegally parked cars in traffic monitoring systems [13]. Detection of an abandoned object in video surveillance is challenging and essential for maintaining safety [14]. Public safety measures include the automated detection of abandoned items, as manually processing such a huge amount of data is impractical and time-consuming [15]. An abandoned item is an object that has been left behind by its owner and has not been reclaimed within a predetermined period [5]. Most existing abandoned object detection algorithms utilize background models for extracting foreground information [16]. The background model serves as an effective method for extracting foreground information from images. However, a notable challenge arises over time, wherein the distinction between the foreground and background becomes less discernible. In other words, as processing continues, the initial clarity between the foreground and background diminishes, as highlighted by the gradual merging of these elements [17]. This phenomenon creates a hindrance for algorithms that heavily rely on foreground information to detect and track target objects [18]. Additionally, these models are susceptible to variations in lighting, which can alter the image's appearance and lead to unstable model outputs. These effects can notably elevate the false alarm rate, resulting in subpar performance of the surveillance system [19]. One of the core elements of these systems is object detection and tracking, which watches the target over time [20]. Furthermore, the use of video surveillance cameras to identify suspicious situations has significantly increased in the past few years [21]. With recent progress in object detection and facial classification, video surveillance systems incorporating both object detection and facial recognition have become more prevalent [22]. Motion detection-based methods are used for abandoned object detection in surveillance systems [23]; the method consists of background subtraction techniques, followed by optical flow analysis and temporal differencing techniques [23]. Detecting intentionally discarded or abandoned objects within a scene poses a significant challenge for object detection techniques. Unattended object detection, which identifies unattended items in a series of video frames, aims to address this issue [24].

One of the domains to which Artificial Neural Networks (ANNs) have been successfully applied is computer vision, the field of study that enables machines to understand and interpret visual information such as images and videos. All ANNs possess a shared capability to extract and learn high-level features from visual information, enhancing machines' ability to comprehend and interpret visual data more effectively. Abandoned object detection is a challenging task in which machines are required not only to locate and segment objects in complex scenes but also to classify them. A recent study [1] used an ANN for abandoned object detection in outdoor environments, detecting hand luggage using a deep learning-based detection method. However, their method only focuses on hand luggage and does not consider other types of objects that may be abandoned, such as backpacks and boxes.

In this paper, we propose a novel method for abandoned object detection that can handle both suspicious and non-suspicious scenes. Our method consists of a scene classification module (SCM) and an object detection module (ODM), which will be discussed in more detail in the methodology section. Our main contributions are as follows:
• We develop a novel two-stage method for abandoned object detection that can adapt to different types of scenes and reduce false positives.
• Our proposed model achieves an exceptional level of precision, with an impressive accuracy rate of 99.20% for the classifier and 99.70% for the localizer, setting a new standard in abandoned object detection and surpassing existing methods.
• Our proposed method outperforms all other existing methods on two public datasets, ABODA and PETS 2006.

The paper is structured to provide a clear progression of the research. It starts with a review of related work, offering context and identifying research gaps. Following this, the proposed methodology introduces the innovative two-stage model for abandoned object detection, addressing surveillance system challenges. The experimental evaluation section is divided into three sub-sections. First, the "Experimental Setup" outlines the tools and data sources used. Then, the "Experimental Results" demonstrate the model's performance with thorough analysis and evidence of its superiority over existing approaches. This structure ensures a logical flow from the background research to the proposed solution and empirical support for the research's contributions.

II. RELATED WORK
Several methods were used in this area, including object tracking, object identification, and object classification. Diverse techniques were applied to delineate the background and foreground regions of stationary objects over a certain duration. Many prior investigations on abandoned object detection have focused on the analysis of foreground information derived from one or multiple background models. This analysis is conducted to discern the differentiation between stationary and dynamically moving objects. Subsequently, the stationary objects are tracked over a specified duration to ascertain whether they exhibit characteristics indicative of abandonment. In their work, Fan and Pankanti [25] achieved a reduction in the false alarm rate through the utilization of a single background model and a finite state machine (FSM). They accomplished this by modeling objects that exhibit temporary static behavior, such as a car that briefly halts and subsequently resumes movement. Their investigation also encompassed the concept of "healed" objects, referring to
FIGURE 1. The 3-tier main framework of the proposed method: The first tier depicts the data acquisition and data pre-processing stage.
The second tier involves ConvLSTM classification of objects, while the third tier integrates the YOLOv8l localizer for object detection.

those objects that have already assimilated into the background. However, it is noteworthy that their study did not address the issue of illumination changes. Omrani et al. [26] introduced a system using stereovision for detecting and tracking objects in maritime environments. An autonomous surface vehicle (ASV) with a stereo camera was used to test the system. In the first stage, the ASV approached stationary objects to identify both static and dynamic things. After that, the ASV tracked a target boat using RTK-GPS to determine its absolute and relative positions. To differentiate between moving and static objects, image processing and stereo vision techniques were applied.
Narwal and Mishra [27] presented a system designed specifically for real-time unattended baggage detection, using frame captures from video surveillance cameras. The system encompasses several phases. It initiates with background subtraction, followed by subsequent steps that leverage the background-subtracted frames for recognizing static regions in the foreground, identifying object types, and validating thresholds. Based on the outcomes of these earlier steps, if an object is determined to be unattended, the system triggers an alarm to alert the relevant authorities. Mahalingam and Subramoniam [28] presented an effective method for tracking and detecting objects in videos that is divided into three separate stages: tracking, evaluating, and detecting. The MoAG (Mixture of Adaptive Gaussian) model was introduced to improve the efficiency of foreground segmentation during the detection phase, which includes noise reduction and foreground segmentation. Din et al. [29] introduced a framework designed for the detection of abandoned luggage within public areas. The framework begins by utilizing the initial frames to model the background scene. To detect and track moving objects, including both the luggage owner and the luggage itself, the framework employs a model. Significantly, this method sustains its effectiveness in the face of challenges like affine distortion, noise, and occlusion. Moreover, for establishing the history of luggage, the model under consideration records the positional history of mobile objects and utilizes frame differencing as a fundamental technique. Hassan et al. [30] presented a real-time system to detect and track moving objects that become stationary in a restricted zone; a pixel classification method based on a segmentation history image is used to identify stationary objects. Park et al. [19] introduced an algorithm specifically crafted to reliably detect abandoned objects, even when faced with changes in illumination. This algorithm demonstrates the ability to quickly adjust to a range of illumination variations. It includes a presence authentication mechanism relying on the largest contour, allowing accurate tracking of target objects and the identification of abandonment, regardless of

foreground information presence and the impact of illumination variations. Park et al. [31] presented a novel algorithm that accurately differentiates between stolen items, ghost regions, and abandoned objects by fusing conventional image processing methods with artificial intelligence technologies. This method uses two main strategies: segmenting objects using CNN-featured mask regions (Mask R-CNN) to provide comprehensive object mask information, and using a dual background model to detect possible stationary objects. Lwin and Tun [32] proposed a method using YOLOv4 to identify abandoned objects in video surveillance. This study developed a framework for detecting abandoned items by expanding the capabilities of YOLOv4. To support this research, a self-assembled dataset was used for training. The neural network was trained to predict specific objects, including people, backpacks, handbags, books, and hats. Wahyono et al. [33] introduced a dual Gaussian mixture model-based cumulative dual foreground difference for stationary object detection. An SVM-based classifier is then integrated to verify whether the region candidates are vehicles, humans, or other objects. Palivela and Ramachandran [34] introduced a system for abandoned object detection, employing a hash-based approach and incorporating an SVM classifier. Their main area of interest is the identification of abandoned or unattended objects in indoor and outdoor environments. This is accomplished by taking video frames, extracting their features, and using machine learning algorithms to analyze them. Using hash value descriptors from the training phase, the performance of a binary SVM classifier is assessed, and a confusion matrix is used to compare the results with those from other classifiers. Samaila et al. [35] introduced a real-time vision-based abandoned object detection system. This system utilizes the Gaussian Mixture Model (GMM) and is capable of detecting objects as small as a teacup. It outperforms the Self-Organizing Background Subtraction (SOBS) technique in handling background obstacles. Additionally, this system can classify abandoned items into two categories, non-human and human, a capability not found in other existing techniques. Smitha and Palanisamy [36] proposed a mathematical technique called running average for video sequences of traffic taken from a static camera; the background image is a static image that represents the scene without any moving objects. Chen et al. [37] developed a systematic model pruning approach that evaluates the balance between accuracy and efficiency across diverse structured model pruning techniques and datasets, including CIFAR-10 and ImageNet. They used the VGG-16 model on Tensor Processing Units (TPUs) as a representative example. Additionally, they introduced a structured model pruning package for TensorFlow 2, allowing for in-place modification of models to evaluate their real-world performance. Table 1 provides a concise summary of key literature, offering insights into the main findings and methodologies employed by various studies in the field.

TABLE 1. Key findings and methodologies from relevant studies in the field.
III. PROPOSED METHOD
The proposed method comprises three components: (1) an enhanced pre-processing step that improves data quality through the use of advanced and refined techniques; (2) a ConvLSTM layer that captures both temporal and spatial information from the frames; and (3) the state-of-the-art YOLOv8, which identifies stationary abandoned objects within the frames. The complete workflow of the proposed method is depicted in Figure (1).
The proposed method for abandoned object detection can handle both suspicious and non-suspicious scenes. Firstly, the SCM is based on a sequential model that can analyze the input image and classify it as a suspicious or non-suspicious scene. Secondly, the ODM is based on the YOLOv8 architecture, which can locate and classify various types of objects in the input image. If the scene is classified as suspicious, the ODM will detect the objects that may be abandoned, such as luggage, backpacks, and boxes. Conversely, if the scene is classified as non-suspicious, object localization will not be attempted by the ODM. Our method can not only detect abandoned objects in real time but also distinguish between different types of objects that may have different levels of risk or importance.
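As an illustration of this two-stage control flow, consider the minimal sketch below. The function names, the confidence threshold, and the model interfaces are hypothetical placeholders, not the authors' implementation.

```python
# Hypothetical glue code for the two-stage SCM -> ODM flow described above.
# `scm` is assumed to return a suspicious-scene probability for a frame window,
# and `odm` to return bounding boxes; both interfaces are illustrative only.
def process_window(frames, scm, odm, threshold=0.5):
    score = scm.predict(frames)   # stage 1: scene classification over the window
    if score >= threshold:        # suspicious scene detected
        return odm(frames[-1])    # stage 2: localize objects in the latest frame
    return None                   # non-suspicious: skip localization entirely
```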
FIGURE 2. ConvLSTM structure.

A. DATASETS DESCRIPTION
The dataset is a critical element for assessing the performance of any system. Evaluating the proposed algorithm using a well-established dataset presents a notable challenge in the domain of visual surveillance systems. In recent years, the availability of standard datasets for abandoned object detection has been limited.

1) PETS 2006 DATASET
PETS 2006 is a publicly available dataset for abandoned object detection, which we have used for our experiments. The dataset consists of seven videos at 25 frames per second (FPS) with a standard resolution of 768 × 576 for evaluating the performance of object detection and tracking systems. The videos show different scenarios of left-luggage events in an outdoor parking lot, captured from multiple cameras and different angles. The dataset videos are annotated with bounding boxes and event types for abandoned object detection, such as left luggage, removed objects, or vehicle movement.

2) ABODA DATASET
ABODA is a public dataset for abandoned object detection. The dataset consists of 11 video sequences captured from different CCTV footage, showing various real-world application scenarios that are challenging for abandoned object detection, such as crowded scenes, lighting changes, night-time detection, and indoor and outdoor environments. The videos are annotated with bounding boxes and ownership information for the abandoned object, indicating whether it belongs to a person who is still present in the scene or not.

B. DATA PREPROCESSING
Pre-processing is a vital and indispensable step for achieving better performance in modeling. This process encompasses all the steps that enhance the quality of data, enabling the model to extract and interpret the data effectively. Consequently, the foundation of superior model performance is the pre-processing steps applied to the data before model feeding.

1) FRAME EXTRACTION
Our data preprocessing steps involve frame extraction from the video, converting it into a batch of frames for the models. We extract 15 frames per second from the video and discard irrelevant frames that have no significance for the model. This reduces the computational burden and the likelihood of overfitting. We specifically retain the frames with distinct features relevant to the model, discarding irrelevant frames.
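As a minimal illustration of this sampling step, the following OpenCV sketch keeps roughly 15 frames per second by uniform subsampling. Uniform sampling is an assumption (the paper also discards frames it deems irrelevant), and the file path is a placeholder.

```python
# A minimal OpenCV sketch of 15-fps frame sampling (uniform subsampling is an
# assumption; the authors additionally drop frames with no relevant features).
import cv2

def extract_frames(video_path, target_fps=15):
    cap = cv2.VideoCapture(video_path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or target_fps
    step = max(int(round(src_fps / target_fps)), 1)  # keep every step-th frame
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames
```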
2) FRAME NORMALIZATION
The pre-processing steps involve the normalization of the characteristics of individual frames; this ensures consistency and comparability. Normalizing frames can enhance the performance and reliability of the model. The formula of normalization is represented in equation (1):

$$I_{\text{normalized}}(x, y) = \frac{I(x, y) - I_{\min}}{I_{\max} - I_{\min}} \tag{1}$$

where $I(x, y)$ represents the intensity (brightness) value of the pixel, $I_{\min}$ is the minimum intensity value found in the frame, and $I_{\max}$ is the maximum intensity value found in the frame.

3) FRAME CROPPING
In the pre-processing steps, we cropped the frame to size 512 × 512 before feeding it to the model. This allowed us to extract specific regions of interest from a larger frame, which is beneficial for various reasons. Firstly, cropping helps the model to focus on key details, enabling it to concentrate its attention on the most relevant information for the task at hand. Secondly, it reduces the data size, making it more manageable for subsequent processing and lowering computational requirements. The mathematical form of the cropping can be represented as in equation (2):

$$C = I[x_1 : x_2, \, y_1 : y_2] \tag{2}$$

where $I[x_1 : x_2, y_1 : y_2]$ represents a subarray or subregion of the original image or frame $I$; $x_1$ and $y_1$ specify the starting coordinates (top-left corner) of the crop region, and $x_2$ and $y_2$ specify the ending coordinates (bottom-right corner) of the crop region.
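The following NumPy sketch transcribes equations (1) and (2); the example crop coordinates are illustrative, not the paper's region-of-interest choices.

```python
# NumPy transcription of equations (1) and (2): per-frame min-max normalization
# followed by a fixed crop. The small epsilon guards against flat frames.
import numpy as np

def normalize(frame):
    f = frame.astype(np.float32)
    return (f - f.min()) / (f.max() - f.min() + 1e-8)

def crop(frame, x1, y1, x2, y2):
    return frame[y1:y2, x1:x2]  # rows index y, columns index x

# Example: a 512 x 512 patch from a PETS-like 768 x 576 frame (coordinates arbitrary).
frame = np.random.randint(0, 256, (576, 768, 3), dtype=np.uint8)
patch = crop(normalize(frame), 0, 0, 512, 512)
```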


FIGURE 3. Proposed ConvLSTM model architecture featuring five ConvLSTM layers, two dense layers with 1000 units each, and a final classifier layer with one unit for classifying suspicious and non-suspicious objects.

4) FRAME AUGMENTATION
In our data preprocessing, we augmented frames by applying various methods such as rotation, scaling, translation, flipping, and brightness adjustment. The process helps the model become resilient to variation in real-world scenarios, such as illumination changes, changes in viewpoint, and object orientations. Moreover, frame augmentation can help in overcoming overfitting by expanding the volume of the dataset, thereby improving the model's ability to generalize to unseen data. The general formula of augmentation is given in equation (3):

$$A(I) = T(I, p_1, p_2, \ldots, p_n) \tag{3}$$

where $A(I)$ represents the augmented version of the input data $I$, $T$ denotes a data transformation function, and $p_1, p_2, \ldots, p_n$ are parameters that control the specific augmentation techniques applied, such as rotation angles, scaling factors, translation distances, brightness adjustments, etc.
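As one concrete instance of equation (3), the OpenCV sketch below composes rotation, scaling, translation, flipping, and a brightness shift; the parameter values are arbitrary examples, not the paper's settings.

```python
# One possible T(I, p1, ..., pn) from equation (3): an affine transform plus a
# flip and a brightness shift. The default parameter values are illustrative.
import cv2

def augment(frame, angle=10.0, scale=1.1, tx=5.0, ty=-3.0, flip=True, brightness=20):
    h, w = frame.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)  # rotation + scaling
    M[0, 2] += tx                                              # horizontal translation
    M[1, 2] += ty                                              # vertical translation
    out = cv2.warpAffine(frame, M, (w, h))
    if flip:
        out = cv2.flip(out, 1)                                 # horizontal flip
    return cv2.convertScaleAbs(out, alpha=1.0, beta=brightness)  # brightness shift
```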
architecture of our ConvLSTM model comprises five layers,
C. CONVLSTM ARCHITECTURE
ConvLSTM is a convolutional neural network combined with an LSTM network. It is similar to an LSTM, with the additional functionality of convolutional operations performed on the tensor; the structure of ConvLSTM is illustrated in Figure (2). The network is suitable for sequential problems, like videos, where there is time dependence. Spatial feature extraction in the model is accomplished through the utilization of convolutional layers. These layers apply filters to individual frames, allowing the capture of crucial spatial information such as object shapes and textures. On the other hand, the ConvLSTM component handles temporal feature extraction by maintaining hidden states, enabling it to grasp the evolution of video frames as they progress over time. This capability enables the model to understand motion, object interactions, and alterations in the video sequence. The key operation of ConvLSTM is given by equation (4), where $*$ denotes the convolution operator and $\circ$ denotes the Hadamard product:

$$
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right),\\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right),\\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right),\\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right),\\
H_t &= o_t \circ \tanh(C_t)
\end{aligned} \tag{4}
$$

In this particular context, the variables are defined as follows: $X_t$ represents the input to the cell, $C_t$ corresponds to the cell's output, and the hidden state of the cell is denoted as $H_t$. Furthermore, $i_t$, $f_t$, and $o_t$ denote the input, forget, and output gates, and the sigmoid function is represented by $\sigma$. The convolution kernels are denoted as $W$, as previously mentioned in reference [20].
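For readers who prefer code, a PyTorch sketch of the cell in equation (4) is given below; for brevity it folds the four gates into one convolution and omits the peephole terms ($W_{ci}$, $W_{cf}$, $W_{co}$), which is a simplification of the equation above.

```python
# A simplified PyTorch sketch of the ConvLSTM cell in equation (4); the
# peephole connections are omitted here, which is a deliberate simplification.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        # a single convolution computes all four gates at once
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, h, c):
        gates = self.conv(torch.cat([x, h], dim=1))
        i, f, g, o = gates.chunk(4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)   # cell state update (Hadamard products)
        h = o * torch.tanh(c)           # new hidden state H_t
        return h, c
```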
D. PROPOSED SEQUENTIAL CLASSIFIER MODEL
In the proposed method we have harnessed a sequential model which extracts spatial-temporal features from the video. Recognizing the effectiveness of ConvLSTM, a widely acknowledged method for extracting features from sequential data, we integrate it into our framework to distinguish between static abandoned and non-abandoned objects. The architecture of our ConvLSTM model comprises five ConvLSTM layers, with the initial layer serving as the input layer, accommodating inputs of size 15 × 512 × 512 × 3. Here, '15' signifies the sequence duration, while the subsequent dimensions correspond to the frame's spatial dimensions and color channels. The first layer incorporates 512 filters, with a stride of 1 and padding set to 'same'. For the subsequent layers, the overall structure remains consistent, except for the number of filters. Specifically, the second layer employs 256 filters, the third layer uses 128, the fourth layer employs 64, and the final layer involves 32 filters. After the third, fourth, and last layers, we employed dropout layers with a rate of 0.5. Following this, a flattening layer is applied to transform the data into a 1D format, rendering it suitable for prediction. Subsequently, two dense layers with 1000 neurons each are introduced. Finally, the classification layer classifies static objects within the frames. Table 2 presents the hyper-parameters employed in the configuration of the proposed classifier model. Throughout the training, the dataset is divided into training, validation, and test sets with a distribution ratio of 70:20:10. Upon generating predictions, if the model's confidence surpasses a predefined threshold, the output is directed to a subsequent static localization model. This model effectively pinpoints the location of static objects within the frames, enhancing the precision of the proposed method. Figure (3) illustrates the proposed classifier model architecture.
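The following Keras-style sketch assembles the layer stack exactly as described (512/256/128/64/32 ConvLSTM filters, dropout 0.5 after the third, fourth, and fifth layers, a flatten, two 1000-unit dense layers, and one sigmoid output). The kernel size and optimizer settings are assumptions, since the paper reports a PyTorch implementation and does not list them here.

```python
# A Keras-style sketch of the described classifier. Kernel size 3 is an
# assumption; the authors' actual (PyTorch) implementation may differ.
from tensorflow.keras import layers, models

def build_classifier(seq_len=15, height=512, width=512, channels=3):
    m = models.Sequential()
    m.add(layers.Input(shape=(seq_len, height, width, channels)))
    for i, filters in enumerate([512, 256, 128, 64, 32]):
        m.add(layers.ConvLSTM2D(filters, kernel_size=3, strides=1,
                                padding="same", return_sequences=(i < 4)))
        if i >= 2:  # dropout of 0.5 after the third, fourth, and last layers
            m.add(layers.Dropout(0.5))
    m.add(layers.Flatten())
    m.add(layers.Dense(1000, activation="relu"))
    m.add(layers.Dense(1000, activation="relu"))
    m.add(layers.Dense(1, activation="sigmoid"))  # suspicious vs. non-suspicious
    m.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return m
```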


TABLE 2. Hyper-parameters of the proposed classifier model.

E. PROPOSED YOLOv8 LOCALIZER MODEL
YOLOv8 is the most recent version of the YOLO object detection model. While maintaining the same foundational architecture as its predecessors, YOLOv8 incorporates several notable enhancements over prior versions. These include the adoption of an advanced neural network structure that leverages both a Feature Pyramid Network (FPN) and a Path Aggregation Network (PAN), alongside the introduction of an improved labeling tool designed to streamline the annotation process. The labeling tool encompasses a range of valuable functionalities, including automated labeling, convenient labeling shortcuts, and the ability to customize hotkeys. The synergy of these capabilities significantly simplifies the process of annotating images for model training purposes. Additionally, it is worth noting that YOLOv8 has several versions, such as YOLOv8-S, YOLOv8-M, YOLOv8-L, and YOLOv8-XL. The Feature Pyramid Network operates by progressively decreasing the spatial resolution of the input image while concurrently augmenting the number of feature channels. This process leads to the generation of feature maps with the capacity to identify objects at varying scales and resolutions. In contrast, the Path Aggregation Network architecture integrates features from diverse network levels using skip connections. Through this approach, the network becomes more adept at capturing features across various scales and resolutions. This capability is of utmost importance for achieving precise object detection, particularly when dealing with objects of diverse sizes and shapes [18].

1) YOLOv8 LOCALIZER MODEL SPECIFICATIONS
In the concluding phase of our proposed methodology, the localizer detects static abandoned objects with higher precision. We deliberately selected YOLOv8 as the proposed model for static abandoned object detection, under the premise that it exhibits the highest probability of detecting stationary objects. YOLOv8 is the latest state-of-the-art method, known for its higher mean average precision (mAP) and faster inference. The model has been meticulously trained on one of the most well-known datasets, the COCO dataset. Our research also entailed a comprehensive evaluation of various state-of-the-art object detection models, including Faster-RCNN [38], Fast-RCNN, and SSD [39]. Among these, the YOLOv8 series outperformed its counterparts, consistently achieving either higher precision or faster inference time. In our proposed method, YOLOv8 utilizes CSPDarknet53 as its backbone. CSPDarknet53 is a deep neural network that excels in extracting features at various resolutions or scales by gradually reducing the size of the input image. The feature maps generated at different resolutions hold essential information about objects present in the image at various scales, offering varying levels of detail and abstraction. Our approach harnesses YOLOv8's capability to leverage these diverse feature maps at different scales to gain insight into the morphology and texture of objects, thereby enhancing precision in the detection of static abandoned objects. The YOLOv8 backbone consists of four sections, each commencing with a single convolution followed by a C2f module [40]. Notably, our methodology leverages the C2f module introduced by CSPDarknet53. The module incorporates splits where one branch traverses a bottleneck module characterized by two 3 × 3 convolutions with residual connections. The output of the bottleneck module undergoes further splitting, occurring N times, with N corresponding to the YOLOv8 model size. These splits are subsequently concatenated and channeled through a final convolution layer, which serves as the layer responsible for activating the network. This integrated architecture enhances our approach's capability to detect static abandoned objects effectively. The activation map associated with the shallowest C2f module reveals prominent activations corresponding to object boundaries. This module specializes in detecting small objects and identifying their respective classes. The second activation plays a crucial role in determining the presence of static abandoned objects. As we delved deeper into the model, the third activation started capturing intricate textures associated with static abandoned objects. Ultimately, the model's final C2f module activates to capture exceptionally fine-grained details and outlines within the image under consideration.

FIGURE 4. Transfer learning mechanism.

2) LOCALIZER OPTIMIZATION USING TRANSFER LEARNING
YOLOv8, originally pre-trained on the COCO dataset, encompasses a wide array of object classes. However, our objective


pertains specifically to detecting static objects such as bags. To optimize the model's performance for this specialized task, it must be tailored accordingly. Training a model for this specific task from scratch would be prohibitively expensive and would require a more specific dataset. Therefore, we adopted the transfer learning [41] technique to leverage the knowledge acquired during pre-training on COCO. This enables us to adapt the model's weights and features to better suit the detection of bags while utilizing a limited dataset. The transfer learning approach is visualized in Figure (4).
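A sketch of this fine-tuning step using the Ultralytics API is shown below; the dataset YAML file and the training argument values are hypothetical placeholders, not the authors' exact configuration.

```python
# A hedged sketch of transfer learning with the Ultralytics API; the dataset
# file "abandoned.yaml" and the argument values are illustrative placeholders.
from ultralytics import YOLO

model = YOLO("yolov8l.pt")            # weights pre-trained on COCO
model.train(data="abandoned.yaml",    # custom classes such as bag, backpack, box
            epochs=100, imgsz=512)    # fine-tune instead of training from scratch
results = model("frame.jpg")          # localize static objects in a new frame
```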
F. EVALUATION METRICS
The performance of the proposed methodology is evaluated using specific performance measures, including accuracy, precision, and recall.

1) ACCURACY
Accuracy is a metric commonly employed to provide an overall assessment of a model's performance across all classes. This metric is particularly valuable when all classes hold equal significance, quantifying the correctness of predictions by dividing the number of accurate predictions by the total number of predictions. Equation (5) demonstrates accuracy:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

2) PRECISION
Precision is determined by dividing the number of positive samples correctly classified by the total count of samples classified as positive, whether they were classified correctly or not. It serves as an indicator of the model's accuracy in classifying samples as positive, specifically measuring its ability to correctly identify positive instances. Precision is presented by equation (6):

$$\text{Precision} = \frac{TP}{TP + FP} \tag{6}$$

3) RECALL
Recall is computed by dividing the number of correctly classified positive samples by the total count of positive samples. It quantifies the model's capability to identify positive samples accurately, essentially gauging its sensitivity in detecting such instances. A higher recall value signifies a greater ability to detect positive samples within the dataset. Recall is illustrated in equation (7):

$$\text{Recall} = \frac{TP}{TP + FN} \tag{7}$$

Here, a True Positive (TP) means the model correctly identifies something as positive, and it is indeed positive. A True Negative (TN) means the model correctly recognizes something as negative, and it is indeed negative. A False Positive (FP) means the model mistakenly identifies something as positive when it is negative. A False Negative (FN) means the model erroneously identifies something as negative when it is positive.
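The three metrics are direct functions of these four confusion-matrix counts; the sketch below uses invented counts purely as a worked example.

```python
# Equations (5)-(7) computed from confusion-matrix counts; the example
# numbers are invented for illustration, not the paper's results.
def evaluate(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # equation (5)
    precision = tp / (tp + fp)                   # equation (6)
    recall = tp / (tp + fn)                      # equation (7)
    return accuracy, precision, recall

print(evaluate(tp=95, tn=90, fp=5, fn=10))  # -> (0.925, 0.95, 0.9047...)
```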
IV. EXPERIMENTAL RESULTS AND DISCUSSION
A. EXPERIMENTAL SETUP
In this section, we present a comprehensive overview of the experimental setup of the proposed methodology. The system implementation was executed using Python 3.10.4, PyTorch 1.12.1, and CUDA 11.7, with both training and inference processes conducted on a powerful 12 GB NVIDIA GeForce RTX 3090 GPU. For our primary model, ConvLSTM, we conducted training over a span of 100 epochs. The weights of the model were updated during backward propagation using the Adam optimizer. We configured a batch size of 64 to balance training efficiency and GPU memory usage effectively. Our proposed model, tailored specifically for the task of abandoned object detection, distinguishes two categories: abandoned and non-abandoned objects. To optimize for the binary classification, we employed the binary cross-entropy loss during training. It is important to note that YOLOv8 served as the foundation for our object detection task. The model underwent 100 epochs of training. We fine-tuned it using transfer learning for our specific use case. This transfer-learning approach allowed us to adapt YOLOv8's pre-trained weights to the intricacies of abandoned object localization within the limited dataset. In terms of data preparation, we meticulously annotated the proposed dataset for the model to localize the static object. We annotated 500 images for the model from the different proposed datasets. The annotation process involved defining bounding boxes around the objects of interest, providing essential training data for our models. Our dataset, sourced from diverse environments and scenarios, features variations in lighting conditions, backgrounds, and object poses, reflecting the real-world challenges of abandoned object detection. Ethical considerations guided our data collection, and privacy and bias mitigation measures were considered.

B. RESULTS AND DISCUSSION
In this section, we embark on an empirical evaluation of the proposed model. This evaluation is structured into three main components. Firstly, in the architectural variations analysis, we delve into the inner model comparison, where we assess the model's performance using various deep learning architectures for temporal feature extraction. Secondly, in the object detection model evaluation, we conduct a detailed analysis of the object detection model integrated into our proposed framework, aiming to identify strengths and potential areas for improvement. Finally, in the state-of-the-art comparison, we compare the effectiveness of our proposed model with other state-of-the-art models in the field. This comparative analysis offers valuable insights into the model's performance and its standing in the broader research landscape.

1) ARCHITECTURAL VARIATION ANALYSIS OF CLASSIFIER
To validate the effectiveness of our proposed model in comparison to various sequential models, we conducted a series of empirical experiments. Upon acquiring visual data,


we directed this data sequence to sequential models designed to extract temporal information, a crucial element for precise predictions. This approach capitalizes on the understanding that the quality of features extracted from the frames significantly influences prediction accuracy. During the experiment, our goal was to identify the optimal model for extracting these enhanced features, which are pivotal for precise temporal predictions. To achieve this, we employed a range of diverse deep learning models for feature extraction, including the Gated Recurrent Unit (GRU) model, the Recurrent Neural Network (RNN), the vanilla Long Short-Term Memory (LSTM) model, and our proposed model. These experiments helped to discern the model that outperforms the others in terms of enhancing features critical for precise temporal predictions. We chose the best sequential model based on its superior precision, with this parameter determining our selection. The outcomes of the different models, along with the proposed one, are illustrated in Table 3.

TABLE 3. Sequential classifier models performance.

The training phase is divided into two segments: training for 50 and for 100 epochs, respectively. In the first, we trained these models for 50 epochs. We compared five sequential models for sequence learning, namely GRU, RNN, LSTM, Bi-Directional LSTM, and ConvLSTM. The vanilla GRU model suffered from overfitting, the RNN from vanishing gradients, and the vanilla LSTM from poor feature extraction. The ConvLSTM outperformed the other models, achieving the highest training and validation accuracy of 80% in the first training phase. In the second phase, we repeated the experiments to train these models for 100 epochs. Firstly, the vanilla GRU model underwent training; during training we kept a sequence length of 8 in the data to capture the relation better. The model's performance did not show any significant improvement; rather, the model was overfitting on the data. The result was slightly improved but not promising. The model achieved training and validation accuracies of 55% and 26%. Secondly, in the training phase of the RNN, the sequence length for the model was 8 to avoid the vanishing gradient problem. During training, we evaluated the model; its performance was poor on both training and validation data. The model showed a severe underfitting problem. Longer training could not improve the model performance beyond a tiny improvement. Empirically, the training and validation accuracy of the model was 40% and 30%. Thirdly, the vanilla LSTM model underwent training with a sequence length of 12. Throughout the training, the model's performance fluctuated rather than improving smoothly. Consequently, in the end, the model could not perform better; resultantly, it achieved training and validation accuracies of 52% and 41%. Finally, we trained the ConvLSTM with a sequence length of 15. The model showed significant performance throughout the training, and its generalization was remarkably improved as training continued. The model outperformed all the remaining models. Empirically, we found that the model achieved training and validation accuracies of 99.20% and 99.50%, respectively, with an impressive F1-score of 99.05%. These evaluated metrics led to the selection of our model for sequential feature extraction.

FIGURE 5. Proposed classifier model confusion matrix.

Figure 6 shows the training and validation accuracy and loss of the proposed model throughout the training. The x-axis of the graphs shows the number of epochs, while the y-axis shows the performance of the model. Figure 6(a) indicates that at the beginning of training the model started with high training and validation accuracy. Throughout the training the model showed better generalization; however, from epoch 20 to 30 the model showed some fluctuation, but onward there is significant improvement. The graphs show the model was capturing the relation very smoothly. The fluctuating portion reflects the model adjusting to learn unseen data; in addition, further training could cause overfitting. Resultantly, the proposed model achieved the highest accuracy among the different sequential models. On the other hand, Figure 6(b) shows the training and validation loss of the proposed model. At the beginning of training, the drastic decrease in loss showed better generalization, and throughout the training both losses were smoothly decreasing. The final portion of training showed some overfitting; therefore, we stopped the model at 100 epochs. The training and validation losses of the model were 0.01 and 0.02, respectively. Figure 5 shows the confusion matrix of the proposed model; this shows the misprediction value and the true value for each class. It can be seen from the figure that the abandoned class has lower accuracy compared to the non-abandoned class, resulting in the model achieving an overall accuracy of 99.20%.


FIGURE 7. Proposed localizer model training and validation loss graphs. (a) Training box loss. (b) Training classification loss. (c) Training distribution focal loss. (d) Validation box loss. (e) Validation classification loss. (f) Validation distribution focal loss.

FIGURE 6. Proposed classifier model learning graphs. (a) Accuracy of the model on the training and validation sets. (b) Loss of the model on the training and validation sets.

2) ARCHITECTURAL VARIATION ANALYSIS OF LOCALIZER
In this section, we present our research in three parts. Firstly, we discuss the selection of a robust YOLOv8 variant for abandoned object detection. Secondly, we evaluate our proposed localizer model against various YOLOv8 variants, including YOLOv8-n, YOLOv8-m, YOLOv8-l, YOLOv8-x, and YOLOv8-s. Our experiments involve test images of abandoned objects, and the results are summarized in Table 4, covering precision, recall, F1-score, mAP50, and mAP50-95 metrics. Notably, our proposed YOLOv8l-seg stands out with the highest precision of 99.7% and recall of 99.5% in abandoned object detection. The evaluation highlights the influence of model size and dataset characteristics on performance, with denser models showing less promising results. Specifically, YOLOv8n-seg exhibits the lowest precision of 92.4% and recall of 89.1%, YOLOv8x-seg has a slightly lower precision of 95.6% and recall of 93.2%, YOLOv8m-seg demonstrates a precision of 98.2% and recall of 97.7%, and YOLOv8s-seg secures a lower precision score of 94.1% and recall of 96.2%.

FIGURE 8. Other evaluation graphs of the proposed localizer model. (a) F1-score confidence curve. (b) Precision confidence curve. (c) Precision recall curve. (d) Recall confidence curve.

The proposed approach for identifying and categorizing stationary objects demonstrates its versatility and effectiveness across a spectrum of scenarios, including public transportation hubs, commercial centers, urban streets, public events, smart city infrastructure, residential areas, critical infrastructure sites, and outdoor parks, showcasing its robust applicability in diverse real-world environments, as depicted in Figure (9). This approach significantly enhances precision in localization, a crucial aspect for subsequent classification tasks. Our precision-recall confidence curve achieves an impressive 99.0% mAP for all classes,
rizing stationary objects demonstrates its versatility and curve achieves an impressive 99.0% mAP for all classes,


TABLE 4. Yolov8 models performance.

TABLE 5. Computational complexity analysis of classifier models.

FIGURE 9. Suspicious object localization results on the ABODA dataset using the YOLOv8l localizer.

TABLE 6. Computational complexity analysis of localizer models.

illustrated in Figure 8(c). We conducted a comparative analysis with state-of-the-art models and refined our method for optimal abandoned object detection performance.
The proposed model underwent a comprehensive evaluation during training and validation, assessing key metrics as shown in Figure (7). Notably, the training box loss decreased to 0.06, indicating high confidence in predictions. Validation results, depicted in Figure 7(d, e, f), revealed significant reductions in box loss (0.5), class loss (0.01), and dfl loss (0.10), showcasing the model's outstanding performance.
Figure (8) presents evaluations using precision-recall curves, precision-confidence curves, recall-confidence curves, and F1-confidence curves for abandoned object detection. The precision-recall curve consistently yields high values of 99.0%, indicating robust performance in segmenting abandoned objects. The precision-confidence curve affirms the model's accurate identification, while the recall-confidence curve demonstrates the correct identification of all positive instances. The F1-confidence curve shows a balanced trade-off between recall and precision scores, with a peak F1-score of 1.00, emphasizing the model's superior performance in accurately segmenting various abandoned objects.

3) COMPUTATIONAL COMPLEXITY ANALYSIS
This section provides a detailed analysis of the computational complexity of our proposed method for abandoned object detection. We have considered various factors, including the number of parameters, memory usage, training time, and inference speed. Table 5 presents a comprehensive breakdown of the computational complexity analysis for the classifier models. Inference time for the classifiers was determined based on a sequence length of 10. Notably, our proposed ConvLSTM model exhibits a lightweight architecture, resulting in faster inference time compared to the other models in the evaluation.
Table 6 presents a complexity analysis of various YOLOv8 models. Our proposed YOLOv8l model stands out with 102.2 million parameters, consuming 408.8 MB of memory, requiring 30 minutes for training, and achieving an inference speed of 29 milliseconds.

4) COMPARATIVE ANALYSIS OF THE PROPOSED METHOD WITH STATE-OF-THE-ART MODELS


TABLE 7. Comparison of the proposed localizer with SOTA.

FIGURE 10. Proposed method and SOTA comparison graphical representation.

This section provides a comprehensive breakdown of the performance analysis and a comparative assessment of both the established and our proposed methods. An internal localizer comparison, specifically focusing on YOLOv8, has been carried out to enhance the evaluation. Table 4 illustrates the performance of the various localizers, where YOLOv8l emerges as the top performer with a precision score of 99.7%, a recall score of 99.5%, and an impressive F1-score of 99.1%.
For the comparison between the proposed method and state-of-the-art methods, we closely scrutinize key performance metrics, including accuracy, precision, and recall. The proposed model underwent a comprehensive evaluation, comparing it to pre-existing methods. Empirically, the results demonstrated that the proposed model significantly outperformed all existing state-of-the-art models by achieving a substantial increase in accuracy. The detailed comparison findings are outlined in Table 7.
Figure (10) presents a graphical comparison of the performance metrics, including accuracy, precision, and recall, between the existing methods and the proposed one. Significantly, the proposed method distinctly outperforms the mentioned existing methods by a substantial margin.

V. CONCLUSION
In the realm of video surveillance, a significant yet challenging focus lies on automatic event detection. In particular, the detection of abandoned objects (AOD) has garnered substantial attention in recent times, as it plays a critical role in enhancing security in both public and private domains. Global concerns over security and terrorism have escalated to unprecedented levels over the past years, with terrorist attacks claiming innocent lives, often striking crowded locations such as markets, transportation hubs, and airports. To address these pressing security issues effectively, the deployment of automated surveillance technologies in public spaces has become increasingly imperative. In conclusion, our proposed model incorporates a sequential analysis, operating on sequences of 15 frames for initial object detection. This sequential approach facilitates a thorough exploration of temporal patterns and characteristics. Subsequently, the processed data seamlessly advances to the YOLOv8 model, renowned for its exceptional object localization capabilities. By merging the temporal insights derived from the sequential model with YOLOv8's precision in pinpointing object locations, our approach offers a comprehensive and effective solution for object detection and localization tasks. This integration represents a significant stride in enhancing the field of security and surveillance.

REFERENCES
[1] S. Kalli, T. Suresh, A. Prasanth, T. Muthumanickam, and K. Mohanram, "An effective motion object detection using adaptive background modeling mechanism in video surveillance system," J. Intell. Fuzzy Syst., vol. 41, no. 1, pp. 1777-1789, Aug. 2021.
[2] M. Elhoseny, "Multi-object detection and tracking (MODT) machine learning model for real-time video surveillance systems," Circuits, Syst., Signal Process., vol. 39, no. 2, pp. 611-630, Feb. 2020.
[3] N. Bird, S. Atev, N. Caramelli, R. Martin, O. Masoud, and N. Papanikolopoulos, "Real time, online detection of abandoned objects in public areas," in Proc. IEEE Int. Conf. Robot. Autom., Jul. 2006, pp. 3775-3780.
[4] K.-H. Jo, "Cumulative dual foreground differences for illegally parked vehicles detection," IEEE Trans. Ind. Informat., vol. 13, no. 5, pp. 2464-2473, Oct. 2017.
[5] E. Luna, J. San Miguel, D. Ortego, and J. Martínez, "Abandoned object detection in video-surveillance: Survey and comparison," Sensors, vol. 18, no. 12, p. 4290, Dec. 2018.
[6] M. A. Mahale, H. Kulkarni, and P. Student, "Survey on abandoned object detection in surveillance video," Int. J. Eng. Sci. Comput., vol. 7, pp. 15595-15599, Jan. 2017.
[7] S. Jha, C. Seo, E. Yang, and G. P. Joshi, "Real time object detection and tracking system for video surveillance system," Multimedia Tools Appl., vol. 80, no. 3, pp. 3981-3996, Jan. 2021.
[8] V. Akre, A. Rajan, J. Ahamed, A. A. Amri, and S. A. Daisi, "Smart digital marketing of financial services to millennial generation using emerging technological tools and buyer persona," in Proc. 6th HCT Inf. Technol. Trends (ITT), 2019, pp. 120-125.
[9] B. Qian, Z. Wen, J. Tang, Y. Yuan, A. Y. Zomaya, and R. Ranjan, "OsmoticGate: Adaptive edge-based real-time video analytics for the Internet of Things," IEEE Trans. Comput., vol. 72, no. 4, pp. 1178-1193, Apr. 2023.
[10] Y. D. Teja, "Static object detection for video surveillance," Multimedia Tools Appl., vol. 82, no. 14, pp. 21627-21639, Jun. 2023.
[11] S. Khan and L. AlSuwaidan, "Agricultural monitoring system in video surveillance object detection using feature extraction and classification by deep learning techniques," Comput. Electr. Eng., vol. 102, Sep. 2022, Art. no. 108201.
[12] D.-Y. Ge, X.-F. Yao, W.-J. Xiang, and Y.-P. Chen, "Vehicle detection and tracking based on video image processing in intelligent transportation system," Neural Comput. Appl., vol. 35, no. 3, pp. 2197-2209, Jan. 2023.
[13] A. Sathesh and Y. B. Hamdan, "Speedy detection module for abandoned belongings in airport using improved image processing technique," J. Trends Comput. Sci. Smart Technol., vol. 3, no. 4, pp. 251-262, Dec. 2021.


[14] N. Dwivedi, D. K. Singh, and D. S. Kushwaha, "An approach for unattended object detection through contour formation using background subtraction," Proc. Comput. Sci., vol. 171, pp. 1979-1988, Jan. 2020.
[15] B. V. V. Indhuja, V. M. V. Reddy, N. Nikhitha, and P. Pramila, "Suspicious activity detection using LRCN," in Proc. 5th Int. Conf. Smart Syst. Inventive Technol. (ICSSIT), Jan. 2023, pp. 1463-1470.
[16] H. Su, W. Wang, and S. Wang, "A robust all-weather abandoned objects detection algorithm based on dual background and gradient operator," Multimedia Tools Appl., vol. 82, no. 19, pp. 29477-29499, Aug. 2023.
[17] N. Ta, H. Chen, Y. Lyu, X. Wang, Z. Shi, and Z. Liu, "A complementary and contrastive network for stimulus segmentation and generalization," Image Vis. Comput., vol. 135, Jul. 2023, Art. no. 104694.
[18] J. Ju and J. Xing, "Moving object detection based on smoothing three frame difference method fused with RPCA," Multimedia Tools Appl., vol. 78, pp. 29937-29951, Jan. 2019.
[19] H. Park, S. Park, and Y. Joo, "Robust detection of abandoned object for smart video surveillance in illumination changes," Sensors, vol. 19, no. 23, p. 5114, Nov. 2019.
[20] H. Lee, J. Yoon, Y. Jeong, and K. Yi, "Moving object detection and tracking based on interaction of static obstacle map and geometric model-free approach for urban autonomous driving," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 6, pp. 3275-3284, Jun. 2021.
[21] A. Ben Mabrouk and E. Zagrouba, "Abnormal behavior recognition for intelligent video surveillance systems: A review," Expert Syst. Appl., vol. 91, pp. 480-491, Jan. 2018.
[22] V. Tsakanikas and T. Dagiuklas, "Video surveillance systems-current status and future trends," Comput. Electr. Eng., vol. 70, pp. 736-753, Aug. 2018.
[23] S. Ammar, T. Bouwmans, N. Zaghden, and M. Neji, "Moving objects segmentation based on DeepSphere in video surveillance," in Proc. Int. Symp. Vis. Comput., Lake Tahoe, NV, USA, 2019, pp. 307-319.
[24] P. Grandhe, P. B. Dhanush, M. Mohammad, A. N. A. A. Lakshmi, and C. V. S. R. Kumar, "An extensive study on unattended object detection in video surveillance," in Proc. Int. Conf. Intell. Sustain. Syst., 2023, pp. 183-193.
[25] Q. Fan and S. Pankanti, "Modeling of temporarily static objects for robust abandoned object detection in urban surveillance," in Proc. 8th IEEE Int. Conf. Adv. Video Signal Based Surveill. (AVSS), Aug. 2011, pp. 36-41.
[26] E. Omrani, H. Mousazadeh, M. Omid, M. T. Masouleh, H. Jafarbiglu, Y. Salmani-Zakaria, A. Makhsoos, F. Monhaseri, and A. Kiapei, "Dynamic and static object detection and tracking in an autonomous surface vehicle," Ships Offshore Struct., vol. 15, no. 7, pp. 711-721, Aug. 2020.
[27] P. Narwal and R. Mishra, "Real time system for unattended baggage detection," Proc. Int. Res. J. Eng. Technol., vol. 6, no. 11, p. 3, 2019.
[28] T. Mahalingam and M. Subramoniam, "A robust single and multiple moving object detection, tracking and classification," Appl. Comput. Informat., vol. 17, no. 1, pp. 2-18, Jan. 2021.
[29] M. Din, A. Bashir, A. Basit, and S. Lakho, "Abandoned object detection using frame differencing and background subtraction," Int. J. Adv. Comput. Sci. Appl., vol. 11, no. 7, p. 3, 2020.
[30] W. Hassan, P. Birch, B. Mitra, N. Bangalore, R. Young, and C. Chatwin, "Illumination invariant stationary object detection," IET Comput. Vis., vol. 7, no. 1, pp. 1-8, Feb. 2013.
[31] H. Park, S. Park, and Y. Joo, "Detection of abandoned and stolen objects based on dual background model and mask R-CNN," IEEE Access, vol. 8, pp. 80010-80019, 2020.
[32] S. P. Lwin and M. T. Tun, "Deep convolutional neural network for abandoned object detection," Int. Res. J. Mod. Eng. Technol. Sci., vol. 4, pp. 1549-1553, Mar. 2022.
[33] R. Pulungan and K.-H. Jo, "Stationary object detection for vision-based smart monitoring system," in Proc. Asian Conf. Intell. Inf. Database Syst., Dong Hoi City, Vietnam, 2018, pp. 583-593.
[34] L. H. Palivela and S. Ramachandran, "An enhanced image hashing to detect unattended objects utilizing binary SVM classification," J. Comput. Theor. Nanosci., vol. 15, no. 1, pp. 121-132, Jan. 2018.
[35] Y. Samaila, H. Rabiu, and I. Mustapha, "Real-time detection of abandoned object using centroid difference method," Arid Zone J. Eng., Technol. Environ., vol. 16, pp. 48-57, Aug. 2020.
[36] H. Smitha and V. Palanisamy, "Detection of stationary foreground objects in region of interest from traffic video sequences," Int. J. Comput. Sci. Issues, vol. 9, p. 194, Dec. 2012.
[37] K. Chen, K. Franko, and R. Sang, "Structured model pruning of convolutional networks on tensor processing units," 2021, arXiv:2107.04191.
[38] B. Cheng, Y. Wei, H. Shi, R. Feris, J. Xiong, and T. Huang, "Revisiting RCNN: On awakening the classification power of faster RCNN," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 453-468.
[39] S. Zhai, D. Shang, S. Wang, and S. Dong, "DF-SSD: An improved SSD object detection algorithm based on DenseNet and feature fusion," IEEE Access, vol. 8, pp. 24344-24357, 2020.
[40] D. Reis, J. Kupec, J. Hong, and A. Daoudi, "Real-time flying object detection with YOLOv8," 2023, arXiv:2305.09972.
[41] F. Zhuang, Z. Qi, K. Duan, D. Xi, Y. Zhu, H. Zhu, H. Xiong, and Q. He, "A comprehensive survey on transfer learning," Proc. IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021.

ARBAB MUHAMMAD QASIM received the master's degree in computer science from Iqra National University, Peshawar, Pakistan. He is currently pursuing the Ph.D. degree with the Department of Computer Science, Islamia College University Peshawar, Pakistan.

NAVEED ABBAS received the Ph.D. degree in computer science from Universiti Teknologi Malaysia, in 2016. He is currently an Assistant Professor with the Department of Computer Science, Islamia College University Peshawar, Pakistan.

AMJID ALI received the B.S. degree in computer science from Islamia College University Peshawar, Pakistan, in 2021. He is currently an Assistant Researcher with Islamia College University Peshawar.

BANDAR ALI AL-RAMI AL-GHAMDI received the B.Sc. degree in computer sciences from King Abdulaziz University, Jeddah, Saudi Arabia, in 2003, the M.Sc. degree in information technology from De Montfort University, Leicester, U.K., in 2008, and the Ph.D. degree from Université de Reims Champagne-Ardenne, Reims, France, in 2015. He is currently an Assistant Professor with Arab Open University, Riyadh, Saudi Arabia. His research interests include sensor networks, distributed systems, and eHealth systems.
