
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 10, NO. 7, JULY 2015

Abandoned Object Detection via Temporal Consistency Modeling and Back-Tracing Verification for Visual Surveillance

Kevin Lin, Shen-Chi Chen, Chu-Song Chen, Daw-Tung Lin, Senior Member, IEEE, and Yi-Ping Hung

Abstract— This paper presents an effective approach for detecting abandoned luggage in surveillance videos. We combine short- and long-term background models to extract foreground objects, where each pixel in an input image is classified as a 2-bit code. Subsequently, we introduce a framework to identify static foreground regions based on the temporal transition of code patterns, and to determine whether the candidate regions contain abandoned objects by analyzing the back-traced trajectories of luggage owners. The experimental results obtained based on video images from the 2006 Performance Evaluation of Tracking and Surveillance and 2007 Advanced Video and Signal-based Surveillance databases show that the proposed approach is effective for detecting abandoned luggage, and that it outperforms previous methods.

Index Terms— Abandoned luggage detection, abandoned object detection, short-term background model, long-term background model, object detection and tracking, visual surveillance.

Manuscript received May 30, 2014; revised December 23, 2014; accepted February 12, 2015. Date of publication March 2, 2015; date of current version May 15, 2015. This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grant MOST 103-2221-E-305-008-MY2 and Grant MOST 103-2221-E-001-010, and in part by Taiwan Secom Company, Ltd. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Liang Wang.
K. Lin is with the Institute of Information Science, Academia Sinica, Taipei 11529, Taiwan (e-mail: [email protected]).
S.-C. Chen is with the Department of Computer Science and Information Engineering, National Taiwan University, Taipei 10617, Taiwan (e-mail: [email protected]).
C.-S. Chen is with the Research Center for Information Technology Innovation & Institute of Information Science, Academia Sinica, Taipei 11529, Taiwan (e-mail: [email protected]).
D.-T. Lin is with the Department of Computer Science and Information Engineering, National Taipei University, Taipei 23741, Taiwan (e-mail: [email protected]).
Y.-P. Hung is with the Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei 10617, Taiwan (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIFS.2015.2408263
1556-6013 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

In visual surveillance research, detecting abandoned luggage is referred to as the problem of abandoned-object or left-luggage detection. It is a crucial task for public security, particularly for identifying suspicious stationary items. Because there is no object type or category that can be assumed as having been abandoned, common object detection methods, such as training an object detector for a particular category of objects, are inappropriate for performing this task. To address this problem, foreground/background extraction techniques are suitable for identifying static foreground regions (i.e., objects that remain static for a long time) as left-luggage candidates.

A. Related Works

The algorithms for identifying a static foreground or abandoned object can be classified into three categories. The first category involves constructing double-background models for detecting a static foreground [1]–[3]. The double-background models are constructed using fast and slow learning rates. Subsequently, the static foreground is localized by differentiating between the two obtained foregrounds. A weakness of these methods is the high false alarm rate, which is typically caused by imperfect background subtraction resulting from a ghost effect, stationary people, and crowded scenes. In addition, these methods involve using only the foreground information per single image to locate regions of interest (ROIs) of abandoned-object candidates. Consequently, temporally-consistent information that may be useful for identifying sequential patterns of ROIs may be overlooked.

The second category of methods for extracting static foreground regions involves using a specialized mixture of Gaussian (MOG) background model. In previous studies [4]–[6], three Gaussian mixtures were used to classify foreground objects as moving foreground, abandoned objects, and removed objects by performing background subtraction. In addition, the approach proposed in [6] uses visual attributes and a ranking function to characterize various types of alarm events.

The third category involves accumulating a period of binary foreground images or tracking foreground regions to identify a static foreground. The methods proposed in [7] and [8] involved localizing the static foreground based on the pixels with the maximal accumulated values, which were subsequently considered the candidate regions of stationary objects. However, this category of methods fails in complex scenes. Lv et al. [9] used a blob tracker to track foreground objects based on their size, aspect ratio, and location. Left luggage is identified when a moving foreground blob stops moving for a long period. Li et al. [10] tracked moving objects by incorporating principal color representation (PCR) into a template-matching scheme, and also by estimating the status (e.g., occluded or removed) of a stationary object.

Rather than using a single camera, some approaches use multiple cameras for detecting abandoned luggage. Auvinet et al. [11] employed two cameras for detecting abandoned objects, and the planar homography between


two cameras was used to regulate the foreground tracking results.
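As a concrete illustration of the first category of methods above, the shared double-background cue can be sketched as follows. This is our own minimal sketch, not code from the cited papers; the flat mask layout and function name are illustrative only.

```cpp
#include <cstdint>
#include <vector>

// Double-background cue: the long-term (slowly adapting) model still marks
// a dropped bag as foreground, while the short-term (quickly adapting)
// model has already absorbed it into the background. Pixels flagged by the
// former but not the latter are static-foreground candidates.
std::vector<uint8_t> staticCandidates(const std::vector<uint8_t>& fgLong,
                                      const std::vector<uint8_t>& fgShort) {
    std::vector<uint8_t> out(fgLong.size(), 0);
    for (std::size_t i = 0; i < fgLong.size() && i < fgShort.size(); ++i)
        out[i] = (fgLong[i] && !fgShort[i]) ? 1 : 0;
    return out;
}
```

As the text notes, applying this cue to a single frame inherits every background-subtraction error, which is what motivates the temporal consistency modeling of Section II.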
To fulfill the semantic requirement of abandoned-luggage events, where a person drops their luggage and then leaves, some of the aforementioned methods combine a tracker to track the involved person(s) for further verification. Liao et al. [7] tracked luggage owners based on skin color information and by performing contour matching with a Hough transform. In [1], a Kalman filter (KF) and an unscented KF (UKF) were used to track foreground objects (including people and carried luggage) based on low-level features, such as color, contour, and trajectory. Tian et al. [4] integrated a human detector and a blob tracker to track the owner of abandoned luggage, and the corresponding trajectory was recorded for further analysis. Fan et al. [6] used a blob tracker to track moving people close to the left luggage. The obtained movement information was used as an input for their attribute-based alert ranking function.

B. Our Approach

In this paper, we propose a temporal dual-rate foreground integration method for static-foreground estimation for single-camera video images. Our approach involves constructing both short- and long-term background models learned from an input surveillance video on-line. Subsequently, we introduce a simple pixel-based finite-state machine (PFSM) model that uses temporal transition information to identify the static foreground based on the sequence pattern of each object pixel.

Because the proposed approach involves using temporal transition information, we can reduce the influence of imperfect foreground extractions in the double-background models, thereby improving the accuracy of the constructed static foreground inference. An owner-tracking procedure is also employed in our method to semantically verify the abandoned-object event. Contributions of the proposed method over previous methods are summarized as follows.
1) We introduce a dual-rate background modeling framework with temporal consistency. It performs considerably better than the single-image-based double-background models in [1]–[3].
2) We develop a simple spatial-temporal tracking method for back-tracing verification. Compared to frame-by-frame tracking approaches, such as the KF or UKF employed in [1], our approach is superior in handling temporary occlusions and is still highly efficient to implement.
3) Experimental results on benchmark datasets (PETS2006 and AVSS2007) show that our method performs favorably against all of the compared methods [1]–[8].
The remainder of this paper is organized as follows. Section II details the proposed algorithm, Section III shows the experimental results, and finally, our conclusion and discussion are offered in Section IV.

II. TEMPORAL DUAL-RATES FOREGROUND INTEGRATION METHOD

The proposed abandoned-object detection method is based on background modeling and subtraction. The following subsection provides a conceptual review of background subtraction and the associated learning rates for updating a background model. Subsequently, the remaining subsections introduce our algorithm for identifying static foreground regions.

Fig. 1. Flowchart of static foreground detection.

A. Review of Background Modeling and Learning Rates

Background subtraction is an essential technique for detecting moving objects in surveillance systems. To apply this technique, a pixel-based background model is typically learned from preceding images. The learned background model is used to identify whether each pixel of the incoming image is a background pixel. When a pixel in an incoming image is identified as a background pixel, the associated features (e.g., pixel color) can subsequently be used to update the background model to more suitably represent the recently observed pixel values. Given a sequence of images It (t ∈ N) of size m × n, the principle of a general background modeling and updating procedure can be summarized as follows:
1) Initialize a background model B(x, y) for each pixel (x, y), 0 ≤ x ≤ m − 1 and 0 ≤ y ≤ n − 1.
2) For every pixel (x, y) of the incoming image It, if It(x, y) ∈ B(x, y), then (x, y) is classified as a background pixel; otherwise, it is considered a foreground pixel.
3) For every newly identified background pixel (x, y), update B(x, y) by considering the new training sample It(x, y).
4) t ← t + 1; go to Step 2).
In this procedure, a learning rate λ ∈ [0, 1] is typically applied to update the background in Step 3). The learning rate provides a tradeoff between λB and (1 − λ)It, and thus the preceding model B is tuned toward the new training data It faster when λ is smaller in the incremental updating.
For example, in the MOG method proposed in [12], the background model B(x, y) is recorded as a mixture-of-Gaussian distribution in RGB color space. The learning rate λ is applied to update the mixture-distribution model when the new color It(x, y) is observed and (x, y) is identified as a background pixel. Similar updating mechanisms exist in other methods, such as Codebook [13], the enhanced Gaussian mixture model (EGMM) algorithm [14], and the coarse-to-fine approach [15].
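A minimal sketch of Steps 2) and 3) with the learning-rate update is given below. We use a single running mean per pixel as a stand-in for the MOG/EGMM models actually used in the paper; the threshold value and names are illustrative assumptions.

```cpp
#include <cmath>

// Simplified per-pixel background model B(x, y): one running mean instead
// of a Gaussian mixture, but with the same learning-rate mechanism.
struct PixelModel {
    double mean = 0.0;
    bool initialized = false;
};

// Returns true if the incoming value is classified as background, and in
// that case blends it into the model: B <- lambda*B + (1 - lambda)*I_t.
// Following the paper's convention, a SMALLER lambda adapts FASTER.
bool observe(PixelModel& m, double value, double lambda, double thresh = 25.0) {
    if (!m.initialized) { m.mean = value; m.initialized = true; return true; }
    bool isBackground = std::fabs(value - m.mean) <= thresh;
    if (isBackground)
        m.mean = lambda * m.mean + (1.0 - lambda) * value;
    return isBackground;
}
```

A short-term model would run this with the smaller λS and a long-term model with the larger λL (Section III uses λS/λL = 1/10), yielding the two binary foreground masks FS and FL used in Section II-B.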


Fig. 2. Background subtraction results of PETS2006-S1 video sequence.

B. Long-Term and Short-Term Integration Background Modeling

Figure 1 shows an overview of the integrated background modeling method proposed in this study. First, we describe the long- and short-term models built in our approach for static foreground detection. The proposed algorithm starts from a generic background modeling method operated at two learning rates. Without loss of generality, we select the MOG method in [12] as our background modeling method; however, other methods equipped with learning-rate mechanisms for updating background models can be used in our framework as well.

As aforementioned, a small learning rate λS updates the background model at a faster speed. The model that learns at this small rate is called the short-term background model BS, where FS denotes the binary foreground image obtained via the short-term model. By contrast, a large learning rate λL yields a model that is updated at a slower speed. Similarly, the model that learns at this rate is referred to as the long-term background model BL, where FL denotes the binary foreground image obtained using the long-term model. Figure 2 shows an example of the foreground regions obtained using the long- and short-term background models.

The assembly of long- and short-term background models is suitable for detecting stationary objects. Figure 3 shows an example of an abandoned-object event. Whenever luggage is left by an owner, the long-term model detects it as a foreground object, as shown in Figure 3(c). Moreover, because of the faster updating rate, the left luggage would be classified as a background object by the short-term model, as shown in Figure 3(d). Accordingly, a pixel is represented as a two-bit code Si by concatenating the detected long- and short-term foregrounds, as follows:

Si = FL(i)FS(i),   (1)

where FL(i) and FS(i) ∈ {0, 1} represent the binary values of pixel i of the foreground images.

Fig. 3. An example of an object-abandoned event, where the combination of long-term and short-term foreground results is well suited for abandoned luggage detection.

TABLE I. Pixel Classification From Long-Term and Short-Term Background Model [2]

Therefore, there are four states represented by the two-bit code Si, as shown in Table I, and they are expressed as follows:
• Si = 00 indicates that pixel i is a background pixel because it is classified as background by both BL and BS.
• Si = 01 implies that pixel i is an uncovered background pixel that has been temporarily occluded by an object and then exposed in a recent image.
• Si = 10 indicates that pixel i is likely to be a static foreground pixel.
• Si = 11 indicates that pixel i corresponds to a moving object.
When detecting abandoned objects, we are primarily concerned with pixels exhibiting a state value of 10, because


1) these are foreground pixels that have existed for a long time, as indicated by their long-term presence under the long-term model, and 2) they have not moved or vibrated for a considerable period of time; thus, the short-term model is expected to reject them soon. These properties confine the aforementioned static foreground pixels and make the state codes suitable for identifying abandoned-object candidates.

Fig. 4. PFSM for static foreground detection. MF is moving foreground, CSF denotes candidate static foreground, and SF represents static foreground. Ts is the transition time required for changing state from CSF to SF.

However, these codes are defined for a single image only. Because noise could result from imperfect background modeling, these codes could be temporary or imprecise. Hence, the pixel classifications in Table I for single images are typically insufficient for identifying abandoned objects in an uncertain environment, which is why methods based on a single or isolated image, such as that proposed in [2], are unreliable and frequently fail in practical cases.

In this paper, we propose using temporal-continuity information to improve the performance. We assert that the code pattern in an image sequence should primarily follow a temporal rule, and that the rule is representable by a very simple finite-state machine (FSM) model. Details are given in the next section.

C. Pixel-Based Finite State Machine (PFSM)

Instead of recognizing the status of each pixel based on only a single frame, we use temporal transition information to identify the stationary objects based on the sequential pattern of each pixel. A pixel is associated with only one state at a time. Based on the long- and short-term background models, the state of pixel i can be changed from one state at time t to another state at time t + 1. Accordingly, we construct a simple FSM model to describe the behavior of each pixel. We detect the static foreground by identifying a specific pattern of transitions. Figure 4 illustrates the particular transition for identifying the static foreground.

As shown in Figure 4, the transition pattern describes the static foreground in an object-abandoned event. Starting from an initial state, the system is triggered by Si = 11, indicating that pixel i is currently occluded by a foreground region. Hereafter, when a person abandons their luggage, the short-term method soon updates the luggage into its background model, whereas the long-term method does not; thus, the status of this site changes to Si = 10. Finally, when the status Si = 10 persists for a certain duration of time (i.e., for Ts frames), we then conjecture that pixel i has become a part of the static foreground. During this procedure, only those pixels associated with this particular transition pattern are considered static foreground pixels. Otherwise, the state of pixel i returns to the initial state and restarts when Si = 11 is observed again. The PFSM model thus describes the following rule: given a two-bit code sequence, if there is a consecutive subsequence starting with a series of 11 and followed by a sufficiently long series of 10, then this subsequence is a detection of the static foreground.

For each frame, the pixels accepted by the PFSM model are collected. Subsequently, we perform a connected-component analysis to group those pixels and remove the small components. If no pixel is accepted by the PFSM model, or if all of the components in the current frame are too small, no further verification is performed. Otherwise, the preserved components (i.e., the static foreground pixels) are considered the abandoned-luggage candidates in the current frame, and they are sent to the subsequent stage for further verification using the back-tracing algorithm, as detailed in the following section.

D. Back-Tracing Verification

Next, we verify whether the luggage is abandoned or simply placed on the ground for a short time by using the back-tracing verification procedure. Accordingly, our system first verifies whether the luggage owner is close to the luggage. If the owner does not return to his or her luggage, the object is considered abandoned. To perform the aforementioned semantic analysis of the object-abandoned event, the back-tracing verification is performed as follows.

The static foregrounds found in Section II-C are subsequently considered luggage candidates. When a static foreground is deemed a left-luggage candidate at time t and no other moving foreground objects are within its neighbor region of radius D, we then return from the current frame t to the preceding frame t0 = t − Ts, which denotes the moment at which the owner likely put down the luggage, where Ts is the transition-time constant employed in our PFSM model (Figure 4). Let the image position of the left-luggage candidate be p at time t0. Centered at p, we create a spatial-temporal window W0 of size (r², δ), where r specifies the radius of a circle centered at p, and δ denotes the time interval [t0, t0 + δ].

Subsequently, for window W0, we consider all foreground blobs identified using the background subtraction algorithm. From these blobs, we then select the one that best approximates the shape of a human by using the height/width estimator


in [1] and the human detector in [16] and [17], which filter the static foreground objects that could be human.

We give a brief review of the human detection method below. The deformable part-based model (DPM) detector [16] is one of the state-of-the-art human detection algorithms; it employs the sliding-window technique with multiple filter kernels to detect the object in an image. The object to be detected is represented using a root filter and several part filters. The root filter describes the overall appearance of the object, while the part filters depict partial regions of the object. The object is located when the region is voted with the highest scores by the root and part filters. However, due to the number of filters adopted, the computation cost becomes extremely high. To overcome this difficulty, Dubout and Fleuret [17] approximate the sliding-window technique as a convolutional procedure. According to the theorem that time-domain convolution is equivalent to frequency-domain multiplication, the part-based human detector is sped up by the fast Fourier transform and is employed in this work.

The foreground region containing a human is then treated as the owner blob for further tracking, and we denote its image position as p1. If more than one human is detected, we simply choose the blob closest to p as the most-fit blob position p1.

We extract the color distribution as a feature representation of the foreground blobs. Next, centered at p1, we create a new spatial-temporal window W1 of size (r², δ). We then employ the Bhattacharyya coefficient to identify the blob with the color distribution most similar to that of the owner in W1, and then create a window W2 centered at the newly identified blob. The aforementioned procedure is then used to track the blob representing the owner until the time exceeds the original time t or until the tracked blob is outside of the neighbor region (i.e., within radius D) centered at the candidate luggage.

An advantage of the aforementioned procedure is that the time interval δ is used in the spatial-temporal domain, and hence it can track the target when occlusion occurs within δ. Thus, unlike the frame-by-frame tracking approaches, such as the KF- or UKF-based approaches employed in previous studies of left-luggage detection [1], our approach is more powerful for handling temporary occlusions, and it is still highly efficient to implement because only the foreground blobs within a limited number of spatial-temporal windows are considered.

Figure 5 demonstrates our back-tracing result of the first sequence. Figures 5(a) and 5(b) show the 3D trajectory constructed based on our spatial-temporal structure. The back-tracing algorithm initiates the search for the owner from the location of the luggage, and then proceeds by examining similar foreground patches. Figure 5(c) shows the pedestrian detection result. Figure 5(d) shows a summary of the object-abandoned event in the first sequence. The regions denoting the owner are displayed sequentially with rainbow colors depicting various time stamps of the event.

Fig. 5. (a) and (b) The trajectory construction of PETS2006-S1 in two different views. (c) Pedestrian detection result. (d) Result of back-tracing verification.

Our tracking procedure is extendable for preserving multiple hypotheses of blobs tracked simultaneously when employing a probabilistic framework such as particle filtering (PF) to represent the multiple hypotheses for dynamic tracking. However, PF is slow and cannot fulfill the real-time verification requirements of most visual surveillance applications. Hence, we use the aforementioned single-hypothesis approach, which can be generalized for more effective tracking when necessary.

Fig. 6. The proposed system diagram.

E. Abandoned Object Event Analysis

Figure 6 shows the proposed system architecture. Once the trajectory of the owner is obtained, a warning is issued that the luggage has been abandoned in accordance with the following two rules, as defined by PETS2006 [18].
1) Temporal rule: The luggage is declared an unattended object when it is left by its owner, and the luggage is not reattended within time T = 30 seconds.
2) Spatial rule: The unattended luggage is declared an abandoned object when it is left by its owner. When the distance between the owner and the luggage is greater than a predefined distance D = 3 m, an alarm event is triggered.
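Putting Sections II-B and II-C together, the two-bit code of Eq. (1) and the per-pixel state machine of Fig. 4 can be sketched as follows. This is our own reading of the transition rule: the state names follow Fig. 4, while the reset behavior on other codes is inferred from the text rather than quoted from it.

```cpp
#include <cstdint>

// Two-bit code of Eq. (1): Si = FL(i)FS(i). 00 background, 01 uncovered
// background, 10 static-foreground candidate, 11 moving foreground.
uint8_t encode(bool fgLong, bool fgShort) {
    return static_cast<uint8_t>((fgLong ? 2 : 0) | (fgShort ? 1 : 0));
}

// Per-pixel finite-state machine (PFSM): a pixel is accepted as static
// foreground only after a run of code 11 (moving foreground) is followed
// by Ts consecutive frames of code 10; any other code resets the machine.
struct PFSM {
    enum State { Init, Moving, Candidate, StaticFg } state = Init;
    int count = 0;

    // Feed one two-bit code per frame; returns true while the pixel is
    // accepted as static foreground.
    bool step(uint8_t code, int Ts) {
        switch (state) {
        case Init:
            if (code == 3) state = Moving;                    // Si = 11 seen
            break;
        case Moving:
            if (code == 2) { state = Candidate; count = 1; }  // first 10
            else if (code != 3) state = Init;
            break;
        case Candidate:
            if (code == 2) { if (++count >= Ts) state = StaticFg; }
            else { state = Init; count = 0; }
            break;
        case StaticFg:
            if (code != 2) { state = Init; count = 0; }       // pixel moved
            break;
        }
        return state == StaticFg;
    }
};
```

Collecting the accepted pixels in each frame and grouping them by connected components, as described in Section II-C, yields the abandoned-luggage candidates passed on to back-tracing verification.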

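The appearance matching inside each spatial-temporal window (Section II-D) relies on the Bhattacharyya coefficient between color distributions. A sketch over normalized histograms follows; histogram binning and normalization are assumed to happen upstream.

```cpp
#include <cmath>
#include <vector>

// Bhattacharyya coefficient between two normalized color histograms, used
// to score how similar a candidate blob is to the owner's appearance.
// Returns a value in [0, 1]; 1 means identical distributions.
double bhattacharyya(const std::vector<double>& p, const std::vector<double>& q) {
    double bc = 0.0;
    for (std::size_t i = 0; i < p.size() && i < q.size(); ++i)
        bc += std::sqrt(p[i] * q[i]);
    return bc;
}
```

The blob in W1 that maximizes this score against the owner's histogram becomes the next tracking hypothesis, which is what keeps the single-hypothesis back-tracing procedure cheap.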

According to the PFSM, the temporal rule is satisfied by letting Ts = 30f frames, where f is the frames per second (fps) at which the video is captured. The spatial rule is verified by examining the trajectory of the owner. We create a luggage-centered ROI with a radius of D = 3μ pixels (where μ denotes the scaling factor to convert pixels into real-world distances), and investigate whether the owner is within and has then left the ROI. An alarm is raised when both the spatial and temporal rules are satisfied.

III. EXPERIMENTAL RESULTS

A. Implementation Details

The proposed system was developed using the programming language C/C++. The overall computation speed is 29 fps when testing a 360×240 pixel video using a general-purpose laptop with a 2.4 GHz Intel Core i7 processor.

Various previous studies have proposed background subtraction algorithms, including MOG [12], Codebook [13], EGMM [14], and the coarse-to-fine approach [15]. EGMM, which is available in the OpenCV library, is used in this work because of its high performance.

In this study, the long- and short-term background models are constructed using EGMM, which is similar to MOG with an additional mechanism for adapting the number of Gaussian components for each pixel, instead of using a fixed number of Gaussian components for every computation. To satisfy the characteristics of dual-background models, the learning rate of each background model should differ significantly. Based on our empirical study, we restricted the learning rates λS and λL to a fixed ratio λS/λL = 1/10, and found that the short- and long-term models can be distinguished well in practice.

First, we performed a preliminary experiment on the PETS2006 dataset by varying λS and λL, and evaluated the performance of the abandoned object detection. Figure 7 shows that when λS varies from 0.0001 to 0.0016 (and λL varies from 0.001 to 0.016, respectively), the precision value remains the same while the recall value differs, where FM is the F-measure value [19], which can be expressed as the harmonic mean of precision P and recall R, FM = 2×P×R/(P+R). This reveals that our method is highly stable in precision, i.e., the abandoned objects detected are correct, but it could miss some of the abandoned objects because of the non-perfect recall rate for the PETS2006 dataset when different parameters are selected. Among them, λS = 0.0002 and λL = 0.002 perform more favorably than the others. Hence, we choose this setting and use identical parameter values for all of the experiments conducted in this study, including the experiments for the AVSS2007 and ABODA datasets (Section III.D).

Fig. 7. Performance evaluation using different parameters on PETS2006. We restricted the learning rates λS and λL to a fixed ratio λS/λL = 1/10. The configuration λS = 0.0002 and λL = 0.002 demonstrates a more favorable performance. Red, green, and blue bars represent F-measure, precision, and recall, respectively.

In addition, as the goal is to detect the abandoned object, considering only the region-of-interest area is a natural way to reduce imperfect background initialization. We follow the previous studies (such as [2]) that manually marked the train station platform in AVSS2007 and the waiting area in PETS2006 for abandoned object detection.

B. Results on PETS2006 and AVSS2007

We conducted experiments using the public datasets PETS2006 [18] and AVSS2007 [20].

1) PETS2006: The PETS2006 dataset comprises seven sequences of various scenarios. Each sequence includes an abandoning event except the third one. In the third sequence, a person puts down his bag for a short time; because the owner does not abandon the luggage, no alarm should be triggered. Table II compares the results of our approach with those obtained in several other state-of-the-art studies [1], [2], [4], [7], [8], [10], [11]. Some previous studies have evaluated their methods by selecting a limited number of sequences and showing that their methods achieve high accuracy for those sequences only. By contrast, we have evaluated all seven sequences, and our method successfully detects the luggage-left events for the entire PETS2006 dataset without triggering any false alarms. Figures 8 and 9 show the results of the 7th and 5th sequences of PETS2006, respectively.

TABLE II. Performance Comparison on PETS2006 Video Dataset

Table III shows further evaluations of the precision and recall of the proposed algorithm. The compared approaches are sorted in order of their corresponding F-measure values. The method in [11] accurately detects all abandoned objects, as shown in Table II. However, their method results in several false alarms in Sequences 5 and 7; consequently, their F-measures are lower than those of the other methods. Sequences 5 and 7 are challenging to solve because of the problems with crowded scenes and occlusion. However, our temporal consistency model robustly and successfully localizes the abandoned luggage. Furthermore, our back-tracing


Fig. 8. Detection results of the 7th sequence of PETS2006.

Fig. 9. Detection results of the 5th sequence of PETS2006.

Fig. 10. Detection results of the sequence AB-Easy of AVSS2007.

Fig. 11. Detection results of the sequence AB-Medium of AVSS2007.

TABLE III. Comparison of Different Methods on the PETS2006 Video Dataset.

TABLE IV. Comparison of Different Methods on the AVSS2007 Video Dataset.
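For reference, the precision, recall, and F-measure (FM) reported in these tables can be computed in a few lines; the event counts below are illustrative, not values taken from the tables.

```python
# Precision P, recall R, and F-measure FM = 2*P*R / (P + R), as defined above.
def precision_recall_fm(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    fm = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, fm

# Illustrative counts: all 7 events detected, with no false alarms or misses.
print(precision_recall_fm(tp=7, fp=0, fn=0))  # → (1.0, 1.0, 1.0)
```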

method performs adequately and raises the alarm in a timely manner.

2) AVSS2007: We also tested our system using the AVSS2007 dataset, obtained from the i-LIDS video library, which includes several scenarios such as abandoned luggage and parked vehicles. We evaluated the left-baggage scenario to fit the scope of this study. The abandoned-luggage dataset comprises three sequences (AB-EASY, AB-MEDIUM, and AB-HARD) labeled with various difficulty levels according to luggage size and crowd density. Each sequence contains only one abandoned-luggage event, similar to the PETS2006 dataset. We followed the detection rules provided by i-LIDS, which stipulate that the detection area is restricted to the platform of the train station. Some detection results on AVSS2007 are shown in Figures 10 and 11.

For comparison, Table IV shows the precision-recall of our method and the results reported by other state-of-the-art studies [2]–[8]. The luggage-left event in AB-EASY is easily detected because of the large luggage size and limited occlusion. By contrast, AB-MEDIUM and AB-HARD are more difficult because they involve scenes with small pieces of luggage and dense crowds. Because the luggage was temporarily occluded, several methods yielded false alarms and were thus considered less promising. Notably, our method localizes the abandoned objects in all three sequences, as shown in Table IV. The table also shows that [7] and [8] outperform several related works, as does our method. However, [7] produces several false alarms on the PETS2006 dataset, and the method in [8] is evaluated on the AVSS2007 dataset only, which is limited in the context of comparative research. When both the PETS2006 and AVSS2007 datasets are considered, the proposed method is more effective than the previous studies at detecting abandoned objects and achieves the best overall performance.

C. Effectiveness of PFSM and Back-Tracing Verification

This section validates the effectiveness of the proposed PFSM model and the back-tracing procedure in improving the detection of abandoned-luggage events. Hereafter, we define DualBG-only as the method that uses only the pixel classifications from the dual-background models shown in Table I in each single image to detect abandoned objects. In addition, we define PFSM-only as the method that removes the back-tracing module from our algorithm.


TABLE V. Performance Comparison on PETS2006.

TABLE VI. Performance Comparison on AVSS2007.

TABLE VII. Detection Results on Our Own Dataset ABODA.

Fig. 12. Examples from the ABandoned Objects DAtaset (ABODA). The dataset covers different scenarios, including (a), (b) outdoor environments, (c) an indoor environment, and (d) a sudden-light-change condition.

Fig. 13. Detection results of the sequence Video2.
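To illustrate the dual-background behavior that DualBG-only relies on, the following sketch replaces EGMM with a simple exponential running average (an assumption made for brevity, not the paper's model). The short-term model is given a learning rate 10× that of the long-term model — the short-term model must adapt faster — so a newly static object is absorbed by the short-term model while remaining foreground in the long-term model:

```python
import numpy as np

# Simplified dual-rate background sketch; rates are illustrative.
def make_model(learning_rate):
    state = {"bg": None}
    def update(frame):
        if state["bg"] is None:
            state["bg"] = frame.astype(np.float64)
        else:
            state["bg"] = (1 - learning_rate) * state["bg"] + learning_rate * frame
        # Foreground = pixels far from the current background estimate.
        return (np.abs(frame - state["bg"]) > 25).astype(np.uint8)
    return update

short_term = make_model(learning_rate=0.016)   # fast adaptation
long_term  = make_model(learning_rate=0.0016)  # slow adaptation

rng = np.random.default_rng(0)
scene = np.full((48, 64), 100.0)
for t in range(300):
    frame = scene + rng.normal(0, 2, scene.shape)
    if t >= 100:                       # a static object appears at frame 100...
        frame[20:30, 20:30] = 200.0    # ...and never moves again
    fg_s, fg_l = short_term(frame), long_term(frame)

# After ~200 frames the fast model has absorbed the object (background),
# while the slow model still flags it (foreground): the "static" signature.
print(fg_s[25, 25], fg_l[25, 25])  # → 0 1
```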

Fig. 14. Detection results of the sequence Video3.

For comparison, Table V and Table VI present the performance under various configurations. All methods attain high recall values; thus, reducing the occurrence of false alarms (i.e., improving the precision) is the critical problem. DualBG-only yields unstable predictions because of the noisy and imperfect background subtraction process. The comparison with the PFSM-only method shows that temporal transition pattern analysis is critical for detecting abandoned objects. The PFSM effectively reduces the occurrence of false alarms, improving the overall precision to 50% on the PETS2006 dataset and 43% on the AVSS2007 dataset. Most of the false alarms generated by the PFSM are associated with cases of a person remaining temporarily still; for example, in Sequence 3 of the PETS2006 dataset, a person sets down his luggage and rests for a short period. There should be no abandoned event in this case; however, the PFSM issues an alarm because it cannot verify whether the owner has left. Therefore, incorporating the back-tracing function assists in correctly identifying the alarm event. Although tracking remains challenging in crowded scenes, we only need to trace the owner in the luggage-centroid ROI, in accordance with the temporal and spatial rules stipulated by PETS2006. Tracking the owner with a simple blob tracker and human-detector verification in a spatial-temporal window search is sufficiently efficient. The overall computation speed of our system is 29 fps.

D. Realistic Environment Detection in Our Own Sequences

In this study, we constructed the ABandoned Objects DAtaset (ABODA) for further reliability evaluation.¹ ABODA comprises 11 sequences labeled with various real-application scenarios that are challenging for abandoned-object detection. The situations include crowded scenes, marked changes in lighting conditions, night-time detection, and indoor and outdoor environments. Figure 12 shows some sequences from the ABODA dataset. Figure 12(a) shows a luggage-left event: the owner places his bag down and converses with another person before leaving the scene without his bag (also shown in Figure 15). Figure 12(d) shows a night-time scene. Stationary people stand in the path of the

¹ABODA is publicly available for scientific studies and can be downloaded from http://imp.iis.sinica.edu.tw/ABODA/index.html


Fig. 15. Static foreground detection on our own dataset. Compared with the single-frame-based method [2], the proposed PFSM precisely localizes the static foreground region of the left luggage and effectively prevents false alarms generated by ghosts and still people.

light rays, and shadows are thus produced behind them. In this case, the shadows resemble an abandoned object "dropped" near the people, which is also a difficult situation for abandoned-object detection algorithms.

The detection results in Table VII show that the overall precision P and recall R are 66.67% and 100%, respectively. The proposed method successfully recalls all of the abandoned objects in both outdoor and indoor environments. Figure 13 and Figure 14 show the ABODA detection results.

In the experiments, a few false alarms were raised, primarily caused by sudden changes in illumination conditions and crowded scenes. In general, our static foreground detection is based on long- and short-term background modeling. Whenever the light was suddenly turned off, the detection scene became completely dark. Because of its fast learning rate, the short-term background model adapted to this condition quickly; however, the long-term background model could not work well, and it extracted several inaccurate foreground regions. These inaccurate foregrounds retained the state S = 10 for a while; consequently, their state transitions resembled those of a static foreground. Figure 16 shows that this condition may have affected the FSM analysis, thereby causing several false alarms. An intuitive solution would be to speed up the learning rate of the long-term background model when the illumination conditions change suddenly. However, this may be unreliable because both background models would then treat the abandoned object as a background object. Therefore, handling considerable changes in illumination remains a challenging issue in our framework.

The challenge of the 11th video arises from the crowded scene and the partial occlusion of small objects. In this video, a crowd of people is waiting in line at an information desk, and the video was taken by a distant camera. The crowded people (together with specular lighting) cause imperfect background subtraction, as shown in Figure 17. The small object size and partial occlusion also make owner identification and tracking highly demanding. Hence, handling more complex crowded scenes with intensive partial occlusion remains a challenging problem.

E. Performance Comparison With Different Background Subtraction Methods

Background modeling plays an important role in the proposed system. The performance of employing different


Fig. 16. Results of background subtraction on the sequence Video8 with the sudden-light-switching scenario. The digital camera switches to IR mode starting at frame 278.
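The per-pixel bookkeeping behind the state S = 10 discussion can be sketched as follows. The two-bit state encoding pairs the long- and short-term foreground bits as in Table I, while the transition logic is simplified here to a plain counter rather than the paper's full PFSM — an assumption made for illustration only:

```python
# "FL FS" pairs the long-/short-term foreground bits: "11" = moving object,
# "10" = static-foreground candidate (long-term FG, short-term BG), etc.
def classify(fg_long: int, fg_short: int) -> str:
    return f"{fg_long}{fg_short}"

def make_static_detector(min_duration: int = 30):
    count = {"10": 0}
    def step(fg_long: int, fg_short: int) -> bool:
        # Count consecutive frames in state "10"; any other state resets it.
        count["10"] = count["10"] + 1 if classify(fg_long, fg_short) == "10" else 0
        return count["10"] >= min_duration  # True => pixel declared static
    return step

step = make_static_detector(min_duration=30)
# A dropped bag: both models see it at first ("11"), then the fast short-term
# model absorbs it while the slow long-term model keeps flagging it ("10").
flags = [step(1, 1) for _ in range(10)] + [step(1, 0) for _ in range(40)]
print(flags.index(True))  # → 39 (first frame the pixel is declared static)
```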

Fig. 17. Background subtraction results for the sequence Video11 with a crowded scene, which remains a challenge for our approach.

background models in our framework is reported as follows. We implemented another popular background-modeling method, Codebook [13], for performance comparison. Both EGMM and Codebook gather a series of colors for each pixel and then employ an on-line mixture distribution or clustering to find the candidate colors for building the per-pixel model. Codebook also has parameters that control the background updating speed; hence, it can likewise be used to generate long- and short-term background models.

TABLE VIII. Performance Comparison With Different Background Subtraction Methods on PETS2006.

Unlike EGMM, which treats the colors inside a spherical region centered at a candidate color C as background colors, Codebook treats the colors inside a cylindrical region centered at C as background colors. Because shadow (or lighting) can cause a pixel's color to shift toward (or away from) the origin in the RGB color space, Codebook claims that a cylindrical region whose axis passes through the origin avoids generating false foreground pixels caused by shadow or lighting.

However, a drawback of Codebook is that it tends to generate fragmented foregrounds, because neighboring pixels are apt to be inconsistent in the background subtraction results. We conducted an experiment on the PETS2006 dataset to compare the performance of EGMM and Codebook. We followed the preliminary parameter evaluation described in Section III.A and selected the best parameters for each background model. Table VIII indicates that EGMM demonstrates more favorable performance than Codebook. Figure 18 illustrates that the foreground regions extracted by Codebook contain many fragmented regions. The fragmentation increases as the learning rate decreases, particularly for the long-term model. The noisy foreground regions generated by Codebook cause our method to fail to infer the


Fig. 18. Performance comparison of background modeling with EGMM and Codebook. The first row shows several sample frames from PETS2006. The second and third rows show the long- and short-term model results of EGMM, respectively. The fourth and fifth rows show the long- and short-term model results of Codebook, respectively.
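The cylindrical test of the codebook model [13] can be made concrete: the color distortion of a pixel x from a codeword v is x's distance to the line through the origin and v, so a pure brightness change (e.g., shadow) produces zero distortion. The color values below are illustrative.

```python
import numpy as np

# Color distortion from the codebook model [13]: distance from pixel x to the
# cylinder axis (the line through the origin and codeword v). Pixels with small
# distortion lie inside the cylindrical background region.
def color_distortion(x, v):
    x, v = np.asarray(x, float), np.asarray(v, float)
    p2 = np.dot(x, v) ** 2 / np.dot(v, v)        # squared projection onto v
    return np.sqrt(max(np.dot(x, x) - p2, 0.0))  # distance to the axis

codeword = [120.0, 100.0, 80.0]
shadowed = [60.0, 50.0, 40.0]  # same chromaticity, half the brightness
print(color_distortion(shadowed, codeword))                   # → 0.0 (still background)
print(color_distortion([120.0, 100.0, 160.0], codeword) > 1)  # → True (chromatic change)
```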

static foreground pixels. Therefore, we recommend employing EGMM because of its better performance in dual-rate background modeling.

IV. CONCLUSION

This paper presents a temporal consistency model combined with a back-tracing algorithm for abandoned object detection. The characteristics of the proposed approach are summarized as follows:
1) The temporal consistency model is described by a very simple FSM. It exploits the temporal transition pattern generated by the short- and long-term background models, which can accurately identify static foreground objects.
2) Our back-tracing algorithm iteratively tracks the luggage owner by using spatial-temporal windows to efficiently verify left-luggage events.
3) The experimental results show that our approach outperforms previous approaches on the PETS2006 and AVSS2007 datasets.
4) In addition, we constructed a novel publicly available dataset, entitled ABODA, comprising plentiful abandoned-object detection situations to assist in validating the effectiveness of various approaches in this research direction.
In the future, we plan to enhance our method to handle more challenging situations such as sudden changes in lighting and overly crowded scenes.

REFERENCES

[1] J. Martínez-del-Rincón, J. E. Herrero-Jaraba, J. R. Gómez, and C. Orrite-Urunuela, "Automatic left luggage detection and tracking using multi-camera UKF," in Proc. 9th IEEE Int. Workshop PETS, Jun. 2006, pp. 59–66.
[2] F. Porikli, Y. Ivanov, and T. Haga, "Robust abandoned object detection using dual foregrounds," EURASIP J. Adv. Signal Process., vol. 2008, Jan. 2008, Art. ID 30.
[3] R. H. Evangelio, T. Senst, and T. Sikora, "Detection of static objects for the task of video surveillance," in Proc. IEEE WACV, Jan. 2011, pp. 534–540.
[4] Y. Tian, R. S. Feris, H. Liu, A. Hampapur, and M.-T. Sun, "Robust detection of abandoned and removed objects in complex surveillance videos," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 41, no. 5, pp. 565–576, Sep. 2011.
[5] Q. Fan and S. Pankanti, "Modeling of temporarily static objects for robust abandoned object detection in urban surveillance," in Proc. 8th IEEE Int. Conf. AVSS, Aug./Sep. 2011, pp. 36–41.
[6] Q. Fan, P. Gabbur, and S. Pankanti, "Relative attributes for large-scale abandoned object detection," in Proc. IEEE ICCV, Dec. 2013, pp. 2736–2743.
[7] H.-H. Liao, J.-Y. Chang, and L.-G. Chen, "A localized approach to abandoned luggage detection with foreground-mask sampling," in Proc. 5th IEEE Int. Conf. AVSS, Sep. 2008, pp. 132–139.
[8] J. Pan, Q. Fan, and S. Pankanti, "Robust abandoned object detection using region-level analysis," in Proc. 18th IEEE ICIP, Sep. 2011, pp. 3597–3600.
[9] F. Lv, X. Song, B. Wu, V. K. Singh, and R. Nevatia, "Left-luggage detection using Bayesian inference," in Proc. IEEE Int. Workshop PETS, 2006, pp. 83–90.
[10] L. Li, R. Luo, R. Ma, W. Huang, and K. Leman, "Evaluation of an IVS system for abandoned object detection on PETS 2006 datasets," in Proc. IEEE Workshop PETS, 2006, pp. 91–98.
[11] E. Auvinet, E. Grossmann, C. Rougier, M. Dahmane, and J. Meunier, "Left-luggage detection using homographies and simple heuristics," in Proc. 9th IEEE Int. Workshop PETS, 2006, pp. 51–58.


[12] C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," in Proc. IEEE Comput. Soc. Conf. CVPR, vol. 2, Jun. 1999, pp. 246–252.
[13] K. Kim, T. H. Chalidabhongse, D. Harwood, and L. Davis, "Real-time foreground-background segmentation using codebook model," Real-Time Imag., vol. 11, no. 3, pp. 172–185, 2005.
[14] Z. Zivkovic, "Improved adaptive Gaussian mixture model for background subtraction," in Proc. 17th ICPR, 2004, pp. 28–31.
[15] Y.-T. Chen, C.-S. Chen, C.-R. Huang, and Y.-P. Hung, "Efficient hierarchical method for background subtraction," Pattern Recognit., vol. 40, no. 10, pp. 2706–2715, 2007.
[16] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part-based models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1627–1645, Sep. 2010.
[17] C. Dubout and F. Fleuret, "Exact acceleration of linear object detectors," in Proc. 12th ECCV, 2012, pp. 301–311.
[18] PETS 2006 Dataset. [Online]. Available: http://www.cvg.reading.ac.uk/PETS2006/data.html, accessed Mar. 17, 2015.
[19] S. Agarwal, A. Awan, and D. Roth, "Learning to detect objects in images via a sparse, part-based representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 11, pp. 1475–1490, Nov. 2004.
[20] AVSS 2007 Dataset. [Online]. Available: http://www.eecs.qmul.ac.uk/~andrea/avss2007_d.html, accessed Mar. 17, 2015.

Kevin Lin received the B.S. degree in electronics engineering from the National Taiwan University of Science and Technology, Taipei, Taiwan, in 2012, and the M.S. degree from the Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, in 2014. He is currently a Research Assistant with the Institute of Information Science, Academia Sinica, Taipei. His research interests include computer vision, pattern recognition, and machine learning.

Shen-Chi Chen received the B.S. degree in computer science from National Cheng Chi University, in 2007, and the M.S. degree in biomedical engineering from the College of Computer Science, National Chiao Tung University, Taiwan, in 2009. He is currently pursuing the Ph.D. degree in computer science and information engineering with National Taiwan University. His research interests include computer vision, pattern recognition, surveillance systems, and intelligent transportation.

Chu-Song Chen is currently a Research Fellow with the Institute of Information Science and the Research Center for IT Innovation, Academia Sinica, Taiwan. He is an Adjunct Professor with the Graduate Institute of Networking and Multimedia, National Taiwan University. His research interests include computer vision, signal/image processing, and pattern recognition. He is on the Governing Board of the Image Processing and Pattern Recognition Society, Taiwan. He served as an Area Chair of ACCV'10 and NBiS'10, the Program Chair of IMV'12 and IMV'13, the Tutorial Chair of ACCV'14, and the General Chair of IMEV'14, and will be the Workshop Chair of ACCV'16. He is on the Editorial Board of the Journal of Multimedia (Academy Publisher), Machine Vision and Applications (Springer), and the Journal of Information Science and Engineering.

Daw-Tung Lin (SM'12) received the B.S. degree in control engineering from National Chiao Tung University, Hsinchu, Taiwan, in 1985, and the M.S. and Ph.D. degrees in electrical engineering from the University of Maryland, College Park, MD, USA, in 1990 and 1994, respectively. From 1995 to 2004, he was an Associate Professor with the Department of Computer Science and Information Engineering, Chung Hua University, Taiwan. He served as the Director of the Computer Center with Chung Hua University from 2001 to 2003, the Dean of the College of Engineering with Chung Hua University from 2003 to 2005, the Chair of the Department of Computer Science and Information Engineering with National Taipei University from 2006 to 2009, the Director of the Graduate Institute of Communication Engineering with National Taipei University from 2010 to 2011, and the Dean of Academic Affairs with National Taipei University from 2011 to 2015. He is currently the Dean of Academic Affairs, and a Professor with the Department of Computer Science and Information Engineering, National Taipei University, Taipei, Taiwan. He has been with National Taipei University since 2005, where he became a Tenured Professor of Computer Science and Information Engineering in 2009. He has been a regular contributor to the literature in computer vision and image processing. His research interests include image processing, computer vision, pattern recognition, and intelligent surveillance.

Yi-Ping Hung received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1982, the M.S. degree from the Division of Engineering, Brown University, Providence, RI, in 1987, the M.S. degree from the Division of Applied Mathematics, Brown University, in 1988, and the Ph.D. degree from the Division of Engineering, Brown University, in 1990. From 1990 to 2002, he was with the Institute of Information Science, Academia Sinica, Taipei, where he became a Tenured Research Fellow in 1997, and is currently a Joint Research Fellow. He served as the Deputy Director of the Institute of Information Science from 1996 to 1997, and the Director of the Graduate Institute of Networking and Multimedia with National Taiwan University from 2007 to 2013. He is currently a Professor with the Graduate Institute of Networking and Multimedia, and the Department of Computer Science and Information Engineering, National Taiwan University. His current research interests include computer vision, pattern recognition, image processing, virtual reality, multimedia, and human–computer interaction. He was the Program Cochair of ACCV'00 and ICAT'00, and the Workshop Cochair of ICCV'03. He has been an Editorial Board Member of the International Journal of Computer Vision since 2004. He will be the General Chair of ACCV'16.
