
Expert Systems with Applications 38 (2011) 1619–1631


A background subtraction algorithm for detecting and tracking vehicles


Nicholas A. Mandellos a,*, Iphigenia Keramitsoglou b, Chris T. Kiranoudis a

a Department of Process Analysis and Systems Design, National Technical University of Athens, GR-15780 Athens, Greece
b Institute for Space Applications and Remote Sensing, National Observatory of Athens, Metaxa & Vas. Pavlou, GR-15236 Athens, Greece

Article info

Keywords:
Computer vision
Background subtraction
Background reconstruction
Background maintenance
Background update
Vehicle detection
Traffic surveillance
Tracking

Abstract

An innovative system for detecting and extracting vehicles in traffic surveillance scenes is presented. This system involves locating moving objects present in complex road scenes by implementing an advanced background subtraction methodology. The innovation concerns a histogram-based filtering procedure, which collects scattered background information carried in a series of frames, at pixel level, generating reliable instances of the actual background. The proposed algorithm reconstructs a background instance on demand under any traffic conditions. The background reconstruction algorithm demonstrated a rather robust performance in various operating conditions including unstable lighting, different view-angles and congestion.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

The escalating growth of contemporary urban and national road networks over the last three decades has created the need for efficient monitoring and management of road traffic. The surface transportation system of the United States, for instance, consists of approximately 3.7 million miles of roads, estimated to increase by 30% over the next decade. Environmental pressures as well as socioeconomic problems are associated with this increase due to prolonged congestion and the slowing down of the average highway speed. To deal with this problem, one option is to increase network capacity and the other is to increase efficiency by investing in Intelligent Transportation Systems (ITS) technology (Gutchess, Trajkovic, Kohen-Solal, Lyons, & Jain, 2001).

Conventional technologies for traffic measurement, such as inductive loops, sonar or microwave detectors, suffer from serious drawbacks: they are expensive to install, they demand traffic disruption during installation or maintenance, they are not portable and they are unable to detect slow or stationary vehicles. On the contrary, video-based systems are easy to install, can be a part of ramp meters and may use the existing traffic surveillance infrastructure. Furthermore, they can be easily upgraded and they offer the flexibility to redesign the system and its functionality by simply changing the system algorithms. Such systems allow vehicle counting, classification, measurement of vehicle speed and the identification of traffic incidents (such as accidents or heavy congestion).

There is a wide variety of systems based on video and image processing employing different methodologies to detect vehicles and objects. A review of such image processing methodologies, presented in Kastrinaki, Zervakis, and Kalaitzakis (2003), comprises thresholding, multi-resolution processing, edge detection, background subtraction and inter-frame differencing. Thresholding is the simplest process of the above and it was part of the very first automatic surveillance systems in the 1970s and 1980s, when such systems were loop detector simulators (Mahmassani, Haas, Zhou, & Peterman, 2001). Those systems had low accuracy and are not used nowadays. Multi-resolution processing relies on scale space theory (Lindeberg, 1996), which uses coarse- and fine-level color pixel information to cluster the image and to separate objects from the background. However, it is not accurate enough for traffic problems since it breaks up parts of the image (i.e. road lines, glares and shadows) and merges them with parts of vehicles having the same chromatic range. Another main drawback is that the system cannot effectively deal with image perspective; therefore, vehicles standing away from the camera are under-segmented and vehicles standing near the camera are over-segmented. Moreover, it cannot distinguish vehicles in congestion. Edge-based methodologies have the main advantage that the extracted features are scale and lighting invariant (Koller, Weber, & Malik, 1994), but it is quite difficult to derive vehicle shapes, especially in congested scenes where vehicles stop frequently. The inter-frame differencing methodology is accurate enough to detect parts of moving objects by comparing two consecutive frames. However, it can identify only differences in the background and, as a result, it detects only the parts of a vehicle covering the background in the previous frame. Despite some enhancing techniques (Cucchiara & Piccardi, 1999) this methodology cannot satisfactorily deal with

* Corresponding author. Tel.: +30 210 772 3128; fax: +30 210 772 3155.
E-mail addresses: [email protected] (N.A. Mandellos), ikeram@space.noa.gr (I. Keramitsoglou), [email protected] (C.T. Kiranoudis).

0957-4174/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2010.07.083

realistic traffic circumstances where vehicles might remain still for a long time. Finally, background subtraction detects the actual background and extracts objects that do not belong to it. The concept of this method is described below.

In a typical background model a prototype of the image background (an initialization of the background) is considered first, and then each pixel of the prototype is compared with the actual image color map. If the color difference exceeds a predefined threshold it is assumed that this pixel belongs to the foreground. Consequently, raw foreground information is derived. This information is grouped into compact pixel sets (blobs). In the case of outdoor scenes, when the background is not completely static, lighting fluctuations, shadows or slight movements (i.e. leaves and branches waving) can degrade the effectiveness of the foreground extraction. To overcome this, a number of algorithms have modeled the aforementioned nuisances. More specifically, mixture models use statistical filters to eliminate continuous slight movements in the background by grouping time-evolving pixel characteristics into clusters or color prototypes and characterizing the most populated one as background (Kim, Chalidabhongse, Harwood, & Davis, 2005; Stauffer & Grimson, 1999; Zivkovic & van der Heijden, 2006), while parametric models such as the ones proposed by Haritaoglu, Harwood, and Davis (2000), Horprasert, Harwood, and Davis (1999), and Pless (2005) simulate the background by taking into account color characteristics. In the work of Horprasert et al. (1999) each pixel is classified into one of four classes, namely 'Foreground', 'Shaded background', 'Highlighted background' and 'Background'. Thus, the system can 'recognize' background discontinuities due to lighting and shadows and consequently register them as background.

This methodology has the great advantage of separating objects by using background information even in images that comprise shadows or glares (Senior, Tian, Brown, Pankanti, & Bolle, 2001). The main drawback of the background subtraction algorithm is the complexity of defining the background. A common practice is to initialize the algorithm by employing an 'empty scene'. Another important issue in this methodology is the difficulty of maintaining the background instance through time in outdoor captures.

The creation of a reliable initial instance is a critical issue for the quality of the overall process. A general solution for this problem does not exist, and the common practice is to average a sequence of frames presenting a scene without moving objects, which in fact is too difficult to acquire on a crowded highway. Despite the importance of this issue, there has only been limited research published focusing on the reconstruction of a starting background instance. The methods of Colombari, Cristani, Murino, and Fusiello (2005) and Gutchess et al. (2001), for instance, are significantly complicated to implement. On top of that, they are based on several restrictive assumptions. The latter work in particular refers to an inpainting technique (Criminisi, Perez, & Toyama, 2004) where background parts are reconstructed exploiting color and texture information.

In outdoor captures, the background prototype often fails to reflect the actual background due to lighting condition changes, shadow casting with respect to the sun position and background alterations with permanent effect. Moreover, the insertion of new objects into the road scene can induce permanent or temporary changes of the background (e.g. a vehicle that has been pulled over for a long time or an object on the road deck). Common practice in such cases is to use adaptive update models, such as those of Toyama, Krumm, Brumitt, and Meyers (1999), Gupte, Masoud, Martin, and Papanikolopoulos (2002), and Wren, Azarbayejani, Darrell, and Pentland (1997), which keep the background template recursively updated so that the background template is adapted to forthcoming image changes. Nevertheless, in most cases, after some time the noise pollution of the background results in the degradation of the overall process quality.

In this study we present an innovative algorithm, the background reconstruction algorithm, as part of a system for locating and tracking vehicles through traffic video captures. The purpose of the present work is to overcome the two main weaknesses of the background subtraction algorithm, namely initialization and background update, and to build a robust methodology capable of detecting vehicles under realistic traffic circumstances.

The background reconstruction algorithm is a heuristic that provides a periodically updated background and enhances the efficiency of the well-known background subtraction methodology in the case of outdoor captures. Indeed, it is a key process for a typical background subtraction algorithm, because it supports the weakest part of it, which is the initialization step. This methodology guarantees a fresh instance of the actual background periodically, which is achieved by collecting scattered color information through a series of sequential images and assembling it to reconstruct the actual background. This process is applied to each pixel separately and the result is a color map of the actual image background.

Our algorithm is presented as part of an integrated surveillance system that can be set up on existing traffic surveillance infrastructure. This system locates, counts and tracks vehicles in a variety of lighting conditions such as cloudiness and glares. Moreover, it adapts quickly to any changes of the background, such as transitions between different lighting conditions (i.e. from cloudiness to direct sunlight and vice versa), various traffic conditions including stop-and-go traffic flow, as well as permanent changes to the background (for instance, when a vehicle has pulled over). This overcomes the weaknesses of previous systems described above.

A typical surveillance system consists of a traffic camera network, which processes captured traffic video on-site and transmits the extracted parameters in real time. In this study we focus on the algorithmic part of such a system.

The innovation of this study lies in the ability of the proposed algorithm to reconstruct the actual background color map without the need for any human intervention, even in harsh traffic conditions such as stop-and-go traffic flow, stopped vehicles (i.e. accident) and rain or snow. In our approach a new background prototype is constructed every 1 or 2 min, restricting the problem of background pollution to the interval between two consecutive updates. Each newly recreated background instance is assumed to be steady within the update period. Thus, the background instance is used as a prototype in order to separate the foreground from the image for each image frame within the update period.

This paper is structured as follows: In Section 2 a description of the system together with its specifications and the testing arrangement are given. Section 3 presents the Vehicle Detection Unit. Emphasis is given to the background reconstruction algorithm, which is analyzed in detail. In Section 4 the Tracking Unit is presented. In Section 5 we present our experiments, which aim to support the basic assumptions of this work and to evaluate the developed background reconstruction algorithm. Finally, in Section 6 we summarize our results and present our conclusions.

2. System conception

The innovative algorithm of background reconstruction is part of a contemporary and realistic surveillance system. The integrated system locates, tracks and extracts traffic parameters in real time. Furthermore, the system can utilize any existing traffic surveillance infrastructure without further modification or tuning (except for the camera calibration that calculates image metrics).

A typical road traffic surveillance infrastructure consists of a camera network that has the ability to transmit images in real time to a central operational center. The processing of the images can be

carried out on-site, saving valuable network bandwidth as it transmits only the outcome of the calculations. Otherwise, the whole process can be performed either on real-time video streamed to an operational center or on already stored video material.

In such a network installation, the cameras must be sited approximately 10–15 m or more above road level to minimize the effect of occlusion. The system must be adaptive to a series of perturbations that may affect the clarity of the captured video, such as vibrations of the camera and slow changes in the background due to lighting conditions (Mimbela & Klein, 2000).

In order to simulate the algorithmic part of an integrated road traffic surveillance system, we used the following arrangement: a commercial CSS DV camcorder was installed about 10 m above road level and was sited above the central lane of the road, facing the traffic at an angle of 65°. The characteristics of this camera are as follows:

- 48 mm focal length (equivalent to a 35 mm camera), which defines a 27° vertical and 40° horizontal angle of view;
- 25 fps of 720 x 576 pixels in PAL video format.

Some predefined spots on each test scene were chosen in order to calibrate the camera according to DLT (direct linear transformation), a method originally reported in Abdel-Aziz and Karara (1971). The calibration of the camera defines the relationship between the 'real world' and the pixel matrix of the digital image.

The architecture of the proposed system is described in Fig. 1. The system consists of two units, namely the Vehicle Detection Unit and the Tracking Unit, the latter being indicated in gray color. Fig. 1 shows that first, a series of frames (raw traffic capture) enters the Vehicle Detection Unit (presuming that an initial background template has been created). Subsequently, the stream of frames feeds the background reconstruction algorithm in order to create the next background template, which replaces the current one after a predefined number of frames. While a new background template is being created, the background in use is maintained using the simple adaptive filter applied at pixel level proposed by Toyama et al. (1999):

B_t = (1 - \alpha) B_{t-1} + \alpha I_t    (1)

where B_t is the color vector of the background model in frame t, I_t is the actual color vector of the same pixel in frame t and \alpha is the coefficient that declares the rate of adaptation, with values in the range 0–1.

In the main flow of the detection unit, the raw foreground information is derived by a background subtraction procedure (Fig. 1). The result of this step is a set of partly connected pixels, which must be further processed in order to form compact objects (clustering/convex hull, Fig. 1). If those pixels lie along the "Entrance Zone" (Fig. 2), a region growing algorithm (Davies, 2005) merges all those pixels that potentially belong to a common vehicle domain. Otherwise, if the pixels from the background subtraction procedure lie within the Main Area, they are further processed to form blobs (connected pixels that form a shape) using a convex hull procedure (Section 3.3). The detected blobs from this process are merged to form objects. Merging occurs for blobs that lie partly or totally within the frontiers of a recognized vehicle shape from the previous frame t - 1, whose position has been appropriately corrected for frame t (vehicle matching – diagram of Fig. 1). Candidate vehicles are recognized by a cognitive clustering procedure (classifier – diagram of Fig. 1). This cognitive clustering process has the following concept: a candidate object is considered to be a vehicle only if its location is consistent with its previously calculated trajectory and the detected object dimensions remain unchanged through frames. A vehicle that does not match the previous frame, or seems to have an irregular trajectory, must be rejected (i.e. a backward vehicle movement that cannot be explained by its trajectory).

Occlusion can be handled by simple rules of merging and splitting vehicle domains with regard to their trajectory. Each detected vehicle belongs to one of the following classes: 'vehicle', 'large vehicle' or 'non-vehicle object'. 'Non-vehicle' objects are not further tracked and are ignored by the system. Finally, if a previously detected vehicle touches the exit zone it is totally omitted from any further processing.

Fig. 1. Flow diagram of the algorithm.

Fig. 2. The calculations take place in the pixels included in the active image area to save calculation time and increase efficiency. The active image area is divided into the Main Area and the entrance and exit zones.

Regarding the Tracking Unit, the vehicles are identified through sequential frames (vehicle matching, Fig. 1) by maximizing the overlapping surface of the shape corresponding to frame t and that of frame t - 1 (Criminisi et al., 2004). An array of the detected locations for each vehicle per frame is derived, which is then used to derive the vehicle's trajectory. However, the calculated trajectories are expected to be distorted due to image noise. For this reason, a set of 2D-motion Kalman filter equations is applied to the extracted trajectories to make them smooth and coherent. The results from the tracking procedure feed the next step of the detection unit by providing the necessary information for the clustering and classification procedures.

3. Vehicle detection

The system is based on the well-known algorithm of background subtraction, as mentioned in the previous paragraphs. Typically a background subtraction algorithm is carried out in three steps:

- Initialization of the background.
- Foreground extraction.
- Background maintenance.

The most popular algorithms for background extraction found in the literature, and used for comparison purposes in the current work, are the Mixture of Gaussians model (MOG; Stauffer & Grimson, 1999; Zivkovic & van der Heijden, 2006) and the Codebook model (Kim et al., 2005). The MOG methodology models each pixel history as a cluster of Gaussian-type distributions and uses an on-line approximation to update its parameters. According to this, the background is found as the expected value of the distribution corresponding to the most populated cluster (Stauffer & Grimson, 1999). This methodology is greatly improved in terms of performance by considering recursive equations to adaptively update the parameters of the Gaussian model (Zivkovic & van der Heijden, 2006). According to the Codebook model (Kim et al., 2005), sample background values at each pixel are quantized into codebooks that represent a compressed form of the background model for a long image sequence. The codebook is enriched with new codewords in the presence of a new color that cannot be assigned to the existing groups.

Our approach is different from previously published works in terms of the background handling during the initialization and maintenance steps. The proposed method focuses on the calculation and reconstruction of the background template, based on pixel-level information obtained scatteredly from a series of consecutive image frames. On the one hand, this mechanism allows the background subtraction process to periodically obtain an updated background instance regardless of the probable presence of foreground objects, and on the other hand, it guarantees the initialization of the background subtraction algorithm by providing an initial background instance under any circumstances. Hence, our approach unifies the first and last steps of a typical background subtraction procedure. The following subsections present in detail the main algorithmic procedures proposed in this article for vehicle detection.

3.1. Background reconstruction

We propose a probabilistic algorithm to reconstruct the background of a traffic scene by eliminating the moving object information. According to this, the background color information of crowded scenes is dynamically retrieved by assessing color variation per pixel through a series of frames. The overall idea is based on the notion that a specific location is occupied by moving objects for a time period shorter than that for which it remains unoccupied.

The implementation of the algorithm applies the Luv color system (the first uniform color space adopted by the International Commission on Illumination, in 1976), whose coordinates are related to the RGB values by non-linear transformations. The L* parameter is the lightness coordinate and the chromatic information is carried in the u*, v* parameters. This color system has been chosen because it defines a uniform color space in which perceived color differences are measured by Euclidean distances (Comaniciu & Meer, 1997).

Loosely speaking, if a specific pixel in a series of frames would 'vote' for its color property, the majority of the 'votes' is expected to be concentrated in the chromatic neighborhood of the actual background. In mathematical terms, this 'voting' scheme is described in detail in the following paragraphs.

Let \Phi denote the discrete color space, \Phi = \{l = \lfloor L^*/h \rfloor, u = \lfloor u^*/h \rfloor, v = \lfloor v^*/h \rfloor\} \in Z^3 (where \lfloor \cdot \rfloor is the floor operator and h is the chromatic distance defined by the bin dimensions), which is generated from the continuous Luv color space \{L^*, u^*, v^*\} \in R^3 by considering cubic bins b_{l,u,v}, \{l, u, v\} \in N^3, all edges of which have length equal to h. Each discretization element b_{l,u,v} is responsible for a continuous chromatic range of colors, where l \le L^*/h < l + 1, u \le u^*/h < u + 1 and v \le v^*/h < v + 1, with the value that corresponds to the discrete color parameters l = \lfloor L^*/h \rfloor, u = \lfloor u^*/h \rfloor, v = \lfloor v^*/h \rfloor.

Given a video capture that consists of F sequential frames of resolution n x m, let I_{ij}(t) = (I^{L^*}_{ij}(t), I^{u^*}_{ij}(t), I^{v^*}_{ij}(t)) denote the color vector at the (i, j) pixel of the frame at time t, where I^{L^*}_{ij}(t), I^{u^*}_{ij}(t) and I^{v^*}_{ij}(t) denote the L^*, u^*, v^* elements of I_{ij}(t) respectively, and B = \{B_{ij}\} the background color map. The pixel (i, j) color variation with respect to time is estimated by a sampling procedure, where the color values I_{ij}(t) obtained over T consecutive frames, starting from t_0, are collected. Thus, the temporal sample S_{ij}(t_0) = (I_{ij}(t_0), I_{ij}(t_0 + 1), \ldots, I_{ij}(t_0 + T - 1)) of pixel (i, j) defines the frequency \hat{f}_{ij}(l, u, v) of the examined pixel having a color value belonging to the b_{l,u,v} bin:

\hat{f}_{ij}(l, u, v) = \sum_{t=t_0}^{t_0+T-1} \delta\left(l - \lfloor I^{L^*}_{ij}(t)/h \rfloor\right) \delta\left(u - \lfloor I^{u^*}_{ij}(t)/h \rfloor\right) \delta\left(v - \lfloor I^{v^*}_{ij}(t)/h \rfloor\right)    (2)

where l, u, v \in N and \delta(\cdot) is the Kronecker delta function.

The frequency \hat{f}_{ij}(l_m, u_m, v_m) within the mode bin b_{l_m,u_m,v_m} corresponds to the most persistent color I_m = (l_m, u_m, v_m) in a sequence of T frames for pixel (i, j).
N.A. Mandellos et al. / Expert Systems with Applications 38 (2011) 1619–1631 1623

For this reason, our approach assumes that this color represents the actual background B_{ij} of point (i, j). Thus, the reconstruction of the background is the problem of extracting the mode for each of the n x m samples S_{ij}:

B_{ij} = \arg\max \hat{f}_{ij}(l, u, v) = (l_m, u_m, v_m)    (3)

The methodology described is of O(n^3) complexity in terms of memory and O(sn^3) in terms of calculations involved (where s denotes the total number of frames in the sample and n represents the magnitude of discretization of each color parameter). In normal traffic conditions a 100–200-frame sample, corresponding to 4–8 s of traffic observation, is adequate for the identification of the actual background. However, in general conditions where the vehicle flow is dense, having low speed and/or involving stop-and-go behavior, the demanded sample size is expected to be higher. Our tests in such conditions showed that an average of 1250 frames, corresponding to 1 min of capture length, may be required to reliably reconstruct the actual background. In these cases the sample includes a vast volume of information and therefore demands an increased memory capacity, which may be prohibitive for the design and operation of the system.

The limitations posed by the hardware motivated us to seek a more efficient way to solve the problem while keeping the memory usage and the required amount of calculations within acceptable limits. Towards this goal, our research focused on a different approach to managing the discrete temporal chromatic information l, u, v of the chromatic space \Phi. Thus, we calculated the frequencies \hat{f}^{(l)}_{ij}(l), \hat{f}^{(u)}_{ij}(u), \hat{f}^{(v)}_{ij}(v) for each l, u, v parameter separately, through the following summations:

\hat{f}^{(l)}_{ij}(l) = \sum_{u=u_{min}}^{u_{max}} \sum_{v=v_{min}}^{v_{max}} \hat{f}_{ij}(l, u, v),

\hat{f}^{(u)}_{ij}(u) = \sum_{l=l_{min}}^{l_{max}} \sum_{v=v_{min}}^{v_{max}} \hat{f}_{ij}(l, u, v),    (4)

\hat{f}^{(v)}_{ij}(v) = \sum_{l=l_{min}}^{l_{max}} \sum_{u=u_{min}}^{u_{max}} \hat{f}_{ij}(l, u, v)

where l_{max}, l_{min}, u_{max}, u_{min}, v_{max} and v_{min} are the maximum and minimum values of the discrete l, u, v parameters respectively.

The calculation of the frequencies of Eq. (4) provides information for the reconstruction of the background. According to the proposed methodology, the most persistent color value in a sequence of frames, for a specific pixel, is the one that is most likely to represent the actual background, and it can be calculated by maximizing Eq. (2). Alternatively, it can be approximated by composing an artificial color from the frequency modes (l_{mode}, u_{mode}, v_{mode}) maximizing Eq. (4):

B_{ij} = \left( \arg\max_l \hat{f}^{(l)}_{ij}(l), \arg\max_u \hat{f}^{(u)}_{ij}(u), \arg\max_v \hat{f}^{(v)}_{ij}(v) \right) = (l_{mode}, u_{mode}, v_{mode})    (5)

The whole idea is implemented in the reconstruction of the actual background based on the clustering of pixel temporal color values into two basic classes: 'background' and 'non-background'. One of the most efficient methodologies for clustering color information is the popular methodology of mean-shift, introduced by Abdel-Aziz (1971). However, this methodology involves a vast amount of calculations for the set of color values that correspond to a single pixel, making a solution for the whole image unrealistic.

Fig. 3. Illustrative example of the principles of the background reconstruction methodology: in a 2D distribution (v–u plane) of image color values, the majority is concentrated around a value (mode) that represents the background color.
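The marginal-frequency shortcut of Eqs. (4) and (5) can also be sketched in Python. This is an illustrative fragment, not the authors' implementation: the full 3-D histogram of Eq. (2) is summed over the other two color parameters and the background bin is composed from the three marginal modes; the bin width and the sample values are assumptions for the example.

```python
from collections import Counter
from math import floor

def marginal_background(samples, h=4.0):
    """Sketch of Eqs. (4)-(5): form the three marginal frequencies of the 3-D
    histogram of Eq. (2), then compose an 'artificial' background bin from
    the three marginal modes (l_mode, u_mode, v_mode)."""
    f3 = Counter()                                   # f_ij(l, u, v) of Eq. (2)
    for L, u, v in samples:
        f3[(floor(L / h), floor(u / h), floor(v / h))] += 1

    f_l, f_u, f_v = Counter(), Counter(), Counter()  # marginals of Eq. (4)
    for (l, u, v), count in f3.items():
        f_l[l] += count
        f_u[u] += count
        f_v[v] += count

    # Eq. (5): per-parameter arg max composes the background color bin.
    return (max(f_l, key=f_l.get), max(f_u, key=f_u.get), max(f_v, key=f_v.get))
```

In a production setting only the three 1-D marginals would be accumulated per pixel (as Eq. (6) later does), which is what reduces the memory cost from O(n^3) to O(n); when the background cluster dominates the sample, the composed bin coincides with the true 3-D mode.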
1624 N.A. Mandellos et al. / Expert Systems with Applications 38 (2011) 1619–1631

Additionally, the memory required is of the same complexity as that required for Eq. (2).

To overcome this obstacle we propose to use Eq. (4) to carry out this clustering in a more flexible manner, which is alleviated by the main characteristics of the problem: (i) the distribution of sampling in the color space \Phi is extremely sparse, as each sample involves, say, in a very extreme case, 20,000 color values compared to a total of 414,720 color values involved in a common PAL 720 x 576 pixel image; (ii) the majority of these values is concentrated in the vicinity of the background color cluster (see Fig. 3), while color values representing other objects are scattered throughout the color space.

The main goal is to detect the background color cluster and to locate its mode (the local maximum of a distribution of values). In this case the concentration of values in the background cluster can be roughly estimated by the integration of the frequencies of Eq. (4), because the overall contribution of other color clusters to the calculations is negligible.

To better illustrate the mode location based on our methodology we shall employ the example of Fig. 3. The background color cluster is the dominant one in the distribution and this becomes obvious in this example: the color values are accumulated around a core, where the density of color values forms a steep peak. On the contrary, the foreground colors tend to be distributed following the universal distribution and for that reason they do not form a remarkable concentration of values around a color (attractors).

The calculation of the frequencies \hat{f}^{(l)}_{ij}(l), \hat{f}^{(u)}_{ij}(u), \hat{f}^{(v)}_{ij}(v) of Eq. (4) requires first the calculation and storage of the overall frequency function \hat{f}_{ij}(l, u, v) of Eq. (2), which is of O(n^3) complexity in terms of memory and O(sn^3) in terms of calculations involved. For that reason the frequencies \hat{f}^{(l)}_{ij}(l), \hat{f}^{(u)}_{ij}(u), \hat{f}^{(v)}_{ij}(v) can be equally derived from the histograms H^{L^*}_l, H^{u^*}_u, H^{v^*}_v:

H^{L^*}_l \equiv \hat{f}^{(l)}_{ij}(l) = \sum_{t=t_0}^{t_0+T-1} \delta\left(l - \lfloor I^{L^*}_{ij}(t)/h \rfloor\right)

H^{u^*}_u \equiv \hat{f}^{(u)}_{ij}(u) = \sum_{t=t_0}^{t_0+T-1} \delta\left(u - \lfloor I^{u^*}_{ij}(t)/h \rfloor\right)    (6)

H^{v^*}_v \equiv \hat{f}^{(v)}_{ij}(v) = \sum_{t=t_0}^{t_0+T-1} \delta\left(v - \lfloor I^{v^*}_{ij}(t)/h \rfloor\right)

where l, u, v \in N and \delta(\cdot) is the Kronecker delta function.

The calculations involved in Eq. (6) are of O(n) complexity in terms of memory and O(tn) in terms of calculations involved (t denotes the time size of the sample and n represents the magnitude of discretization of each color parameter). Hence, the proposed methodology is realistic at the design and operational level because

3.2. Foreground extraction

In our work the chromatic difference between the current frame and the background model B_{ij} is defined by a norm that combines the difference in lightness L^* with the chromatic difference of the u^*, v^* parameters in the Luv color space. The foreground mask M_{ij} is then calculated by the following relation:

M_{ij} = \begin{cases} 1, & \left| I^{L^*}_{ij} - B^{L^*}_{ij} \right| > threshold \wedge \left\| I^{u^*,v^*}_{ij} - B^{u^*,v^*}_{ij} \right\| > threshold \\ 0, & elsewhere \end{cases}    (7)

where \left\| I^{u^*,v^*}_{ij} - B^{u^*,v^*}_{ij} \right\| is the Euclidean norm in terms of the chromatic parameters u^*, v^* of the current frame and the background model.

The pixels belonging to the foreground mask are grouped together to form connected components. Usually the connected components are further processed in order to remove holes or other irregularities. Although the most common practice is the application of a morphological filter (Davies, 2005), it demands valuable calculation time. Thus, we have utilized a convex hull algorithm for shaping and forming objects (see Section 3.3).

3.3. Shaping and clustering

At this step, the extracted foreground segments belonging to a common object are grouped and shaped. The grouping process is a complex procedure, especially for vehicles that have just passed the entrance zone (see Fig. 2), for which there is no prior information on their trajectory. In this case, the segments are grouped based on their spatial characteristics via a region growing algorithm. For vehicles that have already passed the entrance zone, prior information is available and can be appropriately utilized in order to group segments that belong to the same vehicle. The grouped segments are further processed to form compact and convex vehicle shapes via an appropriate convex hull algorithm, as described below.

3.3.1. Convex hull

The extracted objects (connected components) usually are not compact, and their shapes are likely to be non-convex, with a broken surface having holes and cavities, or/and broken into two or more pieces. In many cases the extracted objects are just artifacts of noise. This can lead to miscalculation of the number of vehicles existing in the image frame and inconsistency with previous images. For instance, an artifact of noise can be perceived as a vehicle that appears in one or more frames and suddenly vanishes in the next one ('ghosts'). A popular technique to repair this kind of problem is morphological filtering (erosion, dilation and combinations of them). We have chosen to use a convex hull technique to deal with the doughnut-like objects that are commonly encountered in scenes where vehicles participate. In addition, the O(n log n) complexity of the convex hull algorithm allows very fast computations.
it does not depend on a predefined empty scene (initialization
To avoid such undesirable events a filtering procedure is repeat-
step), but it dynamically calculates the background template. Fur-
edly applied that is based on a convex hull algorithm. In our ap-
thermore, the complexity reduction of problem Eq. (2) to problem
proach a Graham Scan convex hull algorithm (Graham & Yao,
Eq. (6) makes possible the real time operation of the system under
1983) is employed in order to plot the convex hull for each set of
typical hardware infrastructure.
pixels and finally to form a connected component. This algorithm
is applied repetitively until there are no more sets of pixels to be
3.2. Foreground extraction merged. Furthermore, after clustering has grouped all convex seg-
ments a hull algorithm is applied to form convex compact objects.
The foreground extraction is one of the standard procedures of a The outcome of this process is a mask of convex polygons (first and
typical background subtraction algorithm. In this stage, the fore- second row of Fig. 4). These polygons represent compact vehicles,
ground is being extracted by comparing each frame with the in- but in many cases these polygons correspond to vehicle segments
stance of the background. The simplest way to perform this that can be merged through the following clustering procedure.
operation is to calculate the chromatic difference for each pixel be-
tween the current frame and the background template. Thus, each 3.3.2. Clustering
pixel for which the chromatic difference is greater than a prede- The outcome of the foreground extraction procedure is more
fined threshold is classified as the foreground mask Mij. likely to be vehicle segments than compact vehicle shapes. This
N.A. Mandellos et al. / Expert Systems with Applications 38 (2011) 1619–1631 1625
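The repeated hull-forming step of Section 3.3.1 can be sketched as follows. This uses the monotone-chain variant of the Graham scan as a stand-in for the Graham & Yao routine cited in the paper; the function names and the pixel-tuple representation are illustrative, not taken from the system.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Convex hull of a set of 2D pixel coordinates (monotone-chain variant
    of the Graham scan), returned in counter-clockwise order.  The sort
    dominates the cost, giving the O(n log n) behavior noted in the text."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                      # build the lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build the upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]     # concatenate, dropping duplicate endpoints
```

Applied to a foreground component with interior cavities, the hull keeps only the outer boundary, which is how doughnut-like objects are made compact.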

Fig. 4. Clustering procedure: first row: group of pixels extracted from the background subtraction procedure; second row: the convex hull of the first row; and third row: clustering using the expected positions of the vehicles regarding their trajectory and application of the convex hull algorithm to form compact objects.

This is a result of shadows, glares and vehicle colors that are quite similar to the background. Thus, the clustering procedure aims to group such segments to form a unique and compact vehicle. To achieve this goal, prior information is utilized. The trajectories of already detected vehicles in previous frames are used to calculate the vehicles' motion and consequently to estimate their locations in the current frame. Then, we merge the segments of this frame that contact the traces of the estimated vehicle locations (see Fig. 4).

When a vehicle is entering the image (entrance zone, Fig. 2) no prior information is available (e.g. dimensions and trajectory). For that reason, the entering segments are simply clustered by a popular clustering algorithm, namely region growing. The region growing algorithm is a heuristic for clustering segments based on their locations. It relies on the simple assumption that two segments are merged if the eventually formed shape can belong to a vehicle according to its dimensions.

3.3.3. Shadows and glares

We have used the approach of Horprasert et al. (1999), implemented in LUV space. According to this, we compare the chromaticity and brightness of a pixel with the corresponding pixel of the background model. The chromaticity of a pixel is defined by the UV components, while the brightness is determined by the lightness component L. In case the chromaticity is similar but the brightness differs, we distinguish two cases: if the brightness differs significantly from the background, the pixel cannot be classified as a shadow or glare; otherwise, if the brightness is lower than the background the pixel is classified as a shadow, and if it is higher than the background it is classified as a glare.

3.4. Classification and occlusion handling

The system classifier distinguishes the detected vehicles into two classes, namely 'vehicle' and 'large vehicle', by assessing their dimensions and trajectory. The classifier rules are based on the following simple assumptions. Firstly, it is certified that the detected object was present in the previous frame and that the two shapes (previous and current) match (see Section 4). Then, it is examined whether the vehicle position is consistent with the vehicle's motion and recorded trajectory: the new position of the object should satisfy the motion model of the vehicle, which is derived from the recorded trajectory. As before, its dimensions should match. Otherwise, the detected object is rejected.

In case of considerable discrepancies of dimensions, it is examined whether the vehicle under consideration occludes another one. The rules for the occlusion are adopted from Criminisi et al. (2004), where a graph is constructed that associates the nodes C_{i,t−1} (vehicle i at frame t − 1) with the detected objects P_{i,t} (object i at frame t). Subsequently, the objects can be merged or split. A merge is the occlusion of two (or more) vehicles in the current frame, whereas a split occurs when two previously occluded vehicles are separated. If the detected objects are consistent with the association graph, then the detected object is approved as a vehicle.

However, the classification of the vehicle into one of the aforementioned classes is performed only after its first contact with the exit zone. It is then that the length of the detected vehicle is compared with the length of a prototype; if their ratio is much larger than one it is classified as a 'large vehicle', else as a 'vehicle'.

4. Tracking

Tracking is a very important issue in computer vision, and recently there has been profound interest in surveillance applications. The aim of tracking in computer vision is to recognize and locate a prototype in a series of sequential frames. A lot of applications are based on tracking, such as video processing, security, surveillance and automatic procedures. In our case, we need to track multiple vehicles to record their trajectories and derive relevant information such as vehicle speed, direction and driver behavior. Such tracking methodologies are the mean-shift algorithm and template matching. The mean-shift algorithm was originally introduced by Comaniciu as a segmentation methodology (Comaniciu & Meer, 1997) before it was appropriately modified into a robust tracking system (Comaniciu, Ramesh, & Meer, 2000). The main idea of a mean-shift tracking algorithm is to build a 2D probability space in which an object template can be located. Similarly, the template matching algorithm aims to locate the maximum in a 2D probability space in order to specify the location of a predefined template. Although the two methodologies share the same principles, the mean-shift accelerates the procedure by comparing histograms instead of comparing templates pixel by pixel. Both template matching and mean-shift algorithms are robust in tracking a predefined template. The main weakness of these algorithms is their lack of flexibility when tracking is influenced by the image perspective.

More specifically, the template of a vehicle changes both in size and resolution while passing through the image active area. Moreover, a template does not consist of the vehicle figure only: usually part of the background is also present in the template. The template becomes less accurate when the vehicle is located in the depth of the image. This deteriorates the efficiency and the accuracy of the process. Empirically, we found that our simple matching algorithm is more efficient than such complex tracking algorithms.

We tested template matching and mean-shift in order to enhance the vehicle matching procedure but found that both algorithms presented problems. The first drawback was that they demanded a large amount of calculations, and they also suffered instability problems, which result in loss of the tracked object. The main cause of instability is the template update: while the vehicle moves towards the camera its shape becomes larger, and as a consequence its resolution becomes better than the template's.
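A per-pixel sketch of the shadow/glare rule of Section 3.3.3, assuming (L, u, v) tuples; the threshold values and the function name are illustrative assumptions, not the system's calibrated parameters.

```python
def classify_pixel(frame_luv, background_luv, chroma_thresh=8.0, light_thresh=25.0):
    """Classify one pixel as 'background', 'foreground', 'shadow' or 'glare'.

    The chromatic difference is measured on the u, v components only;
    shadows and glares keep the background chromaticity but differ
    moderately in the lightness component L."""
    L, u, v = frame_luv
    Lb, ub, vb = background_luv
    chroma_diff = ((u - ub) ** 2 + (v - vb) ** 2) ** 0.5
    if chroma_diff > chroma_thresh:
        return 'foreground'      # chromaticity differs: a real object
    # similar chromaticity: decide on the lightness component
    if abs(L - Lb) > light_thresh:
        return 'foreground'      # brightness differs too much for a shadow/glare
    if L < Lb:
        return 'shadow'
    if L > Lb:
        return 'glare'
    return 'background'
```

A moderately darker pixel of the same chromaticity is thus labelled a shadow, and a moderately brighter one a glare, instead of being merged into the vehicle mask.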

Hence, a simple matching methodology was found to better meet the needs of this problem.

The matching procedure adopted in this study is similar to that of Criminisi et al. (2004) and is based on the assumption that the next position of a vehicle can be estimated from its motion. According to this, we estimate the positions of the previous frame's vehicles and we draw their traces in the current frame. Then a vehicle V1 having mask M1 matches a vehicle V′1 having mask M′1 from the previous frame only if M1 ∩ M′1 ≠ ∅. If there is a conflict between two vehicles, then the matching vehicle is the one that maximizes the common surface.

Even with the most accurate algorithm for locating templates in an image, small drifts and miscalculations, due to the conversion of distances in the discrete image space into real conditions, result in small errors in the measurements.

Fig. 5. The snapshots above were taken from the four captures used for evaluation. In each snapshot, five pixel positions (1–5) have been chosen (three pixels at the front of the image and two at the back). Note that some of the pixels are sited over the white stripe of the road in order to study the behavior of the system in a color different from the asphalt. Evaluation results are presented in Tables 1 and 2 as well as in Fig. 6.

Fig. 6. The 2D topology of the pixel series (PC_k, k = 0, 1, . . .) of Scene I at pixel 1 of Fig. 5: top, middle and bottom rows present "background", "foreground" and all elements respectively. The side bar graphs in each topology correspond to the color parameter histograms.
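The mask-intersection matching rule above (a vehicle matches only if M1 ∩ M′1 ≠ ∅, with conflicts resolved by the largest common surface) can be sketched as follows; the set-of-pixels mask representation and all identifiers are illustrative.

```python
def match_vehicle(candidate_mask, predicted_masks):
    """Match a detected object against the predicted masks of the previous
    frame's vehicles.  Masks are sets of (row, col) pixel coordinates;
    predicted_masks maps a vehicle identifier to its predicted mask."""
    best_id, best_overlap = None, 0
    for vehicle_id, mask in predicted_masks.items():
        overlap = len(candidate_mask & mask)   # |M1 ∩ M1'|
        if overlap > best_overlap:             # keep the largest common surface
            best_id, best_overlap = vehicle_id, overlap
    return best_id                             # None when no mask intersects
```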

In order to form a more accurate and smooth trajectory for each vehicle, a Kalman filter algorithm is employed.

The Kalman filter employs a procedure in which a state variable is repeatedly predicted according to a theoretical model and is subsequently corrected by an actual measurement. The state variable of the described system is a vector of vehicle location, speed and length. In our approach we assumed simple constant straight motion along the direction of the road, a rational approach for traffic on avenues and national roads. In addition, we assume a constant vehicle length for our kinematic model along its trajectory.

5. Experimental results – evaluation

In order to validate the effectiveness of the background reconstruction algorithm we created an evaluation process which aims to provide evidence that the color which is most frequent at a specific pixel in a series of frames is more likely to belong to the background than to the foreground.

For that reason, we created a testing group of four video captures (Scene I: 79,000 frames, Scene II: 29,000 frames, Scene III: 28,000 frames and Scene IV: 7000 frames; see Fig. 5). In each scene five pixel positions have been chosen, two at the back of the image and three at the front. For each pixel position of a specific scene two arrays were constructed, as described below.

The first array is a collection of color values PC_{k,s}^p (PC = [pixel color, class], p = testing pixel, k = array index = 1, 2, . . ., s = scene), appropriately classified into one of the following classes: 'Foreground' or 'Background'. This array comprises a sampling of the color values collected at the pre-selected pixels of each testing scene, taken at 500-frame intervals. According to this, the 1st, 500th, 1000th, . . . frames (of each testing scene and each pre-selected pixel position) were manually extracted and classified to construct the PC_{k,s}^p array.

The second array consists of the background color values at the pre-selected pixels of the testing scenes, BG_{k,s}^p (BG = background color, p = testing pixel, k = array index = 1, 2, . . ., s = scene). As in the first array, the background color values of the 1st, 500th, 1000th, . . . frames (of each testing scene and each pre-selected pixel) were recorded to form the BG_{k,s}^p array whenever this was possible, i.e., when the testing pixel was not obstructed by foreground objects. If the testing pixel was obstructed by a foreground object, we sought the nearest frame where the testing pixel could be clearly defined.

The graphical representation of the first array PC_{k,s}^p (testing scene I, pixel 1) is presented in Fig. 6. The topology is analyzed into the three combinations of planes: vu, uL and vL.
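A minimal sketch of the constant-velocity Kalman smoothing described in Section 4, reduced to one coordinate along the road axis; the process and measurement noise values q and r are illustrative, and the vehicle-length component of the paper's state vector is omitted.

```python
class ConstantVelocityKalman1D:
    """Minimal 1D constant-velocity Kalman filter smoothing a vehicle's
    position along the road axis (state: position and speed)."""

    def __init__(self, x0, q=1e-3, r=1.0):
        self.x, self.v = float(x0), 0.0     # state: position, speed
        self.P = [[1.0, 0.0], [0.0, 1.0]]   # state covariance
        self.q, self.r = q, r               # process / measurement noise

    def step(self, z, dt=1.0):
        """Predict with the constant-velocity model, then correct with the
        measured position z; returns the smoothed position."""
        # predict: x' = x + v*dt (constant straight motion)
        x_p = self.x + self.v * dt
        v_p = self.v
        p = self.P
        p00 = p[0][0] + dt * (p[0][1] + p[1][0]) + dt * dt * p[1][1] + self.q
        p01 = p[0][1] + dt * p[1][1]
        p10 = p[1][0] + dt * p[1][1]
        p11 = p[1][1] + self.q
        # correct (the measurement observes position only)
        s = p00 + self.r
        k0, k1 = p00 / s, p10 / s
        residual = z - x_p
        self.x = x_p + k0 * residual
        self.v = v_p + k1 * residual
        self.P = [[(1.0 - k0) * p00, (1.0 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]
        return self.x
```

Fed a noiseless unit-speed trajectory, the estimates converge towards the true position and speed after a short transient.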

Table 1
Comparison with other models: percentage of correctly detected foreground (FG) and background (BG) pixels.

Scene I: E94 – direction Elefsina, stop-and-go traffic conditions (79,000 frames capture)
  MOG (Stauffer & Grimson, 1999; Zivkovic & van der Heijden, 2006): FG 74.3, BG 99.6
  Codebook (Kim et al., 2005): FG 92.5, BG 93.6
  This work: FG 97.1, BG 98.3

Scene II: E94 – direction El. Venizelos, normal traffic conditions (29,000 frames capture)
  MOG: FG 82.4, BG 98.7
  Codebook: FG 94.9, BG 92.2
  This work: FG 95.0, BG 99.0

Scene III: E94 – direction El. Venizelos, normal and dense flow traffic conditions (28,000 frames capture)
  MOG: FG 80.7, BG 98.7
  Codebook: FG 93.4, BG 91.1
  This work: FG 94.2, BG 98.7

Scene IV: E75 – direction Lamia, normal and dense flow traffic conditions (7000 frames capture)
  MOG: FG 77.0, BG 96.6
  Codebook: FG 88.1, BG 93.8
  This work: FG 91.2, BG 97.1

Fig. 7. Visual presentation of the compared methodologies (see Table 1).
Table 2
Background reconstruction process outcome. For each scene and test pixel (positions of Fig. 5), the mean color value is given as experimental / our work, with the standard deviation in parentheses.

500-frames sample background reconstruction

Scene I: E94 – direction Elefsina, stop-and-go traffic conditions (79,000 frames capture)
  Test pixel 1: L* 65.7 (2.1) / 66.6 (8.7); u* 0.5 (1.3) / 0.4 (1.8); v* 4.3 (2.0) / 4.5 (2.8)
  Test pixel 5: L* 73.9 (2.1) / 75.2 (2.0); u* 1.1 (12.8) / 0.5 (3.7); v* 3.1 (8.7) / 3.5 (2.8)

Scene II: E94 – direction El. Venizelos, normal traffic conditions (29,000 frames capture)
  Test pixel 1: L* 60.0 (1.7) / 60.4 (0.7); u* 3.9 (2.1) / 4.0 (1.2); v* 2.5 (2.3) / 2.9 (1.1)
  Test pixel 5: L* 71.0 (1.7) / 71.5 (2.3); u* 3.1 (0.7) / 3.2 (1.1); v* 0.1 (0.7) / 0.2 (1.1)

Scene III: E94 – direction El. Venizelos, normal and dense flow traffic conditions (28,000 frames capture)
  Test pixel 1: L* 59.7 (1.7) / 60.1 (1.2); u* 0.8 (1.4) / 1.6 (1.2); v* 2.0 (2.2) / 1.7 (2.0)
  Test pixel 5: L* 74.2 (1.7) / 74.4 (2.2); u* 3.8 (1.2) / 3.7 (2.0); v* 6.2 (1.2) / 6.2 (2.0)

Scene IV: E75 – direction Lamia, normal and dense flow traffic conditions (7000 frames capture)
  Test pixel 1: L* 63.7 (0.6) / 63.8 (0.5); u* 0.3 (1.3) / 0.3 (0.3); v* 3.5 (0.8) / 3.9 (0.3)
  Test pixel 5: L* 81.0 (0.6) / 87.8 (0.8); u* 1.9 (0.5) / 2.7 (0.3); v* 12.0 (0.5) / 14.3 (0.3)

2500-frames sample background reconstruction

Scene I:
  Test pixel 1: L* 66.8 (1.2) / 66.0 (1.2); u* 0.7 (1.1) / 0.5 (0.6); v* 4.8 (1.7) / 4.5 (1.4)
  Test pixel 5: L* 75.6 (1.2) / 75.1 (1.7); u* 1.4 (15.2) / 0.6 (2.0); v* 4.2 (1.2) / 3.4 (1.4)

Scene II:
  Test pixel 1: L* 59.5 (2.0) / 60.3 (0.8); u* 3.0 (1.3) / 4.1 (1.3); v* 1.9 (2.2) / 2.9 (0.9)
  Test pixel 5: L* 71.7 (2.0) / 71.4 (2.2); u* 2.9 (0.9) / 3.2 (0.7); v* 0.7 (0.8) / 0.3 (0.9)

Scene III:
  Test pixel 1: L* 59.5 (0.7) / 60.0 (1.4); u* 0.7 (0.5) / 1.6 (1.1); v* 3.0 (1.7) / 1.9 (2.0)
  Test pixel 5: L* 73.3 (0.7) / 74.2 (1.7); u* 3.9 (1.3) / 3.6 (1.9); v* 5.7 (1.4) / 6.2 (2.0)

Scene IV:
  Test pixel 1: L* 63.0 (0.4) / 64.3 (0.7); u* 1.4 (0.4) / 0.3 (0.4); v* 3.0 (0.6) / 4.0 (0.0)
  Test pixel 5: L* 85.7 (0.4) / 87.6 (0.6); u* 1.8 (0.7) / 2.7 (0.4); v* 14.3 (0.7) / 14.3 (0.0)
For each parameter L*, u*, v* the corresponding histogram HL, Hu, Hv is generated. In each diagram the darkest areas represent a high concentration of values, which also corresponds to high values in the side histograms.

The distributions of the PC_{k,s}^p array elements that have been classified as 'Background' and 'Foreground' are presented in the first and second rows respectively. In the last row the PC_{k,s}^p series are illustrated independent of their classification.

It can be clearly seen that in the first row the distribution of the background color values is densely populated around a central value, which is the mode of this distribution. The mode can also be located from the side bar graphs, where the bars are steeply increased around a narrow interval. On the contrary, in the second row (the foreground values distribution), the side bar graphs tend to be flat, with multiple modes dispersed uniformly in the 2D parameter space.
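The per-pixel histogram accumulation of Eq. (6) and the mode location that yields the background color can be sketched for a single color parameter as follows; the bin width h, the function name and the sample stream are illustrative.

```python
from collections import Counter

def reconstruct_background_value(samples, h=4):
    """Accumulate a 1D histogram of one color parameter observed at a single
    pixel over time (Eq. (6)-style binning with bin width h) and return the
    centre of the most frequent bin, i.e. the estimated background value."""
    hist = Counter(int(s // h) for s in samples)   # delta-function binning
    mode_bin, _ = max(hist.items(), key=lambda kv: kv[1])
    return mode_bin * h + h / 2.0                  # centre of the mode bin

# background value near 66 dominates; passing vehicles contribute scattered values
samples = [65, 66, 67, 66, 65, 120, 30, 66, 67, 200, 66, 65]
```

Because the scattered foreground values never outnumber the stable background cluster, the mode bin recovers the background color without any empty-scene initialization.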

Fig. 8. Background reconstruction process in a 500 frames interval.


When the two distributions are mixed, the color that is present in the majority of frames (so that its value is distributed in a narrow zone around a central value) is the background color. Moreover, the side 1D histograms HL, Hu, Hv can precisely locate the background color (using Eq. (5)), providing almost the same results as the 2D distributions.

In order to test our methodology further we carried out the following test: we processed the data of Scenes I–IV by applying the background reconstruction algorithm to implement the 3D and 1D problems (Eqs. (3) and (5), respectively) for a specific test frame chosen for each scene (I–IV, Fig. 5). Moreover, the proposed background subtraction algorithm was tested against two of the most popular algorithms found in the literature, that is to say MOG and codebook. For the MOG model the Mahalanobis distance was used to account for problems where the standard deviation of the Gaussian distribution is high. The results are presented in Table 1. For all scenes, the percentage of successfully detected foreground and background pixels is given for all methods tested. The proposed methodology outperforms all previous algorithms. A visual presentation of the results is given in Fig. 7 (Scene III was left out since the actual scene data were similar to Scene II – same location). In addition, the performance of the suggested algorithm was faster, since it did not involve the computational burden of adapting the MOG cluster parameters or of enriching the codebook codewords.

Background reconstruction is a statistical methodology, therefore the identification of the sample size is important. In general, the sample size should be large enough to carry enough information for extracting the background color at each image pixel. To achieve this goal the sample size, in terms of time, should exceed the average time that a passing vehicle occupies any pixel in the image. In our tests we chose a 500 frames and a 2500 frames sample, translated in terms of time to a 20 and 100 s exposure correspondingly.

The 500 frames sample is quite satisfactory for a highway where the vehicles' average speed is about 80 km h−1 (22.22 m s−1) and the occupation of a specific pixel close to the camera is expected to last less than a second. We chose the 2500 frames sample in order to examine whether overexposure can improve the reconstruction procedure. We observe that the choice of a larger sample size tends to decrease the accuracy (Tables 1 and 2) for two reasons: first, as the sample size increases, the background color changes following the diminutive change of lighting (this also explains the differences in the measured values for different sample sizes at the same pixel), and second, the larger the sample size the more noise is inserted, with subsequent degradation of the background reconstruction performance.

The proposed system was implemented and tested as shown in Fig. 8, where the result of the Scene I experiment is presented. This result is also published on the corresponding author's personal page, http://www.users.ntua.gr/nmand/BGReconstruction.htm. The background reconstruction procedure for the test scenes I–IV is also included on the same web page.

Overall, the system was found to work satisfactorily and the background reconstruction algorithm added robustness to the process. In normal traffic conditions the system responded well and the resulting vehicle speed and trajectory outputs were accurate enough. The maximum number of vehicles detected and tracked simultaneously, for the heavy traffic instances of Scene I, was 10.

6. Conclusion

In this study we presented a system that implements a classical computer vision algorithm, background subtraction, appropriately modified for the purposes of a traffic surveillance system. The innovation of this study lies in a new algorithm for reconstruction of the actual background. This algorithm is based on statistical color sampling per pixel over time and is robust in reconstructing the actual background, even in real time. This was achieved due to the algorithm's low complexity: O(n) in terms of memory and O(tn) in terms of calculations involved (t denotes the time size of the sample and n represents the magnitude of discretization for each color parameter). The experiments carried out showed that the proposed algorithm is capable of real-time operational working due to its low complexity.

The reconstruction of a new background instance, wherever this is required, enhances the typical background subtraction algorithm. Thus, in our approach the implementation of background subtraction does not depend on an initial background instance, which broadens its applicability. One of the main advantages of the proposed system is that it can be applied to an existing traffic surveillance system without substantial modifications, and the background reconstruction algorithm allows the unobstructed operation of the system without human intervention. The system works well either in real-time mode or on already stored video.

The testing arrangement used, which simulates the operation of a traffic surveillance system, was found to work satisfactorily in diverse outdoor lighting conditions. In all cases the background reconstruction algorithm managed to accurately reconstruct the actual background in various harsh conditions, including heavy congestion and changes in the lighting. This methodology added robustness to the traditional background subtraction algorithm and overcame known instability issues.

In future work, we aim to focus on night surveillance, where some primary tests leave room for improvement over the existing algorithms reported in the literature. The other modules of our proposed system should also be improved, focusing on the occlusion handling and vehicle matching procedures. Moreover, it remains a challenge to apply the capabilities of the proposed algorithm to other kinds of machine vision problems, such as security, remote sensing, ship surveillance and a plethora of surveillance applications.

References

Abdel-Aziz, Y. I., & Karara, H. M. (1971). Direct linear transformation from comparator coordinates into object space coordinates in close-range photogrammetry. In Proceedings of the symposium on close-range photogrammetry (pp. 1–18). VA: American Society of Photogrammetry.
Colombari, A., Cristani, M., Murino, V., & Fusiello, A. (2005). Exemplar-based background model initialization. In ACM workshop on video surveillance and sensor networks VSSN (pp. 29–36).
Comaniciu, D., Ramesh, V., & Meer, P. (2000). Real-time tracking of non-rigid objects using mean shift. In Proceedings of the IEEE conference on computer vision and pattern recognition, Hilton Head, SC (Vol. 1, pp. 142–149).
Comaniciu, D., & Meer, P. (1997). Robust analysis of feature spaces: Color image segmentation. IEEE Conference on Computer Vision and Pattern Recognition, 750–755.
Criminisi, A., Perez, P., & Toyama, K. (2004). Region filling and object removal by exemplar-based image inpainting. IEEE Transactions on Image Processing, 13, 1200–1212.
Cucchiara, R., & Piccardi, M. (1999). Vehicle detection under day and night illumination. In Proceedings of the 3rd international ICSC symposium on intelligent industrial automation.
Davies, E. R. (2005). Machine vision (3rd ed.). San Francisco, US: Elsevier Inc. p. 104.
Davies, E. R. (2005). Mathematical morphology. In Machine vision (3rd ed., pp. 233–261). San Francisco, US: Elsevier Inc.
Graham, R., & Yao, F. (1983). Finding the convex hull of a simple polygon. Journal of Algorithms, 4, 324–331.
Gupte, S., Masoud, O., Martin, R., & Papanikolopoulos, N. (2002). Detection and classification of vehicles. IEEE Transactions on Intelligent Transportation Systems, 3, 37–47.
Gutchess, D., Trajkovic, M., Kohen-Solal, E., Lyons, D., & Jain, A. (2001). A background model initialization algorithm for video surveillance. In Proceedings of the eighth international conference on computer vision (Vol. 12, pp. 733–740). Vancouver, Canada.
Haritaoglu, I., Harwood, D., & Davis, L. (2000). W4: Real-time surveillance of people and their activities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 809–830.
Horprasert, T., Harwood, D., & Davis, L. (1999). A statistical approach for real-time robust background subtraction and shadow detection. IEEE ICCV FRAME-RATE workshop.
Kastrinaki, V., Zervakis, M., & Kalaitzakis, K. (2003). A survey of video processing techniques for traffic applications. Image and Vision Computing, 21, 359–381.
Kim, K., Chalidabhongse, T., Harwood, D., & Davis, L. (2005). Real-time foreground–background segmentation using codebook model. Real-Time Imaging, 11(3), 172–185.
Koller, D., Weber, J., & Malik, J. (1994). Robust multiple car tracking with occlusion reasoning. In Proceedings of the third European conference on computer vision (pp. 189–196). Stockholm, Sweden, May 2–6.
Lindeberg, T. (1996). Scale-space: A framework for handling image structures at multiple scales. CERN School of Computing, 695–702.
Mahmassani, H., Haas, C., Zhou, S., & Peterman, J. (2001). Evaluation of incident detection methodologies. CTR report: http://www.utexas.edu/research/ctr/pdf_reports.
Mimbela, L., & Klein, L. (2000). Non-intrusive technologies. In A summary of vehicle detection and surveillance technologies used in intelligent transportation systems (1st ed., pp. 5.1–5.27). Washington, DC, US: Federal Highway Administration (FHWA) Intelligent Transportation Systems Joint Program Office.
Pless, R. (2005). Spatio-temporal background models for outdoor surveillance. EURASIP Journal on Applied Signal Processing, 14, 2281–2291.
Senior, A., Hampapur, A., Tian, Y., Brown, L., Pankanti, S., & Bolle, R. (2001). Appearance models for occlusion handling. In 2nd IEEE workshop on performance evaluation of tracking and surveillance PETS.
Stauffer, C., & Grimson, W. (1999). Adaptive background mixture models for real-time tracking. Computer Vision Pattern Recognition, 246–252.
Toyama, K., Krumm, J., Brumitt, B., & Meyers, B. (1999). Wallflower: Principles and practice of background maintenance. ICCV99, 255–261.
Wren, C., Azarbayejani, A., Darrell, T., & Pentland, A. (1997). Pfinder: Real-time tracking of the human body. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 780–785.
Zivkovic, Z., & van der Heijden, F. (2006). Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognition Letters, 27(7), 773–780.
