
Moving Object Detection for Real-Time Applications

Lucia Maddalena
National Research Council
Institute for High-Performance Computing and Networking
Via P. Castellino 111, 80131 Naples, Italy
[email protected]

Alfredo Petrosino
University of Naples Parthenope
Department of Applied Science
Via A. De Gasperi 5, 80133 Naples, Italy
[email protected]

Abstract

Detection of moving objects in video streams is the first relevant step of information extraction in many computer vision applications. Aside from the intrinsic usefulness of being able to segment video streams into moving and background components, detecting moving objects provides a focus of attention for recognition, classification, and activity analysis, making these later steps more efficient.

We present some extensions to the method for moving object detection presented in [4]. Our main contributions are related to the pre-processing of intermediate results (transience maps), aimed at enhancing the accuracy of detection results, and to the parallelization of some of the most computationally intensive steps using SSE2 instructions, in order to enhance efficiency and allow for real-time applications.

1. Introduction

Detection of moving objects in video streams is the first relevant step of information extraction in many computer vision applications, including traffic monitoring, automated remote video surveillance, and people tracking [3]. Aside from the intrinsic usefulness of being able to segment video streams into moving and background components, detecting moving objects provides a focus of attention for recognition, classification, and activity analysis, making these later steps more efficient, since only moving pixels need be considered [2]. The problem is known to be significant and difficult [8]. Conventional approaches to moving object detection include temporal differencing [7], background subtraction [8], and optical flow [1].

Temporal differencing takes into account differences in consecutive sequence frames, which allows discerning static objects (having null differences) from moving objects (having non-null differences). This approach is very adaptive to dynamic environments, but it is strictly dependent on the velocity of moving objects in the scene and is subject to the foreground aperture problem. In contrast, optical flow techniques aim at computing an approximation of the 2D motion field (the projection of the 3D velocities of surface points onto the imaging surface) from spatio-temporal patterns of image intensity [1]. They can be used to detect independently moving objects in the presence of camera motion, but most optical flow computation methods are computationally complex and cannot be applied to full-frame video streams in real time without specialized hardware.

Background subtraction is certainly the most common and efficient method to tackle the problem (e.g., [6]). It is based on the comparison of the current sequence frame with a reference background that includes information on the scene without moving objects. It is independent of the velocity of moving objects and is not subject to the foreground aperture problem, but it is extremely sensitive to dynamic scene changes due to lighting and extraneous events. Although such changes are usually detected, they leave behind holes where the newly exposed background imagery differs from the known background model (ghosts). While the background model eventually adapts to these holes, they generate false alarms for a short period of time.

Among different approaches, the one proposed in [4] allows disambiguation of moving objects that stop for a while, are occluded by other objects, and then resume motion. Layered detection is based on two processes: pixel analysis and region analysis. Pixel analysis determines whether a pixel is stationary or transient over time, while region analysis detects stationary regions of stationary pixels corresponding to stopped objects. We adopted the layered approach, adding a pre-processing of the transience maps aimed at suppressing shadows and reducing noise, and clustering non-background pixels using region growing.

Moreover, the need for real-time systems imposes very low computation times. We focused on the most computationally intensive steps of the proposed approach, obtaining parallel modules for several tasks. Specifically, we adopted the SIMD approach. SIMD architectures operate concurrently, in a single instruction, on multiple data. Their usage is especially suited for applications where huge amounts of data must undergo the same processing, such as multimedia applications. Therefore we translated into assembler the most time-consuming routines, taking advantage of SIMD architectures.

The paper is organized as follows. In Section 2 we give a brief description of the approach adopted for moving object detection. In Section 3 we illustrate the basics of the parallelization of the computationally demanding steps. In Section 4 we present results obtained with the above-mentioned parallelization, while Section 5 includes conclusions and further research directions.

14th International Conference on Image Analysis and Processing (ICIAP 2007)
0-7695-2877-5/07 $25.00 © 2007

2. Approach to moving object detection

A robust detection system should be able to recognize when objects have stopped and even disambiguate overlapping objects, functions usually not possible with traditional motion detection algorithms. The approach for moving object detection based on layered adaptive background subtraction [4] allows quite efficient detection of overlapping objects. Layered detection is based on two processes: pixel analysis and region analysis.

2.1. Pixel analysis

Pixel analysis determines whether a pixel is stationary or transient by observing its intensity value over time. Moving objects passing through a pixel cause a step change in the intensity profile, followed by a period of instability; then the profile stabilizes, in a manner dependent on the kind of event. To capture the nature of changes in pixel intensity profiles, a gradient-based approach is applied.

Let I_t(x) be the intensity of pixel x at a time t occurring k frames in the past. The motion trigger T prior to the frame of interest t is the maximum absolute difference between the pixel intensity I_t(x) and its value in the previous l frames:

    T = max_{j=1,...,l} |I_t(x) - I_{t-j}(x)|,    (1)

where the suggested value for l is 5 [2]. Let us also introduce the stability measure as the variance of the pixel intensity profile from time t to the present:

    S = [ (k+1) * sum_{j=0..k} I_{t+j}(x)^2 - ( sum_{j=0..k} I_{t+j}(x) )^2 ] / [ k(k+1) ],

where k is set to correspond to one second of video [2].

Once T and S have been computed, a transience map M can be defined for each pixel, taking three possible values: background (BG), transient (TR), or stationary (ST). For each pixel, the corresponding map value is updated to TR if it was ST or BG and the motion trigger is greater than a given threshold (there has been a step change in intensity). Moreover, if it was TR and the stability measure is lower than a given threshold (the intensity has stabilized), it is updated to BG if its stabilized intensity value is equal (within a threshold Th) to the background intensity value, and to ST otherwise.

In order to allow for adaptivity of the background model to slow lighting changes, we update the background B by running average with selectivity. Specifically, the background model B_t is initially set to the first image (B_0(x) = I_0(x) for every pixel x), and then updated as:

    B_{t+1}(x) = alpha * B_t(x) + (1 - alpha) * I_t(x),  if x is non-moving,
    B_{t+1}(x) = B_t(x),                                 otherwise,    (2)

where alpha is a time constant that specifies how fast new information supplants old observations, usually chosen in [0.9, 1].

2.2. Region analysis

Non-background pixels in the transience map M are clustered into regions. In [2, 4] clustering is obtained using a nearest neighbor spatial filter; in our implementation, after a pre-processing of the transience map (see §2.3), we cluster non-background pixels using a connected component labeling algorithm based on region growing.

Each spatial region is then analyzed and classified as a moving or stopped object on the basis of the number of transient or stationary pixels it includes. According to the algorithm reported in [4], regions that consist of stationary pixels are added as a new layer over the background. A layer management process is used to determine when stopped objects resume motion or are occluded by other moving or stationary objects.

2.3. Transience map pre-processing

Prior to performing region analysis, we pre-process the transience map in order to suppress shadows and reduce noise.

Shadows in a scene represent a problem for video surveillance systems, especially those operating outdoors. In fact, the cast shadow of an object (the shadow area projected on the scene by the object) alters the shape of the object itself, leading to errors in the measurement of its geometrical properties. This affects both the classification and the assessment of the moving object position, making the subsequent moving object tracking uncertain. Moreover, the cast shadows of two or more objects can create a false adjacency between the objects, which leads to detecting them as a single

object. This affects many higher-level surveillance tasks, such as counting and classifying individual moving objects in a scene. Instead, shadows in the background do not pose big problems, as long as the background is correctly updated over time.

Among the many approaches to moving cast shadow suppression, we have adopted the one reported in [3], which proved to be quite accurate and suitable for moving object detection. The approach is based on the HSV color model, which closely corresponds to human color perception and has proved more suitable for detecting shadows than the RGB model, being able to separate chrominance and intensity information.

For each pixel belonging to the foreground (either stationary or transient), a binary mask is constructed, indicating shadow pixels (value 1). Let I_k^H(x,y), I_k^S(x,y), and I_k^V(x,y) be the hue, saturation, and value components of pixel (x,y) of image I at time k, and assume analogous notation for the components of the background image B. The shadow mask is defined as:

    SP_k(x,y) = 1  if alpha <= I_k^V(x,y) / B_k^V(x,y) <= beta
                   and (I_k^S(x,y) - B_k^S(x,y)) <= tau_S
                   and |I_k^H(x,y) - B_k^H(x,y)| <= tau_H,
    SP_k(x,y) = 0  otherwise,    (3)

with 0 < alpha < beta < 1, where all parameter values are empirically tuned and proved stable under environment changes. The three conditions for identifying a foreground pixel as shadow derive from the observation that in a shadowed area there is a significant illumination variation, but only a small color variation.

An example of application of the algorithm for shadow reduction is given in Figure 1.

Figure 1. Example of shadow suppression.

The output of shadow suppression, after being binarized, is filtered with a morphological opening using a 3 x 3 structuring element, in order to reduce noise due to sudden illumination changes or small camera movements.

An example of the complete detection process is given in Figure 2, where we show: (a) an input image; (b) the transience map M computed as described in §2.1 (white pixels are stationary pixels, while green pixels are transient pixels); (c) the transience map M pre-processed as described in §2.3; (d) the output image, obtained as described in §2.2 (the green rectangle indicates moving objects, while the yellow rectangle indicates stationary objects).

Figure 2. The complete detection process: (a) input image; (b) transience map; (c) pre-processed transience map; (d) output image.

For quantitative detection results we adopted the usual Recall and Precision functions, computed over tp (true positives), fn (false negatives), and fp (false positives):

    Recall = sum(tp) / (sum(tp) + sum(fn)),

where (sum(tp) + sum(fn)) is the total number of objects in the

ground truth, and

    Precision = sum(tp) / (sum(tp) + sum(fp)),

where (sum(tp) + sum(fp)) is the total number of detected objects.

Several image sequences have been considered; here, for space constraints, we only report results obtained for the Walk1 sequence of the CAVIAR Project [5]. The sequence, which is labeled and comprises 600 frames of 384 x 288 spatial resolution, presents critical factors such as light changes and mimetics. Setting the motion trigger threshold to 70 (the stability measure is not taken into account, due to the absence of stationary objects in the sequence), the method correctly detected 1217 out of 1550 objects, achieving performance Recall = 0.79 and Precision = 0.73. In Figure 3 the Recall and Precision values are reported as the motion trigger threshold varies. We remark on the role of the threshold: for values less than 70 the method detects many objects that are not present in the ground truth, while the number of relevant detections is not significantly greater. This corresponds to the smallest Precision values, while the Recall value remains stable.

Figure 3. Precision and Recall in terms of the motion trigger threshold.

3. Parallelization

SWAR (SIMD Within a Register) architectures realize concurrency using special registers (SWAR registers), wider than the general-purpose registers, together with suitable instructions that operate concurrently on the several data items loaded into such registers. In order to directly control SWAR registers it is necessary to program in assembler, and the algorithm must be suitably modified to take their use into account.

For our experiments we adopted an Intel Pentium 4 processor, which supports the MMX (MultiMedia eXtension), SSE (Streaming SIMD Extensions), and SSE2 (extended SSE) instruction sets. While MMX instructions operate on 64-bit MMX registers physically stored in the usual FPU registers, SSE and SSE2 instructions use a different set of eight 128-bit XMM registers. XMM registers can simultaneously store two 64-bit integers or floating-point values, four 32-bit integers or floating-point values, eight 16-bit integers, or sixteen 8-bit integers. SSE2 instructions (for data movement, data format conversion, and arithmetical and logical operations) operate concurrently on XMM registers. Compared with analogous sequential instructions, this approach leads to a speedup that is upper bounded by the number of data items loaded into the registers (which depends on the chosen data type).

For our experiments we adopted color RGB images; each pixel is usually represented by 3 bytes (one for each color component) and an image is represented as an array of bytes. The general parallelization strategy consists in working separately on the three RGB channels, using different XMM registers, each containing 16 pixels of one of the color bands. In the cases where we need a representation with real values (e.g., shadow suppression), we load 4 adjacent pixels for each of the colour bands into the XMM registers. The assembler SSE2 code has been introduced into the ANSI-C source code, using directives that are extensions of ANSI-C (and are therefore implemented in different ways in different compilers).

Some of the tasks taken into account for implementation using SSE2 instructions are briefly described in the following.

3.1. Motion trigger

The basic idea of the SSE2 implementation of the algorithm for computing the motion trigger T (see §2.1) consists in applying eqn. (1) concurrently to 16 adjacent pixels loaded into XMM registers.

Since there is no absolute value function that operates on SWAR registers, in order to compute the absolute differences appearing in eqn. (1) we had to use different instructions. Specifically, the instruction psubusb performs a subtraction "with saturation" of two registers, meaning that if a difference value is negative, it is set to 0 (it is saturated). By computing the two subtractions with saturation between the image and the background and or-ing the results, we obtain the absolute value of the difference. Moreover, since there is no compare function that operates on unsigned bytes (but only on signed ones), in order to compare the previous result with the threshold we first add the quantity 128 to both the threshold and the previous result, and then compare the obtained signed bytes (using the instruction pcmpgtb).
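The saturated-subtraction and biased-comparison tricks just described can be sketched with SSE2 compiler intrinsics (a minimal C sketch: the paper uses hand-written assembler embedded in ANSI-C, and the function names below are ours):

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* |a - b| for 16 unsigned bytes: there is no SIMD absolute-value
   instruction, so subtract with unsigned saturation in both directions
   (a negative difference saturates to 0) and or the two results
   (psubusb + por). */
static __m128i absdiff_u8(__m128i a, __m128i b)
{
    return _mm_or_si128(_mm_subs_epu8(a, b), _mm_subs_epu8(b, a));
}

/* v > thresh for 16 unsigned bytes: pcmpgtb compares signed bytes only,
   so bias both operands by 128 to map unsigned order onto signed order. */
static __m128i cmpgt_u8(__m128i v, __m128i thresh)
{
    const __m128i bias = _mm_set1_epi8((char)128);
    return _mm_cmpgt_epi8(_mm_add_epi8(v, bias), _mm_add_epi8(thresh, bias));
}
```

Each resulting byte of cmpgt_u8 is 0xFF where the absolute difference exceeds the threshold and 0x00 elsewhere, which yields the motion-trigger mask for 16 pixels at a time.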

3.2. Background difference

Background difference, used in the construction of the transience map M (see §2.1), can be implemented in SSE2 by applying the sequential algorithm concurrently to 16 adjacent pixels loaded into XMM registers.

Absolute differences between image and background intensity values are computed and compared to the background threshold similarly to the case of the motion trigger computation. The results obtained for the three bands of each image are then or-ed to obtain the final difference image.

3.3. Shadow suppression

The shadow suppression algorithm adopted for the pre-processing of the transience map (see §2.3) consists, for each non-background pixel, in:

1. converting the pixel in the current image and the corresponding pixel in the background image from RGB space to HSV space;

2. computing the shadow mask using equation (3) and updating the corresponding element in the transience map.

Since input and output data are real (pixel intensities for the RGB channels must be normalized in [0,1], and the HSV components are real as well), we can load at most four adjacent values into each XMM register. We therefore developed SSE2 implementations of steps 1 and 2 above, working concurrently on four pixels at a time. In both cases, the main concern of the SSE2 implementation is the combination of results depending on if statements.

3.4. Erosion and dilation

Erosion and dilation have been used for the final pre-processing of the transience map (see §2.3). Erosion (dilation) with a 3 x 3 structuring element is obtained by assigning to each pixel of the binary image the value 0 (the value 1) if the number of 8-connected adjacent pixels having value 1 is less (greater) than a given threshold Te (Td), and the value 1 (the value 0) otherwise.

To implement erosion and dilation using SSE2 instructions we load 16 adjacent pixels of three consecutive image rows into three XMM registers (see Fig. 4-(a)). Such data allow computing the result for the "center pixels" (dark pixels in the central row of Fig. 4-(a)). The result is obtained by:

1. summing the three registers along the columns (byte by byte), obtaining a register that contains the number of white pixels column by column (see Fig. 4-(b));

2. summing three bytes at a time along the register resulting from step 1, to obtain the number of white pixels for each neighborhood of "center pixels", by:

   (a) right shifting and left shifting the register by one byte (see Fig. 4-(c));

   (b) summing the three resulting registers along the columns (see Fig. 4-(d));

3. comparing the result with the threshold Te or Td (see Fig. 4-(e) for erosion with Te = 6).

Figure 4. Example of erosion: (a) 16 adjacent pixels of 3 consecutive image rows loaded into XMM registers xmm0, xmm1, xmm2 to compute the result for central (dark) pixels; (b) result of step 1 over xmm0, xmm1, xmm2; (c) register xmm3 together with registers xmm4 and xmm5 obtained by left and right shifting of xmm3; (d) result of step 2 over xmm3, xmm4, xmm5; (e) result of step 3 over xmm6 (threshold Te = 6).

3.5. Background update

As usual, the basic idea of the SSE2 implementation of the algorithm for the update of the background B (see §2.1) consists in loading 16 adjacent pixels into XMM registers and applying the sequential algorithm concurrently on such data, according to eqn. (2).

The main concern here is the need to deal at the same time with byte data (image and background) and real data (the alpha parameter). The problem has been addressed by unpacking bytes to floats, performing the necessary computations four floats at a time, and packing the results back to bytes.
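The unpack-compute-pack scheme can be sketched with SSE2 intrinsics for a group of 4 pixels (an illustrative C sketch, not the authors' 16-pixel assembler; the function name and 4-pixel granularity are our assumptions):

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Selective running average of eqn. (2) for 4 adjacent non-moving pixels:
   unpack 4 background and 4 image bytes to floats, compute
   alpha*B + (1 - alpha)*I, and pack the result back to bytes. */
static void update_bg4(uint8_t bg[4], const uint8_t img[4], float alpha)
{
    int32_t b4, i4;
    memcpy(&b4, bg, 4);
    memcpy(&i4, img, 4);

    const __m128i zero = _mm_setzero_si128();
    /* widen: bytes -> 16-bit -> 32-bit -> float */
    __m128 bf = _mm_cvtepi32_ps(_mm_unpacklo_epi16(
        _mm_unpacklo_epi8(_mm_cvtsi32_si128(b4), zero), zero));
    __m128 xf = _mm_cvtepi32_ps(_mm_unpacklo_epi16(
        _mm_unpacklo_epi8(_mm_cvtsi32_si128(i4), zero), zero));

    __m128 rf = _mm_add_ps(_mm_mul_ps(_mm_set1_ps(alpha), bf),
                           _mm_mul_ps(_mm_set1_ps(1.0f - alpha), xf));

    /* narrow back: float -> 32-bit -> 16-bit -> bytes, with saturation */
    __m128i r16 = _mm_packs_epi32(_mm_cvttps_epi32(rf), zero);
    int32_t r4 = _mm_cvtsi128_si32(_mm_packus_epi16(r16, zero));
    memcpy(bg, &r4, 4);
}
```

In the paper's setting the same chain runs over 16 pixels at a time; the selectivity of eqn. (2) amounts to leaving the background bytes of moving pixels untouched.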

4. Experimental results

We implemented the above-described procedures on a Pentium 4 at 2.40 GHz with 512 MB RAM, running the Windows XP operating system. We tested them on several color image sequences; for space constraints, here we report results for just one sequence of 400 color images of size 320 x 240. All timings have been obtained using the Windows high-resolution performance counters (QueryPerformanceCounter and QueryPerformanceFrequency).

In Table 1 we compare the times (in msecs), obtained as the mean execution time per frame over the whole video, of the sequential and the SSE2 implementations of all the tasks reported in §3. Moreover, we report speedup values, obtained as the ratio of sequential and SSE2 execution times. Results for dilation are not shown, since they perfectly agree with those obtained for erosion.

    Task                (1) Seq. time   (2) SSE2 time   Speedup (1)/(2)
    Motion trigger      3.49            0.39             8.95
    Backg. difference   1.30            0.10            13.00
    Shadow suppr.       0.61            0.29             2.10
    Erosion             1.11            0.12             9.25
    Backg. update       7.98            0.50            15.96

Table 1. Mean execution times (in msecs) of the sequential and the SSE2 implementations for all tasks reported in §3 on color images of size 320 x 240, and related speedup.

Here we can notice that in most cases good speedup values could be achieved. Specifically, in all the cases where we could operate solely on byte data (motion trigger, background difference, erosion and dilation), we achieved speedup values not too distant from the ideal speedup of 16 (16 bytes in an XMM register). In the case of shadow suppression, where we had to operate on float data, the achieved speedup is more than half of the ideal speedup of 4 (4 floats in an XMM register); this result can be considered appreciable if we consider that the available parallelism is very limited here, since the computations apply only to non-background pixels (see §3.3). Finally, in the case where we had a mixture of byte and float data (background update), the adopted parallelization strategy has led to an extremely high speedup value.

In the general case, it should be observed that, even if the number of operations and memory accesses in the parallel implementation decreases by a factor of 16 (for byte data) or 4 (for float data) compared to the sequential case, execution times reduce by a smaller factor. This is mainly due to the fact that the standard and the SWAR instruction sets are noticeably different: not all instructions are present in both sets, and the number of clock cycles for the same instruction is different (generally SWAR instructions require more clock cycles).

5. Conclusions

We presented a method for moving object detection that allows disambiguation of moving objects that stop for a while, are occluded by other objects, and then resume motion. The method strongly relies on the work presented in [4]. Our main contributions are related to the pre-processing of transience maps, aimed at suppressing shadows and reducing noise, and to clustering non-background pixels using region growing. Moreover, being concerned with real-time applications, we focused on some of the most computationally intensive steps of the proposed approach, obtaining SIMD parallel modules for several tasks: motion trigger, background difference, background update, shadow suppression, and morphological operations. First experimental results show that in most cases we could achieve speedup values close to the ideal speedup. Future research will be devoted to the parallelization of other tasks of the moving object detection process.

References

[1] J. Barron, D. Fleet, and S. Beauchemin. Performance of optical flow techniques. IJCV, 12(1):42-77, 1994.
[2] R. T. Collins, A. J. Lipton, T. Kanade, H. Fujiyoshi, D. Duggins, Y. Tsin, D. Tolliver, N. Enomoto, O. Hasegawa, P. Burt, and L. Wixson. A system for video surveillance and monitoring. The Robotics Institute, Carnegie Mellon University, CMU-RI-TR-00-12, 2000.
[3] R. Cucchiara, M. Piccardi, and A. Prati. Detecting moving objects, ghosts, and shadows in video streams. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(10):1-6, 2003.
[4] H. Fujiyoshi and T. Kanade. Layered detection for multiple overlapping objects. IEICE Trans. Inf. & Syst., E87-D(12):2821-2827, 2004.
[5] CAVIAR Project, http://homepages.inf.ed.ac.uk/rbf/CAVIARDATA1/.
[6] M. Piccardi. Background subtraction techniques: a review. In Proc. of the IEEE SMC 2004 International Conference on Systems, Man and Cybernetics, 2004.
[7] P. Rosin and T. Ellis. Image difference threshold strategies and shadow detection. In Proc. British Machine Vision Conference, pages 347-356, 1995.
[8] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers. Wallflower: principles and practice of background maintenance. In Proceedings of the Seventh IEEE Conference on Computer Vision, 1:255-261, 1999.
