
MOTION DETECTION USING A MODEL OF VISUAL ATTENTION

Shijie Zhang and Fred Stentiford

Department of Electronic and Electrical Engineering


University College London, Adastral Park Campus, Ross Building
Martlesham Heath, Ipswich, IP5 3RE, UK
{j.zhang, f.stentiford}@adastral.ucl.ac.uk

ABSTRACT

Motion detection and estimation are known to be important in many automated surveillance systems and have drawn significant research interest in the field of computer vision. This paper proposes a novel approach to motion detection and estimation based on visual attention. The method uses two different thresholding techniques, and comparisons are made with Black's motion estimation technique [1] based on the measure of the overall derived tracking angle. The method is illustrated on various video data, and the results show that the new method can extract motion information.

Index Terms— visual attention, motion analysis, object tracking

1. INTRODUCTION

The demand for automated motion detection and object tracking systems has promoted considerable research activity in the field of computer vision [2-6]. This paper proposes a method to detect and measure motion based upon tracking salient features using a model of visual attention.

Bouthemy [2] proposed a novel probabilistic parameter-free method for detecting independently moving objects using the Helmholtz principle. Optical flow fields were estimated without making assumptions on motion presence, and possible illumination changes were allowed for. The method imposes a requirement on the minimum size of the detected region, and detection errors arise with small and poorly contrasted objects. Black and Jepson [3] proposed a method for optical flow estimation based on the motion of planar regions plus local deformations. The approach used brightness information for motion interpretation by using segmented regions of piecewise smooth brightness to hypothesize planar regions in the scene. The proposed method has problems dealing with small and fast moving objects, and it is also computationally expensive. Black and Anandan [1] then proposed a framework based on robust estimation that addressed violations of both the brightness constancy and spatial smoothness assumptions caused by multiple motions. It was applied to two common techniques for optical flow estimation: the area-based regression method and the gradient-based method. To cope with motions larger than a single pixel, a coarse-to-fine strategy was employed in which a pyramid of spatially filtered and sub-sampled images was constructed. Separate motions were recovered using estimated affine motions; however, the method is relatively slow. Viola and Jones [4] presented a pedestrian detection system that integrated both image intensity (appearance) and motion information, the first approach to combine motion and appearance in a single model. The system works relatively fast and operates on low-resolution images under difficult conditions such as rain and snow, but it does not detect occluded or partial human figures. In [5] a method for motion detection based on a modified image subtraction approach was proposed to determine the contour point strings of moving objects. The proposed algorithm works well in real time and is stable under illumination changes. However, it is weak in areas where a contour appears in the background which corresponds to a part of the moving object in the input image; also, some of the contours in temporarily non-moving regions are neglected in memory, so that small broken contours may appear. In [6] the least squares method was used for change detection. The proposed approach is efficient and successful on image sequences with low SNR and is robust to illumination changes. Its biggest shortfall is that it can only cope with single object movements because of the averaging nature of the least squares method.

The use of visual attention (VA) methods [7-10] to define the foreground and background information in a static image for scene analysis has motivated this investigation. We propose in this paper that similar mechanisms may be applied to the detection of saliency in motion and thereby derive an estimate for that motion. The visual attention approach is presented in Section 2. Results are shown in Section 3 along with some discussion. Finally, Section 4 outlines conclusions and future work.

1-4244-1437-7/07/$20.00 ©2007 IEEE III - 513 ICIP 2007


2. MOTION ESTIMATION BASED ON VISUAL ATTENTION

Regions of static saliency are identified using the attention method described in [9]. Those regions which are largely different to most of the other parts of the image will be salient and are likely to be in the foreground. The discrimination between foreground and background can be obtained using features such as colour, shape, texture, or a combination of these. The concept has been extended into the time domain and is applied to frames from video sequences to detect salient motion. The approach does not require a specific segmentation process and depends only upon the detection of anomalous movements. The method estimates the shift by obtaining the distribution of displacements of corresponding salient features.

Computation is reduced by focusing on candidate regions of motion, which are detected by generating the intensity difference frame from adjacent frames and applying a threshold:

    I_x = ( |r_2 - r_1| + |g_2 - g_1| + |b_2 - b_1| ) / 3,    (1)

where (r_1, g_1, b_1) and (r_2, g_2, b_2) represent the rgb colour values for pixel x in frames 1 and 2. The intensity I_x is calculated by taking the average of the differences of the rgb values between the two frames. The candidate regions R_1 in frame 1 are then identified where I_x > T, where T is a threshold determined by an analysis of the image.

Let a pixel x = (x, y) in R_t correspond to colour components a = (r, g, b), let F(x) = a, and let x_0 be in R_t in frame t. Consider a neighbourhood G of x_0 within a window of radius ε, where

    x'_i ∈ G iff |x_0 - x'_i| ≤ ε.    (2)

Select a set of m random points S_x in G (called a fork), where

    S_x = {x'_1, x'_2, ..., x'_m}.    (3)

Forks are generated only if they contain pixels that mismatch each other. This means that forks will be selected in image regions possessing high, or certainly non-zero, VA scores, such as on edges or other salient features, as observed in earlier work [9]. In this case the criterion is that at least one pixel in the fork differs from one or more of the other fork pixels by more than δ in one or more of its rgb values, i.e.

    |F_k(x'_i) - F_k(x'_j)| > δ_k, for some i, j, k.    (4)

Define the radius of the region within which fork comparisons will be made as V (the view radius). Randomly select another location y_0 in the adjacent frame R_{t+1} within a radius V of x_0, and define the second fork

    S_y = {y'_1, y'_2, ..., y'_m}, where x_0 - x'_i = y_0 - y'_i    (5)

and |y_0 - x_0| ≤ V. S_y is a translated version of S_x. The fork centered on x_0 is said to match that at y_0 (S_x matches S_y) if all the colour components of corresponding pixels are within a threshold δ_k:

    |F_k(x'_i) - F_k(y'_i)| ≤ δ_k,  k = r, g, b,  i = 1, 2, ..., m.    (6)

N attempts are made to find matches and the corresponding displacements are recorded as follows. For the jth of the N_1 < N matches, define the corresponding displacement between x_0 and y_0 as σ_j^{t+1} = (σ_p, σ_q), where

    σ_p = x_{0p} - y_{0p},  σ_q = x_{0q} - y_{0q},    (7)

and the cumulative displacements Δ and match counts Γ as

    Δ(x_0) = Δ(x_0) + σ_j^{t+1},  Γ(x_0) = Γ(x_0) + 1,  j = 1, ..., N_1 < N,    (8)

where N_1 is the total number of matching forks and N is the total number of matching attempts. The displacement σ_{x_0}^{t+1} corresponding to pixel x_0, averaged over the matching forks, is

    σ_{x_0}^{t+1} = Δ(x_0) / Γ(x_0).    (9)

A similar calculation is carried out between R_t and R_{t-1} (swapping frames) to produce σ_{x_0}^{t-1}, and the estimated displacement of x_0 is given by {σ_{x_0}^{t+1} - σ_{x_0}^{t-1}} / 2. This estimate takes account of both the trailing and leading edges of moving objects. This process is carried out for every pixel x_0 in the candidate motion region R_t, and M attempts are made to find an internally mismatching fork S_x.

3. RESULTS AND DISCUSSION

3.1 Road Scene

A pair of 352x288 frames from a traffic video was analyzed with results shown in Figure 1. The intensity difference indicates the areas of candidate motion for subsequent analysis. Motion vectors were calculated as above for each pixel in the car region and plotted in Figure 2. A map for motion magnitudes in the y direction is shown in which

colours represent magnitudes as indicated in the colour bar. The directions of pixel motion are also represented by colours in the bar, where angles are measured anticlockwise from the vertical; e.g. yellow indicates a direction towards the top left. Motion vectors are not assigned if no internally mismatching forks can be found, e.g. in areas of low saliency. The processing took 0.23 seconds in C++.

The parameters of the experiment were M = 100, N = 100, ε = 3, m = 7, V = 10, δ = (40, 40, 40), T = 12.6. T is set to be twice the standard deviation of the values in the matrix I_x.

Fig. 1. Two adjacent frames and their intensity difference.

Fig. 2. Motion vector map, y direction magnitude map, and angle map corresponding to frames in Fig. 1.

A second pair of frames from the same traffic video was analyzed with results shown in Figure 3. The figure includes a magnitude map and an angle map for the 5-car scenario.

Fig. 3. Y direction magnitude map and angle map (5-car).

3.2 Object Tracking

The method was compared with Black's motion estimation technique based on the weighted object tracking angle θ defined as follows:

    θ = Σ (MI² × AI) / Σ MI²,    (10)

where MI is the magnitude of the motion of a pixel and AI is the angle of the direction of motion of the pixel. A squared weighting was used so that motion vectors with higher values have a bigger influence on the weighted angle, as they are likely to be more reliable.

Colour images are used, in contrast to the grayscale images used by Black, because they provide more information for the comparison of regions. A new threshold δ for pixel matching was also compared, based on the joint Euclidean distance of the RGB colour channels rather than treating the channels separately as in (6). In the case of pixel mismatching, there will be at least one pixel in the fork that differs by more than δ from one or more of the other pixels in the fork according to

    √( Σ_k (F_k(x'_i) - F_k(x'_j))² ) > δ, for some i, j.    (11)

The fork centered on x_0 is said to match that at y_0 (S_x matches S_y) if the Euclidean distances between corresponding fork pixels are all less than δ:

    √( Σ_k (F_k(x'_i) - F_k(y'_i))² ) ≤ δ, ∀ i.    (12)

Figure 4 illustrates a comparison between the weighted tracking angles derived from Black's software and those derived from the motion VA algorithm using the separate and joint metrics. The parameters of the experiment were M = 100, N = 1000, ε = 3, V = 10, δ = (40, 40, 40), T = 12.6, and the calculations were repeated for different numbers of fork pixels (m) on the same frame, from 2 to 49 (a fully filled 7x7 fork). The ground truth angle of the car was measured to be 36° ± 5°. As shown in the figure, both angle results produced by the motion VA algorithm give closer estimates than Black's (24.6°). In both cases the weighted angle increases as extra pixels are added into the fork, with the separate colour channel metric performing better. Increased accuracy and precision are achieved at the expense of additional computation, which increases with m. The improvements diminish beyond 15 fork pixels.

Fig. 4. Weighted tracking angles for the Motion VA algorithm against numbers of fork pixels, using the separate (6) and joint (12) distance metrics. Black's estimate is 24.6°.

Fork radii of ε = 2, 3, 4, 5, 6 (fork sizes of 5x5 to 13x13) were then used with the other parameters fixed to compare the performance of the angle estimates for the motion VA algorithm using the separate channel metric. The results illustrated in Figure 5 show that a slightly better estimate can be obtained with bigger fork radii but there is little improvement above 15 pixels. Also, the results converge as the number of fork pixels increases.
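To make the procedure concrete, the fork-based displacement estimator of Section 2 (equations (1)-(9)) can be sketched in Python. This is a minimal illustration under our own naming, not the authors' C++ implementation: frames are lists of rows of (r, g, b) tuples, and the parameters eps, m, V, delta, M, N correspond to ε, m, V, δ, M and N above.

```python
import random

# Sketch of the fork-based displacement estimator of Section 2
# (eqs. (1)-(9)). All names are our own; frames are lists of rows
# of (r, g, b) tuples.

def intensity_difference(frame1, frame2):
    """Eq. (1): per-pixel mean absolute rgb difference of two frames."""
    h, w = len(frame1), len(frame1[0])
    diff = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            (r1, g1, b1), (r2, g2, b2) = frame1[y][x], frame2[y][x]
            diff[y][x] = (abs(r2 - r1) + abs(g2 - g1) + abs(b2 - b1)) / 3.0
    return diff

def candidate_region(diff, T):
    """Candidate motion pixels: intensity difference exceeds threshold T."""
    return {(x, y) for y, row in enumerate(diff)
            for x, v in enumerate(row) if v > T}

def random_fork(eps, m):
    """Eqs. (2)-(3): m random offsets within a window of radius eps."""
    return [(random.randint(-eps, eps), random.randint(-eps, eps))
            for _ in range(m)]

def internally_mismatching(frame, x0, y0, fork, delta):
    """Eq. (4): some pair of fork pixels differs by more than delta_k
    in some colour channel."""
    pix = [frame[y0 + dy][x0 + dx] for dx, dy in fork]
    return any(abs(a[k] - b[k]) > delta[k]
               for i, a in enumerate(pix) for b in pix[i + 1:]
               for k in range(3))

def forks_match(f1, f2, x0, y0, x1, y1, fork, delta):
    """Eq. (6): every fork pixel is within delta_k of its translated
    counterpart in all three channels."""
    return all(abs(f1[y0 + dy][x0 + dx][k] - f2[y1 + dy][x1 + dx][k]) <= delta[k]
               for dx, dy in fork for k in range(3))

def estimate_displacement(f1, f2, x0, y0, eps, m, V, delta, M, N):
    """Eqs. (5)-(9): displacement at (x0, y0) averaged over matching forks."""
    h, w = len(f1), len(f1[0])
    for _ in range(M):        # M attempts to find an internally mismatching fork
        fork = random_fork(eps, m)
        if all(0 <= x0 + dx < w and 0 <= y0 + dy < h for dx, dy in fork) \
                and internally_mismatching(f1, x0, y0, fork, delta):
            break
    else:
        return None           # low-saliency area: no motion vector assigned
    sum_p = sum_q = count = 0  # cumulative displacement Delta, match count Gamma
    for _ in range(N):        # N attempts within the view radius V
        x1 = x0 + random.randint(-V, V)
        y1 = y0 + random.randint(-V, V)
        if not all(0 <= x1 + dx < w and 0 <= y1 + dy < h for dx, dy in fork):
            continue
        if forks_match(f1, f2, x0, y0, x1, y1, fork, delta):
            sum_p += x0 - x1  # eq. (7): sigma_p, sigma_q
            sum_q += y0 - y1
            count += 1
    if count == 0:
        return None
    return (sum_p / count, sum_q / count)  # eq. (9)
```

Note that by equation (7) the recorded displacement is x_0 − y_0, so an object shift to the right appears with a negative sign; the two-sided estimate of Section 2 (comparing R_t with both R_{t+1} and R_{t-1}) resolves this by halving the difference of the two estimates.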
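Similarly, the weighted tracking angle of equation (10) and the joint-distance criteria of equations (11) and (12) can be sketched as follows. The helper names are our own, and δ is treated as a single scalar for the joint metric, as the text implies.

```python
import math

# Sketch of the weighted tracking angle of eq. (10) and the joint
# Euclidean colour metric of eqs. (11)-(12). Helper names are our own.

def weighted_tracking_angle(vectors):
    """Eq. (10): mean of per-pixel motion angles weighted by squared
    magnitude, so stronger (more reliable) vectors dominate.
    `vectors` is a list of (magnitude MI, angle AI in degrees) pairs."""
    num = sum(mi * mi * ai for mi, ai in vectors)
    den = sum(mi * mi for mi, _ in vectors)
    return num / den

def joint_mismatch(fork_pixels, delta):
    """Eq. (11): some pair of fork pixels differs by more than delta
    in joint rgb Euclidean distance."""
    return any(math.dist(a, b) > delta
               for i, a in enumerate(fork_pixels)
               for b in fork_pixels[i + 1:])

def joint_match(fork_pixels_1, fork_pixels_2, delta):
    """Eq. (12): every fork pixel lies within delta of its counterpart
    in joint rgb Euclidean distance."""
    return all(math.dist(a, b) <= delta
               for a, b in zip(fork_pixels_1, fork_pixels_2))
```

For example, two motion vectors of magnitude 2 at 30° and magnitude 1 at 90° give θ = (4·30 + 1·90) / 5 = 42°.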

Tracking angle s vs Fork pixe ls
The method was illustrated on various video data and
40 different thresholding criteria. Compared to Black’s
35 technique the attention method was shown to obtain a better
30 estimate of motion direction. The stability of the results can
be improved by increasing the volume of processing. The
Tracking angles (degrees)

5x5 fork
25
7x7 fork

20 9x9 fork
11x11 fork
simple elements are amenable to parallel implementation. In
15 13x13 fork addition, the method does not require a training stage or
10 prior knowledge of the objects to be tracked.
5 Future work will be carried out on wider range of data
0 to establish threshold values with more certainty with
2 5 9 15 20 25
no. fork pixe ls (m ) particular emphasis on addressing noise arising from
background motion and changes in illumination. Camera
Fig. 5. Weighted tracking angles for Motion VA algorithm using
the separate channel metric (6) motion involved with football data in Fig. 6 and 7 can also
be estimated and used to improve motion estimation
3.3 Football Data accuracy on players.

The algorithm was also illustrated on football data. Figure 6 5. ACKNOWLEDGEMENT


shows a pair of 352x288 frames used to estimate the motion
vector map for one player in the scene. The weighted The project is sponsored by European Commission
Framework Programme 6 Network of Excellence MUSCLE
tracking angle θ was calculated to be 98.7° using separate
(Multimedia Understanding through Semantics,
metric as compared to the actual angle of 100° .
Computation and Learning) [11].
The parameters were M = 100, N = 10000, ε = 3 , m =
2, V = 40, į = (40,40,40), T = 30. The processing took 17 6. REFERENCES
seconds in C++. The number of fork pixels (m) was set to 2
to maximize the number of matches. V was increased to [1] M.J. Black, P. Anandan, “The robust estimation of multiple
accommodate larger movement between the frames. A motions: parametric and piecewise-smooth flow fields,” CVIU,
lower limit for N was determined by an empirical formula Vol. 63, Issue 1, pp. 75-104
which increases with both the radius of the fork ε and the [2] T. Veit, F. Cao, and P. Bouthemy, “Probabilistic parameter-
view radius V given by free motion detection,” in Proc. of CVPR, Washington, DC, USA,
vol. 1, pp. 715-721, June 27-July 2, 2004.
N ≥ [2 × (ε + V )] .
2
(13) [3] M.J. Black and A.D. Jepson, “Estimating optical flow in
segmented images using variable-order parametric models with
Figure 7 shows second pair of frames used to estimate local deformations,” IEEE Trans. on PAMI, vol. 18, Issue 10, pp.
the motion vector map for one player in the scene using the 972-986, Oct. 1996.
same parameters. It should be noted that the motion [4] P. Viola, M.J. Jones, and D. Snow, “Detecting pedestrians
estimates include that of the camera. using patterns of motion and appearance,” in Proc. of ICCV, Nice,
France, vol. 2, pp. 734-741, Oct. 2003.
[5] M. Kellner and T. Hanning, “Motion detection based on
contour strings,” in Proc. of ICIP, Singapore, vol. 4, pp. 2599-
2602, Oct. 2004.
[6] M. Xu, R. Niu, and P.K. Varshney, “Detection and tracking of
moving objects in image sequences with varying illumination,” in
Proc. of ICIP, Singapore, vol. 4, pp. 2595-2598, Oct. 2004.
Fig. 6. Football frames and motion vector map
[7] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based
visual attention for rapid scene analysis,” IEEE Trans. on PAMI,
vol. 20, Issue 11, pp. 1254-1259, Nov. 1998.
[8] L. Itti and P. Baldi, “A principled approach to detecting
surprising events in video,” in Proc. of CVPR, San Diego, CA,
USA, vol. 1, pp. 631-637, June 2005
Fig. 7. Football frames and motion vector map [9] F. W. M. Stentiford, “An estimator for visual attention through
competitive novelty with application to image compression,”
4. CONCLUSIONS AND FUTURE WORK Picture Coding Symposium, Seoul, pp. 101-104, April 2001
[10] F.W.M. Stentiford, “Attention based similarity,” Pattern
An attention based method has been proposed for motion Recognition (40), pp. 771-783, 2007
detection and estimation. The approach extracts the object [11] Multimedia Understanding through Semantics, Computation
and Learning, 2005. EC 6th Framework Programme, FP6-507752,
displacement between frames by comparing salient regions.
http://www.muscle-noe.org/
