Chapter 3
High precision
Consider an input video consisting of 'n' frames f_1, f_2, …, f_n.
Initially, the first incoming frame f_i (i = 1) is taken as the initial
background model b_i.
\[ b_i = f_i, \quad i = 1 \tag{3.1} \]
fi 1 fi bi 1 , i 1, 2..n (3.2)
\[ f_i - b_i = g_i \tag{3.3} \]
\[ R_i = \begin{cases} 1, & g_i > T \\ 0, & g_i \le T \end{cases} \tag{3.4} \]
frames based on the background model with a learning rate ‘α’. The
thresholding process is done based on the mean intensity value of the
background model generated for ‘n’ frames of the input video. The concept of
adaptive mean background subtraction is described in Figure 3.4.
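\[ b_i = f_i, \quad i = 1 \tag{3.5} \]
\[ b_{i+1} = \alpha f_i + (1 - \alpha)\, b_i \tag{3.6} \]
\[ g_i = f_i - b_{i+1} \tag{3.7} \]
\[ R_i = \begin{cases} 1, & g_i > T \\ 0, & g_i \le T \end{cases} \tag{3.8} \]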
[Figure residue: flowchart with labels f_i, α, b_i, BG Update, g_i]
\[ T = \operatorname{mean}(b_{i+1}) \tag{3.9} \]
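The update in (3.5) to (3.9) can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in this work: the function name, the default learning rate of 0.05, and the use of an absolute difference in (3.7) are assumptions made for the example.

```python
import numpy as np

def adaptive_mean_subtraction(frames, alpha=0.05):
    """Return one binary mask R_i per frame, following (3.5) to (3.9).

    `alpha` is the learning rate; its default value is hypothetical.
    """
    frames = [np.asarray(f, dtype=np.float64) for f in frames]
    background = frames[0].copy()            # (3.5): b_1 = f_1
    masks = []
    for frame in frames:
        # (3.6): blend the current frame into the background model
        background = alpha * frame + (1.0 - alpha) * background
        # (3.7): difference against the updated model (abs is an assumption)
        diff = np.abs(frame - background)
        # (3.9): threshold is the mean intensity of the background model
        threshold = background.mean()
        # (3.8): 1 marks foreground, 0 marks background
        masks.append((diff > threshold).astype(np.uint8))
    return masks
```

A larger `alpha` lets the background absorb scene changes faster, at the cost of absorbing slow-moving objects as well.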
For the set of S elements, I_t(p) is the object pixel, B_t(p) is the
background pixel, and w_b is the weight.
t t
Bst ( p), p 0in MVOt MVO t
sh
B ( p) (3.12)
t t
Bs ( p), p 0in G G
t t
sh
\[ b_i = f_i, \quad i = 1 \tag{3.13} \]
[Figure residue: flowchart with labels f_i, α, b_i, BG Update, g_i, Compare with Threshold T, g_i < T, g_i > T]
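\[ b_{i+1} = \alpha f_i + (1 - \alpha)\, b_i \tag{3.14} \]
\[ g_i = f_i - b_i \tag{3.15} \]
\[ R_i = \begin{cases} 1, & g_i > T \\ 0, & g_i \le T \end{cases} \tag{3.16} \]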
\[ T = \frac{1}{n} \sum_{i=1}^{n} \sigma_i \tag{3.17} \]
\[ \sigma_{i+1}^2 = \alpha (f_i - b_i)^2 + (1 - \alpha)\, \sigma_i^2 \tag{3.18} \]
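As a rough illustration of (3.13) to (3.18), the sketch below keeps a running variance next to the running mean. Several points are assumptions rather than statements about this work: the spread symbol is read as a standard deviation, the threshold in (3.17) is taken as the mean of the per-pixel standard deviations at the current frame, the difference is taken as an absolute value, and `initial_variance` is a hypothetical starting value.

```python
import numpy as np

def variance_threshold_subtraction(frames, alpha=0.05, initial_variance=1.0):
    """Background subtraction with a variance-driven threshold.

    One possible reading of (3.13) to (3.18); names and defaults are
    illustrative only.
    """
    frames = [np.asarray(f, dtype=np.float64) for f in frames]
    background = frames[0].copy()                            # (3.13)
    variance = np.full_like(background, initial_variance)    # assumed start
    masks = []
    for frame in frames:
        diff = np.abs(frame - background)                    # (3.15), abs assumed
        threshold = np.sqrt(variance).mean()                 # (3.17), assumed reading
        masks.append((diff > threshold).astype(np.uint8))    # (3.16)
        # (3.18): running variance uses the model from before the update
        variance = alpha * (frame - background) ** 2 + (1.0 - alpha) * variance
        # (3.14): running mean update of the background model
        background = alpha * frame + (1.0 - alpha) * background
    return masks
```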
[Figure residue: diagram with labels G, E_i, α, E, I_i, D_i]
The expected RGB color value for the pixel ‘i’ in the background
image is defined as
Ei E X i n ;1 n N (3.20)
\[ I_i = [I_{Ri}, I_{Gi}, I_{Bi}] \tag{3.21} \]
\[ \phi(\alpha_i) = (I_i - \alpha_i E_i)^2 \tag{3.22} \]
i n X i n Ei U y (3.23)
where
X iR n EiR
R n (3.24)
iR
X iG n EiG
G n (3.25)
iG
X iB n EiB
B n (3.26)
iB
I max R , G , B (3.27)
X i n Ei
I n (3.28)
i
Video      Mean       Variance    Standard Deviation   Tuning factor
Video 1    135.091    5123.039    71.575               0.602
Video 2    137.864    2772.981    52.659               0.507
Video 3    125.727    1084.398    32.930               0.498
Video 4    124.955    1786.522    42.267               0.528
Mean shift tracking plays a vital role in target tracking because of its
robustness and computational efficiency. However, traditional mean shift
tracking assumes that the target differs significantly from the
background. In cases such as aerial videos, it is difficult to discriminate
between the background and the target, so the traditional technique cannot
adapt to dynamic changes and fails. The concept of mean shift clustering is
depicted in Figure 3.8.
[Figure 3.8: Concept of mean shift clustering (panel labels: Frame 1, Frame n)]
Traditional mean shift tracking follows the target region, which is the
region of interest within the image. It is a simple iterative procedure that
shifts the position of a data point to the mean position of its data cluster.
Here the target model is the density of the region in the previous frame, and
the candidate model is the density of the region in the next frame. The target
model and the candidate model are defined by the same kernel function over the
region. The tracking procedure involves defining the target and candidate
models, calculating their similarity, defining the new position of the target
model, calculating the distance from the current position to the mean position
in the next frame, and shifting to the new mean position in the next frame.
When the distance between the position of the target in the current frame and
in the next frame exceeds a threshold, the current region becomes the new
previous region, and the procedure is repeated until it converges to the mean
position of the target.
Let the point $\hat{y}_0$ be the initial position in the previous frame. The
respective model is defined as $\{\hat{q}_u\}_{u=1..m}$ for the m bins of the
color histogram. The candidate model is defined as $\{\hat{p}_u(\hat{y}_0)\}$
for position $\hat{y}_0$, and the distance measure is evaluated as
\[ \rho[\hat{p}(\hat{y}_0), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(\hat{y}_0)\, \hat{q}_u} \tag{3.29} \]
The weight vector $\{w_i\}_{i=1..n_h}$ for 'n' frames of bandwidth 'h' is
derived as
\[ w_i = \sum_{u=1}^{m} \delta[b(x_i) - u] \sqrt{\frac{\hat{q}_u}{\hat{p}_u(\hat{y}_0)}} \tag{3.30} \]
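\[ \hat{y}_1 = \frac{\sum_{i=1}^{n_h} x_i\, w_i\, g\!\left(\left\|\frac{\hat{y}_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n_h} w_i\, g\!\left(\left\|\frac{\hat{y}_0 - x_i}{h}\right\|^2\right)} \tag{3.31} \]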
[Figure residue: flowchart blocks "Similarity" and "Distance Measure"]
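\[ \rho[\hat{p}(\hat{y}_1), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(\hat{y}_1)\, \hat{q}_u} \tag{3.32} \]
\[ \hat{y}_1 \leftarrow \frac{1}{2}(\hat{y}_0 + \hat{y}_1) \tag{3.33} \]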
threshold, then the incoming frame is assumed as the previous frame
($\hat{y}_0 \leftarrow \hat{y}_1$) and the procedure is repeated.
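One iteration of the procedure in (3.29) to (3.33) might look like the sketch below. It is illustrative only and makes several assumptions: pixel coordinates are passed as an array `positions`, the array `bins` plays the role of b(x_i), the kernel is an Epanechnikov profile (so g is constant inside the window), and all function names are hypothetical.

```python
import numpy as np

def candidate_histogram(positions, bins, center, h, m):
    """Kernel-weighted, normalized color histogram around `center`."""
    r2 = np.sum(((positions - center) / h) ** 2, axis=1)
    k = np.where(r2 <= 1.0, 1.0 - r2, 0.0)      # Epanechnikov profile (assumed)
    hist = np.bincount(bins, weights=k, minlength=m)
    total = hist.sum()
    return hist / total if total > 0 else hist

def bhattacharyya(p, q):
    """Similarity of two histograms, as in (3.29) and (3.32)."""
    return float(np.sum(np.sqrt(p * q)))

def mean_shift_step(positions, bins, q, y0, h):
    """Return the shifted position y1 for the current position y0."""
    q = np.asarray(q, dtype=np.float64)
    y0 = np.asarray(y0, dtype=np.float64)
    p0 = candidate_histogram(positions, bins, y0, h, len(q))
    # (3.30): each pixel is weighted by sqrt(q_u / p_u) of its own bin
    ratio = np.sqrt(np.divide(q, p0, out=np.zeros_like(q), where=p0 > 0))
    w = ratio[bins]
    # g = -k' is constant inside the window for the assumed profile
    r2 = np.sum(((positions - y0) / h) ** 2, axis=1)
    g = (r2 <= 1.0).astype(np.float64)
    denom = np.sum(w * g)
    if denom == 0:
        return y0
    y1 = (positions * (w * g)[:, None]).sum(axis=0) / denom   # (3.31)
    # (3.32)-(3.33): step back halfway while the similarity gets worse
    rho0 = bhattacharyya(p0, q)
    for _ in range(20):
        p1 = candidate_histogram(positions, bins, y1, h, len(q))
        if bhattacharyya(p1, q) >= rho0:
            break
        y1 = 0.5 * (y0 + y1)
    return y1
```

Calling `mean_shift_step` repeatedly, feeding each result back in as `y0`, moves the window toward the region whose histogram best matches `q`.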
Calculate Moments
\[ \hat{q}_u = \sum_{i=1}^{n} \delta[c(x_i^*) - u] \tag{3.34} \]
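\[ M_{00} = \sum_x \sum_y I(x, y) \tag{3.35} \]
\[ M_{10} = \sum_x \sum_y x\, I(x, y) \tag{3.36} \]
\[ M_{01} = \sum_x \sum_y y\, I(x, y) \tag{3.37} \]
\[ M_{20} = \sum_x \sum_y x^2 I(x, y) \tag{3.38} \]
\[ M_{02} = \sum_x \sum_y y^2 I(x, y) \tag{3.39} \]
\[ x_c = \frac{M_{10}}{M_{00}} \tag{3.41} \]
\[ y_c = \frac{M_{01}}{M_{00}} \tag{3.42} \]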
\[ \hat{q}_u = \sum_{i=1}^{n} k\!\left(\|x_i^*\|^2\right) \delta[c(x_i^*) - u] \tag{3.43} \]
1 r , r 1
k ( x) (3.44)
0, otherwise
ar ,1 r h
k ( x) (3.45)
0, otherwise
\[ \hat{q}_u = \hat{w}_u \sum_{i=1}^{n} k\!\left(\|x_i^*\|^2\right) \delta[c(x_i^*) - u] \tag{3.46} \]
The orientation (θ) and scale are defined, and the length (l) is derived
from the intermediate values as follows
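\[ a = \frac{M_{20}}{M_{00}} - x_c^2 \tag{3.47} \]
\[ b = 2\left(\frac{M_{11}}{M_{00}} - x_c y_c\right) \tag{3.48} \]
\[ c = \frac{M_{02}}{M_{00}} - y_c^2 \tag{3.49} \]
\[ \theta = \frac{1}{2} \tan^{-1}\!\left(\frac{b}{a - c}\right) \tag{3.50} \]
\[ d_1 = \sqrt{\frac{(a + c) + \sqrt{b^2 + (a - c)^2}}{2}} \tag{3.51} \]
\[ d_2 = \sqrt{\frac{(a + c) - \sqrt{b^2 + (a - c)^2}}{2}} \tag{3.52} \]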
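The moment calculations (3.35) to (3.52) can be checked numerically with a short sketch. The function name is hypothetical. The mixed moment M11 is taken in its standard form, the sum of x·y·I(x, y), which is an assumption because its definition does not appear in the equations above. `arctan2` is used instead of a plain arctangent so that the case a = c does not divide by zero.

```python
import numpy as np

def shape_from_moments(image):
    """Centroid, orientation and axis lengths of an intensity image."""
    image = np.asarray(image, dtype=np.float64)
    ys, xs = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    m00 = image.sum()                                  # (3.35)
    if m00 == 0:
        raise ValueError("image has zero total intensity")
    m10 = (xs * image).sum()                           # (3.36)
    m01 = (ys * image).sum()                           # (3.37)
    m20 = (xs ** 2 * image).sum()                      # (3.38)
    m02 = (ys ** 2 * image).sum()                      # (3.39)
    m11 = (xs * ys * image).sum()                      # standard form, assumed
    xc, yc = m10 / m00, m01 / m00                      # (3.41), (3.42)
    a = m20 / m00 - xc ** 2                            # (3.47)
    b = 2.0 * (m11 / m00 - xc * yc)                    # (3.48)
    c = m02 / m00 - yc ** 2                            # (3.49)
    theta = 0.5 * np.arctan2(b, a - c)                 # (3.50)
    root = np.sqrt(b ** 2 + (a - c) ** 2)
    d1 = np.sqrt(((a + c) + root) / 2.0)               # (3.51)
    d2 = np.sqrt(max(((a + c) - root) / 2.0, 0.0))     # (3.52), clipped at 0
    return xc, yc, theta, d1, d2
```

For a thin horizontal bar, the sketch returns an orientation of zero and a second axis length of zero, which matches the intuition that all the spread lies along x.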
Case 2: The position is not a target but has the same color properties as the
target (Vehicle B in Figure 3.11)
Case 3: The current position is not a target and has no similarity with the
target (Vehicle C in Figure 3.11)
Figure 3.11 Target Definition - The mark A is the target, B is not target
but with similar properties of target, C is totally different
from target
Assume a target model with center position 'y'. The bin size, area,
and color system of the target model are defined. The initialization is done
with the Epanechnikov kernel shown in Figure 3.13.
\[ E(x) = \begin{cases} 1 - x^2, & \text{if } 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases} \tag{3.53} \]
[Figure residue: flowchart decision "Similarity" with branches Yes and No]
The target model, denoted $q_u^*$, and the candidate model, denoted
$p_u^*$, are generated. The initial window is stored. The target model
$\{q_u^*\}_{u=1,2,\dots,m}$ has 'm' bins, where each bin holds the number of
pixels of the same color.
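\[ q_u^* = N \sum_{i=1}^{n} k\!\left(\left\|\frac{y - x_i}{h}\right\|\right) \delta_u\big(f(x_i)\big) \tag{3.54} \]
\[ \delta_u(x) = \begin{cases} 1, & x - u = 0 \\ 0, & \text{otherwise} \end{cases} \tag{3.55} \]
\[ N = \frac{1}{\sum_{i=1}^{n} k\!\left(\left\|\frac{y - x_i}{h}\right\|\right)} \tag{3.56} \]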
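A small sketch of how the target model in (3.53) to (3.56) could be built is given below. The function names are hypothetical, the array `bin_index` stands in for f(x_i), and the kernel argument is taken as the Euclidean distance from the center scaled by h, which is an assumption about the notation.

```python
import numpy as np

def epanechnikov(x):
    """Kernel of (3.53): 1 - x^2 on [0, 1], zero elsewhere."""
    x = np.asarray(x, dtype=np.float64)
    return np.where((x >= 0.0) & (x <= 1.0), 1.0 - x ** 2, 0.0)

def target_model(positions, bin_index, center, h, m):
    """Kernel-weighted color histogram q_u* with m bins."""
    positions = np.asarray(positions, dtype=np.float64)
    center = np.asarray(center, dtype=np.float64)
    dist = np.linalg.norm((positions - center) / h, axis=1)
    k = epanechnikov(dist)
    total = k.sum()
    if total == 0:
        raise ValueError("no pixel falls inside the kernel window")
    norm = 1.0 / total                                   # (3.56)
    # (3.54): delta_u(f(x_i)) routes each kernel weight to the pixel's bin
    return norm * np.bincount(bin_index, weights=k, minlength=m)
```

Because of the normalization constant N, the bins of the returned model sum to one, and pixels near the window center contribute more than pixels near its edge.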
generated. Since the candidate model varies with each frame, an index is
assigned for each frame. Let $f(x_p)$ be the index of the bin in the previous frame
n y xi
E u ( f ( xc )) g ( xi ) u ( x p ), f ( xc ) f ( x p )
i 1 h
pu ( y0 )
* *
(3.57)
E y xi ( f ( x )), otherwise
n
u c
i 1 h
\[ p_u^*(y_0^*) \leftarrow \frac{p_u^*(y_0^*)}{\sum_{u=1}^{m} p_u^*(y_0^*)} \tag{3.58} \]
N N N
cos( ) p(i) p* (i) p(i ) p(i ) p(i ) 1 (3.59)
i 1 i 1 i 1
\[ d(p, p^*) = \sqrt{1 - \rho(p, p^*)} \tag{3.60} \]
\[ \rho[p_u^*(y_0^*), q_u^*] = \sum_{u=1}^{m} \sqrt{p_u^*(y_0^*)\, q_u^*} \tag{3.61} \]
\[ d(y) = \sqrt{1 - \rho(p^*(y), q^*)} \tag{3.62} \]
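The similarity and distance in (3.60) to (3.62) reduce to two short functions. The sketch assumes both histograms are already normalized to sum to one, and the function names are chosen for the example only.

```python
import numpy as np

def bhattacharyya_coefficient(p, q):
    """Similarity between two normalized histograms, as in (3.61)."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    return float(np.sum(np.sqrt(p * q)))

def bhattacharyya_distance(p, q):
    """Distance of (3.60) and (3.62): sqrt(1 - coefficient)."""
    # Clamp at 1 so rounding error cannot produce a negative radicand.
    rho = min(bhattacharyya_coefficient(p, q), 1.0)
    return float(np.sqrt(1.0 - rho))
```

Identical histograms give a coefficient of 1 and a distance of 0, while histograms with no overlapping bins give a coefficient of 0 and a distance of 1.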
After determining the distance between the target model $q_u^*$ and the
candidate model $p_u^*$, the new position $y_1^*$ for the 'n' pixels in the
current window is defined.
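\[ y_1^* = \frac{\sum_{i=1}^{n} x_i \sum_{u=1}^{m} \sqrt{\frac{q_u^*}{p_u^*(y_0^*)}}\, \delta_u\big(f(x_i)\big)}{\sum_{i=1}^{n} \sum_{u=1}^{m} \sqrt{\frac{q_u^*}{p_u^*(y_0^*)}}\, \delta_u\big(f(x_i)\big)} \tag{3.63} \]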
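Because the delta term selects exactly one bin per pixel, the inner sum in (3.63) collapses to a single square-root ratio looked up by the pixel's bin index. The sketch below relies on that observation. The function name is hypothetical and `bin_index` stands in for f(x_i).

```python
import numpy as np

def new_position(positions, bin_index, q, p):
    """Weighted mean of pixel positions, one reading of (3.63)."""
    positions = np.asarray(positions, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    p = np.asarray(p, dtype=np.float64)
    # sqrt(q_u / p_u) per bin; empty candidate bins get weight zero
    ratio = np.sqrt(np.divide(q, p, out=np.zeros_like(q), where=p > 0))
    w = ratio[bin_index]           # the delta picks the pixel's own bin
    total = w.sum()
    if total == 0:
        raise ValueError("all pixel weights are zero")
    return (positions * w[:, None]).sum(axis=0) / total
```

Pixels whose color is over-represented in the candidate relative to the target receive small weights, so the window drifts toward pixels that look like the target.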
Let J(i) be the new background position for a pixel index 'i'.
Consider an area $A_{xy}$ in the new window $y_1^*$. For the pixels
i = 1, …, n, J(i) = 0 if the areas of the windows $y_1^*$ and $y_0^*$ are
equal, and J(i) = 1 otherwise.
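\[ \Delta x = \frac{y_{1x}^* - y_{0x}^*}{2} \tag{3.65} \]
\[ \Delta y = \frac{y_{1y}^* - y_{0y}^*}{2} \tag{3.66} \]
\[ l = \sqrt{\Delta x^2 + \Delta y^2} \tag{3.67} \]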
[Figure residue: flowchart blocks "New Background" and "Next Frame"]
y2* y y1*y y
m (3.68)
y y
*
2x
*
1x x
y y
2 2
l *
2y y1*y *
2x y1*x (3.69)
* l lm
ynx y(*n 1) x *
, yny yny
*
, if x 0
m 1 2
m2 1
* y(*n 1) x , yny
*
y(*n 1) y l , x 0, y 0
yn ynx
*
(3.70)
y* y *
,y y
* *
l , x 0, y 0
nx ( n 1) x ny ( n 1) y
ynx
*
y(*n 1) x , yny
*
y(*n 1) y , x 0, y 0
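\[ \rho[p^*(y_1^*), q^*] = \sum_{u=1}^{m} \sqrt{p_u^*(y_1^*)\, q_u^*} \tag{3.71} \]
\[ y_1^* \leftarrow \frac{1}{2}(y_1^* + y_0^*) \tag{3.72} \]
Thus in general,
\[ y_n^* \leftarrow \frac{1}{2}(y_n^* + y_{n-1}^*) \tag{3.73} \]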