

CHAPTER 3

TARGET DETECTION AND TRACKING

Detection and tracking of moving objects in a video stream is the


first relevant step of information extraction in computer vision applications
including people tracking, video surveillance, traffic monitoring and semantic
annotation of videos. In these applications, the algorithms for detection and
tracking of objects should be characterized by some important features:

• High precision

• Flexibility in different scenarios (indoor, outdoor)

• Adaptivity to different lighting conditions

• Efficiency to make real-time processing feasible

The fast execution and flexibility in different scenarios should be


considered as basic requirements to be met. Object detection makes tracking
more reliable (the same object can be identified more reliably from frame to
frame if its shape and position are accurately detected) and faster.

The input video data is obtained from an unmanned aerial vehicle (UAV). A dataset of aerial videos with varying circumstances and properties is collected by deploying the UAV in a real-time environment and is fed as input to the designed algorithms. The target in the video frame is defined by the user. The respective target is detected by the proposed Running Gaussian background subtraction (RGBS) technique. The technique is compared with a few existing techniques: Temporal frame differencing (TFD), Running average background subtraction (RABS) and Temporal median filtering (TMF). The same video is fed to the target tracking algorithm. An Adaptive background mean shift tracking (ABMST) technique is proposed and implemented to track the target of interest. The algorithm is compared with the Traditional mean shift tracking (TMST) technique and the Continuously adaptive mean shift (CAM Shift) technique to justify the efficiency of the proposed algorithm. The work flow diagram is shown in Figure 3.1.

[Workflow: input video data from the UAV feeds the target detection algorithm and the target tracking algorithm, which are evaluated with object-based and frame-based performance metrics respectively.]

Figure 3.1 Work flow diagram

In this chapter the proposed target detection and tracking


algorithms are discussed in detail along with other target detection and
tracking techniques which are used for comparison.

3.1 TARGET DETECTION

Detection of moving objects from a video stream is a fundamental but critical task for surveillance applications. A generic approach to detection of moving objects is background subtraction, where each incoming video frame is compared with a generated background model that is assumed as the reference. Pixels of the current video frame that deviate significantly from the reference frame are considered elements of a moving object. In the case of aerial surveillance, the algorithm should adapt to various issues such as changes in illumination, cluttered background, cast shadows, snow and fog. Also, to be feasible in real time, the algorithm should be computationally inexpensive with low memory requirements.

The basic idea of background subtraction is to subtract the current image from a reference image that models the background scene. The basic steps shown in Figure 3.2 involve background modeling, thresholding and the subtraction operation.

• The background modeling step constructs a reference image that represents the background.

• In the threshold selection step, the threshold value for the subtraction operation is determined.

• In the subtraction step, pixels are classified into background or moving object.

[Block diagram: Background Modeling → Thresholding → Background Subtraction]

Figure 3.2 Steps in Background subtraction
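These three steps can be made concrete with a minimal Python/NumPy sketch (not taken from the thesis; the grayscale frame format and the fixed threshold value are illustrative assumptions):

```python
import numpy as np

def subtract_background(frame, background, threshold=30):
    """Classify each pixel of a grayscale frame as foreground (1) or background (0).

    `frame` and `background` are 2-D uint8 arrays of the same size; the
    threshold value here is only an illustrative default, not one prescribed
    by the thesis.
    """
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff > threshold).astype(np.uint8)
```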



Traditional background subtraction is a method typically used to


segment moving regions in image sequences taken from a static camera by
comparing each new frame to a model of the scene background. In this thesis,
a novel non-parametric background modeling and a background subtraction
approach is proposed and implemented. The model can handle situations
where the background of the scene is cluttered and not completely static but
contains small motions such as trees and bushes.

3.1.1 Temporal Frame Differencing

Temporal or adjacent frame differencing is the basic technique for segmenting the background from the target. It involves subtraction of consecutive frames to generate a background model. The initial frame is assumed as the background model, and the next incoming frame is subtracted from this assumed background model. The resultant frame becomes the new background model and is compared with the next frame. This procedure is repeated over the 'n' frames of the video data. The concept is explained in Figure 3.3.

Consider the input video consisting of ‘n’ frames f1, f2…, fn.
Initially the first incoming frame fi is assumed as the initial background
model bi.

b_i = f_i, \quad i = 1 \qquad (3.1)

The final background model b_{i+1} is obtained as the difference between two consecutive frames f_i and f_{i+1}:

f_{i+1} - f_i = b_{i+1}, \quad i = 1, 2, ..., n \qquad (3.2)

where f_i is the i-th frame and f_{i+1} is the (i+1)-th frame.



The update g_i is obtained by subtracting the resultant background model b_i from the frame f_i:

f_i - b_i = g_i \qquad (3.3)

The resultant binary image R_i is obtained by applying a fixed threshold T. The threshold is predefined, ranging from 100 to 120.

R_i = \begin{cases} 1, & g_i > T \\ 0, & g_i \le T \end{cases} \qquad (3.4)

[Flowchart: the initial frame forms the initial background model b_i; each next frame is differenced with the model to give the new background model b_{i+1} and the update g_i, which is compared with the threshold T to label each pixel as background (g_i < T → 0) or foreground (g_i > T → 1).]

Figure 3.3 Concept of Temporal Frame differencing
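A minimal sketch of this scheme is given below, following the common adjacent-frame-differencing reading in which the previous frame serves as the background model; the video-reading interface and the fixed threshold of 110 (chosen from the stated 100-120 range) are assumptions made for illustration:

```python
import cv2
import numpy as np

def temporal_frame_differencing(video_path, threshold=110):
    """Yield binary masks from adjacent-frame differences (Eqs. 3.1-3.4)."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        return
    background = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.int16)  # b_i = f_i
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        current = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.int16)
        update = np.abs(current - background)           # g_i, difference with the model
        yield (update > threshold).astype(np.uint8)     # R_i via the fixed threshold T
        background = current                            # the latest frame becomes the model
    cap.release()
```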

3.1.2 Running Average Background Subtraction

Running average background segmentation, also known as the Adaptive mean background subtraction method, involves subtraction of consecutive frames based on a background model maintained with a learning rate 'α'. The thresholding process is based on the mean intensity value of the background model generated over the 'n' frames of the input video. The concept of adaptive mean background subtraction is described in Figure 3.4.

The first frame of the input video is assumed as background model


bi initially.

b_i = f_i, \quad i = 1 \qquad (3.5)

The final background model b_{i+1} is obtained from the frame f_i and the initially assumed background model b_i, blended with a proportionate fraction α named the tuning factor. The value of α decides the extent to which the two frames contribute to the final background model. The value of α ranges between 0 and 1.

b_{i+1} = α f_i + (1 - α) b_i \qquad (3.6)

The update g_i is obtained by subtracting the background model b_{i+1} from the subsequent frame f_i:

g_i = f_i - b_{i+1} \qquad (3.7)

The resultant binary image Ri is obtained based on the value of


threshold T.

R_i = \begin{cases} 1, & g_i > T \\ 0, & g_i \le T \end{cases} \qquad (3.8)

[Flowchart: the initial frame forms the initial background model b_i; each next frame f_i is blended with b_i using the tuning factor α to give the new background model b_{i+1} and the update g_i, which is compared with the adaptive threshold T = mean(b_{i+1}) to label each pixel as background (g_i < T → 0) or foreground (g_i > T → 1).]

Figure 3.4 Concept of Running Average background subtraction

Here the threshold is not fixed. It is an adaptive measure obtained


by calculating the mean of all the pixels in each frame of the background
model.

T = mean(b_{i+1}) \qquad (3.9)
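A compact sketch of this scheme is shown below; the learning rate of 0.05 is an illustrative assumption, and the adaptive threshold follows Eq. 3.9 literally (the mean intensity of the updated background model):

```python
import cv2
import numpy as np

def running_average_subtraction(video_path, alpha=0.05):
    """Running average (adaptive mean) background subtraction, Eqs. 3.5-3.9."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        return
    background = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)  # b_i = f_i
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        current = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        background = alpha * current + (1.0 - alpha) * background   # Eq. 3.6
        update = np.abs(current - background)                       # Eq. 3.7
        threshold = background.mean()                               # Eq. 3.9, adaptive T
        yield (update > threshold).astype(np.uint8)                 # Eq. 3.8
    cap.release()
```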

3.1.3 Temporal median filtering

The Temporal median filtering technique (TMF) involves classification of targets and shadows and the definition of the background model accordingly. The concept of TMF is described in Figure 3.5. The following constraints are defined:

• Moving visual object (MVO): a set of connected points belonging to an object.

• Uncovered background: a set of visible scene points currently not in motion.

• Background (B): the computed model of the background.

• Ghost (G): a set of connected points detected as in motion by means of background subtraction, but not corresponding to any real moving object.

• Shadow: a set of connected background points modified by a shadow cast over them by a moving object. Shadows can be classified as a shadow connected with an MVO (MVO_sh) and a ghost shadow (G_sh), being a shadow not connected with any real MVO.

The known objects are defined as

KO^t = MVO^t ∪ MVO_sh^t ∪ G^t ∪ G_sh^t \qquad (3.10)

For the set S of elements, I^t(p) is the object pixel, B^t(p) is the background pixel and w_b is the weight:

S = \{I^t(p), I^{t-Δt}(p), ..., I^{t-nΔt}(p)\} ∪ w_b\{B^t(p)\} \qquad (3.11)

The final background model is

B^t(p) = \begin{cases} B^{t-1}(p), & p ∈ (MVO^t ∪ MVO_sh^t) \\ B_s^t(p), & p ∈ (G^t ∪ G_sh^t) \end{cases} \qquad (3.12)

[Flowchart: incoming frames are stored in a buffer; the median of the stored data builds a histogram model; the cumulative histogram C_sum and the background update g_i are compared with the threshold h_n to label each pixel as background (C_sum < h_n → 0) or foreground (C_sum > h_n → 1).]

Figure 3.5 Concept of Temporal Median filtering

The temporal median filter computes the median intensity for each pixel from all the frames stored in a buffer. Considering the computational complexity and storage limitations, it is not possible to store all the incoming video frames and make the decision over them; hence the frames are stored in a limited-size buffer. The estimated background model comes closer to the real background scene as the size of the buffer grows, but the speed of the process is reduced and higher-capacity storage devices are required.
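The median-buffer core of the technique can be sketched as follows; the buffer length and decision threshold are assumptions, and the MVO/ghost/shadow classification of Eqs. 3.10-3.12 is not reproduced here:

```python
from collections import deque

import cv2
import numpy as np

def temporal_median_filter(video_path, buffer_size=25, threshold=30):
    """Per-pixel temporal median over a bounded frame buffer (Section 3.1.3)."""
    cap = cv2.VideoCapture(video_path)
    buffer = deque(maxlen=buffer_size)                # limited-size buffer of recent frames
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        buffer.append(gray)
        background = np.median(np.stack(buffer), axis=0)       # median intensity per pixel
        diff = np.abs(gray.astype(np.float32) - background)
        yield (diff > threshold).astype(np.uint8)
    cap.release()
```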

3.1.4 Running Gaussian Background Subtraction

In general, in region-based background subtraction techniques, the regions other than the object of interest are stationary, so obtaining the difference in each frame is easier. But in aerial surveillance an adaptive measure is required to update the background frame periodically to achieve effective object detection. This adaptive measure is applied in the thresholding step of the proposed technique. The concept is described in Figure 3.6. The basic steps include background modeling, update, subtraction and thresholding. The 'n' frames f1, f2, ..., fn of the video sequence are considered for analysis and the input frames are converted into gray scale. Background modeling is the first step in background subtraction. It constructs the reference image representing the background. Let b_i be the i-th background model. It is obtained from the frame f_i.

b_i = f_i, \quad i = 1 \qquad (3.13)

The final background model b_{i+1} is obtained from the input frame f_i and the assumed background model b_i. The value of α decides the extent to which both frames influence the next background model. In general α varies from 0 to 1.

[Flowchart: the initial frame forms the initial background model b_i; each next frame f_i is blended with b_i using the tuning factor α to give the new background model b_{i+1} and the update g_i, which is compared with the adaptive threshold T = (1/n) Σ σ_i to label each pixel as background (g_i < T → 0) or foreground (g_i > T → 1).]

Figure 3.6 Concept of Adaptive Gaussian background subtraction



b_{i+1} = α f_i + (1 - α) b_i \qquad (3.14)

If α = 0, then b_{i+1} = b_i; the background model is never updated, resulting in a ghosting effect. If α = 1, then b_{i+1} = f_i; the background model and the input frame are the same and the result reduces to binary 0. Thus it is essential to tune the factor to get the desired result. Let the background update be g_i. It is obtained by subtracting the tuned background model b_i from the subsequent input frame f_i:

g_i = f_i - b_i \qquad (3.15)

Thresholding is a process used to classify the input pixels into


background or object based on the value assumed as threshold ‘T’. We
consider an adaptive approach to update the threshold value based on the
statistical properties of the input frame and background model. The resultant
binary image Ri is obtained based on the value of threshold T.

R_i = \begin{cases} 1, & g_i > T \\ 0, & g_i \le T \end{cases} \qquad (3.16)

The adaptive threshold is decided as the average standard deviation of each video frame. The value of σ is updated for each frame.

T = \frac{1}{n} \sum_{i=1}^{n} σ_i \qquad (3.17)

σ_{i+1}^2 = α (f_i - b_i)^2 + (1 - α) σ_i^2 \qquad (3.18)

3.1.4.1 Tuning factor optimization

Optimization of the tuning factor α is a very important step in background modeling and updating. Since the new background model is based on previous background models and incoming frames, if the value of α is updated periodically, the noise due to distortion arising with motion is added to the image parameters, which degrades the effective generation of the background model. Thus an optimal value of α that suits all cases of the aerial platform is necessary. Optimization is done with the assumption that the statistical properties of the object differ widely from those of the background.

Consider the three-dimensional RGB color space of a pixel-wise statistically modeled background image as in Figure 3.7. Consider a pixel 'i' in the color space. Let E_i be the expected color of the pixel 'i' and I_i be the observed color intensity value of the pixel 'i'. The difference between I_i and E_i is decomposed into brightness (α) and chromaticity (D) components.

[Diagram: RGB color space showing the expected color E_i, the observed color I_i, the brightness component α along the chromaticity line OE and the chromaticity distortion D_i.]

Figure 3.7 RGB color space of a pixel 'i'

The expected RGB color value for the pixel ‘i’ in the background
image is defined as

E_i = [E_{Ri}, E_{Gi}, E_{Bi}] \qquad (3.19)

Generalizing the expectation parameter for n pixels,

E_i = E[X_i(n)]; \quad 1 \le n \le N \qquad (3.20)

The line OE is the expected chromaticity line. The RGB intensity value of the pixel 'i' is

I_i = [I_{Ri}, I_{Gi}, I_{Bi}] \qquad (3.21)

The brightness distortion is a scalar that brings the observed color closer to the expected chromaticity line and is denoted by φ(α_i),

φ(α_i) = (I_i - α_i E_i)^2 \qquad (3.22)

The distortion of X_i(n) from its mean E_i is computed by considering the orthogonal distortion parameter α_i(n):

α_i(n) = (X_i(n) - E_i) \cdot U_y \qquad (3.23)

where U_y is the unit vector along the Y axis and α_i(n) is the brightness parameter.

Thus the factor α for RGB space is defined as,

α_R(n) = \frac{X_{iR}(n) - E_{iR}}{σ_{iR}} \qquad (3.24)

α_G(n) = \frac{X_{iG}(n) - E_{iG}}{σ_{iG}} \qquad (3.25)

α_B(n) = \frac{X_{iB}(n) - E_{iB}}{σ_{iB}} \qquad (3.26)

Since RGB color space is highly correlated, the values of α


obtained in RGB scale are converted into HSI color space. This is done by
normalizing the equations 3.24, 3.25 and 3.26.

α_I = \max(α_R, α_G, α_B) \qquad (3.27)

I is the intensity value in HSI space, which contains the predominant information for processing; the H and S values are usually eliminated (Chowdhury et al 2011). Thus the final tuning factor α is obtained as

α_I(n) = \frac{X_i(n) - E_i}{σ_i} \qquad (3.28)

If the value of α is 1, the brightness of the sample pixel is the same as that of the reference pixel. If it is less than 1, there is a reasonable change in brightness between the two pixels considered.

The value of α is optimized and varies from 0 to 1 for both static


and dynamic background models. Four videos are used for analysis and in
each video 100 frames are considered. In each frame 500 pixels at random
locations are considered and their intensities are taken for processing. Based
on the considered pixel values the statistical calculations are made to
determine mean, standard deviation and variance. Based on these three factors
the tuning factor is determined and the results are shown in Table 3.1.

Table 3.1 Optimization of tuning factor

Video     Mean      Variance    Standard deviation    Tuning factor
Video 1   135.091   5123.039    71.575                0.602
Video 2   137.864   2772.981    52.659                0.507
Video 3   125.727   1084.398    32.930                0.498
Video 4   124.955   1786.522    42.267                0.528

In Table 3.1, for the Adaptive Gaussian background subtraction algorithm, the average tuning factors for videos 1 to 4 are 0.602, 0.507, 0.498 and 0.528 respectively. The overall tuning factor is approximately 0.53. Thus a value of α in the range 0.5 to 0.6 may be assigned for efficient segmentation.
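A possible implementation of this estimation procedure is sketched below; sampling 500 random pixel locations follows the description above, but the aggregation of the per-pixel brightness distortions of Eq. 3.28 into a single α (here, the mean of their magnitudes, clipped to [0, 1]) is an assumption made for illustration:

```python
import numpy as np

def estimate_tuning_factor(frames, n_pixels=500, seed=0):
    """Estimate alpha from the statistics of randomly sampled pixels (Eq. 3.28).

    `frames` is a sequence of grayscale frames as 2-D arrays. Averaging the
    per-pixel brightness distortions into a single alpha is an assumption
    made for illustration.
    """
    rng = np.random.default_rng(seed)
    stack = np.stack([np.asarray(f, dtype=np.float32) for f in frames])  # (frames, H, W)
    h, w = stack.shape[1:]
    ys = rng.integers(0, h, n_pixels)
    xs = rng.integers(0, w, n_pixels)
    samples = stack[:, ys, xs]                 # intensities of the sampled pixels over time
    expected = samples.mean(axis=0)            # E_i, expected intensity per sampled pixel
    sigma = samples.std(axis=0) + 1e-6         # sigma_i, per-pixel standard deviation
    distortion = np.abs(samples - expected) / sigma   # |X_i(n) - E_i| / sigma_i, Eq. 3.28
    return float(np.clip(distortion.mean(), 0.0, 1.0))
```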

3.2 TARGET TRACKING

Target tracking is a method to track single or multiple objects in a sequence of frames. Tracking can be done in forward or backward motion. Mean shift tracking is a forward tracking method. It estimates the position of the object in the current frame based on the previous frame. It is a gradient ascent approach that models the image region to be tracked as a histogram and finds the local maxima of the density function from the data samples. Mean shift tracking is a non-parametric, real-time, user-interfaced, kernel-based tracking technique which provides accurate localization and matching of the target without an expensive search. It works in an iterative fashion: it computes the mean shift value for the current position, shifts to the new position given by that value, and continues until the convergence conditions are satisfied.

Here, a kernel function is defined which calculates the distance between the sample points considered and the mean shift point. A weight coefficient is included which is inversely proportional to the distance: the closer the distance, the larger the weight coefficient.

Mean shift tracking plays a vital role in the area of target tracking due to its robustness and computational efficiency. However, traditional mean shift tracking assumes that the target differs significantly from the background. In cases like aerial videos it is difficult to discriminate between the background and the target, so the traditional technique cannot adaptively keep up with the dynamic changes and thus fails. The concept of mean shift clustering is depicted in Figure 3.8.

[Illustration: across frames 1 to n, the initial center point is iteratively moved to the mean-shifted center point of the data cluster.]

Figure 3.8 Concept of mean shift clustering

3.2.1 Traditional mean shift tracking

Traditional mean shift tracking tracks the target region, which is the region of interest within the entire image. It is a simple iterative procedure that shifts the position of a data point to the mean position of the data cluster. Here the target model is the density of the previous region in a frame and the candidate model is the density of the region in the next frame. The target model and the candidate model are defined by the same kernel function over the region. The tracking procedure involves definition of the target and candidate models, calculation of similarity, defining a new position for the target model, calculation of the distance from the current position to the mean position in the next frame and shifting to the new mean position in the next frame. When the distance between the position of the target in the current frame and the next frame exceeds a threshold, the current region becomes the new previous region and the procedure is repeated till it converges to the mean position of the target.

Mean shift tracking undergoes two steps – target appearance description and tracking. The color histogram description is obtained by classifying all the pixels in the target area and estimating the probability of each color. Then, in the next frame, the most similar pixels are found by mean shift tracking using a similarity measure function. The shift vector which maximizes the similarity between the target histogram and the candidate histogram is then calculated, and the search converges to the position with maximum similarity. The classical MST is insensitive to non-rigid transformations, target location and overlap. The work flow is described in Figure 3.9.

Let point ŷ_0 be the initial position in the previous frame. The respective target model is defined as {q̂_u}, u = 1..m, for the m bins of the color histogram. The candidate model is defined as {p̂_u(ŷ_0)} for position ŷ_0 and the similarity measure is evaluated as

ρ[p̂(ŷ_0), q̂] = \sum_{u=1}^{m} \sqrt{p̂_u(ŷ_0) \, q̂_u} \qquad (3.29)

The weight vector {w_i}, i = 1..n_h, for the n_h pixels in the window of bandwidth h is derived as

w_i = \sum_{u=1}^{m} \sqrt{\frac{q̂_u}{p̂_u(ŷ_0)}} \, δ[b(x_i) - u] \qquad (3.30)

The next location ŷ_1 of the candidate model is defined as

ŷ_1 = \frac{\sum_{i=1}^{n_h} x_i w_i \, g\left(\left\| \frac{ŷ_0 - x_i}{h} \right\|^2\right)}{\sum_{i=1}^{n_h} w_i \, g\left(\left\| \frac{ŷ_0 - x_i}{h} \right\|^2\right)} \qquad (3.31)

[Workflow: the target model and the candidate model are initialized and localized; the next locations of the target and candidate models are computed; their similarity and distance are measured; the window is shifted to the new position.]

Figure 3.9 Traditional Mean shift tracking workflow diagram

The similarity is calculated by Bhattacharya coefficient (Equations


3.59-3.62). For the new position ŷ1 , the distance metric is calculated as

ρ[p̂(ŷ_1), q̂] = \sum_{u=1}^{m} \sqrt{p̂_u(ŷ_1) \, q̂_u} \qquad (3.32)

The metrics for the current frame and the incoming frame are compared. If ρ[p̂(ŷ_1), q̂] < ρ[p̂(ŷ_0), q̂], then the new position ŷ_1 is

ŷ_1 = \frac{1}{2}(ŷ_0 + ŷ_1) \qquad (3.33)

If ||ŷ_0 - ŷ_1|| < ε, the iteration is concluded. If the value exceeds this threshold, the incoming frame is taken as the previous frame (ŷ_0 ← ŷ_1) and the procedure is repeated.
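The iteration of Eqs. 3.29-3.33 can be sketched as follows for a grayscale frame; the window size, bin count and stopping threshold are assumptions, a flat kernel (g ≡ 1) is used for brevity, and image-boundary handling is omitted:

```python
import numpy as np

def mean_shift_step(frame, target_hist, y0, half_win=20, m_bins=16,
                    eps=1.0, max_iter=20):
    """Iterate the mean shift relocation for one frame (Eqs. 3.29-3.33 sketch).

    `frame` is a 2-D grayscale array, `target_hist` the m-bin target model
    q_u, and `y0` the (row, col) centre from the previous frame.
    """
    y = np.asarray(y0, dtype=np.float64)
    for _ in range(max_iter):
        r0, c0 = int(y[0]) - half_win, int(y[1]) - half_win
        patch = frame[r0:r0 + 2 * half_win, c0:c0 + 2 * half_win]
        bins = (patch.astype(np.int32) * m_bins) // 256        # b(x_i): bin index per pixel
        cand = np.bincount(bins.ravel(), minlength=m_bins).astype(np.float64)
        cand /= cand.sum()                                     # candidate model p_u(y)
        weights = np.sqrt(target_hist[bins] / np.maximum(cand[bins], 1e-12))  # Eq. 3.30
        rows, cols = np.indices(patch.shape)
        coords = np.stack([rows + r0, cols + c0], axis=-1).astype(np.float64)
        y_new = (weights[..., None] * coords).sum(axis=(0, 1)) / weights.sum()  # Eq. 3.31
        if np.linalg.norm(y_new - y) < eps:                    # convergence check
            return y_new
        y = y_new
    return y
```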

3.2.2 Continuously adaptive mean shift tracking

The Continuously Adaptive Mean Shift algorithm (CAM Shift) is an adaptation of the mean shift algorithm for object tracking that handles an arbitrary number and type of feature spaces. The algorithm is based on a modification of the mean shift concept: it calculates the probability density distribution of the image and iterates in the direction of the maximum probability density (mode). The concept is described in Figure 3.10.

[Flowchart: define the ROI of the target model → define the color histogram → calculate moments → calculate the ratioed histogram → calculate orientation and scale → shift to the new position.]

Figure 3.10 Concept of CAM Shift algorithm



The first step in the CAM Shift algorithm is the definition of the histogram. It is done by associating each pixel value with the corresponding histogram bin. The m-bin histogram defined for a pixel location x_i is computed as

q̂_u = \sum_{i=1}^{n} δ[c(x_i^*) - u] \qquad (3.34)

After generation of m-bin histogram, the mean location (centroid)


within the search window is defined. The zero, first and second order
moments for the pixel (x,y) with intensity I are computed as follows

M_{00} = \sum_x \sum_y I(x, y) \qquad (3.35)

M_{10} = \sum_x \sum_y x \, I(x, y) \qquad (3.36)

M_{01} = \sum_x \sum_y y \, I(x, y) \qquad (3.37)

M_{20} = \sum_x \sum_y x^2 \, I(x, y) \qquad (3.38)

M_{02} = \sum_x \sum_y y^2 \, I(x, y) \qquad (3.39)

M_{11} = \sum_x \sum_y x y \, I(x, y) \qquad (3.40)

Thus the mean search window location is

x_c = \frac{M_{10}}{M_{00}} \qquad (3.41)

y_c = \frac{M_{01}}{M_{00}} \qquad (3.42)
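The moment computations of Eqs. 3.35-3.42 translate directly into NumPy; the sketch below assumes a grayscale (or back-projected probability) window, and the same values can also be obtained from OpenCV's cv2.moments:

```python
import numpy as np

def window_moments(window):
    """Moments and centroid of a search window (Eqs. 3.35-3.42)."""
    I = window.astype(np.float64)
    ys, xs = np.indices(I.shape)
    m00 = I.sum()                                     # Eq. 3.35
    m10, m01 = (xs * I).sum(), (ys * I).sum()         # Eqs. 3.36-3.37
    m20, m02 = (xs**2 * I).sum(), (ys**2 * I).sum()   # Eqs. 3.38-3.39
    m11 = (xs * ys * I).sum()                         # Eq. 3.40
    xc, yc = m10 / m00, m01 / m00                     # centroid, Eqs. 3.41-3.42
    return (xc, yc), (m00, m10, m01, m20, m02, m11)
```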

Thus the target model based on the histogram is defined by

q̂_u = \sum_{i=1}^{n} k\left(\|x_i^*\|^2\right) δ[c(x_i^*) - u] \qquad (3.43)

Here, k(x) is a convex, monotonically decreasing kernel profile that assigns higher values to pixels near the center of the search window:

k(r) = \begin{cases} 1 - r, & r \le 1 \\ 0, & \text{otherwise} \end{cases} \qquad (3.44)

The weighted histogram alone is not sufficient to localize the target. A ratio histogram solves the issue by assigning lower weight to color features belonging to the background. With 'a' as scaling factor and 'h' as bandwidth,

k(r) = \begin{cases} a r, & 1 \le r \le h \\ 0, & \text{otherwise} \end{cases} \qquad (3.45)

Thus the background-weighted histogram becomes

q̂_u = ŵ_u \sum_{i=1}^{n} k\left(\|x_i^*\|^2\right) δ[c(x_i^*) - u] \qquad (3.46)

The orientation (θ) and scale are defined from the following intermediate values:

a = \frac{M_{20}}{M_{00}} - x_c^2 \qquad (3.47)

b = 2\left(\frac{M_{11}}{M_{00}} - x_c y_c\right) \qquad (3.48)

c = \frac{M_{02}}{M_{00}} - y_c^2 \qquad (3.49)

From these intermediate values the orientation is defined as

θ = \frac{1}{2} \tan^{-1}\left(\frac{b}{a - c}\right) \qquad (3.50)

The distances d_1 and d_2 of the distribution are defined as

d_1 = \frac{(a + c) + \sqrt{b^2 + (a - c)^2}}{2} \qquad (3.51)

d_2 = \frac{(a + c) - \sqrt{b^2 + (a - c)^2}}{2} \qquad (3.52)

Based on these values of scale and orientation, the new position is


determined and shifted.
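Given the moments above, the orientation and the two axis measures of Eqs. 3.47-3.52 can be computed as sketched below (arctan2 is used in place of the plain arctangent so the result stays defined when a = c; the full CAM Shift loop itself is available in OpenCV as cv2.CamShift):

```python
import numpy as np

def orientation_and_scale(m00, m10, m01, m20, m02, m11):
    """Orientation and axis measures of the tracked blob (Eqs. 3.47-3.52)."""
    xc, yc = m10 / m00, m01 / m00
    a = m20 / m00 - xc**2                    # Eq. 3.47
    b = 2.0 * (m11 / m00 - xc * yc)          # Eq. 3.48
    c = m02 / m00 - yc**2                    # Eq. 3.49
    theta = 0.5 * np.arctan2(b, a - c)       # Eq. 3.50
    root = np.sqrt(b**2 + (a - c)**2)
    d1 = ((a + c) + root) / 2.0              # Eq. 3.51
    d2 = ((a + c) - root) / 2.0              # Eq. 3.52
    return theta, d1, d2
```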

3.2.3 Adaptive background Mean shift tracking

The accuracy in mean shift tracking depends on generation of


candidate model which is used to compare with the position of target in
incoming frames. The position of target may fall into three cases as shown in
Figure 3.11.

Case 1: The current position is the target (Vehicle A in Figure 3.11)

Case 2: The position is not the target, but possesses the same color properties as the target (Vehicle B in Figure 3.11)

Case 3: The current position is not the target and it does not have any similarity with the target (Vehicle C in Figure 3.11)

The method of defining the background position depends on the information from the previous frame. The target is detected in the first frame and the target model is created; in the next frame the candidate model is created. Based on the difference between the target model and the candidate model, the window is shifted. In the CAM Shift technique the new position is calculated from the scale and orientation of the centroid position, which is suitable only for cases where the frame movement is linear and stable.

Figure 3.11 Target Definition - The mark A is the target, B is not target
but with similar properties of target, C is totally different
from target

In unmanned aerial vehicle based videos, there will be shear and angular displacement. Thus the new position of a single point does not help in exact localization of the target in the incoming frame. A complete model of the background of the target, which includes the distance and angle at each pixel in the defined target window, is required. This requirement is fulfilled in the adaptive background mean shift tracking technique. The concept is depicted in Figure 3.12.

Step 1: Initialization Phase:

Assume the target model with center position 'y'. The bin size, area and color system of the target model are defined. The initialization is done with the Epanechnikov kernel shown in Figure 3.13.

E(x) = \begin{cases} 1 - x^2, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases} \qquad (3.53)

[Flowchart: create the target model and the candidate model (color system, bin size, object definition); test their similarity; if similar, the target candidate model is retained, otherwise the area is redefined and a new background model is created; the window is shifted to the new position.]

Figure 3.12 Concept of Adaptive background mean shift tracking

Step 2 : Generation of Models:

The target model, denoted q_u^*, and the candidate model, denoted p_u^*, are generated, and the initial window is stored. The target model {q_u^*}, u = 1, 2, ..., m, is generated for all bins of an m-bin histogram. An m-bin histogram is a chart with 'm' bins in which each bin counts the number of pixels of the same color.

Let f(x_i) be the color index at position x_i, 'm' the number of bins, 'h' the bandwidth and 'n' the number of pixels defined in the current window,

q_u^* = N \sum_{i=1}^{n} k\left(\left\| \frac{y - x_i}{h} \right\|\right) δ_u[f(x_i)] \qquad (3.54)

Here, δ_u(x) is the Kronecker delta function, a function of two variables which is 1 if they are equal. It is defined as

δ_u(x) = \begin{cases} 1, & x - u = 0 \\ 0, & \text{otherwise} \end{cases} \qquad (3.55)

Figure 3.13 Epanechnikov Kernel

N is the normalization constant, since the sum of all pixel probabilities yields 1. Thus the constant N becomes

N = \left[\sum_{i=1}^{n} k\left(\left\| \frac{y - x_i}{h} \right\|\right)\right]^{-1} \qquad (3.56)
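A sketch of the target-model construction of Eqs. 3.53-3.56 is given below for a grayscale window; the bin count and the choice of the window half-diagonal as the bandwidth h are assumptions made for illustration:

```python
import numpy as np

def epanechnikov(x):
    """Epanechnikov kernel profile, Eq. 3.53."""
    return np.where((x >= 0) & (x <= 1), 1.0 - x**2, 0.0)

def build_target_model(window, m_bins=16):
    """Kernel-weighted, normalized m-bin histogram q_u* (Eqs. 3.54-3.56)."""
    rows, cols = window.shape
    ys, xs = np.indices((rows, cols))
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    bandwidth = np.hypot(cy, cx)                        # assumed h: window half-diagonal
    dist = np.hypot(ys - cy, xs - cx) / bandwidth       # ||(y - x_i) / h||
    weights = epanechnikov(dist)                        # k(.), favours pixels near the centre
    bins = (window.astype(np.int32) * m_bins) // 256    # f(x_i): colour index of each pixel
    q = np.bincount(bins.ravel(), weights=weights.ravel(), minlength=m_bins)
    return q / q.sum()                                  # normalization constant N, Eq. 3.56
```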

After creation of the target model, the candidate model p_u^* is generated. Since the candidate model varies with each frame, an index is assigned to each frame. Let f(x_p) be the index of the bin in the previous frame and f(x_c) the index of the bin in the current frame.

p_u^*(y_0^*) = \begin{cases} \sum_{i=1}^{n} E\left(\left\| \frac{y - x_i}{h} \right\|\right) δ_u(f(x_c)) \, g(x_i) \, δ_u(x_p), & f(x_c) = f(x_p) \\ \sum_{i=1}^{n} E\left(\left\| \frac{y - x_i}{h} \right\|\right) δ_u(f(x_c)), & \text{otherwise} \end{cases} \qquad (3.57)

For p_u^*(y_0^*) \ne 0,

p_u^*(y_0^*) = \frac{p_u^*(y_0^*)}{\sum_{u=1}^{m} p_u^*(y_0^*)} \qquad (3.58)

Step 3: Similarity Definition and Transition to new position:

The similarity function defines the distance between the target model and the candidate model. The maximum product of the probabilities of the target model q_u^* and the candidate model p_u^* gives the least error. For an m × m window in an m-bin histogram, the histogram probability is calculated for the target model q_u^* against each candidate model p_u^*.

The similarity window is calculated with the Bhattacharya coefficient, which measures the dissimilarity between distributions of features such as color and texture. It has a simple geometric interpretation as the cosine of the angle between two N-dimensional vectors. For two identical distributions,

\cos(θ) = \sum_{i=1}^{N} \sqrt{p(i) \, p^*(i)} = \sum_{i=1}^{N} \sqrt{p(i) \, p(i)} = \sum_{i=1}^{N} p(i) = 1 \qquad (3.59)

Thus the distance axiom is defined as

d(p, p^*) = \sqrt{1 - ρ(p, p^*)} \qquad (3.60)

By the axiom of the Bhattacharya coefficient,

ρ[p_u^*(y_0^*), q_u^*] = \sum_{u=1}^{m} \sqrt{p_u^*(y_0^*) \, q_u^*} \qquad (3.61)

d(y) = \sqrt{1 - ρ(p^*(y), q^*)} \qquad (3.62)
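The similarity and distance of Eqs. 3.59-3.62 reduce to a few lines; the sketch below assumes both inputs are already normalized histograms:

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient and distance between two normalized histograms."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    rho = np.sum(np.sqrt(p * q))              # coefficient; 1 for identical distributions (Eq. 3.59)
    return rho, np.sqrt(max(1.0 - rho, 0.0))  # distance, Eq. 3.60
```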

After determining the distance between the target model q_u^* and the candidate model p_u^*, the new position y_1^* for the 'n' pixels in the current window is defined as

y_1^* = \frac{\sum_{i=1}^{n} x_i \sum_{u=1}^{m} \sqrt{\frac{q_u^*}{p_u^*(y_0^*)}} \, δ(f(x_i))}{\sum_{i=1}^{n} \sum_{u=1}^{m} \sqrt{\frac{q_u^*}{p_u^*(y_0^*)}} \, δ(f(x_i))} \qquad (3.63)

Step 4: Creating new Background position:

Let J(i) be the new background position indicator for a pixel index 'i'. Consider an area A_{xy} in the new window y_1^*. For the pixels i = 1, ..., n, if the areas of the windows y_1^* and y_0^* are equal, then J(i) = 0; if they are not equal, then J(i) = 1.

J(i) = \begin{cases} 1, & A_{xy1} - A_{xy0} \ne 0 \\ 0, & \text{otherwise} \end{cases} \qquad (3.64)

If J(i) is 1, then there is a reasonable shift in the position of the background. Thus the shift vector 'l' is defined as follows:

Δx = \frac{y_{1x}^* - y_{0x}^*}{2} \qquad (3.65)

Δy = \frac{y_{1y}^* - y_{0y}^*}{2} \qquad (3.66)

l = \sqrt{Δx^2 + Δy^2} \qquad (3.67)

The concept of defining background is described in Figure 3.14.

[Illustration: the shift between the window centers in the current frame and the next frame defines a length and slope, which are used to place the new background window in the next frame.]

Figure 3.14 Generation of new Background

Thus the new temporary center y2* is obtained by defining the


distance and slope,
m = \frac{y_{2y}^* - y_{1y}^*}{y_{2x}^* - y_{1x}^*} = \frac{Δy}{Δx} \qquad (3.68)

l = \sqrt{\left(y_{2y}^* - y_{1y}^*\right)^2 + \left(y_{2x}^* - y_{1x}^*\right)^2} \qquad (3.69)

Generalizing for all frames,

y_n^* = \begin{cases} y_{nx}^* = y_{(n-1)x}^* + \frac{l}{\sqrt{m^2 + 1}}, \; y_{ny}^* = y_{(n-1)y}^* + \frac{l\,m}{\sqrt{m^2 + 1}}, & Δx \ne 0 \\ y_{nx}^* = y_{(n-1)x}^*, \; y_{ny}^* = y_{(n-1)y}^* + l, & Δx = 0, \; Δy \ne 0 \\ y_{nx}^* = y_{(n-1)x}^* + l, \; y_{ny}^* = y_{(n-1)y}^*, & Δx \ne 0, \; Δy = 0 \\ y_{nx}^* = y_{(n-1)x}^*, \; y_{ny}^* = y_{(n-1)y}^*, & Δx = 0, \; Δy = 0 \end{cases} \qquad (3.70)

Step 5: Reassigning the position of window:

The position of the window y_1^* for the candidate model is evaluated as

ρ[p^*(y_1^*), q^*] = \sum_{u=1}^{m} \sqrt{p_u^*(y_1^*) \, q_u^*} \qquad (3.71)

If ρ[p^*(y_1^*), q^*] < ρ[p^*(y_0^*), q^*], then the position y_1^* becomes

y_1^* = \frac{1}{2}(y_1^* + y_0^*) \qquad (3.72)

Thus, in general,

y_n^* = \frac{1}{2}(y_n^* + y_{n-1}^*) \qquad (3.73)

The similarity between the target model and the candidate model is evaluated and the window is adjusted until the two match.
