
Multiple Video Object Tracking Using Variational Inference

Dmitry Kangin, Denis Kolev and Garik Markarian

School of Computing and Communications, Infolab21, Lancaster University, Lancaster LA1 4YW, U.K.
and
R&D Department, Rinicom Ltd., Riverway House, Morecambe Road, Lancaster, Lancashire LA1 2RX
email: [email protected], [email protected], [email protected]

Abstract—In this article a Bayesian filter approximation is proposed for simultaneous multiple target detection and tracking, and is then applied to object detection on video from a moving camera. The inference uses evidence lower bound optimisation for Gaussian mixtures. The proposed filter is capable of real-time data processing and may be used as a basis for data fusion. The proposed method was tested on video with a dynamic background, where the velocity with respect to the background is used to discriminate the objects. The framework does not depend on the feature space: different feature spaces can be used without restriction while preserving the structure of the filter.

Keywords—Multiple object tracking, Bayesian filtering, Variational Gaussian mixtures

I. INTRODUCTION

To date, approximations of Bayesian filter models have gained wide popularity for object tracking. However, difficulties appear when fully automatic object detection is needed and, even worse, the number of objects is unknown. In this case, we need to solve a multiple object tracking problem with clutter, which can be formulated as follows.

Let there be measurements which can be assigned either to one of the objects or to clutter, where each object may include several measurements. The aim is to find the object measurements within the measurement set, to assign them to the objects given a pre-defined dependency model, and to determine the characteristics of the objects using the measurements. For each object, we assume that the measurements corresponding to this object are close to each other in terms of some feature space. The noise measurements are close to each other but not homogeneous, and they are supposed to differ substantially from the objects' measurements. We also assume that the noise measurements constitute the majority of the measurement set.

The Bayesian filter recursion is decomposed into two steps:

p(X^k | Z^{1...(k-1)}) = \int p(X^k | X^{k-1}) p(X^{k-1} | Z^{1...(k-1)}) dX^{k-1},  (1) (Prediction),

p(X^k | Z^{1...k}) \propto p(Z^k | X^k) p(X^k | Z^{1...(k-1)}),  (2) (Update).

Here X^k denotes the hidden variables, or states, of the filter at the k-th step, Z^k denotes the visible variables, or measurements, and Z^{1...(k)} denotes the measurements up to step k. Generally, both the hidden variables and the measurements can contain sets of variables, because we can consider many measurements and many targets described by the state. The update stage is usually carried out using either the Maximum A Posteriori (MAP) or the Minimum Mean Square Error (MMSE) estimate [6].

The stated problem can also be considered as time-consistent clustering. Given the set of measurements, we assign each measurement a label, which is either some object or clutter, and consistently update the clustering with the same labels on the same data during the operation of the algorithm. We emphasise that in this problem statement we do not assume point targets, as is done in many multiple object tracking methods such as [1], [2], but perform time-consistent clustering.

In section II, state-of-the-art methods for multiple object tracking are reviewed. In section III, the proposed Bayesian filter is formulated for the general, domain-unspecific case. Then, in section IV, an algorithm capable of unsupervised object detection and tracking is proposed for video tracking. In section V, experiments are described which show the capabilities of the algorithm for unsupervised object detection and tracking from a moving camera, followed by the conclusion.

II. STATE-OF-THE-ART

The variability of the state-of-the-art solutions arises from the different models behind the motion detection, and from miscellaneous approximations even for the same or similar models. For example, for the well-known PHD filter there are many recursive approximations based on particle filters and Gaussian mixtures [3], [4], [5].

Classical Bayesian approaches to approximate inference for multi-target tracking include different implementations of the Multiple Hypothesis Tracking (MHT) approach [7] and the Joint Probabilistic Data Association (JPDA) filter [8]. The problem of tracking in the case of heavy clutter can be solved by the R-RANSAC algorithm [9], which extends the RANSAC algorithm [10] to handle the case when most of the measurements are clutter.
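For intuition only, the generic recursion (1)-(2) can be sketched for a discrete state space, where the prediction integral becomes a matrix-vector product; the transition matrix, prior and likelihood below are invented toy values, not part of the proposed filter:

```python
import numpy as np

def predict(prior, transition):
    # Prediction (1): p(x_k | z_{1..k-1}) = sum over x_{k-1} of
    # p(x_k | x_{k-1}) p(x_{k-1} | z_{1..k-1}).
    return transition.T @ prior

def update(predicted, likelihood):
    # Update (2): p(x_k | z_{1..k}) is proportional to
    # p(z_k | x_k) p(x_k | z_{1..k-1}), renormalised to sum to 1.
    posterior = likelihood * predicted
    return posterior / posterior.sum()

# Toy two-state example (all numbers are illustrative assumptions).
transition = np.array([[0.9, 0.1],
                       [0.2, 0.8]])   # rows: x_{k-1}, columns: x_k
prior = np.array([0.5, 0.5])
likelihood = np.array([0.7, 0.1])     # p(z_k | x_k) for the observed z_k

posterior = update(predict(prior, transition), likelihood)
```

The same two-step structure is what the proposed filter approximates for the continuous mixture-model state.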
The MHT approach looks for the most probable assignment of the targets to the measurements. It provides a natural solution of the simultaneous tracking and detection problem for an unknown number of targets. However, as the number of hypotheses grows exponentially with each stage, ways to restrict the number of hypotheses are needed to avoid solving an NP-hard problem. One possible solution is to prune the least probable hypotheses [11].

JPDA differs from the MHT approach in its handling of data association, i.e. in unveiling the relation between the targets and the measurements. The JPDA approach [8] uses a weighted sum of the hypotheses on the association. To make this procedure feasible, gating is applied, which helps to factor out abrupt target state changes that are usually impossible in many problems.

III. THE MULTIPLE OBJECT TRACKING FILTER

The multiple object tracking filter proposed in this article propagates a time-consistent mixture of Gaussians between the video frames. The clutter and the targets are treated in the same way in this framework. Distinguishing between these types of cluster occurs at the detection stage and is not carried out by the tracker.

Here the model is defined within the Bayesian filter framework, and then the solution based on variational approximate inference is described.

First, we define the hidden and visible variables for formulae (1) and (2).

The visible variables set is the features set D^k = {d^k_1, d^k_2, ..., d^k_{n_k}}, built upon the feature point tracks on the k-th frame.

We assume that these visible variables are generated from a mixture of K Gaussians, where the number K is pre-defined, and the Gaussians are described by the set \mu^k of Gaussian means, the set \Sigma^k of Gaussian covariance matrices, and the set \pi^k of Gaussian weights within the Gaussian mixture for the k-th frame, where \mu^k = {\mu^k_1, \mu^k_2, ..., \mu^k_K}, \Sigma^k = {\Sigma^k_1, \Sigma^k_2, ..., \Sigma^k_K}, \pi^k = {\pi^k_1, \pi^k_2, ..., \pi^k_K}, \sum_{i=1}^K \pi^k_i = 1:

d^k_i \sim p(d^k_i | \mu^k, \Sigma^k, \pi^k) = \sum_{j=1}^K \pi^k_j N(d^k_i | \mu^k_j, \Sigma^k_j).  (3)

We substitute these quantities into the Bayesian recursion as:

p(Z^k | X^k) = p(d^k_i | \mu^k, \Sigma^k, \pi^k).  (4)

The quantity p(X^k | Z^{1...(k-1)}) is calculated at the prediction stage and relies on the transition probability p(X^k | X^{k-1}) = p(\mu^k, \Sigma^k, \pi^k | \mu^{k-1}, \Sigma^{k-1}, \pi^{k-1}) and the previous stage posterior probability p(X^{k-1} | Z^{1...(k-1)}).

A. Prediction step

The prediction step does not use optimisation techniques and relies on the assumed factorisation of the probability

p(X^k | Z^{1..k-1}) = \prod_{i=1}^K p(\mu^k_i | Z^{1..k-1}) \times \prod_{i=1}^K p(\Sigma^k_i | Z^{1..k-1}) \times p(\pi^k | Z^{1..k-1}).  (5)

Let us now consider each of these probabilities separately.

We assume that the prediction model for the means is given by the transition probability

p(\mu^k_i | Z^{1..k-1}) = N(\mu^k_i | U_i \mu^{k-1}_i + T_i, \Psi^k_i), k > 1.  (6)

Here U_i is the between-frame rotation matrix and T_i is the between-frame translation; both parameters are determined using the Kabsch algorithm [16] over the subset of D^k previously assigned to the i-th cluster, and the details of its application are described further in the object detection section. \Psi^k_i is the covariance matrix over the L2 errors of the subset of the features set D^k previously assigned to the i-th cluster, calculated as in formula (12).

The probabilities for the covariance matrices have a more convenient representation in terms of the precision matrices. We denote the precision matrices \Lambda^k_i = [\Sigma^k_i]^{-1} and assume the following (heuristic) transition

p(\Lambda^k_i | Z^{1...k-1}) = W(\Lambda^k_i | W^{k-1}_i, \nu^{k-1}_i),  (7)

where W is the Wishart distribution and W^{k-1}_i, \nu^{k-1}_i are its parameters derived from the previous stage, the non-negative definite scale matrix and the degrees of freedom respectively, with \Lambda^{k-1}_i = (\nu^{k-1}_i - l - 1) W^{k-1}_i, where l is the feature space dimensionality. This form of the distribution allows the mean of the covariance matrix to be preserved.

For the Gaussian weights, the prediction step is performed as

p(\pi^k | Z^{1...k-1}) = Dir(\pi^k | n_k \alpha^{k-1} / \sum_{i=1}^K \alpha^{k-1}_i).  (8)

Here Dir(·) is the Dirichlet distribution and n_k is the number of measurements at the k-th stage.

B. Update step

At the update step, we need to solve the problem of MAP distribution approximation. We consider

p(X^k | Z^{1...k}) \propto p(Z^k | \mu^k, \Lambda^k, \pi^k) p(\mu^k, \Lambda^k, \pi^k | Z^{1...(k-1)}).  (9)

To derive this posterior probability, we use approximate inference according to [15]. We consider the joint distribution

p(Z^k, V^k, \pi^k, \mu^k, \Lambda^k) = p(Z^k | V^k, \mu^k, \Lambda^k) p(V^k | \pi^k) p(\pi^k) p(\mu^k | \Lambda^k) p(\Lambda^k).  (10)

Here V^k, referred to as the latent variables, are a set of n_k binary vectors of size 1 × K, each summing up to 1 and showing which component of the Gaussian mixture the observation is sampled from.
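A minimal numpy sketch of the parameter propagation in (6)-(8) for a single mixture component follows; the rotation U, shift T and the previous-stage parameters are placeholder values here (in the full method U and T come from the Kabsch algorithm [16]):

```python
import numpy as np

def predict_parameters(mu, W, nu, alpha, U, T, n_k):
    """One prediction step for a single mixture component, per (6)-(8)."""
    # (6): the predicted mean is the previous mean rotated and shifted.
    mu_pred = U @ mu + T
    # (7): the precision keeps its Wishart parameters from the previous
    # stage, which preserves the mean of the implied covariance matrix.
    W_pred, nu_pred = W, nu
    # (8): the Dirichlet weight parameters are rescaled to the new
    # measurement count n_k.
    alpha_pred = n_k * alpha / alpha.sum()
    return mu_pred, W_pred, nu_pred, alpha_pred

theta = np.pi / 6                       # assumed between-frame rotation angle
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
T = np.array([1.0, -0.5])               # assumed between-frame shift
mu = np.array([2.0, 0.0])
W, nu = np.eye(2), 5.0                  # previous-stage Wishart parameters
alpha = np.array([4.0, 1.0])            # previous-stage Dirichlet parameters

mu_p, W_p, nu_p, alpha_p = predict_parameters(mu, W, nu, alpha, U, T, n_k=10)
```

Note that no optimisation is involved: prediction is a pure transformation of the previous-stage parameters, consistent with the factorisation (5).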
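The mixture likelihood (3) can be evaluated directly; the two-component parameters below are illustrative assumptions:

```python
import numpy as np

def gaussian_pdf(d, mu, sigma):
    """Multivariate normal density N(d | mu, sigma)."""
    l = len(mu)
    diff = d - mu
    inv = np.linalg.inv(sigma)
    norm = np.sqrt(((2 * np.pi) ** l) * np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ inv @ diff) / norm

def mixture_pdf(d, pis, mus, sigmas):
    """Mixture density (3): sum over j of pi_j N(d | mu_j, Sigma_j)."""
    return sum(p * gaussian_pdf(d, m, s) for p, m, s in zip(pis, mus, sigmas))

pis = [0.7, 0.3]                                  # mixture weights, sum to 1
mus = [np.zeros(2), np.array([3.0, 3.0])]
sigmas = [np.eye(2), 2.0 * np.eye(2)]
density = mixture_pdf(np.array([0.1, -0.2]), pis, mus, sigmas)
```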
After this stage, one can formulate the variational approximation of the posterior probability, factorising between the parameters and the latent variables according to [15], which allows the equations for the iterative update of the parameters to be obtained:

\tilde{p}(V^k, \pi^k, \mu^k, \Lambda^k) = p(V^k) p(\pi^k, \mu^k, \Lambda^k).  (11)

The solution is provided using the Variational Expectation-Maximisation framework [15].

IV. THE VIDEO TRACKING ALGORITHM DESCRIPTION

The proposed video tracking filter was applied to multiple target tracking on video. Different techniques exist for video tracking, mostly based on a distinction between the clutter and the targets, which is part of the tracking model, and featuring data association techniques. Here, instead, these techniques are avoided: the mixture of Gaussians model is propagated in a time-consistent way, as time-consistent clustering. This method allows the object tracking and object detection stages to be decomposed.

The video processing algorithm workflow is depicted in figure 1. First, feature point detection is carried out using the well-known Harris algorithm [12]. Then, the tracks are composed based on optical flow tracking of the feature points. Then, the Bayesian filter tracking is carried out, which supports the mixture of Gaussians model update from frame to frame. Then, object detection is used to factor out the objects. We use the criterion that an object should have discernible movement within the frame relative to the background velocity model; this model is estimated from the clusters with the largest support. This criterion is fairly straightforward and can be replaced depending on the practical application. Finally, the detected objects are output as the outcome of the algorithm. Algorithm 1 shows the overall video analysis and tracking algorithm based on the variational Bayesian filter approximation.

Fig. 1. The algorithm workflow

Algorithm 1 Variational Bayesian filter approximation algorithm
1: procedure Main
2:   Tracks = ∅;
3:   while fetch frame I_k from video stream do
4:     Tracks = calculateAndTrackFeaturePoints(I_k, PreviousTracks);
5:     Features = calculateFeaturesFromTracks(Tracks);
6:     FeaturesClusters = clusterTracks(Features, FeaturesClusters);
7:     detectObjects(Tracks, Features, FeaturesClusters);
8:   end while
9: end procedure
10: procedure NewTracks = calculateAndTrackFeaturePoints(I_k, PrevTracks)
11:   [FBErr, trackedPoints] = trackLucasKanadeFB(PrevTracks, I_k);
12:   NewTracks = PrevTracks(FBErr < FBErrThreshold) concatenated with the corresponding trackedPoints;
13:   NewTracksTmp = detect new points using non-maximum suppression and create new tracks;
14:   NewTracks = union(NewTracks, NewTracksTmp);
15: end procedure
16: procedure Features = calculateFeaturesFromTracks(Tracks)
17:   Initialise MatureTracks as the tracks with NMature points.
18:   [Rotation, Shift] = Kabsch(MatureTracks_{k-NMature+1}, MatureTracks_k), where MatureTracks_k are the points in the mature tracks from the k-th frame;
19:   Features = ∅;
20:   for each track from MatureTracks do
21:     features(track) = [KabschDifference(track, Rotation, Shift), track_k];
22:   end for
23: end procedure
24: procedure ClusterIds = clusterTracks(Features, FeaturesClusters)
25:   for each row in Features do
26:     PredictStep(Features, FeaturesClusters); (sec. III-A)
27:     UpdateStep(Features, FeaturesClusters); (sec. III-B)
28:   end for
29: end procedure
30: procedure detectObjects(Tracks, Features, FeaturesClusters)
31:   Initialise MatureTracks as the tracks with NMature points.
32:   [Rotation, Shift] = Kabsch(MatureTracks_{k-NMature+1}, MatureTracks_k);
33:   diff = ∅;
34:   for all t in Tracks do
35:     diff(t) = KabschDifference(t, Rotation, Shift);
36:   end for
37:   VarianceDiff = calculate variance(diff);
38:   diffSorted = sortAscending(diff);
39:   calculate the velocity threshold T using formula (14).
40:   for each cluster, mark it as detected if for most of the cluster's points the estimated velocity exceeds the threshold T.
41: end procedure
42: procedure diff = KabschDifference(track, Rotation, Shift)
43:   calculate the difference between track_{k-NMature+1} and track_k according to formula (13).
44: end procedure

A. Feature point detection

In this research we used the well-known Harris corner point detector [12], combined with the sparse pyramidal Lucas-Kanade tracker [13].
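For intuition, the expected latent vectors V^k (the responsibilities) can be computed as in a plain EM E-step; this sketch is a simplification and not the full variational update of [15], and all parameter values are invented:

```python
import numpy as np

def responsibilities(points, pis, mus, precisions):
    """r[n, j] proportional to pi_j N(d_n | mu_j, Lambda_j^{-1}), normalised over j."""
    n, K = len(points), len(pis)
    r = np.zeros((n, K))
    for j in range(K):
        diff = points - mus[j]
        # Quadratic form (d - mu)^T Lambda (d - mu) for every point at once.
        quad = np.einsum('ni,ij,nj->n', diff, precisions[j], diff)
        det = np.linalg.det(precisions[j])
        # The shared (2*pi)^(-l/2) constant cancels in the normalisation.
        r[:, j] = pis[j] * np.sqrt(det) * np.exp(-0.5 * quad)
    return r / r.sum(axis=1, keepdims=True)

points = np.array([[0.0, 0.1], [5.0, 5.2]])
pis = np.array([0.5, 0.5])
mus = [np.zeros(2), np.array([5.0, 5.0])]
precisions = [np.eye(2), np.eye(2)]
r = responsibilities(points, pis, mus, precisions)
```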
This combination is reasonable because the Harris detector provides points according to the conditioning number of the Hessian matrix, which is inverted when the Lucas-Kanade algorithm is used for feature point tracking [12], [13].

The quality assessment of the feature point tracking is carried out using the forward-backward error concept [14].

To make sure that we can discover new objects, we need to prevent the agglomeration of feature points in some particular area of the image. For this purpose, a non-maximum suppression technique is used over the Harris corner point detector response, which eliminates the local maxima close to the highest one.

B. Track formation

Using the information from the forward-backward Lucas-Kanade point tracking, we can build tracks representing the movement of the same points between frames.

Tracks are defined as the point sequences t^k_i = {f^{s_i}_{j_{s_i}}, ..., f^k_{j_k}}, where s_i is the index of the frame where the i-th track starts, and j_k is the index of the point in the feature points list f^k of the k-th frame. At each stage, the points matched by the Lucas-Kanade tracker are attached to the tracks.

In figure 2, all possible track development variants are depicted.

Fig. 2. Track building process

Track 1 contains points matched at every stage.

The points of track 2 were matched at the (k+1)-th stage, but there were no matches after this stage.

Tracks 3 and 4 have a difference below some pre-defined threshold after stage (k+2); therefore they were merged. Tracks 5 and 6 appeared from newly detected points at stage (k+2) and were then successfully matched.

After each frame, the tracks are trimmed to the last NMature points, where NMature is a positive parameter. The tracks which have NMature points are referred to as mature.

C. Object detection

Unlike many state-of-the-art algorithms for object detection in clutter, the proposed algorithm treats clutter within the tracking framework in exactly the same way as the objects themselves. The objects are distinguished from the clutter only a posteriori, using the object model.

To perform the detection of an object, we can rely on velocity criteria to distinguish between the clutter and the objects' measurements. The more distinguishable the velocity of a point cluster is from the background, the more likely the cluster is to be an object. However, other approaches can also be considered, such as a background model in the case of a static camera.

The full list of the criteria is as follows:

• the cluster velocity is above some dynamically adjusted threshold; the estimation is described further in this section;
• the cluster stability is above some pre-defined threshold, i.e. the number of frames during which the cluster changed less than some threshold (50%) of its points;
• the cluster age is above some pre-defined threshold, i.e. the number of frames during which the cluster has had a support greater than some pre-defined number of points (typically 0 or 1);
• the cluster size is not greater than some pre-defined threshold (i.e. not larger than w/2 × h/2, where w and h are the width and height of the video frames, respectively).

All the criteria but the first, velocity, are straightforward. Therefore, we concentrate on the description of the first criterion. Consider two matched point sets G^{k-1} and G^k having n_k elements, i.e. sets with indexed elements where the matching points from the different frames have the same indices. Then we state the least squares problem

\sum_{i=0}^{n_k} |G^k_i - \hat{G}^k_i|^2_{L2} \to \min_{U,T},  \hat{G}^k_i = G^{k-1}_i U + T,  U U^T = I.  (12)

Here U is an orthogonal rotation matrix, T is a translation vector, and \hat{G}^k_i is considered as the linear approximation of the movement law from frame to frame.

This problem is widely known and is solved analytically using the Kabsch algorithm [16]. This algorithm is deterministic, and to improve the solution we repeat the procedure several times for a given percentage (e.g. 50%) of the best matched data.

After this stage, we suppose that the rotation and shift model has been obtained for the background (we do not consider issues caused by perspective or other non-linear transformations here). Therefore, we need to distinguish the subtle movement of the background clusters (i.e. clutter) from the significant movement of the object clusters. For this purpose, the following heuristic was developed (figure 3):

• all matched points are sorted by their L2 error magnitude

\varepsilon_i = |G^k_i - \hat{G}^k_i|_{L2};  (13)
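The forward-backward check can be sketched as follows: each point is tracked one frame forward and then backward again, and tracks whose start-to-return distance exceeds a threshold are rejected. The arrays below stand in for real Lucas-Kanade tracker output:

```python
import numpy as np

def forward_backward_filter(start_points, backtracked_points, threshold):
    """Keep tracks whose forward-backward error [14] is below the threshold."""
    fb_error = np.linalg.norm(start_points - backtracked_points, axis=1)
    return fb_error < threshold

# Assumed tracker output: the second point drifted during re-tracking.
start = np.array([[10.0, 10.0], [40.0, 25.0], [7.0, 3.0]])
back = np.array([[10.2, 9.9], [45.0, 28.0], [7.1, 3.0]])
keep = forward_backward_filter(start, back, threshold=1.0)
```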
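A greedy non-maximum suppression over scored corner candidates can be sketched like this; the points, scores and radius are made-up values:

```python
import numpy as np

def non_max_suppression(points, scores, radius):
    """Greedily keep the strongest points, suppressing weaker ones nearby."""
    order = np.argsort(scores)[::-1]          # strongest first
    kept = []
    for i in order:
        # Keep the candidate only if no stronger kept point is within radius.
        if all(np.linalg.norm(points[i] - points[j]) >= radius for j in kept):
            kept.append(i)
    return sorted(kept)

points = np.array([[0.0, 0.0], [1.0, 0.5], [10.0, 10.0]])
scores = np.array([0.9, 0.8, 0.7])
kept = non_max_suppression(points, scores, radius=3.0)
```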
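For reference, the closed-form solution of the problem (12) via the singular value decomposition, i.e. the Kabsch algorithm [16], can be sketched in numpy; the point sets below are synthetic:

```python
import numpy as np

def kabsch(G_prev, G_curr):
    """Best-fit rotation U and translation T with G_curr ~ G_prev @ U + T."""
    c_prev, c_curr = G_prev.mean(axis=0), G_curr.mean(axis=0)
    H = (G_prev - c_prev).T @ (G_curr - c_curr)      # cross-covariance
    V, _, Wt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(V @ Wt))               # guard against reflections
    D = np.diag([1.0] * (len(H) - 1) + [d])
    U = V @ D @ Wt
    T = c_curr - c_prev @ U
    return U, T

# Synthetic background motion: rotate by 30 degrees, then shift.
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
G_prev = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 1.0]])
G_curr = G_prev @ R + np.array([3.0, -1.0])

U, T = kabsch(G_prev, G_curr)
```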
• the variance S of the error scalars is calculated;
• for j = 1 ... n_k, at the first time when the difference between the neighbouring points' error scalars \varepsilon_j and \varepsilon_{j+1} exceeds \tau S, the error threshold T is initialised as

T = (\varepsilon_j + \varepsilon_{j+1})/2.  (14)

Fig. 3. Threshold adjustment heuristic

After this stage, a cluster is selected if the number of points within the cluster whose estimation error is above the threshold is sufficiently large (e.g. > 50%).

V. THE ALGORITHM PERFORMANCE EVALUATION

To show that the method gives good results compared to previous ones, tests were carried out with the VIVID PETS 2005 data set [19]. The data set depicts multiple vehicles which are being tracked and contains marked positions of one vehicle for every 10th frame. Sample frames from the VIVID data set are depicted in figure 4.

Fig. 4. VIVID data set

The results of the experiment presented in table I are compared with those from the article [17], whose method is based on a set of Kalman filters, one for each of the multiple targets, accompanied by data association techniques. There, the detection is provided by estimating the background movement model based on optical flow.

TABLE I. RESULTS OF THE EXPERIMENTS

                          EgTest01  EgTest02  EgTest03  EgTest04  EgTest05
Match                     0.9717    0.9225    0.8466    0.9005    0.8642
Size ratio                2.59      3.13      1.12      3.63      0.54
Match (method [17])       0.9500    0.9302    0.8588    0.6000    0.8889
Size ratio (method [17])  1.00      1.23      0.78      1.19      0.88

One can see stable pattern localisation in the proposed algorithm. While the rival algorithm gives only 60% on the EgTest04 data set, the given algorithm yields 90%. One of the output samples with marked bounding boxes is shown in figure 5.

Fig. 5. Output sample

VI. CONCLUSION

The method proposed in this article unites the Bayesian filtering approach to simultaneous object detection and tracking with variational approximation. The results shown in the experimental section demonstrate the stability and robustness of the algorithm's outcome. The proposed Bayesian filter approximation has good generalisation power: it can be used with different feature spaces, and it can also be accompanied by different object detection algorithms.

The research leading to these results has received funding from the EU's Seventh Framework Programme under grant agreement N607400. The research has been carried out within the TRAX project.

REFERENCES

[1] M. Schikora, A. Gning, L. Mihaylova, D. Cremers, and W. Koch, "Box-Particle Hypothesis Density Filter for Multi-Target Tracking," IEEE Transactions on Aerospace and Electronic Systems, vol. 50, no. 3, pp. 1660-1672, July 2014.
[2] D. Salmond, "Tracking and guidance with intermittent obscuration and association uncertainty," FUSION 2013, pp. 691-698.
[3] R. Mahler, "A theoretical foundation for the Stein-Winter Probability Hypothesis Density (PHD) multi-target tracking approach," Proc. MSS Nat'l Symp. on Sensor and Data Fusion, vol. I (Unclassified), San Antonio, TX, June 2000.
[4] B.-N. Vo and W.-K. Ma, "The Gaussian mixture Probability Hypothesis Density filter," IEEE Trans. Signal Processing, vol. 54, no. 11, pp. 4091-4104, 2006.
[5] D. Clark, I. T. Ruiz, Y. Petillot, and J. Bell, "Particle PHD filter multiple target tracking in sonar images," IEEE Transactions on Aerospace and Electronic Systems, vol. 43, no. 1, pp. 409-416, January 2007.
[6] M. Jaward, L. Mihaylova, N. Canagarajah, and D. Bull, "Multiple object tracking using particle filters," IEEE Aerospace Conference, 2006.
[7] D. Reid, "An algorithm for tracking multiple targets," IEEE Trans. on Automatic Control, vol. 24, no. 6, pp. 423-432, Dec. 1979.
[8] T. E. Fortmann, Y. Bar-Shalom, and M. Scheffe, "Sonar Tracking of Multiple Targets Using Joint Probabilistic Data Association," IEEE J. Oceanic Eng., vol. OE-8, no. 3, 1983.
[9] P. C. Niedfeldt and R. W. Beard, "Multiple target tracking using recursive RANSAC," American Control Conference (ACC), 2014, pp. 3393-3398.
[10] M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography," Comm. of the ACM, vol. 24, no. 6, pp. 381-395, June 1981.
[11] J. Roy, "Towards multiple hypothesis situation analysis," 10th International Conference on Information Fusion, 2007, pp. 1-8.
[12] C. Harris and M. Stephens, "A combined corner and edge detector," Proceedings of the 4th Alvey Vision Conference, 1988, pp. 147-151.
[13] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," Proceedings of the Imaging Understanding Workshop, 1981, pp. 121-130.
[14] Z. Kalal, K. Mikolajczyk, and J. Matas, "Forward-backward error: Automatic detection of tracking failures," 20th International Conference on Pattern Recognition (ICPR), 2010, pp. 2756-2759.
[15] C. M. Bishop, Pattern Recognition and Machine Learning, pp. 474-486, Springer, 2006.
[16] W. Kabsch, "A solution for the best rotation to relate two sets of vectors," Acta Crystallographica, A32:922, 1976, doi:10.1107/S0567739476001873; with a correction in W. Kabsch, "A discussion of the solution for the best rotation to relate two sets of vectors," Acta Crystallographica, A34, pp. 827-828, 1978, doi:10.1107/S0567739478001680.
[17] H. Mao, C. Yang, G. P. Abousleman, and J. Si, "Automatic detection and tracking of multiple interacting targets from a moving platform," Optical Engineering, vol. 53, no. 1, 013102, 2014.
[18] D. G. Kolev, D. Kangin, and G. Markarian, "Data Fusion for Unsupervised Video Object Detection, Tracking and Geo-Positioning," Fusion 2015 Conference, Washington D.C., 2015.
[19] R. T. Collins, X. Zhou, and K. T. Seng, "An Open Source Tracking Testbed and Evaluation Web Site," IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS 2005), January 2005. http://vision.cse.psu.edu/data/vividEval/main.html
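The threshold heuristic (sorting the errors and placing T at the first neighbour gap exceeding \tau S, formula (14)) can be sketched as follows; the error values and \tau are illustrative:

```python
import numpy as np

def velocity_threshold(errors, tau):
    """Threshold T per (14): midpoint of the first sorted-neighbour gap > tau * S."""
    s = np.var(errors)                 # variance S of the error scalars
    e = np.sort(errors)
    for j in range(len(e) - 1):
        if e[j + 1] - e[j] > tau * s:
            return (e[j] + e[j + 1]) / 2.0
    return e[-1]                       # no gap found: nothing exceeds the threshold

# Mostly background-like errors plus two points from a fast cluster.
errors = np.array([0.1, 0.12, 0.11, 0.13, 0.09, 2.0, 2.1])
T = velocity_threshold(errors, tau=1.0)
```

Points whose error exceeds T are then treated as candidate object (non-background) measurements.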
