18. Visual Object Tracking
From a learning perspective:
● Classification problem with a single object class (target vs. distractors)
● Labeled data is given only at the initial frame
● Optionally requires online learning to adapt to the variations in a video
● Online learning is driven by self-supervision (training data = tracking results)
Visual object tracking
Objective: locating the object(s) over time in a video
Formal definition: given an object state at the initial frame, z_0 = (x_0, y_0, w_0, h_0),
identify z_{1:T} = {z_1, z_2, …, z_T} over a video of length T.
Two sub-categories:
● Single target tracking
○ Tracking only one object in a video
○ Single-class classification (target vs. distractors)
● Multi target tracking
○ Tracking multiple objects in a video
○ Multi-class classification (target 1 vs. target 2 vs. target 3 vs. … vs. distractors)
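The formal definition above can be read as a function signature: a tracker takes the initial state z_0 and the frames, and returns one state per frame. The sketch below only illustrates that shape. `BoxState`, `track` and `shift_right_step` are hypothetical names, and the per-frame update is a placeholder rather than a real tracker.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass(frozen=True)
class BoxState:
    """Object state z = (x, y, w, h).

    Whether (x, y) is the box center or a corner is not specified in the
    definition above; this sketch does not depend on the choice.
    """
    x: float
    y: float
    w: float
    h: float


def track(frames: Sequence[Any], z0: BoxState,
          step: Callable[[Any, BoxState], BoxState]) -> List[BoxState]:
    """Return z_1..z_T by applying a per-frame update starting from z_0."""
    states = []
    z = z0
    for frame in frames:
        z = step(frame, z)  # estimate the state at this frame from the previous one
        states.append(z)
    return states


def shift_right_step(frame: Any, prev: BoxState) -> BoxState:
    """Placeholder update (assumption): move the box one pixel to the right."""
    return BoxState(prev.x + 1.0, prev.y, prev.w, prev.h)


frames = [None] * 5  # stand-ins for frames 1..T, with T = 5
trajectory = track(frames, BoxState(10.0, 20.0, 30.0, 40.0), shift_right_step)
```

Any of the approaches discussed later would replace `shift_right_step` with an update that actually looks at the frame.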
Approaches in single object tracking
● Probabilistic tracking
○ Formulate the localization task as a sequential probabilistic inference problem
○ Given a probability distribution over the initial target location, propagate it over the remaining frames
● Discriminative tracking
○ Distinguish the object from the distractors at every frame
○ Can be considered as sequential binary object detection (classes: target, background)
Probabilistic tracking
● Tracking as a Bayesian network
Bayes rule: p(z | x) = p(x | z) p(z) / p(x)
z: object location (state)
x: frame (observation)
○ Likelihood p(x | z): the measurement of how likely the observation is to coincide with the given state
○ Prior p(z): the belief of object state without observation
○ Posterior p(z | x): the probability of object state given an observation
○ In tracking, the likelihood is evaluated against a target template: which region of the image looks similar to the target?
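A minimal sketch of a template-based likelihood, assuming a grayscale frame stored as a 2-D NumPy array. The Gaussian-shaped score on the mean squared difference, the function name `likelihood_map` and the parameter `sigma` are illustrative assumptions, not the appearance model used in the lecture.

```python
import numpy as np


def likelihood_map(frame, template, sigma=0.1):
    """Score every window of `frame` by its similarity to `template`.

    Returns an array whose entry (i, j) is exp(-msd / (2 * sigma^2)), where
    msd is the mean squared difference between the template and the window
    whose top-left corner is at (i, j). Higher means "looks more like the target".
    """
    H, W = frame.shape
    h, w = template.shape
    scores = np.empty((H - h + 1, W - w + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            patch = frame[i:i + h, j:j + w]
            msd = np.mean((patch - template) ** 2)
            scores[i, j] = np.exp(-msd / (2.0 * sigma ** 2))
    return scores


# Toy frame: random texture, with the template cut out of a known location.
rng = np.random.default_rng(0)
frame = rng.random((20, 20))
template = frame[5:9, 7:11].copy()

lik = likelihood_map(frame, template)
best = np.unravel_index(np.argmax(lik), lik.shape)  # most target-like window
```

The explicit double loop keeps the sketch readable; it is not meant to be fast.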
● Markovian assumption
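A common way to write the Markovian assumption in the z (state) / x (observation) notation used here is shown below. It is the usual first-order form for state-space tracking, offered as a sketch rather than as the exact formula from the slide.

```latex
\begin{align}
p(z_t \mid z_{0:t-1}) &= p(z_t \mid z_{t-1})
  && \text{(the state depends only on the previous state)} \\
p(x_t \mid z_{0:t},\, x_{1:t-1}) &= p(x_t \mid z_t)
  && \text{(the observation depends only on the current state)}
\end{align}
```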
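● Sequential Bayesian filtering
○ p(z_t | x_{1:t}) ∝ p(x_t | z_t) p(z_t | x_{1:t-1})
○ where p(z_t | x_{1:t-1}) = ∫ p(z_t | z_{t-1}) p(z_{t-1} | x_{1:t-1}) dz_{t-1}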
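One step of sequential Bayesian filtering can be sketched on a discrete grid of candidate locations, where the integral over the previous state becomes a matrix-vector product. The three-location grid, the transition matrix and the likelihood values below are made-up numbers for illustration only.

```python
import numpy as np


def bayes_filter_step(prev_posterior, transition, likelihood):
    """One predict/update step of a discrete Bayes filter.

    prev_posterior: p(z_{t-1} | x_{1:t-1}), shape (n,)
    transition:     transition[i, j] = p(z_t = i | z_{t-1} = j); columns sum to 1
    likelihood:     p(x_t | z_t = i), shape (n,)
    """
    # Predict: prior at frame t (the sum replaces the integral over z_{t-1}).
    predicted = transition @ prev_posterior
    # Update: multiply by the likelihood and renormalize.
    unnormalized = likelihood * predicted
    return unnormalized / unnormalized.sum()


# Toy example with 3 candidate locations (all values are assumptions).
prev_posterior = np.array([1.0, 0.0, 0.0])   # target known to be at location 0
transition = np.array([[0.8, 0.1, 0.0],
                       [0.2, 0.8, 0.2],
                       [0.0, 0.1, 0.8]])
likelihood = np.array([0.1, 0.7, 0.2])       # frame t looks most like location 1

posterior = bayes_filter_step(prev_posterior, transition, likelihood)
```

In this toy case the motion model alone favors staying at location 0, but the observation shifts most of the posterior mass to location 1.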
● Particle filtering (sequential Monte Carlo)
○ Approximate the prior distribution with a set of Monte Carlo samples (particles)
Probabilistic tracking pipeline (frame t-1 → frame t)
1. Extract samples proportional to the previous posterior
2. Move samples by the transition model
3. Re-evaluate the likelihood using the appearance model
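The three pipeline steps can be sketched as a minimal particle filter. To stay short, the sketch tracks a 1-D position instead of a full (x, y, w, h) state. The Gaussian random-walk transition model and the Gaussian-bump appearance model are assumptions chosen for illustration, and `particle_filter_step` and `toy_likelihood` are hypothetical names.

```python
import numpy as np


def particle_filter_step(particles, weights, likelihood_fn, motion_std, rng):
    """Advance a set of weighted particles from frame t-1 to frame t."""
    n = len(particles)
    # 1. Extract samples proportional to the previous posterior (resampling).
    idx = rng.choice(n, size=n, p=weights)
    resampled = particles[idx]
    # 2. Move samples by the transition model (assumed: Gaussian random walk).
    moved = resampled + rng.normal(0.0, motion_std, size=n)
    # 3. Re-evaluate the likelihood using the appearance model, then normalize.
    new_weights = likelihood_fn(moved)
    new_weights = new_weights / new_weights.sum()
    return moved, new_weights


TRUE_POSITION = 5.0


def toy_likelihood(z):
    """Stand-in appearance model: a Gaussian bump around the true position."""
    return np.exp(-0.5 * (z - TRUE_POSITION) ** 2)


rng = np.random.default_rng(0)
particles = rng.uniform(0.0, 10.0, size=500)  # initial belief: anywhere in [0, 10]
weights = np.full(500, 1.0 / 500)

for _ in range(10):  # ten frames with a static target
    particles, weights = particle_filter_step(
        particles, weights, toy_likelihood, motion_std=0.5, rng=rng)

estimate = float(np.sum(weights * particles))  # posterior mean of the position
```

In a real tracker, `toy_likelihood` would be replaced by a comparison between the image region at each particle and the target appearance model.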
Figure: example target appearance model (frame t)
Figure: circulant matrix of training samples (positive sample and negative samples)
Correlation filtering
● Any circulant matrix can be made diagonal by the Discrete Fourier Transform (DFT): X = F diag(x̂) F^H
○ F: DFT matrix (constant, independent of x)
○ x̂: DFT of the base sample x
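The diagonalization property can be checked numerically. The sketch below assumes one particular convention: row k of the circulant matrix is the base sample cyclically shifted by k positions, F is the unitary DFT matrix, and x̂ is the unnormalized DFT as computed by `np.fft.fft`. The helper name `circulant` is hypothetical.

```python
import numpy as np


def circulant(x):
    """Circulant matrix whose k-th row is x cyclically shifted by k positions."""
    return np.stack([np.roll(x, k) for k in range(len(x))])


n = 8
rng = np.random.default_rng(0)
x = rng.standard_normal(n)               # base sample

X = circulant(x)
F = np.fft.fft(np.eye(n)) / np.sqrt(n)   # unitary DFT matrix, independent of x
x_hat = np.fft.fft(x)                    # DFT of the base sample

# Rebuild X from its Fourier-domain diagonal form: X = F diag(x_hat) F^H.
X_rebuilt = F @ np.diag(x_hat) @ F.conj().T
max_error = float(np.abs(X - X_rebuilt).max())
```

`max_error` should be at the level of floating-point round-off, which confirms that all the information in the n × n matrix is carried by the n entries of x̂.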
● Putting it all together
○ Terms annotated in the slide equation: circulant matrix, matrix inner product, ridge regression
○ Computation is fast if the kernel matrix K is circulant
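To see where the speed-up comes from, the sketch below solves ridge regression over all cyclic shifts of a base sample in two ways: directly with an n × n linear solve, and element-wise in the Fourier domain. It covers only the linear (non-kernelized) case, and the function names are hypothetical. Which factor carries the complex conjugate depends on the shift and DFT conventions; the form used here matches `np.roll` rows and `np.fft.fft`, and is checked against the direct solution rather than taken from [1].

```python
import numpy as np


def circulant(x):
    """Circulant matrix whose k-th row is x cyclically shifted by k positions."""
    return np.stack([np.roll(x, k) for k in range(len(x))])


def ridge_direct(x, y, lam):
    """Ridge regression w = (X^T X + lam I)^{-1} X^T y with X = circulant(x)."""
    X = circulant(x)
    n = len(x)
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)


def ridge_fourier(x, y, lam):
    """Same solution using only FFTs and element-wise operations."""
    x_hat = np.fft.fft(x)
    y_hat = np.fft.fft(y)
    w_hat = x_hat * y_hat / (np.conj(x_hat) * x_hat + lam)
    return np.real(np.fft.ifft(w_hat))


rng = np.random.default_rng(0)
x = rng.standard_normal(16)   # base sample
y = rng.standard_normal(16)   # regression targets, one per cyclic shift

w_direct = ridge_direct(x, y, lam=0.1)
w_fast = ridge_fourier(x, y, lam=0.1)
max_diff = float(np.abs(w_direct - w_fast).max())
```

The direct solve costs on the order of n³ operations, while the Fourier version needs only a few FFTs, on the order of n log n.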
[1] Henriques et al., High-Speed Tracking with Kernelized Correlation Filters, In TPAMI, 2015
Challenges
● Modeling severe appearance variations in a video