18. Visual Object Tracking

The document discusses visual object tracking, focusing on the objective of locating objects over time in video sequences. It outlines the formal definition, approaches (probabilistic and discriminative tracking), and challenges such as appearance variations and temporal drift. Additionally, it highlights the integration of CNNs for improved feature extraction in tracking tasks.

Visual Object Tracking

Instructor: Seunghoon Hong


Visual object tracking
Objective: locating the object(s) over time in a video

[Figure: the target marked in the initial frame, followed by tracking results in later frames]
Formal definition: given an object state at the initial frame $z_0 = (x_0, y_0, w_0, h_0)$,
identify $z_{1:T} = \{z_1, z_2, \dots, z_T\}$ over a video of length T.

From a learning perspective:
● Classification problem with a single object class (= target vs distractors)
● Labeled data is given only at the initial frame
● Optionally requires online learning to adapt to the variations in a video
● Online learning is driven by self-supervision (training data = tracking results)

Two sub-categories:
● Single target tracking
○ Tracking only one object in a video
○ Single-class classification (target vs. distractors)
● Multi target tracking
○ Tracking multiple objects in a video
○ Multi-class classification (target 1 vs. target 2 vs. target 3 vs. … vs. distractors)
Approaches in single object tracking
● Probabilistic tracking
○ Formulate the localization task as a sequential probabilistic inference problem
○ Given a probability of the initial target location, propagate it over the remaining frames

● Discriminative tracking
○ Classify the object from the distractors at every frame
○ Can be considered as sequential binary object detection (class = target, background)
Probabilistic tracking
● Tracking as a Bayesian network

Bayes rule, with z: object location (state) and x: frame (observation):

$$p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)}$$

○ Posterior $p(z \mid x)$: the probability of the object state given an observation
○ Likelihood $p(x \mid z)$: the measurement of how likely the observation is to coincide with the given state
○ Prior $p(z)$: the belief of the object state without observation
Probabilistic tracking
● Intuition for each term of the Bayes rule, given a target template
○ Prior: where is the target likely to exist?
○ Likelihood: which region of the image looks similar to the target?
○ Posterior: where is the object in this frame?
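
A minimal numerical sketch of the Bayes rule over a discretized set of candidate locations may help here. The five candidate locations and all probability values are made-up illustrative numbers (not from the lecture), and `bayes_posterior` is a hypothetical helper name.

```python
import numpy as np

def bayes_posterior(prior, likelihood):
    """Posterior p(z|x) over a discrete set of candidate states z."""
    unnormalized = likelihood * prior          # p(x|z) p(z)
    return unnormalized / unnormalized.sum()   # divide by the evidence p(x)

# Five hypothetical candidate locations along one image axis.
prior = np.array([0.1, 0.2, 0.4, 0.2, 0.1])        # belief before seeing the frame
likelihood = np.array([0.05, 0.1, 0.3, 0.9, 0.2])  # similarity to the target template
posterior = bayes_posterior(prior, likelihood)
best_location = int(np.argmax(posterior))
```

In this toy setting the prior alone favors the middle location, while a strong likelihood at a neighboring location moves the posterior maximum there.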
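Probabilistic tracking
● Tracking as a Bayesian network

Bayes rule (single frame):

$$p(z \mid x) \propto p(x \mid z)\, p(z)$$

Sequential Bayesian filtering (whole video):

$$p(z_{1:T} \mid x_{1:T}) \propto p(x_{1:T} \mid z_{1:T})\, p(z_{1:T})$$

$z_{1:T}$: object locations in frames 1 to T
$x_{1:T}$: frames 1 to T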
Probabilistic tracking
● Hidden Markov Model

● Markovian assumption
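
The equations for this slide did not survive extraction. As a presumed reconstruction, the standard first-order hidden Markov model assumptions, written with z for states and x for observations, are:

$$p(z_t \mid z_{1:t-1}) = p(z_t \mid z_{t-1}), \qquad p(x_t \mid z_{1:t}, x_{1:t-1}) = p(x_t \mid z_t)$$

Under these two assumptions the joint distribution factorizes as

$$p(z_{1:T}, x_{1:T}) = p(z_1)\, p(x_1 \mid z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1})\, p(x_t \mid z_t)$$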
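Probabilistic tracking
● Sequential Bayesian filtering

$$p(z_t \mid x_{1:t}) \propto \underbrace{p(x_t \mid z_t)}_{\text{likelihood}}\; \underbrace{p(z_t \mid x_{1:t-1})}_{\text{prior}}$$

$$p(z_t \mid x_{1:t}) \propto \underbrace{p(x_t \mid z_t)}_{\text{likelihood}} \int \underbrace{p(z_t \mid z_{t-1})}_{\text{transition model}}\; \underbrace{p(z_{t-1} \mid x_{1:t-1})}_{\text{posterior up to the previous frame}}\, dz_{t-1}$$

● The prior requires integration over all object locations!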
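
When the state space is small and discrete, the integral over previous locations becomes a sum and the recursion can be run exactly. The sketch below is a toy grid filter under assumed values: five cells, a made-up transition model (stay with probability 0.6, move to a neighbor with 0.2) and made-up likelihoods; `predict` and `update` are hypothetical helper names.

```python
import numpy as np

def predict(prev_posterior, transition):
    """Prior at frame t: sum over z_{t-1} of p(z_t|z_{t-1}) p(z_{t-1}|x_{1:t-1})."""
    return transition @ prev_posterior

def update(prior, likelihood):
    """Posterior at frame t, proportional to p(x_t|z_t) times the prior."""
    unnormalized = likelihood * prior
    return unnormalized / unnormalized.sum()

n = 5
# transition[i, j] = p(z_t = i | z_{t-1} = j)
transition = np.zeros((n, n))
for j in range(n):
    for i, p in ((j - 1, 0.2), (j, 0.6), (j + 1, 0.2)):
        transition[min(max(i, 0), n - 1), j] += p   # clamp moves at the borders

posterior = np.array([0.0, 1.0, 0.0, 0.0, 0.0])     # target known to be in cell 1
likelihoods = [np.array([0.1, 0.3, 0.8, 0.2, 0.1]),  # made-up appearance scores
               np.array([0.1, 0.1, 0.4, 0.9, 0.2])]
track = []
for lik in likelihoods:
    posterior = update(predict(posterior, transition), lik)
    track.append(int(np.argmax(posterior)))
```

For image-sized state spaces this exact sum is too expensive, which is what motivates sampling-based approximations.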
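Probabilistic tracking
● Approximation by Monte Carlo sampling

$$p(z_t \mid x_{1:t-1}) \approx \frac{1}{N} \sum_{i=1}^{N} p(z_t \mid z_{t-1}^{(i)})$$

where $z_{t-1}^{(i)} \sim p(z_{t-1} \mid x_{1:t-1})$ for $i = 1, \dots, N$.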
Probabilistic tracking
● Particle filtering (Sequential Monte Carlo)
○ Approximate the prior distribution using Monte Carlo samples (particles) drawn from the previous posterior
Probabilistic tracking pipeline
From frame t-1 to frame t:
1. Extract samples proportional to the previous posterior
2. Move samples by the transition model
3. Re-evaluate the likelihood using the appearance model
Probabilistic tracking pipeline
Tracking procedure at frame t (simplified):
1. Sample target states near the previous target location
2. Evaluate the likelihood based on the appearance model
3. Select the most probable sample as the target at the current frame
4. Update the target appearance model using the current tracking results

[Figure: example target appearance model]
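
The sampling steps of the pipeline can be sketched as a bootstrap particle filter over 2-D target centers. Everything specific here is an assumption for illustration: the Gaussian random-walk transition model, the toy likelihood that peaks at a synthetic ground-truth center (a stand-in for a real appearance model), and the names `particle_filter_step` and `make_likelihood`. The appearance-model update (step 4) is omitted.

```python
import numpy as np

def particle_filter_step(particles, weights, likelihood_fn, motion_std, rng):
    """One frame of a bootstrap particle filter over 2-D target centers."""
    n = len(particles)
    # 1. Extract samples proportional to the previous posterior (resampling).
    idx = rng.choice(n, size=n, p=weights)
    # 2. Move samples with the transition model (assumed Gaussian random walk).
    moved = particles[idx] + rng.normal(0.0, motion_std, size=(n, 2))
    # 3. Re-evaluate the likelihood with the appearance model.
    new_weights = likelihood_fn(moved)
    return moved, new_weights / new_weights.sum()

def make_likelihood(true_center, sigma=5.0):
    """Toy stand-in for an appearance model: peaks at the true center."""
    def likelihood(states):
        d2 = np.sum((states - true_center) ** 2, axis=1)
        return np.exp(-0.5 * d2 / sigma**2) + 1e-12   # avoid all-zero weights
    return likelihood

rng = np.random.default_rng(0)
n_particles = 500
particles = np.tile(np.array([50.0, 50.0]), (n_particles, 1))  # z_0 is given
weights = np.full(n_particles, 1.0 / n_particles)

estimates = []
for t in range(1, 11):
    true_center = np.array([50.0 + 2.0 * t, 50.0 + 1.0 * t])   # synthetic motion
    particles, weights = particle_filter_step(
        particles, weights, make_likelihood(true_center), motion_std=4.0, rng=rng)
    estimates.append(particles[np.argmax(weights)])             # most probable sample
final_error = np.linalg.norm(estimates[-1] - true_center)
```

Selecting the highest-weight particle mirrors step 3 of the simplified procedure; a weighted mean of the particles is a common alternative.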
Discriminative tracking pipeline
Quick overview: learning tracking-by-detection
● Objective: a ridge regression

$$\min_{w} \sum_{i} \left( w^{\top} x_i - y_i \right)^2 + \lambda \lVert w \rVert^2$$

○ $w$: model parameters
○ $x_i$: training data
○ $y_i$: training labels

● How do we solve it? In closed form:

$$w = (X^{\top} X + \lambda I)^{-1} X^{\top} y$$

● We should update this classifier at every frame
(i.e. every time we perform tracking and get positive/negative samples)
● Can we make it faster?
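
For reference, the closed-form ridge regression solution that would have to be recomputed at every frame can be sketched as follows. The data is synthetic (200 samples with 8-dimensional features, chosen arbitrarily) and `ridge_fit` is a hypothetical helper name.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                  # rows are training samples
w_true = rng.normal(size=8)                    # weights used to generate labels
y = X @ w_true + 0.01 * rng.normal(size=200)   # noisy regression targets
w = ridge_fit(X, y, lam=1e-3)
```

Solving the d x d linear system costs on the order of d^3 operations, which becomes the bottleneck when d is the number of pixels or features in an image patch.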


Correlation filtering
● We can make it extremely fast for certain positive/negative sets!
○ Base sample (positive): the tracking result
○ Negative samples: translated copies of the base sample (the figure shows shifts of +30, +15, -15 and -30)
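Correlation filtering
● Representing positive/negative images using circulant matrices

Consider the base sample x as an n-dimensional array: $x = [x_1, x_2, \dots, x_n]$

Circulant matrix:

$$X = C(x) = \begin{bmatrix} x_1 & x_2 & x_3 & \cdots & x_n \\ x_n & x_1 & x_2 & \cdots & x_{n-1} \\ x_{n-1} & x_n & x_1 & \cdots & x_{n-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_2 & x_3 & x_4 & \cdots & x_1 \end{bmatrix}$$

○ First row: positive sample (the base sample itself)
○ Remaining rows: negative samples (cyclic shifts of the base sample)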
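
A small sketch of how all translated samples can be generated from one base sample, assuming a 1-D signal and cyclic (wrap-around) shifts. The helper name `circulant_samples` and the four-element example are illustrative.

```python
import numpy as np

def circulant_samples(x):
    """Stack all cyclic shifts of x as rows: row i is x shifted by i elements."""
    return np.stack([np.roll(x, i) for i in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])   # toy base sample
X = circulant_samples(x)             # row 0: positive sample, rows 1..n-1: negatives
```

The whole n x n data matrix is determined by the n values of the base sample, which is what makes the fast Fourier-domain solution possible.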
Correlation filtering
● Any circulant matrix can be made diagonal by the Discrete Fourier Transform (DFT)

$$X = F\, \mathrm{diag}(\hat{x})\, F^{H}$$

○ $F$: DFT matrix (constant, independent of x)
○ $\hat{x}$: DFT of the base sample
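
The diagonalization property can be checked numerically. This sketch assumes a real-valued base sample, rows built with `np.roll(x, i)`, NumPy's FFT sign convention, and a unitary DFT matrix obtained by transforming the identity; the size n = 8 is arbitrary.

```python
import numpy as np

n = 8
rng = np.random.default_rng(0)
x = rng.normal(size=n)                             # real-valued base sample
X = np.stack([np.roll(x, i) for i in range(n)])    # circulant data matrix
F = np.fft.fft(np.eye(n)) / np.sqrt(n)             # unitary DFT matrix
x_hat = np.fft.fft(x)                              # DFT of the base sample
X_rebuilt = F @ np.diag(x_hat) @ F.conj().T        # F diag(x_hat) F^H
max_abs_diff = np.max(np.abs(X_rebuilt - X))       # should be at rounding level
```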
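Correlation filtering
● Putting it all together
○ Circulant matrix: $X = F\, \mathrm{diag}(\hat{x})\, F^{H}$
○ Matrix inner-product: $X^{H} X = F\, \mathrm{diag}(\hat{x}^{*} \odot \hat{x})\, F^{H}$
○ Plug into ridge regression: the matrix inverse reduces to an element-wise division in the Fourier domain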
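
As a sanity check, the sketch below trains the same ridge regressor twice: once with dense linear algebra on all cyclic shifts and once with element-wise operations on DFTs. It is written under explicit assumptions: 1-D real signals, rows `X[i] = np.roll(x, i)`, NumPy's FFT convention, and a Gaussian-shaped label vector peaking at zero shift. Where the complex conjugate sits in the Fourier-domain formula depends on such conventions, so the expression in `train_fft` is the one that matches the dense solution for this setup rather than a quotation of a published formula. All function names are hypothetical.

```python
import numpy as np

def train_dense(x, y, lam):
    """Ridge regression on all cyclic shifts of x, solved with dense algebra."""
    n = len(x)
    X = np.stack([np.roll(x, i) for i in range(n)])
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

def train_fft(x, y, lam):
    """Same solution using only element-wise operations on DFTs."""
    x_hat, y_hat = np.fft.fft(x), np.fft.fft(y)
    w_hat = x_hat * y_hat / (np.abs(x_hat) ** 2 + lam)
    return np.real(np.fft.ifft(w_hat))

def detect_fft(w, z):
    """Response of w on every cyclic shift of z: r[i] = np.roll(z, i) @ w."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(z)) * np.fft.fft(w)))

n, lam = 64, 1e-2
rng = np.random.default_rng(0)
x = rng.normal(size=n)                               # base sample
shift = np.minimum(np.arange(n), n - np.arange(n))   # cyclic distance to shift 0
y = np.exp(-0.5 * (shift / 2.0) ** 2)                # labels: peak at zero shift
w_dense = train_dense(x, y, lam)
w_fft = train_fft(x, y, lam)

z = np.roll(x, 5)                                    # target moved by 5 samples
response = detect_fft(w_fft, z)
peak = int(np.argmax(response))
```

With these conventions the response peaks at index n - 5, the additional cyclic shift that realigns `z` with the base sample, and both training routines cost very different amounts: a dense solve versus a handful of FFTs.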
Kernelized correlation filtering
● Easy to extend to a kernelized version
○ Ridge regression: $w = (X^{\top} X + \lambda I)^{-1} X^{\top} y$
○ Ridge regression with kernel: $\alpha = (K + \lambda I)^{-1} y$
● We can do fast computation if the kernel matrix K is a circulant matrix
● Fortunately, it has been shown that K is circulant for most useful kernels [1]

[1] Henriques et al., High-Speed Tracking with Kernelized Correlation Filters, In TPAMI, 2015
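
A minimal sketch of the kernelized case, assuming a Gaussian kernel of the form exp(-d^2 / (2 sigma^2)), 1-D real signals and cyclic shifts built with `np.roll`. It compares a dense kernel ridge regression against the Fourier-domain solution that uses only the first row of K. The function names, signal length and parameter values are illustrative choices, not taken from the lecture or from [1].

```python
import numpy as np

def gaussian_autocorrelation(x, sigma):
    """k[m] = kappa(x, np.roll(x, m)) for all shifts m, computed via the FFT."""
    cross = np.real(np.fft.ifft(np.abs(np.fft.fft(x)) ** 2))  # x @ np.roll(x, m)
    d2 = np.maximum(2.0 * (x @ x) - 2.0 * cross, 0.0)         # squared distances
    return np.exp(-d2 / (2.0 * sigma**2))

def train_kernel_fft(x, y, sigma, lam):
    """Dual coefficients alpha using element-wise division in the Fourier domain."""
    k = gaussian_autocorrelation(x, sigma)
    return np.real(np.fft.ifft(np.fft.fft(y) / (np.fft.fft(k) + lam)))

def train_kernel_dense(x, y, sigma, lam):
    """Reference: alpha = (K + lam I)^{-1} y with the full n x n kernel matrix."""
    n = len(x)
    S = np.stack([np.roll(x, i) for i in range(n)])           # all cyclic shifts
    d2 = np.sum((S[:, None, :] - S[None, :, :]) ** 2, axis=2)
    K = np.exp(-d2 / (2.0 * sigma**2))
    return np.linalg.solve(K + lam * np.eye(n), y)

n, sigma, lam = 32, 4.0, 0.1
rng = np.random.default_rng(0)
x = rng.normal(size=n)
shift = np.minimum(np.arange(n), n - np.arange(n))
y = np.exp(-0.5 * (shift / 2.0) ** 2)                         # peak at zero shift
alpha_fft = train_kernel_fft(x, y, sigma, lam)
alpha_dense = train_kernel_dense(x, y, sigma, lam)
```

The dense version builds and inverts an n x n kernel matrix, while the FFT version only needs the n kernel values between the base sample and its shifts.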
Challenges
● Modeling severe appearance variations in a video

figure credit: Li et al., A survey of appearance models in visual object tracking


Modeling appearance for tracking
● Classic: hand-designed features
○ Color histogram
○ Intensity
○ Object Templates
○ Key-points (SIFT)
○ …
● Issue
○ All prone to overfitting
○ Cannot generalize to various appearances
Integrating CNN for appearance modeling
● Benefits
○ Features from a pre-trained CNN can be robust against various appearance changes
○ Especially useful in tracking since we have only one target ground-truth in the initial frame
CNN-based tracking
● CNNTrack: direct application of CNN features for tracking
Discussions
● Limitations?
Better representation learning with videos
● MDNet: learn a representation for tracking from a large number of videos
Challenges in visual object tracking
● Temporal drift (i.e. error propagation through time)
○ Drift in posterior estimation: the error in posterior propagates through time
○ Drift in appearance model: if the appearance model is updated during a temporary tracking failure, the error will propagate

● But why is tracking so prone to temporal drift?


Summary: Visual tracking
● Object localization in a video
● Probabilistic vs. discriminative tracking
● Modeling target appearance is important
○ Essential to evaluate the affinity of samples in both tracking frameworks
○ Should be able to handle a wide range of appearance variations
○ Should be able to generalize well from a single ground-truth at the initial frame
● CNN for visual tracking
○ Applying a pre-trained CNN for feature extraction
○ Training CNN with many heterogeneous videos for tracking
