CNN For Object Tracking
Seunghoon Hong
Course logistics (1)
● Assignment 2 is out
○ Deadline: Midnight June 7th
Course logistics (2)
● The instructions for the paper presentation will be released today
● Please read the instructions VERY CAREFULLY before you start
● Important deadlines (applied strictly; no late submission)
○ May 29: Paper bidding
○ June 7: Prepare presentation video and quiz
○ June 12: Watch presentations and solve quizzes
Recap: approaches in single object tracking
● Probabilistic tracking
○ Formulate the localization task as a sequential probabilistic inference problem
○ Given a probability of the initial target location, propagate it over the remaining frames
● Discriminative tracking
○ Separate the target from the distractors in every frame
○ Can be considered as sequential binary object detection (class = target, background)
Recap: Probabilistic tracking
● Sequential Bayesian filtering via Markov Chain Monte Carlo sampling
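For reference, sequential Bayesian filtering is usually written as the recursion below. The notation is an assumption and may differ from the symbols used in the lecture: x_t is the target state at frame t, z_t is the observation, p(z_t | x_t) is the observation likelihood, p(x_t | x_{t-1}) is the motion model, and p(x_{t-1} | z_{1:t-1}) is the posterior from the previous frame. Sampling methods such as MCMC approximate the integral when it has no closed form.

```latex
p(x_t \mid z_{1:t}) \;\propto\; p(z_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, dx_{t-1}
```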
Recap: Discriminative tracking
Recap: Correlation filtering for discriminative tracking
● Solving a ridge regression via circulant matrices and discrete Fourier transform
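○ If X is a circulant matrix, it is diagonalized by the DFT, so the regression can be solved element-wise in the Fourier domain
○ The same property speeds up both learning the model and scoring the candidates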
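A minimal NumPy sketch of the idea for a 1-D signal: when the rows of X are all cyclic shifts of one base sample x, the ridge-regression solution w = (XᵀX + λI)⁻¹Xᵀy can be computed element-wise in the Fourier domain. The function names, the shift convention (row i is x rolled by i), and the value of λ are assumptions for illustration; with a different shift convention the complex conjugate moves to a different factor.

```python
import numpy as np

def circulant_rows(x):
    """Data matrix whose i-th row is x cyclically shifted by i (assumed convention)."""
    return np.stack([np.roll(x, i) for i in range(len(x))])

def ridge_direct(x, y, lam):
    """Closed-form ridge regression on the explicit circulant matrix: O(n^3)."""
    X = circulant_rows(x)
    return np.linalg.solve(X.T @ X + lam * np.eye(len(x)), X.T @ y)

def ridge_fourier(x, y, lam):
    """Same solution via the DFT: O(n log n), no matrix is ever built."""
    xf, yf = np.fft.fft(x), np.fft.fft(y)
    wf = xf * yf / (np.abs(xf) ** 2 + lam)
    return np.real(np.fft.ifft(wf))

def detect(w, z):
    """Filter responses for all cyclic shifts of candidate z in one pass."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(z)) * np.fft.fft(w)))

rng = np.random.default_rng(0)
x = rng.standard_normal(32)            # base training sample (the target)
y = np.zeros(32)
y[0] = 1.0                             # desired response: peak at zero shift
w_direct = ridge_direct(x, y, lam=1e-2)
w_fourier = ridge_fourier(x, y, lam=1e-2)
response = detect(w_fourier, np.roll(x, 5))   # target moved by 5 samples
```

The two solvers agree up to floating-point error, and the location of the peak in `response` encodes the displacement of the target.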
Revisiting challenges in object tracking
Hong et al., Online Tracking By Learning Discriminative Saliency Map With Convolutional Neural Network
Modeling a discriminator for target
● How can we learn weights for target classification?
● Online learning
○ Train a classifier on-the-fly using the ground-truth and tracking results (self-supervised)
○ Problems
■ The model can easily overfit
■ Online update of the classifier is prone to drift (in case of temporary misclassification)
■ Pre-trained features may not be appropriate for tracking
(e.g. inaccurate localization due to translation invariance, never trained to model temporal variations, etc.)
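One common way to realize such self-supervised online learning is to label candidate boxes by their overlap with the current tracking result and use them as training samples. The sketch below illustrates only this labeling step; the function names and the IoU thresholds (0.7 and 0.3) are assumptions, not values from the lecture. It also makes the drift problem concrete: if the estimate is wrong, the labels derived from it are wrong too.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def label_candidates(candidates, estimate, pos_thr=0.7, neg_thr=0.3):
    """Return (positives, negatives); boxes between the thresholds are ignored."""
    positives = [c for c in candidates if iou(c, estimate) >= pos_thr]
    negatives = [c for c in candidates if iou(c, estimate) <= neg_thr]
    return positives, negatives

# Current tracking result and a few hypothetical candidate boxes around it.
estimate = (50, 50, 20, 20)
candidates = [(50, 50, 20, 20), (52, 50, 20, 20), (60, 50, 20, 20), (90, 90, 20, 20)]
positives, negatives = label_candidates(candidates, estimate)
```

Candidates with intermediate overlap are left out of both sets so that ambiguous samples do not contaminate the classifier update.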
Training Testing
Pre-training the classifier for tracking?
● Offline training followed by online deployment is standard practice for CNNs
○ E.g., image classification
x: candidates
Frame #t
Frame #(t+T)
Inference
● Use the initial frame to extract the target (z), and keep it fixed for the remaining frames
○ Online update of the target φ(z) is straightforward, but did not improve performance
● Handling scale variation
○ Construct an image pyramid of x at multiple scales 1.025^{−2,−1,0,1,2}
○ Select the scale with the maximum score
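The inference step can be sketched in NumPy as follows: the template embedding φ(z) is slid over the search embedding φ(x) as a correlation filter, and the scale whose score map has the highest peak is selected. This is a sketch under assumptions: `cross_correlate` and `best_scale` are hypothetical helper names, the embeddings are taken as given arrays of shape (C, h, w) and (C, H, W), and the scale set is read as powers of 1.025.

```python
import numpy as np

def cross_correlate(template, search):
    """Slide template (C, h, w) over search (C, H, W); return an (H-h+1, W-w+1) score map."""
    _, h, w = template.shape
    _, H, W = search.shape
    scores = np.empty((H - h + 1, W - w + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            scores[i, j] = np.sum(template * search[:, i:i + h, j:j + w])
    return scores

def best_scale(template, searches_per_scale):
    """Pick the scale index whose score map has the highest peak, and the peak position."""
    maps = [cross_correlate(template, s) for s in searches_per_scale]
    k = int(np.argmax([m.max() for m in maps]))
    peak = np.unravel_index(np.argmax(maps[k]), maps[k].shape)
    return k, peak

scale_factors = 1.025 ** np.array([-2, -1, 0, 1, 2])   # five candidate scales

rng = np.random.default_rng(0)
template = rng.standard_normal((8, 4, 4))      # stands in for phi(z)
search = np.zeros((8, 12, 12))                 # stands in for phi(x)
search[:, 3:7, 5:9] = template                 # plant the target at (row 3, col 5)
score_map = cross_correlate(template, search)
```

In a real tracker each entry of `searches_per_scale` would be the embedding of the search region resized by the corresponding entry of `scale_factors`; resizing is omitted here to keep the sketch dependency-free.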
Result
● State-of-the-art performance despite the simplicity
● Super-fast! (real-time speed)
Summary: fully-convolutional Siamese network
● Discriminative tracking via exemplar classifier
○ Use the target in the initial frame as a convolution filter (= adaptable classifier)
○ The entire model is pre-trained end-to-end and transferable across videos
○ Can be deployed on videos with arbitrary targets at test time
● A fully-convolutional network makes the Siamese design possible
○ Both the target classifier and frame-level feature extractor share the same parameters
○ Produces a score map via filtering, which allows super-efficient examination of samples
● Fast, and reasonably accurate
○ Real-time performance (60~80 fps)
Later innovations in Siamese-FC
● Accurate localization through region-proposal network
● Efficient parameterization with deep network
● Mask prediction for further accurate localization
Efficient and accurate modeling of box configuration
● In Siamese-FC, only scale variation is modeled, via an image pyramid
● If we want to model more scales and aspect ratios, exhaustive search over an image pyramid is not efficient
Siamese network with region proposal
● Efficient search over scale + aspect ratio through a region-proposal network
○ k: number of proposals (anchors)
○ The target (template) generates k filters, one for each anchor box
○ The classification branch produces a binary (target vs. background) score for each proposal
○ The regression branch generates (dx, dy, dw, dh) for each proposal
Li et al., High Performance Visual Tracking with Siamese Region Proposal Network
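How the two branches combine into a box can be sketched as follows. The decoding uses the parameterization that is standard for region-proposal networks (centre offsets scaled by the anchor size, log-space width and height); the anchor construction, the function names, and the base size and ratios are illustrative assumptions rather than the exact settings of Li et al.

```python
import math

def make_anchors(cx, cy, base_size=64.0, ratios=(0.33, 0.5, 1.0, 2.0, 3.0)):
    """k anchors (cx, cy, w, h) of equal area and different aspect ratios h/w."""
    anchors = []
    for r in ratios:
        w = base_size / math.sqrt(r)
        h = base_size * math.sqrt(r)
        anchors.append((cx, cy, w, h))
    return anchors

def decode(anchor, delta):
    """Apply a regression output (dx, dy, dw, dh) to an anchor (cx, cy, w, h)."""
    ax, ay, aw, ah = anchor
    dx, dy, dw, dh = delta
    return (ax + dx * aw, ay + dy * ah, aw * math.exp(dw), ah * math.exp(dh))

def best_box(anchors, scores, deltas):
    """Refine the anchor with the highest target score."""
    k = max(range(len(anchors)), key=lambda i: scores[i])
    return decode(anchors[k], deltas[k])

anchors = make_anchors(100.0, 100.0)
scores = [0.1, 0.2, 0.9, 0.3, 0.1]     # made-up classification scores, one per anchor
zero = (0.0, 0.0, 0.0, 0.0)
deltas = [zero, zero, (0.1, -0.05, math.log(1.5), 0.0), zero, zero]
box = best_box(anchors, scores, deltas)
```

A single forward pass scores and refines all k anchors at once, which replaces the repeated evaluation of an image pyramid.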
Result
Li et al., High Performance Visual Tracking with Siamese Region Proposal Network
Improving Siamese-RPN
● Efficient parameterization via depth-wise convolution
● Exploiting very deep network with skip connections
Li et al., SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks
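The depth-wise operation can be sketched in NumPy as below: each channel of the template feature is correlated only with the matching channel of the search feature, giving one response map per channel. Summing those maps over channels recovers the single-channel score map used in Siamese-FC. The function name and tensor shapes are assumptions for illustration.

```python
import numpy as np

def depthwise_xcorr(template, search):
    """Correlate channel c of template (C, h, w) with channel c of search (C, H, W)."""
    C, h, w = template.shape
    _, H, W = search.shape
    out = np.empty((C, H - h + 1, W - w + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            # Sum over the spatial window only, keeping the channel axis.
            out[:, i, j] = np.sum(template * search[:, i:i + h, j:j + w], axis=(1, 2))
    return out

rng = np.random.default_rng(0)
template = rng.standard_normal((16, 3, 3))
search = rng.standard_normal((16, 10, 10))
response = depthwise_xcorr(template, search)   # one response map per channel
```

Because the output keeps the channel dimension, the subsequent classification and regression heads can be ordinary convolutions applied to `response`.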
Mask prediction for better localization
● Just an additional mask branch on top of Siamese-RPN!
● Every pixel predicts a binary mask
Wang et al., Fast Online Object Tracking and Segmentation: A Unifying Approach
Result
● Accuracy in terms of bounding box
Wang et al., Fast Online Object Tracking and Segmentation: A Unifying Approach