MAST: A Memory-Augmented Self-Supervised Tracker
Zihang Lai, Erika Lu, Weidi Xie (VGG, University of Oxford)

MAST is a self-supervised dense tracking model that uses a memory module to learn from past frames without human annotations. It outperforms previous self-supervised baselines on the DAVIS-2017 and YouTube-VOS benchmarks, improving the mean Jaccard & F-measure (J&F) score by roughly 15 and 17 points, respectively. Qualitative results also show that MAST produces more accurate predictions over time than other self-supervised methods.


MAST: A Memory-Augmented Self-Supervised Tracker
Zihang Lai, Erika Lu, Weidi Xie
VGG, University of Oxford
CVPR 2020
Objective
In this work, we propose a novel dense tracking model that is able to learn without any human annotations.
Key idea: Memory Module
[Diagram: past frames (memory) are queried to predict the present frame]
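The mechanism sketched in the diagram can be written as attention-based label propagation: the present frame's features query the features of past frames held in memory, and every pixel copies a softmax-weighted mixture of the memory's labels. Below is a minimal PyTorch sketch under assumed tensor shapes and names (propagate_labels, temperature, etc. are illustrative, not the authors' code). During self-supervised training the "labels" are simply the colour channels of the past frames, so no human annotation is needed; at test time they are the object masks being tracked.

```python
import torch
import torch.nn.functional as F

def propagate_labels(query_feat, memory_feats, memory_labels, temperature=0.07):
    """Copy labels from memory frames to the present frame via attention.

    query_feat:    (C, H, W)    features of the present frame
    memory_feats:  (T, C, H, W) features of T past frames (the memory)
    memory_labels: (T, K, H, W) per-pixel targets for those frames
                   (colour channels in training, mask logits at test time)
    returns:       (K, H, W)    prediction for the present frame
    """
    C, H, W = query_feat.shape
    T, K = memory_labels.shape[0], memory_labels.shape[1]

    q = query_feat.reshape(C, H * W)                            # (C, N)
    k = memory_feats.permute(1, 0, 2, 3).reshape(C, T * H * W)  # (C, T*N)
    v = memory_labels.permute(1, 0, 2, 3).reshape(K, T * H * W) # (K, T*N)

    # Affinity of every query pixel to every memory pixel,
    # normalised into attention weights with a softmax.
    attn = F.softmax(q.t() @ k / temperature, dim=1)            # (N, T*N)

    # Each query pixel receives a weighted mixture of memory labels.
    return (v @ attn.t()).reshape(K, H, W)
```

Because the training loss only asks the model to reconstruct the present frame from past frames, the tracker can be trained on raw, unlabelled video.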
Results
DAVIS-2017 (+15%)

YouTube-VOS (+17%)
Results from DAVIS-2017 Validation Set
Algorithms trained in a self-supervised fashion:

Method            J & F Mean
Vid. Color. [1]   34.0
CycleTime [2]     48.7
CorrFlow [3]      50.3
MAST (Ours)       65.5

[1] Vondrick, Carl, et al. Tracking emerges by colorizing videos. In Proc. ECCV, 2018.
[2] Wang, Xiaolong, et al. Learning correspondence from the cycle-consistency of time. In Proc. CVPR, 2019.
[3] Lai, Zihang, et al. Self-supervised learning for video correspondence flow. In Proc. BMVC, 2019.
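For reference, the "J & F Mean" reported in these tables averages two standard DAVIS metrics: region similarity J, the Jaccard index (mask IoU), and contour accuracy F, an F-measure over boundary pixels. The NumPy sketch below is a simplified illustration; the official DAVIS toolkit additionally matches boundaries within a small tolerance band, which is omitted here.

```python
import numpy as np

def jaccard(pred, gt):
    """Region similarity J: intersection-over-union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else np.logical_and(pred, gt).sum() / union

def boundary_f(pred, gt):
    """Contour accuracy F: F-measure over boundary pixels (no tolerance band)."""
    def boundary(mask):
        m = mask.astype(bool)
        b = np.zeros_like(m)
        b[1:, :] |= m[1:, :] != m[:-1, :]  # vertical label transitions
        b[:, 1:] |= m[:, 1:] != m[:, :-1]  # horizontal label transitions
        return b & m                       # boundary pixels inside the mask
    pb, gb = boundary(pred), boundary(gt)
    match = np.logical_and(pb, gb).sum()
    precision = match / max(pb.sum(), 1)
    recall = match / max(gb.sum(), 1)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

def j_and_f_mean(pred, gt):
    """The number reported above: the average of J and F."""
    return (jaccard(pred, gt) + boundary_f(pred, gt)) / 2
```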
Results from YouTube-VOS Validation Set
Algorithms trained in a self-supervised fashion:

Method            J & F Mean
Vid. Color. [1]   38.9
CorrFlow [2]      46.6
MAST (Ours)       64.2

[1] Vondrick, Carl, et al. Tracking emerges by colorizing videos. In Proc. ECCV, 2018.
[2] Lai, Zihang, et al. Self-supervised learning for video correspondence flow. In Proc. BMVC, 2019.
Qualitative results (* black screen denotes no prediction)
[Video comparison, four panels: Video Colorization (Vondrick et al.), CorrFlow (Lai et al.), CycleTime (Wang et al.), MAST (Ours)]

Qualitative results (continued)
[Video comparison, four panels: Video Colorization (Vondrick et al.), CorrFlow (Lai et al.), CycleTime (Wang et al.), MAST (Ours)]
