SparseTT: Visual Tracking with Sparse Transformers

Fu, Zhihong; Fu, Zehua; Liu, Qingjie; Cai, Wenrui; Wang, Yunhong

arXiv:2205.03776 (cs)

[Submitted on 8 May 2022]

Abstract: Transformers have been successfully applied to visual tracking and have significantly improved tracking performance. The self-attention mechanism, designed to model long-range dependencies, is the key to their success. However, self-attention does not focus on the most relevant information in the search regions, so it is easily distracted by the background. In this paper, we relieve this issue with a sparse attention mechanism that focuses on the most relevant information in the search regions, which enables much more accurate tracking. Furthermore, we introduce a double-head predictor to boost the accuracy of foreground-background classification and target bounding-box regression, which further improves tracking performance. Extensive experiments show that, without bells and whistles, our method significantly outperforms state-of-the-art approaches on LaSOT, GOT-10k, TrackingNet, and UAV123 while running at 40 FPS. Notably, the training time of our method is reduced by 75% compared to that of TransT. The source code and models are available at this https URL.
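
The abstract does not spell out how the sparse attention is computed. One common way to make attention focus on the most relevant entries is to keep only the k largest similarity scores per query and apply the softmax over those alone. The sketch below illustrates that idea under this assumption; the function name `topk_sparse_attention`, its parameters, and the choice of top-k selection are hypothetical and are not taken from the paper or its released code.

```python
import math


def topk_sparse_attention(queries, keys, values, k):
    """Scaled dot-product attention that keeps only the k largest scores per query.

    Illustrative assumption: sparsity is obtained by top-k selection.
    queries, keys: lists of d-dimensional vectors (lists of floats).
    values: list of vectors, one per key.
    Returns one output vector per query.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Scaled dot-product similarity between this query and every key.
        scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(d)
                  for key in keys]
        # Indices of the k highest-scoring keys; all other keys are dropped.
        kept = sorted(range(len(scores)), key=lambda i: scores[i],
                      reverse=True)[:k]
        # Softmax over the kept scores only (shifted by the max for stability).
        top = max(scores[i] for i in kept)
        exps = {i: math.exp(scores[i] - top) for i in kept}
        total = sum(exps.values())
        # Weighted sum of the values that belong to the kept keys.
        out = [0.0] * len(values[0])
        for i in kept:
            weight = exps[i] / total
            for j, v in enumerate(values[i]):
                out[j] += weight * v
        outputs.append(out)
    return outputs
```

With `k` equal to the number of keys this reduces to ordinary dense attention; with a smaller `k`, low-scoring positions (for example, background regions) receive exactly zero weight instead of a small positive one.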

Comments:	Accepted by IJCAI2022 as a long oral presentation
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2205.03776 [cs.CV]
	(or arXiv:2205.03776v1 [cs.CV] for this version)
 	https://doi.org/10.48550/arXiv.2205.03776

Submission history

From: Zhihong Fu
[v1] Sun, 8 May 2022 04:00:28 UTC (1,635 KB)
