Smart Bird: Learnable Sparse Attention for Efficient and Effective Transformer

Wu, Chuhan; Wu, Fangzhao; Qi, Tao; Jiao, Binxing; Jiang, Daxin; Huang, Yongfeng; Xie, Xing

arXiv:2108.09193 (cs)

[Submitted on 20 Aug 2021 (v1), last revised 2 Sep 2021 (this version, v3)]

Abstract: Transformer has achieved great success in NLP. However, the quadratic complexity of the self-attention mechanism in Transformer makes it inefficient in handling long sequences. Many existing works accelerate Transformers by computing sparse rather than dense self-attention, which usually attends to tokens at certain fixed positions or to randomly selected tokens. However, manually selected or random tokens may be uninformative for context modeling. In this paper, we propose Smart Bird, an efficient and effective Transformer with learnable sparse attention. In Smart Bird, we first compute a sketched attention matrix with a single-head, low-dimensional Transformer, which aims to find potentially important interactions between tokens. We then sample token pairs based on their probability scores derived from the sketched attention matrix to generate different sparse attention index matrices for different attention heads. Finally, we select token embeddings according to the index matrices to form the input of sparse attention networks. Extensive experiments on six benchmark datasets for different tasks validate the efficiency and effectiveness of Smart Bird in text modeling.
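The three steps in the abstract (sketch, sample, select) can be illustrated with a small NumPy sketch. This is not the authors' implementation and rests on several labeled assumptions: random projections stand in for learned weights, a single low-dimensional attention layer stands in for the "single-head low-dimensional Transformer", "sampling token pairs" is modeled as drawing k distinct key positions per query row (without replacement) from that row's sketched probabilities, and all names and sizes (`d_sketch`, `k`, `n`, `d`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sketched_attention(x, d_sketch, rng):
    """Single-head, low-dimensional attention probabilities (n x n).

    Assumption: random projections stand in for the learned
    low-dimensional Transformer described in the abstract.
    """
    d = x.shape[1]
    w_q = rng.standard_normal((d, d_sketch)) / np.sqrt(d)
    w_k = rng.standard_normal((d, d_sketch)) / np.sqrt(d)
    q, k = x @ w_q, x @ w_k
    return softmax(q @ k.T / np.sqrt(d_sketch), axis=-1)


def sample_index_matrices(probs, num_heads, k, rng):
    """For each head, sample k distinct key positions per query row,
    with probability given by that row of the sketched attention matrix.
    Each head draws its own samples, so heads get different index matrices."""
    n = probs.shape[0]
    idx = np.empty((num_heads, n, k), dtype=np.int64)
    for h in range(num_heads):
        for i in range(n):
            idx[h, i] = rng.choice(n, size=k, replace=False, p=probs[i])
    return idx


def sparse_multihead_attention(x, idx, rng):
    """Each head attends only to the keys listed in its own index matrix."""
    num_heads, n, k = idx.shape
    d = x.shape[1]
    d_head = d // num_heads
    outputs = []
    for h in range(num_heads):
        # Hypothetical per-head projections (random, for illustration only).
        w_q, w_k, w_v = (rng.standard_normal((d, d_head)) / np.sqrt(d) for _ in range(3))
        q, keys, vals = x @ w_q, x @ w_k, x @ w_v
        k_sel = keys[idx[h]]  # (n, k, d_head): keys gathered for each query
        v_sel = vals[idx[h]]  # (n, k, d_head): values gathered for each query
        scores = np.einsum("nd,nkd->nk", q, k_sel) / np.sqrt(d_head)
        weights = softmax(scores, axis=-1)  # (n, k): attention over sampled keys
        outputs.append(np.einsum("nk,nkd->nd", weights, v_sel))
    return np.concatenate(outputs, axis=-1)  # (n, num_heads * d_head)


# Usage with hypothetical sizes.
rng = np.random.default_rng(0)
n, d, num_heads, k = 16, 32, 4, 5
x = rng.standard_normal((n, d))
probs = sketched_attention(x, d_sketch=8, rng=rng)
idx = sample_index_matrices(probs, num_heads, k, rng)
out = sparse_multihead_attention(x, idx, rng)
print(idx.shape, out.shape)  # (4, 16, 5) (16, 32)
```

In this sketch each head scores only k gathered keys per query, so the per-head attention costs O(n·k) instead of O(n²); the sketched matrix is still n×n, but it is computed once, with a single head, in the small dimension `d_sketch`.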

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2108.09193 [cs.CL]
	(or arXiv:2108.09193v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2108.09193

Submission history

From: Chuhan Wu
[v1] Fri, 20 Aug 2021 14:22:00 UTC (617 KB)
[v2] Wed, 25 Aug 2021 07:49:16 UTC (617 KB)
[v3] Thu, 2 Sep 2021 06:44:38 UTC (617 KB)
