KDEformer: Accelerating Transformers via Kernel Density Estimation

Zandieh, Amir; Han, Insu; Daliri, Majid; Karbasi, Amin

Computer Science > Machine Learning

arXiv:2302.02451 (cs)

[Submitted on 5 Feb 2023 (v1), last revised 29 Jun 2023 (this version, v2)]

Abstract: The dot-product attention mechanism plays a crucial role in modern deep architectures (e.g., Transformers) for sequence modeling; however, naïve exact computation of attention incurs quadratic time and memory complexity in the sequence length, hindering the training of long-sequence models. The critical bottlenecks are the computation of the partition functions in the denominator of the softmax function and the multiplication of the softmax matrix with the matrix of values. Our key observation is that the former can be reduced to a variant of the kernel density estimation (KDE) problem, and an efficient KDE solver can then be used to accelerate the latter via subsampling-based fast matrix products. Our proposed KDEformer approximates attention in sub-quadratic time with provable spectral-norm error bounds, whereas all prior results provide only entry-wise error bounds. Empirically, we verify that KDEformer outperforms other attention approximations in accuracy, memory, and runtime on various pre-trained models. On BigGAN image generation, we achieve better generative scores than exact computation with over $4\times$ speedup. For ImageNet classification with T2T-ViT, KDEformer achieves over $18\times$ speedup with an accuracy drop of less than $0.5\%$.
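The two bottlenecks named in the abstract can be illustrated with a small NumPy sketch. It assumes unscaled attention $A = \exp(QK^\top)$ with $Q$ and $K$ already pre-scaled by $d^{-1/4}$, and every function name below (exact_attention, partition_via_kde, sampled_attention) is hypothetical. partition_via_kde uses the identity $\exp(\langle q,k\rangle) = \exp(\|q\|^2/2)\,\exp(\|k\|^2/2)\,\exp(-\|q-k\|^2/2)$, which turns each softmax denominator into a weighted Gaussian KDE query; it evaluates that density by brute force, which is the step a fast KDE solver would replace. sampled_attention approximates $AV$ by sampling keys with probability proportional to $\|v_j\|$, a generic importance-sampling matrix product chosen only for illustration. It is not KDEformer's actual sampling scheme and carries none of the paper's guarantees.

```python
import numpy as np

def exact_attention(Q, K, V):
    """Exact softmax attention D^{-1} A V with A = exp(Q K^T).

    Q and K are assumed pre-scaled (e.g. by d**-0.25), so no extra 1/sqrt(d)
    factor is applied. Cost: O(n^2 d) time and O(n^2) memory.
    """
    A = np.exp(Q @ K.T)
    D = A.sum(axis=1)                      # softmax denominators (partition functions)
    return (A @ V) / D[:, None]

def partition_via_kde(Q, K):
    """Row sums of A rewritten as a weighted Gaussian kernel density.

    Uses exp(<q,k>) = exp(|q|^2/2) * exp(|k|^2/2) * exp(-|q-k|^2/2).
    The Gaussian sum is computed by brute force here; a fast KDE solver
    would replace this step to reach sub-quadratic time.
    """
    half_sq_q = 0.5 * np.sum(Q**2, axis=1)            # (n,)
    half_sq_k = 0.5 * np.sum(K**2, axis=1)            # (n,) key weights live in exp(half_sq_k)
    diff = Q[:, None, :] - K[None, :, :]              # (n, n, d) pairwise differences
    gauss = np.exp(-0.5 * np.sum(diff**2, axis=2))    # Gaussian kernel matrix
    return np.exp(half_sq_q) * (gauss @ np.exp(half_sq_k))

def sampled_attention(Q, K, V, D, m, rng):
    """Approximate D^{-1} A V by sampling m keys with replacement.

    Sampling probabilities p_j proportional to ||v_j|| are an illustrative
    assumption, not the paper's sampler. The estimate of A V is unbiased
    because p_j > 0 whenever v_j is nonzero, and only an n x m block of A
    is ever formed. D holds the (exact or KDE-estimated) denominators.
    """
    norms = np.linalg.norm(V, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(len(V), size=m, p=p)             # sampled key indices
    A_s = np.exp(Q @ K[idx].T)                        # (n, m) sampled columns of A
    weights = 1.0 / (m * p[idx])                      # importance weights
    return ((A_s * weights) @ V[idx]) / D[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 32, 8
    Q = rng.standard_normal((n, d)) * d**-0.25
    K = rng.standard_normal((n, d)) * d**-0.25
    V = rng.standard_normal((n, d))
    D = partition_via_kde(Q, K)
    exact = exact_attention(Q, K, V)
    approx = sampled_attention(Q, K, V, D, m=100_000, rng=rng)
    print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```

In this toy setting m exceeds n, so the sampling saves nothing; the point is only that the estimator never needs the full $n \times n$ matrix, and that its error shrinks as m grows.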

Comments:	26 pages, 7 figures
Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:2302.02451 [cs.LG]
	(or arXiv:2302.02451v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2302.02451

Submission history

From: Insu Han
[v1] Sun, 5 Feb 2023 18:23:49 UTC (8,892 KB)
[v2] Thu, 29 Jun 2023 17:51:10 UTC (8,881 KB)
