Linear-Complexity Self-Supervised Learning for Speech Processing

Zhang, Shucong; Parcollet, Titouan; van Dalen, Rogier; Bhattacharya, Sourav

arXiv:2407.13377 (cs)

[Submitted on 18 Jul 2024]

Abstract: Self-supervised learning (SSL) models usually require weeks of pre-training with dozens of high-end GPUs. These models typically have a multi-headed self-attention (MHSA) context encoder. However, MHSA takes quadratic time and space in the input length, contributing to the high pre-training cost. Linear-complexity alternatives to MHSA have been proposed. For instance, in supervised training, the SummaryMixing model is the first to outperform MHSA across multiple speech processing tasks. However, these cheaper alternatives have not yet been explored for SSL. This paper studies a linear-complexity context encoder for SSL for the first time. With better or equivalent performance on the downstream tasks of the MP3S benchmark, SummaryMixing reduces the pre-training time and peak VRAM of the wav2vec 2.0 model by 18% and 23%, respectively, so that pre-training a 155M wav2vec 2.0 model finishes within one week on 4 Tesla A100 GPUs. Code is available at this https URL.
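
The difference between a quadratic and a linear context encoder can be illustrated with a small sketch. The abstract does not describe the internals of SummaryMixing, so the code below is not the authors' implementation. It is a hypothetical illustration of the general idea: self-attention builds a T x T score matrix over T frames, whereas a mean-summary layer combines each frame with a single averaged vector, so its cost grows linearly in T. The function names, weight shapes, and the tanh nonlinearity are all assumptions chosen for illustration.

```python
import numpy as np

def attention_mixing(x):
    """Single-head self-attention without learned projections (illustrative only).

    Builds a (T, T) score matrix, so time and memory grow quadratically in T.
    """
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)                  # (T, T) pairwise scores
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ x                             # (T, d)

def mean_summary_mixing(x, w_local, w_summary, w_combine):
    """Hypothetical linear-time mixing layer (an assumption, not the paper's code).

    Each frame is transformed on its own, then concatenated with one mean
    summary vector shared by all frames. No (T, T) matrix is ever formed.
    """
    local = np.tanh(x @ w_local)                         # (T, h) per-frame features
    summary = np.tanh(x @ w_summary).mean(axis=0)        # (h,) one global summary
    summary = np.broadcast_to(summary, local.shape)      # (T, h) shared across frames
    combined = np.concatenate([local, summary], axis=1)  # (T, 2h)
    return np.tanh(combined @ w_combine)                 # (T, d)

# Toy input: 50 frames of 8-dimensional features, hidden size 16 (arbitrary values).
rng = np.random.default_rng(0)
T, d, h = 50, 8, 16
x = rng.standard_normal((T, d))
w_local = rng.standard_normal((d, h))
w_summary = rng.standard_normal((d, h))
w_combine = rng.standard_normal((2 * h, d))

out_attn = attention_mixing(x)
out_mean = mean_summary_mixing(x, w_local, w_summary, w_combine)
print(out_attn.shape, out_mean.shape)
```

Both functions return one output vector per input frame, but only the first materialises pairwise scores between frames. Doubling T roughly quadruples the size of that score matrix, while the work in the mean-summary layer roughly doubles.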

Comments:	Interspeech 2024
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2407.13377 [cs.CL]
	(or arXiv:2407.13377v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.13377

Submission history

From: Shucong Zhang
[v1] Thu, 18 Jul 2024 10:34:33 UTC (95 KB)