FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization

Tian, Zhengkun; Yi, Jiangyan; Bai, Ye; Tao, Jianhua; Zhang, Shuai; Wen, Zhengqi


arXiv:2104.02882 (eess)

[Submitted on 7 Apr 2021]


Abstract:Transducer-based models, such as RNN-Transducer and Transformer-Transducer, have achieved great success in speech recognition. A typical transducer model decodes the output sequence step by step, conditioned on the current acoustic state and the previously predicted tokens. Statistically, blank tokens account for nearly 90% of all predicted tokens. Predicting these blank tokens takes considerable computation and time, yet only the non-blank tokens appear in the final output sequence. We therefore propose fast-skip regularization, a method that tries to align the blank positions predicted by a transducer with those predicted by a CTC model. During inference, the transducer model can predict blank tokens in advance with a simple CTC projection layer, without the costly forward computation of the transducer decoder, and then skip them, which reduces computation and greatly improves inference speed. All experiments are conducted on AISHELL-1, a public Mandarin Chinese dataset. The results show that fast-skip regularization does help the transducer model learn the blank position alignments. In addition, inference with fast-skip is nearly 4 times faster, with only a small degradation in performance.
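The skipping idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the skip criterion, so the blank-probability `threshold`, the `max_symbols` cap, the parameter names `w_ctc` and `b_ctc`, and the `joint_step` callback (standing in for the transducer decoder and joint network) are all assumptions made for illustration. The sketch applies a linear CTC projection to each encoder frame and calls the expensive decoder step only on frames that the CTC layer does not confidently label as blank.

```python
import numpy as np

BLANK = 0  # assumed index of the blank token


def ctc_blank_mask(enc_out, w_ctc, b_ctc, threshold=0.9):
    """Mark frames whose CTC blank posterior exceeds `threshold`.

    enc_out: (T, D) encoder outputs; w_ctc: (D, V); b_ctc: (V,).
    The threshold rule is an assumption, not taken from the paper.
    """
    logits = enc_out @ w_ctc + b_ctc
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs[:, BLANK] > threshold


def fast_skip_greedy(enc_out, w_ctc, b_ctc, joint_step,
                     threshold=0.9, max_symbols=5):
    """Greedy transducer decoding that skips CTC-predicted blank frames.

    joint_step(frame, hyp) is a hypothetical callback that returns the
    next token id given one encoder frame and the tokens emitted so far.
    Returns the hypothesis and the number of joint_step calls made.
    """
    skip = ctc_blank_mask(enc_out, w_ctc, b_ctc, threshold)
    hyp, joint_calls = [], 0
    for t in range(len(enc_out)):
        if skip[t]:
            continue  # frame judged blank by the cheap CTC layer
        for _ in range(max_symbols):  # cap emissions per frame
            joint_calls += 1
            token = joint_step(enc_out[t], hyp)
            if token == BLANK:
                break  # move on to the next frame
            hyp.append(token)
    return hyp, joint_calls
```

Returning the call count alongside the hypothesis makes it easy to compare the decoder work with and without skipping on the same input; setting `threshold=1.0` disables skipping, because no softmax posterior is strictly greater than 1.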

Comments:	Submitted to INTERSPEECH2021
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:2104.02882 [eess.AS]
	(or arXiv:2104.02882v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2104.02882

Submission history

From: Zhengkun Tian
[v1] Wed, 7 Apr 2021 03:15:10 UTC (648 KB)
