Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation

Wang, Changhan; Pino, Juan; Gu, Jiatao

arXiv:2006.05474 (eess)

[Submitted on 9 Jun 2020 (v1), last revised 9 Oct 2020 (this version, v2)]

Abstract: Transfer learning from high-resource languages is known to be an efficient way to improve end-to-end automatic speech recognition (ASR) for low-resource languages. Pre-trained or jointly trained encoder-decoder models, however, do not share the language modeling (decoder) for the same language, which is likely to be inefficient for distant target languages. We introduce speech-to-text translation (ST) as an auxiliary task to incorporate additional knowledge of the target language and enable transferring from that target language. Specifically, we first translate high-resource ASR transcripts into a target low-resource language, with which an ST model is trained. Both ST and target ASR share the same attention-based encoder-decoder architecture and vocabulary. The former task then provides a fully pre-trained model for the latter, bringing up to 24.6% word error rate (WER) reduction to the baseline (direct transfer from high-resource ASR). We show that training ST with human translations is not necessary. ST trained with machine translation (MT) pseudo-labels brings consistent gains. It can even outperform those using human labels when transferred to target ASR by leveraging only 500K MT examples. Even with pseudo-labels from low-resource MT (200K examples), ST-enhanced transfer brings up to 8.9% WER reduction to direct transfer.
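
The recipe described in the abstract can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions, not the authors' implementation: the names `Utterance`, `build_st_pairs`, `init_asr_from_st` and `relative_wer_reduction` are hypothetical, the `translate` callable stands in for whatever MT system produces the pseudo-labels, model parameters are represented as plain dictionaries, and the reported WER reductions are assumed here to be relative to the baseline (the abstract does not say whether they are relative or absolute).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Utterance:
    """One high-resource ASR example (hypothetical container)."""
    audio_path: str   # recording in the high-resource language
    transcript: str   # its transcript in the same language


def build_st_pairs(asr_corpus: List[Utterance],
                   translate: Callable[[str], str]) -> List[Dict[str, str]]:
    """Translate each transcript into the target language.

    The result pairs high-resource audio with target-language text,
    which is the training data for the auxiliary ST model.
    """
    return [{"audio": u.audio_path, "text": translate(u.transcript)}
            for u in asr_corpus]


def init_asr_from_st(st_params: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Copy every ST parameter (encoder and decoder) into the target ASR model.

    A full copy is possible because both tasks are stated to share the
    same architecture and vocabulary; fine-tuning on target ASR data
    would follow and is not shown.
    """
    return {name: list(values) for name, values in st_params.items()}


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Percentage WER reduction relative to the baseline (assumed definition)."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Toy stand-in for an MT system; a real pipeline would call a trained model.
    toy_mt = {"hello world": "bonjour le monde"}
    corpus = [Utterance("clip_0001.wav", "hello world")]
    pairs = build_st_pairs(corpus, lambda s: toy_mt.get(s, s))
    print(pairs)

    st_params = {"encoder.w": [0.1, 0.2], "decoder.w": [0.3]}
    asr_params = init_asr_from_st(st_params)
    print(asr_params)

    print(relative_wer_reduction(50.0, 40.0))
```

The point of the sketch is the ordering: pseudo-labels come first, the ST model is trained on them, and only then is the target ASR model initialized from the complete ST model rather than from the encoder alone.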

Comments:	Accepted to INTERSPEECH 2020
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:2006.05474 [eess.AS]
	(or arXiv:2006.05474v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2006.05474

Submission history

From: Changhan Wang
[v1] Tue, 9 Jun 2020 19:34:11 UTC (116 KB)
[v2] Fri, 9 Oct 2020 04:07:38 UTC (263 KB)
