Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model

Vyas, Apoorv; Madikeri, Srikanth; Bourlard, Hervé

arXiv:2104.02558 (cs)

[Submitted on 6 Apr 2021]

Abstract: In this work, we investigate whether wav2vec 2.0 self-supervised pretraining helps mitigate the overfitting issues of connectionist temporal classification (CTC) training and thereby reduces its performance gap with flat-start lattice-free MMI (E2E-LFMMI) for automatic speech recognition with limited training data. To that end, we use the pretrained wav2vec 2.0 BASE model and fine-tune it on three different datasets, including out-of-domain (Switchboard) and cross-lingual (Babel) scenarios. Our results show that for supervised adaptation of the wav2vec 2.0 model, E2E-LFMMI and CTC achieve similar results, and both significantly outperform the baselines trained only with supervised data. Fine-tuning the wav2vec 2.0 model with E2E-LFMMI and CTC, we obtain the following relative word error rate (WER) improvements over the supervised baseline trained with E2E-LFMMI. We get relative improvements of 40% and 44% on the clean-set and 64% and 58% on the test set of Librispeech (100h) respectively. On Switchboard (300h) we obtain relative improvements of 33% and 35% respectively. Finally, for Babel languages, we obtain relative improvements of 26% and 23% on Swahili (38h) and 18% and 17% on Tagalog (84h) respectively.
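The figures in the abstract are relative WER improvements over a baseline. As a minimal sketch of how such a figure is conventionally computed, assuming the usual definition (baseline WER minus new WER, divided by baseline WER), the helper below is illustrative only. The function name and the WER values are hypothetical and are not taken from the paper.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, in percent, of new_wer against baseline_wer.

    Assumes the conventional definition: 100 * (baseline - new) / baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs for illustration only (not results from the paper):
# a baseline at 20.0% WER and a fine-tuned model at 12.0% WER.
improvement = relative_wer_improvement(baseline_wer=20.0, new_wer=12.0)
print(f"{improvement:.1f}% relative improvement")
```

A relative improvement does not by itself reveal the absolute WERs involved: the same percentage can correspond to very different absolute reductions depending on the baseline.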

Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2104.02558 [cs.SD]
	(or arXiv:2104.02558v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2104.02558

Submission history

From: Apoorv Vyas
[v1] Tue, 6 Apr 2021 14:56:04 UTC (150 KB)
