Deliberation Model Based Two-Pass End-to-End Speech Recognition

Hu, Ke; Sainath, Tara N.; Pang, Ruoming; Prabhavalkar, Rohit

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2003.07962 (eess)

[Submitted on 17 Mar 2020]

Abstract: End-to-end (E2E) models have made rapid progress in automatic speech recognition (ASR) and now perform competitively with conventional models. To further improve quality, a two-pass model has been proposed that rescores streamed hypotheses with the non-streaming Listen, Attend and Spell (LAS) model while maintaining reasonable latency. That model attends to acoustics to rescore hypotheses, in contrast to a class of neural correction models that use only the first-pass text hypotheses. In this work, we propose to attend to both acoustics and first-pass hypotheses using a deliberation network. A bidirectional encoder extracts context information from the first-pass hypotheses. The proposed deliberation model achieves a 12% relative word error rate (WER) reduction compared to LAS rescoring on Google Voice Search (VS) tasks, and a 23% reduction on a proper noun test set. Compared to a large conventional model, our best model performs 21% better in relative terms on VS. In terms of computational complexity, the deliberation decoder is larger than the LAS decoder and hence requires more computation in second-pass decoding.
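The core idea in the abstract, a second-pass decoder that attends to both the acoustic encodings and a bidirectionally encoded first-pass hypothesis, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain dot-product attention in which keys double as values, uses a toy running-mean encoder as a stand-in for a real bidirectional encoder such as a BLSTM, and assumes the two attention context vectors are concatenated before being passed to the decoder. All function names and dimensions are hypothetical. A helper for the "relative WER reduction" metric quoted above is included, since relative and absolute reductions are easy to confuse.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Dot-product attention; the keys also serve as values (an assumption)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]


def bidirectional_encode(embeddings: List[Vector]) -> List[Vector]:
    """Toy stand-in for a bidirectional encoder (hypothetical, not a BLSTM).

    Position t gets [mean of embeddings[0..t], mean of embeddings[t..T-1]],
    so every output sees both left and right context of the first-pass text.
    """
    dim = len(embeddings[0])
    outputs = []
    for t in range(len(embeddings)):
        left, right = embeddings[: t + 1], embeddings[t:]
        fwd = [sum(e[i] for e in left) / len(left) for i in range(dim)]
        bwd = [sum(e[i] for e in right) / len(right) for i in range(dim)]
        outputs.append(fwd + bwd)
    return outputs


def deliberation_context(query: Vector, acoustic_enc: List[Vector],
                         hyp_embeddings: List[Vector]) -> Vector:
    """Attend to acoustics and to the encoded first-pass hypothesis, then concatenate."""
    hyp_enc = bidirectional_encode(hyp_embeddings)
    acoustic_ctx = attend(query, acoustic_enc)
    hyp_ctx = attend(query, hyp_enc)
    return acoustic_ctx + hyp_ctx


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction: (baseline - new) / baseline, e.g. 0.12 for '12% relative'."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Toy inputs: 3 acoustic frames of dim 4, a 2-token hypothesis with embedding dim 2
    # (encoded to dim 4 so it matches the decoder query dimension).
    acoustic_enc = [[0.1, 0.2, 0.0, 0.3], [0.5, -0.1, 0.2, 0.0], [0.0, 0.4, 0.1, 0.1]]
    hyp_embeddings = [[1.0, 0.0], [0.0, 1.0]]
    query = [0.2, 0.1, -0.3, 0.5]
    ctx = deliberation_context(query, acoustic_enc, hyp_embeddings)
    print(len(ctx))  # 8
```

The WER values one would pass to `relative_wer_reduction` are placeholders for illustration only and are not results from the paper. Concatenating the two contexts keeps the acoustic and textual evidence separate, so the decoder can weigh them independently; other fusion choices are possible.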

Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:2003.07962 [eess.AS]
	(or arXiv:2003.07962v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2003.07962

Submission history

From: Ke Hu
[v1] Tue, 17 Mar 2020 22:01:12 UTC (337 KB)