On Modular Training of Neural Acoustics-to-Word Model for LVCSR

Chen, Zhehuai; Liu, Qi; Li, Hao; Yu, Kai

arXiv:1803.01090 (cs)

[Submitted on 3 Mar 2018]

Abstract: End-to-end (E2E) automatic speech recognition (ASR) systems directly map acoustics to words using a unified model. Previous work mostly focuses on E2E training of a single model that integrates the acoustic and language models into a whole. Although E2E training benefits from sequence modeling and a simplified decoding pipeline, it usually requires a large amount of transcribed acoustic data, and traditional acoustic and language modelling techniques cannot be utilized. In this paper, a novel modular training framework for E2E ASR is proposed that trains the neural acoustic and language models separately during the training stage, while still performing end-to-end inference in the decoding stage. Here, an acoustics-to-phoneme (A2P) model and a phoneme-to-word (P2W) model are trained using acoustic data and text data respectively. A phone synchronous decoding (PSD) module is inserted between A2P and P2W to reduce sequence lengths without precision loss. Finally, the modules are integrated into an acoustics-to-word (A2W) model and jointly optimized using acoustic data to retain the advantage of sequence modeling. Experiments on a 300-hour Switchboard task show significant improvement over the direct A2W model. The efficiency of both training and decoding also benefits from the proposed method.
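
The role of the PSD module between A2P and P2W can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that the A2P model emits per-frame posteriors with a CTC-style blank symbol and that PSD shortens the sequence by discarding frames dominated by the blank. The names `BLANK`, `psd_compress`, and `blank_threshold`, and the threshold value, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the blank symbol in each posterior vector


def psd_compress(
    posteriors: Sequence[Sequence[float]],
    blank_threshold: float = 0.5,  # hypothetical value, not from the paper
) -> List[List[float]]:
    """Keep only frames whose blank posterior is below the threshold.

    The surviving frames form the shorter sequence that a P2W model
    would consume in this sketch.
    """
    return [list(frame) for frame in posteriors if frame[BLANK] < blank_threshold]


# Toy A2P output: 5 frames over 3 symbols (blank, phoneme 1, phoneme 2).
frames = [
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.95, 0.03, 0.02],
    [0.10, 0.10, 0.80],
    [0.98, 0.01, 0.01],
]

kept = psd_compress(frames)
print(len(frames), "frames in,", len(kept), "frames out")
```

In this toy input only the two frames with a confident phoneme survive, so the downstream model sees a sequence closer to phoneme rate than to frame rate.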

Comments:	accepted by ICASSP2018
Subjects:	Computation and Language (cs.CL)
MSC classes:	68T10
ACM classes:	I.2.7
Cite as:	arXiv:1803.01090 [cs.CL]
	(or arXiv:1803.01090v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1803.01090

Submission history

From: Zhehuai Chen
[v1] Sat, 3 Mar 2018 02:08:46 UTC (213 KB)
