Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition

Parcollet, Titouan; Zhang, Ying; Morchid, Mohamed; Trabelsi, Chiheb; Linarès, Georges; De Mori, Renato; Bengio, Yoshua

arXiv:1806.07789 (cs)

[Submitted on 20 Jun 2018]

Abstract: Recently, the connectionist temporal classification (CTC) model coupled with recurrent (RNN) or convolutional neural networks (CNN) has made it easier to train speech recognition systems in an end-to-end fashion. However, in real-valued models, time frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, together with their first and second order derivatives, are processed as individual elements, while a natural alternative is to process such components as composed entities. We propose to group such elements in the form of quaternions and to process these quaternions using the established quaternion algebra. Quaternion numbers and quaternion neural networks have proven efficient at processing multidimensional inputs as entities, at encoding internal dependencies, and at solving many tasks with fewer learning parameters than real-valued models. This paper proposes to integrate multiple feature views into a quaternion-valued convolutional neural network (QCNN), to be used for sequence-to-sequence mapping with the CTC model. Promising results are reported using simple QCNNs in phoneme recognition experiments with the TIMIT corpus. More precisely, QCNNs obtain a lower phoneme error rate (PER) with fewer learning parameters than a competing model based on real-valued CNNs.
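The core idea above, packing a filter-bank energy and its first and second time derivatives into one quaternion and mixing such quaternions through the Hamilton product, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: placing the three features in the imaginary parts with a zero real part, using a simple central difference for the derivatives, treating each mel band as one quaternion channel, and the helper names `hamilton`, `to_quaternions`, `quaternion_conv1d`, and `param_counts` are all hypothetical choices made for illustration. The layer also omits bias, padding, and activation.

```python
"""Sketch of quaternion feature grouping and a quaternion 1-D convolution.

Assumptions (hypothetical, not taken from the paper): each mel band becomes one
quaternion channel (0, e, delta, delta-delta) with a zero real part, deltas use a
simple central difference, and the layer has no bias, padding, or activation.
"""
from typing import List, Tuple

Quaternion = Tuple[float, float, float, float]  # (r, x, y, z) = r + x i + y j + z k


def hamilton(q: Quaternion, p: Quaternion) -> Quaternion:
    """Hamilton product q ⊗ p (non-commutative: i ⊗ j = k but j ⊗ i = -k)."""
    r1, x1, y1, z1 = q
    r2, x2, y2, z2 = p
    return (
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,
    )


def delta(frames: List[List[float]]) -> List[List[float]]:
    """Simplified time derivative: central difference, edges replicated."""
    T = len(frames)
    return [
        [(frames[min(t + 1, T - 1)][f] - frames[max(t - 1, 0)][f]) / 2.0
         for f in range(len(frames[t]))]
        for t in range(T)
    ]


def to_quaternions(energies: List[List[float]]) -> List[List[Quaternion]]:
    """Group each band's energy with its 1st/2nd derivatives: [T][F] -> [T][F] quaternions."""
    d1 = delta(energies)
    d2 = delta(d1)
    return [
        [(0.0, e, a, b) for e, a, b in zip(energies[t], d1[t], d2[t])]
        for t in range(len(energies))
    ]


def quaternion_conv1d(
    x: List[List[Quaternion]], w: List[List[List[Quaternion]]]
) -> List[List[Quaternion]]:
    """Valid 1-D quaternion convolution over time (cross-correlation form).

    x: [T][C_in] quaternions; w: [C_out][C_in][K] quaternions.
    Returns y: [T - K + 1][C_out] with y[t][o] = sum over c, k of w[o][c][k] ⊗ x[t + k][c].
    """
    K = len(w[0][0])
    out = []
    for t in range(len(x) - K + 1):
        row = []
        for w_o in w:
            acc = [0.0, 0.0, 0.0, 0.0]
            for c, kernel in enumerate(w_o):
                for k, wq in enumerate(kernel):
                    prod = hamilton(wq, x[t + k][c])
                    for i in range(4):
                        acc[i] += prod[i]
            row.append((acc[0], acc[1], acc[2], acc[3]))
        out.append(row)
    return out


def param_counts(c_in: int, c_out: int, k: int) -> Tuple[int, int]:
    """Weights of (quaternion layer, real layer mapping 4*c_in -> 4*c_out channels), biases ignored."""
    return 4 * c_in * c_out * k, (4 * c_in) * (4 * c_out) * k
```

In this sketch each kernel quaternion holds four real weights that the Hamilton product reuses across all four output components, so the quaternion layer needs a quarter of the weights of a real-valued layer with the same number of real inputs and outputs. For example, with the illustrative (not paper-specified) sizes `param_counts(40, 32, 3)`, the function returns `(15360, 61440)`.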

Comments:	Accepted at INTERSPEECH 2018
Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
Cite as:	arXiv:1806.07789 [cs.SD]
	(or arXiv:1806.07789v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.1806.07789

Submission history

From: Titouan Parcollet
[v1] Wed, 20 Jun 2018 15:16:43 UTC (269 KB)