Data Augmentation with Locally-time Reversed Speech for Automatic Speech Recognition

Ng, Si-Ioi; Lee, Tan


arXiv:2110.04511 (eess)

[Submitted on 9 Oct 2021]


Abstract: Psychoacoustic studies have shown that locally-time reversed (LTR) speech, i.e., speech in which the signal samples within each short segment are time-reversed, can be accurately recognised by human listeners. This study addresses the question of how well a state-of-the-art automatic speech recognition (ASR) system performs on LTR speech. The underlying objective is to explore the feasibility of using LTR speech in the training of end-to-end (E2E) ASR models, as a form of data augmentation for improving recognition performance. The investigation starts with experiments to understand the effect of LTR speech on general-purpose ASR. LTR speech with reversed segment durations of 5 ms to 50 ms is rendered and evaluated. For ASR training data augmentation with LTR speech, training sets are created by combining natural speech with different partitions of LTR speech. The efficacy of data augmentation is confirmed by ASR results on speech corpora in various languages and speaking styles. ASR on LTR speech with reversed segment durations of 15 ms to 30 ms is found to have a lower error rate than with other segment durations. Data augmentation with such LTR speech achieves satisfactory and consistent improvement in ASR performance.
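The abstract defines LTR speech only as "signal samples time-reversed within a short segment". A minimal sketch of that operation follows. It is an illustration, not the authors' implementation. The function name `locally_time_reverse`, its parameters, the use of consecutive non-overlapping segments, and the handling of a shorter final segment are all assumptions made here.

```python
import numpy as np


def locally_time_reverse(signal, sample_rate, segment_ms):
    """Reverse the samples inside each consecutive short segment.

    Assumptions (not taken from the paper): segments are non-overlapping,
    start at sample 0, and a shorter trailing segment is reversed as well.
    """
    signal = np.asarray(signal)
    # Convert the segment duration from milliseconds to a sample count.
    seg_len = int(round(sample_rate * segment_ms / 1000.0))
    if seg_len < 1:
        raise ValueError("segment_ms is too short for this sample rate")

    out = np.empty_like(signal)
    for start in range(0, len(signal), seg_len):
        end = min(start + seg_len, len(signal))
        # A negative-step slice flips the order of samples in this segment.
        out[start:end] = signal[start:end][::-1]
    return out
```

As a usage example, assuming audio sampled at 16 kHz, `locally_time_reverse(wave, 16000, 20)` would reverse the waveform in blocks of 320 samples. Under this sketch's assumptions the output has the same length as the input, and applying the function twice with the same settings returns the original signal.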

Comments:	Submitted to ICASSP 2022
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2110.04511 [eess.AS]
	(or arXiv:2110.04511v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2110.04511

Submission history

From: Si-Ioi Ng
[v1] Sat, 9 Oct 2021 09:00:39 UTC (2,535 KB)
