Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation

Lakshminarayana, Kishor Kayyar; Dittmar, Christian; Pia, Nicola; Habets, Emanuël

arXiv:2306.10152 (eess)

[Submitted on 16 Jun 2023]

Abstract: Many neural text-to-speech architectures can synthesize nearly natural speech from text inputs. These architectures must be trained with tens of hours of annotated and high-quality speech data. Compiling such large databases for every new voice requires a lot of time and effort. In this paper, we describe a method to extend the popular Tacotron-2 architecture and its training with data augmentation to enable single-speaker synthesis using a limited amount of specific training data. In contrast to elaborate augmentation methods proposed in the literature, we use simple stationary noises for data augmentation. Our extension is easy to implement and adds almost no computational overhead during training and inference. Using only two hours of training data, our approach was rated by human listeners to be on par with the baseline Tacotron-2 trained with 23.5 hours of LJSpeech data. In addition, we tested our model with a semantically unpredictable sentences test, which showed that both models exhibit similar intelligibility levels.
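
As a rough illustration of augmentation with stationary noise, the sketch below mixes white Gaussian noise into a waveform at a chosen signal-to-noise ratio. The abstract does not specify the noise types, the SNR values, or how the noise is mixed in, so the function name `add_stationary_noise`, the choice of white noise, and the 10 dB value are assumptions made for illustration only, not the authors' method.

```python
import numpy as np


def add_stationary_noise(speech, snr_db, rng=None):
    """Mix white Gaussian noise into `speech` at the requested SNR (in dB).

    Hypothetical helper: white noise is only one example of a stationary
    noise, and the SNR-based mixing rule is an assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    speech = np.asarray(speech, dtype=np.float64)
    noise = rng.standard_normal(speech.shape)

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)

    # Scale the noise so that 10 * log10(speech_power / scaled_noise_power)
    # equals snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise


# Usage: augment one second of a 220 Hz tone sampled at 22.05 kHz.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
clean = 0.5 * np.sin(2.0 * np.pi * 220.0 * t)
noisy = add_stationary_noise(clean, snr_db=10.0, rng=np.random.default_rng(0))
```

Because the noise is drawn independently of the signal, a helper like this could be applied on the fly to each training utterance, which would keep the extra cost small.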

Comments:	Accepted for publication at EUSIPCO-2023, Helsinki
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2306.10152 [eess.AS]
	(or arXiv:2306.10152v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2306.10152

Submission history

From: Kishor Kayyar Lakshminarayana
[v1] Fri, 16 Jun 2023 19:42:40 UTC (702 KB)