A Cyclical Post-filtering Approach to Mismatch Refinement of Neural Vocoder for Text-to-speech Systems

Wu, Yi-Chiao; Tobing, Patrick Lumban; Yasuhara, Kazuki; Matsunaga, Noriyuki; Ohtani, Yamato; Toda, Tomoki

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2005.08659 (eess)

[Submitted on 18 May 2020 (v1), last revised 7 Aug 2020 (this version, v2)]

Abstract: Recently, text-to-speech (TTS) systems combined with neural vocoders have been shown to generate high-fidelity speech. However, collecting the required training data and building these advanced systems from scratch is time- and resource-consuming. An economical alternative is to develop a neural vocoder that enhances the speech generated by existing or low-cost TTS systems. Nonetheless, this approach usually suffers from two issues: 1) temporal mismatches between TTS and natural waveforms and 2) acoustic mismatches between training and testing data. To address these issues, we adopt a cyclic voice conversion (VC) model to generate temporally matched pseudo-VC data for training and acoustically matched enhanced data for testing the neural vocoders. Because of its generality, this framework can be applied to arbitrary TTS systems and neural vocoders. In this paper, we apply the proposed method with a state-of-the-art WaveNet vocoder to two different basic TTS systems, and both objective and subjective experimental results confirm the effectiveness of the proposed framework.
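The abstract only outlines the data flow, so the following is a hedged reading of it, not the authors' implementation. It assumes a cyclic VC model with two frame-wise directions (natural-to-TTS and TTS-to-natural). `ToyCyclicVC`, `make_training_pair`, and `make_test_input` are hypothetical names, and the affine-plus-smoothing mapping is a toy placeholder for a trained model. Pseudo-VC training features come from passing natural features around the cycle, so they keep the natural frame count and stay aligned with the natural waveform. Test features come from converting TTS features with the same TTS-to-natural direction, so training and testing inputs share that direction's acoustic characteristics.

```python
from typing import List, Tuple

Frame = List[float]     # one acoustic feature vector
Features = List[Frame]  # frame sequence for one utterance


class ToyCyclicVC:
    """Hypothetical stand-in for a trained cyclic VC model.

    Both directions work frame by frame, so they never change the number
    of frames. The neighbour averaging in tts_to_natural is a toy stand-in
    for the characteristic (imperfect) output of a real VC model.
    """

    def __init__(self, scale: float = 0.8, shift: float = 0.1) -> None:
        self.scale = scale
        self.shift = shift

    def natural_to_tts(self, feats: Features) -> Features:
        # Toy affine map toward the "TTS-like" domain.
        return [[self.scale * v + self.shift for v in f] for f in feats]

    def tts_to_natural(self, feats: Features) -> Features:
        # Undo the affine map, then average each frame with its neighbours.
        inv = [[(v - self.shift) / self.scale for v in f] for f in feats]
        out: Features = []
        for t in range(len(inv)):
            lo, hi = max(t - 1, 0), min(t + 1, len(inv) - 1)
            window = inv[lo:hi + 1]
            out.append([sum(col) / len(window) for col in zip(*window)])
        return out


def make_training_pair(
    vc: ToyCyclicVC, natural_feats: Features, natural_wave: List[float]
) -> Tuple[Features, List[float]]:
    """Pseudo-VC features: natural -> TTS-like -> natural (the cycle)."""
    pseudo_vc = vc.tts_to_natural(vc.natural_to_tts(natural_feats))
    # Same frame count as natural_feats, so still time-aligned with natural_wave.
    return pseudo_vc, natural_wave


def make_test_input(vc: ToyCyclicVC, tts_feats: Features) -> Features:
    """Enhanced features: TTS -> natural, via the same conversion direction."""
    return vc.tts_to_natural(tts_feats)


if __name__ == "__main__":
    vc = ToyCyclicVC()
    natural = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    wave = [0.0] * 3 * 80  # hypothetical: 80 samples per frame
    pseudo, target = make_training_pair(vc, natural, wave)
    print(len(pseudo) == len(natural))  # True: still frame-aligned with `wave`
```

In practice the two directions would be a trained VC model; the toy mapping only preserves the pairing logic described in the abstract: the vocoder is trained on (pseudo-VC features, natural waveform) pairs and is fed enhanced features at test time.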

Comments:	5 pages, 8 figures, 1 table. Proc. Interspeech, 2020
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2005.08659 [eess.AS]
	(or arXiv:2005.08659v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2005.08659

Submission history

From: Yi-Chiao Wu
[v1] Mon, 18 May 2020 12:48:40 UTC (772 KB)
[v2] Fri, 7 Aug 2020 02:58:25 UTC (393 KB)