Fast and High-Quality Singing Voice Synthesis System based on Convolutional Neural Networks

Nakamura, Kazuhiro; Takaki, Shinji; Hashimoto, Kei; Oura, Keiichiro; Nankaku, Yoshihiko; Tokuda, Keiichi

arXiv:1910.11690 (eess)

[Submitted on 24 Oct 2019 (v1), last revised 22 Apr 2020 (this version, v2)]

Abstract: The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of synthesized singing voices. Because singing voices are a rich form of expression, a powerful technique is required to model them accurately. In the proposed technique, long-term dependencies of singing voices are modeled by CNNs. An acoustic feature sequence is generated for each segment, which consists of long-term frames, and a natural trajectory is obtained without the parameter generation algorithm. Furthermore, a computational complexity reduction technique is proposed that drives the DNNs in different time units depending on the type of musical score feature. Experimental results show that the proposed method can synthesize natural-sounding singing voices much faster than the conventional method.
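The two ideas named in the abstract, running part of the model at a coarser time unit and generating features over a whole segment with a convolution along time, can be illustrated with a toy NumPy sketch. This is not the authors' architecture: the note-rate and frame-rate split, the layer sizes, the random weights, the moving-average kernel, and the helper names `conv1d_same` and `note_to_frame` are all assumptions chosen only to make the idea concrete.

```python
import numpy as np


def conv1d_same(x, kernel):
    """Convolve each feature channel of x (frames, dims) along time, keeping the length."""
    return np.stack(
        [np.convolve(x[:, d], kernel, mode="same") for d in range(x.shape[1])],
        axis=1,
    )


def note_to_frame(note_features, frames_per_note):
    """Repeat each note-level row for the number of frames that note spans."""
    return np.repeat(note_features, frames_per_note, axis=0)


# Hypothetical toy score: 3 notes, each with a 4-dimensional note-level feature.
rng = np.random.default_rng(0)
note_features = rng.standard_normal((3, 4))
frames_per_note = np.array([5, 3, 8])  # assumed note durations in frames

# Note-rate stage (assumed): evaluated once per note (3 rows), not per frame (16 rows).
W_note = rng.standard_normal((4, 4))
note_hidden = np.tanh(note_features @ W_note)

# Frame-rate stage (assumed): upsample to frames, then convolve over the whole segment.
frame_hidden = note_to_frame(note_hidden, frames_per_note)
kernel = np.ones(5) / 5.0  # stand-in for learned convolution weights
segment_output = conv1d_same(frame_hidden, kernel)

print(note_hidden.shape, frame_hidden.shape, segment_output.shape)
```

In this sketch the note-rate layer processes 3 rows instead of 16, which is the kind of saving a coarser time unit can offer, while the convolution sees neighbouring frames across note boundaries and so produces a trajectory that varies smoothly over the segment.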

Comments:	Accepted to ICASSP 2020. Singing voice samples (Japanese, English, Chinese): this https URL. arXiv admin note: substantial text overlap with arXiv:1904.06868
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:1910.11690 [eess.AS]
	(or arXiv:1910.11690v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.1910.11690

Submission history

From: Kazuhiro Nakamura
[v1] Thu, 24 Oct 2019 04:25:47 UTC (211 KB)
[v2] Wed, 22 Apr 2020 00:30:04 UTC (211 KB)