Literature Survey
Abstract
Research on low-resource text-to-speech (TTS) synthesis is ongoing,
particularly for languages with little available training data. In contrast to
high-resource languages such as English, most languages, particularly
indigenous or otherwise underrepresented ones, lack the data needed to
train TTS models. Drawing on recent studies, this literature review
examines state-of-the-art methods and open challenges in low-resource
TTS. The survey covers techniques such as knowledge distillation, transfer
learning, and dual transformation, along with neural architectures
including Glow-TTS, FastSpeech, and end-to-end models. We also discuss
evaluation metrics, multilingual generalization, and vocoder choices.
Finally, the survey highlights best practices for building scalable TTS
systems in low-resource settings, with an emphasis on improving speech
quality, training efficiency, and generalization to underrepresented
languages.
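Of the techniques listed above, dual transformation is the least familiar; the
sketch below illustrates the idea behind the LRSpeech-style TTS-ASR loop, in
which each model generates pseudo-labels for the other from unpaired data. The
`tts` and `asr` objects, their `synthesize`/`transcribe`/`train` methods, and
the data arguments are hypothetical placeholders for illustration, not the API
of any particular toolkit.

```python
# A minimal sketch of dual transformation (the LRSpeech-style TTS<->ASR
# loop). The tts/asr objects and their methods are hypothetical
# placeholders, not a real library API.

def dual_transformation(tts, asr, unpaired_text, unpaired_speech,
                        paired_data, rounds=3):
    """Iteratively grow pseudo-parallel data for both models."""
    for _ in range(rounds):
        # TTS synthesizes speech for unpaired text, producing pseudo
        # (text, speech) pairs that augment ASR training.
        pseudo_for_asr = [(text, tts.synthesize(text))
                          for text in unpaired_text]
        asr.train(paired_data + pseudo_for_asr)

        # ASR transcribes unpaired speech, producing pseudo
        # (text, speech) pairs that augment TTS training.
        pseudo_for_tts = [(asr.transcribe(wav), wav)
                          for wav in unpaired_speech]
        tts.train(paired_data + pseudo_for_tts)
    return tts, asr
```

Each round, the small seed of paired data is augmented with pseudo-pairs of
growing quality, which is why the approach suits extremely low-resource
settings where only a few minutes of paired speech exist.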
1. Introduction
9. References
[2] Xu, J., Tan, X., Ren, Y., Qin, T., Li, J., Zhao, S., & Liu, T.-Y. (2020).
"LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition."
In Proceedings of the 26th ACM SIGKDD Conference on Knowledge
Discovery and Data Mining (KDD 2020).
[3] Ren, Y., Hu, C., Tan, X., Qin, T., Zhao, S., Zhao, Z., & Liu, T.-Y. (2022).
"FastSpeech 2: Fast and High-Quality End-to-End Text to Speech."
arXiv preprint arXiv:2006.04558.