Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting

Park, Hyun Jin; Agarwal, Dhruuv; Chen, Neng; Sun, Rentao; Partridge, Kurt; Chen, Justin; Zhang, Harry; Zhu, Pai; Bartel, Jacob; Kastner, Kyle; Wang, Gary; Rosenberg, Andrew; Wang, Quan

arXiv:2408.10463 (cs)

[Submitted on 20 Aug 2024]

Abstract: The keyword spotting (KWS) problem requires large amounts of real speech training data to achieve high accuracy across diverse populations. Utilizing large amounts of text-to-speech (TTS) synthesized data can reduce the cost and time associated with KWS development. However, TTS data may contain artifacts not present in real speech, which the KWS model can exploit (overfit), leading to degraded accuracy on real speech. To address this issue, we propose applying an adversarial training method to prevent the KWS model from learning TTS-specific features when trained on large amounts of TTS data. Experimental results demonstrate that KWS model accuracy on real speech data can be improved by up to 12% when adversarial loss is used in addition to the original KWS loss. Surprisingly, we also observed that the adversarial setup improves accuracy by up to 8%, even when trained solely on TTS and real negative speech data, without any real positive examples.
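
The abstract does not spell out the adversarial objective. One common way to add an adversarial loss on top of a task loss is domain-adversarial training with a gradient reversal layer (Ganin and Lempitsky, 2015): a domain head learns to tell TTS audio from real audio, while the shared encoder receives the negated domain gradient, so its features are pushed to stop carrying TTS-specific cues. The pure-Python sketch that follows assumes that formulation, a linear encoder, logistic heads, and hypothetical names (`adversarial_grads`, `lam`, the parameter keys); it illustrates the idea and is not the authors' implementation.

```python
import math

# Assumption: DANN-style gradient reversal; not necessarily the paper's exact setup.


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction p and label y in {0, 1}."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def forward(params, x):
    """Shared linear encoder h = W x feeding a keyword head and a TTS-vs-real head."""
    h = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in params["W"]]
    p_kw = sigmoid(sum(a * b for a, b in zip(params["wk"], h)) + params["bk"])
    p_tts = sigmoid(sum(a * b for a, b in zip(params["wd"], h)) + params["bd"])
    return h, p_kw, p_tts


def losses(params, x, y_kw, y_tts):
    """Return (keyword loss, domain loss) for one example."""
    _, p_kw, p_tts = forward(params, x)
    return bce(p_kw, y_kw), bce(p_tts, y_tts)


def adversarial_grads(params, x, y_kw, y_tts, lam):
    """Per-example gradients with a gradient reversal in front of the domain head.

    Both heads descend their own losses; the encoder descends
    L_kws - lam * L_dom, i.e. it is trained to confuse the domain head.
    """
    h, p_kw, p_tts = forward(params, x)
    g_kw = p_kw - y_kw      # dL_kws / d(keyword logit)
    g_dom = p_tts - y_tts   # dL_dom / d(domain logit)
    # Gradient reaching h: keyword path unchanged, domain path sign-flipped and scaled.
    g_h = [g_kw * a - lam * g_dom * b for a, b in zip(params["wk"], params["wd"])]
    return {
        "W": [[g_i * x_j for x_j in x] for g_i in g_h],
        "wk": [g_kw * h_i for h_i in h],
        "bk": g_kw,
        "wd": [g_dom * h_i for h_i in h],
        "bd": g_dom,
    }


def sgd_step(params, grads, lr):
    """Plain SGD update applied to every parameter."""
    return {
        "W": [[w - lr * g for w, g in zip(rw, rg)]
              for rw, rg in zip(params["W"], grads["W"])],
        "wk": [w - lr * g for w, g in zip(params["wk"], grads["wk"])],
        "bk": params["bk"] - lr * grads["bk"],
        "wd": [w - lr * g for w, g in zip(params["wd"], grads["wd"])],
        "bd": params["bd"] - lr * grads["bd"],
    }


# Usage: one update on a synthetic (TTS) positive example.
params = {
    "W": [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]],
    "wk": [0.5, -0.3], "bk": 0.0,
    "wd": [0.2, 0.7], "bd": 0.0,
}
x, y_kw, y_tts = [1.0, 0.5, -1.0], 1, 1  # keyword present, domain label 1 = TTS
grads = adversarial_grads(params, x, y_kw, y_tts, lam=0.5)
params = sgd_step(params, grads, lr=0.1)
```

Setting `lam=0` recovers ordinary KWS training of the encoder while the domain head still learns on its own; the reversal weight only changes what the shared encoder is optimized for, which is how the adversarial loss can discourage reliance on TTS artifacts without altering the KWS head's objective.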

Comments:	to be published in a Workshop at Interspeech 2024, Synthetic Data's Transformative Role in Foundational Speech Models
Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2408.10463 [cs.SD]
	(or arXiv:2408.10463v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2408.10463

Submission history

From: Hyun-Jin Park
[v1] Tue, 20 Aug 2024 00:16:12 UTC (1,015 KB)