Heterogeneous Target Speech Separation

Tzinis, Efthymios; Wichern, Gordon; Subramanian, Aswin; Smaragdis, Paris; Le Roux, Jonathan

doi:10.21437/Interspeech.2022-46

arXiv:2204.03594 (cs)

[Submitted on 7 Apr 2022]

Abstract: We introduce a new paradigm for single-channel target source separation where the sources of interest can be distinguished using non-mutually exclusive concepts (e.g., loudness, gender, language, spatial location, etc.). Our proposed heterogeneous separation framework can seamlessly leverage datasets with large distribution shifts and learn cross-domain representations under a variety of concepts used as conditioning. Our experiments show that training separation models with heterogeneous conditions facilitates generalization to new concepts with unseen out-of-domain data while also performing substantially better than single-domain specialist models. Notably, such training leads to more robust learning of new, harder discriminative concepts for source separation and can yield improvements over permutation invariant training with oracle source selection. We analyze the intrinsic behavior of source separation training with heterogeneous metadata and propose ways to alleviate emerging problems with challenging separation conditions. We release the collection of preparation recipes for all datasets used to further promote research towards this challenging task.
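
To illustrate what "non-mutually exclusive concepts" means in practice, the following is a minimal sketch, not the authors' implementation. It assumes each source in a mixture carries simple metadata and that a conditioning query is a (concept, value) pair. The class `Source`, the function `select_target`, the field names, and the value strings ("louder", "female", "french") are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Source:
    # Hypothetical per-source metadata; not taken from the paper.
    name: str
    energy_db: float  # loudness proxy
    gender: str
    language: str


def select_target(sources: List[Source], concept: str, value: str) -> List[Source]:
    """Return the sources that match one conditioning (concept, value) pair."""
    if concept == "energy":
        # Energy is a relative concept: rank the sources within the mixture.
        ranked = sorted(sources, key=lambda s: s.energy_db)
        if value == "louder":
            return [ranked[-1]]
        if value == "quieter":
            return [ranked[0]]
        raise ValueError(f"unknown energy value: {value}")
    # Categorical concepts: keep every source whose attribute matches.
    return [s for s in sources if getattr(s, concept) == value]


# A two-speaker mixture described only by its metadata.
mixture = [
    Source("spk_a", energy_db=-20.0, gender="female", language="english"),
    Source("spk_b", energy_db=-26.0, gender="male", language="french"),
]

for concept, value in [("gender", "female"), ("language", "french"), ("energy", "louder")]:
    targets = [s.name for s in select_target(mixture, concept, value)]
    print(concept, value, targets)
```

In this toy mixture, the queries ("gender", "female") and ("energy", "louder") both pick the same speaker, which is the sense in which the concepts overlap rather than partition the sources. A conditioned separation model would receive the mixture audio plus one such query and be trained to output the matching source.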

Comments:	Submitted to Interspeech 2022
Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2204.03594 [cs.SD]
	(or arXiv:2204.03594v1 [cs.SD] for this version)
 	https://doi.org/10.48550/arXiv.2204.03594
Journal reference:	Interspeech 2022
Related DOI:	https://doi.org/10.21437/Interspeech.2022-46

Submission history

From: Efthymios Tzinis
[v1] Thu, 7 Apr 2022 17:14:20 UTC (221 KB)
