SpEx+: A Complete Time Domain Speaker Extraction Network

Ge, Meng; Xu, Chenglin; Wang, Longbiao; Chng, Eng Siong; Dang, Jianwu; Li, Haizhou

arXiv:2005.04686 (eess)

[Submitted on 10 May 2020 (v1), last revised 18 Aug 2020 (this version, v2)]

Abstract: Speaker extraction aims to extract the target speech signal from a multi-talker environment given a target speaker's reference speech. We recently proposed a time-domain solution, SpEx, that avoids the phase estimation required by frequency-domain approaches. Unfortunately, SpEx is not fully a time-domain solution, since it performs time-domain speech encoding for speaker extraction while taking a frequency-domain speaker embedding as the reference. The analysis window sizes for the time-domain and frequency-domain inputs also differ. Such a mismatch has an adverse effect on system performance. To eliminate it, we propose a complete time-domain speaker extraction solution called SpEx+. Specifically, we tie the weights of two identical speech encoder networks, one for the encoder-extractor-decoder pipeline and another as part of the speaker encoder. Experiments show that SpEx+ achieves 0.8 dB and 2.1 dB SDR improvement over the state-of-the-art SpEx baseline under different-gender and same-gender conditions, respectively, on the WSJ0-2mix-extr database.
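
The weight tying described in the abstract can be illustrated with a minimal sketch. It is a toy, not the authors' implementation: the class `SpeechEncoder`, the helper `speaker_embedding`, the single-scale convolution, the mean pooling, and all sizes are assumptions made for illustration only. The point it shows is that the mixture and the reference speech pass through one and the same set of encoder weights, so both are mapped into the same latent space with the same analysis window.

```python
import random


class SpeechEncoder:
    """Toy time-domain encoder (hypothetical): slides a window over the
    waveform and projects each frame onto a bank of filters with ReLU."""

    def __init__(self, num_filters, window, stride, seed=0):
        rng = random.Random(seed)
        self.window = window
        self.stride = stride
        # One weight vector of length `window` per filter.
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(window)]
            for _ in range(num_filters)
        ]

    def __call__(self, waveform):
        frames = []
        last_start = len(waveform) - self.window
        for start in range(0, last_start + 1, self.stride):
            segment = waveform[start:start + self.window]
            frames.append([
                max(0.0, sum(w * x for w, x in zip(filt, segment)))
                for filt in self.weights
            ])
        return frames


def speaker_embedding(frames):
    """Assumed stand-in for a speaker encoder: mean-pool frames over time."""
    count = len(frames)
    return [sum(f[k] for f in frames) / count for k in range(len(frames[0]))]


# A single encoder object serves both branches, so the weights are tied.
shared_encoder = SpeechEncoder(num_filters=4, window=8, stride=4)
mixture_encoder = shared_encoder    # encoder-extractor-decoder pipeline
reference_encoder = shared_encoder  # front end of the speaker encoder

rng = random.Random(1)
mixture = [rng.uniform(-1.0, 1.0) for _ in range(40)]    # toy mixture signal
reference = [rng.uniform(-1.0, 1.0) for _ in range(24)]  # toy reference speech

mixture_frames = mixture_encoder(mixture)
embedding = speaker_embedding(reference_encoder(reference))

print(len(mixture_frames), len(mixture_frames[0]), len(embedding))
```

Because both names refer to the same object, any update to the encoder weights during training would affect the two branches identically, which is the property that removes the mismatch between the two input representations.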

Comments:	Accepted at INTERSPEECH 2020
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2005.04686 [eess.AS]
	(or arXiv:2005.04686v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2005.04686

Submission history

From: Chenglin Xu [view email]
[v1] Sun, 10 May 2020 15:00:07 UTC (196 KB)
[v2] Tue, 18 Aug 2020 03:03:01 UTC (204 KB)
