Multi-Dimensional and Multi-Scale Modeling for Speech Separation Optimized by Discriminative Learning

Mu, Zhaoxi; Yang, Xinyu; Zhu, Wenjing


arXiv:2303.03737 (cs)

[Submitted on 7 Mar 2023]

Abstract: The Transformer has shown strong performance in speech separation, owing to its ability to capture global features. However, capturing the local features and channel information of audio sequences is equally important for speech separation. In this paper, we present a novel approach named Intra-SE-Conformer and Inter-Transformer (ISCIT) for speech separation. Specifically, we design a new network, SE-Conformer, that can model audio sequences in multiple dimensions and scales, and apply it to the dual-path speech separation framework. Furthermore, we propose Multi-Block Feature Aggregation, which improves separation by selectively utilizing information from the intermediate blocks of the separation network. Meanwhile, we propose a speaker similarity discriminative loss to optimize the speech separation model, addressing the poor performance of separation models when speakers have similar voices. Experimental results on the benchmark datasets WSJ0-2mix and WHAM! show that ISCIT achieves state-of-the-art results.
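The abstract names two mechanisms without spelling them out: the dual-path framework, in which one model processes frames inside each chunk (local context) and another processes the same position across chunks (global context), and the channel modeling implied by "SE". The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes "SE" refers to a standard squeeze-and-excitation block, uses DPRNN-style chunking with 50% overlap, averages overlapping frames when merging chunks back, and replaces the Inter-Transformer with a zero stand-in. All function names (`segment`, `overlap_add`, `squeeze_excite`, `dual_path`) are hypothetical.

```python
import numpy as np

# Hypothetical sketch of dual-path chunking plus squeeze-and-excitation (SE)
# channel gating. Not the ISCIT reference code; details are assumptions.


def segment(x, chunk):
    """Split a (channels, frames) map into 50%-overlapping chunks.

    Returns (chunks, frames): chunks has shape (channels, chunk, n_chunks);
    frames is the original length, needed later to trim the padding.
    """
    channels, frames = x.shape
    hop = chunk // 2
    # Zero-pad the end so the chunks tile the padded sequence exactly.
    pad = chunk - frames if frames <= chunk else (-(frames - chunk)) % hop
    xp = np.pad(x, ((0, 0), (0, pad)))
    n_chunks = (xp.shape[1] - chunk) // hop + 1
    chunks = np.stack([xp[:, i * hop:i * hop + chunk] for i in range(n_chunks)], axis=-1)
    return chunks, frames


def overlap_add(chunks, frames):
    """Merge (channels, chunk, n_chunks) back into (channels, frames).

    Overlapping positions are averaged (an assumption of this sketch), so
    overlap_add(*segment(x, k)) reproduces x exactly.
    """
    channels, chunk, n_chunks = chunks.shape
    hop = chunk // 2
    total = (n_chunks - 1) * hop + chunk
    out = np.zeros((channels, total))
    count = np.zeros(total)
    for i in range(n_chunks):
        out[:, i * hop:i * hop + chunk] += chunks[:, :, i]
        count[i * hop:i * hop + chunk] += 1
    return (out / count)[:, :frames]


def squeeze_excite(x, w1, w2):
    """Standard SE gating of a (channels, length) map (assumed meaning of "SE").

    w1: (channels // r, channels), w2: (channels, channels // r).
    Returns the reweighted map and the per-channel gates in (0, 1).
    """
    squeezed = x.mean(axis=1)                     # squeeze: average over time
    hidden = np.maximum(w1 @ squeezed, 0.0)       # bottleneck + ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid gate per channel
    return x * gates[:, None], gates


def dual_path(chunks, intra_fn, inter_fn):
    """One dual-path block with residual connections.

    intra_fn sees each chunk (local context); inter_fn sees one in-chunk
    position across all chunks (global context). Both map a
    (channels, length) array to the same shape.
    """
    out = chunks.copy()
    for i in range(out.shape[2]):  # intra-chunk pass
        out[:, :, i] = out[:, :, i] + intra_fn(out[:, :, i])
    for j in range(out.shape[1]):  # inter-chunk pass
        out[:, j, :] = out[:, j, :] + inter_fn(out[:, j, :])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 101))   # 8 feature channels, 101 frames
    chunks, frames = segment(x, chunk=20)
    w1 = rng.standard_normal((2, 8))    # SE reduction ratio r = 4
    w2 = rng.standard_normal((8, 2))
    intra = lambda c: squeeze_excite(c, w1, w2)[0]  # stand-in for SE-Conformer
    inter = lambda s: np.zeros_like(s)              # stand-in for Transformer
    y = overlap_add(dual_path(chunks, intra, inter), frames)
    print(chunks.shape, y.shape)        # (8, 20, 10) (8, 101)
```

Averaging in `overlap_add` makes segmentation lossless on its own, which is convenient for checking shapes; published dual-path models may merge chunks differently, and a real intra-chunk model would combine SE gating with convolution and self-attention rather than use it alone.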

Comments:	Accepted by ICASSP 2023
Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2303.03737 [cs.SD]
	(or arXiv:2303.03737v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2303.03737

Submission history

From: Zhaoxi Mu
[v1] Tue, 7 Mar 2023 08:53:20 UTC (242 KB)
