Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss

Zhao, Guanlong; Wang, Quan; Lu, Han; Huang, Yiling; Moreno, Ignacio Lopez


arXiv:2211.06482 (eess)

[Submitted on 11 Nov 2022 (v1), last revised 4 Dec 2022 (this version, v2)]

Abstract: In this work, we propose a novel token-based training strategy that improves Transformer-Transducer (T-T) based speaker change detection (SCD) performance. The conventional T-T based SCD model loss optimizes all output tokens equally. Because speaker changes are sparse in the training data, this loss leads to sub-optimal detection accuracy. To mitigate this issue, we use a customized edit-distance algorithm to estimate the token-level SCD false accept (FA) and false reject (FR) rates during training, and we optimize the model parameters to minimize a weighted combination of the FA and FR rates, focusing the model on accurately predicting speaker changes. We also propose a set of evaluation metrics that align better with commercial use cases. Experiments on a group of challenging real-world datasets show that the proposed training method can significantly improve the overall performance of the SCD model with the same number of parameters.
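The abstract does not spell out the customized edit-distance algorithm, so the following is only a minimal sketch of how token-level FA and FR counts could be estimated from an alignment. Everything here is an assumption made for illustration: the `<st>` speaker-turn marker, the rule that a speaker-change token may never be substituted for a word, the normalization of the two rates, and the names `align_scd`, `weighted_scd_error`, `alpha`, and `beta`. It covers only the per-hypothesis error, not how the paper turns that quantity into a training loss.

```python
from typing import List, Tuple

SCD_TOKEN = "<st>"  # hypothetical speaker-change marker; not taken from the paper
INF = float("inf")


def _sub_cost(r: str, h: str) -> float:
    """Substitution cost; assumption: an SCD token never substitutes a word."""
    if r == h:
        return 0.0
    if SCD_TOKEN in (r, h):
        return INF
    return 1.0


def align_scd(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Align ref and hyp by edit distance; return (hits, false_accepts, false_rejects)."""
    n, m = len(ref), len(hyp)
    # cost[i][j] is the minimum edit cost of aligning ref[:i] with hyp[:j].
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = float(i)
    for j in range(1, m + 1):
        cost[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + _sub_cost(ref[i - 1], hyp[j - 1]),
                cost[i - 1][j] + 1.0,  # deletion: reference token missing in hyp
                cost[i][j - 1] + 1.0,  # insertion: extra token in hyp
            )
    # Backtrace one optimal path, counting what happens to SCD tokens.
    hits = fa = fr = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = _sub_cost(ref[i - 1], hyp[j - 1])
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                if ref[i - 1] == SCD_TOKEN and hyp[j - 1] == SCD_TOKEN:
                    hits += 1
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1.0:
            if ref[i - 1] == SCD_TOKEN:
                fr += 1  # reference speaker change was not predicted
            i -= 1
        else:
            if hyp[j - 1] == SCD_TOKEN:
                fa += 1  # predicted speaker change has no reference match
            j -= 1
    return hits, fa, fr


def weighted_scd_error(ref: List[str], hyp: List[str],
                       alpha: float = 1.0, beta: float = 1.0) -> float:
    """Weighted FA/FR combination; both normalizations are assumptions."""
    _, fa, fr = align_scd(ref, hyp)
    fa_rate = fa / max(len(ref), 1)               # per reference token
    fr_rate = fr / max(ref.count(SCD_TOKEN), 1)   # per reference speaker change
    return alpha * fa_rate + beta * fr_rate


ref = ["how", "are", "you", "<st>", "good", "thanks"]
hyp = ["how", "are", "<st>", "you", "<st>", "good", "thanks"]
print(align_scd(ref, hyp))           # (hits, false accepts, false rejects)
print(weighted_scd_error(ref, hyp))  # weighted error with alpha = beta = 1
```

Raising `beta` relative to `alpha` would penalize missed speaker changes more heavily than spurious ones. When several alignments share the same minimum cost, the counts depend on the backtrace tie-breaking order, which in this sketch prefers matches, then deletions, then insertions.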

Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2211.06482 [eess.AS]
	(or arXiv:2211.06482v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2211.06482

Submission history

From: Quan Wang
[v1] Fri, 11 Nov 2022 21:09:58 UTC (453 KB)
[v2] Sun, 4 Dec 2022 01:02:34 UTC (563 KB)
