Temporarily-Aware Context Modelling using Generative Adversarial Networks for Speech Activity Detection

Fernando, Tharindu; Sridharan, Sridha; McLaren, Mitchell; Priyasad, Darshana; Denman, Simon; Fookes, Clinton

doi:10.1109/TASLP.2020.2982297

arXiv:2004.01546 (eess)

[Submitted on 2 Apr 2020]

Abstract: This paper presents a novel framework for Speech Activity Detection (SAD). Inspired by the recent success of multi-task learning approaches in the speech processing domain, we propose a novel joint learning framework for SAD. We utilise generative adversarial networks to automatically learn a loss function for joint prediction of the frame-wise speech/non-speech classifications together with the next audio segment. In order to exploit the temporal relationships within the input signal, we propose a temporal discriminator which aims to ensure that the predicted signal is temporally consistent. We evaluate the proposed framework on multiple public benchmarks, including NIST OpenSAT'17, AMI Meeting and HAVIC, where we demonstrate its capability to outperform state-of-the-art SAD approaches. Furthermore, our cross-database evaluations demonstrate the robustness of the proposed approach across different languages, accents, and acoustic environments.
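The abstract describes the temporal discriminator only at a high level. As a rough illustration of the general idea, and not the authors' architecture or objective, the pure-Python sketch below assumes a standard (non-saturating) GAN loss in which the discriminator scores windows of consecutive frames instead of isolated frames, so a prediction that flips erratically between speech and non-speech can be penalised. The frame layout `(speech_probability, next_segment_feature)`, the window width, and `toy_smoothness_discriminator` are hypothetical stand-ins; in practice the discriminator would be a learned network.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical frame layout: (speech_probability, next_segment_feature).
Frame = Tuple[float, float]
# A discriminator maps a window of consecutive frames to P(window is real).
Discriminator = Callable[[Sequence[Frame]], float]

EPS = 1e-7  # keeps log() finite when a score hits exactly 0 or 1


def temporal_windows(frames: Sequence[Frame], width: int) -> List[Sequence[Frame]]:
    """Split a sequence into overlapping windows of `width` consecutive frames."""
    if not 1 <= width <= len(frames):
        raise ValueError("width must be between 1 and len(frames)")
    return [frames[i:i + width] for i in range(len(frames) - width + 1)]


def discriminator_loss(d: Discriminator, real: Sequence[Frame],
                       fake: Sequence[Frame], width: int) -> float:
    """Standard GAN discriminator loss averaged over temporal windows:
    mean(-log D(real window)) + mean(-log(1 - D(fake window)))."""
    real_w = temporal_windows(real, width)
    fake_w = temporal_windows(fake, width)
    loss_real = -sum(math.log(max(d(w), EPS)) for w in real_w) / len(real_w)
    loss_fake = -sum(math.log(max(1.0 - d(w), EPS)) for w in fake_w) / len(fake_w)
    return loss_real + loss_fake


def generator_loss(d: Discriminator, fake: Sequence[Frame], width: int) -> float:
    """Non-saturating generator loss: mean(-log D(fake window))."""
    fake_w = temporal_windows(fake, width)
    return -sum(math.log(max(d(w), EPS)) for w in fake_w) / len(fake_w)


def toy_smoothness_discriminator(window: Sequence[Frame]) -> float:
    """Hypothetical stand-in for a learned network: a window looks more 'real'
    the less the speech probability jumps between adjacent frames.
    It ignores the next-segment feature for simplicity."""
    jumps = sum(abs(b[0] - a[0]) for a, b in zip(window, window[1:]))
    return 1.0 / (1.0 + jumps)


if __name__ == "__main__":
    smooth = [(0.9, 0.2), (0.9, 0.2), (0.9, 0.1), (0.1, 0.0)]
    jittery = [(0.9, 0.2), (0.1, 0.2), (0.9, 0.1), (0.1, 0.0)]
    d = toy_smoothness_discriminator
    # The jittery prediction receives the larger generator loss.
    print(generator_loss(d, smooth, width=3), generator_loss(d, jittery, width=3))
```

Scoring windows rather than single frames is what lets the adversarial signal reflect temporal consistency: a per-frame discriminator could not distinguish the two example sequences by their transitions, whereas the windowed one assigns the erratic sequence a higher generator loss.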

Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
Cite as:	arXiv:2004.01546 [eess.AS]
	(or arXiv:2004.01546v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2004.01546
Journal reference:	IEEE/ACM Transactions on Audio, Speech and Language Processing, 2020
Related DOI:	https://doi.org/10.1109/TASLP.2020.2982297

Submission history

From: Tharindu Fernando
[v1] Thu, 2 Apr 2020 02:33:13 UTC (6,793 KB)