Fusion of Self-supervised Learned Models for MOS Prediction

Yang, Zhengdong; Zhou, Wangjin; Chu, Chenhui; Li, Sheng; Dabre, Raj; Rubino, Raphael; Zhao, Yi

Computer Science > Sound

arXiv:2204.04855 (cs)

[Submitted on 11 Apr 2022]

Title:Fusion of Self-supervised Learned Models for MOS Prediction

Authors:Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Sheng Li, Raj Dabre, Raphael Rubino, Yi Zhao

View PDF

Abstract:We participated in the mean opinion score (MOS) prediction challenge, 2022. This challenge aims to predict MOS scores of synthetic speech on two tracks, the main track and a more challenging sub-track: out-of-domain (OOD). To improve the accuracy of the predicted scores, we have explored several model fusion-related strategies and proposed a fused framework in which seven pretrained self-supervised learned (SSL) models have been engaged. These pretrained SSL models are derived from three ASR frameworks, including Wav2Vec, Hubert, and WavLM. For the OOD track, we followed the 7 SSL models selected on the main track and adopted a semi-supervised learning method to exploit the unlabeled data. According to the official analysis results, our system has achieved 1st rank in 6 out of 16 metrics and is one of the top 3 systems for 13 out of 16 metrics. Specifically, we have achieved the highest LCC, SRCC, and KTAU scores at the system level on main track, as well as the best performance on the LCC, SRCC, and KTAU evaluation metrics at the utterance level on OOD track. Compared with the basic SSL models, the prediction accuracy of the fused system has been largely improved, especially on OOD sub-track.

Comments:	MOS 2022 shared task system description paper
Subjects:	Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2204.04855 [cs.SD]
	(or arXiv:2204.04855v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2204.04855

Submission history

From: Prasanna Raj Noel Dabre [view email]
[v1] Mon, 11 Apr 2022 03:50:52 UTC (1,267 KB)

Computer Science > Sound

Title:Fusion of Self-supervised Learned Models for MOS Prediction

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Sound

Title:Fusion of Self-supervised Learned Models for MOS Prediction

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators