A Case Study on Filtering for End-to-End Speech Translation

Alam, Md Mahfuz Ibn; Anastasopoulos, Antonios

arXiv:2402.01945 (cs)

[Submitted on 2 Feb 2024]

Abstract: It is relatively easy to mine a large parallel corpus for any machine learning task, such as speech-to-text or speech-to-speech translation. Although these mined corpora are large in volume, their quality is questionable. This work shows that the simplest filtering technique can trim these large, noisy datasets down to a more manageable, clean dataset. We also show that using this clean dataset can improve model performance: for a multilingual-to-English Speech Translation (ST) model, we obtain an average improvement of 4.65 BLEU.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2402.01945 [cs.CL]
	(or arXiv:2402.01945v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2402.01945

Submission history

From: Md Mahfuz Ibn Alam
[v1] Fri, 2 Feb 2024 22:42:33 UTC (7,720 KB)
