NAIST Simultaneous Speech-to-speech Translation System for IWSLT 2023

Ryo Fukuda; Yuta Nishikawa; Yasumasa Kano; Yuka Ko; Tomoya Yanagita; Kosuke Doi; Mana Makinae; Sakriani Sakti; Katsuhito Sudoh; Satoshi Nakamura

doi:10.18653/v1/2023.iwslt-1.31

Abstract

This paper describes NAIST’s submission to the IWSLT 2023 Simultaneous Speech Translation task: English-to-German, -Japanese, and -Chinese speech-to-text translation and English-to-Japanese speech-to-speech translation. Our speech-to-text system uses an end-to-end multilingual speech translation model based on large-scale pre-trained speech and text models. We add Inter-connections to the model to incorporate the outputs from intermediate layers of the pre-trained speech model, and we augment prefix-to-prefix text data using Bilingual Prefix Alignment to enhance the simultaneity of the offline speech translation model. Our speech-to-speech system employs an incremental text-to-speech module that consists of a Japanese pronunciation estimation model, an acoustic model, and a neural vocoder.

Anthology ID: 2023.iwslt-1.31
Volume: Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
Month: July
Year: 2023
Address: Toronto, Canada (in-person and online)
Editors: Elizabeth Salesky, Marcello Federico, Marine Carpuat
Venue: IWSLT
SIG: SIGSLT
Publisher: Association for Computational Linguistics
Pages: 330–340
URL: https://aclanthology.org/2023.iwslt-1.31
DOI: 10.18653/v1/2023.iwslt-1.31
Cite (ACL): Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Yuka Ko, Tomoya Yanagita, Kosuke Doi, Mana Makinae, Sakriani Sakti, Katsuhito Sudoh, and Satoshi Nakamura. 2023. NAIST Simultaneous Speech-to-speech Translation System for IWSLT 2023. In Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023), pages 330–340, Toronto, Canada (in-person and online). Association for Computational Linguistics.
Cite (Informal): NAIST Simultaneous Speech-to-speech Translation System for IWSLT 2023 (Fukuda et al., IWSLT 2023)
PDF: https://aclanthology.org/2023.iwslt-1.31.pdf
