Showing 1–4 of 4 results for author: Yiwere, M

  1. arXiv:2311.06307

    cs.HC cs.AI cs.SD eess.AS

    Synthetic Speaking Children -- Why We Need Them and How to Make Them

    Authors: Muhammad Ali Farooq, Dan Bigioi, Rishabh Jain, Wang Yao, Mariam Yiwere, Peter Corcoran

    Abstract: Contemporary Human Computer Interaction (HCI) research relies primarily on neural network models for machine vision and speech understanding of a system user. Such models require extensively annotated training datasets for optimal performance and when building interfaces for users from a vulnerable population such as young children, GDPR introduces significant complexities in data collection, mana…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: Presented at SpeD 23

  2. arXiv:2307.13008

    eess.AS cs.AI

    Adaptation of Whisper models to child speech recognition

    Authors: Rishabh Jain, Andrei Barcovschi, Mariam Yiwere, Peter Corcoran, Horia Cucu

    Abstract: Automatic Speech Recognition (ASR) systems often struggle with transcribing child speech due to the lack of large child speech datasets required to accurately train child-friendly ASR models. However, there are huge amounts of annotated adult speech datasets which were used to create multilingual ASR models, such as Whisper. Our work aims to explore whether such models can be adapted to child spee…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: Accepted in Interspeech 2023

  3. arXiv:2204.05419

    eess.AS cs.SD

    A Wav2vec2-Based Experimental Study on Self-Supervised Learning Methods to Improve Child Speech Recognition

    Authors: Rishabh Jain, Andrei Barcovschi, Mariam Yiwere, Dan Bigioi, Peter Corcoran, Horia Cucu

    Abstract: Despite recent advancements in deep learning technologies, Child Speech Recognition remains a challenging task. Current Automatic Speech Recognition (ASR) models require substantial amounts of annotated data for training, which is scarce. In this work, we explore using the ASR model, wav2vec2, with different pretraining and finetuning configurations for self-supervised learning (SSL) toward improv…

    Submitted 11 February, 2023; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: Preprint, Submitted to IEEE Access

  4. arXiv:2203.11562

    cs.SD cs.CL eess.AS

    A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis

    Authors: Rishabh Jain, Mariam Yiwere, Dan Bigioi, Peter Corcoran, Horia Cucu

    Abstract: Speech synthesis has come a long way as current text-to-speech (TTS) models can now generate natural human-sounding speech. However, most of the TTS research focuses on using adult speech data and there has been very limited work done on child speech synthesis. This study developed and validated a training pipeline for fine-tuning state-of-the-art (SOTA) neural TTS models using child speech datase…

    Submitted 4 April, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: Submitted to IEEE Access