Speech Segmentation
IT MSc Regular
Course: - IMS
NLP Article review
by:-
Title and authors
1. Automatic Speech Segmentation (April 2017). Alaa Ehab Sakran, Sherif Mahdy Abdou,
Salah Eldeen Hamid, Mohsen Rashwan.
2. Amharic Speech Search Using Text Word Query Based on Automatic Sentence-like
Segmentation (8 November 2022). Getnet Mezgebu Brhanemeskel, Solomon Teferra Abate,
Tewodros Alemu Ayall, and Abegaz Mohammed Seid.
3. Automatic Speech Segmentation for Amharic Phonemes Using Hidden Markov Model
Toolkit (HTK) (Aug 2016). Eshete Derb Emiru, Walelign Tewabe Sewunetie.
4. Phoneme level automatic speech segmentation for Amharic language using HMM
approach. Dr. Sebsbie Hailmariam.
Introduction:
For more than thirty years, researchers have studied automatic speech segmentation, which
divides a speech signal into smaller units for use in speech synthesis and speech recognition,
among other applications. In this article, I review four research papers that examine
different strategies and developments in automatic speech segmentation.
The first paper focuses on phoneme-level automatic speech segmentation for the Amharic
language using a Hidden Markov Model (HMM) approach. The authors highlight the importance
of accurate segmentation in speech processing systems and discuss the utilization of wavelets,
fuzzy methods, artificial neural networks, and HMM for segmentation.
The second article presents a method based on automatic sentence-like segmentation, enabling
users to search for specific words in Amharic speech using text-based queries. The authors
emphasize the significance of automatic segmentation in speech analysis and its applications in
speech recognition and speech synthesis systems.
The third paper provides a comprehensive review of automatic speech segmentation techniques.
The authors discuss the general characteristics of speech signals, including voiced and unvoiced speech,
and the importance of accurate segmentation for various speech analysis tasks. The authors explore
different segmentation units such as words, phonemes, and syllables, and discuss the challenges
associated with context dependency and acoustic variability.
The fourth paper focuses on the segmentation of speech signals using both blind and aided
segmentation algorithms. The authors discuss the differences between blind segmentation, which
relies solely on statistical signal analysis, and aided segmentation, which incorporates external
linguistic knowledge. They highlight the use of techniques such as Hidden Markov Models
(HMMs), Dynamic Time Warping (DTW), and Artificial Neural Networks (ANNs) in aided
segmentation algorithms.
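To make the blind approach described above concrete, the following is a minimal sketch that is not taken from any of the reviewed papers. It marks a boundary wherever the short-time energy of the signal crosses a fixed threshold, using signal statistics only and no linguistic knowledge. The function names, frame length, hop size, threshold, and the synthetic test signal are all illustrative assumptions.

```python
import math

def short_time_energy(samples, frame_len, hop):
    """Return the mean squared amplitude of each frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def blind_boundaries(samples, frame_len=160, hop=80, threshold=0.01):
    """Return sample indices where frame energy crosses the threshold.

    The threshold is a hypothetical fixed value; real systems usually
    adapt it to the noise level of the recording.
    """
    energies = short_time_energy(samples, frame_len, hop)
    active = [e > threshold for e in energies]
    boundaries = []
    for i in range(1, len(active)):
        if active[i] != active[i - 1]:
            # Frame i starts at sample i * hop.
            boundaries.append(i * hop)
    return boundaries

# Synthetic signal (assumed 8 kHz sampling): silence, a 200 Hz tone, silence.
rate = 8000
silence = [0.0] * 1600
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(3200)]
signal = silence + tone + silence

# The tone occupies samples 1600 to 4800; the detected boundaries should
# fall within one frame length of those positions.
print(blind_boundaries(signal))
```

An aided method would instead align the signal against a known phoneme or word sequence, for example with HMM forced alignment, which is why it can label segments as well as locate them.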
In this review, I aim to provide insights into the advancements and challenges in automatic
speech segmentation techniques. By examining these four articles, I aim to build a clearer
understanding of the various approaches, methodologies, and applications in this field.
The authors, titles, methods, findings, and limitations of the articles are summarized below.

2. Getnet Mezgebu Brhanemeskel et al. (2022)
Title: Amharic Speech Search Using Text Word Query Based on Automatic Sentence-like Segmentation
Methods: Manual segmentation was used as a baseline for the word error rate (WER) of the
automatic segmentation approach; Artificial Neural Network.
Findings: Sentence-like automatic segmentation resulted in a WER close to the WER achieved on
manually segmented test speech. Two speech corpora were used, from the broadcast news domain
and the spiritual domain.
Limitations:
o A limited training dataset.
o Lack of detailed information on the dataset and validation process.

3. Eshete Derb Emiru, Walelign Tewabe Sewunetie (Aug 2016)
Title: Automatic Speech Segmentation for Amharic Phonemes Using Hidden Markov Model Toolkit (HTK)
Methods: Unsupervised method for automatic speech segmentation using the Hidden Markov Model
(HMM) Toolkit (HTK). The techniques are context-independent, context-dependent with a single
Gaussian mixture, and context-dependent with multiple Gaussian mixtures.
Findings: In a context-dependent setting with two Gaussian mixtures, the phoneme-based
technique produced the best results, with the lowest percentage of time boundary deviations.
The suggested approach effectively divided Amharic speech into phonemes for use in several
speech research fields.
Limitations:
o The article does not explicitly discuss the limitations of the proposed method.
o It does not address the performance of the method on different speakers or in noisy
environments.
o The speech corpus was recorded by a single female speaker.

4. Dr. Sebsbie Hailmariam
Title: Phoneme level automatic speech segmentation for Amharic language using HMM approach
Methods: Hidden Markov Model (HMM) approach for modeling the Amharic phonemes. The techniques
used are context-independent, context-dependent with a single Gaussian mixture, and
context-dependent with multiple Gaussian mixtures.
Findings: The proposed method effectively segments continuous speech into phonemes in the
Amharic language.
Limitations:
o The system's performance in capturing speech variation due to different speakers, accents,
and other factors is not addressed.
o The study focuses on the Amharic language only.
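The "percentage of time boundary deviations" reported for the HTK-based study can be illustrated with a short sketch. This is one plausible reading of the metric, not the paper's confirmed definition: it counts the automatic boundaries that differ from the manual ones by more than a tolerance. The function name, the 20 ms tolerance, and the example boundary times are assumptions made for illustration.

```python
def boundary_deviation_rate(reference_ms, predicted_ms, tolerance_ms=20):
    """Percentage of predicted boundaries that deviate from the
    reference boundaries by more than tolerance_ms milliseconds.

    Assumes both lists hold the same boundaries in the same order,
    as is the case when both come from the same phoneme sequence.
    """
    if len(reference_ms) != len(predicted_ms):
        raise ValueError("boundary lists must have the same length")
    if not reference_ms:
        raise ValueError("at least one boundary is required")
    outside = sum(
        1 for ref, pred in zip(reference_ms, predicted_ms)
        if abs(ref - pred) > tolerance_ms
    )
    return 100.0 * outside / len(reference_ms)

# Hypothetical boundary times in milliseconds (not data from the paper).
manual = [100, 250, 400, 620]
automatic = [110, 290, 405, 610]

# Only the second boundary (40 ms off) exceeds the 20 ms tolerance.
print(boundary_deviation_rate(manual, automatic))  # 25.0
```

A lower value means the automatic boundaries agree more closely with the manual segmentation.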
Comparison
Comparison of the articles by their strengths, contributions to the area, and evaluation metrics.
1.
Strengths: Mentions various approaches and methods used in speech segmentation.
Contributions: Covers the basics of speech segmentation, discussing state-of-the-art
solutions, exploring different segmentation units, examining evaluation methods, and
highlighting the challenges and trends in automatic speech segmentation.
Evaluation metrics: Does not explicitly mention the specific evaluation metrics.

2.
Strengths:
o Focuses on the issue of speech search using text word queries for the Amharic language,
which can have practical applications.
o Introduces the concept of automatic sentence-like segmentation, which may enhance the
accuracy of the speech search system.
o Includes multiple authors, indicating a collaborative effort and potentially diverse
perspectives.
Contributions: The proposed approach aims to enable efficient and accurate searching of
Amharic speech by automatically segmenting the speech into meaningful units and aligning them
with text queries.
Evaluation metrics: Word Error Rate (WER) as a measure of performance.
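Since Word Error Rate is the metric used in the second article, a minimal sketch of how it is usually computed may help. WER is the word-level edit distance (substitutions, deletions, and insertions) between the recognized text and the reference transcript, divided by the number of reference words. The function name and the English example sentences are illustrative assumptions, not material from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(
                prev[j] + 1,         # deletion of a reference word
                cur[j - 1] + 1,      # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)

# Hypothetical transcripts: one substitution ("sat" -> "sit") and one
# deletion ("the") against a six-word reference, so WER = 2 / 6.
reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(round(word_error_rate(reference, hypothesis), 3))  # 0.333
```

Comparing the WER obtained on automatically segmented speech with the WER on manually segmented speech, as the second article does, shows how much recognition accuracy is lost to segmentation errors.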
Future works
1. The authors mention the need to investigate and develop more advanced algorithms that can
handle the challenges posed by noisy and non-standard speech data.
They also highlight the importance of incorporating linguistic knowledge and context into
segmentation algorithms. Furthermore, the authors suggest exploring novel features and techniques
for improved segmentation accuracy and efficiency.
2. The authors propose a method that combines automatic speech recognition with automatic
sentence-like segmentation and provide experimental results to support their findings.
3. The study has potential limitations related to the size of the text corpus and the speaker
characteristics. Further research can address these limitations and explore the generalization and
robustness of the proposed method in diverse settings.
4. The article does not explicitly mention future work.
Based on this review, I plan to work on automatic speech segmentation for the Wolaita language.