Unit Selection Based Text-to-Speech Synthesizer For Tigrinya Language

This document describes the development of the first unit selection based text-to-speech system for the Tigrinya language. The authors constructed a unit database and implemented natural language processing modules. The system uses a speech corpus of 4 hours and 38 minutes labelled at the phoneme level. Letter-to-sound rules were implemented and an automatic clustering technique was used to group units by phonetic and prosodic context. The system was evaluated using mean opinion scores, and listeners correctly recognized 97.1% of sentences on average. The naturalness of the synthesized speech demonstrates the effectiveness of the unit selection approach for Tigrinya.


Unit Selection Based Text-to-Speech Synthesizer for Tigrinya Language

Agazi Kiflu
[email protected]

Tibebe Beshah
School of Information Science, Addis Ababa University, Ethiopia
[email protected]

Abstract
This paper describes the development of the first unit selection based Text-to-Speech (TTS) system for Tigrinya, built with the Festival framework, and its practical application. We describe the construction of a unit database and the implementation of the natural language processing modules. A unit selection based approach generates speech by selecting suitable units from a speech corpus and concatenating them. In this approach, a set of features is defined to describe both the speech units in the corpus and the units expected in the synthesized utterance. We built a concatenative unit selection voice that uses the phone as its basic unit, from a speech corpus of 4 hours, 38 minutes and 29 seconds labelled at the phoneme level.
We also describe the implementation and evaluation of a grapheme-to-phoneme (G2P) conversion model for the Tigrinya TTS system. Letter-to-sound conversion is simple for most Tigrinya letters because orthography and phonemic transcription usually map one to one. An automatic clustering technique groups units according to their phonetic and prosodic context. After the phonetic, prosodic, and acoustic features were extracted for each phone in the inventory, the Festival speech synthesis system was adopted so that the synthesizer could use the cluster unit selection algorithm. At synthesis time, units are selected from the clusters so as to minimize acoustically defined target and join costs.
The system was evaluated for naturalness and intelligibility using the mean opinion score (MOS), one of the most widely used evaluation techniques in speech synthesis. The test results indicate that almost all of the words and sentences are recognizable: on average, 97.1% of the sentences were correctly recognized by the listeners. The naturalness of the synthesized speech demonstrates the appropriateness of the proposed approach.
Keywords: Text-to-Speech Synthesis; Concatenative Speech Synthesis; Unit selection based speech synthesis;
Syllable based concatenation; Consonants and Vowels

1. Introduction
Text-to-Speech (TTS) synthesis converts arbitrary input text into intelligible and natural sounding speech so as to transmit information from a machine to a person [1]. It is a process through which input text is analyzed, processed, and understood, and then rendered as digital audio and spoken [2]. The basic types of synthesis system are formant, concatenative, and articulatory [3].
The process of TTS conversion transforms a string of phonetic and prosodic symbols into a synthetic speech signal. The quality of the result produced by a TTS synthesizer is a function of the quality of that string as well as of the quality of the generation process. The most important qualities of a speech synthesis system are naturalness and intelligibility [2]. Naturalness describes how closely the output sounds like human speech, while intelligibility is the ease with which the output is understood. In this research, we used a concatenative text-to-speech system and examined the issues relevant to developing a Tigrinya speech synthesizer with different choices of unit in the database: a word, phrase, clause, sentence or phoneme. Since TTS has advanced greatly for other languages globally, we attempt to design and implement a TTS method for one of the local languages of Ethiopia.
This paper is organized as follows: Section 2 focuses on the nature of the Tigrinya script. Section 3 reviews the literature, while Section 4 outlines the methodology of the proposed solution. Section 5 describes the voice building process, Section 6 shows the results of perceptual testing, and Section 7 gives the conclusion and recommendations.

2. The Tigrinya Language
The script of Tigrinya is phonetic in nature. It has 39 consonants and 7 vowels [5, 9]. The orthographic representation of the language is organized into orders. Each of the 39 consonants has seven orders (derivatives): six of them are CV combinations and the remaining one is the consonant itself. The way Tigrinya orthographic characters are written is very similar to the way they are spoken; that is, Tigrinya is a phonetic language. The mapping between the written form and the spoken form is one to one except for the epenthetic vowel. Characters representing the same consonant followed by different vowels are similar in shape. For example, here are the characters representing /he/, /hu/, /hi/, /ha/, /hie/, /h/ and /ho/: ሀ, ሁ, ሂ, ሃ, ሄ, ህ, ሆ.
The total number of orthographic symbols of the language exceeds 273. Like other languages, Tigrinya has its own typical phonological and morphological features. Among these, we found gemination of consonants and the use of the automatic epenthetic vowel to be very critical for naturalness in Tigrinya speech synthesis. The spoken form of Tigrinya also has a special property: the acoustic form of the orthographic representation is a CV or CVC sequence.

2.1 Phonology of Tigrinya Word
Phonology is the study of the distribution and patterning of speech sounds in a language and of the tacit rules governing pronunciation [4]. In phonology, the phoneme is the fundamental unit that describes how speech conveys linguistic meaning. A phoneme represents a class of sounds that convey the same meaning. The meaning of a word depends on the phonemes it contains [4].
Tigrinya has a fairly typical set of phonemes for an Ethiopian Semitic language: a set of ejective consonants and the usual seven-vowel system. Unlike many of the modern Ethiopian Semitic languages, Tigrinya has preserved the two pharyngeal consonants which were apparently part of the ancient Ge'ez language and which, along with [x'], a velar or uvular ejective fricative, make it easy to distinguish spoken Tigrinya from related languages such as Amharic [9]. These characteristics set it apart from Amharic, in addition to differences in accent, manner and place of articulation.
Figure 1: Tigrinya Syllabic Structure
A syllable in Tigrinya has only the form /cv/ (a consonant + a vowel) or /cvc/ (a consonant + a vowel + a consonant). The vowel is the syllabic nucleus, while the first and the last consonants of the syllable are the onset and the coda respectively [9]. A native speaker of Tigrinya can, for example, divide the words (sabara) and (biili) into three and two syllables respectively. Likewise, /qaatala/ (Figure 2) is one word with three syllables. Some Tigrinya syllables have an onset and a nucleus (peak), while others have a coda in addition to the onset and the peak. Observe the following:
Figure 2: [tree diagram of the word /qaatala/: a Word node over three RH nodes, each with O, N and C positions, above the segments q a t a l a]
Moreover, each onset and coda position is occupied by a consonant, whereas a nucleus position is occupied by a vowel.

2.2 Gemination
Longer duration of identical segments, adjacent consonants or vowels that are the same can form
HiLCoE Journal of Computer Science and Technology, Vol. 1, No. 1 15

gemination. In Tigrinya, a sequence of vowels is not permissible. Whenever a sequence of vowels occurs, either one of the vowels must be deleted or an epenthetic segment is inserted between the vowels. However, we do find geminated Tigrinya segments, with the exception of laryngeals and pharyngeals, which may be geminated only in very limited environments, as indicated in [9].
Consonant gemination may change the meaning of a word. Compare /zawara/ 'he roamed' with /zawwara/ 'he drove', and /halifu/ 'he passed' with /hallifu/ 'he excelled'. In each pair, the only difference is whether the medial consonant is geminated, and that difference changes the meaning.

2.3 Insertions
Insertion is one way of arriving at a well formed or acceptable assignment of syllable structure. The syllable structure of Tigrinya is either /cv/ or /cvc/. Insertion, unlike deletion, is the appearance of new elements in a formerly unoccupied position. The epenthetic (inserted) segments may appear word initially, word medially or word finally. There are several Tigrinya epenthetic segments, both vowels and consonants, in different positions. Observe the following:
asraha 'he made others work'
awassaxom 'he made them add'
The morpheme /a/ is added to the root consonants /srh/.
zii + asriih-a -> zasriiha
zii + awassaxa -> zawassaxa

3. Literature Review
In [7], the Natural Language Processing Lab of the Centre for Development of Advanced Computing (CDAC), India, uses multiform speech units, primarily syllables and phonemes, to develop speech synthesis. The speech corpus contains the most frequent words and initials, which have been segmented and labelled into different speech units as required for the development of a Hindi speech synthesis system. The work does not indicate whether any implementation has been made. Additionally, it is not known to have been tested or proven in any related manner.
Alam et al. [8] proposed a TTS system that creates the voice data for Festival and additionally extends Festival through its embedded Scheme scripting interface to incorporate Bangla language support. Their TTS implementation used two different concatenative methods supported in Festival: unit selection and multisyn unit selection [8].
As future work, the researchers indicated that a number of tasks remain in order to develop a complete TTS system for Bangla, including document analysis, text analysis, phonetic analysis, a large pronunciation lexicon, automatic rather than manual creation of lexicon entries, LTS or Grapheme-to-Phoneme (G2P) rules to handle unknown words, prosody analysis, and waveform synthesis by the diphone technique. In conclusion, the researchers observed that unit selection and multisyn unit selection have the drawback of requiring a large speech corpus.
Eker [6] presents research that exploits the structure of the Turkish language and implements a system that takes text as its input. It assumes that the text consists of words and processes it word by word [6]. When a word is obtained from the text, it is passed to a unit that processes the word as text and produces the corresponding speech. This unit separates the word into diphones and, using a diphone database, retrieves the speech file corresponding to each diphone and its pitch value. Finally, it concatenates the previously recorded speech segments using the PSOLA algorithm to produce sound. As future work, the researcher recommended first completing the diphone database and running more experiments on words. The output produced is acceptable for short sentences, but long sentences take much time. Therefore, in order to have a real-time reading system, the system should be faster. In conclusion, this means that the method used in this paper is an applicable one which with some effort on completing and preparing a
better diphone database will result in a system that will produce more understandable output for all Turkish words.
We have observed that few research attempts have been made on local languages. One of the few is by Sebsibe H/Mariam et al. [12], who describe the issues that need to be considered in developing a concatenative speech synthesizer for the Amharic language. They discuss the complexity of the syllable structure of the language, the phonetic nature of the language, and the result of the perceptual test of the synthesizer. The researchers explored the nature of the Amharic script, the representation of the phone set, Amharic syllable structure and syllabification rules, and showed the voice building process. Having noted that the quality of the speech synthesizer for Amharic was not high, they recommended future work on improving its quality. They suggested that this can be done by:
1) Proper selection of unit. Since the language is phonetic, the syllable as a basic unit may outperform the phone as a basic unit.
2) Optimal selection of corpus. A corpus that proportionally covers all basic units and variations will give better quality.
Based on the review made so far and the knowledge of the researchers, none of the works so far has tried to design a grapheme-to-phoneme converter or a letter-to-sound speech synthesizer for Tigrinya. None of them shows a prototype of natural-sounding speech for Tigrinya that accepts normalized Tigrinya text and generates prosodic features (i.e., intonation, stress) using a syllable based approach. The main focus of this work is to find a speech corpus of proper quality, matching the quality of synthetic speech from synthesizers and including the linguistic tasks, and to develop a natural sounding text-to-speech system for the Tigrinya language.

4. Methodology
After an extensive literature review of concatenative speech synthesis methods, unit selection concatenative synthesis was found to be the most popular recent method of speech synthesis. It differs from older types of synthesis in generally sounding more natural and spontaneous than formant synthesis or diphone based concatenative synthesis. Unit selection synthesis has been found to score higher than other methods in listener ratings of quality, but it involves the tedious recording of many hours of speech by a single speaker.
In this research we explored the nature of the Tigrinya script, the representation of the phone set, the letter-to-sound rules, and the Tigrinya syllable structure and syllabification rules that underlie the voice building process. To do this we used the following:
- A transliteration scheme, based on the orthographic ordering of the script and the acoustic similarity of the letters, defined using ASCII characters.
- In Festvox, a description of the phone set of the language with the corresponding features, such as voicing, tongue position, tongue height, place of articulation, and manner.
- Experiments on the phonology of Tigrinya words and on the phone set, in which we tried to cover all phonemes defined in the ASCII transliteration scheme.

5. Design and integration of Tigrinya unit selection into Festival framework
The speech inventory is divided into clusters, where each cluster holds units of the same phone class grouped by their phonetic and prosodic context. An outline of the steps to build a unit selection synthesizer is given below. A more detailed description is available in [10, 13].
- Design speech and text corpus
- Creating LTS rules and phone set
- Building utterance structures
- Generating speech unit clusters
- Building the unit synthesizer
What has been done in each step to build the unit selection voice for Tigrinya is explained below.
Tigrinya proverbs, articles, newspapers, magazines and Bible sentences were collected from different sources and serve as primary data. We selected a native
speaker of the language and recorded him, a male speaker, in a quiet environment using Praat. We used WaveSurfer for manual labelling of the recorded voice. In this paper, we built a corpus of around 13171 words. The script of this speech corpus was selected from a large text corpus (around 84000 characters). The corpus is designed to cover the frequently used syllables and contexts as much as possible.
The input to the TTS system is the transliteration of a text in Tigrinya. The pronunciation generation module generates the sequence of basic units using a lexicon of units and letter-to-sound rules. The lexicon is a list of all speech units (monosyllables, bi-syllables and tri-syllables) present in the waveform repository. The letter-to-sound rules are framed in such a way that each word is split into its largest constituent syllable units. As the pronunciation of most words in Tigrinya can be predicted from their orthography, these rules suffice to generate correct pronunciations. The unit selection algorithm generates a target specification for the speech units that have been identified and picks the sequence of speech units that minimizes both the target cost and the join cost. The waveforms of these speech units are then concatenated to produce synthetic speech.

5.1 System Architecture
As the system architecture in Figure 3 shows, the synthesizer has a text analysis part and a speech synthesis part. The text analysis part uses the grapheme-to-phoneme converter to match each word to its pronunciation, whereas the synthesis part selects the best sequence of units for the target specification produced at the end of text analysis, and finally generates the speech from the speech parameters.
- Defining the phone-set of the language
- Tokenization and text normalization
- Incorporation of letter-to-sound rules
- Incorporation of syllabification rules
- Assignment of stress patterns to the syllables in the word
- Assignment of duration to phones
- Generation of f0 contour
Once a speech repository is in place, the repository is integrated with the Festival framework.
An outline of the steps to build a unit selection synthesizer is given below. A more detailed description is available in [1, 2, 6, 10].
Figure 3: The system architecture of the Tigrinya speech synthesizer using cluster unit selection

5.2 Description of the Implementation Design
After we collected the speech and text corpus, the next step was to check the recorded utterances against the transcription text in order to design the prompt list in Festival format and to correct the labels manually. Appropriate modifications were made to get them ready for use in the voice building process. In this way the speech and text corpora were built.

5.2.1 Labelling the Utterance
Labelling is the process of assigning a label to each speech segment in the utterance; its output is the labelled utterance. Unit selection synthesizers are highly sensitive to the accuracy of labelling. Bad labels adversely affect the quality of synthesis in a number of ways [2, 13]. The phone label itself can be incorrect, potentially causing the wrong word to be said, or to be said with an undesired accent. Although manual labelling is time consuming and laborious, as part of our efforts to improve speech synthesis we labelled the speech database thoroughly using a tool called WaveSurfer.
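
Because bad labels degrade unit selection synthesis, a simple automated sanity check can complement the manual correction described in Section 5.2.1. The following is a minimal sketch, not the procedure used in this work. It assumes that the label files follow the xlabel-style layout often used in Festvox voice builds (header lines up to a line containing only "#", then one "end_time colour label" line per segment); this layout, the function names parse_labels and check_labels, and the sample phone set are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple

Segment = Tuple[float, str]  # (end time in seconds, phone label)


def parse_labels(text: str) -> List[Segment]:
    """Parse xlabel-style text (assumed layout) into (end_time, label) pairs."""
    segments: List[Segment] = []
    in_body = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not in_body:
            # Everything up to the first line that is just '#' is header.
            in_body = line == "#"
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        # Assumed field order: end time, colour code, label.
        segments.append((float(fields[0]), fields[2]))
    return segments


def check_labels(segments: Iterable[Segment], phone_set: Set[str]) -> List[str]:
    """Return readable problems: unknown phones or non-increasing end times."""
    problems: List[str] = []
    previous_end = 0.0
    for index, (end, label) in enumerate(segments):
        if label not in phone_set:
            problems.append(f"segment {index}: unknown phone {label!r}")
        if end <= previous_end:
            problems.append(
                f"segment {index}: end time {end} is not after {previous_end}"
            )
        previous_end = max(previous_end, end)
    return problems


if __name__ == "__main__":
    # Hypothetical label file content and phone set, for illustration only.
    sample = "#\n0.10 125 pau\n0.18 125 s\n0.25 125 a\n0.22 125 zz\n"
    for problem in check_labels(parse_labels(sample), {"pau", "s", "a"}):
        print(problem)
```

Run on the sample, the check reports two problems for the last segment: a phone that is not in the phone set and an end time that goes backwards. A check of this kind only flags candidates for review; the corrections themselves would still be made by hand in the labelling tool.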

5.2.2 Creating letter-to-sound rules and phone-set
A comprehensive set of letter-to-sound rules was created to syllabify the input text into syllable-like units. These rules are framed in such a way that each word is split into its largest constituent syllable units. The phone set, which is the list of basic sound units for Tigrinya that the synthesizer supports, was created by enumerating all the speech units identified in the syllabification process.

5.2.3 Incorporation of Tigrinya Phone set and Grapheme to Phoneme Converter
The phone-set definition is the first text analysis module, in which every phoneme of the alphabet is classified according to phone features such as consonant voicing and vowel height. The second text analysis module is the lexicon module.
The Tigrinya phone set is incorporated in Festival together with the features that characterize each phone. Each phone has eight features that describe how the vocal organs behave when the sound is uttered: vowel/consonant identification, consonant voicing, place of articulation, consonant type, vowel length, vowel height, vowel frontness, and lip rounding.
The grapheme to phoneme converter converts an orthographic text into its corresponding phonetic representation. We implemented the grapheme to phoneme conversion architecture by modifying the syllabification algorithm proposed in [11]. A C# syllabification program with a graphical interface was implemented first and was then converted into a C++ command-line G2P system, as required by the Festival tools. After the Tigrinya phone set and the Tigrinya grapheme to phoneme converter were incorporated into Festival, it provides the label files for each sentence in the prompt list. We then corrected the labels manually, using the automatically generated labels along with the corresponding wave files.

5.2.4 Building utterance structures for the database
The utterance structure is a data structure that holds all the relevant phonetic and prosodic information related to a speech unit. The phonetic information describes the position of the speech unit in the word in which it appears and the units adjacent to it. The prosodic information describes the duration and pitch of the unit. Festival provides scripts for building the utterance structure for each speech unit.

5.2.5 Generating speech unit clusters
The process includes building coefficients for acoustic distances (MFCC, F0 and energy coefficients), creating distance tables for each class of units based on the acoustic distances, and generating features for building CART trees.

5.2.6 Building the unit synthesizer
Using the letter-to-sound rules, phone set and clusters of each speech unit built in the previous steps, Festival generates, by means of the appropriate scripts, the files that are needed along with the core Festival speech synthesizer to build a unit selection synthesizer for Tigrinya.
In the grapheme to phoneme conversion process, the rules of the language for epenthetic vowel insertion, gemination and syllabification determine the pronunciation of a given Tigrinya word from its spelling, and so affect the synthesized speech. The epenthetic vowel insertion and gemination rules for Tigrinya are adopted from [11] and modified as follows:
1. Accept input words and scan from left to right.
2. If a consonant cluster occurs in word-initial position, insert an epenthetic vowel between the consonants.
Exception: if the first phoneme is a consonant and the next consonant is the glide /w/, the pharyngeal /h/ or the plain /x/ (rule #1).
3. If three consonants appear in sequence word medially or word finally, insert an
epenthetic vowel before the third consonant (rule #2).
Exception: if the sonority of the middle consonant is greater than that of the others, insert the epenthetic vowel after the first consonant.
4. If a consonant cluster contains a geminate followed by a singleton, insert an epenthetic vowel after the geminated consonant (rule #3).
5. If a consonant cluster contains a singleton followed by a geminate, insert an epenthetic vowel after the singleton consonant (rule #4).
6. If a consonant cluster contains two different geminates in sequence, insert an epenthetic vowel between the two geminate consonants (rule #5).
7. If the sonority of the final consonant is greater than that of the preceding consonants, insert the epenthetic vowel between the consonants of the final cluster (rule #6).
8. If a consonant cluster occurs in word-final position, insert the epenthetic vowel /i/.
9. Repeat steps 2 to 7 until all the phonemes in the phoneme list have been parsed.

6. Perceptual Evaluation and Experimental Results
Perceptual evaluation is essential to determine the quality of synthesized speech [13, 14]. The perceptual evaluation in this paper investigates the naturalness and intelligibility of the Tigrinya TTS. In this work the mean opinion score (MOS) is used to test the output of the synthesizer. MOS is an evaluation technique in which evaluators indicate their assessments on a scale ranging from bad (1) to excellent (5). The average of the scores given is then taken as the performance of the system [6, 12, 14].
The impact on perception is assessed from the average of the scores given by native speakers after their perceptual judgment of the synthetic speech produced by the synthesizer. The perceptual tests were carried out to evaluate the extent of naturalness and intelligibility of the synthetic speech generated by the speech synthesizer.

6.1 Data Preparation and Prototype Testing
We conducted perceptual tests with 6 native speakers of Tigrinya: 2 females and 4 males. All subjects were between 30 and 60 years old. Each subject listened to all 6 sentences, which were of various lengths and were selected from the data set used in the voice construction, and rated the naturalness and intelligibility of the speech. The ratings were based on the quality of the speech output.
Based on the results, we can conclude that proper selection of units by the TTS plays a great role in the perceived naturalness and intelligibility of synthetic speech.
On the question of whether the voice is good to listen to, 38.8% considered the voice very good, 58.3% considered it good, and 2.7% considered it unnatural. Together, 97.1% found the voice acceptable (very good or good), and none found it excellent, fair or very poor. In general, the output was acceptable to most of the listeners. The previous work on unit selection for the Bangla language in the Festival framework [8] reports an average score of 90.1% at sentence level; compared with it, this work shows an improvement of about 7 percentage points.

6.2 Summary
Even though the experiment was conducted on a small scale, the results obtained are promising. They suggest that with an ever increasing speech database, the unit selection synthesizer would be able to produce natural speech with high flexibility and intelligibility. However, not every feature of the Tigrinya language was considered. This paper has achieved promising results by defining:
- A transliteration scheme to work with Tigrinya script
- An incorporated phone set, syllabification rules, and letter-to-sound rules
- Stress assignment in Festvox

7. Conclusion and Recommendation
In this work, a first attempt is made to develop a speech synthesizer for the Tigrinya language using the unit selection method. However, not every feature of the Tigrinya language was considered, because doing so needs a lot of time and detailed, deep linguistic knowledge. Hence, only the characteristics of Tigrinya phonemes and the way they are produced are considered.
From the results and analysis, it can be concluded that this paper has achieved its main objectives. The project has produced a Tigrinya TTS system that can take raw Tigrinya text as input and produce Tigrinya speech as output. The sentence-level test also showed that users can understand almost every simple sentence spoken by the system.
The major contributions of this paper are:
- Identification of a new syllable based speech unit and a suitable phone set for concatenative speech synthesis for Tigrinya
- Development of LTS rules for Tigrinya
- Development of a natural sounding TTS system for Tigrinya
- Demonstration of the prototype
- Investigation of techniques to develop speech synthesis systems
- Helping to raise further research questions
There are many ways in which this system can be improved, ranging from its database method to its NLP processing method. The following points are recommended for future work, either to extend the work or to increase the quality of the synthesized speech:
- Syllabification of words: this will greatly improve prosodic modeling, and the segmental prosody of Tigrinya should be appropriately studied and modeled.
- Proper identification of Tigrinya stress points, to help determine where stress should fall within a word.
- An automatic gemination and epenthesis handling algorithm.
- Deep studies on syllabification and final consonant clusters, which will improve the performance of the syllabifier.
- A stress assignment algorithm.
- A Tigrinya morphological analyzer.
- Duration modeling of consonants and vowels.

References
[1] Thierry Dutoit, A Short Introduction to Text-to-Speech Synthesis, TTS research team, TCTS Lab, 1997.
[2] A. W. Black, P. Taylor, and R. Caley, The Festival speech synthesis system, 1998.
[3] Juergen Schroeter, Text-to-Speech (TTS) Synthesis, AT&T Laboratories, 2002.
[4] Nadew Tademe Mergia, Formant based speech synthesis for Amharic vowels, MSc Thesis, Faculty of Informatics, Addis Ababa University, Ethiopia, 2008.
[5] N. Sridhar Krishna, Text-to-speech synthesis system for Indian languages within the Festival Framework, M.S. Dissertation, Department of Computer Science and Engineering, Indian Institute of Technology, Madras, 2004.
[6] Barış Eker, Turkish text to speech system, MSc Thesis, The Institute of Engineering and Science of Bilkent University, 2002.
[7] Natural Language Processing Lab, Centre for Development of Advanced Computing, Building speech corpora for unit selection based concatenative text to speech system for Indian Languages, India.
[8] Firoj Alam, Promila Kanti Nath, and Mumit Khan, Text To Speech for Bangla Language using Festival, BRAC University, Bangladesh.
[9] Tesfaye Tewolde Yohannes, A modern Grammar of Tigrinya, Rome, Italy, 2002.
[10] Alan W Black and Kevin A Lenzo, Building Synthetic Voices, For FestVox 2.0 Edition, Language Technologies Institute, Carnegie Mellon University and Cepstral, LLC, 2003.
[11] Nirayo Hailu Gebregziabher, Modeling Improved Amharic Syllabification Algorithm, MSc Thesis, Faculty of Informatics, Addis Ababa University, Ethiopia, 2011.
[12] Sebsibe H/Mariam, S P Kishore, Alan W Black, Rohit Kumar, and Rajeev Sangal, Unit Selection Voice for Amharic using Festvox, 5th ISCA Speech Synthesis Workshop, Pittsburgh, pp. 103-107, 2005.
[13] Yonas Demeke, Duration modeling of phonemes for Amharic text to speech system, MSc Thesis, Faculty of Informatics, Addis Ababa University, Ethiopia, 2011.
[14] Hyunsong Chung, Duration Models and the Perceptual Evaluation of Spoken Korean, Proceedings of ISCA Archive, France, 2002.
[15] Alan W Black and Kevin A. Lenzo, Building Synthetic Voices, For FestVox 2.1 Edition, 2007.

