Improvization of Malayalam Speech Output in Espeak Text-To-Speech Synthesizer
Speech is the primary means of communication and interaction between people. Automatic generation of speech from text, referred to as text-to-speech (TTS) synthesis, has been gaining significant interest in commercial applications such as talking aids for vocally handicapped people, training and educational aids, and reading aids for visually handicapped people. Recent progress in speech synthesis has produced synthesizers with high intelligibility for some major languages like English, but sound quality and naturalness remain a major problem. Synthesis of natural-sounding speech depends on how well the duration and intonation patterns are imposed on it. Durational variation is incorporated in text-to-speech synthesis systems using duration models, which predict the duration of individual segments by considering various factors affecting the duration of speech.

The Malayalam text-to-speech systems that are available today generate poor-quality voice. In the last few years, efforts toward the development of a TTS system for Malayalam have been increasing. Fragmented efforts can be seen, with many different agencies working toward this goal with different sets of technologies. One major challenge in achieving this goal has been the lack of linguistic details in a usable and documented form. In addition, various analyses have to be done on actual human voice to find out the parameters that are required to synthesize voice.

eSpeak[3] is one of the TTS systems that synthesizes many foreign as well as Indian languages, including Malayalam. There are several issues regarding the Malayalam language output in the current version (eSpeak 1.47.10). This paper aims at improving the Malayalam speech output of
eSpeak.

The overall paper is organized as follows. Section 2 discusses some of the existing Malayalam TTS systems. Section 3 outlines the issues in the current eSpeak 1.47.10. Section 4 describes the methodology. In Section 5 we discuss the results. We evaluate our results based on Mean Opinion Score in Section 6. The status of the modified eSpeak is discussed in Section 7.

Literature shows that the following Malayalam speech engines are available, each with some advantages and limitations.

• ML-TTS: It works with both Windows and Linux. ML-TTS was developed through the effort of IIT Madras, IIT Hyderabad, C-DAC Trivandrum and Mumbai. It has the advantages of legibility and the involvement of native speakers in development. Its limitations are its large size (2GB), limited mobility, slow speed and lack of speed control.

• Dhvani: It was developed by the Simputer Trust, headed by Dr. Ramesh Hariharan, at the Indian Institute of Science, Bangalore, in the year 2000.

• eSpeak: Developed by Jonathan Duddington. It was originally known as Speak and was written for Acorn/RISC OS computers starting in 1995. eSpeak is a speech synthesizer for various foreign as well as Indian languages, including Malayalam. It uses the formant synthesis method, which allows many languages to be provided in a small size of just 2MB. The speech is clear and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings. However, it has the limitations that native speakers are not involved in development and that the Malayalam phonemes used at present are not legible enough for the spoken text to be comprehended.

It is observed that the development of TTS in Indian languages is a difficult task, especially for Malayalam, in which the same letters are pronounced in multiple ways. The success of Malayalam TTS depends not only on addressing this issue but also on incorporating regional variation in speaking[?]. Not much work has been done in eSpeak. Speakers of Latin have tried to improve their language in eSpeak by defining their own phoneme sets and dictionary[11]. Letter-to-sound rules have been framed for many languages such as Latin[11], Urdu[5] and Bengali[1], but this has not yet been done for Malayalam.

Several issues were identified regarding the Malayalam language output in the current version.

• Pronunciation of numbers is incorrect. Numbers above one hundred were pronounced with /onnynu:ti/ instead of /orynu:ti/. For example, 1121 was pronounced /a:yiratti onnynu:ti irupatti onny/ instead of /a:yiratti orynu:ti irupatti onny/. The number 500 was pronounced /anju:Ry/ instead of /anû:Ry/. There were problems with 3000, 5000, 8000, 9000, 10000, 13000, 15000, 18000 and 19000. 1 lakh was pronounced /onny laksham/ instead of /ory laksham/.

• Duration of speech sounds is not proper. Appropriate durational variations are required in order to achieve natural sound.

4. Methodology

As already mentioned, eSpeak is an open source software synthesizer for various foreign as well as Indian languages, including Malayalam. In eSpeak the specific features of a language are captured in easy-to-understand text files, so development is easier since no time is required to become familiar with the source code. Each language is provided with its own language module, whose file names start with the language name. The developer of eSpeak, Jonathan Duddington, has made provision for native speakers to modify the language modules.

The language module consists of a voice file, a pronunciation dictionary and a phoneme source file. A voice file
specifies a language along with various attributes that affect the characteristics of the voice quality and how the language is spoken. The phoneme file contains phoneme definitions for the vowels and consonants which the language uses. Some of the phonemes are derived from Hindi. All phonemes are represented in ASCII characters using the Kirshenbaum scheme[10]. The phoneme definitions consist of the type (vowel, nasal, etc.), length or duration in milliseconds, and formant frequencies. For the phonemes that are difficult to synthesize, prerecorded WAV files are used. Once the language's phonemes have been defined, pronunciation dictionary data can be produced in order to translate the input text into phonemes. This consists of two source files: language rules (the spelling-to-phoneme rules) and language list (an exceptions list and attributes of certain words). Since the aim was to improve eSpeak TTS for Malayalam, we were concerned with the Malayalam language module, which consists of the ml_rules, ml_list and ph_malayalam files, which can be viewed in the eSpeak documentation[3].

The letter-to-sound (LTS) rules were framed and implemented, the pronunciation of numbers was corrected and duration modelling was done to improve the naturalness of the output speech of eSpeak.

1. Framing and implementation of LTS rules
A set of 7 rules was formulated based on the knowledge obtained from Malayalam language experts and computational linguists. These rules were implemented in the ml_rules file and the Malayalam phoneme source file in the eSpeak directory. Rewriting the rules stored in the ml_rules file allows the user to define pronunciations for a single character or a group of characters based on a certain context. The rules are organized in groups and written in a particular syntax, as shown below:

    .group P
    P( Ja
    P (w ja
    P Je
    P (B J

where B is a combining vowel sign and '_' denotes that the position of that particular consonant or vowel is at the end of the word. Apart from the capital letter B and '_', there are more such special symbols, which are described in detail in the dictionary file section of the eSpeak documentation[3]. The newly formulated rules were added to the existing rules file.

2. Correcting the pronunciation of numbers
In the case of English, number pronunciation has a regular pattern, but in Malayalam there are several variations. Issues regarding the pronunciation of numbers were already discussed in Section 3. They were corrected by adding exceptions in the ml_list file.

3. Duration Modelling
Vowel durations in Malayalam were analyzed from a database created by IIIT Hyderabad[6]. The database consists of around 1000 sentences taken from Malayalam Wikipedia, out of which duration data from about 50 sentences were taken, and the durations of each vowel, based on its position in a word, were analyzed by plotting histograms. The durational variations were incorporated along with the phoneme definitions in the ph_malayalam file.

5. Results and Discussion

1. Letter-to-Sound Rules in Malayalam
Rules were formulated based on the knowledge obtained from Malayalam language experts and computational linguists. Here we mainly describe the formulation of pronunciation rules, which can be broadly classified into rules for consonants and rules for vowels.

• Case /a/
If /a/ is not in the word-final syllable and the succeeding consonant is palatal or alveolar (eg: /aTayum/), or if the preceding consonant is a voiced stop (/ga/, /ja/, /da/, /Da/, /ba/) or /ya/, /ra/, /Ra/, /la/ (eg: /balaM/, /jalaja/), then replace /a/ with the sound /e/.

• Case /u/
/u/ is rounded only in the initial syllable and the final syllable (ammu - rounded; /karuNa/ - unrounded). In word-middle position it is rounded only if the vowel in the preceding syllable is also /u/ (eg: /uDuppy/). If it is not word initial or word final and the preceding vowel (the vowel in the preceding phoneme) is not /u/, then replace /u/ with the raised and retracted form of schwa, /y/.

• Case /i/
/i/ is changed to a form of schwa in word-middle position, and its duration is very short.

• Case /h/
When in a consonant cluster with a nasal, /h/ is not pronounced; instead the consonant is geminated, i.e. /h/ is replaced with that nasal (eg: /brahmam/, /chihnam/).
• Case /y/
When preceded by a consonant and followed by /a/, /ya/ changes into an opened-up /e/ (eg: /vyasanam/).

• Case /nda/
Post-nasal stops are converted to nasals. In /nandi/, /da/ is replaced with /na/; i.e. it is sounded /nanni/.

• Case anuswaram
If it is not word initial or word final and the succeeding phoneme is in the /ka/ vargam, then replace it with the corresponding nasal (eg: /bhaNgi/, /saNgiidam/).

The newly formulated rules were added to the existing rules file (ml_rules). For example, the last rule described above is written as:

    w ) K (B Ni

in the .group K. Here /ga/ was replaced by the corresponding nasal sound.

2. Pronunciation of Numbers
The pronunciation of numbers was corrected by adding exceptions in the ml_list file. It sounded much better than the output of the current version (eSpeak 1.47.10).

    ...
    length 90
    FMT(vowel/a#_3)
    ENDIF
    ENDIF
    FMT(vowel/a#_3)
    endphoneme

6. Evaluation

Evaluation was done by playing the speech output (in WAV format) of eSpeak 1.47.10 and of the modified version to about 15 listeners. The output WAV files included 14 numbers, 8 words and 5 sentences. Listeners were asked to give scores (between 0 and 5) according to the quality of perception. Finally, the mean opinion score (MOS) was calculated for each of the numbers, words and sentences and tabulated as shown in Tables 1, 2 and 3.

1. Evaluation of Numbers
First the numbers were evaluated. A set of 14 numbers, as specified in Table 1, was taken. The speech outputs of eSpeak 1.47.10 and the modified version were played to the listeners. The mean opinion score was calculated and tabulated as shown in Table 1.
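The MOS calculation described above can be sketched as follows. This is a minimal illustration and not the authors' evaluation script: the function name `mean_opinion_score` and the listener ratings are hypothetical, and only the 0 to 5 scale and the averaging over listeners are taken from the text.

```python
from statistics import mean


def mean_opinion_score(ratings, low=0.0, high=5.0):
    """Return the mean of the listener ratings for one stimulus.

    The 0-5 range follows the evaluation described in the text;
    rejecting out-of-range scores is an added assumption.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if not low <= r <= high:
            raise ValueError(f"rating {r} is outside [{low}, {high}]")
    return mean(ratings)


# Hypothetical scores from 15 listeners for a single WAV file.
example_ratings = [4, 4, 3, 5, 4, 4, 3, 4, 5, 4, 4, 3, 4, 4, 4]
example_mos = mean_opinion_score(example_ratings)
print(round(example_mos, 2))
```

The same function would be applied once per number, word or sentence, and once per eSpeak version, to fill one cell of a MOS table.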
Figure 1. Comparison of number outputs of eSpeak 1.47.10 and modified version

Sl No  Words     MOS(1.47.10)  MOS(modified)
1      Jalam     2.02          3.93
2      Nanni     1.23          3.83
3      Brammam   1.03          3.567
4      Chinnam   1.06          3.93
5      Bhangi    1.0           3.86
6      Uduppu    1.8           2.9
7      Vyasanam  1.63          2.8
8      Vananira  0.866         1.7
Table 2. MOS of word outputs of eSpeak 1.47.10 and modified version

Figure 3. Comparison of sentence outputs of eSpeak 1.47.10 and modified version

... exceptions were added to the list file in order to improve the pronunciation of numbers. The corrected exception list file (ml_list) was sent to Jonathan Duddington, the developer of the eSpeak synthesizer, and is being updated in the latest released version (eSpeak 1.47.11.c). We were also able to improve the rhythm of speech to some extent by appropriate modifications of the vowel durations. Even though problems still exist with the pronunciation of the letter /r./ and with the intonation of speech, after the modifications already described the speech output sounded better than that of the previous version (eSpeak 1.47.10). An evaluation was also done to compare the speech outputs of the current (1.47.10) and modified versions, and it was observed that the output of the modified version sounded much better and more natural than that of the current version. Further improvement can be made by defining a new phoneme set specifically for Malayalam and by modifying the intonation of speech.
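Several of the words in Table 2 are spelled the way the letter-to-sound rules of Section 5 make them sound. As an illustration only, two of those rules (Case /h/ and Case /nda/) can be approximated with regular expressions over romanized text. This sketch is not the ml_rules implementation: the function name `apply_lts_sketch`, the lowercase romanization, the restriction of the nasals to m and n, and the restriction of the post-nasal stop to d are all assumptions made for the example, and the contextual conditions of the real rules are omitted.

```python
import re


def apply_lts_sketch(word):
    """Apply two of the Section 5 rules to a lowercase romanized word."""
    # Case /h/: h before a nasal is dropped and the nasal is geminated.
    word = re.sub(r"h([mn])", r"\1\1", word)
    # Case /nda/: a post-nasal stop becomes the nasal (only n + d here).
    word = re.sub(r"nd", "nn", word)
    return word


for w in ("brahmam", "chihnam", "nandi"):
    print(w, "->", apply_lts_sketch(w))
```

For these three inputs the function yields brammam, chinnam and nanni, which match the spellings of the corresponding test words in Table 2.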
Sl No  Sentences                                     MOS(1.47.10)  MOS(modified)
1      Jalajaye kanan nalla bhangi und               1.97          4.03
2      Malayalam keralathile audhyogika bhashayanu   2.13          3.43
3      Nammalk orumich sramich nokam                 1.97          3.13
4      Nammal cheyyunat valare nalla karyamanu       2.3           3.5
5      Ellavarkum nanni ariyikunnu                   2.1           3.9
Table 3. MOS of sentence outputs of eSpeak 1.47.10 and modified version
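As a quick arithmetic check on Table 3, the per-sentence MOS values can be averaged to summarize the overall change. The values below are copied from the table; the variable names and the choice of an unweighted mean across sentences are assumptions of this sketch and are not part of the original evaluation.

```python
from statistics import mean

# (MOS for eSpeak 1.47.10, MOS for modified version), rows 1-5 of Table 3.
table3 = [
    (1.97, 4.03),
    (2.13, 3.43),
    (1.97, 3.13),
    (2.30, 3.50),
    (2.10, 3.90),
]

baseline_avg = mean(old for old, _ in table3)
modified_avg = mean(new for _, new in table3)
# Rounded to two decimals to match the precision of the table.
gains = [round(new - old, 2) for old, new in table3]

print(f"average MOS, eSpeak 1.47.10: {baseline_avg:.2f}")
print(f"average MOS, modified:       {modified_avg:.2f}")
print("per-sentence gain:", gains)
```

Under these assumptions the sentence-level average rises from about 2.09 to about 3.60, with every sentence scoring higher in the modified version.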