Text To Speech: A Simple Tutorial: D.Sasirekha, E.Chandra
III. ARCHITECTURE OF TTS
The TTS system comprises these five fundamental components:
A. Text Analysis and Detection
B. Text Normalization and Linearization
C. Phonetic Analysis
D. Prosodic Modeling and Intonation
E. Acoustic Processing
The input text is passed through these phases to obtain the speech.

[Figure: TTS processing pipeline. Recovered block labels: Input Text; Text Analysis & Text Detection; Text Normalization & Text Linearization; Phonetic Analysis]

“too common words” and other diacritics from letters.

Text normalization is useful, for example, for comparing two sequences of characters that are represented differently but mean the same: “Don’t” vs. “Do not”, “I’m” vs. “I am” and “Can’t” vs. “cannot” are some examples.

The four main phases of text normalization are:
(i) Number converter: A number is pronounced differently in different situations, for example:
1772 (date): seventeen seventy-two
1772 (phone number): one seven seven two
1772 (quantity): one thousand seven hundred and seventy-two
Fractional and decimal numbers are also handled:
0.302 (number): point three nought two
(ii) Abbreviation converter: Abbreviations are changed to their full textual form:
Mrs. - Missus
St. Joseph St. - Saint Joseph Street
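The number-converter phase can be illustrated with a minimal Python sketch that reproduces the 1772 and 0.302 readings listed above. It assumes the reading context (date, phone number or quantity) is already known, for instance from the text-analysis stage. All function names and context labels are hypothetical, and the year rule is deliberately simplified.

```python
# Hypothetical sketch of the "number converter" phase of text normalization.
# The context label ("date", "phone", anything else) is assumed to be supplied
# by an earlier text-analysis step; it is not detected here.

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def below_100(n: int) -> str:
    """Spell out 0..99, e.g. 72 -> 'seventy-two'."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def below_10000(n: int) -> str:
    """Spell out 0..9999 as a quantity, with British 'and' before the last two digits."""
    if n < 100:
        return below_100(n)
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    parts = []
    if thousands:
        parts.append(ONES[thousands] + " thousand")
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    if rest:
        parts.append("and " + below_100(rest))
    return " ".join(parts)


def say_year(n: int) -> str:
    """Read a four-digit year as two pairs, e.g. 1772 -> 'seventeen seventy-two'.
    Simplified: it ignores conventions such as 'two thousand and five' for 2005."""
    high, low = divmod(n, 100)
    if low == 0:
        return below_100(high) + " hundred"
    if low < 10:
        return below_100(high) + " oh " + ONES[low]
    return below_100(high) + " " + below_100(low)


def say_decimal(text: str) -> str:
    """Read digits after the point one by one, e.g. '0.302' -> 'point three nought two'."""
    whole, _, frac = text.partition(".")
    digits = " ".join("nought" if d == "0" else ONES[int(d)] for d in frac)
    if int(whole or "0") == 0:
        return "point " + digits
    return below_10000(int(whole)) + " point " + digits


def convert_number(text: str, context: str) -> str:
    """Verbalize a numeric token according to its (assumed) reading context."""
    if context == "date":
        return say_year(int(text))
    if context == "phone":
        return " ".join(ONES[int(d)] for d in text)
    if "." in text:
        return say_decimal(text)
    return below_10000(int(text))


if __name__ == "__main__":
    for ctx in ("date", "phone", "quantity"):
        print(f"1772 ({ctx}): {convert_number('1772', ctx)}")
    print(f"0.302 (number): {convert_number('0.302', 'number')}")
```

The same digit string yields three different readings depending only on the context label. In this sketch that label has to come from somewhere else, which illustrates why normalization cannot work on the raw characters alone.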
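The abbreviation converter, together with the contraction examples given for text normalization, can be approximated with small lookup tables and regular expressions. The following Python sketch is only an illustration. The tables, the function names and the capital-letter rule used to read “St.” as “Saint” or “Street” are assumptions, not the method of this paper.

```python
import re

# Hypothetical, deliberately tiny lookup tables; a practical system would
# use much larger, context-aware lexicons.
CONTRACTIONS = {"don't": "do not", "i'm": "i am", "can't": "cannot"}
TITLES = {"Mrs.": "Missus", "Mr.": "Mister", "Dr.": "Doctor"}

# One case-insensitive pattern matching any contraction as a whole word.
CONTRACTION_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Rewrite contractions such as "Don't" as "Do not", keeping an initial capital."""
    text = text.replace("\u2019", "'")  # treat the typographic apostrophe as '

    def repl(match):
        word = match.group(0)
        full = CONTRACTIONS[word.lower()]
        return full[0].upper() + full[1:] if word[0].isupper() else full

    return CONTRACTION_RE.sub(repl, text)


def expand_abbreviations(text: str) -> str:
    """Expand a few abbreviations; "St." is disambiguated by a simple heuristic."""
    # "St." followed by a capitalised word is read as "Saint", otherwise "Street".
    text = re.sub(r"\bSt\.(?=\s+[A-Z])", "Saint", text)
    text = re.sub(r"\bSt\.", "Street", text)
    for abbr, full in TITLES.items():
        text = re.sub(r"\b" + re.escape(abbr), full, text)
    return text


def normalize(text: str) -> str:
    """Apply contraction expansion, then abbreviation expansion."""
    return expand_abbreviations(expand_contractions(text))


if __name__ == "__main__":
    print(normalize("Mrs. Brown can't visit St. Joseph St. today."))
```

The “St.” heuristic shows why abbreviation handling is context dependent. It reads “St. Joseph St.” correctly, but it would wrongly produce “Saint” in a sentence such as “He lives on Main St. The house is new.”, where the capital letter only starts a new sentence.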
International Journal of Soft Computing and Engineering (IJSCE)
ISSN: 2231-2307, Volume-2, Issue-1, March 2012
Phoneme Set (English)
Vowels (19): /a/, /ae/, /air/, /ar/, /e/, /ee/, /i/, /ie/, /o/, /oe/, /oi/, /oo/, /ow/, /or/, /u/, /ur/, /ue/, /uh/, /w/.

The letter sounds for a word are blended together to form a pronunciation based on a set of rules. The main advantage of this approach is that it requires no database and works on any type of input. However, its complexity grows for irregular inputs.

D. Prosodic Modeling and Intonation
Prosody is the combination of stress pattern, rhythm and intonation in speech. Prosodic modeling describes the speaker’s emotion. Recent investigations suggest that identifying the vocal features which signal emotional content may help to create very natural synthesized speech [9].

Intonation is the variation of pitch while speaking. All languages use pitch as intonation, for instance to express happiness or to raise a question. Modeling intonation is an important task that affects the intelligibility and naturalness of speech, and a good model of intonation is needed to achieve high-quality text-to-speech conversion.

Generally, intonations are distinguished as:
(i) Rising intonation (the pitch of the voice increases)
(ii) Falling intonation (the pitch of the voice decreases)
(iii) Dipping intonation (the pitch of the voice falls and then rises)
(iv) Peaking intonation (the pitch of the voice rises and then falls)

E. Acoustic Processing
The speech is spoken according to the voice characteristics of a person. There are three types of acoustic synthesis available.

This paper has given a clear and simple step-by-step overview of how a text-to-speech (TTS) system works. Many TTS systems are available on the market, and much research is under way to make synthetic speech more effective and natural, with stress and emotion. We expect synthesizers to continue to improve through research in prosodic phrasing, in the quality of speech, voice, emotion and expressiveness, and in simplifying the conversion process so as to avoid complexity in the program.

REFERENCES
[1] F. Alías, X. Sevillano, J. C. Socoró and X. Gonzalvo, “Towards High-Quality Next-Generation Text-to-Speech Synthesis: A Multidomain Approach by Automatic Domain Classification”, IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 7, September 2008.
[2] Q. Guo, J. Zhang, N. Katae and H. Yu, “High-Quality Prosody Generation in Mandarin Text-to-Speech System”, Fujitsu Sci. Tech. J., vol. 46, no. 1, pp. 40-46, 2010.
[3] G. Anumanchipalli, R. Chitturi, S. Joshi, R. Kumar, S. P. Singh, R. N. V. Sitaram and D. P. Kishore, “Development of Indian Language Speech Databases for Large Vocabulary Speech Recognition System”.
[4] A. Black, H. Zen and K. Tokuda, “Statistical parametric speech synthesis”, in Proc. ICASSP, Honolulu, HI, 2007, vol. IV, pp. 1229-1232.
[5] G. Bailly, N. Campbell and B. Möbius, “ISCA special session: Hot topics in speech synthesis”, in Proc. Eurospeech, Geneva, Switzerland, 2003, pp. 37-40.
[6] M. Ostendorf and I. Bulyko, “The impact of speech recognition on speech synthesis”, in Proc. IEEE Workshop on Speech Synthesis, Santa Monica, 2002, pp. 99-106.
[7] J. Dutta, “Text To Speech Synthesis”, a knol.