Python TamilNLP
3 authors:
Laxmi Narayana
Mylan Inc.
All content following this page was uploaded by A.G. Ramakrishnan on 02 June 2014.
Abstract
This paper describes the development of the natural language processing (NLP) module for the Tamil text-to-speech (TTS) synthesis
system. The input to a TTS system is not always pure text; it may contain acronyms, abbreviations and non-standard words, which
must first be converted to the corresponding Tamil graphemic form. A text normalization module is developed to accomplish this task.
A grapheme-to-phoneme (G2P) converter, which converts the normalized orthographic text into its phonetic form, is needed both for
the input text and for the text corpus under consideration. The phonetic transcription helps in identifying and analysing basic units
such as monophones, diphones and syllables, and in segmenting the speech corpus. A character-to-phoneme mapping interface is
developed to map Tamil graphemic text to the corresponding phonetic representation in Roman script. A rule base is created that
contains the inter-word and intra-word rules for changing the default character-to-phone mapping wherever necessary. A proper noun
lexicon and a foreign word lexicon are also incorporated to handle cases where G2P conversion fails. The NLP module is designed for
use in Tamil TTS synthesis on both the Windows platform and the Festival (Linux) environment.
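
The G2P flow summarized in the abstract (lexicon lookup, default character-to-phone mapping, then context rules) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the function names `default_mapping`, `apply_rules` and `g2p`, the Roman symbols, the small subset of Tamil characters covered, and the single rule shown (voicing of a stop after a nasal, a commonly described Tamil pattern). Only intra-word context is handled here.

```python
import unicodedata

PULLI = "\u0BCD"  # Tamil virama: suppresses the inherent vowel "a"

# Illustrative subset only; the Roman symbols are assumed, not the paper's.
CONSONANTS = {
    "\u0B95": "k", "\u0B99": "ng", "\u0B9A": "c", "\u0B9E": "nj",
    "\u0B9F": "T", "\u0BA3": "N", "\u0BA4": "t", "\u0BA8": "n",
    "\u0BAA": "p", "\u0BAE": "m", "\u0BAF": "y", "\u0BB0": "r",
    "\u0BB2": "l", "\u0BB5": "v",
}
VOWELS = {"\u0B85": "a", "\u0B86": "aa", "\u0B87": "i", "\u0B89": "u"}
VOWEL_SIGNS = {"\u0BBE": "aa", "\u0BBF": "i", "\u0BC1": "u", "\u0BC8": "ai"}

# Hypothetical intra-word rule: a stop is voiced when it follows a nasal.
NASALS = {"ng", "nj", "N", "n", "m"}
VOICED = {"k": "g", "c": "j", "T": "D", "t": "d", "p": "b"}


def default_mapping(word):
    """Map each Tamil character to its default phone (the look-up table step)."""
    word = unicodedata.normalize("NFC", word)
    phones = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in CONSONANTS:
            phones.append(CONSONANTS[ch])
            if nxt == PULLI:
                i += 1                      # pure consonant, no vowel
            elif nxt in VOWEL_SIGNS:
                phones.append(VOWEL_SIGNS[nxt])
                i += 1                      # vowel sign replaces inherent "a"
            else:
                phones.append("a")          # inherent vowel
        elif ch in VOWELS:
            phones.append(VOWELS[ch])
        else:
            raise ValueError(f"unmapped character {ch!r}")
        i += 1
    return phones


def apply_rules(phones):
    """Override default phones where the preceding phone requires it."""
    out = []
    for p in phones:
        if p in VOICED and out and out[-1] in NASALS:
            p = VOICED[p]
        out.append(p)
    return out


def g2p(word, lexicon=None):
    """Use the exception lexicon if the word is listed, else mapping plus rules."""
    if lexicon and word in lexicon:
        return list(lexicon[word])
    return apply_rules(default_mapping(word))
```

For example, under these assumptions `g2p("\u0BA4\u0B99\u0BCD\u0B95\u0BAE\u0BCD")` first yields the default phones `t a ng k a m`, and the rule pass then changes `k` to `g` because it follows the nasal `ng`. A caller-supplied `lexicon` dictionary stands in for the proper noun and foreign word lexicons, and is consulted before any mapping is attempted.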
[Figure: block diagram with blocks labelled "Look Up Table" and "Character Phone Mapping"]
6. Appendix