A Rule Based Kannada Morphological Analyzer and Generator Using Finite State Transducer

International Journal of Computer Applications (0975 – 8887)
Volume 27– No.10, August 2011
Ramasamy Veerappan, Antony P J, S Saravanan
Research Scholar
Computational Engineering and Networking Centre,
Amrita Vishwa Vidyapeetham University, Coimbatore,
Tamil Nadu, India

Dr. Soman K P
Professor and Head
Computational Engineering and Networking Centre,
Amrita Vishwa Vidyapeetham University, Coimbatore,
Tamil Nadu, India

ABSTRACT
Morphology plays an essential role in machine translation and many other natural language processing applications. Developing a full-fledged morphological analyzer and generator (MAG) for a highly agglutinative language like Kannada is a challenging task. The function of a morphological analyzer is to return all the morphemes and their grammatical categories associated with a particular word form. For a given root word and grammatical information, a morphological generator generates the particular word form of that word. In the proposed project, we have developed a rule based MAG using a finite state transducer. This project has been developed as part of a machine translation system for English to Kannada. The performance of the system was tested randomly against a lexicon containing approximately twenty thousand root words including nouns, verbs, adjectives and adverbs.

Keywords
Morphology, Kannada, Finite state transducer, Agglutinative, Orthographic rules

1. INTRODUCTION
The morphological structure of an agglutinative language is unique, and capturing its complexity in a machine analyzable and generatable form is a challenging job. Analyzing the internal structure of a particular word is an important intermediate stage in many natural language processing applications, especially in bilingual and multilingual machine translation (MT) systems. A morphological analyzer is used to analyze the internal structure of the words of a language. A morphological generator does exactly the reverse: given a root word and grammatical information, it generates the particular word form of that root word. The role of morphology is very significant in the field of NLP, as seen in applications like MT, question-answering (QA) systems, IE, IR, spell checkers, lexicography etc. So from a computational perspective the creation and availability of a morphological analyzer for a language is important.

Kannada is one of the four major Dravidian languages of South India. It is the state language of Karnataka and is spoken by about 20 million people. It has a linguistic history of about 1,500 years and a continuous literature of over 1,200 years. Kannada is a morphologically rich language in which morphemes combine with the root words in the form of suffixes. Even though Kannada is historically and linguistically rich, the development of natural language processing for Kannada is very slow. The main reasons include the non-availability of large scale data resources and tools, owing to the inherent complexities of the language.

To build a MAG for a language one has to take care of the morphological peculiarities of that language, specifically in the case of machine translation. Some peculiarities of Kannada, such as the usage of classifiers and the extensive presence of vowel harmony, make it morphologically complex and thus a challenge in natural language generation (NLG).

Generally there are two approaches used to develop a morphological analyzer and generator. The first is the corpus based approach, where a large, well prepared corpus is required for training with a machine learning algorithm. The performance of the system depends on the features and size of the corpus. The disadvantage is that corpus creation is a time consuming process. Rule based approaches, on the other hand, are based on a set of rules and a dictionary that contains roots and morphemes. In rule based approaches every rule depends on the previous rule, so if one rule fails, it affects all the rules that follow it. When a word is given as input to the morphological analyzer and the corresponding morphemes are missing in the dictionary, the rule based system fails [1]. This paper is about the design and development of a MAG for Kannada using the rule based approach, considering all these peculiarities. We have implemented the system using the AT&T Finite State Machine.

The function of the morphological analyzer is to segment the given word into its component morphemes and to assign the correct morpho-syntactic information. Table 1 shows examples of morphological analysis of Kannada words (in transliteration).

Table 1. Input/Output examples for morphological analyzer
Input        Output
AnegaLu      Ane + gaLu
hOguttEne    hOgu + utt + Ene

The function of the morphological generator is to combine the constituent morphemes to get the actual word. Table 2 shows examples of morphological generation of Kannada words.

Table 2. Input/Output examples for morphological generator
Input               Output
Ane + gaLu          AnegaLu
hOgu + utt + Ene    hOguttEne

2. LITERATURE SURVEY
Several approaches have been attempted for developing morphological analyzers. In 1983 Kimmo Koskenniemi developed the two-level morphology approach and tested the formalism on Finnish [2]. In this two-level representation, the surface level describes word forms as they occur in written text and the lexical level encodes lexical units such as stems and suffixes. In 1984 the same formalism was extended to other languages such as Arabic, Dutch, English, French, German, Italian, Japanese, Portuguese, Swedish and Turkish, and morphological analyzers were developed successfully. At the same time a rule based heuristic analyzer for Finnish nominal and verb forms was developed by Jappinen [3]. In 1996, Beesley developed an Arabic finite state transducer for morphological analysis (MA) using the Xerox Finite State Tool (XFST), by extensively reworking the lexicon and rules in the Kimmo style [4]. In 2000, Agirre introduced a word-grammar based morphological analyzer using the two-level and a unification-based formalism for Basque, a highly agglutinative language [5]. Similarly, using XFST, Karine Megerdoomian built a Persian MA in 2004 and Wintner came up with a morphological analyzer for Hebrew in 2005 [6, 7]. Kemal Oflazer developed a Finite State Machine (FSM) based Turkish morphological analyzer. In 2008, Robert Elwell identified the features present in Swahili (Kiswahili) words using syllables and surface level clues.

In the case of Indian languages, the AU-KBC Research Centre of Anna University developed a finite state automata based morphological analyzer for Tamil [8]. Dr. Shailly Goyal and Dr. Niladri Chatterjee of the Indian Institute of Technology Delhi worked on Hindi noun phrase morphology for developing a link grammar based parser [9]. Mrs. Rita Mathu, Dr. Madhavi Sinha and Prof. Rekha Govil also worked on Hindi morphology. Many attempts have been made on Bengali and Marathi morphology. For Bengali, an unsupervised methodology was used for developing a morphological analyzer, and the two-level morphology approach was used to handle Bengali compound words by Sajib Dasgupta in 2007 [10]. Manish Shrivastav, Nitin Agrawal, Bibhuti Mahapatra, Smriti Singh and Pushpak Bhattacharyya worked on morphology based natural language processing tools for Indian languages. Morphological analyzers and generators for Telugu, Tamil and Kannada were developed by the University of Hyderabad [2]. Rule based morphological analyzers have been developed for Sanskrit and Oriya by Girish Nath Jha and Mohanty respectively.

Our literature survey on Kannada natural language processing found the following developments. A Kannada indexing software prototype was developed by Settar in 2002 [11]. A Kannada WordNet was attempted by Sahoo and Vidyasagar of the Indian Institute of Technology, Bangalore, in 2003 [12]. T. N. Vikram and Shalini R Urs developed a prototype morphological analyzer for Kannada (2007) based on a Finite State Machine [13]. This is just a prototype; it does not handle compound formation morphology and can handle a maximum of 500 distinct nouns and verbs. A paradigm based morphological analyzer for Kannada using a machine learning approach was developed by Antony P J and Dr. Soman K P of Amrita Vishwa Vidyapeetham in 2010 [14]. This is a morphological analyzer for Kannada verbs and can also handle compound verb morphology. Uma Maheshwar Rao G and Parameshwari K of CALTS, University of Hyderabad attempted to develop morphological analyzers and generators for South Dravidian languages in 2010 [15]. A network and process model for Kannada morphological analysis/generation was developed by K. Narayana Murthy; the performance of the system is 60 to 70% on general texts [16]. Recently (January 2011) Shambhavi B. R and Dr. Ramakanth Kumar of RV College, Bangalore developed a paradigm based morphological generator and analyzer using a trie based data structure [17]. The disadvantage of a trie is that it consumes more memory, as each node can have at most 'y' children, where y is the alphabet count of the language. As a result it can handle a maximum of 3700 root words and around 88K inflected words.

3. CHALLENGES IN KANNADA MORPHOLOGY
Kannada is a verb-final inflectional language with a relatively free word order. Kannada morphology is characterized as agglutinative or concatenative, i.e., words are formed by adding suffixes to the root word in a series. Most words may change spelling when stems are inflected, and a root word is normally affixed with several morphemes to generate thousands of word forms. The complexity of developing a MAG for a Dravidian language like Kannada is therefore comparatively higher than for languages like English. To build an effective morphological analyzer one should carefully analyze and identify all these roots and morphemes.

Due to the highly agglutinating nature of Kannada and the morphophonemic variations that take place at the point of agglutination, it is very difficult to mark word boundaries [14]. The design should cover all types of inflections as far as possible. For example, the meaningful parts of the word 'OdikoMDiddavana' ('the one (masculine) who was reading') are:

Odu  + i   + koLLu + MD  + u   + iru  + dd  + a  + avanu    + a
Root + VBP + AUXV  + PST + VBP + AUXV + PST + RP + PRON-3SM + ACC

3.1 Types and Features of Kannada Words
In general, there are three types of Kannada words: i) namapada (declinable words or nouns), ii) kriyapada (conjugable words or verbs) and iii) avyaya (uninflected words). Nouns, pronouns and adjectives belong to the declinable words and are inflected for case, number and gender. Conjugable words are inflected to mark differences of person, gender, number, aspect, mood and tense. All Kannada words belong to one of three genders: masculine, feminine and neuter. Declinable and conjugable words have two numbers: singular and plural. The singular has no particular distinguishing marker added. The plural marker is usually "gaLu", but there are some exceptions as
follows: masculine nouns ending in "a" (e.g., huDuga) and some feminine nouns ending in "u" (e.g., hemgasu) form the plural with "aru". Feminine nouns ending in "i" (e.g., huDugi) or "e" (e.g., atte) form the plural with "yaru". For kinship terms (e.g., aNNa), the plural marker is often "aMdiru". Some nouns have irregular plurals, such as "makkaLu", which is the plural of the noun "magu".

3.2 Noun Cases and Characteristic Suffixes
The case system of Kannada is similar to those of other South Dravidian languages like Tamil, Telugu and Malayalam. Nouns may usually end in a, e, i, u, A, or in a consonant [18]. Various suffixes are added to the noun stem to indicate different relationships between the noun and other constituents of the sentence. The suffix used with a particular case depends on the type of the noun and its final character. For example, the "dative" case suffixes are decided by the criteria shown in Table 3.

Table 3. Dative Case Characteristic Suffixes for Nouns
Noun type               Ends with        Dative suffix   Example noun   Dative form
Neuter noun             ಅ (a)            kke             ಮರ (mara)      marakke
Neuter noun             ಎ,ಇ,ಉ (e,i,u)    ge              mane           manege
Neuter noun             consonants       ige             Uru            Urige
Neuter determinative    -                akke            idu            idakke
Rational noun           -                nige            aNNa           aNNanige

Table 4 below shows the different cases and their corresponding characteristic suffixes for nouns.

Table 4. Noun Cases and their Characteristic Suffixes
Case                      Singular suffix                                          Plural suffix
Nominative (Prathama)     vu / yu / ಉ(u) / nu                                      gaLu / Mdiru / yaru
Accusative (Dwitiya)      vannu / yannu / annu / nannu                             Mdirannu / yarannu / rannu
Instrumental (Tritiya)    diMda / yiMda / iMda / niMda                             MdiriMda / yariMda / riMda
Dative (Chaturthi)        kke / ge / ige / nige / akke                             Mdirige / yarige / rige
Ablative (Pachami)        deseyiMda                                                gaLadeseyiMda / MdiradeseyiMda / yaradeseyiMda / radeseyiMda
Genitive (Shashti)        ದ(da) / ಯ(ya) / ಇನ(ina) / ನ(na) / ಅ(a) / vina / ಅರ(ra)    ಗಳ(gaLa) / Mdira / ಯರ(yara) / ರ(ra)
Locative (Saptami)        dalli / yalli / alli / nalli                             Mdiralli / yaralli / ralli
Vocative (Sambhodana)     ಏ(E) / vE / ಆ(A) / ಈ(I)                                  MdirE / yare

3.3 Verb Morphology
Compared with other Dravidian languages like Malayalam, the morphological structure of Kannada is more complex because verbs inflect for person, gender and number [14]. In verb morphology each root word combines with auxiliaries that indicate aspect, mood, causation, attitude etc. The uniqueness of the structure of the verbal complex makes it very challenging to capture in a machine analyzable and generatable format. The formation of the verbal complex also involves the arrangement of the verbal units and the interpretation of their combined meaning. Phonology plays a small role in word formation through 'morphophonemic' and 'sandhi' rules, which account for the shape changes due to inflection.

Verb forms can be broadly classified into two types: finite verbs and non-finite verbs. Finite verbs usually occur at the end of the sentence and, with the exception of clitics, can have nothing added to them. The general syntax of a sentence with a finite verb is Subject-Object-Verb. Some of the finite forms of verbs are imperatives, present and past forms marked with PNG, modals and verbal/participial nouns. The tense can be past, present or future if the verb is in the affirmative; the negative form does not take tense. Non-finite verbs, in contrast, cannot stand alone and must have some other form following them. Non-finite verb forms include infinitives, verbal and adjectival participles and tense-marked verb stems [19]. The non-past denotes both present and future tenses, and unlike Malayalam (another South Dravidian language) all tenses have different tense markers in Kannada. Mood is another important feature of Kannada and is associated with statements of fact versus possibility, supposition, etc. [20]. Four different moods are expressed in Kannada: infinitive, imperative, affirmative and negative. Kannada also has some additional modal forms such as indicative, conditional, optative, potential, monitory and conjunctive.

Kannada also includes past verb stems in addition to simple verb stems; these are used in forming the past tense, past participles, conditionals and some other constructions. The past
stems also form the base to which contingent PNG markers are added. The contingent form is another distinguishing feature of Kannada that is not present in any other Dravidian language [21]. Table 5 shows the features of Kannada verbs with examples.

Table 5. Verb Features and Characteristic Suffixes
Feature                  Characteristic suffixes             Example
Infinitive               al / Okke                           baru ('come') + al + illa (negative) = baralilla -> 'didn't come'
Imperative               O(yO) / E(yO) / iri(yiri)           hOgO / hOgE / hOgiri
Negative imperative      bAradu / bEDa / kUDadu              hOgu + ಅ (a) + bAradu = hOgabAradu -> 'don't go'
Optative                 ಇ (i)                               mADu + al + ಇ (i) = mADali, 'let do'
Hortative                ಓಣ (ONa)                            mADu + ಓಣ (ONa) = mADONa, 'let's do'
Participle               ಆ (A) / ಇ (i) / ade / adu / ಅದ (ada)   nODu + ade = nODade -> 'without seeing'
Verbal aspect markers    biDu / hOgu                         biTTu biDu, 'let go'
Causative suffix         isu / yisu                          kali ('learn') + isu -> kalisu ('teach')
Conditional suffix       are                                 hOdare, 'if (someone) goes, (then…)'

The Person-Number-Gender (PNG) marker and the tense marker concatenated to the verb stem are the two important aspects of verb morphology [14]. The verbal inflectional morphemes attach to the verb and provide information about syntactic aspects like number, person, case-ending relation and tense. Kannada verbs usually follow a regular pattern of suffixation. Table 6 shows the various PNG suffixes that can be attached to any verb root word.

Table 6. Kannada PNG Suffixes
Person    Number      Gender                Present    Future     Past       Contingent
First     Singular    Masculine/Feminine    Ene        enu, e     enu, e     Enu
First     Plural      Masculine/Feminine    Eve        evu        evu        Evu
Second    Singular    Masculine/Feminine    I, Iye     i, iye     i, iye     Izha
Second    Plural      Masculine/Feminine    Iri        iri        iri        Iri
Third     Singular    Masculine             Ane        anu        anu        Anu
Third     Singular    Feminine              Ale        aLu        aLu        ALu
Third     Plural      Masculine/Feminine    Are        aru        aru        Aru
Third     Singular    Neuter                ide        udu        ittu       Ittu
Third     Plural      Neuter                ive        avu        avu        Avu

4. IMPLEMENTATION OF MAG MODEL
The proposed rule based MAG tool was developed using the AT&T Finite State Machine. This section explains the various steps required to create the proposed MAG system.
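
One way the PNG inventory of Table 6 could be carried into an implementation is as a plain lookup table keyed by person, number and gender. The sketch below is only an illustration under that assumption: the name `PNG_SUFFIXES`, the helper `png_suffixes` and the key encoding are hypothetical and are not taken from the authors' system, and only a few rows of the table are included.

```python
# Hypothetical lookup table for a few rows of Table 6 (transliterated suffixes).
# Keys are (person, number, gender); values map a tense to its candidate suffixes.
PNG_SUFFIXES = {
    (1, "sg", "m/f"): {"present": ["Ene"], "future": ["enu", "e"],
                       "past": ["enu", "e"], "contingent": ["Enu"]},
    (1, "pl", "m/f"): {"present": ["Eve"], "future": ["evu"],
                       "past": ["evu"], "contingent": ["Evu"]},
    (3, "sg", "m"):   {"present": ["Ane"], "future": ["anu"],
                       "past": ["anu"], "contingent": ["Anu"]},
    (3, "sg", "n"):   {"present": ["ide"], "future": ["udu"],
                       "past": ["ittu"], "contingent": ["Ittu"]},
}


def png_suffixes(person, number, gender, tense):
    """Return the candidate PNG suffixes for one cell of the table."""
    return PNG_SUFFIXES[(person, number, gender)][tense]


# The first person singular present suffix is the "Ene" seen in hOgu + utt + Ene.
print(png_suffixes(1, "sg", "m/f", "present"))
```

Keeping the alternatives as lists preserves cells such as "enu, e", where the table gives more than one suffix for the same feature combination.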

4.1 Classifying Verb Paradigms
One of the most important steps in the creation of a MAG is to classify the verb paradigms from a computational perspective. In most cases the problem arises from past tense markers that change from one paradigm to another [22]. Past verbs are broadly classified into two types, regular and irregular (or semi regular). In the regular case the different word forms are built by adding 'id' to the verb stem. In the other case they are built by adding one of the past tense markers shown in Table 7. To resolve the computational challenges in verb morphological analysis we have classified verbs into 35 distinct paradigms, and verbs are grouped based on their paradigm class [14].

Table 7. Proposed Kannada Verb Paradigms
Paradigm    Past tense marker   Description                                             Examples
Class-1     -tt-                Verbs ending with 'Ayu', 'Iyu', 'ILu'                   sAyu, Iyu, kILu etc.
Class-2     -tt-                Verbs ending with 'eru', 'aLu', 'uLu'                   heru, horu, aLu, uLu etc.
Class-3     -tt-                Verbs ending with 'aLu', 'uLu'                          aLu, uLu
Class-4     -Mt-                Verbs ending with 'illu'                                nillu
Class-5     -t-                 Verbs ending with 'I' and 'e'                           kali, bali, mere, koLe etc.
Class-6     -t-                 Verbs ending with 'ULu'                                 hULu
Class-7     -t-                 Verbs ending with 'Olu', 'Ulu', 'Elu'                   jOlu, sOlu, nUlu, hElu etc.
Class-8     -d-                 Verbs ending with 'Ayu', 'Oyu', 'Eyu', 'Iyu'            kAyu, kOyu, tEyu, sIyu, hAyu etc.
Class-9     -d-                 Verbs ending with 'Agu', 'Ogu'                          hOgu, Agu etc.
Class-10    -d-                 Verbs ending with 'are'                                 bare
Class-11    -d-                 Verbs ending with 'ge' and 'gi'                         age, agi
Class-12    -d-                 Verbs ending with 'yyu'                                 koyyu, geyyu, hoyyu, bayyu, suyyu etc.
Class-13    -d-                 Verbs ending with 'nnu'                                 annu, tinnu, ennu etc.
Class-14    -d-                 Verbs ending with 'Eyu'                                 gEyu, nEyu etc.
Class-15    -d-                 Verbs ending with 'Ayu'                                 Ayu
Class-16    -dd-                Verbs ending with 'iru'                                 iru
Class-17    -dd-                Verbs ending with 'kaLu'                                kaLu
Class-18    -dd-                Verbs ending with 'ILu', 'ELu'                          bILu, ELu etc.
Class-19    -dd-                Verbs ending with 'Eyu'                                 mEyu
Class-20    -dd-                Verbs ending with 'ellu'                                gellu
Class-21    -dd-                Verbs ending with 'ADu', 'ODu'                          ADu, nODu, kADu, tODu etc.
Class-22    -id-                Verbs ending with 'TTu', 'ddu', 'bbu', 'ttu', 'llu', 'ccu'   aTTu, addu, ubbu, kuttu, cellu, heTTu, beccu, hottu etc.
Class-23    -id-                Verbs ending with 'Oru', 'Eru'                          tOru, sEru, hEru, hOru etc.
Class-24    -id-                Verbs ending with 'ju', 'Du', 'su'                      mOju, ADu, aMkurisu etc.
Class-25    -id-                Verbs ending with 'MTu', 'Mju', 'Mcu'                   IMTu, aMju, hoMju etc.
Class-26    -id-                Verbs ending with 'ELu', 'ILu'                          hELu, sILu etc.
Class-27    -nd-                Verbs ending with 'Eyu', 'Oyu'                          bEyu, nOyu etc.
Class-28    -nd-                Verbs ending with 'A' (aru)                             taru (tA), baru (bA) etc.
Class-29    -nd-                Verbs ending with 'ollu', 'ellu', 'allu'                kollu, mellu, sallu etc.
Class-30    -nD-                Verb stems ending with 'ANu'                            kANu
Class-31    -nD-                Verbs ending with 'oLLu'                                koLLu
Class-32    -T-                 Verbs ending with 'aDu', 'eDu', 'oDu', 'iDu', 'uDu'     aDu, keDu, koDu, naDu, iDu, uDu, toDu, paDu, haDu etc.
Class-33    -k-                 Verbs ending with 'ggu' and 'gu'                        oggu, miggu (migu), hoggu, sigu, nagu etc.
Class-34    -d-                 Verbs ending with 'kAyu'                                kAyu, dArikAyu
Class-35    -nD-                Verbs ending with 'kAyu'                                baggiko, bEDiko etc.

4.2 Information required to build MAG
The following information is required to build a morphological analyzer and generator:

4.2.1 Lexicon
The list of stems and affixes together with basic information about them (noun stem, verb stem etc.).

4.2.2 Morphotactics
The model of morpheme ordering that explains which classes of morphemes can follow other classes of morphemes inside a word, e.g., the rule that the Kannada plural morpheme follows the noun stem rather than preceding it.

4.2.3 Orthographic rules
These are spelling rules used to model the changes that occur in a word, usually when two morphemes combine. For example, insert a "yu" on the surface tape just when the lexical tape has a morpheme ending in 'e' (or i, etc.) and the next morphemes are "tt" (PRES) and "Ane" (3SM):

beLe + insert "yu" + PRES(tt) + 3SM(Ane) -> beLe-yu-tt-Ane = beLeyuttAne
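
The "yu" insertion rule above can be sketched as a single rewrite over a lexical string whose morphemes are separated by "+". This is a minimal illustration using a Python regular expression rather than the AT&T toolkit the paper uses; the function names `insert_yu` and `surface_form` and the "+"-separated input format are assumptions made for this example, and the rule covers only root-final 'e' and 'i'.

```python
import re

# Orthographic rule: a root-final 'e' or 'i' is followed by "yu" when the
# next morphemes are "tt" (PRES) and "Ane" (3SM).
YU_INSERTION = re.compile(r"([ei])\+(?=tt\+Ane$)")


def insert_yu(lexical: str) -> str:
    """Apply the rule to a '+'-separated morpheme string, keeping boundaries."""
    return YU_INSERTION.sub(r"\1+yu+", lexical)


def surface_form(lexical: str) -> str:
    """Apply the rule, then drop the morpheme boundaries to get the spelling."""
    return insert_yu(lexical).replace("+", "")


print(insert_yu("beLe+tt+Ane"))     # beLe+yu+tt+Ane
print(surface_form("beLe+tt+Ane"))  # beLeyuttAne
```

Strings that do not match the context, such as "Ane+gaLu", pass through `insert_yu` unchanged, which mirrors how an orthographic rule only fires in its stated environment.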

4.3 Creation of rules using FST
The proposed rule based MAG tool was developed using the AT&T Finite State Machine, with the rules written as finite state transducers (FST). A finite state transducer is essentially a finite state automaton that works on two (or more) tapes. The most common way to think about transducers is as a kind of "translating machine" which works by reading from one tape and writing onto the other. For example, on one tape we read an inflected noun and on the other we write its root followed by "+N +PL", or the other way around, as shown in figure 1. A pair of identical symbols, x:x, means read the symbol x on one tape and write the same x on the other tape. Similarly "+N:ε" means read a "+N" symbol on one tape and write nothing on the other.

Fig 1. FST working principle
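
The two-tape reading of a transducer can be made concrete with a toy example. The sketch below hand-codes one path for the noun of Table 1 (AnegaLu = Ane + gaLu) as arcs of lexical:surface pairs and runs it in both directions. The arc table `ARCS`, the state numbers and the function `transduce` are hypothetical names for this illustration, and pairing "+PL" with "gaLu" on a single arc is a simplifying assumption rather than the paper's actual rule set.

```python
EPS = ""  # epsilon: nothing is read or written on that tape

# Arcs of a toy transducer: state -> [(lexical symbol, surface symbol, next state)].
ARCS = {
    0: [("Ane", "Ane", 1)],   # x:x  copies the root to the other tape
    1: [("+N", EPS, 2)],      # +N:ε consumes the tag and writes nothing
    2: [("+PL", "gaLu", 3)],  # +PL is realised as the plural suffix
}
FINAL = {3}


def transduce(text, read_lexical=True, state=0):
    """Return every output for `text`, reading one tape and writing the other."""
    results = []
    if state in FINAL and text == "":
        results.append("")
    for lex, surf, nxt in ARCS.get(state, []):
        read, write = (lex, surf) if read_lexical else (surf, lex)
        if text.startswith(read):
            for rest in transduce(text[len(read):], read_lexical, nxt):
                results.append(write + rest)
    return results


print(transduce("Ane+N+PL"))                     # generation: ['AnegaLu']
print(transduce("AnegaLu", read_lexical=False))  # analysis:   ['Ane+N+PL']
```

The same arc table serves both generation and analysis; only the choice of which tape is read changes.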

FSTs can be used for both analysis and generation (they are bidirectional) and act as a two-level morphology, as shown in figure 2 [23]. A word is represented as a correspondence between a lexical level and a surface level. The lexical level represents a simple concatenation of the morphemes making up a word, while the surface level represents the actual spelling of the final word.

Fig 2. FST as Two-level morphology (lexical level: root +N +PL; surface level: the inflected word)

4.4 Architecture of Proposed MAG Model
With all the relevant morphological feature information of Kannada words, we have created well defined sandhi rules based on finite state transducers. The architecture of the proposed MAG tool is shown in figure 3.

Fig 3. Architecture of proposed MAG model. (Components shown: noun lexicon and verb lexicon; noun and verb morphotactic rules; noun and verb orthographic rules; MAG for nouns and MAG for verbs; the Kannada MAG model, with example input Kali+V+PRES+3SF and output KaliyuttALe.)

The system is based on the lexicon and orthographic rules of a two-level morphological system. For the morphological generator, if the string containing the root word and its morphemic information is accepted by the automaton, then it generates the corresponding root word and morpheme units in the first level, as shown in figure 4. Here "beLe" is the root word, "V" indicates the category of the root word as verb, "PRES" and "FUT" indicate the tense markers for present and future tense respectively, and 3SM indicates the PNG marker for third person singular masculine.

Fig 4. Example for Morphotactics Rule
The output of the first level becomes the input of the second level, where the orthographic (sandhi) rules are handled as shown in figure 5. If the string is accepted, the inflected word is generated.

Fig 5. Application of Sandhi Rule

The sandhi rule should be written in such a way that, if the root word ends with "e" and the next morphemes are "tt" (PRES) or "Ane" (3SM), then "yu" is inserted immediately after the root word. Figure 6 shows the corresponding sandhi rule.

Fig 6. Example for Sandhi Rule

4.5 GUI of Proposed MAG Model
Sample screenshots of the proposed MAG model for nouns are shown in figures 7 and 8. Similarly, figures 9 and 10 show screenshots of the proposed MAG model for verbs.

Fig 7. GUI of Kannada Morph analyzer for Noun
Fig 8. GUI of Kannada Morph generator for Noun
Fig 9. GUI of Kannada Morph analyzer for Verb
Fig 10. GUI of Kannada Morph generator for Verb

5. SYSTEM PERFORMANCE AND CONCLUSION
Development of a MAG is a challenging task for all types of word forms. The proposed MAG is capable of analyzing and generating a list of twenty thousand nouns, around three thousand verbs and a relatively smaller list of adjectives. The uniqueness of the proposed MAG is its capacity to generate and analyze transitive, causative and tense forms in addition to passive constructions, auxiliaries and verbal nouns. The performance of the proposed system can be substantially improved by adding more rules, such as rules for complex morphology. Also, by checking against more and more different types of word lexicons, the accuracy of
the proposed MAG can be improved. A rule based machine translation system for English to Kannada was developed using the proposed MAG.

6. ACKNOWLEDGMENTS
We express our sincere gratitude to Dr. Rajendran S (Tamil University, Tanjavur, Tamil Nadu, India) and Prof. M Shankaranarayana Bhat (Head of Kannada department and Principal, Junior College, Sampaje, Coorg, Karnataka, India) for their excellent support in generating the Kannada word paradigms. We also express our gratitude to Mr. Harsha (Research Scholar, CEN, AMRITA Vishwa Vidyapeetham, Coimbatore, India) and Ms. Dhanya (CEN, AMRITA Vishwa Vidyapeetham, Coimbatore, India) for their valuable support and encouragement in developing the rule based Kannada MAG tool.

7. REFERENCES
[1] Dhanalakshmi V., Anand Kumar M., Rekha R.U., Arun Kumar C., Soman K.P., Rajendran S.: 'Morphological Analyzer for Agglutinative Languages Using Machine Learning Approaches', Advances in Recent Technologies in Communication and Computing, International Conference on Advances in Recent Technologies in Communication and Computing, 2009.
[2] Uma Maheshwar Rao G, Parameshwari K: CALTS, University of Hyderabad, 'On the description of morphological data for morphological analyzers and generators: A case of Telugu, Tamil and Kannada'.
[3] Harri Jappinen, 'Knowledge engineering approach to morphological analysis', First Conference of the European Chapter of the Association for Computational Linguistics.
[4] Beesley, K. and L. Karttunen. 'Finite State Morphology'. Stanford, CA: CSLI Publications, 2003.
[5] Aduriz I, Agirre E., 'A word-grammar based morphological analyzer for agglutinative languages', University of the Basque Country.
[6] Karine Megerdoomian, 'Finite-State Morphological Analysis of Persian', Inxight Software, Inc, University of California, San Diego.
[7] Shuly Wintner, 'Hebrew Computational Linguistics: Past and Future', Artificial Intelligence Review 21: 113–138, 2004, Kluwer Academic Publishers.
[8] nlp.au-kbc.org/ma_language_format_final.pdf
[9] Shailly Goyal, 'Parsing Aligned Parallel Corpus by Projecting Syntactic Relations from Annotated Source Corpus', Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 301–308, Sydney, July 2006. Association for Computational Linguistics.
[10] Sajib Dasgupta, 'Morphological Analysis of Inflecting Compound Words in Bangla', BRAC University, Dhaka, Bangladesh.
[11] S Settar, Sanjoy Goswami, H.K Abhishek, 'Indexing Software for Ancient Kannada Books', Proceedings of the Language Engineering Conference (LEC'02).
[12] Sahoo K and Vidyasagar K V, 'Kannada Wordnet - A lexical database', TENCON 2003, Conference on Convergent Technologies for Asia-Pacific Region.
[13] T.N. Vikram & Shalini R Urs, (2007), 'Development of Prototype Morphological Analyzer for the South Indian Language of Kannada', Lecture Notes In Computer Science: Proceedings of the 10th international conference on Asian digital libraries: looking back 10 years and forging new frontiers. Vol. 4822/2007, 109-116.
[14] Antony P J, M. Anand Kumar and K.P. Soman, 'Paradigm based Morphological Analyzer for Kannada Language Using Machine Learning Approach', International Journal on Advances in Computational Sciences and Technology, ISSN 0973-6107, Volume 3 Number 4 (2010) pp. 457–481.
[15] Language in India, www.languageinindia.com/may2011/v11i5may2011.pdf.
[16] 202.41.85.68/knm-publications/morph-icosal2.pdf.
[17] Shambhavi. B. R, Dr. Ramakanth Kumar P, Srividya K, Jyothi B J, Spoorti Kundargi, Varsha Shastri G, 'Kannada Morphological Analyser and Generator Using Trie', International Journal of Computer Science and Network Security, VOL.11 No.1, January 2011.
[18] http//ccat.sas.upenn.edu/plc/kannada/grammar/KannadaChap.2.pdf.
[19] http//ccst.sas.upenn.edu/plc/kannada/grammar/KannadaChap.3.pdf.
[20] B.A Sharada, Transformation of Natural Language into Indexing Language: Kannada - A Case Study.
[21] Dr. K. Kushalappa Gouda: Kannada Sankshipta vyakarana, Kannada University, Hampi, Publication: Suvarna Karnataka, 2006.
[22] S.N. Sridhar: "KANNADA", a Kannada grammar book, Series Editor, Bernard Comrie.
[23] Dr. A.G. Menon, S. Saravanan, R. Loganathan and Dr. K. Soman, Amrita University, Coimbatore, India. 'Amrita Morph Analyzer and Generator for Tamil: A Rule Based Approach'.
