Proceeding Total Pages 422 428 2


Finite-state Approach based Myanmar Morphological Analysis

Tin Myo Latt, Aye Thida


University of Computer Studies, Mandalay

Abstract

Morphological analysis (MA) is needed in any Natural Language Processing (NLP) application. It means taking a word as input and identifying its stem and affixes. MA provides information about a word’s semantics and the syntactic role it plays in a sentence. This paper presents the development of a Myanmar morphological analysis and explores the morphological processes prevalent in the Myanmar language. We consider Myanmar MA for nouns, verbs, adjectives and adverbs. Myanmar morphology has three types: inflectional morphology, derivational morphology and compounding. In this work, a finite-state automaton (FSA) is used, together with a monolingual lexicon, to model Myanmar morphology. The MA is intended to be applied as a component of a grammar checker for detecting grammatical errors in Myanmar texts. The aim of this paper is to describe a framework for Myanmar morphological analysis.

Keywords: Morphology, finite-state automaton, Morphological Analysis

1. Introduction

Natural Language Processing (NLP) is a set of computational techniques for analyzing and representing text in Natural Language (NL) with linguistic analysis, in order to achieve human-like language processing for a range of tasks or applications. It deals with interactions between computers and human (natural) languages. Morphology is the field of linguistics that studies the internal structure of words.

MA allows us to reduce the size of the dictionary (lexicon), but we need a list of exceptions for every morphological rule we invent. MA refers to the computational processes which provide structural information about surface words in a language. The MA of a word is the investigation through the identification and study of morphemes, often defined as the smallest linguistic pieces with a grammatical function. So, computationally, the MA of a word consists of taking a word form as input and producing the structure of the word by showing the lexical category of the constituent morphemes.

This paper is organized as follows: Section 2 discusses the MA used for initial analysis of the input text as well as previous work that has been done in this area. Section 3 describes the nature of Myanmar grammatical categories. Section 4 states Myanmar morphology applying the FSA. Section 5 shows the building of the Finite-state Automaton for Words, and Section 6 contains the conclusions.

2. Related work

Ksh. Krishna B. Singha et al. [5] proposed a constrained finite-state model to represent the morphotactic rules of Manipuri adjective word forms. There was no adjective word category in the Manipuri language. By rule, this category was derived from verb roots with the help of some selected affixes applicable only to verb roots. A finite-state machine was used to describe the concatenation rules, and the corresponding nondeterministic and deterministic automata were developed for ease of computerization. A root lexicon of verb category words was used along with an affix dictionary in a database. The system was able to analyze and recognize a given word as an adjective by observing the morpheme concatenation rule defined with the help of finite-state networks.

Soe Lai Phyue et al. [7] presented a morphological processor (analyzer and generator), Morphocon, to support the inflectional verbal and colloquial cases for knowledge resources by using a rule-and-feature based model of Myanmar inflectional morphology. Using Morphocon in Myanmar language resources could reduce time and storage consumption. The evaluation of the correctness of Morphocon yields satisfactory results because precision,
recall and F-measure are close to or above 95% in both the morphological analyzer and the generator.

3. Morphology

Morphology is the study of the way words are built up from smaller meaning-bearing units, morphemes. Morphemes are either free or bound forms, with the free forms corresponding to word-level units and the bound forms to a closed class of grammatical affixes. For example, the word (river) consists of a single morpheme (the morpheme ) while the word (cats) consists of two: the morpheme (cat) and the morpheme (-s). [2]

Morphemes are divided into two types, open class and closed class. Open class items belong to categories/types to which new members may be freely added. Closed class items, on the other hand, belong to categories/types to which new members cannot be added.

There are many ways to combine morphemes to create words. This paper presents three of these methods that are common in Myanmar morphology: inflection, derivation and compounding.

3.1 Inflectional Morphology

Inflection is the combination of a word stem with a grammatical morpheme, usually resulting in a word of the same class as the original stem, and usually filling some syntactic function like agreement. Myanmar has a relatively simple inflectional system, which covers nouns, verbs and adjectives but not adverbs.

3.2 Derivational Morphology

Derivation is the combination of a word stem with a grammatical morpheme, usually resulting in a word of a different class, often with a meaning hard to predict exactly. For example, the verb can take the derivational suffix ျခင္း to produce the noun ျခင္း. It is not at all unusual for derivational affixes to change verbs into nouns or adjectives, or adjectives into nouns or verbs. Derivational affixation can change category.

3.3 Compounding morphology

Compounding is the combination of multiple word stems together. For example, ပဲျပဳတ္/pe: bjou`/ (boiled pea), ေန႔စဥ္မွတ္တမ္း/nei. zin hma` tan:/ (diary), မ်က္ႏွာစံုညီစည္းေဝးပြဲ/mje`hna soun nji si wei: pwe:/ (plenary meeting).

4. Finite State Morphological Parsing

A Myanmar word can be divided into smaller units. For example, if the word ္ (play) is given to the morphological parser, it will generate the output + V and ္ + PresentTense. is the root morpheme and ္ is a postposition of the verb (PresentTense); these are morphological features. These features specify additional information about the stem. In order to build a morphological parser we need at least the following: (1) Lexicon (2) Morphotactics (3) Orthographic Rules. [2]

This paper describes the lexicon and morphotactics for Myanmar morphological analysis. Although English requires orthographic rules such as the consonant doubling, E insertion and E deletion rules, the Myanmar language does not need them.

4.1. Lexicon

The lexicon is the list of stems and affixes, together with basic information about them (whether a stem is a noun stem or a verb stem, etc.). Every lexicon entry belongs to a certain class, as in the following example:

Morpheme 1:
/gaza/ (play)
Class: Verb_Stem or Root
Feature: Parts of Speech = Verb

Morpheme 2:
ျခင္း/chin/ (particle for noun phrase change)
Class: Noun_Suffix
Feature: Parts of Speech = Particle

All the lexicon entries of a given class are stored in an FSA. Myanmar morphological analysis needs a lexicon that contains the stems and affixes.

4.2. Morphotactics

Myanmar morphology is rich and complex. Morphotactics represents the restrictions placed on the ordering of morphemes. Morphotactics can be concatenative, with morphemes either prefixed or suffixed to stems. A basic morphotactic fact about affixes is where they attach with respect to the stem.
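
As a rough illustration of the lexicon entries in Section 4.1, the following Python sketch stores each morpheme with its class and part-of-speech feature and splits a word into a stem followed by an optional suffix. It is only a sketch under stated assumptions, not the implementation of this framework: the romanized keys `gaza` and `chin` stand in for the Myanmar forms, the names `LexEntry`, `LEXICON` and `analyze` are hypothetical, and morphemes are assumed to be concatenated without separators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexEntry:
    form: str         # romanized form (an assumption made for readability)
    morph_class: str  # e.g. "Verb_Stem" or "Noun_Suffix"
    pos: str          # the "Parts of Speech" feature


# Two entries taken from the example in Section 4.1; keys are romanized.
LEXICON = {
    "gaza": LexEntry("gaza", "Verb_Stem", "Verb"),
    "chin": LexEntry("chin", "Noun_Suffix", "Particle"),
}


def analyze(word):
    """Return [stem] or [stem, suffix] as lexicon entries, or None if no split fits."""
    for i in range(1, len(word) + 1):
        stem = LEXICON.get(word[:i])
        if stem is None or not stem.morph_class.endswith("_Stem"):
            continue  # the prefix word[:i] is not a known stem
        rest = word[i:]
        if rest == "":
            return [stem]  # a bare stem
        suffix = LEXICON.get(rest)
        if suffix is not None and suffix.morph_class.endswith("_Suffix"):
            return [stem, suffix]  # Stem + Suffix ordering
    return None


if __name__ == "__main__":
    for entry in analyze("gazachin") or []:
        print(entry.form, entry.morph_class, entry.pos)
```

A dictionary lookup is used here only to keep the sketch short; the paper itself stores the entries of each class in an FSA.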

Prefix + Stem + Suffix

An affix is either a prefix or a suffix; Plural - is a suffix, အ- is a prefix. Myanmar morphological analysis is carried out by building FSAs.

5. Building of Finite-state Automaton for Words

The objective of the FSA is to use Myanmar morphology to determine whether or not an input string makes up a legitimate Myanmar word. Given an input string, an FSA will either accept or reject the input. An FSA can use any set of symbols for its alphabet, including words. An FSA using all possible affixes is built.

An FSA defines a language, that is, a set of strings over some alphabet ∑. It is specified by:
• A set of states Q
• A designated start state q0 (q0 ∈ Q)
• A set of accepting (final) states, which is a subset of Q
• Edges: given the current state qi and input x ∈ ∑, an edge gives the new state qj

5.1. Inflection of Noun

Myanmar nouns are either regular or irregular. They have three kinds of inflection, and an affix marks the plural. Nouns in Myanmar are pluralized by suffixing the particle ေတြ [twe] in colloquial Burmese or မ်ား [myar] in formal Burmese. The particle တို႔ [tou], which indicates a group of people or things, is also suffixed to the modified noun.

The numbered circles (nodes) represent states and the labeled arcs represent transitions from one state to another. Here the start state is the circle numbered q0. The double circle denotes the final (accepting) state. The label on each arc indicates that a transition is possible only when the labeled string matches the input text.

The FSA in Figure 1 assumes that the lexicon includes regular nouns and irregular nouns that take the regular - plural.

Figure 1. An FSA for a fragment of noun inflectional morphology

5.2. Inflection of Verb

Verbs in the Myanmar language have three tenses: the present tense, the past tense and the future tense. A verbal postposition used to express both the present and the past tense of the verb is called a verbal postposition of present tense. These are သည္/thi/ (word indicating the verb ending a sentence), ၏/i/ (word placed at the end of an affirmative sentence) and ၿပီ/pji/ (word following a verb indicating that an action is taking place or has already taken place). A postposition that expresses the future tense of the verb is called a verbal postposition of future tense. These are မည္/mji/ (shall or will), လိမ့္မည္/lein mji/ (shall or will), အံ့ /an/ (shall or will) and လတၱံ႔/latan/ (shall or will) [3].

The word ခ/khe/ is a particle suffixed to verbs to emphasize the definitiveness of an action or condition. It is not described as a past-tense suffix in MLC, 2006 [3, page 253], so it is not included in the FSA in Figure 2.

The FSA in Figure 2 assumes that the lexicon includes verb stems plus three kinds of tense suffix (present, past and future). A particle may occur between the stem and the suffix. The FSA has five states. From the start state q0, if the input word is a verb in the lexicon, the FSA moves to state q1. If the next word is a particle, it moves to state q2, and if the word after that is a tense marker, it moves to state q3, which is the final state. On the other path, a tense marker read in state q1 leads directly to the final state q3.

Figure 2. An FSA for a fragment of verb inflectional morphology

5.3. Inflection of Adjective

Adjectives can be divided into three stages: the normal stage (positive degree), the superior stage (comparative degree) and the most superior stage (superlative degree). The normal stage is the base form of the adjective that precedes the head
nominal and is marked with the particle . The comparative degree is expressed in Myanmar by “ ၍/ (more) or ၍/pou jwei./ (more)” while in English the comparative degree is marked by placing the word “more” before an adjective or by adding the suffix “er” after an adjective. In Myanmar, the superlative degree is formed by prefixing “အ/a./” and affixing “ /hsoun:/ (most)” to the adjective. In Figure 3, the start state is q0 and the final state is q5.

Figure 3. An FSA for a fragment of adjective (degree) inflectional morphology

5.4. Derivation of Noun

A noun formed by using a particle before or after an attributive word is called an attributive noun. The particles used to join together with an attributive word to form an attributive noun are “အ/a/, မႈ/mu./ and ျခင္း/chin:/”. For example, ေကာင္းမႈ/kaun: mu./ (good deed), လွျခင္း/hla. kjin:/ (beauty), ထူးျခားခ်က္/htu: kja: kje./ (being distinctive).

In Figure 4, starting from state q0, an adjective or verb as input moves the FSA to state q1. From state q1, a suffix as the next input moves it to the final state q2.

Figure 4. An FSA for a fragment of noun derivational morphology

5.5 Derivation of Verb

An adjective and a verbal postposition can be combined to form a derived verb in Myanmar. As shown in Figure 5, the FSA recognizes an adjective followed by a tense marker or a particle. If the next input is a particle, it must then be followed by a tense marker.

Figure 5. An FSA for a fragment of verb derivational morphology

5.6. Derivation of Adjective

In Myanmar, adjective-forming particles such as ‘ /tho:/ (adjective), /thi./ (which, that) and /mji./ (really)’ are used together with verbs in the way English verbal adjectives are.

In Figure 6, the FSA checks that the part of speech of the input word is verb; it starts in state q0 and then moves to state q1. From state q1, a suffix as the next input moves it to the final state q2.

Figure 6. An FSA for a fragment of Myanmar adjective derivational morphology

5.7. Derivation of Adverb

Adverbs make the sense of the sentence more profound by combining word classes in terms of structure apart from meaning. In most cases, the adverb is not a major element in the structure of the sentence.

In terms of structure, adverbs are reduplicated adverbs, affixed adverbs and rhyming adverbs. This paper covers only some of these adverbs, not all of them.

A particle-suffixing adverb in Myanmar is an adverb formed by affixing the particle ‘စြာ/swa/ (-ly)’ after a verb or an adjective. An adverb of manner in Myanmar is a word used to modify a verb, expressing how someone behaves or how something is done [1]. For example, ရုိေသစြာ /jou thei swa/ (respectfully), လ်ွင္ျမန္စြာ/hlin mjan swa/ (quickly), ခင္မင္္စြာ/khin min swa/ (affectionately), ခ်ိဳသာစြာ /chou tha swa/ (sweet and approving).
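
To make the accept/reject behaviour concrete, the following Python sketch implements a small deterministic FSA over part-of-speech labels and instantiates it for adverbs formed with the particle /swa/: a verb or adjective, then the suffix. It is a minimal sketch under stated assumptions rather than the system proposed here: the class name `FSA`, the function `recognize`, the romanized lexicon keys and the part-of-speech tags given to the two stems are all illustrative assumptions.

```python
class FSA:
    """Deterministic finite-state automaton over part-of-speech labels."""

    def __init__(self, transitions, start, finals):
        self.transitions = transitions  # {(state, label): next_state}
        self.start = start
        self.finals = set(finals)

    def accepts(self, labels):
        state = self.start
        for label in labels:
            key = (state, label)
            if key not in self.transitions:
                return False  # no arc for this input, so reject
            state = self.transitions[key]
        return state in self.finals  # accept only in a final state


# Hypothetical mini-lexicon: romanized keys and the tags of the two stems
# are assumptions made for illustration only.
LEXICON = {
    "hlin mjan": "Adjective",
    "jou thei": "Verb",
    "swa": "Suffix",
}

# q0 --Verb/Adjective--> q1 --Suffix--> q2 (final)
ADVERB_FSA = FSA(
    transitions={
        ("q0", "Verb"): "q1",
        ("q0", "Adjective"): "q1",
        ("q1", "Suffix"): "q2",
    },
    start="q0",
    finals={"q2"},
)


def recognize(morphemes):
    """Look up each morpheme's label and run the adverb FSA on the labels."""
    labels = [LEXICON.get(m, "Unknown") for m in morphemes]
    return ADVERB_FSA.accepts(labels)


if __name__ == "__main__":
    print(recognize(["hlin mjan", "swa"]))
    print(recognize(["swa"]))
```

Keeping the transition table as plain data means the same `FSA` class could be reused for the other word classes by supplying a different table.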

In Figure 7, the FSA recognizes verbs or adjectives. The start state is q0; if the part of speech of the input word is verb or adjective, the FSA moves to state q1. From state q1, if the next input is a suffix, it moves to the final state q2 (accept); otherwise the input is rejected.

Figure 7. An FSA for a fragment of adverbs (particle-suffixed) derivational morphology

A particle-affixing adverb (mid and end) in Myanmar is an adverb formed by affixing such particles as ‘ခ်ည္္-ခ်ည္ /chi chi/, လိုက္-လိုက္ /lai`/’ after a verb or an adjective, or in the middle of them [1]. For example, ဝင္ခ်ည္ထြက္ခ်ည္ /win chi htwe` chi/ (coming in and out alternately), ပူလိုက္ေအးလိုက္ /pu lai` ei: lai`/ (being hot and cold alternately). In addition, particles such as မိ-ရာ combine with some verbs such as ေတြး၊ ေငး၊ ထင္၊ ေျပာ, and the combinations are used as adverbs.

In Figure 8, the FSA recognizes verbs, adjectives and reduplicated verbs. From the start state q0, a verb in the lexicon moves the FSA to state q1, an adjective moves it to state q2, and a reduplicated verb moves it to state q3. From state q1, if the next input words are an infix and a suffix, the FSA moves to the final state q4 (accept); otherwise the input is rejected.

Figure 8. An FSA for a fragment of adverbs (particle-infixed and suffixed) derivational morphology

5.8. Compounding of Noun

A compound noun is a noun formed by joining a noun, a verb, a pronoun, an adjective and an adverb together without putting prepositions, particles or conjunctions between them [1].

From the start state q0, the FSA makes a choice, going to either q1 or q2, which are the new states for state q0 on a noun or verb input. If the FSA goes to q1 and the part of speech of the next input word is noun or adjective, the new state is q3 or q5, both of which are final states. Processing continues in this way, and there may be many choices. There are several final states: q3, q4, q5, q6, q7, q8, q9, q10 and q12.

Figure 9. An FSA for a fragment of noun compounding morphology

Table 1. Myanmar compounding morphology for some nouns

5.9. Compounding of Verb

Compound verbs are verbs formed from two words and affixed with a verbal postposition. For example, ႏႈတ္ခြန္းဆက္သ/hnou` khun: hse` tha. thi/ (greet).

The FSA checks the part of speech of each input word to decide whether to accept or reject the input. If the input is a correct compound verb, it is accepted.
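
The branching behaviour described for compound nouns, where a state may have several outgoing arcs, can be sketched with a set-of-states simulation, which also covers the nondeterministic case. The sketch below is hedged: it encodes only the arcs that the text of Section 5.8 spells out (q0 to q1 on a noun, q0 to q2 on a verb, q1 to q3 or q5 on a noun or adjective), the pairing of noun with q3 and adjective with q5 is an assumption, the arcs leaving q2 are omitted because the text does not give them, and the names `fsa_accepts`, `COMPOUND_NOUN` and `FINALS` are hypothetical.

```python
def fsa_accepts(transitions, start, finals, labels):
    """Simulate a (possibly nondeterministic) FSA by tracking a set of states."""
    current = {start}
    for label in labels:
        # Follow every arc that matches the label from every current state.
        current = {
            nxt
            for state in current
            for nxt in transitions.get((state, label), ())
        }
        if not current:
            return False  # no state can read this input, so reject
    return bool(current & finals)


# Fragment only: arcs mentioned in Section 5.8. Which label leads to q3 and
# which to q5 is an assumption; arcs out of q2 are left out.
COMPOUND_NOUN = {
    ("q0", "Noun"): {"q1"},
    ("q0", "Verb"): {"q2"},
    ("q1", "Noun"): {"q3"},
    ("q1", "Adjective"): {"q5"},
}
FINALS = {"q3", "q5"}


if __name__ == "__main__":
    print(fsa_accepts(COMPOUND_NOUN, "q0", FINALS, ["Noun", "Noun"]))
    print(fsa_accepts(COMPOUND_NOUN, "q0", FINALS, ["Noun"]))
```

Extending the fragment to the full automaton would only require adding the remaining arcs and final states to the two tables.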

Figure 10. An FSA for a fragment of verb compounding morphology

5.10. Compounding of Adjective

A compound adjective in Myanmar consists of at least one adjective and is formed from that adjective together with a noun or another adjective [1]. For example, ရုပ္ေျဖာင့္/jou` hpjaun./ (handsome), ေသးသြယ္/thei: dhwe/ (slim), သတၱိေျပာင္/tha` ti. pjaun/ (bold).

In Figure 11, the FSA starts in state q0. An adjective as input moves it to state q1, and a further adjective input leads to either state q2 or q3. State q4 is selected when two consecutive adjectives have been checked, and q4 is the final state (accept). If state q2 is selected instead, the FSA continues through further states until it reaches the final state. Reaching the final state means that the input words form a valid compound adjective. If the FSA does not reach state q4, the input words are rejected as invalid.

Figure 11. An FSA for a fragment of adjective compounding morphology

5.11. Compounding of Adverb

Compound adverbs of Myanmar are formed by joining a noun and a noun, a noun and an adjective, or a verb and a verb [1]. For example, ပိုက္စိပ္တိုက္/pai` sei` tai`/ (searching closely), ေကာင္းေကာင္း /kaun: kaun:/ (well).

In Myanmar, two adjectives or verbs can also be joined together to form an adverb. Such an adverb is called a double adverb and forms one word, for example ျမန္ျမန္ (quickly), ခင္ခင္မင္မင္ (with friendliness), ခ်ိဳခ်ိဳသာသာ (sweetly). Three adjectives can also be joined together to form an adverb, as in ခ်စ္ခ်စ္ေတာက္ (blazingly, blisteringly, feverishly), စိုထိုင္းထိုင္း (be humid, be damp). The word ခ်စ္ခ်စ္ေတာက္ is divided into two morphemes, ခ်စ္ခ်စ္ and ေတာက္. The morpheme ခ်စ္ (burnt) is an adjective. [3]

In Figure 12, the FSA starts in state q0. On an adjective or verb input, it chooses among the states q1, q2, q3, q4 and q5. Once it has selected a state, it moves on to the final state q5; otherwise it reaches a rejecting state.

Figure 12. An FSA for a fragment of adverbs (reduplicated) compounding morphology

6. Conclusion

Morphological analysis is a very important and basic application of Natural Language Processing. It is needed for the Myanmar language because Myanmar is a morphologically rich and agglutinative language. This paper describes a framework for morphological analysis of Myanmar word classes based on the finite-state automaton approach. It presents the framework but not an implementation. The work focuses on the analysis of nouns, verbs, adjectives and adverbs.

References

[1] Aung Zin Minn, A Comparative Study of the Two Grammatical Systems of Written English & Myanmar and Its Significance to Learning English as a Foreign Language, Department of English, University of Mandalay, Myanmar, May 2009.
[2] Daniel Jurafsky & James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Copyright c 2006, All rights reserved. Draft of June 25, 2007.
[3] Department of the Myanmar Language Commission, Myanmar-English Dictionary, Ministry of Education, Myanmar, 2006.
[4] http://en.wikipedia.org/wiki/Burmese_Language
[5] Ksh. Krishna B. Singha et al., “Morphotactics of Manipuri Adjectives: A Finite State Approach”, I.J. Information Technology and Computer Science, 2013, 09, 94-100.
[6] Paulette M. Hopple, The Structure of Nominalization in Burmese, SIL International, 2011.
[7] Soe Lai Phyue et al., “Morphological Processor for Inflectional Case of Multipurpose Lexico-Conceptual Knowledge Resource”, International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), Volume 1, Issue 7, September 2012.
[8] Thang Khan Dim et al., A Contrastive Study of Adverbs of Manner in German and Myanmar, The Government of the Republic of the Union of Myanmar, Ministry of Education, Universities Research Journal, 2012, Vol. 5, No. 7.
