IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 5, No 2, September 2011
ISSN (Online): 1694-0814
www.IJCSI.org 90

Grammatical Relations of Myanmar Sentences Augmented by Transformation-Based Learning of Function Tagging

Win Win Thant¹, Tin Myat Htwe² and Ni Lar Thein³

¹ University of Computer Studies, Yangon, Myanmar
² Computer Software Department, University of Computer Studies, Yangon, Myanmar
³ University of Computer Studies, Yangon, Myanmar

Abstract
In this paper we describe function tagging for Myanmar using Transformation-Based Learning (TBL), a method that extends our previous statistics-based function tagger. Contextual and lexical rules (developed using TBL) were critical in achieving good results. First, we describe a method for expressing lexical relations in function tagging that statistical function tagging is currently unable to express. Function tagging is the preprocessing step for showing the grammatical relations of sentences. We then use the context-free grammar technique to clarify the grammatical relations in Myanmar sentences and to output parse trees. Grammatical relations are the functional structure of a language, and they depend heavily on the function tags of the tokens. We augment the grammatical relations of Myanmar sentences with transformation-based learning of function tagging.

Keywords: Function Tagging, Grammatical Relations, Transformation Based Learning, Context Free Grammar, Parse Tree.

1. Introduction
Statistical natural language processing (NLP) research on the Myanmar language can only be advanced through the creation of annotated corpora for Myanmar. Very little function-annotated tagged corpus is available for Myanmar, so most techniques suffer from the data sparseness problem. We present a method that extends a pre-existing function tagger, and we augment grammatical relations with transformation-based learning of function tagging.

Function tagging and grammatical relations are important steps in Myanmar-to-English machine translation. Function tagging is the process of marking up each word in a text with a corresponding function tag, such as Subj, Obj, Tim or Pla, based on both its definition and its context [1]. Function taggers have been developed using statistical implementations, linguistic rules, and sometimes both. Identifying the function tags in a given text is an important aspect of any natural language application. We apply TBL to function tagging by extending the Naïve Bayesian function tagger proposed in [2]. The number of function tags in a tagger may vary depending on the information one wants to capture. In the sentence below, the function tag is appended to the end of each word with '#'. For example:
သူ#PSubj သည္#SubjP ေက်ာင္း#PPla သို႔#PlaP သြားသည္#Verb
(He goes to school.)

Determining grammatical relations is the process of analyzing an input sequence in order to determine its grammatical structure with respect to a given grammar. Grammatical relations show the sentence structure of the Myanmar language by using the function tags of the words in a sentence. We describe context-free grammar (CFG) based grammatical relations for Myanmar sentences. In the simple sentence below, the grammatical relation is appended to the end of each phrase with '#'. For example:
သူသည္#Subj ေက်ာင္းသို႔#Pla သြားသည္#Verb
(He goes to school.)

In the complex sentence below, the grammatical relation is likewise appended to the end of each phrase with '#'. For example:
မိုးရြာ#Verb ေသာေၾကာင့္#CCS ကၽြန္မ#Subj ေစ်းသို႔#Pla မသြားပါ#Verb
(I do not go to the market because it rains.)
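The '#' notation in these examples is easy to process mechanically. A minimal Python sketch, assuming segments are separated by whitespace and each segment ends with '#' followed by its tag; `split_tagged` is a hypothetical helper, not part of the authors' system:

```python
def split_tagged(line: str) -> list[tuple[str, str]]:
    """Split whitespace-separated 'segment#Tag' tokens into (segment, tag) pairs."""
    pairs = []
    for token in line.split():
        # The tag is whatever follows the last '#'.
        segment, sep, tag = token.rpartition("#")
        if not sep:  # no '#': the token carries no tag
            raise ValueError(f"untagged token: {token!r}")
        pairs.append((segment, tag))
    return pairs


# Word-level function tags and phrase-level grammatical relations from above.
word_level = split_tagged("သူ#PSubj သည္#SubjP ေက်ာင္း#PPla သို႔#PlaP သြားသည္#Verb")
phrase_level = split_tagged("သူသည္#Subj ေက်ာင္းသို႔#Pla သြားသည္#Verb")

print([tag for _, tag in word_level])    # ['PSubj', 'SubjP', 'PPla', 'PlaP', 'Verb']
print([tag for _, tag in phrase_level])  # ['Subj', 'Pla', 'Verb']
```

A token without '#' raises an error instead of being silently dropped, so malformed annotations surface early.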

2. Related Work

In our previous work [2], we proposed 39 function tags for the Myanmar language, addressed the problem of assigning function tags to Myanmar words, and used a small function-annotated tagged corpus as training data. For function tagging, we used the output of a morphological analyzer, which provides Myanmar sentences with correct segmentation, POS (part-of-speech) tags and chunking information. We used Naïve Bayesian statistics to disambiguate the possible function tags of each word in a sentence, and evaluated function tagging performance on simple and complex sentences. We concluded with remarks on tagging accuracy, illustrated by examples of the most frequent error types.

Yong-uk Park and Hyuk-chul Kwon [3] addressed disambiguation in Korean syntactic analysis using dependency rules and segmentation. Segmentation is performed during parsing: if two adjacent morphemes have no syntactic relation, their analyzer inserts a new segment boundary between them, finds all possible partial parse trees for that segmentation, and combines them into complete parse trees. They also used adjacency rules and adverb subcategorization to disambiguate the syntactic analysis. Their analyzer uses morphemes as the basic unit of parsing.

Mark-Jan Nederhof and Giorgio Satta [4] considered the problem of parsing non-recursive context-free grammars, i.e., context-free grammars that generate finite languages, and presented two tabular algorithms for these grammars, based on the CYK (Cocke–Younger–Kasami) algorithm and on Earley's algorithm. As the CFG (context-free grammar) to be parsed, they used a small hand-written grammar of about 100 rules. They ordered the input grammars by size, according to the number of nonterminals (or the number of nodes in the forest, following the terminology of Langkilde (2000)).

3. Myanmar Language

The Myanmar language (Burmese) belongs to the Tibeto-Myanmar (Tibeto-Burman) group of the Sino-Tibetan family. It is a morphologically rich, agglutinative language, and Myanmar words are postpositionally inflected with various grammatical features.

3.1 Grammatical Hierarchy in Myanmar

The grammatical hierarchy is a useful notion of successively included levels of grammatical construction operating within and between grammatical levels of analysis [5]. This study assumes the hierarchy as a heuristic principle for laying a foundational understanding of Burmese grammatical units and constructions. It is a compositional hierarchy in which lower levels typically are filler units for the next higher level (Longacre 1970, Pike and Pike 1982). Table 1 shows the hierarchy from the highest level to the lowest.

Table 1: Grammatical Hierarchy
Text
Paragraph
Sentence
Clause
Phrase
Word
Morpheme

3.2 Sentences of Myanmar Language

According to its syntactic structure, the Myanmar language has two kinds of sentences [6][7]: the simple sentence (SS) and the complex sentence (CS). Fig 1 shows the syntactic structure of the Myanmar language.

Fig 1: Syntactic Structure

3.2.1 Simple Sentence
A simple sentence contains only one clause. Its two basic phrases are the subject phrase and the verb phrase. For example:
သူ (Subject phrase) အိပ္ေနသည္ (Verb phrase)
(He is sleeping.)

However, a simple sentence can also consist of only one phrase, which may be a verb phrase or a noun phrase. For example:
စားပါ (Verb phrase)
(Eat.)
(သူဘယ္သူလဲ) မမ (Noun phrase)
((Who is she?) Ma Ma.)

A simple sentence can also consist of two or three phrases. For example:
သြား (Object phrase) တိုက္ (Verb phrase)
(Brush (your) teeth.)
ရန္ကုန္ တြင္ (Place phrase) ေနသည္ (Verb phrase)
(Lives in Yangon.)

Myanmar phrases can be written in any order as long as the verb phrase is at the end of the sentence. For example:
ဦးဘသည္ မႏၱေလးမွ ျပန္လာသည္။ (Subject, Place, Verb)
မႏၱေလးမွ ဦးဘသည္ ျပန္လာသည္။ (Place, Subject, Verb)
(Both mean: U Ba comes back from Mandalay.)

A simple sentence can be extended by placing other phrases between the subject phrase and the verb phrase. All of the following are simple sentences, because each contains only one clause, although a simple sentence can become quite long. For example:
ဦးဘသည္ ျပန္လာသည္။
U Ba comes back.
ဦးဘသည္ မႏၱေလးမွ ျပန္လာသည္။
U Ba comes back from Mandalay.
ဦးဘသည္ မႏၱေလးမွ ရန္ကုန္သို႔ ျပန္လာသည္။
U Ba comes back from Mandalay to Yangon.
ဦးဘသည္ မႏၱေလးမွ ရန္ကုန္သို႔ မီးရထားျဖင့္ ျပန္လာသည္။
U Ba comes back from Mandalay to Yangon by train.
ဦးဘသည္ မႏၱေလးမွ ရန္ကုန္သို႔ မီးရထားျဖင့္ မနက္က ျပန္လာသည္။
U Ba comes back from Mandalay to Yangon by train in the morning.
ဦးဘသည္ ေမာင္ေမာင္ႏွင့္အတူ မႏၱေလးမွ ရန္ကုန္သို႔ မီးရထားျဖင့္ မနက္က ျပန္လာသည္။
U Ba comes back from Mandalay to Yangon by train in the morning with Mg Mg.

A simple sentence can also be extended by adding words to phrases such as the subject phrase, object phrase, time phrase and verb phrase. These added words are called emphatic phrases. For example:
ပါေမာကၡ ဦးဘသည္ သား ေမာင္ေမာင္ႏွင့္အတူ အထက္ မႏၱေလးမွ ၿမိဳ႕ေတာ္ ရန္ကုန္သို႔ အျမန္ မီးရထားျဖင့္ မေန႔ နံနက္က ေခ်ာေခ်ာေမာေမာ ျပန္လာသည္။
Professor U Ba and his son Mg Mg came back safely from upper Mandalay to the capital Yangon by express train yesterday morning.

3.2.2 Complex Sentence
A complex sentence consists of two or more clauses (simple sentences) joined by postpositions, particles or conjunctions, so it contains at least two verbs.

A complex sentence has two kinds of clause: the independent clause (IC) and the dependent clause (DC). The DC comes before the IC. A complex sentence contains one independent clause and at least one dependent clause. A DC has the same form as an IC, but it must end with a clause marker (CM). A clause marker may be a postposition, a particle or a conjunction [8][9]. There are three kinds of dependent clause, depending on the clause marker.

(1) Noun DC (joined by postpositions such as မွာ၊ က၊ ကို)
မမ ေစ်းသို႔ သြားသည္ ကို ကၽြန္မ ျမင္သည္။
I see that Ma Ma goes to the market.
Noun DC: မမ ေစ်းသို႔ သြားသည္ ကို
IC: ကၽြန္မ ျမင္သည္။

(2) Adjective DC (joined by particles such as ေသာ၊ သည့္၊ မည့္)
မမ ေပးေသာ စာအုပ္ ကို ကၽြန္မ ဖတ္သည္။
I read the book that is given by Ma Ma.
Adjective DC: မမ ေပးေသာ (စာအုပ္)
IC: စာအုပ္ ကို ကၽြန္မ ဖတ္သည္။

(3) Adverb DC (joined by conjunctions such as ေသာေၾကာင့္၊ လ်က္၊ သျဖင့္)
မိုးရြာေန ေသာေၾကာင့္ ကၽြန္မေစ်းသို႔ မသြားပါ။
I do not go to the market because it is raining.
Adverb DC: မိုးရြာေန ေသာေၾကာင့္
IC: ကၽြန္မေစ်းသို႔ မသြားပါ။

4. Corpus Creation

Our corpus is built manually. We extended the function-annotated tagged corpus proposed in [2] by adding sentences from Myanmar newspapers and historical books. The corpus consists of approximately 5000 sentences with an average length of 15 words. It is not a balanced corpus: it is somewhat biased toward Myanmar middle-school textbooks. The corpus keeps growing because tested sentences are automatically added to it. Table 2 lists the text collections, such as Myanmar textbooks and historical books. In our corpus, each sentence is annotated with chunks, function tags, Myanmar words and their POS tags with categories. Fig 2 shows an example corpus sentence.
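This annotation format can be read mechanically. The Python sketch below assumes the layout shown in Fig 2: chunks separated by '#', each written as CHUNK@FunctionTag[word/category,...], optionally followed by sentence-final punctuation. The names `Chunk` and `parse_corpus_sentence` are hypothetical and do not come from the authors' tools.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_type: str               # e.g. "NC" (noun chunk), "VC" (verb chunk)
    function_tag: str             # e.g. "Subj", "PPla", "Active"
    words: list[tuple[str, str]]  # (Myanmar word, POS tag with category)


# CHUNK@TAG[word/cat,word/cat] followed by optional punctuation such as "။".
CHUNK_RE = re.compile(r"([A-Za-z]+)@([A-Za-z]+)\[([^\]]*)\](.*)")


def parse_corpus_sentence(line: str) -> list[Chunk]:
    """Parse one annotated corpus sentence into its chunks."""
    chunks = []
    for part in line.split("#"):
        part = part.strip()
        if not part:
            continue
        match = CHUNK_RE.fullmatch(part)
        if match is None:
            raise ValueError(f"unrecognized chunk: {part!r}")
        chunk_type, tag, body, _punctuation = match.groups()
        words = []
        for item in body.split(","):
            word, _, category = item.rpartition("/")  # category after last '/'
            words.append((word, category))
        chunks.append(Chunk(chunk_type, tag, words))
    return chunks


# The sentence shown in Fig 2.
FIG2 = ("VC@Active[မိုး႐ြာ/v.common] #CC@CCS[လွ်င္/cc.sent] # "
        "NC@Subj[ကေလး/n.person,မ်ား/part.number] # NC@PPla[လမ္း/n.location] # "
        "PPC@PlaP[ေပၚတြင္/ppm.place] # NC@Obj[ေဘာလံုး/n.objects] # "
        "VC@Active[ကန္ၾက/verb.common]# SFC@Null[သည္/sf.declarative]။")

for chunk in parse_corpus_sentence(FIG2):
    print(chunk.chunk_type, chunk.function_tag, [cat for _, cat in chunk.words])
```

Splitting on '#' is safe only because neither the words nor the category labels contain that character; a corpus where they might would need a proper tokenizer.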
Table 2: Corpus Statistics

Text type | No. of sentences
Myanmar textbooks of middle school | 1200
Myanmar grammar books | 700
Myanmar websites | 900
Myanmar newspapers | 750
Myanmar historical books | 1150
Others | 300
Total | 5000

VC@Active[မိုး႐ြာ/v.common] #CC@CCS[လွ်င္/cc.sent] # NC@Subj[ကေလး/n.person,မ်ား/part.number] # NC@PPla[လမ္း/n.location] # PPC@PlaP[ေပၚတြင္/ppm.place] # NC@Obj[ေဘာလံုး/n.objects] # VC@Active[ကန္ၾက/verb.common]# SFC@Null[သည္/sf.declarative]။
(If it rains, the children kick the ball on the road.)

Fig 2: A sentence in the corpus

5. Function Tagging by Transformation-Based Learning

Transformation-based learning starts with a supervised training corpus that specifies the correct values of some linguistic feature of interest, a baseline heuristic for predicting the values of that feature, and a set of rule templates that determine a space of possible features in the neighborhood of a word; the action of each rule is to change the system's current guess about the feature for that word. The lexical and contextual rules are generated from the training corpus [10].

We are not concerned with finding the correct attachment of prepositional phrases. The Naïve Bayesian assumptions are crude for many properties of natural language syntax. We describe a method for expressing lexical relations in function tagging that statistical function tagging [2] is currently unable to express. One strength of this method is that it can exploit a wider range of lexical and syntactic regularities; in particular, tags can be conditioned on words and on wider contexts. Transformation-based tagging encodes complex interdependencies between words and tags by selecting and sequencing transformations that turn an initial imperfect tagging into one with fewer errors [11]. Training a transformation-based tagger requires an order of magnitude fewer decisions than estimating the large number of parameters of a Naïve Bayesian model.

A transformation consists of two parts: a triggering environment and a rewrite rule. Table 3 shows examples of the transformations that are learned given these triggering environments. The first transformation specifies that Cau should be retagged as PCau when the next tag is "CauP". The first four transformations are triggered by tags and the last three by words.

Table 3: Examples of some transformations learned in transformation-based tagging
Source tag | Target tag | Triggering environment
Cau | PCau | the next tag is CauP
PObj | PPla | the second tag is CCC and the fourth tag is PlaP
Obj | Subj | the second tag is CCC and the fourth tag is Active
Obj | Subj | the second tag is CCC, the fourth tag is CCC and the fifth tag is Active
Subj | PcomplS | the lexical item of the next word is "ျဖစ္သည္"
Obj | PcomplS | the lexical item of the next word is "နက္သည္"
Pla | PcomplS | the lexical item of the next word is "ရွိသည္"

6. Error Analysis for Function Tagging

The transformation rules produced by TBL are then used to correct the wrong tags produced by the Naïve Bayesian method. Interestingly, this gave an increase of 0.7% for Myanmar, although the accuracy initially decreased. This is due to the agglutinative nature of Myanmar and the lack of postpositional markers (PPM) in the sentences. The test data for function tagging contain about 1200 sentences. The error analysis for function tagging is shown in Table 4.

Table 4: Error analysis for function tagging
Actual tag | Assigned tag | Count
PcomplS | Subj | 133
PcomplS | Obj | 108
PcomplS | Pla | 52
PcomplS | Tim | 24
PSubj | Subj | 28
PObj | Obj | 37
PTim | Tim | 23
PPla | Pla | 18
Subj | Obj | 54

7. Grammatical Relations

Grammatical functions (or grammatical relations) refer to syntactic relationships between participants in a proposition. Examples are subject, object, time, place and object complement. We use a context-free grammar (CFG) for the grammatical relations of Myanmar sentences. The grammatical relations of a sentence are represented by a parse tree, that is, a tree that represents the syntactic structure of a string according to some formal grammar.
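Since the grammar in this section consumes the function tags produced by the transformation-based tagger of Sections 5 and 6, a concrete picture of that tagging step may help. The Python sketch below shows (a) applying a transformation such as those in Table 3 and (b) greedy, error-driven rule learning in the generic style of Brill [11]. It is a minimal sketch under stated assumptions, not the authors' implementation: only two hypothetical trigger templates are modelled ("the next tag is X", "the next word is W"), triggers are evaluated on the tagging before each pass, ties are broken by sorting, and in practice the starting tags would come from the Naïve Bayesian tagger [2].

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """A transformation: rewrite `source` as `target` when the trigger holds."""
    source: str  # current tag, e.g. "Cau"
    target: str  # replacement tag, e.g. "PCau"
    kind: str    # trigger type (hypothetical names): "next_tag" or "next_word"
    value: str   # the tag or word that must follow


def triggered(rule: Rule, tags: list[str], words: list[str], i: int) -> bool:
    """True if the triggering environment of `rule` holds at position i."""
    if i + 1 >= len(tags):
        return False
    following = tags[i + 1] if rule.kind == "next_tag" else words[i + 1]
    return following == rule.value


def apply_rule(rule: Rule, tags: list[str], words: list[str]) -> list[str]:
    """Apply one rule at every position; triggers read the input tagging."""
    return [rule.target if tag == rule.source and triggered(rule, tags, words, i)
            else tag
            for i, tag in enumerate(tags)]


def gain(rule: Rule, corpus) -> int:
    """Net number of tags corrected (fixed minus broken) by applying `rule`."""
    total = 0
    for words, current, gold in corpus:
        new = apply_rule(rule, current, words)
        total += sum((n == g) - (c == g) for n, c, g in zip(new, current, gold))
    return total


def learn_rules(corpus, max_rules: int = 10) -> list[Rule]:
    """Greedy TBL over (words, current_tags, gold_tags) triples."""
    learned = []
    for _ in range(max_rules):
        candidates = set()
        for words, current, gold in corpus:
            for i in range(len(current) - 1):
                if current[i] != gold[i]:  # propose rules only from errors
                    candidates.add(Rule(current[i], gold[i], "next_tag", current[i + 1]))
                    candidates.add(Rule(current[i], gold[i], "next_word", words[i + 1]))
        if not candidates:
            break
        # Sorting first makes the choice among equal gains deterministic.
        best = max(sorted(candidates), key=lambda r: gain(r, corpus))
        if gain(best, corpus) <= 0:
            break
        learned.append(best)
        corpus = [(w, apply_rule(best, cur, w), gold) for w, cur, gold in corpus]
    return learned


# Lexical rule from Table 3: Subj -> PcomplS when the next word is ျဖစ္သည္.
COPULA = "ျဖစ္သည္"
lexical = Rule("Subj", "PcomplS", "next_word", COPULA)
words = ["သူ", "သည္", "ဆရာ", COPULA]  # "He is a teacher" (segmentation assumed)
print(apply_rule(lexical, ["PSubj", "SubjP", "Subj", "Active"], words))
# ['PSubj', 'SubjP', 'PcomplS', 'Active']

# Toy training data with placeholder words w0..w5 (not from the corpus).
toy = [
    (["w0", "w1", "w2"], ["Cau", "CauP", "Active"], ["PCau", "CauP", "Active"]),
    (["w3", "w4", "w5"], ["Cau", "CauP", "Active"], ["PCau", "CauP", "Active"]),
]
print(learn_rules(toy))
# [Rule(source='Cau', target='PCau', kind='next_tag', value='CauP')]
```

On the toy data the tag-triggered rule wins because it fixes both sentences, while each word-triggered candidate fixes only one; this is how TBL prefers the more general transformation.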

The language defined by a CFG is the set of strings derivable from the start symbol S (for Sentence). The core of a CFG is a set of production rules, each of which replaces a single variable with a string of variables and terminal symbols. The grammar generates all strings that can be obtained by starting from the start variable and applying the production rules until no variables remain. A CFG can be viewed in two ways: as a device for generating sentences, or as a device for assigning a structure to a given sentence [12]. We use a CFG to derive grammatical relations from function tags.

A CFG is a 4-tuple <N, Σ, P, S> consisting of:
• a set of non-terminal symbols N;
• a set of terminal symbols Σ;
• a set of productions P, each of the form A → α, where A is a non-terminal and α is a string of symbols from the infinite set of strings (Σ ∪ N)*;
• a designated start symbol S.

S → SS | CS
SS → IC
CS → Subj? (Noun_DC | Adj_DC | Adv_DC) IC
Noun_DC → IC CCP
Adj_DC → IC CCA
Adv_DC → IC CCS
IC → Subj Obj Pla Active | Subj Active
Subj → Subj | PSubj SubjP
Obj → Obj | PObj ObjP
Pla → Pla | PPla PlaP
Sim → PSim SimP
Com → PCom ComP

Fig 3: A context free grammar for Myanmar language

7.1 Simple Sentence

Consider the simple declarative sentence “သူသည္ စာအုပ္ကို ဆရာ႔အား ေပးသည္” (He gives the book to the teacher). This sentence is represented as the function-tag sequence
“PSubj[သူ]#SubjP[သည္]#PObj[စာအုပ္]#ObjP[ကို]#PIobj[ဆရာ႔]#IobjP[အား]#Active[ေပးသည္]”.

Fig 4: A parse tree for simple sentence

7.2 Complex Sentences

7.2.1 Complex Sentence joined with postpositions

Consider a complex sentence joined with the postposition (ကို): “ကေလးမ်ား သစ္ပင္ေအာက္တြင္ ကစားေနသည္ ကို ကၽြန္ေတာ္ ျမင္သည္” (I see that the children are playing under the tree). This sentence is described as the function-tag sequence
“Subj[ကေလးမ်ား]#PPla[သစ္ပင္]#PlaP[ေအာက္တြင္]#Active[ကစားေနသည္]#CCP[ကို]#Subj[ကၽြန္ေတာ္]#Active[ျမင္သည္]”.

Fig 5: A parse tree for complex sentence (Noun_DC) + (IC)
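The grammar can be made concrete with a small top-down recognizer that returns every parse tree of a function-tag sequence. The Python sketch below is a hypothetical illustration, not the authors' parser. Nonterminals are written in upper case (SUBJ) so they are not confused with function tags (Subj), and the IC rule is replaced by an assumed, more permissive one: any sequence of subject, object, indirect-object and place phrases followed by Active, in line with the free phrase order described in Section 3.2.1. This change is needed because the Fig 3 rule IC → Subj Obj Pla Active | Subj Active does not cover the tag sequences of Sections 7.1 and 7.2.1.

```python
# Nonterminals are upper-case; every other symbol is a function tag (terminal).
GRAMMAR: dict[str, list[list[str]]] = {
    "S": [["SS"], ["CS"]],
    "SS": [["IC"]],
    "CS": [["SUBJ", "DC", "IC"], ["DC", "IC"]],  # CS -> Subj? DC IC
    "DC": [["NOUN_DC"], ["ADJ_DC"], ["ADV_DC"]],
    "NOUN_DC": [["IC", "CCP"]],
    "ADJ_DC": [["IC", "CCA"]],
    "ADV_DC": [["IC", "CCS"]],
    "IC": [["ARGS", "Active"]],                  # assumed, more permissive IC
    "ARGS": [["ARG", "ARGS"], []],               # zero or more phrases, any order
    "ARG": [["SUBJ"], ["OBJ"], ["IOBJ"], ["PLA"]],
    "SUBJ": [["Subj"], ["PSubj", "SubjP"]],
    "OBJ": [["Obj"], ["PObj", "ObjP"]],
    "IOBJ": [["PIobj", "IobjP"]],                # assumed; not in Fig 3
    "PLA": [["Pla"], ["PPla", "PlaP"]],
}


def expand(symbol, tags, i):
    """Yield (end, tree) for every way `symbol` can derive tags[i:end].

    A tree is either a terminal tag or a (nonterminal, children) pair.
    Plain backtracking: fine for short sentences, exponential in the worst case.
    """
    if symbol not in GRAMMAR:  # terminal: must equal the function tag at i
        if i < len(tags) and tags[i] == symbol:
            yield i + 1, symbol
        return
    for alternative in GRAMMAR[symbol]:
        partial = [(i, [])]  # (position reached, children built so far)
        for child in alternative:
            partial = [(end, kids + [tree])
                       for start, kids in partial
                       for end, tree in expand(child, tags, start)]
        for end, kids in partial:
            yield end, (symbol, kids)


def parse(tags):
    """All complete parse trees of a function-tag sequence."""
    return [tree for end, tree in expand("S", tags, 0) if end == len(tags)]


def bracketed(tree):
    """Render a tree in bracket notation, e.g. (SUBJ PSubj SubjP)."""
    if isinstance(tree, str):
        return tree
    label, kids = tree
    return "(" + " ".join([label] + [bracketed(k) for k in kids]) + ")"


simple = ["PSubj", "SubjP", "PObj", "ObjP", "PIobj", "IobjP", "Active"]  # Sec. 7.1
noun_dc = ["Subj", "PPla", "PlaP", "Active", "CCP", "Subj", "Active"]    # Sec. 7.2.1

print(len(parse(simple)), len(parse(noun_dc)))  # 1 2
print(bracketed(parse(simple)[0]))
```

The two parses of the Section 7.2.1 sentence reflect an ambiguity already present in the optional Subj of the CS rule: the first Subj can attach either to the whole complex sentence or inside the dependent clause. For longer inputs, a chart parser such as CYK or Earley's algorithm (cf. [4]) avoids the exponential backtracking.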

7.2.2 Complex Sentence joined with particles



Consider a complex sentence joined with the particle (ေသာ): “ကၽြန္ေတာ္ ဖတ္ေန ေသာ စာအုပ္ ကို အေဖ ဝယ္ခဲ့သည္” (My father bought the book that I am reading). This sentence is described as the function-tag sequence
“Subj[ကၽြန္ေတာ္]#Active[ဖတ္ေန]#CCA[ေသာ]#PObj[စာအုပ္]#ObjP[ကို]#Subj[အေဖ]#Active[ဝယ္ခဲ့သည္]”.

Fig 6: A parse tree for complex sentence (Adj_DC) + (IC)

7.2.3 Complex Sentence joined with conjunctions

Consider a complex sentence joined with the conjunction (ေသာေၾကာင့္): “ေမာင္ေမာင္ ၾကိဳးစား ေသာေၾကာင့္ ဂုဏ္ထူး ရသည္” (Mg Mg gets a distinction because he tried hard). This sentence is represented as the function-tag sequence
“Subj[ေမာင္ေမာင္]#Active[ၾကိဳးစား]#CCS[ေသာေၾကာင့္]#Obj[ဂုဏ္ထူး]#Active[ရသည္]”.

Fig 7: A parse tree for complex sentence (Adv_DC) + (IC)

7.2.4 Complicated Complex Sentence

Unrecognized grammatical relations are caused by a DC that occurs in the middle of the IC and has no fixed position: a DC may appear between the subject phrase and the verb phrase of the IC. Consider the complex sentence “ေမာင္ဘ က ကၽြန္ေတာ္ စာက်က္ေနသည္ ဟု ေျပာသည္” (Mg Ba says that he is studying). This sentence is described as the function-tag sequence
“PSubj[ေမာင္ဘ]#SubjP[က]#Subj[ကၽြန္ေတာ္]#Active[စာက်က္ေနသည္]#CCP[ဟု]#Active[ေျပာသည္]”.

Fig 8: A parse tree for complex sentence Subj + (Noun_DC) + (IC)

8. Performance Evaluation

Evaluation compares the system's output with the manually built parse tree of each sentence. The quality of the grammatical relations is assessed by assigning a score to each output sentence; this score is affected by both POS tagging and function tagging errors. The evaluation methodology consists of the following steps:
• Run the system on the selected test cases.
• Compare the original (manual) parse tree with the system's output.
• Classify the mismatches between the two sets of grammatical relations (parse trees) according to the criteria in Table 5.
• Assign the corresponding score to each criterion. Scores range from 0 to 3: 0 indicates completely incorrect grammatical relations and 3 indicates completely correct grammatical relations.
• When a sentence falls under several criteria, use the average of their scores.
• Determine the correctness of the test case by computing the total score as a percentage of the maximum possible score (3 points per sentence).
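The final steps amount to a simple calculation. Taking the total score as a percentage of the maximum attainable score (3 per sentence, Table 5) reproduces every accuracy figure in Table 6; for example, 184 / (3 × 65) = 94.36% for simple sentences. A minimal Python sketch under that reading; the function names are hypothetical:

```python
from statistics import mean

MAX_SCORE = 3  # Table 5: a completely correct parse tree scores 3


def sentence_score(criterion_scores: list[float]) -> float:
    """Score of one sentence, averaged when several criteria apply."""
    return mean(criterion_scores)


def accuracy(total_score: float, n_sentences: int) -> float:
    """Total score as a percentage of the maximum possible score."""
    return 100 * total_score / (MAX_SCORE * n_sentences)


# (number of sentences, total score) per sentence type, from Table 6.
TABLE_6 = {
    "Simple": (65, 184),
    "Complex (Noun_DC) + (IC)": (54, 141),
    "Complex (Adj_DC) + (IC)": (37, 96.5),
    "Complex (Adv_DC) + (IC)": (44, 121),
    "Complicated complex": (29, 59.5),
}

for name, (n, total) in TABLE_6.items():
    print(f"{name}: {accuracy(total, n):.2f}%")

n_all = sum(n for n, _ in TABLE_6.values())
total_all = sum(t for _, t in TABLE_6.values())
print(f"Total: {accuracy(total_all, n_all):.2f}%")  # Total: 87.63%

# A sentence matching criteria 3 (score 1.5) and 4 (score 2):
print(sentence_score([1.5, 2]))  # 1.75
```

Note that the overall figure weights sentence types by their number of sentences, because it divides the pooled total score (602) by the pooled maximum (3 × 229).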

Table 5: Accuracy scoring criteria
No | Criterion | Score
1 | The output parse tree has a completely wrong format | 0
2 | Every Myanmar word receives the correct function tag, but the grammatical relations are wrong | 1
3 | Some Myanmar words receive incorrect function tags, but the grammatical relations are correct | 1.5
4 | The function tagging of the output sentence is good, but there are some errors in the grammatical relations | 2
5 | The output parse tree is completely correct | 3

To the best of our knowledge, there has been no previous Myanmar-English machine translation system, so no standard test set exists for evaluating Myanmar-English MT systems. Our data set is derived from Myanmar middle-school textbooks and Myanmar grammar books of the Ministry of Education. It consists of 65 simple sentences, 54 complex sentences joined with postpositions, 37 complex sentences joined with particles, 44 complex sentences joined with conjunctions and 29 complicated complex sentences.

The system achieves a score of 94.36% for simple sentences and 68.39% for complicated complex sentences, as shown in Table 6.

Table 6: The result of the score for each sentence type in the data set
No | Sentence type | No. of sentences | Total score | Accuracy (%)
1 | Simple | 65 | 184 | 94.36
2 | Complex (Noun_DC) + (IC) | 54 | 141 | 87.04
3 | Complex (Adj_DC) + (IC) | 37 | 96.5 | 86.94
4 | Complex (Adv_DC) + (IC) | 44 | 121 | 91.67
5 | Complicated complex | 29 | 59.5 | 68.39
Total | | 229 | 602 | 87.63

Fig 9 depicts the relation accuracy for each sentence type.

Fig 9: The result of the grammatical relations accuracy for each sentence type

Table 7 gives the detailed distribution of scores for each sentence type. Over all sentence types, 63.5% of the outputs receive the full score of 3, as shown in Table 7.

Table 7: The result for each sentence type from the score's point of view
Sentence type | Score 3 | Score 2 | Score 1.5 | Score 1 | Score 0
Simple | 74.0% | 8.5% | 12.1% | 5.4% | 0.0%
Complex (Noun_DC) + (IC) | 67.9% | 0.0% | 29.4% | 2.7% | 0.0%
Complex (Adj_DC) + (IC) | 62.2% | 6.3% | 17.4% | 14.1% | 0.0%
Complex (Adv_DC) + (IC) | 81.6% | 8.2% | 0.0% | 10.2% | 0.0%
Complicated complex | 32.4% | 1.8% | 19.5% | 46.3% | 12%
All sentence types | 63.5% | 4.7% | 15.6% | 15.7% | 2.4%

Figs 10 to 14 show the accuracy of grammatical relations for simple and complex sentences. Fig 15 shows the overall grammatical relation accuracy from the score point of view.

Fig 10: Accuracy for Simple Sentence



Fig 11: Accuracy for Complex Sentence (Noun_DC) + IC
Fig 12: Accuracy for Complex Sentence (Adj_DC) + IC
Fig 13: Accuracy for Complex Sentence (Adv_DC) + IC
Fig 14: Accuracy for Complicated Complex Sentence
Fig 15: Grammatical relation accuracy for all sentence types from the score point of view

9. Conclusion
We demonstrated the use of TBL for function tagging of the Myanmar language. The TBL method further improved accuracy and produced correct function tags that the previous method could not produce. Having studied the results and analyzed the mistakes, we conclude that correct identification of function tags is crucial for obtaining a good analysis. If function tagging fails, the error propagates through the whole analysis and the result is a bad parse tree. The more accurate the function tagging, the more reliably the grammatical relations of simple and complex Myanmar sentences can be derived.

From our experience, development in natural language processing for the Myanmar language is very slow. The main reasons are the lack of large-scale data resources and the inherent complexity of the language. The performance of the proposed system could be improved by incorporating more syntactic information, covering more sentence types and using a large, well-formed corpus.

Appendix

Table 8: Function Tagset
Tag | Description | Example
Active | Verb | စားသည္
Subj | Subject | သူ
PSubj | Subject | သူ
SubjP | PPM of Subject | သည္
Obj | Object | ေကာ္ဖီ
PObj | Object | ေကာ္ဖီ
ObjP | PPM of Object | ကို
PIobj | Indirect Object | မလွ
IobjP | PPM of Indirect Object | အား
Pla | Place | ရန္ကုန္
PPla | Place | ရန္ကုန္
PlaP | PPM of Place | သို႔
Tim | Time | မနက္
PTim | Time | မနက္
TimP | PPM of Time | တြင္
PExt | Extract | ေက်ာင္းသားမ်ား
ExtP | PPM of Extract | အနက္
PSim | Simile | မင္းသမီး
SimP | PPM of Simile | ကဲ့သို႔
PCom | Compare | သူ႔ဦးေလး
ComP | PPM of Compare | ႏွင့္အတူ
POwn | Own | သူ
OwnP | PPM of Own | ၏
Ada | Adjective | လွ
PcomplS | Subject Complement | သူသည္ဆရာျဖစ္သည္
PcomplO | Object Complement | ေ႐ႊကိုလက္စြပ္လုပ္သည္
PPcomplO | Object Complement | ထြန္းထြန္း
PcomplOP | PPM of Object Complement | ဟု
PUse | Use | တုတ္
UseP | PPM of Use | ျဖင့္
PCau | Cause | မိုး
CauP | PPM of Cause | ေၾကာင့္
PAim | Aim | အေမ႔
AimP | PPM of Aim | အတြက္
CCS | Join with conjunctions | လွ်င္
CCM | Join the meanings | ထို႔ေၾကာင့္
CCC | Join the words | ႏွင့္
CCP | Join with postpositions | ကို
CCA | Join with particles | မည့္

Table 9: Chunk
Chunk type | Example
Noun Chunk | NC[ေခြး/n.animal]
Postpositional Chunk | PPC[ကို/ppm.obj]
Adjectival Chunk | AC[လွ/adj.dem]
Adverbial Chunk | RC[ေပ်ာ္ရႊင္စြာ/adv.manner]
Conjunctional Chunk | CC[ႏွင့္/cc.chunk]
Verb Chunk | VC[ျဖစ္/v.common]
Sentence Final Chunk | SFC[၏/sf.declarative]

Table 10: POS tags
Description | POS tag name
Noun | n
Pronoun | pron
Postpositional Marker | ppm
Adjective | adj
Adverb | adv
Conjunction | cc
Particle | part
Verb | v
Sentence Final | sf

Table 11: Categories
Category | Example
Noun Categories | n.animal, n.food, n.body, n.person, n.group, n.time, n.common, n.building, n.location, n.objects, n.congnition
Pronoun Categories | pron.person, pron.distplace, pron.disttime, pron.possessive
Postpositional Categories | ppm.subj, ppm.obj, ppm.time, ppm.cause, ppm.use, ppm.sim, ppm.aim, ppm.compare, ppm.accept, ppm.place, ppm.extract
Adjectival Categories | adj.dem, adj.distobj
Adverbial Categories | adv.manner, adv.state
Conjunctional Categories | cc.sent, cc.mean, cc.chunk, cc.part, cc.adj
Particle Categories | part.type, part.eg, part.number
Verb Categories | v.common, v.compound
Sentence Final Categories | sf.declarative, sf.question, sf.negative

Acknowledgments

We would like to thank the Ministry of Science and Technology, the Department of Myanmar, the Department of English and the Republic of the Union of Myanmar for promoting the project on a Myanmar-to-English Machine Translation System, in which this part of the work was carried out. A large part of this work was carried out at the University of Computer Studies, Yangon, and our thanks go to all members of the project for their encouragement and support.

References

[1] D. Blaheta and M. Johnson, "Assigning function tags to parsed text". In Proceedings of the 1st Annual Meeting of the North American Chapter of the Association for Computational Linguistics, pp. 234–240, 2000.
[2] W. W. Thant, T. M. Htwe, and N. L. Thein, "Function Tagging for Myanmar Language", International Journal of Computer Applications, Vol. 26, No. 2, July 2011.
[3] Y. Park and H. Kwon, "Korean Syntactic Analysis using Dependency Rules and Segmentation". In Proceedings of the Seventh International Conference on Advanced Language Processing and Web Information Technology (ALPIT 2008), Vol. 7, pp. 59–63, China, July 23–25, 2008.
[4] M. Nederhof and G. Satta, "Parsing Non-Recursive Context-Free Grammars". In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL'02), July 7–12, pp. 112–119, Philadelphia, Pennsylvania, USA, 2002.
[5] J. Okell, A Reference Grammar of Colloquial Burmese, London: Oxford University Press, 1969.
[6] Myanmar Thudda, vol. 1 to 5 in Bur-Myan, Text-book Committee, Basic Edu., Min. of Edu., Myanmar, ca. 1986.
[7] S. P. Soe, Aspects of Myanmar Language, Myanmar Department, University of Foreign Language, 2010.
[8] K. Lay, Construction of Myanmar Thudda, Ph.D. Dissertation, Myanmar Department, University of Education, 2003.
[9] P. M. Tin, Some Features of the Burmese Language, Myanmar Book Centre & Book Promotion & Service Ltd, Bangkok, Thailand, 1954.
[10] E. Brill and P. Resnik, "A transformation-based approach to prepositional phrase attachment disambiguation". In Proceedings of the Fifteenth International Conference on Computational Linguistics (COLING-1994), Kyoto, Japan, 1994.
[11] E. Brill, "Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging". Computational Linguistics, 1995.
[12] E. Charniak, "Statistical parsing with a context-free grammar and word statistics". In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pp. 598–603, Menlo Park, 1997.
[13] P. H. Myint, "Assigning automatically Part-of-Speech tags to build tagged corpus for Myanmar language". The Fifth Conference on Parallel Soft Computing, Yangon, Myanmar, 2010.
[14] P. H. Myint, "Chunk Tagged Corpus Creation for Myanmar Language". In Proceedings of the Ninth International Conference on Computer Applications, Yangon, Myanmar, 2011.

Win Win Thant is a Ph.D. research student. She received the B.C.Sc. (Bachelor of Computer Science) degree in 2004, the B.C.Sc. (Hons.) degree in 2005 and the M.C.Sc. (Master of Computer Science) degree in 2007. She is now an Assistant Lecturer at U.C.S.Y. (University of Computer Studies, Yangon). She has written one local paper for the Parallel and Soft Computing (PSC) conference in 2010, one international paper for the International Conference on Computer Applications (ICCA) in 2011 and one journal paper for the International Journal of Computer Applications (IJCA) in July 2011. Her research interests include Natural Language Processing and Machine Translation.

Tin Myat Htwe is an Associate Professor at U.C.S.Y. She obtained a Ph.D. degree in Information Technology from the University of Computer Studies, Yangon. Her research interests include Natural Language Processing, Data Mining and Artificial Intelligence. She has published papers in international conferences and international journals.

Ni Lar Thein is the Rector of U.C.S.Y. She obtained the B.Sc. (Chem.), B.Sc. (Hons) and M.Sc. (Computer Science) degrees from Yangon University and a Ph.D. (Computer Engg.) from Nanyang Technological University, Singapore, in 2003. Her research interests include Software Engineering, Artificial Intelligence and Natural Language Processing. She has published papers in international conferences and international journals.
