Grammatical Relations of Myanmar
2
Computer Software Department, University of Computer Studies
Yangon, Myanmar
3
University of Computer Studies
Yangon, Myanmar
We [2] proposed 39 function tags for the Myanmar language and addressed the problem of assigning function tags to Myanmar words, using a small function-annotated tagged corpus as the training data. For function tagging, we used the output of a morphological analyzer, which supplied Myanmar sentences with correct segmentation, POS (part-of-speech) tagging and chunking information. We used Naïve Bayesian statistics to disambiguate the possible function tags of each word in the sentence. We evaluated the performance of function tagging on simple and complex sentences, and concluded our remarks on tagging accuracy with examples of the most frequent error types.

Yong-uk Park and Hyuk-chul Kwon [3] tried to resolve ambiguity in a syntactic analysis system by using many dependency rules together with segmentation. Segmentation is performed during parsing. If two adjacent morphemes have no syntactic relation, their syntactic analyzer creates a new segment between the two morphemes, finds all possible partial parse trees for that segment and tries to combine them into complete parse trees. They also used an adjacent rule and adverb subcategorization to disambiguate the syntactic analysis. Their syntactic analyzer uses morphemes as the basic unit of parsing.

3. Myanmar Language

The Myanmar language, Burmese, belongs to the Tibeto-Burman language group of the Sino-Tibetan family. It is a morphologically rich and agglutinative language. Myanmar words are postpositionally inflected with various grammatical features.

The grammatical hierarchy is a useful notion of successively included levels of grammatical construction operating within and between grammatical levels of analysis [5]. This hierarchy is assumed in this study as a heuristic principle for laying a foundational understanding of Burmese grammatical units and constructions. It is a compositional hierarchy in which lower levels typically are filler units for the next higher level in the hierarchy (Longacre 1970, Pike and Pike 1982). Table 1 shows the hierarchy from the highest level to the lowest.

Table 1: Grammatical Hierarchy
Text
Paragraph
Sentence
Clause
Phrase
Word
Morpheme

3.2 Sentences of Myanmar Language

There are two kinds of sentences according to the syntactic structure of the Myanmar language [6][7]: the simple sentence (SS) and the complex sentence (CS). Fig 1 shows the syntactic structure of the Myanmar language.

3.2.1 Simple Sentence

A simple sentence contains only one clause. It has two basic phrases, the subject phrase and the verb phrase.
For example:
သူ (Subject phrase) အိပ္ေနသည္ (Verb phrase)
IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 5, No 2, September 2011
ISSN (Online): 1694-0814
www.IJCSI.org 92
However, a simple sentence can be constructed from only one phrase. This phrase may be a verb phrase or a noun phrase.
For example:
စားပါ (Verb phrase)
(သူဘယ္သူလဲ) မမ (Noun phrase)

A simple sentence can also be constructed from two or three phrases.
For example:
သြား (Object phrase) တိုက္ (Verb phrase)
ရန္ကုန္ တြင္ (Place phrase) ေနသည္ (Verb phrase)

Myanmar phrases can be written in any order as long as the verb phrase is at the end of the sentence.
For example:
ဦးဘသည္ မႏၱေလးမွ ျပန္လာသည္။ (Subject, Place, Verb)
မႏၱေလးမွ ဦးဘသည္ ျပန္လာသည္။ (Place, Subject, Verb)

A simple sentence can be extended by placing many other phrases between the subject phrase and the verb phrase. It can be quite long. All of the following are simple sentences, because each contains only one clause.
For example:
ဦးဘသည္ ျပန္လာသည္။
U Ba comes back.
ဦးဘသည္ မႏၱေလးမွ ျပန္လာသည္။
U Ba comes back from Mandalay.
ဦးဘသည္ မႏၱေလးမွ ရန္ကုန္သို႔ ျပန္လာသည္။
U Ba comes back from Mandalay to Yangon.
ဦးဘသည္ မႏၱေလးမွ ရန္ကုန္သို႔ မီးရထားျဖင့္ ျပန္လာသည္။
U Ba comes back from Mandalay to Yangon by train.
ဦးဘသည္ မႏၱေလးမွ ရန္ကုန္သို႔ မီးရထားျဖင့္ မနက္က ျပန္လာသည္။
U Ba comes back from Mandalay to Yangon by train in the morning.
ဦးဘသည္ ေမာင္ေမာင္ႏွင့္အတူ မႏၱေလးမွ ရန္ကုန္သို႔ မီးရထားျဖင့္ မနက္က ျပန္လာသည္။
U Ba comes back from Mandalay to Yangon by train in the morning with Mg Mg.

A simple sentence can also be extended by adding noun phrases to the subject phrase, object phrase, time phrase and verb phrase. These added noun phrases are called emphatic phrases.
For example:
ပါေမာကၡ ဦးဘသည္ သား ေမာင္ေမာင္ႏွင့္အတူ အထက္ မႏၱေလးမွ ၿမိဳ႕ေတာ္ ရန္ကုန္သို႔ အျမန္ မီးရထားျဖင့္ မေန႔ နံနက္က ေခ်ာေခ်ာေမာေမာ ျပန္လာသည္။
Professor U Ba and his son Mg Mg came back safely from upper Mandalay to the capital Yangon by express train yesterday morning.

3.2.2 Complex Sentence

A complex sentence consists of two or more clauses (or simple sentences) joined by postpositions, particles or conjunctions. A complex sentence contains at least two verbs.

There are two kinds of clause in a complex sentence: the independent clause (IC) and the dependent clause (DC). The DC comes in front of the IC. A complex sentence contains one independent clause and at least one dependent clause. A DC has the same form as an IC, but it must end with a clause marker (CM). A clause marker may be a postposition, a particle or a conjunction [8][9]. There are three kinds of dependent clause, depending on the clause marker.

(1) Noun DC (joined by postpositions such as မွာ၊ က၊ ကို)
မမ ေစ်းသို႔ သြားသည္ ကို ကၽြန္မ ျမင္သည္။
I see that Ma Ma goes to the market.
Noun DC : မမ ေစ်းသို႔ သြားသည္ ကို
IC : ကၽြန္မ ျမင္သည္။

(2) Adjective DC (joined by particles such as ေသာ၊ သည့္၊ မည့္)
မမ ေပးေသာ စာအုပ္ ကို ကၽြန္မ ဖတ္သည္။
I read the book given by Ma Ma.
Adjective DC : မမ ေပးေသာ (စာအုပ္)
IC : စာအုပ္ ကို ကၽြန္မ ဖတ္သည္။

(3) Adverb DC (joined by conjunctions such as ေသာေၾကာင့္၊ လ်က္၊ သျဖင့္)
မိုးရြာေန ေသာေၾကာင့္ ကၽြန္မ ေစ်းသို႔ မသြားပါ။
I do not go to the market because it is raining.
Adverb DC : မိုးရြာေန ေသာေၾကာင့္
IC : ကၽြန္မ ေစ်းသို႔ မသြားပါ။

4. Corpus Creation

Our corpus is built manually. We extended the function-annotated tagged corpus proposed in [2] by adding sentences from newspapers and Myanmar historical books. The corpus consists of approximately 5000 sentences with an average length of 15 words. It is not a balanced corpus: it is somewhat biased toward Myanmar middle-school textbooks. The corpus keeps growing because the tested sentences are automatically added to it. The text collections are Myanmar textbooks and historical books, as shown in Table 2. In our corpus, each sentence is annotated with chunks, function tags, Myanmar words and their POS tags with categories. Fig 2 shows an example corpus sentence.
Table 2: Corpus Statistics
Based Learning
Transformation-based learning (TBL) starts with a supervised training corpus that specifies the correct values for some linguistic feature of interest, a baseline heuristic for predicting the values of that feature, and a set of rule templates. The templates determine a space of possible transformations: the rules are conditioned on features in the neighborhood surrounding a word, and their action is to change the system's current guess as to the feature for that word. The lexical and the contextual rules are generated from the training corpus [10].

We are not concerned with finding the correct attachment of prepositional phrases. We have stressed at several points that the Naïve Bayesian assumptions are crude for many properties of natural language syntax. We describe a method for expressing lexical relations in function tagging that statistical function tagging [2] is currently unable to express. One of the strengths of this method is that it can exploit a wider range of lexical and syntactic regularities. In particular, tags can be conditioned on words and on more contexts. Transformation-based tagging encodes complex interdependencies between words and tags by selecting and sequencing transformations that transform an initial imperfect tagging into one with fewer errors [11]. Training a transformation-based tagger requires an order of magnitude fewer decisions than estimating the large number of parameters of a Naïve Bayesian model.

A transformation consists of two parts, a triggering environment and a rewrite rule. Table 3 shows examples of

6. Error Analysis for Function Tagging

Transformation rules produced by TBL are then used to correct the wrong tags produced by the Naïve Bayesian method. Interestingly, this gave an increase of 0.7% in accuracy for Myanmar, although the accuracy initially decreased. This is due to the agglutinative nature of Myanmar and the lack of postpositional markers (PPM) in the sentences. There are about 1200 sentences in the test data for function tagging. The error analysis for function tagging is shown in Table 4.

Table 4: Error Analysis for function tagging
Actual Tags    Assigned Tags    Counts
PcomplS        Subj             133
PcomplS        Obj              108
PcomplS        Pla              52
PcomplS        Tim              24
PSubj          Subj             28
PObj           Obj              37
PTim           Tim              23
PPla           Pla              18
Subj           Obj              54

7. Grammatical Relations

Grammatical functions (or grammatical relations) refer to syntactic relationships between participants in a
Consider a complex sentence that is joined with the particle (ေသာ), “ကၽြန္ေတာ္ ဖတ္ေန ေသာ စာအုပ္ ကို အေဖ ဝယ္ခဲ့သည္” (My father bought the book that I am reading). This sentence is described as a sequence of function tags as “Subj[ကၽြန္ေတာ္]#Active[ဖတ္ေန]#CCA[ေသာ]#PObj[စာအုပ္]#ObjP[ကို]#Subj[အေဖ]#Active[ဝယ္ခဲ့သည္]”.

Unrecognized grammatical relations are caused by DCs that occur in the middle of an IC and do not have a fixed format. A DC may occur between the subject phrase and the verb phrase of the IC. Consider the complex sentence “ေမာင္ဘ က ကၽြန္ေတာ္ စာက်က္ေနသည္ ဟု ေျပာသည္” (Mg Ba says that he is studying). This sentence is described as a sequence of function tags as “PSubj[ေမာင္ဘ]#SubjP[က]#Subj[ကၽြန္ေတာ္]#Active[စာက်က္ေနသည္]#CCP[ဟု]#Active[ေျပာသည္]”.
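The bracketed notation used for these tag sequences is regular enough to be parsed mechanically, and a transformation of the kind described for TBL (a triggering environment plus a rewrite rule) can then be applied to the parsed tags. The following Python sketch is only an illustration. The function names are hypothetical, and the single rule shown (retag Subj as PSubj when the next tag is SubjP) is an assumption suggested by the PSubj/Subj confusions in the error analysis; it is not a rule taken from the system.

```python
import re

# One token looks like Tag[word]; tokens are separated by '#'.
TOKEN = re.compile(r"([A-Za-z]+)\[([^\]]*)\]")


def parse_function_tags(sequence):
    """Return (function_tag, word) pairs from a 'Tag[word]#Tag[word]...' string."""
    return [(tag, word.strip()) for tag, word in TOKEN.findall(sequence)]


def apply_rule(tagged, from_tag, to_tag, next_tag):
    """Rewrite from_tag as to_tag wherever the following token carries next_tag.

    The triggering environment is 'next tag == next_tag';
    the rewrite rule is 'from_tag -> to_tag'.
    """
    result = list(tagged)
    for i in range(len(result) - 1):
        if result[i][0] == from_tag and result[i + 1][0] == next_tag:
            result[i] = (to_tag, result[i][1])
    return result


# The second example sentence from the text, in the paper's notation.
sequence = (
    "PSubj[ေမာင္ဘ]#SubjP[က]#Subj[ကၽြန္ေတာ္]"
    "#Active[စာက်က္ေနသည္]#CCP[ဟု]#Active[ေျပာသည္]"
)
parsed = parse_function_tags(sequence)
tags = [tag for tag, _ in parsed]

# Hypothetical baseline output (placeholder words w1..w4) in which a subject
# followed by a subject marker was wrongly tagged Subj instead of PSubj.
baseline = [("Subj", "w1"), ("SubjP", "w2"), ("Subj", "w3"), ("Active", "w4")]
corrected = apply_rule(baseline, from_tag="Subj", to_tag="PSubj", next_tag="SubjP")
```

Only the first Subj in the baseline is rewritten, because only it is followed by SubjP; the unmarked Subj before the verb is left unchanged. A full TBL system would learn an ordered list of such rules from the training corpus rather than hard-coding one.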
No   Criterion                                                                                              Score
1    if the output parse tree has a completely wrong format                                                 0
2    if each Myanmar word gets the correct function tag but the grammatical relations are wrong             1
3    if not every Myanmar word gets the correct function tag but the grammatical relations are correct      1.5
4    if the function tagging of the output sentence is good and there are some errors in grammatical relations   2
5    if the output parse tree is completely correct                                                         3

To the best of our knowledge, there has been no previous Myanmar-English machine translation system, so there is no standard test set for evaluating a Myanmar-English MT system. The data set is derived from Myanmar middle-school textbooks and Myanmar grammar books of the Ministry of Education. It consists of 65 simple sentences, 54 complex sentences joined with postpositions, 37 complex sentences joined with particles, 44 complex sentences joined with conjunctions and 29 complicated complex sentences.

The system achieves a score of 94.36% for simple sentences and 68.39% for complicated complex sentences, as shown in Table 6.

Table 6: The result of the score for each sentence type from data set
No   Sentence Types            No. of sentences   Total Score   Score (%)
1    Simple                    65                 184           94.36
2    Complex (Noun_DC)+(IC)    54                 141           87.04
3    Complex (Adj_DC)+(IC)     37                 96.5          86.94
4    Complex (Adv_DC)+(IC)     44                 121           91.67
5    Complicated Complex       29                 59.5          68.39
     Total                     229                602           87.63

Fig 9: The result of the grammatical relations accuracy for each sentence type

Table 7: The result for each sentence type from the score's point of view
Sentence Types            Score 3   Score 2   Score 1.5   Score 1   Score 0
Simple                    74.0%     8.5%      12.1%       5.4%      0.0%
Complex (Noun_DC)+(IC)    67.9%     0.0%      29.4%       2.7%      0.0%
Complex (Adj_DC)+(IC)     62.2%     6.3%      17.4%       14.1%     0.0%
Complex (Adv_DC)+(IC)     81.6%     8.2%      0.0%        10.2%     0.0%
Complicated Complex       32.4%     1.8%      19.5%       46.3%     12%
Accuracy                  63.5%     4.7%      15.6%       15.7%     2.4%

Fig 10 to 14 show the accuracy of grammatical relations for simple and complex sentences. Fig 15 shows the total result of the grammatical relation accuracy from the score point of view.
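The Score (%) column of Table 6 is consistent with dividing each total score by the maximum attainable score, that is, 3 points (the highest score in the scoring criteria) times the number of sentences. The Python sketch below recomputes the column under that assumption. The normalisation is inferred from the reported numbers rather than stated explicitly in the text, and the names are illustrative.

```python
# Assumed: every sentence can earn at most 3 points (the top scoring criterion).
MAX_SCORE_PER_SENTENCE = 3

# (number of sentences, total score) per sentence type, copied from Table 6.
TABLE_6 = {
    "Simple": (65, 184),
    "Complex (Noun_DC)+(IC)": (54, 141),
    "Complex (Adj_DC)+(IC)": (37, 96.5),
    "Complex (Adv_DC)+(IC)": (44, 121),
    "Complicated Complex": (29, 59.5),
}


def score_percent(n_sentences, total_score):
    """Total score as a percentage of the maximum attainable score."""
    return round(100 * total_score / (MAX_SCORE_PER_SENTENCE * n_sentences), 2)


percentages = {name: score_percent(n, s) for name, (n, s) in TABLE_6.items()}

# The Total row: sum the sentence counts and scores, then normalise the same way.
total_sentences = sum(n for n, _ in TABLE_6.values())
total_score = sum(s for _, s in TABLE_6.values())
overall = score_percent(total_sentences, total_score)
```

Under this assumption the recomputed values agree with every row of Table 6, including the Total row (229 sentences, total score 602, 87.63%).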
Fig 11: Accuracy for Complex Sentence (Noun_DC) + IC
Fig 12: Accuracy for Complex Sentence (Adj_DC) + IC
Fig 13: Accuracy for Complex Sentence (Adv_DC) + IC
Fig 14: Accuracy for Complicated Complex Sentence
Fig 15: Grammatical relation accuracy for all sentence types from the score point of view

9. Conclusion

We demonstrated the use of TBL for function tagging of the Myanmar language. Using the TBL method further improved accuracy and produced correct function tags that could not be produced by the previous method. Having studied the results and analyzed the mistakes, we find that correct identification of the function tag is crucial for obtaining a good analysis. If function tagging fails, the error is carried through the rest of the analysis and the result is a bad parse tree. The more the accuracy of function tagging increases, the easier it becomes to identify the grammatical relations of simple and complex sentences of the Myanmar language.
Particle Categories          part.type, part.eg, part.number
Verb Categories              v.common, v.compound
Sentence Final Categories    sf.declarative, sf.question, sf.negative,

Acknowledgments

We would like to thank the Ministry of Science and Technology, the Department of Myanmar, the Department of English and the Republic of the Union of Myanmar for promoting a project on a Myanmar to English Machine Translation System, where this part of the work was carried out. A large part of this work was carried out at the University of Computer Studies, Yangon, and our thanks go to all members of the project for their encouragement and support.

References

[1] D. Blaheta and M. Johnson, “Assigning function tags to parsed text”, In Proceedings of the 1st Annual Meeting of the North American Chapter of the Association for Computational Linguistics, pp. 234-240, 2000.
[2] W. W. Thant, T. M. Htwe, and N. L. Thein, “Function Tagging for Myanmar Language”, International Journal of Computer Applications, Vol. 26, No. 2, July 2011.
[3] Y. Park and H. Kwon, “Korean Syntactic Analysis using Dependency Rules and Segmentation”, Proceedings of the Seventh International Conference on Advanced Language Processing and Web Information Technology (ALPIT2008), Vol. 7, pp. 59-63, China, July 23-25, 2008.
[4] M. Nederhof and G. Satta, “Parsing Non-Recursive Context-Free Grammars”, In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL ANNUAL'02), July 7-12, pp. 112-119, Philadelphia, Pennsylvania, USA, 2002.
[5] J. Okell, A Reference Grammar of Colloquial Burmese, London: Oxford University Press, 1969.
[6] Myanmar Thudda, vol. 1 to 5 in Bur-Myan, Text-book Committee, Basic Edu., Min. of Edu., Myanmar, ca. 1986.
[7] S. P. Soe, Aspects of Myanmar Language, Myanmar Department, University of Foreign Language, 2010.
[8] K. Lay, Construction of Myanmar Thudda, Ph.D. Dissertation, Myanmar Department, University of Education, 2003.
[9] P. M. Tin, Some Features of the Burmese Language, Myanmar Book Centre & Book Promotion & Service Ltd, Bangkok, Thailand, 1954.
[10] E. Brill and P. Resnik, “A transformation-based approach to prepositional phrase attachment disambiguation”, In Proceedings of the Fifteenth International Conference on Computational Linguistics (COLING-1994), Kyoto, Japan.
[11] E. Brill, “Transformation-based error driven learning and natural language processing: A case study in part-of-speech tagging”, Computational Linguistics, 1995.
[12] E. Charniak, “Statistical parsing with a context-free grammar and word statistics”, In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pp. 598-603, Menlo Park, 1997.
[13] P. H. Myint, “Assigning automatically Part-of-Speech tags to build tagged corpus for Myanmar language”, The Fifth Conference on Parallel Soft Computing, Yangon, Myanmar, 2010.
[14] P. H. Myint, “Chunk Tagged Corpus Creation for Myanmar Language”, In Proceedings of the Ninth International Conference on Computer Applications, Yangon, Myanmar, 2011.

Win Win Thant is a Ph.D. research student. She received the B.C.Sc (Bachelor of Computer Science) degree in 2004, the B.C.Sc (Hons.) degree in 2005 and the M.C.Sc (Master of Computer Science) degree in 2007. She is now an Assistant Lecturer at U.C.S.Y (University of Computer Studies, Yangon). She has written one local paper for the Parallel and Soft Computing (PSC) conference in 2010, one international paper for the International Conference on Computer Applications (ICCA) in 2011 and one journal paper for the International Journal of Computer Applications (IJCA) in July 2011. Her research interests include Natural Language Processing and Machine Translation.

Tin Myat Htwe is an Associate Professor at U.C.S.Y. She obtained a Ph.D. degree in Information Technology from the University of Computer Studies, Yangon. Her research interests include Natural Language Processing, Data Mining and Artificial Intelligence. She has published papers in international conferences and international journals.

Ni Lar Thein is the Rector of U.C.S.Y. She obtained B.Sc. (Chem.), B.Sc. (Hons) and M.Sc. (Computer Science) degrees from Yangon University and a Ph.D. (Computer Engg.) from Nanyang Technological University, Singapore in 2003. Her research interests include Software Engineering, Artificial Intelligence and Natural Language Processing. She has published papers in international conferences and international journals.