Overview of Myanmar-English Machine Translation System - NICT

The document discusses natural language processing (NLP) research being conducted at the NLP Lab at the University of Computer Studies, Yangon (UCSY) in Myanmar. It provides an overview of the lab's history and current research areas, which include machine translation, speech recognition, information retrieval and other tasks. It also describes collaboration with other institutions and the use of aligned language data to help address challenges in NLP for Myanmar.


Myanmar NLP Research and Usefulness of ALT Data

Dr. Khin Mar Soe

Professor
NLP Lab, UCSY
26-11-2015
Contents

 Introduction to UCSY
 Introduction to UCSY NLP Lab
 Current Myanmar NLP Research
 Usefulness of ALT Data
 Conclusion

http://www.ucsy.edu.mm
Natural Language Processing Lab in UCSY
• Started in 2006 at the University of Computer Studies, Yangon (UCSY), under the Ministry of Science and Technology.
• Some of the NLP lab's work is available online:
◦ Network-based ASEAN Languages Translation Public Service
(http://www.aseanmt.org)
◦ English to Myanmar Statistical Machine Translation System
(http://www.nlpresearch-ucsy.edu.mm/NLP_UCSY/mtapplication.html)
◦ Myanmar-English-Myanmar bilingual dictionary
(http://www.nlpresearch-ucsy.edu.mm/NLP_UCSY/dictionaryapplication.html)
◦ Myanmar Word Segmentation
(http://www.nlpresearch-ucsy.edu.mm/NLP_UCSY/wsandpos.html)

Research Collaboration
• NECTEC (National Electronics and Computer Technology Center, Thailand)
• NICT (National Institute of Information and Communications Technology, Japan)
• Purposes of the collaboration:
◦ joint research projects,
◦ researcher exchange,
◦ publishing conference papers, journal papers, and articles,
◦ holding joint NLP workshops.

NLP Lab

NLP Lab Members

NLP Research
• Aim of research
◦ to overcome language barriers
◦ to produce tools that can be applied conveniently in systems used by Myanmar speakers
• Domains of research
◦ Myanmar-English-Myanmar Machine Translation
◦ Automatic Speech Recognition
◦ Text to Speech
◦ Myanmar Information Retrieval
◦ Myanmar Named Entity Recognition and Transliteration
◦ Myanmar Text Summarization
◦ Myanmar Text Categorization
Overview of the System

[Figure: system overview. Components shown: Source Language Analysis, Alignment, Dictionary, Translation, Word Sense Disambiguation, Target Language Generation.]
Source Language Analysis
• For the Myanmar-English translation direction, this is the job of the Myanmar language analyzer:
◦ Myanmar part-of-speech (POS) tagging and chunking
◦ Syntactic analysis: function tagging and building grammatical relations
• For the English-Myanmar translation direction:
◦ English POS tagging and chunking
◦ Syntactic analysis: function tagging and building grammatical relations
Myanmar POS Tagging and Chunking

[Figure: processing pipeline. Stages shown: Word Identification and Basic POS Tagging, POS Tag Disambiguation, Normalization, Chunking. Resources shown: Myanmar Lexicon, POS Tagged Corpus, Normalization Rules, Chunk Rules.]

Pre-tagged Corpus Format
• Training corpus
◦ Myanmar words are segmented and tagged with their respective basic POS tags and categories, as follows:

သူ/PRN.Person # ေက်ာင္း/NN.Building # သို႔/PPM.Direction # သြား/VB.Common # သည္/SF.Declarative

ေက်ာင္းသား/NN.Person # မ်ား/Part.Number # ထဲတြင္/PPM.Extract # သူ/PRN.Person # အ/Part.Common # ေတာ္/JJ.Dem # ဆုံး/Part.Common # ျဖစ္/VB.Common # သည္/SF.Declarative

ဤ/PRN.Distobj # စာ/NN.Common # ကုိ/PPM.Obj # မည္သူ/PRN.Question # ေရး/VB.Common # ခဲ႔/Part.Support # သနည္း/SF.Interrogative
Example : Tagging
• Input text:
သံလြင္ ျမစ္ သည္ ျမန္မာျပည္ ေတာင္ပုိင္း သုိ႔ ဦးတည္ စီးဆင္း သြား သည္။
(The Than Lwin River flows toward the south of Myanmar.)
• Each word is first tagged with all of its possible basic POS tags.
• Disambiguation then selects the correct tag from the possible basic POS tags of each word.
• The HMM and LHMM models are trained on the Myanmar pre-tagged corpus.
• Decoding uses the Viterbi algorithm to find the most probable path (the best tag sequence) for a given word sequence.
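The decoding step can be illustrated with a minimal Viterbi sketch for a bigram HMM. This is not the lab's implementation: the shortened tag set, the probabilities, and the names `viterbi`, `start_p`, `trans_p` and `emit_p` are all hypothetical, chosen only to show how the word သည္ can be resolved to a subject marker or a sentence-final marker depending on its position.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under a bigram HMM."""
    def lp(p):
        # Log-probability with a small floor standing in for unseen events.
        return math.log(max(p, floor))

    # best[i][t]: log-probability of the best path that ends in tag t at word i.
    best = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + lp(trans_p[p].get(t, 0.0)))
            best[i][t] = (best[i - 1][prev] + lp(trans_p[prev].get(t, 0.0))
                          + lp(emit_p[t].get(words[i], 0.0)))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy model (all numbers are made up for illustration).
RIVER, MARKER, FLOW = "ျမစ္", "သည္", "စီးဆင္း"
tags = ["NN", "PPM", "VB", "SF"]
start_p = {"NN": 0.8, "VB": 0.2}
trans_p = {
    "NN": {"PPM": 0.7, "VB": 0.3},
    "PPM": {"NN": 0.5, "VB": 0.5},
    "VB": {"SF": 0.8, "VB": 0.2},
    "SF": {},
}
emit_p = {
    "NN": {RIVER: 1.0},
    "PPM": {MARKER: 1.0},  # subject marker reading
    "VB": {FLOW: 1.0},
    "SF": {MARKER: 1.0},   # sentence-final reading
}

best_tags = viterbi([RIVER, MARKER, FLOW, MARKER], tags, start_p, trans_p, emit_p)
print(best_tags)
```

In this toy model the same surface form receives different tags in the second and fourth positions, because the transition probabilities favor a postposition after a noun and a sentence-final marker after a verb.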
Example : Disambiguation
• Disambiguation assigns the correct tag to each word:

သံလြင္_#NNP.Location (Than Lwin)
ျမစ္_#NN.Location (The river)
သည္_#PPM.Subj (null)
ျမန္မာျပည္_#NNP.Location (Myanmar)
ေတာင္ပုိင္း_#NN.Location (south)
သု႔ိ_#PPM.Direction (to)
ဦးတည္_#VB.Common (flows)
စီးဆင္း_#VB.Common (flows)
သြား_#Part.Support (flows)
သည္_#SF.Declarative (null)
Example : Normalization
• Normalization merges adjacent tokens into more meaningful words and annotates them with the appropriate POS tags and categories. In the example below, the verb က်န္းမာ and the particle ျခင္း are merged into one word tagged NN.VBConvert.

• Before normalization:
"က်န္းမာ/VB.Common # ျခင္း/Part.Common # သည္/PPM.Subj # လာဘ္/NN.Common # တစ္/NN.Cardinal # ပါး/Part.Type # ျဖစ္/VB.Common # သည္/SF.Declarative"

• After normalization:
"က်န္းမာျခင္း/NN.VBConvert # သည္/PPM.Subj # လာဘ္/NN.Common # တစ္/NN.Cardinal # ပါး/Part.Type # ျဖစ္/VB.Common # သည္/SF.Declarative"
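A normalization rule of this kind can be sketched as a rewrite over adjacent (word, tag) pairs. The rule format below is an assumption inferred from the single example on the slide, not the lab's actual rule set, and `normalize` and `merge_rules` are hypothetical names.

```python
def normalize(tokens, merge_rules):
    """Merge adjacent (word, tag) pairs that match a rule into one token."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            (w1, t1), (w2, t2) = tokens[i], tokens[i + 1]
            # A rule is keyed on (first tag, second word, second tag).
            new_tag = merge_rules.get((t1, w2, t2))
            if new_tag is not None:
                out.append((w1 + w2, new_tag))
                i += 2
                continue
        out.append(tokens[i])
        i += 1
    return out

NOMINALIZER = "ျခင္း"
# Hypothetical rule: verb + nominalizing particle -> converted noun.
merge_rules = {("VB.Common", NOMINALIZER, "Part.Common"): "NN.VBConvert"}

tokens = [
    ("က်န္းမာ", "VB.Common"), (NOMINALIZER, "Part.Common"), ("သည္", "PPM.Subj"),
    ("လာဘ္", "NN.Common"), ("တစ္", "NN.Cardinal"), ("ပါး", "Part.Type"),
    ("ျဖစ္", "VB.Common"), ("သည္", "SF.Declarative"),
]
normalized = normalize(tokens, merge_rules)
print(" # ".join(f"{word}/{tag}" for word, tag in normalized))
```

Keying the rule on the particle's surface form as well as its tag keeps the final ျဖစ္/VB.Common from being merged with the token that follows it.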
Example : Chunking
• Chunking groups the POS-tagged words into chunks and assigns a chunk tag to each group.

• Before chunking:
သူတုိ႔/NNR.Person # သည္/PPM.Subj # အတန္း/NN.Common # ထဲတြင္/PPM.Extract # အေတာ္ဆုံး/JJS.Common # ေက်ာင္းသားမ်ား/NNR.Person # ျဖစ္/VB.Common # ၾက/Part.Support # သည္/SF.Declarative

• After chunking:
NC [သူတုိ႔/NNR.Person] # PPC [သည္/PPM.Subj] # NC [အတန္း/NN.Common] # PPC [ထဲတြင္/PPM.Extract] # NC [အေတာ္ဆုံး/JJS.Common # ေက်ာင္းသားမ်ား/NNR.Person] # VC [ျဖစ္/VB.Common # ၾက/Part.Support] # SFC [သည္/SF.Declarative]
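One simple way to approximate this grouping is to map each POS tag to a chunk type and merge consecutive tokens of the same type. The mapping below is an assumption that happens to reproduce the slide's example; the real chunk rules are not given in the slides and are likely richer (for instance, not every particle belongs in a verb chunk).

```python
from itertools import groupby

# Hypothetical mapping from POS tag prefix to chunk tag.
CHUNK_OF_PREFIX = {
    "NN": "NC", "JJ": "NC",   # nouns and adjectives: noun chunk
    "PPM": "PPC",             # postpositional markers
    "VB": "VC", "Part": "VC", # verbs and supporting particles: verb chunk
    "SF": "SFC",              # sentence-final markers
}

def chunk_type(tag):
    """Return the chunk tag for a POS tag, or "O" if no prefix matches."""
    for prefix, chunk in CHUNK_OF_PREFIX.items():
        if tag.startswith(prefix):
            return chunk
    return "O"

def chunk(tokens):
    """Group consecutive (word, tag) pairs that share a chunk type."""
    return [(label, list(group))
            for label, group in groupby(tokens, key=lambda tok: chunk_type(tok[1]))]

tagged = [
    ("သူတုိ႔", "NNR.Person"), ("သည္", "PPM.Subj"), ("အတန္း", "NN.Common"),
    ("ထဲတြင္", "PPM.Extract"), ("အေတာ္ဆုံး", "JJS.Common"),
    ("ေက်ာင္းသားမ်ား", "NNR.Person"), ("ျဖစ္", "VB.Common"),
    ("ၾက", "Part.Support"), ("သည္", "SF.Declarative"),
]
chunks = chunk(tagged)
print(" # ".join(
    f"{label} [" + " # ".join(f"{w}/{t}" for w, t in group) + "]"
    for label, group in chunks
))
```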
Alignment
• Alignment identifies words that are translations of each other, based on information found in parallel text.

• Developing a Myanmar-English bilingual corpus:

◦ Dictionary lookup approach
◦ Corpus-based approach
Word Alignment Algorithm
Step 1: Accept a pair of Myanmar and English sentences.
Step 2: Tag the English sentence with a part-of-speech (POS) tagger, which outputs each word with its tag and its root word.
Step 3: Segment the Myanmar sentence into words, remove the stop words, and analyze the noun and verb affixes morphologically using a trigram method.
Step 4: Align the English and Myanmar words output by steps 2 and 3, using the first three IBM models and the EM algorithm on the parallel corpus.
Step 5: Align the remaining (unaligned) words using the Myanmar-English bilingual dictionary.
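Step 4 can be illustrated with IBM Model 1 alone, the simplest of the three models named above; Models 2 and 3 add position and fertility parameters and are not shown. The three-sentence corpus and the names `train_ibm1` and `align` are hypothetical, and the sketch omits the NULL word, stop-word removal, and the dictionary fallback of step 5.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=20):
    """Estimate t(e | f) with EM, where f is a source word and e a target word."""
    tgt_vocab = {e for _, tgt in pairs for e in tgt}
    # Uniform initialization over all co-occurring word pairs.
    t = {(e, f): 1.0 / len(tgt_vocab) for src, tgt in pairs for e in tgt for f in src}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect expected alignment counts.
        for src, tgt in pairs:
            for e in tgt:
                z = sum(t[(e, f)] for f in src)
                for f in src:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[f] += c
        # M-step: renormalize the counts per source word.
        t = {(e, f): count[(e, f)] / total[f] for (e, f) in t}
    return t

def align(src, tgt, t):
    """Link each target word to its most probable source word."""
    return [(e, max(src, key=lambda f: t.get((e, f), 0.0))) for e in tgt]

HE, GO, EAT, BIRD = "သူ", "သြား", "စား", "ငွက္"
# Toy parallel corpus (made up for illustration).
corpus = [
    ([HE, GO], ["he", "goes"]),
    ([HE, EAT], ["he", "eats"]),
    ([BIRD, EAT], ["bird", "eats"]),
]
t = train_ibm1(corpus)
links = align([HE, GO], ["he", "goes"], t)
print(links)
```

Although no sentence pair says which word matches which, the co-occurrence pattern across the three pairs is enough for EM to separate the pronoun from the verbs.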
Example Alignment

သူ ေက်ာင္း သိ႕ု ေျခလ်င္ သြားသည္။

He goes to school on foot.

Problems in Alignment

• Scarce resources
◦ No POS-tagged corpus for Myanmar and English is publicly available.
◦ The POS-tagged corpus constructed so far is limited in size.

• Linguistic problems
◦ The two sentences of a parallel pair may differ in length.
◦ Myanmar and English word order can differ significantly.
◦ Myanmar is a morphologically rich, verb-final (SOV) language, whereas English places the verb before the object (SVO).
Translation

• Phrase/word translation pair extraction
• Morphological analysis
• Word sense disambiguation
Phrase/word Extraction
• Each phrase is identified by its start position, end position, phrase length, and target phrase, to ensure that the extracted phrases leave no gaps and do not overlap.
• N-gram methods are applied using the corpus:

Source phrase   Start position   End position   Phrase length   Target phrase   Translation probability
ငွက္             1                1              1               Bird            1.0
ငွက္မ်ား          1                2              2               Birds           1.0
ပ်ံ               4                4              1               Fly             1.0
ပ်ံၾကသည္         4                6              3               Fly             1.0

• Translation
ငွက္မ်ား - birds
ပ်ံၾကသည္ - fly
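The position bookkeeping in the table can be reproduced by enumerating the n-grams of a segmented sentence and looking each one up in a phrase table. This is only a sketch of that idea: `ngram_spans` and `lookup` are hypothetical names, and the token at position 3 is a placeholder because the slide does not show it.

```python
def ngram_spans(tokens, max_len=3):
    """Yield (phrase, start, end, length) using 1-based inclusive positions."""
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(tokens):
                break
            yield "".join(tokens[start:end]), start + 1, end, length

def lookup(tokens, phrase_table, max_len=3):
    """Return one row per n-gram that has an entry in the phrase table."""
    return [(phrase, start, end, length) + phrase_table[phrase]
            for phrase, start, end, length in ngram_spans(tokens, max_len)
            if phrase in phrase_table]

BIRD, PLURAL, FLY, PART, FINAL = "ငွက္", "မ်ား", "ပ်ံ", "ၾက", "သည္"
UNKNOWN = "<w3>"  # placeholder: position 3 is not shown on the slide

sentence = [BIRD, PLURAL, UNKNOWN, FLY, PART, FINAL]
phrase_table = {
    BIRD: ("Bird", 1.0),
    BIRD + PLURAL: ("Birds", 1.0),
    FLY: ("Fly", 1.0),
    FLY + PART + FINAL: ("Fly", 1.0),
}
rows = lookup(sentence, phrase_table)
for row in rows:
    print(row)
```

Choosing the longest match at each start position (positions 1 to 2 and 4 to 6 here) would then give a translation whose phrases do not overlap.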
Example : Morphological Analysis of Verbs

• Unknown Myanmar verb: ၾကည့္ခဲ့ပါသည္

• Main verb: ၾကည့္
• Verb suffix: ခဲ့ပါသည္
• Tense particle: ခဲ့
• Translation of the main verb (using the corpus): look
• Generation of the surface word: ၾကည့္/look, ခဲ့/past, ပါသည္/null (suffix)

• ၾကည့္ခဲ့ပါသည္/looked
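The analysis above amounts to splitting a known root from its suffix, reading the tense from the suffix, and inflecting the English gloss. The sketch below assumes a tiny hand-made lexicon and handles only regular English past forms; `analyze_verb`, `generate`, and both lookup tables are hypothetical.

```python
ROOT, PAST, ENDING = "ၾကည့္", "ခဲ့", "ပါသည္"

VERB_LEXICON = {ROOT: "look"}    # main verb -> English gloss (from the slide)
TENSE_PARTICLES = {PAST: "past"}  # tense particle -> tense (from the slide)

def analyze_verb(word):
    """Split an inflected verb into root and suffix and read off the tense."""
    # Try longer roots first so that the longest known root wins.
    for root in sorted(VERB_LEXICON, key=len, reverse=True):
        if word.startswith(root):
            suffix = word[len(root):]
            tense = next((t for particle, t in TENSE_PARTICLES.items()
                          if suffix.startswith(particle)), None)
            return {"root": root, "suffix": suffix,
                    "gloss": VERB_LEXICON[root], "tense": tense}
    return None  # root not in the lexicon

def generate(analysis):
    """Produce an English surface form (regular verbs only, an assumption)."""
    gloss = analysis["gloss"]
    return gloss + "ed" if analysis["tense"] == "past" else gloss

analysis = analyze_verb(ROOT + PAST + ENDING)
surface = generate(analysis)
print(analysis["root"], analysis["suffix"], surface)
```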
Word Sense Disambiguation for Myanmar Language
• Purpose:
◦ to resolve the ambiguity of Myanmar words for Myanmar-English machine translation
Ambiguous Example
• Noun example: တူ can mean "chopsticks", "nephew", or "hammer".

◦ သူသည္တူျဖင့္ေခါက္ဆဲြစားသည္။ He eats noodles with chopsticks.
◦ သူ႔မွာတူသံုးေယာက္ရွိသည္။ He has three nephews.
◦ လက္သမားသည္တူကိုသံုးသည္။ The carpenter uses the hammer.
WSD Algorithm for Myanmar Word
Step 1: Preprocessing
- Segment the input sentence.
- Remove stop words from the input sentence and create the ambiguous vector.
Step 2: Multi-sense look-up
- Retrieve all possible senses of the ambiguous word from the corpus.
- Collect the training data related to these senses from the corpus.
Step 3: Build a context vector for each sense from the collected training data
- For each context vector:
  - Remove stop words.
  - Remove redundant words.
Step 4: Calculate the cosine between the ambiguous vector and each of the context vectors:

$$\cos(A, B) = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2}\,\sqrt{\sum_i B_i^2}}$$

where $A_i$ represents each word in the ambiguous vector and $B_i$ represents each word in each context vector.
Step 5: Choose the correct sense of the target word:

$$s' = \arg\max_{s_i} \mathrm{score}(s_i)$$
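The five steps can be sketched with bag-of-words vectors and cosine similarity. The context words below are English stand-ins for segmented Myanmar words, and the sense inventory, stop-word list, and function names are hypothetical; in the described system both would come from the corpus.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def disambiguate(context_words, sense_contexts, stop_words=frozenset()):
    """Return the sense whose context vector is closest to the input context."""
    # Step 1: ambiguous vector from the input sentence, without stop words.
    query = Counter(w for w in context_words if w not in stop_words)
    scores = {}
    for sense, words in sense_contexts.items():
        # Step 3: drop stop words and redundant (repeated) words.
        vector = Counter(w for w in set(words) if w not in stop_words)
        # Step 4: cosine between the ambiguous vector and this context vector.
        scores[sense] = cosine(query, vector)
    # Step 5: s' = argmax over senses of score(s_i).
    return max(scores, key=scores.get), scores

# Toy context data for the three senses of the ambiguous noun (made up).
sense_contexts = {
    "chopsticks": ["noodle", "eat", "rice", "bowl", "eat"],
    "nephew": ["brother", "family", "three", "person"],
    "hammer": ["carpenter", "nail", "use", "wood"],
}
sense, scores = disambiguate(["he", "noodle", "eat"], sense_contexts,
                             stop_words=frozenset({"he"}))
print(sense, scores)
```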
Conclusion
• Data sparseness is one of the most important problems in NLP research, for the following reasons:
◦ Rules alone cannot solve all problems for many languages.
◦ Research has therefore moved toward statistical models.
◦ The more data is available for developing systems and tools, the higher the accuracy that can be achieved.
• ALT data is therefore very useful, not only for the Myanmar language but for all languages, across many kinds of NLP research.
Thank you!
