A Study of Indonesian-to-Malaysian MT System

Septina Dian Larasati, Vladislav Kuboň

Institute of Formal and Applied Linguistics
Charles University
Prague, Czech Republic

Abstract: The paper presents ongoing work on the implementation of an MT system between Indonesian and Malaysian. The system uses a method of almost direct translation that exploits the similarity of the two languages. This method was previously used on a number of pairs of European languages. The paper also gives an overview of linguistic phenomena that can negatively influence the translation quality, and it suggests a solution for some of them.

Keywords: machine translation; related languages; direct translation; morphology; hybrid method

This work was supported by the Erasmus Mundus Master Program Language and Communication Technologies and partially supported by the grant MSM 0021620838 of the MŠMT ČR.

I. INTRODUCTION
Probably no other area of linguistic applications has attracted as much research effort as the automatic translation of texts between natural languages (a field usually called Machine Translation, MT). After more than fifty years of research, during which periods of uncritical expectations were followed by long periods of bitter despair, the application of stochastic methods brought new hope to a field that had notoriously failed to provide acceptable results. The stochastic methods rejected traditional rule-based approaches and replaced them with the exploitation of ever larger amounts of data. The lack of large-coverage grammars was replaced by a lack of parallel data.

Although expectations are nowadays again very high, it is clear that not even the current breakthrough brought by stochastic or hybrid approaches, e.g., the factored translation model described in [17], will solve all the problems, especially those of less represented languages.

One property that makes the translation task easier is the relatedness of the source and target languages. Relatedness usually means a great deal of similarity at all levels, but the experiments carried out in the past (cf. the references further in the text) have shown that the most important level is syntax, closely followed by morphology.

This article describes an experiment in applying an existing model for MT between related languages to a new language pair from a very different language group. The architecture of the system is based primarily on a rule-based approach that allows for a great deal of ambiguity in all steps. This ambiguity is then resolved by a simple stochastic ranking of all translation hypotheses. The simple architecture was originally developed for European languages (Slavic and Romance), and one of the main goals of this paper is to describe the issues encountered when applying the method to a pair of Asian languages that are typologically different from them.

If we look at the experiments made so far for related languages, we find numerous recent experiments for various language groups:

• for Slavic languages in [12] and [16],
• for Scandinavian languages in [3], [6], and [13],
• for Turkic languages in [10],
• and for languages of Spain in [1].

The close relatedness of natural languages from one typological group (and sometimes even across group borders, cf. the Czech-to-Lithuanian experiment described in [8]) makes the translation task easier and thus allows for the application of methods that would not be good enough for the translation of unrelated language pairs. Using simpler methods does not mean a lower translation quality: many translation errors result from imperfect attempts to parse the source language fully, in some cases even to the deep syntactic level of representation. The accumulation of errors in parsing, transfer and generation in systems using the classical transfer-based architecture substantially decreases the translation quality.

II. TYPOLOGY OF THE LANGUAGE
Although the two languages are spoken by millions of people, they have received far less research attention than most European languages. This makes this pair of closely related languages a compelling subject of study. Both belong to the Austronesian family and behave similarly, which often leads non-native speakers to assume, mistakenly, that they are mutually intelligible. The languages are very dynamic, and their evolution has made them diverge from one another.

Both of these agglutinative languages have similar morphological mechanisms and share some words, both words with exact or similar meaning and also words with
different meaning that can be misinterpreted by native speakers of either language. An example of a word that can be misinterpreted is ‘kereta’, which means ‘car’ in Malaysian and ‘train’ in Indonesian. The word is inflected in the same way in both languages, as in ‘berkereta’, which means ‘having a car’ in Malaysian and ‘having a train’ in Indonesian. Given this background, the language pair is well suited to the shallow rule-based MT method.

Orthography – Both languages use the basic modern Latin alphabet; a hyphen separates the words in reduplication and in special clitic cases.

Word Order – The word order is fixed, and the position of a word in the sentence is essential for determining its role in the sentence.

Tense – The languages have no inflectional tense marking. Tense is marked by an additional word or by temporal information in the sentence.

Voice – The voice of a sentence is marked by different prefixes on the inflected word.

Gender – Gender classification is not common, although it occurs in some irregular cases marked by several suffixes. This pattern is now rarely used and no longer productive.

Number – Plurality is found not only in nouns but also in other parts of speech (POS), where it marks the plurality of the action or refers to plural entities.

III. ARCHITECTURE OF THE SYSTEM
Most of the systems mentioned in the introduction try to exploit the similarity of closely related languages. This can apparently be done only if the system architecture is reasonably simple. The more complicated the architecture is, the more errors the individual modules introduce into the translation process. These errors then negate the advantage of working with closely related languages.

The most successful architecture for simple MT systems was developed for the system Česílko [7] and is also used by the system Apertium [1]. The fact that Apertium is an open-source platform and can thus easily be adopted for experiments with other language pairs led us to the decision to use it for our experiments with two Southeast Asian languages, Malaysian and Indonesian.

As mentioned above, the architecture of Česílko and Apertium is relatively simple. The systems are in fact transfer-based systems with the transfer being performed either at the morphological or at the shallow syntactic level (depending on the degree of syntactic similarity of the source and target languages). The role of morphology in such a system is crucial.

Our Indonesian-to-Malaysian MT system is implemented on Apertium (http://www.apertium.org), a free/open-source platform for developing rule-based machine translation systems [15]. The platform is a shallow-transfer engine that performs essentially word-for-word translation to produce fast, reasonably intelligible and easily correctable translations; it was designed for related languages but can be extended to language pairs that are not closely related.

Apertium has a modular architecture [2], and for each module it provides several tool options depending on the nature of the languages. Our MT system skips some modules of the original setup. The modules that are kept are:

Morphological Analyser – the surface forms are segmented and each form is analysed to obtain its lexical unit, i.e. the lemma, the part-of-speech tags and the morphological inflection information. Apertium offers various morphological analysis tools that can accommodate languages of different nature. For this particular language pair, the morphological analyser is developed with the Xerox finite-state tools (XFST) and the high-level declarative language for specifying lexicons (LEXC), and is then compiled in Foma (http://foma.sourceforge.net/) [14], a finite-state toolkit that implements Xerox xfst and lexc. This module also includes the source-language monolingual dictionary.

Part-of-Speech Tagger – trained using a text corpus and a tagger definition file to disambiguate the analyses.

Lexical Transfer and Structural Transfer – reads the analysis of each source-language word and transfers it into the target language using the bilingual dictionary. Structural transfer between the source and target languages can be done in up to three stages (Chunker, Interchunk and Postchunk), depending on the need. This MT system uses only a one-stage transfer.

Morphological Generator – the reverse of the Morphological Analyser; it turns the analyses back into surface forms.

Ranker – added to choose the best translation hypotheses statistically.

Figure 1. MT System Modular Architecture
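The module chain described above can be illustrated with a minimal Python sketch. It is not the Apertium implementation: the function names are hypothetical, the lookup tables are tiny stand-ins filled with a few entries taken from the analysis and bilingual-dictionary examples given later in this paper, the tagger is replaced by the assumption "keep the first analysis", generation only strips the tags (enough for uninflected words), and the ranker is omitted. Words without a bilingual entry are passed through with a leading asterisk.

```python
import re

# Toy lookup tables; the entries are copied from examples in this paper.
# A real system uses compiled finite-state transducers and XML dictionaries.
ANALYSES = {
    "bahwa": ["bahwa<cnjsub>"],
    "anda": ["anda<prn><p2><sg>"],
    "tidak": ["tidak<adv>"],
    "bisa": ["bisa<mod>", "bisa<n><bare><sg>"],  # ambiguous surface form
}
BILINGUAL = {"bahwa": "bahawa", "anda": "anda", "tidak": "tidak"}

# lemma followed by zero or more <tag> symbols
LEXICAL_UNIT = re.compile(r"([^<]+)((?:<[^>]+>)*)$")


def analyse(tokens):
    """Morphological analyser: surface form -> list of candidate analyses."""
    return [(t, ANALYSES.get(t, ["*" + t])) for t in tokens]


def tag(units):
    """Tagger stand-in (assumption): keep the first candidate analysis."""
    return [(surface, analyses[0]) for surface, analyses in units]


def transfer(units):
    """Lexical transfer: swap the lemma via the bilingual table, keep tags."""
    out = []
    for _surface, analysis in units:
        lemma, tags = LEXICAL_UNIT.match(analysis).groups()
        lemma = lemma.lstrip("*")
        if lemma in BILINGUAL:
            out.append(BILINGUAL[lemma] + tags)
        else:
            out.append("*" + lemma + tags)  # no entry: leave untranslated
    return out


def generate(analyses):
    """Generator stand-in: drop the tags to obtain a surface form."""
    return [re.sub(r"<[^>]+>", "", a) for a in analyses]


def translate(sentence):
    return " ".join(generate(transfer(tag(analyse(sentence.split())))))


print(translate("bahwa anda tidak bisa"))  # bahawa anda tidak *bisa
```

Each stage consumes only the output of the previous one, which is what allows individual modules to be swapped (for example, replacing the analyser) without touching the rest of the chain.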
IV. MORPHOLOGICAL ANALYSIS AND GENERATION
Given the typology of the two languages, most of the engineering effort goes into morphological analysis and generation rather than into the other parts of the system. This section describes the morphological operations of the languages and then how analysis and generation are implemented.

A. Morphological Operations
The two languages have a similar morphological mechanism. We broke this mechanism down into four morphological operations that have to be handled:

1. Affixation. This operation includes prefixes, suffixes and circumfixes. There are several cases of infixes, which are now rarely used. These special cases are handled differently, in the language resources (see Section VII).

2. Reduplication. Reduplication can occur on any POS. It is divided into three types, namely full reduplication, partial reduplication and affixed reduplication. Partial reduplication is not handled in the morphological analyser but is treated as an entry in the dictionary.

3. Clitic. Enclitics and proclitics represent pronouns. A clitic can be kept as a clitic or restored to its corresponding independent pronoun; both ways are grammatically correct.

4. Particle. Particles mark stress and level of formality and form question words.

Figure 2 shows the schema of how the inflections are arranged around the lemma. The prefix position itself is divided into two, named pre-prefix and prefix according to position. Reduplication can occur almost anywhere in the affixed lemma.

Figure 2. Morphological Operations Schema

B. Morphological Tool
Since the morphological mechanisms are similar, we simply use the same morphological analyser for both languages. The tool widely used for analysis and generation on the Apertium platform is Lttoolbox, a toolbox for lexical processing, morphological analysis and generation of words. This tool has been used for several language pairs, mostly for languages that inflect by suffixes, which is what Apertium was initially designed for. It works by exhaustively defining the combinations of inflected forms that are possible in a language, called paradigms. We found that this tool cannot accommodate Indonesian and Malaysian morphology well because of several limitations:

• The treatment of morphemes that precede the base word is not straightforward. The analysis expected from this module has the form of a lemma followed by morphological tag(s), for example pesan<n><bare><sg>. The analysis, however, is produced at the position of the inflection, so the analysis of a prefix, i.e. its tag(s), ends up in front of the lemma, and a separate reformatting step is needed. Moreover, a circumfix is treated as an independent prefix and suffix.

• The morphophonemics are handled by expanding the morpheme into all of its possible surface forms. For example, the pre-prefix ‘meN-’ has to be expanded into several different forms depending on the base word it attaches to: ‘menge-’ for one-syllable bases, ‘meng-’ for words starting with [a i u e o g h], ‘meny-’ for words starting with [s, y], and so on.

• This tool cannot handle reduplication.

To overcome these limitations we decided not to use Lttoolbox and initially employed an available Indonesian morphological analyser [4], which was developed on the xfst/lexc platform. This tool already handles reduplication and Indonesian morphophonemics. To incorporate it into Apertium we compiled it in Foma, a finite-state toolkit.

This morphological analyser includes a large number of Indonesian lemmata, but unfortunately its coverage of inflections was not adequate for the task:

• It covers the morphological operations only partly. It handles reduplication and several affixations, but not clitics and particles. An inflected word that is not covered is left untranslated.

• The tagset is underspecified for generation. It consists of 17 general tags, which mostly mark the part of speech (POS) and the morphological operation that occurs. The POS tags distinguish only three types, namely Verb, Noun and Adjective, while all others are labelled Etc.

• Several inflected words receive the same analysis, which is unfavorable for translation because those different inflected words are transferred to the same target analysis. For example, the nouns ‘kiriman’, ‘pengirim’ and ‘pengiriman’, derived from the verb ‘kirim’, all receive kirim+Noun as the result of the analysis.
• Related to the tagset problem, the generation step produces a large number of inflected words, which in turn yields a larger number of translation hypotheses. For example, the analysis kirim+Noun generates the words shown in Table I.

TABLE I. PROBLEM IN THE ANALYSIS/GENERATION

Analysis Result
kiriman     >  kirim+Noun
pengirim    >  kirim+Noun
pengiriman  >  kirim+Noun

Generation Result
kirim+Noun  >  pengirim
               pengiriman
               *pemberkiriman
               *perkiriman
               *kepengiriman
               *keberkiriman
               *kekiriman
               kiriman

*) marks the ungrammatical inflected words
#) marks the un-generated inflected words

Starting from that analyser, we kept the part that handles morphophonemics and reduplication and built a morphological analyser with more extensive inflection coverage. We also introduced more fine-grained tags and changed the tag notation from +TAG to <TAG> to suit the Apertium platform.

TABLE II. MORPHOLOGICAL TAGSET

Tag         Description                                Tag Type
<adj>       adjective lemma                            POS
<n>         noun lemma                                 POS
<num>       number lemma                               POS
<prn>       pronoun                                    POS
<det>       determiner                                 POS
<cnjcoo>    coordinating conjunction                   POS
<cnjsub>    subordinating conjunction                  POS
<vblex>     verb lemma                                 POS
<part>      particle                                   POS
<mod>       modal                                      POS
<ij>        interjection                               POS
<qst>       question word                              POS
<pr>        preposition lemma                          POS
<p1>        first person                               PERSON
<p2>        second person                              PERSON
<p3>        third person                               PERSON
<sg>        singular                                   NUM
<pl>        plural                                     NUM
<card>      cardinal number                            DERNUM
<ord>       ordinal number                             DERNUM
<coll>      collective number                          DERNUM
<ref>       referential number                         DERNUM
<vbhaver>   verb ‘to have’                             VERBVAR
<vbser>     verb ‘to be’                               VERBVAR
<actv>      active voice                               VOICE
<pasv>      passive voice                              VOICE
<perf>      perfective aspect                          ASPECT
<imp>       imperfective aspect                        ASPECT
<bare>      bare noun                                  DERNOUN
<abstract>  derived abstract noun                      DERNOUN
<actio>     derived action noun                        DERNOUN
<actor>     derived actor noun                         DERNOUN
<ent>       derived entity noun                        DERNOUN
<theme>     derived theme noun                         DERNOUN
<positive>  bare adjective                             DERADJ
<sup>       superlative adjective                      DERADJ
<exceed>    adjective that shows something exceeding   DERADJ
<manner>    adjective that shows similar manner        DERADJ
<uni>       union adjective                            DERADJ
<possib>    adjectival phrase                          DERADJ
<enc>       enclitic                                   CLITIC
<pro>       proclitic                                  CLITIC
<appl>      applicative                                TRANSITIVITY
<caus>      causative                                  TRANSITIVITY
<cap>       capitalization mark                        MARK
<pos>       possessive mark                            MARK

Compared to the previous example, the analyses produced by the current morphological analyser are more precise.

TABLE III. CURRENT ANALYSIS/GENERATION

Analysis Result
kiriman     >  kirim<vblex><ent><sg>
pengirim    >  kirim<vblex><actor><sg>
pengiriman  >  kirim<vblex><actio><sg>

Generation Result
kirim<vblex><actor><sg>  >  pengirim
kirim<vblex><actio><sg>  >  pengiriman
                            *#pemberkiriman
                            *#perkiriman
                            *#kepengiriman
                            *#keberkiriman
                            *#kekiriman
kirim<vblex><ent><sg>    >  kiriman

*) marks the ungrammatical inflected words
#) marks the un-generated inflected words

Here is the analysis of the Indonesian sentence “apabila, sebelum mengunduh, menginstal, mengaktifkan atau menggunakan piranti lunak, anda memutuskan bahwa anda tidak bersedia untuk menyetujui ketentuan-ketentuan perjanjian ini, anda tidak bisa dan tidak berhak menggunakan piranti lunak ini” (“if, before downloading, installing, activating or using the software, you decide that you are unwilling to agree to the terms of this agreement, you cannot and do not have the right to use this software”).

^apabila/apabila<cnjsub>$
,
^sebelum/sebelum<cnjsub>$
^mengunduh/unduh<vblex><actv><imp><sg>$
,
^menginstal/instal<vblex><actv><imp><sg>$
,
^mengaktifkan/aktif<adj><actv><imp><caus><sg>$
^atau/atau<cnjcoo>$
^menggunakan/guna<n><actv><imp><caus><sg>$
^piranti~lunak/piranti~lunak<n><bare><sg>$
,
^anda/anda<prn><p2><sg>$
^memutuskan/putus<adj><actv><imp><caus><sg>$
^bahwa/bahwa<cnjsub>$
^anda/anda<prn><p2><sg>$
^tidak~bersedia/enggan<adj><positive>$
^untuk/untuk<pr>$
^menyetujui/setuju<vblex><actv><imp><appl><sg>$
^ketentuan-ketentuan/tentu<adj><abstract><pl>$
^perjanjian/janji<n><theme><sg>$
^ini/ini<det>$
,
^anda/anda<prn><p2><sg>$
^tidak/tidak<adv>$
^bisa/bisa<mod>/bisa<n><bare><sg>$
^dan/dan<cnjcoo>$
^tidak/tidak<adv>$
^berhak/hak<n><actv><perf><vbhaver><bare><sg>$
^menggunakan/guna<n><actv><imp><caus><sg>$
^piranti~lunak/piranti~lunak<n><bare><sg>$
^ini/ini<det>$

Figure 3. Analysis Example for Indonesian Sentence “apabila, sebelum mengunduh, menginstal, mengaktifkan atau menggunakan piranti lunak, anda memutuskan bahwa anda tidak bersedia untuk menyetujui ketentuan-ketentuan perjanjian ini, anda tidak bisa dan tidak berhak menggunakan piranti lunak ini”

The generation process is simply the opposite direction of the analysis: the surface forms are composed from the analyses.

V. DISAMBIGUATION
Although the morphological analysis has been expanded to prevent ambiguities, cases such as homonyms still remain. The word ‘bisa’ in the previous analysis example (Figure 3) has two possible analyses, since the same form stands for ‘can/able to’, a modal verb, and for ‘snake venom’, a noun. Such multiple analyses are disambiguated statistically.

The disambiguation of the analyses is done in the POS tagger. Apertium provides several ways to train the tagger. We chose the target-language tagger training provided by Apertium [5]. This training process is relatively fast and is well suited to our MT system, which has only a one-stage transfer. It trains the tagger using both the source and the target language. To do so we need a text corpus in the source and target languages, a tag definition file, and a running MT system. In the tag definition file we specify the sequences of tags that are enforced or forbidden in the analysis. The analysis of the word ‘bisa’ in Figure 3 is disambiguated into

^bisa<mod>$

VI. TRANSFER
The translation into the target language takes place in the lexical and structural transfers. The analyses of the source language are transferred into the target language, and the target surface forms are then generated from them.

The transfer between the two languages is done using transfer rules and a bilingual dictionary. The sentence structure of the two languages is similar, so reordering is not required. We use Lttoolbox to maintain the bilingual dictionary.

<e><l>apabila<s n="cnjsub"/></l><r>jika<s n="cnjsub"/></r></e>
<e><l>sebelum<s n="cnjsub"/></l><r>sebelum<s n="cnjsub"/></r></e>
<e><l>unduh<s n="vblex"/></l><r>muatturunkan<s n="vblex"/></r></e>
<e><l>instal<s n="vblex"/></l><r>pasang<s n="vblex"/></r></e>
<e><l>aktif<s n="adj"/></l><r>aktif<s n="adj"/></r></e>
<e><l>atau<s n="cnjcoo"/></l><r>atau<s n="cnjcoo"/></r></e>
<e><l>guna<s n="n"/></l><r>guna<s n="n"/></r></e>
<e><l>piranti~lunak<s n="n"/></l><r>perisian<s n="n"/></r></e>
<e><l>anda<s n="prn"/></l><r>anda<s n="prn"/></r></e>
<e><l>putus<s n="adj"/></l><r>putus<s n="adj"/></r></e>
<e><l>bahwa<s n="cnjsub"/></l><r>bahawa<s n="cnjsub"/></r></e>
<e><l>enggan<s n="adj"/></l><r>enggan<s n="adj"/></r></e>
<e><l>untuk<s n="pr"/></l><r>untuk<s n="pr"/></r></e>
<e><l>setuju<s n="vblex"/></l><r>bersetuju<s n="vblex"/></r></e>
<e><l>tentu<s n="adj"/><s n="abstract"/><s n="pl"/></l><r>terma<s n="n"/><s n="bare"/><s n="pl"/></r></e>
<e><l>janji<s n="n"/></l><r>janji<s n="n"/></r></e>
<e><l>ini<s n="det"/></l><r>ini<s n="det"/></r></e>
<e><l>tidak<s n="adv"/></l><r>tidak<s n="adv"/></r></e>
<e><l>hak<s n="n"/></l><r>hak<s n="n"/></r></e>

Figure 4. Bilingual Dictionary Entries

The bilingual dictionary records the lemma and the necessary tags, such as the POS tag. Compound words are recorded as one entry. For example, the Indonesian compound “ibu kota”, which means ‘capital city’, is mapped to the Malaysian “ibu negara” (which in Indonesian would be misinterpreted as ‘first lady’).

<e><l>ibu~kota<s n="n"/></l><r>ibu~negara<s n="n"/></r></e>

Figure 5. Bilingual Dictionary Entries – Compound words

A preprocessing step adds the tilde character ‘~’ to join the parts of a compound word so that Foma handles it as a single word. This is because currently Foma does
not tokenize the sentence during analysis, a functionality that other Apertium morphological tools, such as Lttoolbox and HFST, provide.

VII. LANGUAGE RESOURCES
The analysis and generation steps need monolingual dictionaries for both languages. To build the Indonesian monolingual dictionary, we took the list of lemmata available in the previous morphological analyser [4] and adapted it to the current setting. We kept only the lemmata tagged as Noun, Verb or Adjective. In addition, closed-class entries such as prepositions and conjunctions were added and tagged. The problem on the Malaysian side is that we do not have a list of Malaysian lemmata like the one we have for Indonesian. We simply take the Malaysian entries from the bilingual dictionary.

An Indonesian-Malaysian bilingual dictionary is not yet available. To build a bilingual dictionary quickly and cheaply, we collected entries from a publicly available online dictionary and also generated entries from a parallel corpus. The dictionary was constructed as follows:

1) Online dictionary. Several online dictionary websites are available. We queried the site for each Indonesian lemma and collected the translation if one was available. The source tag and the target tag were also recorded.

2) Statistical word pairing. Word pairs were also built with a statistical method, by training on a small parallel corpus composed from several sources such as manuals, recipes, agreements and holy books. The tool used is Moses (http://www.statmt.org/moses/) [18]. On the source-language side, the words were analysed to obtain their analysis forms (lemma and morphological tags), while the target side consisted of sentences with words in their surface forms. After the word pairs were obtained, the morphemes of the words on the target side were stripped in order to get lemma-to-lemma pairs.

The results of both approaches were merged and hand-picked to retain the quality of the translation.

VIII. CONCLUSIONS AND FUTURE WORK
Although the experiment described in this paper is still work in progress and we are at the current stage unable to provide a standard quality evaluation, there are already some results which may turn out to be important for further research.

First of all, the work on the system has led us to investigate how certain phenomena in both languages may be handled from the point of view of machine translation, which phenomena may cause problems in a relatively straightforward system, etc.

Second, the relatively large number of resources needed for building the individual modules of the system made us think about methods for obtaining them in reasonable quantity and quality. This turned out to be a challenge, especially because for the European languages used in previous experiments many more resources are available and usually nothing is built from scratch. Building better resources will be the first task, which will probably help us to improve the system in the future. Building the full pipeline of the system took much less of the development time than developing the resources such as the morphological analyser and the dictionaries.

It would be interesting to build the MT system in the opposite direction, Malaysian to Indonesian, which appears to be more or less symmetrical. Another challenging line of research would be to build an Indonesian/Malaysian-English MT system using this approach.

ACKNOWLEDGMENT
Thanks to Måns Huldén for his help in converting patent-encumbered and some other aspects of the Xerox syntax into Foma. Thanks to Francis Tyers for the support and his help setting up the new Apertium language pair development environment.

REFERENCES
[1] A. M. Corbí-Bellot, M. L. Forcada, S. Ortiz-Rojas, J. A. Pérez-Ortiz, G. Ramírez-Sánchez, F. Sánchez-Martínez, I. Alegria, A. Mayor, and K. Sarasola, “An open-source shallow-transfer machine translation engine for the Romance languages of Spain,” Proceedings of the Tenth Conference of the European Association for Machine Translation, pp. 79–86, May 2005.
[2] F. M. Tyers, F. Sánchez-Martínez, S. Ortiz-Rojas, and M. L. Forcada, “Free/open-source resources in the Apertium platform for machine translation research and development,” The Prague Bulletin of Mathematical Linguistics No. 93, pp. 67-76, 2010.
[3] F. M. Tyers, L. Wiechetek, and T. Trosterud, “Developing prototypes for machine translation between two Sámi languages,” Proceedings of the 13th Annual Conference of the European Association of Machine Translation, EAMT09, 2009.
[4] F. Pisceldo, R. Mahendra, R. Manurung, and I W. Arka, “A Two-Level Morphological Analyser for Indonesian,” Abstract submitted to the Australasian Language Technology (ALTA) Workshop 2008, Tasmania, 2008.
[5] F. Sánchez-Martínez, J. A. Pérez-Ortiz, and M. L. Forcada, “Using target-language information to train part-of-speech taggers for machine translation,” Machine Translation, volume 22, numbers 1-2, pp. 29-66.
[6] H. Dyvik, “Exploiting structural similarities in machine translation,” Computers and Humanities 28, pp. 225–245, 1995.
[7] J. Hajič, J. Hric, and V. Kuboň, “Machine translation of very close languages,” Proceedings of the 6th Applied Natural Language Processing Conference, 2000.
[8] J. Hajič, P. Homola, and V. Kuboň, “A simple multilingual machine translation system,” Proceedings of the MT Summit IX, New Orleans, 2003.
[9] J. Vičič, “Rapid development of data for shallow transfer rbmt translation systems for highly inflective languages,” Jezikovne tehnologije, language technologies : zbornik konference : proceedings of the conference, pp. 98–103, 2008.
[10] K. Altintas and I. Cicekli, “A machine translation system between a pair of closely related languages,” Proceedings of the 17th International Symposium on Computer and Information Sciences (ISCIS 2002), 2002.
[11] K. Oliva, “A Parser for Czech Implemented in Systems Q,” Explizite Beschreibung der Sprache und automatische Textbearbeitung XVI, MFF UK Prague, 1989.
[12] K. P. Scannell, “Machine translation for closely related language pairs,” Unknown, 2008.
[13] K. Unhammer and T. Trosterud, “Reuse of free resources in machine translation between Nynorsk and Bokmål,” Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation / Edited by J. A. Pérez-Ortiz, F. Sánchez-Martínez, F. M. Tyers, pp. 35-42, Alicante : Universidad de Alicante, Departamento de Lenguajes y Sistemas Informáticos, 2009.
[14] M. Hulden, “Foma: a finite-state compiler and library,” Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics: Demonstrations Session, pp. 29-32, Athens, Greece, April 03-03, 2009.
[15] M. L. Forcada, F. M. Tyers, and G. Ramírez-Sánchez, “The free/open-source machine translation platform Apertium: Five years on,” Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation FreeRBMT'09, pp. 3-10, November 2009.
[16] P. Homola and V. Kuboň, “A translation model for languages of acceding countries,” Proceedings of the IX EAMT Workshop, La Valetta, University of Malta, 2004.
[17] P. Koehn and H. Hoang, “Factored translation models,” Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 868–876, 2007.
[18] P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, E. Herbst, “Moses: Open Source Toolkit for Statistical Machine Translation,” Annual Meeting of the Association for Computational Linguistics (ACL): Demonstration session, Prague, Czech Republic, June 2007.
[19] S. Marinov, “Structural Similarities in MT: A Bulgarian-Polish case,” unknown, 2003.



This work was supported by an Erasmus Mundus Master Program Language
