
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Conference Paper · May 2012
DOI: 10.13140/2.1.5171.2001
Authors: Wondwossen Mulugeta Gewe (Addis Ababa University), Michael Gasser (Indiana University Bloomington)


Workshop on Language Technology for Normalisation of Less-Resourced Languages (SALTMIL8/AfLaT2012)

Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming

Wondwossen Mulugeta¹ and Michael Gasser²

¹ Addis Ababa University, Addis Ababa, Ethiopia
² Indiana University, Bloomington, USA
E-mail: ¹[email protected], ²[email protected]

Abstract
This paper presents a supervised machine learning approach to the morphological analysis of Amharic verbs. We use Inductive Logic Programming (ILP), implemented in CLOG. CLOG learns rules as a first order predicate decision list. Amharic, an under-resourced African language, has very complex inflectional and derivational verb morphology, with up to four possible prefixes and five possible suffixes. While the affixes express various grammatical features, this paper addresses only subject prefixes and suffixes. The training data used to learn the morphological rules were manually prepared according to the structure of the background predicates used for the learning process. The training produced 108 stem extraction rules and 19 root template extraction rules from the examples provided. After combining the various rules generated, the program was tested using a test set containing 1,784 Amharic verbs. An accuracy of 86.99% was achieved, encouraging further application of the method to complex Amharic verbs and other parts of speech.

1. Introduction
Amharic is a Semitic language, related to Hebrew, Arabic, and Syriac. Next to Arabic, it is the second most spoken Semitic language, with around 27 million speakers (Sieber, 2005; Gasser, 2011). As the working language of the Ethiopian Federal Government and of some regional governments in Ethiopia, most documents in the country are produced in Amharic. There is also an enormous production of electronic and online accessible Amharic documents.

One of the fundamental computational tasks for a language is the analysis of its morphology, where the goal is to derive the root and grammatical properties of a word based on its internal structure. Morphological analysis, especially for complex languages like Amharic, is vital for the development and application of many practical natural language processing systems such as machine-readable dictionaries, machine translation, information retrieval, spell-checkers, and speech recognition.

While various approaches have been used for other languages, Amharic morphology has so far been attempted using only rule-based methods. In this paper, we apply machine learning to the task.

2. Amharic Verb Morphology
The different parts of speech and their formation, along with the interrelationships which constitute the morphology of Amharic words, have been more or less thoroughly studied by linguists (Sieber, 2005; Dawkins, 1960; Bender, 1968). In addition to lexical information, the morphemes in an Amharic verb convey subject and object person, number, and gender; tense, aspect, and mood; various derivational categories such as passive, causative, and reciprocal; polarity (affirmative/negative); relativization; and a range of prepositions and conjunctions.

For Amharic, like most other languages, verbs have the most complex morphology. In addition to the affixation, reduplication, and compounding common to other languages, in Amharic, as in other Semitic languages, verb stems consist of a root + vowels + template merger (e.g., sbr + ee + CVCVC, which leads to the stem seber 'broke')¹ (Yimam, 1995; Bender, 1968). This non-concatenative process makes morphological analysis more complex than in languages whose morphology is characterized by simple affixation. The affixes also contribute to the complexity: verbs can take up to four prefixes and up to five suffixes, and the affixes have an intricate set of co-occurrence rules.

For Amharic verbs, grammatical features are not expressed by the affixes alone. The intercalation pattern of the consonants and the vowels that make up the verb stem is also used to determine various grammatical features of the word. For example, the following two words have the same prefixes and suffixes and the same root, while the pattern in which the consonants and the vowels are intercalated is different, resulting in different grammatical information.

?-sebr-alehu → 1st pers. sing. simplex imperfective
?-seber-alehu → 1st pers. sing. passive imperfective
Figure 1: Stem template variation example

In the second case, below, the difference in grammatical feature is due to the affixes rather than to the internal root template structure of the word.

te-deres-ku → 1st pers. sing. passive perfective
deres-ku → 1st pers. sing. simplex perfective
Figure 2: Affix variation example

¹ Amharic is written in the Geez writing system. For our morphology learning system we romanize Amharic orthography, and we cite these romanized forms in this paper.
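The root + vowels + template merger of Section 2 can be pictured procedurally. The following Python sketch is ours, for illustration only (the function name and representation are not from the paper): it fills the C and V slots of a template from the root consonants and the vocalic pattern.

```python
def intercalate(root, vowels, template):
    """Fill a CV template: 'C' slots consume root consonants in order,
    'V' slots consume vowels in order (e.g. sbr + ee + CVCVC -> seber)."""
    consonants, vowel_seq = iter(root), iter(vowels)
    return ''.join(next(consonants) if slot == 'C' else next(vowel_seq)
                   for slot in template)

print(intercalate('sbr', 'ee', 'CVCVC'))  # seber
print(intercalate('sbr', 'e', 'CVCC'))    # sebr
```

The same root yields different stems (seber vs. sebr) purely through the choice of template and vocalic pattern, which is the non-concatenative behaviour the paper describes.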


As in many other languages, Amharic morphology is also characterized by alternation rules governing the form that morphemes take in particular environments. The alternation can happen either at the stem-affix intersection points or within the stem itself. Suffix-based alternation is seen, for example, in the second person singular feminine imperfect and imperative, shown in Table 1. The first two examples in Table 1 show that the second person singular feminine imperative marker 'i', if preceded by the character 'l', is altered to 'y'. The last two examples show that the same alternation rule applies for imperfect roots.

No.  Word            Root  Feature
1    gdel            gdl   2nd person sing. masc. imperative
2    gdey (gdel-i)   gdl   2nd person sing. fem. imperative
3    t-gedl-aleh     gdl   2nd person sing. masc. imperfect
4    t-gedy-alex     gdl   2nd person sing. fem. imperfect
Table 1: Example of Amharic Alternation Rule

3. Machine Learning of Morphology
Since Koskenniemi's (1983) ground-breaking work on two-level morphology, there has been a great deal of progress in finite-state techniques for encoding morphological rules (Beesley & Karttunen, 2003). However, creating rules by hand is an arduous and time-consuming task, especially for a complex language like Amharic. Furthermore, a knowledge-based system is difficult to debug, modify, or adapt to other similar languages. Our experience with HornMorpho (Gasser, 2011), a rule-based morphological analyser and generator for Amharic, Oromo, and Tigrinya, confirms this. For these reasons, there is considerable interest in robust machine learning approaches to morphology, which extract linguistic knowledge automatically from an annotated or un-annotated corpus. Our work belongs to this category.

Morphology learning systems may be unsupervised (Goldsmith, 2001; Hammarström & Borin, 2011; De Pauw & Wagacha, 2007) or supervised (Oflazer et al., 2001; Kazakov, 2000). Unsupervised systems are trained on unprocessed word forms and have the obvious advantage of not requiring segmented data. On the other hand, supervised approaches have important advantages of their own: they are less dependent on large corpora, require less human effort, and are relatively fast, which makes them scalable to other languages; moreover, not all rules in the language need be enumerated.

Supervised morphology learning systems are usually based on two-level morphology. These approaches differ in the level of supervision they use to capture the rules. A weakly supervised approach uses word pairs as input (Manandhar et al., 1998; Mooney & Califf, 1995; Zdravkova et al., 2005). Other systems may require segmentation of input words, or an analysis in the form of a stem or root and a set of grammatical morphemes.

4. ILP and Morphology Learning
Inductive Logic Programming (ILP) is a supervised machine learning framework based on logic programming. In ILP a hypothesis is drawn from background knowledge and examples. The examples (E), background knowledge (B) and hypothesis (H) all take the form of logic programs. The background knowledge and the final hypothesis induced from the examples are used to evaluate new instances.

Since logic programming allows for the expression of arbitrary relations between objects, ILP is more expressive than attribute-value representations, enabling flexible use of background knowledge (Bratko & King, 1994; Mooney & Califf, 1995). It also has advantages over approaches such as n-gram models, Hidden Markov Models, neural networks and SVMs, which represent examples using fixed-length feature vectors (Bratko & King, 1994). These techniques have difficulty representing relations, recursion and unbounded structural representations (Mooney, 2003). ILP, on the other hand, employs a rich knowledge representation language without length constraints. Moreover, the first order logic that is used in ILP limits the amount of feature extraction required in other approaches.

In induction, one begins with some data during the training phase, and then determines what general conclusion can logically be derived from those data. For morphological analysis, the learning data would be expected to guide the construction of word formation rules and of interactions between the constituents of a word.

There have been only a few attempts to apply ILP to morphology, and most of these have dealt with languages with relatively simple morphology involving few affixations (Kazakov, 2000; Manandhar et al., 1998; Zdravkova et al., 2005). However, the results are encouraging.

While we focus on Amharic verb morphology, our goal is a general-purpose ILP morphology learner. Thus we seek background knowledge that is plausible across languages and that can be combined with language-specific examples to yield rule hypotheses that generalize to new examples in the language.

CLOG is a Prolog-based ILP system, developed by Manandhar et al. (1998)², for learning first order decision lists (rules) on the basis of positive examples only. A rule in Prolog is a clause with one or more conditions. The right-hand side of the rule (the body) is a condition and the left-hand side of the rule (the head) is the conclusion. The operator between the left and the right hand side (the sign ':-') means if. The body of a rule is a list of goals separated by commas, where commas are understood as conjunctions. For a rule to be true, all of its conditions/goals must evaluate to true. In the expression below, p is true if q and r are true or if s and t are true.

² CLOG is a freely available ILP system at: http://www-users.cs.york.ac.uk/suresh/CLOG.html


p :- q, r.
p :- s, t.
p ⇔ (q ∧ r) ∨ (s ∧ t)

where q, r, s and t could be facts or predicates with any arity, and p is a predicate with any number of arguments.

CLOG relies on output completeness, which assumes that every form of an object is included in the example and everything else is excluded (Mooney & Califf, 1995). We preferred CLOG over other ILP systems because it requires only positive examples and runs faster than the other variants (Manandhar et al., 1998). CLOG uses a hill climbing strategy to build the rules, starting from a simple goal and iteratively adding more rules to satisfy the goal until there are no possible improvements. The evaluation of the rules generated by the learner is validated using a gain function that compares the number of positively and negatively covered examples in the current and previous learning stages (Manandhar et al., 1998).

5. Experiment Setup and Data
Learning morphological rules with ILP requires preparation of the training data and background knowledge. To handle a language of the complexity of Amharic, we require background knowledge predicates that can handle stem extraction by identifying affixes, root and vowel identification, and grammatical feature association with the constituents of the word.

The training data used during the experiment is of the following form:

stem([s,e,b,e,r,k,u],[s,e,b,e,r],[s,b,r],[1,1]).
stem([s,e,b,e,r,k],[s,e,b,e,r],[s,b,r],[1,2]).
stem([s,e,b,e,r,x],[s,e,b,e,r],[s,b,r],[1,3]).
Figure 3: Sample examples for stem and root learning

The predicate 'stem' provides a word and its stem to permit the extraction of the affixes and the root template structure of the word. The first three parameters specify the input word, the stem of the word after affixes are removed, and the root of the stem, respectively. The fourth parameter is the codification of the grammatical features (tense-aspect-mood and subject) of the word. Taking the second example in Figure 3, the word seberk has the stem seber with the root sbr, and is perfective (the first element of the fourth parameter, which is 1) with a second person singular masculine subject (the second element of the fourth parameter, 2).

We codified the grammatical features of the words and made them parameters of the training data set, rather than representing the morphosyntactic description as predicates as in approaches used for other languages (Zdravkova et al., 2005).

The background knowledge also includes predicates for string manipulation and root extraction. Both are language-independent, making the approach adaptable to other similar languages. We run three separate training experiments to learn the stem extraction, root pattern, and internal stem alternation rules.

a) Learning stem extraction:
The background predicate 'set_affix' uses a combination of multiple 'split' operations to identify the prefix and suffixes attached to the input word. This predicate is used to learn the affixes from examples presented as in Figure 3, taking only the Word and the Stem (the first two arguments of the example).

set_affix(Word, Stem, P1, P2, S1, S2):-
    split(Word, P1, W11),
    split(Stem, P2, W22),
    split(W11, X, S1),
    split(W22, X, S2),
    not( (P1=[], P2=[], S1=[], S2=[]) ).
Figure 4: Affix extraction predicate

The predicate makes all possible splits of Word and Stem into three segments to identify the prefix and suffix substitutions required to unify Stem with Word. In this predicate, P1 and S1 are the prefix and suffix of the Word, while P2 and S2 are the prefix and suffix of the Stem, respectively. For example, if Word and Stem are tgedyalex and gedl respectively, then the predicate will try all possible splits, and one of these splits will result in P1=[t], P2=[], S1=[yalex] and S2=[l]. That is, tgedyalex will be associated with the stem gedl if the prefix P1 is replaced with P2 and the suffix S1 is replaced with S2.

The ultimate objective of this predicate is to identify the prefix and suffix of a word and then extract the valid stem (Stem) from the input string (Word). Here, we have used the utility predicate 'split', which segments any input string into all possible pairs of substrings. For example, the string sebr could be segmented as {([]-[sebr]), ([s]-[ebr]), ([se]-[br]), ([seb]-[r]), or ([sebr]-[])}.

b) Learning Roots:
The root extraction predicate, 'root_vocal', extracts the Root and the Vowel in the right sequence from the Stem. This predicate learns the root from examples presented as in Figure 3, taking only the Stem and the Root (the second and third arguments).

root_vocal(Stem, Root, Vowel):-
    merge(Stem, Root, Vowel).
merge([X,Y,Z|T], [X,Y|R], [Z|V]):-
    merge(T, R, V).
merge([X,Y|T], R, [X,Y|V]):-
    merge(T, R, V).
merge([X|Y], [X|Z], W):-
    merge(Y, Z, W).
merge([X|Y], Z, [X|W]):-
    merge(Y, Z, W).
Figure 5: Root template extraction predicate

The predicate 'root_vocal' performs unconstrained permutation of the characters in the Stem until the first part of the permuted string matches the Root character pattern provided during the training.
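The behaviour of 'merge' can be approximated outside Prolog. The sketch below is a hypothetical Python rendering (ours, not part of the system): it enumerates the order-preserving two-way splits of a stem and returns the residue left when the root is removed, which is the vowel pattern root_vocal learns from.

```python
from itertools import combinations

def subsequence_splits(stem):
    """Yield every division of stem into two subsequences that both
    preserve the original character order (cf. the 'merge' clauses)."""
    n = len(stem)
    for k in range(n + 1):
        for picked in combinations(range(n), k):
            first = [stem[i] for i in picked]
            rest = [stem[i] for i in range(n) if i not in picked]
            yield first, rest

def root_vocal(stem, root):
    """Return the vowel residue pairing stem with root, or None."""
    for first, rest in subsequence_splits(stem):
        if first == list(root):
            return rest
    return None

print(root_vocal('seber', 'sbr'))  # ['e', 'e']
```

For the stem seber and the root sbr, the surviving residue ['e', 'e'] is exactly the vocalic pattern [ee] that the learner records as valid within a stem.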


The goal of this predicate is to separate the vowels and the consonants of a Stem. In this predicate we have used the utility predicate 'merge' to perform the permutation. For example, if Stem is seber and the example associates this stem with the Root sbr, then 'root_vocal', using 'merge', will generate many patterns, one of which would be sbree. This, ultimately, will learn that the vowel pattern [ee] is valid within a stem.

c) Learning stem internal alternations:
Another challenge for Amharic verb morphology learning is handling stem internal alternations. For this purpose, we have used the background predicate 'set_internal_alter':

set_internal_alter(Stem, Valid_Stem, St1, St2):-
    split(Stem, P1, X1),
    split(Valid_Stem, P1, X2),
    split(X1, St1, Y1),
    split(X2, St2, Y1).
Figure 6: Stem internal alternation extractor

This predicate works much like the 'set_affix' predicate, except that it replaces a substring found in the middle of Stem with another substring from Valid_Stem. In order to learn stem alternations, we require a different set of training data showing examples of stem internal alternations. Figure 7 shows some sample examples used for learning such rules.

alter([h,e,d],[h,y,e,d]).
alter([m,o,t],[m,e,w,o,t]).
alter([s,a,m],[s,e,?,a,m]).
Figure 7: Examples for internal stem alternation learning

The first example in Figure 7 shows that, for the words hed and hyed to unify, the e in the first argument should be replaced with ye.

Along with the three experiments for learning the various aspects of verb morphology, we have also used two utility predicates to support the integration of the learned rules and to include some language-specific features. These predicates are 'template' and 'feature':

- 'template': used to extract the valid template for Stem. The predicate manipulates the stem to identify the positions of the vowels. This predicate uses the list of vowels (vocal) in the language to assign '0' for the vowels and '1' for the consonants.

template([],[]).
template([X|T1],[Y|B]):-
    template(T1,B),
    (vocal(X) -> Y=0 ; Y=1).
Figure 8: CV pattern decoding predicate

For the stem seber this predicate tries each character separately and finally generates the pattern [1,0,1,0,1]; for the stem sebr, it generates [1,0,1,1], showing the valid templates of Amharic verbs.

- 'feature': used to associate the identified affixes and root CV pattern with the known grammatical features from the example. This predicate uses a codified representation of the eight subjects and four tense-aspect-mood features ('tam') of Amharic verbs, which is also encoded as background knowledge. This predicate is the only language-dependent background knowledge we have used in our implementation.

feature([X,Y],[X1,Y1]):-
    tam([X],X1),
    subj([Y],Y1).
Figure 9: Grammatical feature assignment predicate

6. Experiments and Results
For CLOG to learn a set of rules, the predicate and arity for the rules must be provided. Since we are learning words by associating them with their stem, root and grammatical features, we use the predicate schemas rule(stem(_,_,_,_)) for set_affix and root_vocal, and rule(alter(_,_)) for set_internal_alter. The training examples are also structured according to these predicate schemas.

The training set contains 216 manually prepared Amharic verbs, containing all possible combinations of tense and subject features. Each word is first romanized, then segmented into the stem and grammatical features, as required by the 'stem' predicate in the background knowledge. When the word results from the application of one or more alternation rules, the stem appears in its canonical form. For example, for the word gdey, the stem specified is gdel (see the second example in Table 1).

Characters in the Amharic orthography represent syllables, hiding the detailed interaction between the consonants and the vowels. For example, the masculine imperative verb 'ግደል' gdel can be made feminine by adding the suffix 'i' (gdel-i). But in Amharic, when the dental 'l' is followed by the vowel 'i', it is palatalized, becoming 'y'. Thus, the feminine form would be written 'ግደይ', where the character 'ይ' 'y' corresponds to the sequence 'l-i'.

To perform the romanization, we have used our own Prolog script, which maps Amharic characters directly to sequences of roman consonants and vowels using the familiar SERA transliteration scheme. Since the mapping is reversible, it is straightforward to convert extracted forms back to Amharic script.

After training the program on the example set, which took around 58 seconds, 108 rules for affix extraction, 18 rules for root template extraction and 3 rules for internal stem alternation were learned.
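Before looking at individual learned rules, it may help to see how the three rule types cooperate at analysis time: stem analysis, then internal stem alternation, then root extraction. The Python sketch below is only a hedged illustration of that ordering; the functions and the rule data are hypothetical stand-ins for the learned Prolog clauses, not the authors' code.

```python
VOWELS = set('aeiou')

def strip_affixes(word, prefix, suffix):
    """Stem analysis: remove a learned prefix/suffix pair, if present."""
    if word.startswith(prefix) and word.endswith(suffix):
        return word[len(prefix):len(word) - len(suffix)]
    return None

def apply_alternation(stem, old, new):
    """Internal alternation: undo a learned stem-internal rewrite."""
    return stem.replace(old, new)

def extract_root(stem):
    """Root extraction: keep the consonantal skeleton of the stem."""
    return ''.join(c for c in stem if c not in VOWELS)

# Analysing tgedyalex with one hypothetical learned rule set:
stem = strip_affixes('tgedyalex', 't', 'alex')  # 'gedy'
stem = apply_alternation(stem, 'y', 'l')        # 'gedl' (y -> l reversed)
root = extract_root(stem)                       # 'gdl'
```

This mirrors the paper's tgedyalex example: the affixes t- and -alex are stripped, the y/l alternation is undone to recover the canonical stem gedl, and the root gdl falls out of the consonants.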


A sample rule generated for affix identification, associating the word constituents with the grammatical features, is shown below:

stem(Word, Stem, [2, 7]):-
    set_affix(Word, Stem, [y], [], [u], []),
    feature([2, 7], [imperfective, tppn]),
    template(Stem, [1, 0, 1, 1]).
Figure 10: Learned affix identification rule example

This rule declares that, if the word starts with y and ends with u, and if the stem extracted from the word after stripping off the affixes has a CVCC ([1,0,1,1]) pattern, then the word is imperfective with a third person plural neutral subject (tppn).

alter(Stem, Valid_Stem):-
    set_internal_alter(Stem, Valid_Stem, [o], [e, w, o]).
Figure 11: Learned internal alternation rule example

This rule substitutes the vowel o, in specific circumstances (which are included in the program), with ewo, to transform the initial stem into a valid stem in the language. For example, if the Stem is zor, then o will be replaced with ewo to give zewor.

The other part of the program handles formation of the root of the verb by extracting the template and the vowel sequence from the stem. A sample rule generated to handle this task looks like the following:

root(Stem, Root):-
    root_vocal(Stem, Root, [e, e]),
    template(Stem, [1, 0, 1, 0, 1]).
Figure 12: Learned root-template extraction rule example

This rule declares that, as long as the consonant-vowel sequence of a word is CVCVC and both vowels are e, the stem is a possible valid verb. Our current implementation does not use a dictionary to validate whether the verb is an existing word in Amharic.

Finally, we have combined the background predicates used for the three learning tasks and the utility predicates, and we have integrated all the rules learned in each experiment with the background predicates. The integration combines the predicates in the appropriate order: stem analysis, followed by internal stem alternation and root extraction.

After building the program, to test the performance of the system, we started with verbs in their third person singular masculine form, selected from the list of verbs transcribed from the appendix of Armbruster (1908)³. We then inflected the verbs for the eight subjects and four tense-aspect-mood features of Amharic, resulting in 1,784 distinct verb forms. The following are sample analyses, produced by the program, of new verbs that are not part of the training set:

InputWord: [a, t, e, m, k, u]
Stem: [?, a, t, e, m]
Template: [1, 0, 1, 0, 1]
Root: [?, t, m]
GrammaticalFeature: [perfective, fpsn*]
Figure 13: Sample Test Result (with boundary alternation)
*fpsn: first person singular neuter

This example shows that the suffix that needs to be stripped off is [k,u], and that there is an alternation rule that changes 'a' to '?,a' at the beginning of the word.

InputWord: [t, k, e, f, y, a, l, e, x]
Stem: [k, e, f, l]
Template: [1, 0, 1, 1]
Root: [k, f, l]
GrammaticalFeature: [imperfective, spsf*]
Figure 14: Sample Test Result (Internal alternation)
*spsf: second person singular feminine

This example shows that the prefix and suffix that need to be stripped off are [t] and [a,l,e,x] respectively, and that there is an alternation rule that changes 'y' to 'l' at the end of the stem after removing the suffix.

The system correctly analyzes 1,552 words, resulting in 86.99% accuracy. Given the small training set, the result is encouraging, and we believe that performance will improve with more training examples of various grammatical combinations. The wrong analyses, and the test cases not handled by the program, are attributed to the absence of such examples in the training set and to an inappropriate alternation rule resulting in multiple analyses of a single test word.

Test Word          Stem          Root     Feature
[s,e,m,a,c,h,u]    [s,e,m,a,?]   [s,m,?]  perfective, sppn
[s,e,m,a,c,h,u]    [s,e,y,e,m]   [s,y,m]  gerundive, sppn
[l,e,g,u,m,u]      [l,e,g,u,m]   NA       NA
Table 2: Example of wrong analysis

Table 2 shows some of the wrong analyses and words that are not analyzed at all. The second example shows that an alternation rule has been applied to the stem, resulting in a wrong analysis (the stem should have been the one in the first example). The last example generated a stem with the vowel sequence 'eu', which is not found in any of the training set, placing the word in the not-analyzed category.

7. Future Work
ILP has proven to be applicable to word formation rule extraction for languages with simple rules, like English. Our experiment shows that the approach can also be used for complex languages, given more sophisticated background predicates and more examples. While Amharic has more prefixes and suffixes for various morphological features, our system is limited to subject markers only. Moreover, all possible combinations of subject and tense-aspect-mood have been provided in the training examples. This approach is not practical if all the prefixes and suffixes are to be included in the learning process.

One of the limitations observed in ILP for morphology learning is the inability to learn rules from incomplete examples.

³ Available online at: http://nlp.amharic.org/resources/lexical/word-lists/verbs/c-h-armbruster-initia-amharica/ (accessed February 12, 2012).


In languages such as Amharic, there is a range of complex interactions among the different morphemes, but we cannot expect every one of the thousands of morpheme combinations to appear in the training set. When examples are limited to only some of the legal morpheme combinations, CLOG is inadequate, because it is not able to use variables as part of the body of the predicates to be learned.

An example of a rule that could be learned from partial examples is the following: "if a word has the prefix 'te', then the word is passive, no matter what the other morphemes are". This rule (not learned by our system) is shown in Figure 15.

stem(Word, Stem, Root, GrmFeatu):-
    set_affix(Word, Stem, [t,e], [], S, []),
    root_vocal(Stem, Root, [e, e]),
    template(Stem, [1, 0, 1, 0, 1]),
    feature(GrmFeatu, [Ten, passive, Sub]).
Figure 15: Possible stem analysis rule with partial feature

Here, S is one of the valid suffixes, Ten is the tense, and Sub is the subject, each of which can take any of the possible values.

Moreover, as shown in Section 2, some grammatical information in Amharic verbs is expressed by particular combinations of affixes. The various constraints on the co-occurrence of affixes are another problem that needs to be tackled. For example, the 2nd person masculine singular imperfective suffix aleh can only co-occur with the 2nd person prefix t, in words like t-sebr-aleh. At the same time, the same prefix can occur with the suffix alachu in the 2nd person plural imperfective form. To represent these constraints, we apparently need explicit predicates that are specific to the particular affix relationship. However, CLOG is limited to learning only the predicates that it has been provided with.

We are currently experimenting with genetic programming as a way to learn new predicates based on the predicates that are learned using CLOG.

8. Conclusion
We have shown in this paper that ILP can be used to fast-track the process of learning the morphological rules of complex languages like Amharic with a relatively small number of examples. Our implementation goes beyond simple affix identification and confronts one of the challenges of template morphology by learning the root-template extraction as well as the stem-internal alternation rule identification exhibited in Amharic and other Semitic languages. Our implementation also succeeds in learning to relate grammatical features with word constituents.

9. References
Armbruster, C. H. (1908). Initia Amharica: an Introduction to Spoken Amharic. Cambridge: Cambridge University Press.
Beesley, K. R. and L. Karttunen. (2003). Finite State Morphology. Stanford, CA, USA: CSLI Publications.
Bender, M. L. (1968). Amharic Verb Morphology: A Generative Approach. Ph.D. thesis, Graduate School of Texas.
Bratko, I. and King, R. (1994). Applications of Inductive Logic Programming. SIGART Bulletin, 5(1): 43-49.
Dawkins, C. H. (1960). The Fundamentals of Amharic. Sudan Interior Mission, Addis Ababa, Ethiopia.
De Pauw, G. and P. W. Wagacha. (2007). Bootstrapping Morphological Analysis of Gĩkũyũ Using Unsupervised Maximum Entropy Learning. Proceedings of the Eighth INTERSPEECH Conference, Antwerp, Belgium.
Gasser, M. (2011). HornMorpho: a system for morphological processing of Amharic, Oromo, and Tigrinya. Conference on Human Language Technology for Development, Alexandria, Egypt.
Goldsmith, J. (2001). The unsupervised learning of natural language morphology. Computational Linguistics, 27: 153-198.
Hammarström, H. and L. Borin. (2011). Unsupervised learning of morphology. Computational Linguistics, 37(2): 309-350.
Kazakov, D. (2000). Achievements and Prospects of Learning Word Morphology with ILP. Learning Language in Logic, Lecture Notes in Computer Science.
Kazakov, D. and S. Manandhar. (2001). Unsupervised learning of word segmentation rules with genetic algorithms and inductive logic programming. Machine Learning, 43: 121-162.
Koskenniemi, K. (1983). Two-level Morphology: a General Computational Model for Word-Form Recognition and Production. Department of General Linguistics, University of Helsinki, Technical Report No. 11.
Manandhar, S., Džeroski, S. and Erjavec, T. (1998). Learning multilingual morphology with CLOG. In Page, D. (Ed.), Proceedings of Inductive Logic Programming: 8th International Workshop, Lecture Notes in Artificial Intelligence, pp. 135-144. Berlin: Springer-Verlag.
Mooney, R. J. (2003). Machine Learning. Oxford Handbook of Computational Linguistics, Oxford University Press, pp. 376-394.
Mooney, R. J. and Califf, M. E. (1995). Induction of first-order decision lists: results on learning the past tense of English verbs. Journal of Artificial Intelligence Research, 3(1): 1-24.
Oflazer, K., M. McShane, and S. Nirenburg. (2001). Bootstrapping morphological analyzers by combining human elicitation and machine learning. Computational Linguistics, 27(1): 59-85.
Sieber, G. (2005). Automatic Learning Approaches to Morphology. University of Tübingen, International Studies in Computational Linguistics.
Yimam, B. (1995). Yamarigna Sewasiw (Amharic Grammar). Addis Ababa: EMPDA.
Zdravkova, K., A. Ivanovska, S. Dzeroski and T. Erjavec. (2005). Learning Rules for Morphological Analysis and Synthesis of Macedonian Nouns. In Proceedings of SIKDD 2005, Ljubljana.
