
Conference Paper · September 2014
DOI: 10.1007/978-3-319-10888-9_1


Development of Amharic Morphological
Analyzer Using Memory-Based Learning

Mesfin Abate and Yaregal Assabie

Department of Computer Science, Addis Ababa University, Ethiopia


[email protected], [email protected]

Abstract. Morphological analysis of highly inflected languages like Amharic is a non-trivial task because of the complexity of the morphology. In this paper, we propose a supervised data-driven experimental approach to develop an Amharic morphological analyzer. We use a memory-based supervised machine learning method which extrapolates new unseen classes based on previous examples in memory. We treat morphological analysis as a classification task which retrieves the grammatical functions and properties of morphologically inflected words. As the task is geared towards analyzing vowelled inflected Amharic words with the grammatical functions of their morphemes, the morphological structure of words and the way they are represented in memory-based learning are exhaustively investigated. The performance of the model is evaluated using 10-fold cross-validation with the IB1 and IGTree algorithms, resulting in overall accuracies of 93.6% and 82.3%, respectively.

Keywords: Amharic morphology, memory-based learning, morphological analysis.

1 Introduction
Morphological analysis helps to find the minimal units of a word which hold linguistic information for further processing. It plays a critical role in the development of natural language processing (NLP) applications. In most practical language technology applications, morphological analysis is used to perform lemmatization, in which words are segmented into their minimal meaningful units [11]. In morphologically complex languages, morphological analysis is also a core component in information retrieval, text summarization, question answering, machine translation, etc. There are two broad categories of approaches in computational morphology: rule-based and corpus-based. Currently, the most widely applied rule-based approach to computational morphology uses the two-level formalism. In the rule-based approach, the formulation of rules makes the development of a morphological analysis system costly and time consuming [4,11]. Because of the need for hand-crafted rules describing the morphology of languages and the intensive involvement of linguistic experts in rule-based approaches, there is considerable interest in robust machine learning approaches to morphology which extract linguistic knowledge automatically from an annotated or

A. Przepiórkowski and M. Ogrodniczuk (Eds.): PolTAL 2014, LNAI 8686, pp. 1–13, 2014.

© Springer International Publishing Switzerland 2014

unannotated corpus. Machine learning approaches follow two learning paradigms: unsupervised and supervised learning. A supervised approach learns from labeled examples, whereas an unsupervised approach learns from patterns in unlabeled data. Machine learning approaches that use the supervised learning paradigm include inductive logic programming (ILP), support vector machines (SVM), hidden Markov models (HMM) and memory-based learning (MBL). These paradigms have been used to implement low-level linguistic analysis such as morphological analysis [2,3,7]. Among the various alternatives, the choice of approach depends on the problem at hand. In this work, we employed MBL to develop a morphological analyzer for Amharic, partly motivated by the limitations of previous attempts using rule-based [7] and ILP [10] approaches. Memory-based learning is promising for NLP tasks like part-of-speech tagging, text translation, chunking and morpho-phonology due to its capability of incremental learning from examples. Among the MBL algorithms, IB1 and IGTree are popular. Both algorithms rely on the k-nearest neighbor classifier, which uses a distance metric to measure the distance between feature vectors [4,5,9].
The remaining part of the paper is organized as follows. Section 2 presents the characteristics of the Amharic language with special emphasis on its morphology. In Sect. 3, we present the proposed system for morphological analysis. Section 4 presents experimental results, and conclusions and future work are highlighted in Sect. 5. References are provided at the end.

2 Characteristics of Amharic Language

2.1 The Amharic Language

Amharic is the official working language of Ethiopia and is widely spoken throughout the country as both a first and a second language. It is a Semitic language related to Hebrew, Arabic and Aramaic, and the second most widely spoken Semitic language next to Arabic. It uses a unique script called ‘fidel’ which is conveniently written in a tabular format of seven columns. The first column represents the basic form and the other orders are derived from it by more or less regular modifications indicating the different vowels. Amharic has 34 base characters, giving a total of 238 (34×7) characters. In addition, there are about forty characters representing labialized sounds.

2.2 Amharic Morphology

Like other Semitic languages, Amharic is morphologically highly complex. It exhibits a root-pattern morphological phenomenon [1]. A root is a set of consonants (also called radicals) which carries a basic lexical meaning [12]. A pattern consists of a set of vowels which are inserted among the consonants of a root to form a stem. Semitic verbal stems, and Amharic verbal stems in particular, are formed by a ‘root + vowels + template’ merger. For instance, the root sbr combined with the vowel pattern ee and the template CVCVC forms the stem seber (‘broke’). In addition to such

non-concatenative morphological features, Amharic uses different affixes to create inflectional and derivational morphemes. Affixation can occur as prefixes, infixes, suffixes and circumfixes. The morphological complexity of the language is better understood by looking at the word formation process through inflection and derivation.
Amharic nouns are inflected for number, definiteness, case (accusative/objective, possessive/genitive) and gender. Amharic adjectives, through an affixation process similar to that of nouns, can be marked for number, definiteness, case and gender. The affixation of morphemes to express number is similar to that of nouns except for some plural formations. On the other hand, Amharic verbs are inflected for any combination of person, gender, number, case, tense/aspect and mood. As a result, tens of thousands of verbs (in surface forms) are generated from a single verbal root. As verbs are marked for various grammatical units, a single verb can form a complete sentence, as in the example yisebreñal (‘he will break me’). This verb (sentence) is analyzed as follows.
verbal root: sbr (‘to break’)
verbal stem: sebr (‘will break’)
subject: yi…al (he)
object: eñ (me)
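The root-and-pattern stem formation described above (sbr + ee + CVCVC → seber) can be sketched in a few lines; the function name and the CV-template encoding are illustrative assumptions, not part of the paper:

```python
def build_stem(root, vowels, template):
    """Interdigitate root consonants (C slots) and pattern vowels (V slots)
    according to a CV template, e.g. sbr + ee + CVCVC -> seber."""
    cons = iter(root)
    vows = iter(vowels)
    return "".join(next(cons) if slot == "C" else next(vows)
                   for slot in template)

print(build_stem("sbr", "ee", "CVCVC"))  # -> seber
```

The same interleaving scheme would apply to any of the more than 40 templates a root can combine with.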
Amharic nouns can be derived from adjectives, verbal roots (by inserting vowels between consonants), stems, stem-like verbs and nouns themselves. Few primary (underived) adjectives exist in the language; however, many adjectives can be derived from nouns, stems, compound words and verbal roots. Adjectives can also be derived from roots either by intercalation of vocalic elements or by attaching a suffix to bound stems. Amharic verbs can also be derived from different verbal stems in many ways.

3 The Proposed Amharic Morphological Analyzer


3.1 System Architecture
As memory-based learning is a machine learning approach, our morphological analyzer contains a training phase consisting of morpheme annotation to manually annotate inflected Amharic words, feature extraction to create instances within a fixed-length window, and parameter optimization and algorithm selection to tune and select parameters and algorithms. The morphological analysis component, on the other hand, contains feature extraction to deconstruct a given text, morpheme identification to classify and extrapolate, and stem and root extraction to label segmented inflected words with their morpheme functions. The architecture of the proposed Amharic morphological analyzer is depicted in Fig. 1.

3.2 Training Phase


The training process requires sample patterns of words showing the changes in
the internal structures of words. Amharic morphemes may predominantly be

[Figure: training phase pipeline (inflected words → morpheme annotation → feature extraction → memory-based learning → learning model) and morphological analysis pipeline (text document → feature extraction → morpheme identification with classification and extrapolation → stem extraction with reconstruction and morpheme insertion → root extraction → morphemes with functions)]
Fig. 1. Architecture of the proposed Amharic morphological analyzer

expressed by internal phonological changes in the root. These internal irregular changes of phonemes make morphological analysis cumbersome, and finding the roots of Amharic verbs is not a trivial task. Hence, we investigated the morphological formation of Amharic, particularly nouns and verbs. Adjectives have derivation and inflection processes similar to those of nouns. A morphological database is built after identifying the common properties of all morphological formations of Amharic nouns (and adjectives) and the grammatical features of all the morphemes. As for Amharic verbs, it is difficult to find a single representation or pattern, as verb types differ due to a number of morphological and phonological processes. Therefore, we consider the most significant part of the word, the stem, which bears meaning next to the root. In Amharic grammar, the stem of a word is the main part which remains unchanged when the ending changes. Thus, we manually annotate sample words with their patterns, and these data are used as training data.

Morpheme Annotation
Amharic nouns have more than two affixes in the prefix position and more than seven in the suffix position. Affixation is not arbitrary; rather, affixes attach in an ordered manner. An Amharic noun consists of a lexical part, or stem, and one or more grammatical parts. This is easy to see with a noun such as bEtocacewn (‘their houses’). The lexical part is the stem bEt (‘house’); this conveys most of the important content of the noun. Since the stem cannot be broken into smaller meaningful units [8], it is a morpheme (a primitive unit of meaning). The word contains three grammatical suffixes, each of which provides information that is more abstract and less crucial to the understanding of the word than the information provided by the stem: -oc, -acew, and -n. Each of these suffixes can be seen as providing a value for a particular grammatical feature (or dimension along which Amharic nouns can vary): -oc (plural marker), -acew (third person plural), and -n (accusative). Since none of these suffixes can be broken down further, each can be considered a morpheme. Generally, these grammatical morphemes play a great role in understanding the semantics of the whole word [7,12].
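The suffix analysis of bEtocacewn can be illustrated with a minimal greedy suffix-stripping sketch; the suffix inventory is limited to the three suffixes discussed above, and the function name is an assumption for illustration:

```python
# Ordered grammatical suffixes from the bEtocacewn example:
# -oc (plural), -acew (third person plural), -n (accusative).
SUFFIXES = [("oc", "plural"), ("acew", "3rd person plural"), ("n", "accusative")]

def segment_noun(word):
    """Peel known suffixes off the end of a noun, outermost first,
    returning (stem, [(morpheme, feature), ...]) in surface order."""
    found = []
    for morph, feature in reversed(SUFFIXES):  # -n is outermost, strip it first
        if word.endswith(morph):
            word = word[: -len(morph)]
            found.append((morph, feature))
    return word, list(reversed(found))

print(segment_noun("bEtocacewn"))
```

For bEtocacewn this yields the stem bEt plus the three suffix morphemes with their grammatical features, mirroring the analysis given in the text.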
The following tasks were identified and performed to prepare the annotated datasets used for training: identifying inflected words; segmenting each word into prefix, stem and suffix; putting boundary markers between segments; and describing the representation of each marker. Morphemes attached after the stem (as suffixes) may serve seven purposes: plurality/possession, derivation, relativization, definiteness, negation, causative and conjunction. The annotation follows the prefix-stem-suffix ([P]-[S]-[S]) structure as shown in Table 1. The brackets ([ ]) are filled with the appropriate grammatical features for each segment, where S, M, 1, K, D, and O indicate end of stem, plural, possession, preposition, derivative and object markers, respectively. Lexicons were prepared manually so as to be suitable for extraction.
Amharic verbs have four slots for prefixes and four slots for suffixes [1,7,10]. The positions of the affixes are shown below, where prep stands for preposition; conj for conjunction; rel for relativization; neg for negation; subj for subject; appl for applicative; obj for object; def for definiteness; and acc for accusative.

Table 1. Example showing annotation of nouns



(prep|conj)(rel)(neg) subj STEM subj (appl)(obj|def)(neg|aux|acc)(conj)

In addition to all these affixes, the root-template pattern of Amharic verbs makes morphological analysis complex, and representing its features suitably for memory-based learning is a challenging task. Generally, Amharic verb stems are broken into verb roots and grammatical templates. A given root can be combined with more than 40 templates [1]. The stem is the lexical part of the verb and also the source of most of its complexity. To cover all morphologically productive verb types, we need a morphologically annotated word list with all possible inflected forms. The tokens are then manually annotated in a similar fashion to nouns and adjectives, following the prefix[], stem[] and suffix[] pattern, where ‘[]’ is filled with the appropriate grammatical features for each segment. The sample annotation for verbs is shown in Table 2.

Table 2. Example showing annotation of verbs

Feature Extraction
Once the annotated words are stored in a database, instances are extracted automatically from the morphological database using the windowing method [3] with a fixed length of left and right context. Each instance is associated with a class representing the morphological category of the given position. An instance consists of a fixed-length vector of n feature values. Each instance focuses on one letter and includes a fixed number of left and right neighboring letters; we use an 8-1-8 window (eight left neighbors, the focus character, eight right neighbors), yielding seventeen character features plus a class label, i.e. eighteen fields per instance. The largest word length in the manually annotated database determines the window size. The character in focus, plus the eight preceding and eight following characters, are placed in the window. Character-based analysis considers each character or letter individually. From the basic annotation, instances were automatically extracted into a form suitable for memory-based learning by sliding a window over each word in the lexicon. We used Algorithm 1 to extract features based on character analysis.

Input: inflected words
Output: extracted features (instances) in fixed-length vectors

1. Define the length of the window.
2. Fix the middle position of the array as the focus letter (the focus character marks where a character starts within the word).
3. Read from the database and push each character one step forward until the right context is filled.
4. Put 0 (zero) in the class position if there is no special character (such as @, & or a capital letter) next to the character in the focus position; if any such symbol exists, put its value as the class (in the last index).
5. Push the previous focus letter to the left and start placing each letter (as in step 3).
6. Continue until the line is finished.
7. Go to the next line and repeat steps 3-6.

Algorithm 1. Algorithm for character-based feature extraction.

For instance, the character-based representation of the word sleseberecw is shown in Table 3. The ‘=’ sign is used as a filler symbol showing that there is no character at that position. The construction displays the 11 instances derived from the Amharic word and their associated classes. The class of the third instance is ‘K’, representing the preposition morpheme sle ending with the prefix letter e. Character-based representation of words thus exhaustively transcribes the deep structure of the phonological process, segmenting one character at a time.

Table 3. Character-based feature extraction of the word sleseberecw
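The windowing step of Algorithm 1 can be approximated with a short sketch; class labeling is omitted here, the ‘=’ padding matches the filler symbol of Table 3, and the function name and defaults are illustrative assumptions:

```python
def extract_instances(word, left=8, right=8, pad="="):
    """Slide a window over the word: each instance holds the focus
    character plus `left` preceding and `right` following characters,
    padded with '=' beyond the word edges (17 window characters; during
    training an eighteenth field would hold the class label)."""
    padded = pad * left + word + pad * right
    instances = []
    for i in range(len(word)):
        window = padded[i : i + left + 1 + right]
        instances.append(list(window))  # focus character sits at index `left`
    return instances

for inst in extract_instances("begoc"):
    print(" ".join(inst))
```

Running this on begoc yields five 17-character instances, one per letter, analogous to the rows of Table 3.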



Memory-Based Learning
Memory-based approaches borrow some of the advantages of both probabilistic and knowledge-based methods, and have been successfully applied to NLP tasks [5]. They perform classification by analogy. We used TiMBL as the learning tool for our task [3]. There are a number of parameters to be tuned in memory-based learning with TiMBL; to get optimal accuracy we started from the default settings and tuned some of the parameters. The optimized parameters are MVDM (modified value difference metric) and chi-square among the distance metrics, IG (information gain) among the weighting metrics, ID (inverse distance) among the class voting weights, and k, the number of nearest neighbors. These optimized parameters are used together with the different classifier engines, IGTree and IB1, which construct databases of instances in memory during the learning process. The procedure for building an IGTree is described in [6]. Instances are classified by IGTree or by IB1 by matching them to the instances in the instance base. As a result of this process, we obtain a memory-based learning model which is used later during the morphological analysis phase.
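At its core, IB1 is a k-nearest-neighbor classifier over symbolic feature vectors. A minimal sketch of the idea, using the overlap distance with per-feature weights, is shown below; the toy instance base, uniform weights and function names are illustrative assumptions (TiMBL derives its weights, e.g. information gain, from the training data):

```python
from collections import Counter

def weighted_overlap(a, b, weights):
    """IB1-style distance: sum the weights of the features on which
    the two instances disagree (weighted overlap metric)."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)

def classify(instance, train, weights, k=1):
    """Return the majority class among the k nearest stored instances."""
    ranked = sorted(train, key=lambda item: weighted_overlap(instance, item[0], weights))
    votes = Counter(cls for _, cls in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy instance base of (feature vector, class) pairs.
train = [(("s", "l", "e"), "K"),
         (("s", "e", "b"), "0"),
         (("b", "e", "g"), "0")]
weights = (1.0, 1.0, 1.0)  # uniform here; TiMBL would compute information gain
print(classify(("s", "l", "a"), train, weights))  # nearest neighbor gives "K"
```

IGTree compresses the same instance base into a decision tree ordered by feature weight, trading some accuracy for much faster classification, which matches the timing differences reported in Sect. 4.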

3.3 Morphological Analysis


The training phase is the backbone of the morphological analysis module. The morphological analysis is implemented using the memory-based learning model. In this phase, feature extraction makes the input words suitable for memory-based classification, morpheme identification classifies and extrapolates the class of new instances, stem extraction reconstructs and inserts the identified morphemes, and finally root extraction produces root forms and stems with their grammatical functions.

Feature Extraction
Memory-based learning learns new instances by storing previous training data in memory. When a new word is given to the system to be analyzed, it is deconstructed into instances with the same representation as those stored in memory. Feature extraction here differs from that of the training phase: the word is deconstructed into fixed-length instances without class labels at the last index. When a new, previously unseen word (one not found in memory) needs to be segmented, it is deconstructed and represented as instances using the same scheme. Each instance is compared to every instance in the training set recorded by the memory-based learner, and the classifier tries to find the training instance in memory that most closely resembles it. For instance, the word begoc is segmented and its features are extracted as shown in Fig. 2.

Fig. 2. Feature extraction for morphological analysis

Morpheme Identification
When new or unknown inflected words are deconstructed into instances and given to the system, extrapolation is performed to assign the most likely neighborhood class with its morphemes based on their boundaries. The extrapolation is based on the similarity metric applied to the training data. If there is an exact match in memory, the classifier returns (extrapolates) the class of that instance to the new instance. Otherwise, the new instance is classified by analogy to instances in memory with similar feature vectors, extrapolating a decision from their classes. Taking the features of lenegerecw as shown in Fig. 3, the closest match might be instance 10 in Table 3, as they share almost all features (L8, L7, L5, L3-L1, F, R1-R8) except L6 and L4. In this case, the memory-based learner extrapolates the class of this training instance and predicts it to be the class of the new instance.

Fig. 3. Instances for the unknown token lenegerecw

Stem Extraction
After the appropriate morphemes are identified, the next step is stem extraction. In stem extraction, individual instances are reconstructed into meaningful morphemes (their original word form) and the identified morphemes are inserted at their segmentation points. After stem extraction, the system searches for resembling instances among the patterns previously stored in memory. If there is no similar instance in memory, it uses a distance similarity metric to find the nearest neighbors. The modified value difference metric (MVDM), which looks at the co-occurrence of feature values with the target classes, is used to determine the similarity of feature values. For example, the reconstruction of the whole set of instances of the word slenegerecw is shown in Fig. 4. In the example, four non-null classes are predicted in the classification step. In the second

Fig. 4. Reconstruction of the word slenegerecw

step, the letters of the morphemic segments are concatenated and the morphemes are inserted. Then, root extraction is performed in the third step.

Root Extraction
The smallest morphemic unit for nouns and adjectives is the stem, so the root extraction process is not applied to nouns and adjectives. Root extraction from verbal stems is not a complex task in Amharic, as roots are the consonants of verbal stems: to extract the root, we simply remove the vowels from the verbal stem. However, there are exceptions, as vowels in some verbal stems (e.g. stems that start with vowels) serve as consonants. In addition, vowels should not be removed from mono- and bi-radical verb types, since they have a valid meaning when they end with vowels.
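The basic vowel-deletion step can be sketched as follows; the vowel inventory is assumed from the transliteration used in the examples, and the exceptional cases mentioned above (vowel-initial stems, mono- and bi-radical verbs) are deliberately left out:

```python
VOWELS = set("aeiouE")  # transliteration vowels seen in the paper's examples

def extract_root(stem):
    """Extract the consonantal root of a verbal stem by deleting vowels,
    e.g. seber -> sbr. Vowel-initial and mono-/bi-radical stems need
    special handling not covered by this sketch."""
    return "".join(c for c in stem if c not in VOWELS)

print(extract_root("seber"))  # -> sbr
```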

4 Experiment
4.1 The Corpus
To evaluate the performance of the model and the learnability of the dataset, we conducted the experiment by combining nouns and verbs. To get an unbiased estimate of the accuracy of a model learned through machine learning, it should be tested on unseen data not present in the training set. Therefore, we split our dataset into training and testing sets. Our corpus contains a total of 1022 words, of which 841 are verbs and 181 are nouns (adjectives are treated as nouns as they have a similar analysis). The number of instances extracted from nouns and adjectives is 1356 and from verbs 6719, for a total of 8075 instances. A total of 26 different class labels occur within these instances.

4.2 Test Results

As discussed in Sect. 3.2, we used TiMBL as the learning tool for Amharic morphological analysis. We applied the IGTree and IB1 algorithms to construct databases of instances in memory during the learning process, and tuned some of the parameters to get optimal accuracy. The optimized parameters are the modified value difference metric and chi-square among the distance metrics, information gain among the weighting metrics, inverse distance among the class voting weights, and k, the number of nearest neighbors. For various combinations of parameter values, we tuned the parameters until no better result was found.
Simply splitting the corpus into a single training and testing set may not give the best estimate of the system's performance. Thus, we used 10-fold cross-validation to test the performance of the system with the IB1 and IGTree algorithms. The data is split into ten equal partitions, each of which is used once as the test set with the other nine as the corresponding training set. This way, every example is used exactly once as a test item while training and test data are kept carefully separated, and the memory-based classifier is trained each time on 90% of the available data. We also used leave-one-out cross-validation for the IB1 algorithm, which uses all available data except one example (n-1) as training material and tests the classifier on the one held-out example, repeating this for all examples. However, we found leave-one-out cross-validation too time consuming for the IGTree algorithm. Table 4 shows the performance of the system for the optimized parameters.

Table 4. Test results for Amharic morphological analysis

Evaluation method   Algorithm   Time (seconds)   Space (bytes)   Accuracy (%)
Leave-one-out       IB1         30.99000         1,327,460       96.40
10-fold             IB1         0.82077          1,213,656       93.59
10-fold             IGTree      0.03711          1,136,582       82.26
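The 10-fold splitting scheme described above can be sketched as follows; this is only the partitioning logic (TiMBL performs its own cross-validation), and the function name and seed are illustrative assumptions:

```python
import random

def ten_fold(instances, folds=10, seed=0):
    """Yield (train, test) splits: after shuffling, each fold serves once
    as the test set while the remaining nine folds form the training set."""
    data = list(instances)
    random.Random(seed).shuffle(data)
    for i in range(folds):
        test = data[i::folds]  # every folds-th item, offset i
        train = [x for j, x in enumerate(data) if j % folds != i]
        yield train, test

# Every item appears in exactly one test set; each train set holds 90%.
for train, test in ten_fold(range(20)):
    print(len(train), len(test))
```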

In memory-based learning, the minimum size of the training set is not specified in advance; however, the size of the training data affects the learning performance of the algorithm. Hence, it is useful to draw learning curves in addition to reporting the experimental results. We performed a series of experiments by systematically increasing the amount of training data up to the currently available total of 1022 words. When drawing a learning curve, the learning is typically measured against a fixed test set. The learning curve of the system is shown in Fig. 5.
Compared to previous works, our system performs well and provides promising results. For example, the system of Gasser [7], which is rule-based, does not handle unseen or unknown words. To overcome this problem, Mulugeta and Gasser [10] developed an Amharic morphological analyzer using inductive logic programming. Our system still performs better in terms of accuracy.

Fig. 5. Learning curve of the system

5 Conclusion and Future Work

Many high-level NLP applications rely heavily on a good morphological analyzer. Few attempts have been made so far to develop an efficient morphological analyzer for Amharic, and due to the inherent complexity of the language, this has proved difficult. This research work is aimed at developing an Amharic morphological analyzer using a memory-based approach. Given the promising results, our work adds value to the overall effort of dealing with the complex problem of developing an Amharic morphological analyzer. The performance of our system can be further enhanced by increasing the training data. Future work should look into morpheme segmentation on individual instances; segmentation of full words and insertion of grammatical features into each segmented morpheme is expected to boost the performance of the system.

References
1. Amsalu, S., Gibbon, D.: Finite state morphology of Amharic. In: Proc. of Inter.
Conf. on Recent Advances in Natural Language Processing, Borovets, pp. 47–51
(2005)
2. Bosch, A., Busser, B., Canisius, S., Daelemans, W.: An efficient memory-based
morpho-syntactic tagger and parser for Dutch. In: Proc. of the 17th Meeting Comp.
Ling. in the Netherlands, Leuven, Belgium (2007)
3. Bosch, A., Daelemans, W.: Memory-based morphological analysis. In: Proc. of the
37th Annual Meeting of the Association for Computational Linguistics, Strouds-
burg (1999)
4. Clark, A.: Memory-Based Learning of Morphology with Stochastic Transducers.
In: Proc. of the 40th Annual Meeting of the Assoc. for Comp. Ling., Philadelphia
(2002)

5. Daelemans, W., Bosch, A.: Memory-Based Language Processing. Cambridge University Press, Cambridge (2009)
6. Daelemans, W., Bosch, A., Weijters, T.: IGTree: Using Trees for Compression
and Classification in Lazy Learning Algorithms. Artificial Intelligence Review 11,
407–423 (1997)
7. Gasser, M.: HornMorpho: a system for morphological processing of Amharic,
Oromo, and Tigrinya. In: Proc. of Conf. on Human Lang. Tech. for Dev., Egypt
(2011)
8. Hammarström, H., Borin, L.: Unsupervised Learning of Morphology. Computational Linguistics 37(2), 309–350 (2011)
9. Marsi, E., Bosch, A., Soudi, A.: Memory-based morphological analysis generation
and part-of-speech tagging of Arabic. In: Proc. of the ACL Workshop on Compu-
tational Approaches to Semitic Languages, pp. 1–8 (2005)
10. Mulugeta, W., Gasser, M.: Learning Morphological Rules for Amharic Verbs Using
Inductive Logic Programming. In: Proc. of SALTMIL8/AfLaT (2012)
11. Pauw, G., Schryver, G.: Improving the Computational Morphological Analysis of
a Swahili Corpus for Lexicographic Purposes. Lexikos 18, 303–318 (2008)
12. Yimam, B.: Ye’amarigna sewasew (Amharic Grammar). Eleni Printing Press, Ad-
dis Ababa (2000)
