Development of Amharic Morphological Analyzer Using Memory-Based Learning
1 Introduction
Morphological analysis helps to find the minimal units of a word that hold linguistic information for further processing. It plays a critical role in the development of natural language processing (NLP) applications. In most practical language technology applications, morphological analysis is used to perform lemmatization, in which words are segmented into their minimal meaningful units [11]. In morphologically complex languages, morphological analysis is also a core component in information retrieval, text summarization, question answering, machine translation, etc. There are two broad categories of approaches in computational morphology: rule-based and corpus-based. Currently, the most widely applied rule-based approach to computational morphology uses the two-level formalism. In the rule-based approach, the formulation of rules makes the development of morphological analysis systems costly and time consuming [4,11]. Because of the need for hand-crafted rules covering the morphology of a language, and the intensive involvement of linguistic experts that rule-based approaches require, there is considerable interest in robust machine learning approaches to morphology, which extract linguistic knowledge automatically from an annotated or unannotated corpus.
Like other Semitic languages, Amharic is one of the most morphologically complex languages. It exhibits a root-pattern morphological phenomenon [1]. A root is a set of consonants (also called radicals) which carries a basic lexical meaning [12]. A pattern consists of a set of vowels which are inserted among the consonants of a root to form a stem. Semitic verbal stems, and Amharic verbal stems in particular, consist of a 'root + vowels + template' merger. For instance, the root sbr combined with the vowel pattern ee and the template CVCVC forms the stem seber ('broke'). In addition to such root-pattern morphology, Amharic words are inflected with a variety of affixes.
[Fig. 1. Architecture of the system: inflected words from a source document text pass through morpheme annotation and feature extraction to produce morphologically annotated words; memory-based learning builds a learning model; analysis then proceeds through feature extraction, morpheme identification (classification and extrapolation), stem extraction (reconstruction and morpheme insertion), and root extraction, yielding morphemes with their functions.]
Morpheme Annotation
Amharic nouns take more than two affixes in the prefix position and more than seven in the suffix position. The affixation is not arbitrary; rather, affixes attach in an ordered manner. An Amharic noun consists of a lexical part, or stem, and one or more grammatical parts. This is easy to see with an example such as the Amharic noun bEtocacewn ('their houses'). The lexical part is the stem bEt ('house'); this conveys most of the important content in the noun. Since the stem cannot be broken into smaller meaningful units [8], it is a morpheme (a primitive unit of meaning). The word contains three grammatical suffixes, each of which provides information that is more abstract and less crucial to the understanding of the word than the information provided by the stem: -oc, -acew, and -n. Each of these suffixes can be seen as providing a value for a particular grammatical feature (or dimension along which Amharic nouns can vary): -oc (plural marker), -acew (third person plural neuter), and -n (accusative). Since none of these suffixes can be broken down further, each of them is also a morpheme. Generally, these grammatical morphemes play a great role in understanding the semantics of the whole word [7,12].
The following tasks were identified and performed to prepare the annotated datasets used for training: identifying inflected words; segmenting each word into prefix, stem, and suffix; putting boundary markers between segments; and describing the representation of each marker. Morphemes attached after the stem (as suffixes) may serve seven purposes: plurality/possession, derivation, relativization, definiteness, negation, causative, and conjunction. The annotation follows the prefix-stem-suffix ([P]-[S]-[S]) structure as shown in Table 1. The brackets ([ ]) are filled with the appropriate grammatical features for each segment, where S, M, 1, K, D, and O indicate end of stem, plural, possession, preposition, derivative, and object markers, respectively. Lexicons were prepared manually in such a way as to be suitable for the extraction purpose.
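To make the annotation scheme concrete, the following is a minimal sketch in Python of how such an entry can be parsed into (segment, marker) pairs. The sample string for bEtocacewn is our own hypothetical rendering of the marker scheme described above, not a line taken from the paper's lexicon.

import re

def parse_annotation(entry):
    # Split an annotated entry into (segment, marker) pairs, where each
    # segment is immediately followed by its bracketed grammatical marker.
    return re.findall(r"([^\[\]]+)\[([^\]]+)\]", entry)

# Hypothetical annotation: S = end of stem, M = plural, 1 = possession,
# O = object marker, following the scheme above.
print(parse_annotation("bEt[S]oc[M]acew[1]n[O]"))
# [('bEt', 'S'), ('oc', 'M'), ('acew', '1'), ('n', 'O')]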
Amharic verbs have four slots for prefixes and four slots for suffixes [1,7,10]. The affixes occupy fixed positions, where prep stands for preposition; conj for conjunction; rel for relativization; neg for negation; subj for subject; appl for applicative; obj for object; def for definiteness; and acc for accusative.
In addition to the analysis of all these affixes, the root-template pattern of Amharic verbs makes morphological analysis complex. Representing these features in a form suitable for memory-based learning is a challenging task. Generally, Amharic verb stems are broken into verb roots and grammatical templates. A given root can be combined with more than 40 templates [1]. The stem is the lexical part of the verb and also the source of most of its complexity. To cover all morphologically productive verb types, we need a morphologically annotated word list with its possible inflection forms. The tokens are then manually annotated in a similar fashion to nouns and adjectives, following the prefix[], stem[], and suffix[] pattern, where '[]' is filled with the appropriate grammatical features for each segment. The sample annotation for verbs is shown in Table 2.
Feature Extraction
Once the annotated words are stored in a database, instances are extracted automatically from the morphological database based on the windowing method [3], with a fixed length of left and right context. Each instance is associated with a class, which represents the morphological category to which the given focus character belongs. An instance consists of a fixed-length vector of feature-value pairs. Each example focuses on one letter and includes a fixed number of left and right neighbor letters in an 8-1-8 window, which yields eighteen features. The window size is chosen based on the longest word in the manually annotated database. The character in focus, plus the eight preceding and eight following characters, are placed in the window. Character-based analysis considers every character or letter individually. From the basic annotation, instances were automatically extracted into a form suitable for memory-based learning by sliding a window over each word in the lexicon. We used Algorithm 1 to extract features based on this character-level analysis.
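Since Algorithm 1 itself is not reproduced here, the following is a minimal sketch of the windowing idea in Python. The padding symbol '_', the segmentation of begoc into beg + oc, and the class labels ('0' for non-boundary positions, 'S' and 'M' marking the final character of the stem and of the plural suffix) are illustrative assumptions rather than the paper's exact annotation.

def extract_instances(word, classes, left=8, right=8, pad='_'):
    # Slide a fixed window over the word: one instance per focus character.
    # classes[i] is the gold class label of the i-th character.
    padded = pad * left + word + pad * right
    instances = []
    for i, label in enumerate(classes):
        window = padded[i : i + left + 1 + right]   # 8 left + focus + 8 right
        instances.append(list(window) + [label])
    return instances

# Illustrative labels for beg + oc: 'S' ends the stem, 'M' ends the plural suffix.
for instance in extract_instances('begoc', ['0', '0', 'S', '0', 'M']):
    print(' '.join(instance))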
Memory-Based Learning
Memory-based approaches borrow some of the advantages of both probabilistic and knowledge-based methods and have been successfully applied to NLP tasks [5]. They perform classification by analogy, and a variety of NLP classification problems can be learned by reusing the same algorithms and data structures. We used TiMBL as the learning tool for our task [3]. There are a number of parameters to be tuned in memory-based learning with TiMBL. Therefore, to get optimal accuracy of the model, we started from the default settings and also tuned some of the parameters. The optimized parameters are the MVDM (modified value difference metric) and chi-square from the distance metrics, IG (information gain) from the weighting metrics, ID (inverse distance) from the class voting weights, and k, the number of nearest neighbors. These optimized parameters are used together with the different classifiers. The classifier engines we used are IGTree and IB1, which construct databases of instances in memory during the learning process. The procedure for building an IGTree is described in [6]. Instances are classified by IGTree or IB1 by matching them against the instances in the instance base. As a result of this process, we get a memory-based learning model which is used later during the morphological analysis phase.
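As an illustration of classification by analogy, the following is a minimal sketch of the IB1 idea: all training instances are kept in memory, and a new instance is classified by a majority vote of its k nearest neighbors under a feature-weighted overlap distance. This is a simplification for exposition only, not the TiMBL implementation, which additionally provides MVDM, chi-square, IGTree, and distance-weighted class voting.

from collections import Counter

def overlap_distance(a, b, weights):
    # Weighted overlap metric: sum the weights of mismatching positions.
    return sum(w for x, y, w in zip(a, b, weights) if x != y)

def ib1_classify(memory, weights, instance, k=1):
    # memory is a list of (feature_vector, class) pairs stored verbatim.
    ranked = sorted(memory, key=lambda m: overlap_distance(m[0], instance, weights))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

Here weights would hold, for example, the information gain of each of the seventeen character positions in the window.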
Feature Extraction
Memory-based learning handles new instances by storing previous training data in memory. When a new word is given to the system to be analyzed, it is deconstructed into instances so that its representation matches the one stored in memory. Feature extraction at this stage differs from that of the training phase: the word is deconstructed into fixed-length instances without class labels at the last index. When a previously unseen word (one not found in memory) needs to be segmented, it is deconstructed and represented as instances using the same windowing information. Each such instance is compared to every instance in the training set recorded by the memory-based learner, and the classifier tries to find the training instance in memory that most closely resembles it. For instance, the word begoc is segmented and its features are extracted as shown in Fig. 2.
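A minimal sketch of this analysis-phase deconstruction, mirroring extract_instances above but producing classless instances:

def deconstruct(word, left=8, right=8, pad='_'):
    # Same windowing as in training, but with no class label appended.
    padded = pad * left + word + pad * right
    return [list(padded[i : i + left + 1 + right]) for i in range(len(word))]

for instance in deconstruct('begoc'):
    print(' '.join(instance))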
Morpheme Identification
When new or unknown inflected words are deconstructed into instances and given to the system to be analyzed, extrapolation is performed to assign the most likely neighborhood class, with its morphemes determined by their boundaries. The extrapolation is based on the similarity metric applied to the training data. If there is an exact match in memory, the classifier returns (extrapolates) the class of that instance to the new instance. Otherwise, the new instance is classified by analogy with instances in memory that have similar feature vectors, extrapolating a decision from their classes. The new instance is compared to every instance in the training set recorded by the memory-based learner, and the classifier tries to find the training instance that most closely resembles it. Taking the features of lenegerecw as shown in Fig. 3, the closest match might be instance 10 in Table 3, as they share almost all features (L8, L7, L5, L3-L1, F, R1-R8), except L6 and L4. In this case, the memory-based learner extrapolates the class of that training instance and predicts it to be the class of the new instance.
Stem Extraction
After the appropriate morphemes are identified, the next step is stem extraction. In stem extraction, individual instances are reconstructed into meaningful morphemes (their original word form) and the identified morphemes are inserted at their segmentation points. During this process, the system searches for resembling instances among previously stored patterns in memory. If there is no similar instance in memory, it uses a distance similarity metric to find the nearest neighbors. The modified value difference metric (MVDM), which looks at the co-occurrence of feature values with the target classes, is used to determine the similarity of feature values. For example, the reconstruction of the whole set of instances of the word slenegerecw is shown in Fig. 4. In the example, four non-null classes are predicted in the classification step. In the second step, the letters of the morphemic segments are concatenated and the morphemes are inserted. Then, root extraction is performed in the third step.
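A minimal sketch of this reconstruction step: the per-character class predictions are concatenated back into a word, and a boundary marker is inserted wherever a non-null class was predicted. The word begoc and the labels below are illustrative, and the exact insertion convention is an assumption.

def reconstruct(word, predicted):
    out = []
    for ch, label in zip(word, predicted):
        out.append(ch)
        if label != '0':            # non-null class => segmentation point
            out.append('[%s]' % label)
    return ''.join(out)

print(reconstruct('begoc', ['0', '0', 'S', '0', 'M']))   # beg[S]oc[M]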
Root Extraction
The smallest morphemic unit for nouns and adjectives is the stem; thus, the root extraction process is not applied to nouns and adjectives. Root extraction from verbal stems is not a complex task in Amharic, as roots are the consonants of verbal stems. To extract the root from a verbal stem, we simply remove the vowels. However, there are exceptions, as vowels in some verbal stems (e.g., when the verbal stem starts with a vowel) serve as consonants. In addition, vowels should not be removed from mono- and bi-radical verb types, since these have valid meaning when they end with vowels.
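A minimal sketch of this vowel-stripping step, assuming the Latin transliteration used in the examples above, with a, e, i, o, u, and E taken as the vowels; the exceptions just noted (stem-initial vowels, mono- and bi-radical verbs) would need explicit handling.

VOWELS = set('aeiouE')

def extract_root(stem):
    # Keep only the consonantal radicals of a verbal stem.
    return ''.join(c for c in stem if c not in VOWELS)

print(extract_root('seber'))   # sbr, the root of seber ('broke')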
4 Experiment
4.1 The Corpus
In order to evaluate the performance of the model and the learnability of the dataset, we conducted the experiment by combining nouns and verbs. To get an unbiased estimate of the accuracy of a model learned through machine learning, it should be tested on unseen data not present in the training set. Therefore, we split our dataset into training and testing sets. Our corpus contains a total of 1022 words, of which 841 are verbs and 181 are nouns (adjectives are treated as nouns as they have a similar analysis). The number of instances extracted from nouns and adjectives is 1356 and from verbs 6719, which amounts to a total of 8075 instances. A total of 26 different class labels occur within these instances.
The parameters we optimized are the modified value difference metric and chi-square from the distance metrics, information gain from the weighting metrics, inverse distance from the class voting weights, and k, the number of nearest neighbors. For various combinations of parameter values, we tuned the parameters until no better result was found.
Simply splitting the corpus into a single training and testing set may not give the best estimate of the system's performance. Thus, we used the 10-fold cross-validation technique to test the performance of the system with the IB1 and IGTree algorithms. The data is split into ten equal partitions, and each of these is used once as a test set, with the other nine as the corresponding training set. This way, every example is used exactly once as a test item, while training and test data are kept carefully separated, and the memory-based classifier is trained each time on 90% of the available training data. We also used leave-one-out cross-validation for the IB1 algorithm, which uses all available data except one (n-1) example as training material and tests the classifier on the single held-out example, repeating this for all examples. However, we found it too time consuming to use leave-one-out cross-validation with the IGTree algorithm. Table 4 shows the performance of the system for the optimized parameters.
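For illustration, a sketch of the 10-fold procedure over the extracted instances, using scikit-learn's KFold only for the splits; the experiments themselves rely on TiMBL's own cross-validation, and train_and_score is a placeholder standing in for training and evaluating the classifier on one fold.

from sklearn.model_selection import KFold

def cross_validate(instances, labels, train_and_score, folds=10):
    scores = []
    for train_idx, test_idx in KFold(n_splits=folds, shuffle=True).split(instances):
        train = [(instances[i], labels[i]) for i in train_idx]
        test = [(instances[i], labels[i]) for i in test_idx]
        scores.append(train_and_score(train, test))
    return sum(scores) / len(scores)   # mean accuracy over the ten folds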
In memory-based learning, the minimum size of the training set is not specified in advance. However, the size of the training data affects the learning performance of the algorithm. Hence, it is useful to draw learning curves in addition to reporting the experimental results. We performed a series of experiments by systematically increasing the amount of training data up to the currently available dataset of 1022 words. When drawing a learning curve, the learning is usually measured against a fixed test set. The learning curve of the system is shown in Fig. 5.
Compared to previous works, our system performed well and provided promising results. For example, the system of Gasser [7], which is rule-based, does not handle unseen or unknown words. To overcome this problem, Mulugeta and Gasser [10] developed an Amharic morphological analyzer using inductive logic programming. However, our system still performs better in terms of accuracy.
References
1. Amsalu, S., Gibbon, D.: Finite state morphology of Amharic. In: Proc. of Inter.
Conf. on Recent Advances in Natural Language Processing, Borovets, pp. 47–51
(2005)
2. Bosch, A., Busser, B., Canisius, S., Daelemans, W.: An efficient memory-based morpho-syntactic tagger and parser for Dutch. In: Proc. of the 17th Meeting Comp. Ling. in the Netherlands, Leuven, Belgium (2007)
3. Bosch, A., Daelemans, W.: Memory-based morphological analysis. In: Proc. of the
37th Annual Meeting of the Association for Computational Linguistics, Strouds-
burg (1999)
4. Clark, A.: Memory-Based Learning of Morphology with Stochastic Transducers.
In: Proc. of the 40th Annual Meeting of the Assoc. for Comp. Ling., Philadelphia
(2002)