Amharic Sentence Parsing Using Base Phrase Chunking
1 Introduction
To process and understand natural languages, the linguistic structures of texts need to be organized at different levels. A structured text increases the capability of NLP applications [2], [4]. The syntactic level of linguistic analysis concerns how words are put together to form correct sentences and determines what structural role each word plays in a sentence. Broadly speaking, syntactic analysis of a sentence generally consists of segmenting the sentence into words, grouping these words into syntactic structural units, and recognizing syntactic elements and their relationships within a structure. The syntactic level also indicates how words are grouped together into phrases, which words modify other words, and which words are of central importance in the sentence [2], [7]. Parsing can be described as a procedure that searches through various ways of combining grammatical rules to find a combination that generates a tree representing the syntactic structure of the input sentence. Parsing uses the syntax of a language to determine the functions of words in a sentence in order to generate a data structure that can help to analyze the meaning of the sentence [7]. In addition, parsing deals with a number of subproblems such as identifying constituents that can fit together. In general, parsing helps us understand how words are put together to form correct phrases or sentences, along with the structural roles of the words, and it plays a significant role in many NLP applications as it helps to reduce the overall structural complexity of sentences [13]. Some of the NLP applications where a parser is used as a component are
Abeba Ibrahim and Yaregal Assabie (2014). "Amharic Sentence Parsing Using Base Phrase Chunking". In: A. Gelbukh (Ed.), Proceedings of the 15th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2014), Springer Lecture Notes in Computer Science (LNCS), Vol. 8403, pp. 297–306, Kathmandu, Nepal. © Springer-Verlag Berlin Heidelberg 2014.
be made from a single head word or from a combination of other words. Unlike other phrase constructions, a preposition alone cannot constitute a phrase. Instead, it must be combined with other constituents, and those constituents may come either before or after the preposition. If the complements are nouns or NPs, the preposition is placed in front of the complements, whereas if the complements are PPs, it shifts to the end of the phrase. Examples are: እንደ ሰው (ĭndä säw/like a human), ከቤቱ አጠገብ (käbetu aţägäb/close to the house), etc. In Amharic phrase construction, the head of the phrase is always found at the end of the phrase, except for prepositional phrases.
Amharic follows the subject-object-verb (SOV) grammatical pattern, unlike, for example, English, which has a subject-verb-object word order [3], [19]. For instance, the Amharic equivalent of the sentence "John killed the lion" is written as "ጆን (jon/John) አንበሳውን (anbäsawn/the lion) ገደለው (gädäläw/killed)". Amharic sentences can be constructed from a simple or complex NP and a simple or complex VP. Simple sentences are constructed from a simple NP followed by a simple VP, which contains only a single verb. Complex sentences contain at least one complex NP, one complex VP, or both. Complex NPs are phrases that contain at least one embedded sentence in the phrase construction. The embedded sentence can serve as a complement.
This section discusses the Amharic base phrase chunker we used as a component to develop the parser; the chunker system is described in further detail in [9]. The output of the system, i.e. the tags of chunks, can be noun phrases, verb phrases, adjectival phrases, etc., in line with the construction rules of the language. In order to identify the boundaries of each chunk in sentences, the following boundary types are used [15]: IOB1, IOB2, IOE1, IOE2, IO, "[", and "]". The first four formats are complete chunk representations, which can identify both the beginning and the end of phrases, while the last three are partial chunk representations. All boundary types use an "I" tag for words that are inside a phrase and an "O" tag for words that are outside a phrase. They differ in their treatment of chunk-initial and chunk-final words.
− IOB1: the first word inside a phrase immediately following another phrase receives a B tag.
− IOB2: all phrase-initial words receive a B tag.
− IOE1: the final word inside a phrase immediately preceding another phrase of the same type receives an E tag.
− IOE2: all phrase-final words receive an E tag.
− IO: words inside a phrase receive an I tag; all other words receive an O tag.
− "[": all phrase-initial words receive a "[" tag; all other words receive a "." tag.
− "]": all phrase-final words receive a "]" tag; all other words receive a "." tag.
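The differences between these boundary types can be illustrated with a short sketch (our own code, not part of the chunker). For simplicity it ignores phrase types, so the "same phrase type" condition of IOB1/IOE1 is approximated by mere adjacency of chunks:

```python
# Illustration of the chunk boundary types (simplified: phrase types are
# ignored, so adjacency stands in for the "same phrase type" condition).

def encode(n_tokens, chunks, scheme):
    """chunks: list of (start, end_exclusive) spans over n_tokens tokens."""
    starts = {s for s, _ in chunks}
    ends = {e for _, e in chunks}              # exclusive end positions
    tags = ["."] * n_tokens if scheme in ("[", "]") else ["O"] * n_tokens
    for s, e in chunks:
        if scheme not in ("[", "]"):
            for i in range(s, e):
                tags[i] = "I"                  # words inside a phrase
        if scheme == "IOB2" or (scheme == "IOB1" and s in ends):
            tags[s] = "B"                      # chunk-initial word
        if scheme == "IOE2" or (scheme == "IOE1" and e in starts):
            tags[e - 1] = "E"                  # chunk-final word
        if scheme == "[":
            tags[s] = "["
        if scheme == "]":
            tags[e - 1] = "]"
    return tags

# Two adjacent chunks covering tokens 0-1 and 2, with token 3 outside:
for scheme in ("IOB1", "IOB2", "IOE1", "IOE2", "IO", "[", "]"):
    print(scheme, encode(4, [(0, 2), (2, 3)], scheme))
```

Note how only the complete representations (IOB1, IOB2, IOE1, IOE2) let a reader recover both chunk boundaries from the tags alone.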
We considered six different kinds of chunks, namely noun phrase (NP), verb phrase (VP), adjectival phrase (AdjP), adverbial phrase (AdvP), prepositional
(PP) and sentence (S). To identify the chunks, it is necessary to find the positions
where a chunk can end and a new chunk can begin. The part-of-speech (POS) tag
assigned to every token is used to discover these positions. We used the IOB2 tag set
to identify the boundaries of each chunk in sentences extracted from chunk tagged
text. Using the IOB2 tag set along with the chunk types considered, a total of 13
phrase tags were used in this work. These are: B-NP, I-NP, B-VP, I-VP, B-PP, I-PP,
B-ADJP, I-ADJP, B-ADVP, I-ADVP, B-S, I-S and O. For example, the IOB2 chunk
representation for the sentence ካሳ ያመጣው ትንሽ ልጅ እንደ አባቱ በጣም ታመመ (kasa
yamäţaw tĭnĭš lĭj ĭndä abatu bäţam tamämä/The little boy that Kassa has brought
became very sick like his father) is shown in Table 2. Accordingly, the chunk tagged
sentence would be "ካሳ N B-S ያመጣው VREL I-S ትንሽ ADJ B-NP ልጅ N I-NP እንደ P B-PP አባቱ N I-PP በጣም ADJ B-VP ታመመ V I-VP".
Table 2. IOB2 chunk representation for "ካሳ ያመጣው ትንሽ ልጅ እንደ አባቱ በጣም ታመመ"

Words                                       IOB2 chunk representation
ካሳ (kasa/Kassa)                             B-S
ያመጣው (yamäţaw/that [Kassa] has brought)     I-S
ትንሽ (tĭnĭš/little)                          B-NP
ልጅ (lĭj/boy)                                I-NP
እንደ (ĭndä/like)                             B-PP
አባቱ (abatu/his father)                      I-PP
በጣም (bäţam/very)                            B-VP
ታመመ (tamämä/became sick)                    I-VP
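As an illustration of what the IOB2 representation encodes, the following sketch (our own code, not the paper's) recovers the chunks from a tagged word sequence such as the one in Table 2:

```python
# Sketch: group an IOB2-tagged word sequence back into labeled chunks.

def chunks_from_iob2(pairs):
    """pairs: list of (word, tag) with tags like B-NP / I-NP / O."""
    chunks, current = [], None
    for word, tag in pairs:
        if tag.startswith("B-"):
            if current:
                chunks.append(current)
            current = (tag[2:], [word])        # start a new chunk
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(word)            # continue the open chunk
        else:                                  # "O" or an inconsistent I- tag
            if current:
                chunks.append(current)
                current = None
    if current:
        chunks.append(current)
    return [(label, " ".join(words)) for label, words in chunks]

tagged = [("ካሳ", "B-S"), ("ያመጣው", "I-S"), ("ትንሽ", "B-NP"), ("ልጅ", "I-NP"),
          ("እንደ", "B-PP"), ("አባቱ", "I-PP"), ("በጣም", "B-VP"), ("ታመመ", "I-VP")]
print(chunks_from_iob2(tagged))
# [('S', 'ካሳ ያመጣው'), ('NP', 'ትንሽ ልጅ'), ('PP', 'እንደ አባቱ'), ('VP', 'በጣም ታመመ')]
```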
To implement the chunker component, we use a hidden Markov model (HMM) enhanced by a set of rules used to prune errors. In the training phase, the system first accepts words with POS tags and chunk tags. Then, the HMM is trained with a training set built from sentences whose words are tagged with parts of speech and chunks. Likewise, in the test phase, the system accepts words with POS tags and outputs an appropriate chunk tag sequence against each POS tag sequence using the HMM model. We use a POS-tagged sentence as input, from which we observe the sequence of POS tags, and we hypothesize that the corresponding sequence of chunk tags has hidden Markovian properties. Thus, we used an HMM in which chunk tags serve as hidden states and POS tags as observations. The model is trained with sequences of POS tags and chunk tags extracted from the training corpus, and is then used to predict the sequence of chunk tags for a given sequence of POS tags by making use of the Viterbi algorithm. The output of the decoder is the sequence of chunk tags, which groups words based on syntactic correlations. The output chunk sequence is then analyzed to improve the result by applying linguistic rules derived from the grammar
of Amharic. For a given Amharic word w, linguistic rules (sample rules are shown in Algorithm 1) were used to correct wrongly chunked words ("w-1" and "w+1" denote the previous and next words, respectively).
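To make the decoding step concrete, the following is a minimal Viterbi sketch under this setup, with chunk tags as hidden states and POS tags as observations. The probability tables are invented for illustration; in practice they would be estimated from counts in the training corpus:

```python
# Minimal Viterbi decoding sketch: chunk tags are hidden states, POS tags
# are observations. All probabilities below are toy values for illustration.

states = ["B-NP", "I-NP", "B-VP"]
start  = {"B-NP": 0.7, "I-NP": 0.0, "B-VP": 0.3}
trans  = {"B-NP": {"B-NP": 0.1, "I-NP": 0.6, "B-VP": 0.3},
          "I-NP": {"B-NP": 0.2, "I-NP": 0.4, "B-VP": 0.4},
          "B-VP": {"B-NP": 0.5, "I-NP": 0.0, "B-VP": 0.5}}
emit   = {"B-NP": {"ADJ": 0.5, "N": 0.5, "V": 0.0},
          "I-NP": {"ADJ": 0.2, "N": 0.8, "V": 0.0},
          "B-VP": {"ADJ": 0.1, "N": 0.1, "V": 0.8}}

def viterbi(obs):
    """Return the most likely chunk tag sequence for a POS tag sequence."""
    V = [{s: start[s] * emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            # best predecessor state for s at this position
            prev, p = max(((q, V[-1][q] * trans[q][s]) for q in states),
                          key=lambda x: x[1])
            col[s], ptr[s] = p * emit[s][o], prev
        V.append(col)
        back.append(ptr)
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for ptr in reversed(back):         # follow back-pointers to recover path
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["ADJ", "N", "V"]))      # ['B-NP', 'I-NP', 'B-VP']
```

The rule-based pruning step described above would then post-process such decoded sequences.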
4 Sentence Parser
In this work, a bottom-up approach is employed for sentence parsing: the output of the chunker is used as input, and head words are recursively removed to make new phrases until individual words are reached. The parse tree is constructed while head words are recursively removed and new phrases are formed. When no new phrases are obtained during the recursive process, parsing is complete. The algorithm used for parsing is given in Algorithm 2.
[Figure 1: Architecture of the system. The chunker trains an HMM model from word, POS tag and chunk tag sequences and applies it during testing; the parser then repeatedly replaces base phrases with their heads, checking whether a new base phrase has been formed and whether a base phrase is a PP or S.]
The Amharic base phrase chunker was integrated in the parser. The overall archi-
tecture of the parser including the chunker is shown in Figure 1.
The following example shows how parsing is performed using the proposed algorithm for a given POS tagged sentence: "ወንበዴዎች N በጎፈቃደኞች NPREP የገነቡትን VREL ድርጅት N ከጥቅም NPREP ውጭ PREP አደረጉት V".
Step 3: ['ወንበዴዎች N', ('በጎፈቃደኞች NPREP የገነቡትን VREL ድርጅት N', 'NP'), ('ከጥቅም NPREP ውጭ PREP አደረጉት V', 'VP')]

The successive states of the parse, from the POS tagged sentence to the final tree, are:

ወንበዴዎች N በጎፈቃደኞች NPREP የገነቡትን VREL ድርጅት N ከጥቅም NPREP ውጭ PREP አደረጉት V
(('በጎፈቃደኞች NPREP የገነቡትን VREL', 'S') ድርጅት N, 'NP') (('ከጥቅም NPREP ውጭ PREP', 'PP') አደረጉት V, 'VP')
((('በጎፈቃደኞች NPREP የገነቡትን VREL', 'S') ድርጅት N, 'NP') (('ከጥቅም NPREP ውጭ PREP', 'PP') አደረጉት V, 'VP'), 'VP')
(('ወንበዴዎች N', 'NP') ((('በጎፈቃደኞች NPREP የገነቡትን VREL', 'S') ድርጅት N, 'NP') (('ከጥቅም NPREP ውጭ PREP', 'PP') አደረጉት V, 'VP'), 'VP'), 'S')
5 Experiment
The major source of the dataset we used for training and testing the system was Walta
Information Center (WIC) news corpus which is at present widely used for research
on Amharic natural language processing. The corpus contains 8067 sentences where
words are annotated with POS tags. Furthermore, we also collected additional text
from an Amharic grammar book authored by Yimam [19]. The sentences in the cor-
pus are classified as training data set and testing data set using 10 fold cross validation
technique.
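The 10-fold splitting procedure can be sketched as follows (a generic illustration of the technique, not the authors' code):

```python
# Generic 10-fold cross-validation split: each fold holds out every k-th
# sentence for testing and trains on the remainder.

def ten_fold(sentences, k=10):
    """Yield (train, test) splits over the sentence list."""
    for i in range(k):
        test = sentences[i::k]
        train = [s for j, s in enumerate(sentences) if j % k != i]
        yield train, test

# With the 8067 WIC sentences, each fold tests on roughly a tenth of the data:
folds = list(ten_fold(list(range(8067))))
print(len(folds), len(folds[0][0]), len(folds[0][1]))
```

The reported scores would then be averaged over the ten folds.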
6 Conclusion
References
1. Abney, S.: Parsing by chunks. In: Berwick, R., Abney, S., Tenny, C. (eds.) Principle-
Based Parsing. Kluwer Academic Publishers (1991)
2. Abney, S.: Chunks and Dependencies: Bringing Processing Evidence to Bear on Syntax.
In: Computational Linguistics and the Foundations of Linguistic Theory. CSLI (1995)
3. Amare, G.: ዘመናዊ የአማርኛ ሰዋስው በቀላል አቀራረብ (Modern Amharic Grammar in a Sim-
ple Approach), Addis Ababa, Ethiopia (2010)
4. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python. O'Reilly Media Inc., Sebastopol (2009)
5. Earley, J.: An efficient context-free parsing algorithm. Communications of the
ACM 13(2), 94–102 (1970)
6. Grover, C., Tobin, R.: Rule-based chunking and reusability. In: Proceedings of the Fifth
International Conference on Language Resources and Evaluation, LREC 2006 (2006)
7. Jurafsky, D., Martin, J.H.: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edn. Prentice-Hall (2009)
8. Hopcroft, J.E., Motwani, R., Ullman, J.D.: Introduction to Automata Theory, Languages,
And Computation, ch. 7, pp. 228–302. Addison-Wesley (2001)
9. Ibrahim, A., Assabie, Y.: Hierarchical Amharic Base Phrase Chunking Using HMM With
Error Pruning. In: Proceedings of the 6th Conference on Language and Technology,
Poznan, Poland, pp. 328–332 (2013)
10. Kutlu, M.: Noun phrase chunker for Turkish using dependency parser. Doctoral dissertation.
Bilkent University (2010)
11. Lewis, M.P., Simons, G.F., Fennig, C.D.: Ethnologue: Languages of the World, 17th edn. SIL International, Dallas (2013)
12. Li, S.J.: Chunk parsing with maximum entropy principle. Chinese Journal of Computers:
Chinese Edition 26(12), 1722–1727 (2003)
13. Manning, C., Schuetze, H.: Foundations of Statistical Natural Language Processing.
MIT Press, Cambridge (1999)
14. Molina, A., Pla, F.: Shallow parsing using specialized HMMs. The Journal of Machine
Learning Research 2, 595–613 (2002)
15. Ramshaw, L.A., Marcus, M.P.: Text chunking using transformation-based learning. In: Proceedings of the Third ACL Workshop on Very Large Corpora, pp. 82–94 (1995)
16. Thao, H., Thai, P., Minh, N., Thuy, Q.: Vietnamese noun phrase chunking based on condi-
tional random fields. In: International Conference on Knowledge and Systems Engineering
(KSE 2009), pp. 172–178. IEEE (2009)
17. Tjong Kim Sang, E.F., Buchholz, S.: Introduction to the CoNLL-2000 shared task: Chunking. In: Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning, vol. 7, pp. 127–132 (2000)
18. Xu, F., Zong, C., Zhao, J.: A Hybrid Approach to Chinese Base Noun Phrase Chunking.
In: Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, Sydney
(2006)
19. Yimam, B.: የአማርኛ ሰዋስው (Amharic Grammar), Addis Ababa, Ethiopia (2000)