Natural Language Processing

The document discusses developing an implementation of Amharic sentence parsing using context free grammar and probabilistic context free grammar approaches. It describes the methodology, challenges, and provides code examples of parsing Amharic sentences and displaying the parse trees.

Uploaded by

mekuriaw

Natural Language Processing (NLP)

Project Report paper on

“Amharic sentence Parse Tree”
Contents

1. Introduction
2. Approaches used for this work
3. Methodology
4. Implementation
5. Challenges
6. References
1. Introduction
Parsing is one of the steps in building a functional NLP application, and its output can serve as
input to many other NLP applications, such as grammar checkers, spell checkers and spelling
correction. The central task in parsing is to break a sentence down into manageable components
and to understand their context and their relations to each other, so that the correctness of the
sentence can be judged. Sentences are the starting point when analyzing written material or
documents [1]. Syntax refers to the way words are related to each other in a sentence. Sentence
parsing, also called syntactic parsing, is therefore the process of identifying how words can be
put together to form a correct sentence, and of determining what structural role (lexical
category) each word plays in the sentence, which phrases are subparts of which other phrases,
and which words modify which other words. A sentence parser outputs a parse structure that can
be used as a component in many applications, including semantic analysis, machine translation,
and information storage and retrieval of textual data [2]. Today, parsers of different kinds
(e.g., probabilistic, rule-based) have been developed for languages that are relatively widely
used nationally and/or internationally (e.g., English, German, Chinese) [3]. My project work
focuses on implementing an Amharic sentence parser that displays the parse tree for a sentence.
There are different methods for sentence parsing; among them are Context-Free Grammar (CFG),
from the rule-based approach, and Probabilistic Context-Free Grammar (PCFG), from the
statistical approach. My work uses these two approaches, CFG and PCFG [4].

2. Approaches used for this work

The approaches I have used for this implementation, as mentioned in the section above, are CFG,
a rule-based (non-statistical) method, and PCFG, a statistical method.
Context-free Grammar
A context-free grammar (CFG) is a formal system that describes a language by specifying how
any legal text can be derived from a distinguished symbol called the axiom, or sentence symbol
[2]. CFGs are a very important class of grammars for two reasons: the formalism is powerful
enough to describe most of the structure in natural languages, yet it is restricted enough that
efficient parsers can be built to analyze sentences [3].

Probabilistic Context-Free Grammars (PCFG) Parsing

A PCFG is a context-free grammar that associates a probability with each of its productions. It
generates the same set of parses for a text as the corresponding context-free grammar does, and
assigns a probability to each parse. The probability of a parse generated by a PCFG is simply the
product of the probabilities of the productions used to generate it [1]. Because the
probabilities are estimated from real data, a PCFG models the language as it is actually used and
can tolerate the grammatical mistakes that occur in real-life text. Although PCFGs have many
advantages, a critical disadvantage is that context is not taken into account at all: the
probability of a production does not depend on the words or rules around it. In fact, a trigram
model (here, a model over sequences of three words) would probably achieve better results, even
though it takes no account of the internal structure of the language, and it may be more
applicable to a language like Amharic [3].
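
The product rule above can be written compactly. The notation is assumed here for illustration and does not appear in the cited sources in this form: T is a parse tree built from the productions r_1, ..., r_n, and P(r_i) is the probability attached to production r_i. The second line states the usual normalization condition, that the probabilities of all productions sharing a left-hand side A sum to one.

```latex
P(T) = \prod_{i=1}^{n} P(r_i)
\qquad\text{with}\qquad
\sum_{\beta} P(A \rightarrow \beta) = 1 \quad \text{for every nonterminal } A
```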

3. Methodology
The methodology I used to develop the Amharic parse tree implementation is as follows: I took a
set of 4 sample grammars, ranging from simple to complex production rules, assigned
probabilities to their productions for the probabilistic approach, and then drew their parse
trees and specified their parse structure based on the grammar.

To develop the implementation, in terms of source code, I used a collection of tools that
support the main application for different purposes [2]. They are listed below.
• Python 3.7
• NLTK 3.2, a Python-based natural language processing toolkit (www.nltk.org)
• KeyMan Keyboard, a Unicode keyboard writer for Amharic
• PyScripter 3.7, an interactive IDE for Python
To set up the implementation in a local environment, Python 3.7 must be installed first. NLTK
3.2 is then downloaded and installed into the Python installation, because it is used as a
library inside the Python code. Finally, the NLTK data must be downloaded using Python itself.
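
The setup steps above can be checked with a few lines of standard-library Python. This is only a sketch: the helper name `check_setup` is hypothetical, and the suggested `pip install nltk` command is an assumption about how NLTK is installed, not something the report specifies.

```python
import importlib.util
import sys


def check_setup(min_python=(3, 7), packages=("nltk",)):
    """Return a list of problems found; an empty list means the setup looks fine."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)
    for name in packages:
        # find_spec returns None when a top-level package cannot be imported
        if importlib.util.find_spec(name) is None:
            problems.append("missing package: %s (try: pip install %s)" % (name, name))
    return problems


if __name__ == "__main__":
    for line in check_setup() or ["setup looks fine"]:
        print(line)
```

Once NLTK can be imported, its data packages can be fetched from inside Python with `nltk.download()`.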

4. Implementation
The first sample implementation of my work is the CFG approach to Amharic sentence parsing.
The source code and the output of the implementation are as follows. An example of a CFG is
given below: a sentence like "አበበ የ ሰዉ አጥር ላይ ሆኖ አየ" can be represented using the following
grammar.

S -> NP VP
VP -> V NP | V NP PP | NP V
PP -> P NP | P P
V -> "አየ" | "በላ" | "ተራመደ"
NP -> "አበበ" | "ከበደ" | "ጫላ" | Det N | Det N N | Det N PP | N N | Det N N PP
Det -> "የ" | "ለ"
N -> "ሰዉ" | "ውሻ" | "አጥር" | "ድመት" | "መናፈሻ"
P -> "በ" | "ላይ" | "በኩል" | "ሆኖ" | "ከ"
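
To make the grammar concrete, here is a minimal pure-Python sketch of a top-down parser over these rules. It is not the report's implementation, which uses NLTK (where `nltk.CFG.fromstring` and `nltk.ChartParser` fill the same role). The dictionary layout and the helper names `derive`, `derive_sequence`, `bracket` and `parse` are assumptions made for illustration.

```python
# Grammar from the report: nonterminal -> list of right-hand sides.
# Any symbol that is not a key of GRAMMAR is treated as a terminal (a word).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"], ["V", "NP", "PP"], ["NP", "V"]],
    "PP": [["P", "NP"], ["P", "P"]],
    "V": [["አየ"], ["በላ"], ["ተራመደ"]],
    "NP": [["አበበ"], ["ከበደ"], ["ጫላ"], ["Det", "N"], ["Det", "N", "N"],
           ["Det", "N", "PP"], ["N", "N"], ["Det", "N", "N", "PP"]],
    "Det": [["የ"], ["ለ"]],
    "N": [["ሰዉ"], ["ውሻ"], ["አጥር"], ["ድመት"], ["መናፈሻ"]],
    "P": [["በ"], ["ላይ"], ["በኩል"], ["ሆኖ"], ["ከ"]],
}


def derive(symbol, tokens, start):
    """Yield (tree, end) for each way `symbol` can cover tokens[start:end]."""
    if symbol not in GRAMMAR:  # terminal: must match the next word exactly
        if start < len(tokens) and tokens[start] == symbol:
            yield symbol, start + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, end in derive_sequence(rhs, tokens, start):
            yield (symbol, children), end


def derive_sequence(rhs, tokens, start):
    """Yield (children, end) for each way the symbols in `rhs` match in order."""
    if not rhs:
        yield [], start
        return
    for first, middle in derive(rhs[0], tokens, start):
        for rest, end in derive_sequence(rhs[1:], tokens, middle):
            yield [first] + rest, end


def bracket(tree):
    """Format a (label, children) tree in bracketed notation."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    return "(" + label + " " + " ".join(bracket(c) for c in children) + ")"


def parse(sentence):
    """Return every complete parse of a whitespace-separated sentence."""
    tokens = sentence.split()
    return [bracket(tree) for tree, end in derive("S", tokens, 0)
            if end == len(tokens)]


if __name__ == "__main__":
    for structure in parse("አበበ የ ሰዉ አጥር ላይ ሆኖ አየ"):
        print(structure)
```

This plain top-down search works here because the grammar has no left-recursive rules; a grammar with rules such as NP -> NP PP would need a chart-based parser instead.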

The syntactic parse structure for the above example, as produced by the developed application,
and its parse tree are shown below, respectively:
(S (NP አበበ) (VP (NP (Det የ) (N ሰዉ) (N አጥር) (PP (P ላይ) (P ሆኖ))) (V አየ)))

Output is:
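
A text rendering of the tree can also be produced directly from the bracketed structure. The following is a minimal sketch; the helper names `read_tree` and `render` and the indented layout are assumptions for illustration, not the application's own display code. In NLTK, `nltk.Tree.fromstring` together with `Tree.pretty_print()` or `Tree.draw()` serves a similar purpose.

```python
STRUCTURE = "(S (NP አበበ) (VP (NP (Det የ) (N ሰዉ) (N አጥር) (PP (P ላይ) (P ሆኖ))) (V አየ)))"


def read_tree(text):
    """Turn a bracketed string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read_node(pos):
        # tokens[pos] is "(" and tokens[pos + 1] is the node label
        label, pos = tokens[pos + 1], pos + 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                child, pos = read_node(pos)
            else:
                child, pos = tokens[pos], pos + 1
            children.append(child)
        return (label, children), pos + 1

    tree, _ = read_node(0)
    return tree


def render(tree, depth=0):
    """Return the tree as a list of indented lines, one node per line."""
    label, children = tree
    indent = "  " * depth
    if all(isinstance(child, str) for child in children):
        # preterminal node: print the label and its word(s) on one line
        return [indent + label + ": " + " ".join(children)]
    lines = [indent + label]
    for child in children:
        if isinstance(child, str):
            lines.append(indent + "  " + child)
        else:
            lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(read_tree(STRUCTURE))))
```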

The second implementation of my work is the PCFG approach to Amharic sentence parsing. The
source code and the output of the implementation are as follows.

An example PCFG grammar is shown below, and the approach is explained in the topic below the
figure.

S -> NP VP [1.0]
VP -> V NP [0.2] | V NP PP [0.3] | NP V [0.1] | NP Adj V [0.4]
PP -> P NP [0.2] | P P [0.8]
V -> "አየ" [0.8] | "በላ" [0.1] | "ተራመደ" [0.1]
NP -> "አበበ" [0.2] | "ከበደ" [0.1] | "ጫላ" [0.1] | Det N [0.1] | Det N N [0.1]
NP -> Det N PP [0.1] | N N [0.1] | Det N N PP [0.2]
Det -> "የ" [0.9] | "ለ" [0.1]
N -> "ሰዉ" [0.4] | "ውሻ" [0.1] | "አጥር" [0.2] | "ድመት" [0.1] | "መናፈሻ" [0.1]
P -> "በ" [0.1] | "ላይ" [0.4] | "በኩል" [0.1] | "ሆኖ" [0.3] | "ከ" [0.1]
Adj -> "ትንሽ" [1.0]
Note that the N productions as listed sum to 0.9 rather than 1.0. NLTK's PCFG loader requires
the probabilities for each left-hand side to sum to 1.0, so the N probabilities need to be
completed before the grammar can be loaded.
The parse structure produced by the Viterbi algorithm with the above grammar is shown below,
together with the probability of the parse, which is the product of the probabilities of the
productions used.

Code Example Using Python

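import nltk
# `grammar` is assumed to hold the PCFG above, e.g. built with nltk.PCFG.fromstring(...)
viterbi_parser = nltk.ViterbiParser(grammar)
sent = "አበበ የ ሰዉ አጥር ላይ ሆኖ ትንሽ አየ".split()
for tree in viterbi_parser.parse(sent):  # parse() yields the most likely tree
    print(tree)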

Output of the above grammar and ViterbiParser in my application using Python

(S (NP አበበ) (VP (NP (Det የ) (N ሰዉ) (N አጥር) (PP (P ላይ) (P ሆኖ))) (Adj ትንሽ) (V አየ)))
(p=8.84736e-05)
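
The reported probability can be reproduced without NLTK. The sketch below is a Viterbi-style (max-product) search written for illustration; it is not NLTK's `ViterbiParser` implementation, and the names `RULES`, `LEXICON` and `best_parse` are assumptions. The probabilities are copied from the grammar above as listed, including the N productions that sum to 0.9, which this sketch does not validate.

```python
from functools import lru_cache

# Phrase-level productions: nonterminal -> [(right-hand side, probability)]
RULES = {
    "S": [(("NP", "VP"), 1.0)],
    "VP": [(("V", "NP"), 0.2), (("V", "NP", "PP"), 0.3),
           (("NP", "V"), 0.1), (("NP", "Adj", "V"), 0.4)],
    "PP": [(("P", "NP"), 0.2), (("P", "P"), 0.8)],
    "NP": [(("Det", "N"), 0.1), (("Det", "N", "N"), 0.1), (("Det", "N", "PP"), 0.1),
           (("N", "N"), 0.1), (("Det", "N", "N", "PP"), 0.2)],
}
# Word-level productions: nonterminal -> {word: probability}
LEXICON = {
    "V": {"አየ": 0.8, "በላ": 0.1, "ተራመደ": 0.1},
    "NP": {"አበበ": 0.2, "ከበደ": 0.1, "ጫላ": 0.1},
    "Det": {"የ": 0.9, "ለ": 0.1},
    "N": {"ሰዉ": 0.4, "ውሻ": 0.1, "አጥር": 0.2, "ድመት": 0.1, "መናፈሻ": 0.1},
    "P": {"በ": 0.1, "ላይ": 0.4, "በኩል": 0.1, "ሆኖ": 0.3, "ከ": 0.1},
    "Adj": {"ትንሽ": 1.0},
}


def best_parse(tokens, start="S"):
    """Return (probability, bracketed tree) of the most likely parse, or (0.0, None)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def best(symbol, i, j):
        # Most probable way for `symbol` to cover tokens[i:j]
        result = (0.0, None)
        if j - i == 1:
            p_word = LEXICON.get(symbol, {}).get(tokens[i], 0.0)
            if p_word > 0.0:
                result = (p_word, "(%s %s)" % (symbol, tokens[i]))
        for rhs, p_rule in RULES.get(symbol, []):
            p_kids, kids = best_seq(rhs, i, j)
            if p_rule * p_kids > result[0]:
                result = (p_rule * p_kids, "(%s %s)" % (symbol, " ".join(kids)))
        return result

    @lru_cache(maxsize=None)
    def best_seq(rhs, i, j):
        # Most probable way to split tokens[i:j] among the symbols in `rhs`
        if len(rhs) == 1:
            p, tree = best(rhs[0], i, j)
            return (p, (tree,)) if tree else (0.0, ())
        result = (0.0, ())
        # leave at least one token for each of the remaining symbols
        for k in range(i + 1, j - len(rhs) + 2):
            p_first, tree = best(rhs[0], i, k)
            if tree is None:
                continue
            p_rest, rest = best_seq(rhs[1:], k, j)
            if p_first * p_rest > result[0]:
                result = (p_first * p_rest, (tree,) + rest)
        return result

    return best(start, 0, len(tokens))


if __name__ == "__main__":
    p, tree = best_parse("አበበ የ ሰዉ አጥር ላይ ሆኖ ትንሽ አየ".split())
    print(tree, "(p=%g)" % p)
```

For this sentence the probability factors as 1.0 × 0.2 × 0.4 × 0.2 × 0.9 × 0.4 × 0.2 × 0.8 × 0.4 × 0.3 × 1.0 × 0.8, which matches the value reported above.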

5. Challenges
Several challenges arose during the project.
1. This study uses a very small sample prepared for the purpose of the work, because of lack of
time and the difficulty of finding a well-organized corpus, a machine-readable dictionary,
POS-tagged words and, in particular, a POS tagger application for Amharic.
2. The prototype developed in this study is assumed to support Amharic sentences composed of
10 or more words, but its real performance has not been established, mainly because of
time constraints and a lack of the linguistic expertise needed to determine grammar rules
and their probabilities.
3. This report does not cover more advanced topics such as ambiguity resolution; it only shows
sample parsing using the probabilistic approach.

6. References
[1] A. Alemu, "Automatic Sentence Parsing for Amharic Text: An Experiment Using
Probabilistic Context Free Grammars," A Thesis Submitted in Partial Fulfilment of the
Requirement for the Degree of Master of Science in Information Science, 2002.
[2] "Natural language processing toolkit," accessed from http://www.nltk.org/.
[3] Daniel Jurafsky & James H. Martin, "Speech and Language Processing: An Introduction
to Natural Language Processing, Computational Linguistics, and Speech Recognition", 2007.
[4] Abiyot Bayou, "Design and Development of Word Parser for Amharic Language",
Master's Thesis, Addis Ababa University, 2000.
