
NLP Stemmer

This document presents the 3rd phase implementation of an affix removal stemmer for Afaraf text. It covers: (1) the consecutive implementation of the proposed stemming algorithm, namely stop word removal, tokenization, normalization, and stemming; (2) the division of the implementation into two sections, where Section A covers rule development from the 2nd phase and Section B covers text preprocessing (removing stop words and punctuation) and creating a GUI; and (3) the proposed stemming algorithm, which first removes stop words and tokenizes the words, then applies the prefix and suffix rules and displays the resulting stem (the word itself if no rule matches).

Uploaded by minichel
Copyright © All Rights Reserved

An Affix Removal Stemmer for Afaraf Text

3rd Phase Implementation Presentation

Prepared by: Wubie Abiye

March 2018
Consecutive Implementation of the Proposed Algorithm
• Stop word removal
• Tokenization
• Normalization
• Stemming
Implementation Sections
• Section A (2nd phase implementation)
  - Collecting and arranging rules for the development of the algorithm
  - Using a Java library for PDF file extraction
  - Writing code for the collected rules and experimenting with a collection of Afaraf words
  - Collecting and preparing the stop words and punctuation to be removed from files

• Section B
  - Removing stop words and punctuation (tokenizing the text) and normalizing
  - Creating the GUI
  - Evaluating the final result
Proposed algorithm
1. Let x = the total number of words in the input text
// Preprocessing
Remove stop words
Tokenize words
Normalize words
// Stemming
2. For each of the x words, repeat steps 3–5
3. Check the prefix rules
If a match is found, apply the rule // prefix matching
Else go to step 4
4. Check the suffix rules
If a match is found, apply the rule // suffix matching
Else go to step 5
5. Display the stem of the word
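The stemming loop (steps 2–5) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the author's implementation: the prefix and suffix lists are hypothetical parameters (the actual rules collected in the 2nd phase are not listed in these slides), longest-match-first ordering and the `MIN_STEM` length guard are assumptions, and the suffix check always runs after the prefix check so that a word with no prefix can still lose its suffix.

```python
from typing import Iterable, List, Sequence

# Hypothetical guard: an affix is stripped only if at least this many characters remain.
MIN_STEM = 2


def strip_prefix(word: str, prefixes: Sequence[str]) -> str:
    """Step 3 (prefix matching): remove the longest matching prefix, if any."""
    for p in sorted(prefixes, key=len, reverse=True):
        if p and word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            return word[len(p):]
    return word  # no rule matched: word unchanged


def strip_suffix(word: str, suffixes: Sequence[str]) -> str:
    """Step 4 (suffix matching): remove the longest matching suffix, if any."""
    for s in sorted(suffixes, key=len, reverse=True):
        if s and word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            return word[: len(word) - len(s)]
    return word  # no rule matched: word unchanged


def stem_all(words: Iterable[str], prefixes: Sequence[str], suffixes: Sequence[str]) -> List[str]:
    """Steps 2-5: for each preprocessed word, apply prefix rules, then suffix rules."""
    stems = []
    for word in words:
        word = strip_prefix(word, prefixes)
        word = strip_suffix(word, suffixes)
        stems.append(word)  # step 5: the stem to display
    return stems


if __name__ == "__main__":
    for stem in stem_all(["gexeyyo", "gexeh", "baaxo"], prefixes=[], suffixes=["eyyo", "eh"]):
        print(stem)
```

With the illustrative suffix list `["eyyo", "eh"]`, the call above yields `gex`, `gex`, and `baaxo`; the last word matches no rule and is displayed unchanged, as in step 5.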
Collected stop words
Stop words (continued)

Note: 197 stop words were collected in total.
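Stop word removal against a collected list such as the 197-word list above could look like the following sketch. The file format (one word per line, UTF-8) and the function names are assumptions; the actual list is not reproduced in the slides, so no Afaraf stop words are hard-coded here.

```python
from pathlib import Path
from typing import Iterable, List, Set


def load_stop_words(path: str) -> Set[str]:
    """Read a stop-word list (assumed: one word per line, UTF-8), lowercased, blank lines skipped."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Drop every token found in the stop list (case-insensitive comparison)."""
    return [t for t in tokens if t.lower() not in stop_words]
```

Keeping the list in a set makes each lookup constant time on average, so filtering stays linear in the number of tokens.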


Tokenize and Normalize

2. Tokenization: the following punctuation marks, symbols, and digits are removed:
. , ? / | \ @ * = ^ & ( ) + _ ; : " ' ! # $ % [ ] { } < > - 1 2 3 4 5 6 7 8 9 0

3. Normalization: convert all uppercase letters in the file to lowercase, e.g. Xaagu → xaagu, Baaxo → baaxo, Dagge → dagge
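Steps 2 and 3 could be sketched as follows. Assumptions: each listed character is replaced by a space (so "word,word" splits into two tokens) rather than deleted outright, and typographic quotes are removed along with the straight ones, since the slide's character list is printed with curly quotes. The function names are hypothetical.

```python
from typing import List

# Characters listed on the slide (punctuation, symbols, digits), plus typographic quotes (assumption).
REMOVED_CHARS = ".,?/|\\@*=^&()+_;:\"'!#$%[]{}<>-0123456789\u201c\u201d\u2018\u2019"

# Translation table mapping every removed character to a space.
_TO_SPACE = str.maketrans(REMOVED_CHARS, " " * len(REMOVED_CHARS))


def tokenize(text: str) -> List[str]:
    """Step 2: blank out the listed characters, then split on whitespace."""
    return text.translate(_TO_SPACE).split()


def normalize(tokens: List[str]) -> List[str]:
    """Step 3: convert uppercase letters to lowercase (Xaagu -> xaagu)."""
    return [t.lower() for t in tokens]
```

For example, `normalize(tokenize("Xaagu, Baaxo! Dagge (2018)."))` gives `["xaagu", "baaxo", "dagge"]`: the comma, exclamation mark, parentheses, digits, and period are all dropped before lowercasing.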
Input file contains: stop words, punctuation, uppercase letters, and unstemmed words
GUI
GUI with example
Performance measure
Accuracy = [(Total words − Total errors) / Total words] × 100

• I ran the experiment on an Afaraf text file containing 1,500 words.
• After stop word removal, 1,350 words remained; that is, 150 stop words were removed.
• Counting the output: 1,280 words were stemmed correctly, and 70 were stemmed incorrectly (59 due to over-stemming and 11 due to under-stemming).
• Accuracy = [(1350 − 70) / 1350] × 100 = (1280 / 1350) × 100 ≈ 94.81%
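The accuracy figure can be recomputed from the reported counts. This small sketch (function name hypothetical) counts over-stemmed and under-stemmed words together as the total errors, exactly as in the formula above:

```python
def accuracy(total_words: int, over_stemmed: int, under_stemmed: int) -> float:
    """Accuracy = [(total words - total errors) / total words] * 100."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    errors = over_stemmed + under_stemmed
    return (total_words - errors) / total_words * 100


words_after_stop_removal = 1500 - 150  # 1350 words left for stemming
print(f"{accuracy(words_after_stop_removal, 59, 11):.2f}%")  # prints 94.81%
```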


Example of the stemming process
Future tense:
• Gexeyyo (I will go)
• Gexele (she/he will go)
• Gexetto (you will go)
• Gexelon (they will go)
Past tense:
• Gexeh (he went)
• Gexxeh (she went)
• Gexeenih (they went)
Present continuous tense:
• Gexah (he is going)
• Gexxah (you are / she is going)
• Gexaanah (they are going)

Stem form: Gex (go)
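The paradigm above suggests a longest-match suffix stripper. The sketch below is built only from the ten forms on this slide: its ending list is read off these examples and is not the author's rule set, and `MIN_STEM = 3` is an assumed guard. The guard matters, because "gexeh" also ends in "xeh" (the ending taken from "gexxeh") and would otherwise be cut to "ge". Treating the doubled "x" of "Gexxeh" and "Gexxah" as part of the ending is a string-level simplification.

```python
from typing import Tuple

# Endings read off the "Gex" forms on this slide only (not a complete Afaraf rule set).
SUFFIXES: Tuple[str, ...] = (
    "eenih",  # Gexeenih (they went)
    "aanah",  # Gexaanah (they are going)
    "eyyo",   # Gexeyyo  (I will go)
    "etto",   # Gexetto  (you will go)
    "elon",   # Gexelon  (they will go)
    "ele",    # Gexele   (she/he will go)
    "xeh",    # Gexxeh   (she went)
    "xah",    # Gexxah   (you are / she is going)
    "eh",     # Gexeh    (he went)
    "ah",     # Gexah    (he is going)
)

# Assumed minimum stem length; blocks "gexeh" -> "ge" so that "eh" is tried next.
MIN_STEM = 3


def stem(word: str) -> str:
    """Lowercase the word, then strip the longest ending that leaves >= MIN_STEM letters."""
    w = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: len(w) - len(suffix)]
    return w


FORMS = ["Gexeyyo", "Gexele", "Gexetto", "Gexelon", "Gexeh",
         "Gexxeh", "Gexeenih", "Gexah", "Gexxah", "Gexaanah"]
print({stem(f) for f in FORMS})  # prints {'gex'}
```

Words that match none of these endings, such as "Baaxo", are returned only lowercased.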


Working paper status
Survey paper status
