Title: Information Retrieval for Kafficho Language

ABSTRACT
Information retrieval for the Kafficho language presents unique challenges because of its limited resources and lack of standardized tools. In this work, we explore the development of an information retrieval system tailored specifically to Kafficho, with the support and collaboration of native speakers and linguists. Our aim is to investigate information retrieval for Kafficho and, ultimately, to create a platform through which Kafficho speakers can access information.

For the information retrieval experiment, we used 220 Kafficho text files and fifteen sample queries. Data preprocessing (tokenization, normalization, stop word removal, and stemming), together with term weighting, was required before the vector space model could represent each document and each query. Evaluated with three well-known metrics, the system achieved a precision of 87%, a recall of 28%, and an F-measure of 35%. These results show that Kafficho information retrieval with the vector space model performs very well in terms of precision, although recall remains low.

KEYWORDS: Kafficho, Information Retrieval, Vector Space Model.

1. Introduction

The amount of information on the Web is increasing drastically as technology advances. Especially since the advent of the Internet, huge volumes of data have become available online. As the amount of information grows, finding and using the relevant documents becomes a challenging task, since 99% of all information is of no interest to 99% of users. Information retrieval addresses this problem. Once materials are available online, they can be read by visitors, police officers, expert analysts, and any other intended users from anywhere and at any time. When effective and efficient retrieval mechanisms are in place, such information is easily accessible through a simple search, with little effort and in a short time. (Search is the top activity on the Internet today: 92% of online users use a search engine, including 59% who use it daily.)

Everyone wants to use his or her own language for different purposes in day-to-day life: communication, learning, accessing information, knowledge sharing, and so on. Processing a language by computer simplifies work that would otherwise be heavy and tedious. This is done through natural language processing (NLP), which enables computer systems to analyze natural language (the language spoken by humans) much as a human being would, so that difficult and complex tasks become simpler.

Important applications of NLP include information extraction, information retrieval, part-of-speech tagging, spell checking, text summarization, and question answering. Although such applications are easy to envision, the lack of computational linguistic resources prevents them from being built for many languages. Developing them for each local language is therefore the task of researchers in the area, such as ourselves.

Information retrieval (IR) is concerned with obtaining, from a collection, the information resources that are relevant to an information need, typically expressed as a user query. An IR system normally consists of two subsystems: indexing and searching. Indexing is an offline process in which documents are represented by keywords selected through preprocessing tasks such as tokenization, stop word removal, stemming, and term weighting; the resulting index terms are organized in a structure such as an inverted file, a sequential file, or a signature file to save storage space and speed up searching. The user query passes through the same preprocessing steps before it is compared with the document terms. The searching subsystem then relates the query terms to the index terms and returns the documents relevant to the user's query. Because indexing and searching depend on each other, both must be designed together to make the IR system effective and efficient.
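The indexing and searching subsystems described above can be illustrated with a minimal Python sketch of an inverted file. It assumes documents have already been preprocessed into lists of index terms; the function names `build_inverted_index` and `search` and the toy document ids are hypothetical, not taken from the authors' implementation. Searching here only returns every document that shares at least one term with the query (a Boolean OR); ranking is left to the vector space model.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, list[str]]) -> dict[str, dict[str, int]]:
    """Map each index term to a postings dict {doc_id: term frequency}.

    `docs` maps a document id to its already-preprocessed list of terms.
    """
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, terms in docs.items():
        for term in terms:
            # Count how often the term occurs in this document.
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


def search(index: dict[str, dict[str, int]], query_terms: list[str]) -> set[str]:
    """Return ids of documents containing at least one query term (Boolean OR)."""
    hits: set[str] = set()
    for term in query_terms:
        # Adding a postings dict to a set adds its keys, i.e. the doc ids.
        hits.update(index.get(term, {}))
    return hits


# Toy collection with made-up ids; the terms come from the sample queries.
docs = {
    "d1": ["asho", "woco", "asho"],
    "d2": ["mooyon", "ariyoo"],
}
index = build_inverted_index(docs)
print(index["asho"])                               # {'d1': 2}
print(sorted(search(index, ["mooyon", "woco"])))   # ['d1', 'd2']
```

Storing term frequencies in the postings, rather than bare document ids, keeps the information needed later for term weighting.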
Information retrieval for the Kafficho language involves developing systems and tools that enable users to access and retrieve relevant information from a collection of texts or documents written in Kafficho. This process is essential for linguistic research, language preservation, and communication within the Kafficho-speaking community.

Linguistic diversity: Kafficho is a unique language spoken by a specific community in southwestern Ethiopia, and it is important to preserve and promote linguistic diversity. Information retrieval systems for Kafficho can help document and analyze the language, contributing to the understanding and preservation of this linguistic heritage.

Knowledge dissemination: Information retrieval systems for Kafficho enable the dissemination of knowledge, resources, and educational materials within the community. By making information more accessible in Kafficho, researchers can empower individuals to access, share, and contribute to a wealth of knowledge in their native language.

Cultural significance: Information retrieval for Kafficho goes beyond linguistic analysis; it also helps preserve the cultural knowledge, traditions, and practices embedded in the language. By enabling access to relevant information in Kafficho, such a system contributes to the transmission of cultural heritage within the community and throughout the country.

Community engagement: Involving the Kafficho-speaking community in the development of information retrieval systems is essential to ensure that the tools meet its specific needs and preferences. Collaboration among native speakers, linguists, and technology experts can lead to more effective and culturally sensitive solutions.

Research opportunities: Information retrieval for Kafficho offers opportunities for interdisciplinary research that brings together experts from linguistics, computer science, anthropology, and other fields.

Overall, the motivation for working on information retrieval for Kafficho is rooted in a commitment to cultural preservation, knowledge dissemination, empowerment, linguistic research, community engagement, educational opportunities, and the promotion of multilingualism.

The main objective of this work is to design an information retrieval system for the Kafficho language. In the literature, the language is known by several names, including Kafa, Kefa, Kaffa, Keffa, Kafi-noonoo, Caffino, and Kaficho, and different documents use these names interchangeably depending on the author. To avoid confusion, we use Kafficho throughout. Kafficho is used as a working language in the Keffa zone and as a language of instruction in schools up to grade ten. In addition, it is taught at diploma level at the teaching college in Bonga. Moreover, the amount of information exchanged in this language is growing steadily in different places.

2. Methodology

This section describes the general procedures we followed to address the challenges discussed above, together with the techniques and tools needed to accomplish the work. These methods were organized systematically in order to achieve the objectives of this work.

2.1 Literature review

To gain a deep understanding of information retrieval, we reviewed models and techniques developed for languages such as English, Amharic, Afaan Oromo, Wolayitta, and others. To understand the morphology of Kafficho, we read books, guidance materials, and a dictionary, and held discussions with linguists.

2.2 Corpus collection and preparation

A corpus is mandatory for any natural language processing research. However, like other under-resourced local languages, Kafficho has no organized or standard corpus. We therefore collected data from different sources and used it as a corpus for the experiments. Sample texts from different disciplines were collected from teaching materials (student textbooks), from works such as (Kidane, 2012), (Kassa, 2012), (Teso, 2009), and (Leikola, 2014), from Kafi Televizhinee news articles, and from a dictionary. Linguists participated in the preparation of the corpus.

2.3 Experimentation method

Any NLP research should be measurable in terms of performance, which is assessed in the experimental phase. The performance of the information retrieval system was evaluated with the precision, recall, and F-measure metrics on a test collection.
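For a single query, precision is the fraction of retrieved documents that are relevant, recall is the fraction of relevant documents that were retrieved, and the F-measure is their harmonic mean. A minimal Python sketch, assuming relevance judgments are available as a set of document ids per query; the function name `evaluate_query` and the toy judgments are hypothetical and not from the paper's test collection.

```python
def evaluate_query(retrieved: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) for one query."""
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall (the balanced F-measure).
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy judgments: 2 of the 3 retrieved documents are relevant,
# and 2 of the 8 relevant documents were found.
retrieved = {"d1", "d2", "d3"}
relevant = {"d1", "d2", "d4", "d5", "d6", "d7", "d8", "d9"}
p, r, f = evaluate_query(retrieved, relevant)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.25 0.364
```

Because the harmonic mean is dominated by the smaller value, a query with perfect precision but very low recall still receives a low F-measure.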
2.4 Tools and techniques

For the implementation, we used the Python programming language. Python is well suited to NLP application development, and the researchers were more experienced and more familiar with it than with other languages. The techniques we used were tokenization, normalization, stop word removal, and term weighting.
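The preprocessing steps listed above can be sketched as a small Python pipeline. The paper does not publish its Kafficho stop-word list, normalization rules, or stemming rules, so this sketch assumes caller-supplied placeholders: `STOP_WORDS` defaults to empty, `normalize` only lower-cases, and `strip_suffix` removes the longest matching suffix from a hypothetical, caller-provided list. All names are illustrative. If the Kafficho orthography uses characters outside `\w` (for example apostrophes), the tokenizer pattern would need to be extended.

```python
import re

# Hypothetical placeholder: the actual Kafficho stop-word list is not given.
STOP_WORDS: frozenset[str] = frozenset()


def tokenize(text: str) -> list[str]:
    """Split text into word tokens, dropping punctuation such as '.' and '?'."""
    return re.findall(r"\w+", text)


def normalize(token: str) -> str:
    """Lower-case a token; real Kafficho normalization rules would go here."""
    return token.lower()


def strip_suffix(term: str, suffixes: tuple[str, ...], min_stem: int = 3) -> str:
    """Remove the longest matching suffix, keeping at least `min_stem` characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        # Skip empty suffixes: term[:-0] would wrongly return "".
        if suffix and term.endswith(suffix) and len(term) - len(suffix) >= min_stem:
            return term[: -len(suffix)]
    return term


def preprocess(text: str, stop_words: frozenset[str] = STOP_WORDS,
               suffixes: tuple[str, ...] = ()) -> list[str]:
    """Tokenize, normalize, remove stop words, then strip suffixes."""
    terms = (normalize(t) for t in tokenize(text))
    return [strip_suffix(t, suffixes) for t in terms if t not in stop_words]


print(preprocess("Booyoon immeehote?"))  # ['booyoon', 'immeehote']
```

The same `preprocess` function should be applied to both documents and queries so that their terms match at search time.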

3. Result

We evaluated the performance of Kafficho information retrieval with the three metrics on a test collection of 220 documents and 15 queries. Using the vector space model, Kafficho text retrieval achieved 87% precision, 28% recall, and 35% F-measure. We regard this result as very good, particularly in terms of precision.
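The vector space model represents each document and each query as a vector of term weights and ranks documents by their similarity to the query. The paper does not state which weighting formula was used, so this Python sketch assumes the common tf × log(N/df) scheme combined with cosine similarity; the function names `tfidf_vectors`, `cosine`, and `rank` and the toy collection are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs: dict[str, list[str]]) -> tuple[dict[str, dict[str, float]], dict[str, float]]:
    """Return per-document tf-idf vectors and the idf value of every term."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for terms in docs.values() for term in set(terms))
    # A term that occurs in every document gets idf = log(1) = 0.
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = {
        doc_id: {t: tf * idf[t] for t, tf in Counter(terms).items()}
        for doc_id, terms in docs.items()
    }
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(docs: dict[str, list[str]], query_terms: list[str]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs with a positive score, best first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection get no weight.
    query_vec = {t: tf * idf[t] for t, tf in Counter(query_terms).items() if t in idf}
    scores = [(doc_id, cosine(vec, query_vec)) for doc_id, vec in vectors.items()]
    return sorted((s for s in scores if s[1] > 0), key=lambda s: s[1], reverse=True)


docs = {"d1": ["asho", "woco"], "d2": ["mooyon", "ariyoo"], "d3": ["asho", "mooyon"]}
print([doc_id for doc_id, _ in rank(docs, ["asho", "woco"])])  # ['d1', 'd3']
```

In practice the document vectors would be computed once at indexing time and stored, so that only the query vector is built at search time.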

Table 3.1: Sample results using the three metrics

No.      Precision  Recall  F-measure  Query
1        1          0.16    0.27       Dojjechino immimmi hajjiyoon dabbii qoppeheete
2        1          0.14    0.24       asho woco
3        1          0.25    0.4        ceechii katinooch kotiyoo booyeehe.
4        0.59       0.73    0.65       mooyon ariyoo
5        1          0.25    0.4        gonde shuriyoon qayiyee qashee kexooch kotee kotoon doyibeeta.
6        1          0.035   0.067      ashittino hallee biiye wushee iiqqoon niriyoonane.
7        1          0.16    0.275      booyoon immeehote?
8        0.25       0.52    0.333      kechi hinnoo
9        1          0.66    0.79       heete shafiroona dichooch
10       1          0.66    0.79       viidiyee deekkoon qechiyeete
11       0.33       0.019   0.036      gaachee boono digeneti mooyon
12       1          0.25    0.4        yaach wotta barooch deewiyeete.
13       1          0.096   0.175      daggooch shoodeyoo
14       1          0.16    0.27       gommona biriyoo hakkeehe
15       1          0.134   0.236      doyee kexooch gimo
Average  0.87       0.28    0.35
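The Average row agrees with the arithmetic mean of the fifteen per-query values (macro-averaging) when truncated to two decimals. The Python check below recomputes the averages from the values in Table 3.1; nothing else is assumed. It also shows that the harmonic mean of the averaged precision and recall (about 0.43) is a different quantity from the averaged F-measure (about 0.355): queries with very low recall, such as queries 6 and 11, pull the per-query F-measures, and hence their mean, down.

```python
from statistics import mean

# (precision, recall, F-measure) for queries 1-15, copied from Table 3.1.
rows = [
    (1, 0.16, 0.27), (1, 0.14, 0.24), (1, 0.25, 0.4), (0.59, 0.73, 0.65),
    (1, 0.25, 0.4), (1, 0.035, 0.067), (1, 0.16, 0.275), (0.25, 0.52, 0.333),
    (1, 0.66, 0.79), (1, 0.66, 0.79), (0.33, 0.019, 0.036), (1, 0.25, 0.4),
    (1, 0.096, 0.175), (1, 0.16, 0.27), (1, 0.134, 0.236),
]

avg_p = mean(p for p, _, _ in rows)  # macro-averaged precision
avg_r = mean(r for _, r, _ in rows)  # macro-averaged recall
avg_f = mean(f for _, _, f in rows)  # macro-averaged F-measure
f_of_avgs = 2 * avg_p * avg_r / (avg_p + avg_r)  # harmonic mean of the averages

print(f"{avg_p:.3f} {avg_r:.3f} {avg_f:.3f} {f_of_avgs:.3f}")  # 0.878 0.282 0.355 0.426
```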

4. Discussion

The results in Table 3.1 show that information retrieval for Kafficho performs very well in terms of precision. Precision and recall have an approximately inverse relationship: higher precision is often coupled with lower recall [70]. Here, precision was 0.87 (87%), whereas recall was only 0.28 (28%), which illustrates the trade-off between the two metrics.

Conclusion and Recommendation

In conclusion, information retrieval for Kafficho language holds immense significance in
preserving cultural heritage, promoting linguistic diversity, empowering communities, and
advancing educational opportunities. By developing tools and resources that facilitate access to
information in Kafficho, researchers and language advocates contribute to the preservation and
revitalization of a unique linguistic tradition. Through community engagement, collaboration,
and a commitment to promoting multilingualism, information retrieval initiatives for Kafficho
language can foster inclusivity, knowledge dissemination, and linguistic research. By
recognizing the value of minority languages like Kafficho and investing in their preservation and
promotion, we can create a more inclusive and diverse linguistic landscape that celebrates the
richness of global language diversity. In essence, information retrieval for Kafficho language
serves as a vital tool for cultural preservation, empowerment, and knowledge sharing, ultimately
contributing to the enrichment of our collective linguistic heritage.

In our evaluation on a test collection of 220 documents and 15 queries, Kafficho text retrieval achieved 87% precision, 28% recall, and 35% F-measure.

Recommendation

This research is the first attempt at information retrieval for the Kafficho language. We hope that other researchers will continue work on this language and achieve better performance than ours. Because of the morphological complexity of the language, it was difficult to develop a sufficient set of rules. Consequently, the researcher recommends the following:

• To enhance the performance (precision and recall) of the Kafficho IR system, integrate mechanisms for handling synonyms and polysemous terms, and add query expansion to the vector space model (VSM). Identifying the best retrieval model for Kafficho beyond the VSM is another research area.
• Develop a Kafficho-specific search engine.
• Build a comprehensive Kafficho language corpus.
• Provide a feedback mechanism that lets users rate the relevance of search results, so that the performance of the Kafficho information retrieval system can be improved continuously.
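One standard way to turn the recommended user feedback into better retrieval is Rocchio relevance feedback, which moves the query vector toward documents the user marked relevant and away from those marked non-relevant. This is a swapped-in technique, not something the paper implemented. The weights alpha = 1.0, beta = 0.75, and gamma = 0.15 are common textbook defaults and would need tuning for Kafficho; the function names and toy vectors below are hypothetical.

```python
from collections import defaultdict

Vector = dict[str, float]  # sparse term -> weight mapping


def centroid(vectors: list[Vector]) -> Vector:
    """Average a list of sparse vectors (empty list -> empty vector)."""
    total: defaultdict[str, float] = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {t: w / len(vectors) for t, w in total.items()} if vectors else {}


def rocchio(query: Vector, relevant: list[Vector], non_relevant: list[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Return alpha*q + beta*centroid(relevant) - gamma*centroid(non_relevant)."""
    rel_c, non_c = centroid(relevant), centroid(non_relevant)
    terms = set(query) | set(rel_c) | set(non_c)
    updated = {
        t: alpha * query.get(t, 0.0) + beta * rel_c.get(t, 0.0) - gamma * non_c.get(t, 0.0)
        for t in terms
    }
    # Negative weights are clipped, i.e. dropped from the sparse vector.
    return {t: w for t, w in updated.items() if w > 0}


new_q = rocchio({"asho": 1.0}, [{"asho": 0.5, "woco": 1.0}], [{"mooyon": 1.0}])
print(sorted(new_q.items()))  # [('asho', 1.375), ('woco', 0.75)]
```

Terms from documents marked relevant (here `woco`) enter the query automatically, so this also acts as a simple form of the query expansion recommended above.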
Acknowledgement

First and foremost, I would like to thank the Almighty God and my mother St. Mary for giving me the strength and capability to complete this work.
To my friends and the linguistic experts who generously gave their time and shared their knowledge and relevant resources for the analysis of Kafficho morphology, especially Kero Kochito, Belachewu, Kifle, and Magnecho: I have no words to thank you enough for your guidance.

References

Kassa, T. (2012, June). Case system of Kafinoonoo. MA thesis in General Linguistics, Department of Linguistics, School of Graduate Studies.

Kidane, W. (2012). Addis Ababa University, College of Language Studies, Humanities, Journalism and Communication, Department of Foreign Languages and Literature.

Krovetz, R. (2000). Viewing morphology as an inference process. Artificial Intelligence, 118(1–2), 277–294. https://doi.org/10.1016/S0004-3702(99)00101-0

Leikola, K. (2014, May). Talking Manjo: Linguistic repertoires as means of negotiating marginalization.

Moral, C., de Antonio, A., Imbert, R., & Ramírez, J. (2014). A survey of stemming algorithms in information retrieval. Information Research, 19(1), 76–80. https://doi.org/10.9790/0661-17367680

Teso, T. F. (2009). A typology of verbal derivation in Ethiopian Afro-Asiatic languages.
