Title: Information Retrieval for Kafficho Language

ABSTRACT
Information retrieval for the Kafficho language presents unique challenges because the language
has limited resources and lacks standardized tools. In this work, we explore the development of
an information retrieval system tailored specifically to Kafficho, with the support and
collaboration of native speakers and linguists. Our aim is to investigate information retrieval
for Kafficho and, ultimately, to create a platform through which Kafficho speakers can access
information.

For the information retrieval experiment, we used 220 Kafficho text files and fifteen sample
queries. Data pre-processing steps (tokenization, normalization, stop word removal, and
stemming), together with term weighting, were prerequisites for representing each document and
each query in the vector space model. Evaluated with three well-known metrics, the system
achieved 87% precision, 28% recall, and a 35% F-measure. This shows that Kafficho information
retrieval using the vector space model performed very well in terms of precision, although
recall remained low.

KEYWORDS: Kafficho, Information Retrieval, Vector Space Model.

1. Introduction

The amount of information on the net is increasing drastically with the revolution of
technology. In particular, since the advent of the Internet, huge volumes of data have become
available online. As the amount of information grows, finding and accessing the relevant
documents becomes a challenging task, since 99% of all the information is of no interest to 99%
of all users. Information retrieval addresses this problem. Once materials are available on the
net, they can be accessed by visitors, police officers, expert analysts, and, increasingly, any
intended user from anywhere and at any time. When mechanisms for effective and efficient
retrieval are applied, this information becomes easily accessible through a simple search, with
little effort and in a short time. Search is the top activity on the Internet nowadays: 92% of
online users use a search engine, including 59% who use it daily.

It is well known that people want to use their own language for different purposes in
day-to-day life: communication, learning, accessing available information, knowledge sharing,
and so on. Processing that language with a computer makes heavy and tedious work simpler. This
is done through natural language processing (NLP), which enables computer systems to analyze
natural language (the language spoken by humans) much as a human being would, so that difficult
and complex tasks become simpler.

NLP has many important applications, such as information extraction, information retrieval,
part-of-speech tagging, spell checking, text summarization, and question answering. Although
such applications are desirable for every language, the lack of computational linguistic
resources prevents them from being built. It therefore falls to researchers in the field, such
as ourselves, to do their best to develop these applications for each local language.

Information retrieval is an interesting area that mainly deals with obtaining information
resources relevant to an information need (specifically, a user query) from a document
collection. An information retrieval system normally comprises two basic subsystems: indexing
and searching. Indexing is an offline process in which document keywords are selected by
applying preprocessing tasks, including tokenization, stop word removal, stemming, and term
weighting. The resulting representation of the text documents is organized in an indexing
structure, such as an inverted file, a sequential file, or a signature file, to save storage
space and speed up searching. A user query passes through the same steps before it is compared
with the document terms. The searching subsystem is then activated: searching is the process of
relating index terms to query terms and returning the hits relevant to the user's query.
Indexing and searching are interconnected and depend on each other to enhance the effectiveness
and efficiency of the IR system.

Information retrieval for the Kafficho language involves the development of systems and tools
that enable users to access and retrieve relevant information from a collection of texts or
documents written in Kafficho. This process is essential for linguistic research, language
preservation, and communication within the Kafficho-speaking community.
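
The indexing and searching subsystems described above can be illustrated with a minimal Python
sketch of an inverted file. This is not the implementation used in this work: the function names
`build_inverted_index` and `search` are hypothetical, preprocessing is reduced to lowercasing
and whitespace splitting, and the three toy documents simply reuse sample query strings from
Table 3.1 as placeholder text.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the ids of documents sharing at least one term with the query."""
    hits = set()
    for term in query.lower().split():
        hits.update(index.get(term, []))
    return sorted(hits)


# Placeholder collection (assumption): query strings from Table 3.1 reused as documents.
docs = {
    "d1": "asho woco",
    "d2": "mooyon ariyoo",
    "d3": "gaachee boono digeneti mooyon",
}
index = build_inverted_index(docs)
print(search(index, "mooyon"))
```

The index is built once, offline, and each query then only touches the posting lists of its own
terms, which is where the saving in search time comes from.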

Linguistic diversity: Kafficho is a unique language spoken by a specific community in
southwestern Ethiopia, and it is crucial to preserve and promote linguistic diversity.
Information retrieval systems for Kafficho can help in documenting and analyzing the language,
contributing to the understanding and preservation of this linguistic heritage.

Knowledge dissemination: Information retrieval systems for the Kafficho language enable the
dissemination of knowledge, resources, and educational materials within the community. By
making information more accessible in Kafficho, researchers can empower individuals to access,
share, and contribute to a wealth of knowledge in their native language.

Cultural significance: Information retrieval for the Kafficho language goes beyond linguistic
analysis; it also plays a role in preserving the cultural knowledge, traditions, and practices
embedded in the language. By enabling access to relevant information in Kafficho, such a system
contributes to the transmission of cultural heritage within the community and throughout the
country.

Community engagement: Involving the Kafficho-speaking community in the development of
information retrieval systems is essential to ensure that the tools meet their specific needs
and preferences. Collaboration with native speakers, linguists, and technology experts can lead
to more effective and culturally sensitive solutions.

Research opportunities: Information retrieval for the Kafficho language offers opportunities
for interdisciplinary research, bringing together experts from linguistics, computer science,
anthropology, and other fields.

Overall, the motivations to engage in Information Retrieval for Kafficho language are rooted in a
commitment to cultural preservation, knowledge dissemination, empowerment, linguistic
research, community engagement, educational opportunities, and promoting multilingualism.

The main objective of this work is to design an information retrieval system for the Kafficho
language. In the literature, the language is known by several names: Kafa, Kefa, Kaffa, Keffa,
Kafi-noonoo, Caffino, and Kaficho. Any of these names may appear in different documents,
depending on the author's preference. To avoid confusion, Kafficho is used throughout this
work. Kafficho is used as a working language in the Keffa zone and is a regular language of
instruction in schools up to grade ten. In addition, it is offered at diploma level at Bonga
teaching college. Moreover, the exchange of information in this language is progressively
extending to different places.

2. Methodology

The methodology refers to the general procedures that we followed to address the challenges
discussed above, together with the techniques and tools needed to accomplish the work. In other
words, it sets out the mechanisms, arranged in a systematic manner, for achieving the
objectives of this work.

2.1 Literature review

To gain a deep understanding of information retrieval, we reviewed models and techniques
developed for different languages, such as English, Amharic, Afaan-Oromo, Wolayitta, and
others. To understand the morphology of Kafficho, we read books, guidance materials, and a
dictionary, and held discussions with linguists.

2.2 Corpus collection and preparation

A corpus is mandatory for any natural language processing research. However, like other
under-resourced local languages, Kafficho has no organized or standard corpus. We therefore
collected data from different sources and treated it as a corpus for experimental purposes.
Sample texts from different disciplines were collected from teaching materials (student
textbooks), from papers (Kidane, 2012; Kassa, 2012; Teso, 2009; Leikola, 2014), from Kafi
Televizhinee news articles, and from a dictionary. Linguists participated in the preparation of
the corpus.

2.3 Experimentation method

Any NLP research should be measurable in terms of performance, which is the purpose of the
evaluation in the experimental phase. The performance of the information retrieval system was
evaluated using the precision, recall, and F-measure metrics on test collections.

2.4 Tools and techniques

For the coding part, we used the Python programming language. Python is well suited to NLP
application development, and the researchers are more experienced and familiar with it than
with other languages. As techniques, we used tokenization, normalization, stop word removal,
and term weighting.
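
The techniques listed above, followed by vector space ranking, can be sketched in a few lines of
Python. This is a minimal illustration under stated assumptions, not the code used in this work:
the function names are hypothetical, normalization is reduced to lowercasing, tokens are taken
to be runs of letters, term weighting is assumed to be plain TF-IDF with cosine similarity, and
no real Kafficho stop word list is included (the `stop_words` argument is left empty). The toy
documents reuse query strings from Table 3.1 as placeholder text.

```python
import math
import re
from collections import Counter


def preprocess(text, stop_words=frozenset()):
    """Normalize (lowercase), tokenize into runs of letters, and drop stop words."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in stop_words]


def tfidf_vectors(token_lists):
    """Return one {term: tf * idf} dict per document, plus the idf table."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs, stop_words=frozenset()):
    """Return (score, document position) pairs, best match first."""
    vectors, idf = tfidf_vectors([preprocess(d, stop_words) for d in docs])
    # Query terms that never occur in the collection carry no weight and are skipped.
    q_counts = Counter(t for t in preprocess(query, stop_words) if t in idf)
    q_total = sum(q_counts.values())
    q_vec = {t: (c / q_total) * idf[t] for t, c in q_counts.items()}
    return sorted(((cosine(q_vec, vec), i) for i, vec in enumerate(vectors)), reverse=True)


# Placeholder collection (assumption): query strings from Table 3.1 reused as documents.
docs = ["asho woco", "mooyon ariyoo", "gaachee boono digeneti mooyon"]
ranking = rank("mooyon ariyoo", docs)
for score, position in ranking:
    print(position, round(score, 3))
```

The document and the query go through the same `preprocess` step, so both are represented in
the same term space before their similarity is computed.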

3. Result

We assessed the performance of Kafficho information retrieval using the three metrics. A test
collection of 220 documents with 15 queries was used. Kafficho text retrieval achieved 87%
precision, 28% recall, and a 35% F-measure. Indeed, in terms of precision this result is very
good for the selected model, the vector space model.
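
For reference, the three metrics can be computed per query as in the sketch below. The function
name `evaluate` and the document ids are hypothetical. The example counts (4 retrieved
documents, all of them relevant, out of 16 relevant documents) are chosen only because they
give precision 1, recall 0.25, and F-measure 0.4; they are not taken from the relevance
judgments of the experiment.

```python
def evaluate(retrieved, relevant):
    """Return (precision, recall, F-measure) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical judgments: 16 relevant documents, of which the system returns 4.
relevant = [f"doc{i}" for i in range(16)]
retrieved = relevant[:4]
precision, recall, f_measure = evaluate(retrieved, relevant)
print(precision, recall, round(f_measure, 2))
```

A system that returns only a few, very safe documents scores perfectly on precision while
missing most of the relevant material, which is what a high precision and a low recall indicate.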

Table 3.1 Sample results using the three metrics

No   Query                                                           Precision  Recall  F-measure
1.   Dojjechino immimmi hajjiyoon dabbii qoppeheete                  1          0.16    0.27
2.   asho woco                                                       1          0.14    0.24
3.   ceechii katinooch kotiyoo booyeehe.                             1          0.25    0.4
4.   mooyon ariyoo                                                   0.59       0.73    0.65
5.   gonde shuriyoon qayiyee qashee kexooch kotee kotoon doyibeeta.  1          0.25    0.4
6.   ashittino hallee biiye wushee iiqqoon niriyoonane.              1          0.035   0.067
7.   booyoon immeehote?                                              1          0.16    0.275
8.   kechi hinnoo                                                    0.25       0.52    0.333
9.   heete shafiroona dichooch                                       1          0.66    0.79
10.  viidiyee deekkoon qechiyeete                                    1          0.66    0.79
11.  gaachee boono digeneti mooyon                                   0.33       0.019   0.036
12.  yaach wotta barooch deewiyeete.                                 1          0.25    0.4
13.  daggooch shoodeyoo                                              1          0.096   0.175
14.  gommona biriyoo hakkeehe                                        1          0.16    0.27
15.  doyee kexooch gimo                                              1          0.134   0.236
     Average                                                         0.87       0.28    0.35
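
As a consistency check, the averages in the last row can be recomputed from the per-query rows.
The sketch below assumes that the reported figures are unweighted means over the 15 queries,
which is not stated explicitly in the text. Under that assumption, the recomputed means agree
with the reported 0.87, 0.28, and 0.35 to within 0.01. Note that the average F-measure is the
mean of the per-query F-measures; computing F from the average precision and average recall
would give a noticeably higher value (about 0.43).

```python
from statistics import mean

# (precision, recall, F-measure) for queries 1-15, copied from Table 3.1.
rows = [
    (1, 0.16, 0.27), (1, 0.14, 0.24), (1, 0.25, 0.4), (0.59, 0.73, 0.65),
    (1, 0.25, 0.4), (1, 0.035, 0.067), (1, 0.16, 0.275), (0.25, 0.52, 0.333),
    (1, 0.66, 0.79), (1, 0.66, 0.79), (0.33, 0.019, 0.036), (1, 0.25, 0.4),
    (1, 0.096, 0.175), (1, 0.16, 0.27), (1, 0.134, 0.236),
]

avg_precision = mean(r[0] for r in rows)
avg_recall = mean(r[1] for r in rows)
avg_f = mean(r[2] for r in rows)

# F-measure derived from the averaged precision and recall, for comparison only.
f_of_averages = 2 * avg_precision * avg_recall / (avg_precision + avg_recall)

print(round(avg_precision, 3), round(avg_recall, 3), round(avg_f, 3))
print(round(f_of_averages, 3))
```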

4. Discussion

The results in the table above show that information retrieval for Kafficho performs very well
in terms of precision. Precision and recall have an approximately inverse relationship: higher
precision is often coupled with lower recall [70]. Here, precision was 0.87 (87%), whereas
recall was only 0.28 (28%). This shows that the two basic metrics usually trade off against
each other.
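
The trade-off can be made concrete with a small sketch. It assumes, hypothetically, that the
system returns every document whose similarity score is at or above a cut-off. The paper does
not describe such a cut-off, and the scores and relevance judgments below are invented purely
for illustration.

```python
def precision_recall_at(scored, relevant, threshold):
    """Precision and recall when only documents scoring >= threshold are returned."""
    retrieved = {doc for doc, score in scored if score >= threshold}
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented similarity scores and relevance judgments (illustration only).
scored = [("d1", 0.9), ("d2", 0.7), ("d3", 0.5), ("d4", 0.3), ("d5", 0.1)]
relevant = {"d1", "d2", "d4"}

for threshold in (0.8, 0.4, 0.0):
    precision, recall = precision_recall_at(scored, relevant, threshold)
    print(threshold, round(precision, 2), round(recall, 2))
```

With the strict cut-off only the top document is returned, so precision is perfect but two of
the three relevant documents are missed; lowering the cut-off recovers them at the cost of
returning non-relevant documents as well.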

Conclusion and Recommendation


In conclusion, information retrieval for Kafficho language holds immense significance in
preserving cultural heritage, promoting linguistic diversity, empowering communities, and
advancing educational opportunities. By developing tools and resources that facilitate access to
information in Kafficho, researchers and language advocates contribute to the preservation and
revitalization of a unique linguistic tradition. Through community engagement, collaboration,
and a commitment to promoting multilingualism, information retrieval initiatives for Kafficho
language can foster inclusivity, knowledge dissemination, and linguistic research. By
recognizing the value of minority languages like Kafficho and investing in their preservation and
promotion, we can create a more inclusive and diverse linguistic landscape that celebrates the
richness of global language diversity. In essence, information retrieval for Kafficho language
serves as a vital tool for cultural preservation, empowerment, and knowledge sharing, ultimately
contributing to the enrichment of our collective linguistic heritage.

We evaluated the system on a test collection of 220 documents with 15 queries. Kafficho text
retrieval achieved 87% precision, 28% recall, and a 35% F-measure.

Recommendation

This research is a first attempt at information retrieval for the Kafficho language. We hope
that other interested researchers will work on this language and achieve better performance
than ours. Owing to the morphological complexity of the language, coming up with enough rules
was a cumbersome task. Consequently, the researcher recommends the following points.

• To enhance the performance (precision and recall) of the Kafficho IR system, integrate
  mechanisms for handling synonyms and polysemous terms, and for query expansion, into the
  vector space model (VSM). Beyond VSM, identifying the model that works best for Kafficho
  retrieval is another open research area.
• Develop a Kafficho-specific search engine.
• Build a comprehensive Kafficho language corpus.
• Provide a feedback mechanism that lets users rate the relevance of search results, so that
  the performance of the Kafficho information retrieval system can be improved continuously.

Acknowledgement

First and foremost, I would like to thank Almighty GOD and my mother St. Mary
for giving me the strength and capability to complete this work.
To my friends and the linguistic experts, thank you for giving your time and
for courteously sharing your knowledge and relevant resources for analyzing
Kafficho morphology. To Kero Kochito, Belachewu, Kifle, and Magnecho in
particular: I have no words for your guidance.

References

Kassa, T. (2012). Case system of Kafinoonoo. A thesis submitted to the Department of
Linguistics in partial fulfillment of the requirements for the degree of Master of Arts in
General Linguistics, School of Graduate Studies. June.

Kidane, W. (2012). Addis Ababa University, College of Language Studies, Humanities, Journalism
and Communication, Department of Foreign Languages and Literature.

Krovetz, R. (2000). Viewing morphology as an inference process. Artificial Intelligence,
118(1–2), 277–294. https://doi.org/10.1016/S0004-3702(99)00101-0

Leikola, K. (2014). Talking Manjo linguistic repertoires as means of negotiating
marginalization (Issue May).

Moral, C., de Antonio, A., Imbert, R., & Ramírez, J. (2014). A survey of stemming algorithms in
information retrieval. Information Research, 19(1), 76–80.
https://doi.org/10.9790/0661-17367680

Teso, T. F. (2009). A typology of verbal derivation in Ethiopian Afro-Asiatic languages.
