Information Retrieval For Kafficho Language
ABSTRACT
Information retrieval for the Kafficho language presents unique challenges because of the
language's limited resources and lack of standardized tools. In this work, we explore the
development of an information retrieval system tailored specifically to Kafficho, with the
support and collaboration of native speakers and linguists. We aim to investigate information
retrieval for Kafficho and, ultimately, to create a platform through which Kafficho speakers
can access information.
For the information retrieval experiments, we used 220 Kafficho text files and fifteen sample
queries. Data pre-processing steps (tokenization, normalization, stop word removal, and
stemming), together with term weighting, were prerequisites for representing each document and
each query in the vector space model. Our evaluation, conducted using the three well-known
metrics precision, recall, and F-measure, yielded 87%, 28%, and 35%, respectively. This shows
that Kafficho information retrieval performed very well using the vector space model.
1. Introduction
The amount of information on the Internet is increasing drastically with the revolution in
technology. In particular, since the advent of the Internet, huge volumes of data have become
available online. As the amount of information doubles, finding and accessing the relevant
documents becomes a challenging task, since 99% of all information is of no interest to 99% of
all users. Information retrieval addresses this problem. Once materials are available on the
net, the information may be accessed by visitors, police officers, expert analysts, and any
other intended users from anywhere and at any time. Search is the top activity on the Internet
nowadays: 92% of online users use a search engine, including 59% who use it daily. When
effective and efficient retrieval mechanisms are applied, this information becomes easily
accessible through a simple search, with little effort and in a short time.
It is well known that people want to use their own language for different purposes in
day-to-day life: communication, learning, accessing available information, knowledge sharing,
and so on. Processing that language by computer makes heavy and tedious work simpler. This is
done by what we call natural language processing (NLP), which enables computer systems to
analyze natural language (the language spoken by humans) much as a human being would, so that
difficult and complex tasks become simpler.
NLP has many important applications, such as information extraction, information retrieval,
part-of-speech tagging, spell checking, text summarization, and question answering. Although
the need for these applications is widely felt, the scarcity of computational linguistic
resources prevents them from being built. Therefore, doing this work for each local language
is a task for researchers in the area, like us.
Information retrieval is an interesting area that mainly deals with obtaining information
resources relevant to an information need (specifically, a user query) from a document
collection. Normally, information retrieval incorporates two basic subsystems: indexing and
searching. Indexing is an offline process in which document keywords are selected by applying
preprocessing tasks, including tokenization, stop word removal, stemming, and term weighting.
It represents text documents and organizes a large document collection using an indexing
structure such as an inverted file, a sequential file, or a signature file, to save storage
space and speed up searching. The user query passes through the same steps before it is
compared with the document terms. The searching subsystem is then activated: searching is the
process of matching index terms against query terms and returning relevant hits for the user's
query. Indexing and searching are interconnected and depend on each other to make the IR
system effective and efficient.
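The indexing and searching subsystems described above can be illustrated with a minimal
sketch of an inverted file. This is an assumption-laden toy, not the system built in this
work: the function names build_inverted_index and search are hypothetical, the documents are
placeholder tokens standing in for Kafficho text, and preprocessing is reduced to lowercasing
and whitespace splitting.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Indexing (offline): map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Searching: return sorted ids of documents that contain any query term."""
    hits = set()
    for term in query.lower().split():
        # .get avoids creating empty entries for unseen terms
        hits |= index.get(term, set())
    return sorted(hits)


# Placeholder documents; the tokens stand in for Kafficho words.
docs = {
    "d1": "alpha beta gamma",
    "d2": "beta delta",
    "d3": "epsilon",
}
index = build_inverted_index(docs)
print(search(index, "beta epsilon"))
```

The query goes through the same (here trivial) preprocessing as the documents, which is the
point made in the paragraph above: index terms and query terms can only be matched if both
are normalized in the same way.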
Information retrieval for Kafficho language involves the development of systems and tools that
enable users to access and retrieve relevant information from a collection of texts or documents
written in Kafficho. This process is essential for linguistic research, language preservation, and
communication within the Kafficho-speaking community.
Knowledge dissemination: Information retrieval systems for the Kafficho language enable the
dissemination of knowledge, resources, and educational materials within the community. By
making information more accessible in Kafficho, researchers can empower individuals to access,
share, and contribute to a wealth of knowledge in their native language.
Cultural significance: Information retrieval for the Kafficho language goes beyond linguistic
analysis; it also plays a role in preserving the cultural knowledge, traditions, and practices
embedded in the language. By enabling access to relevant information in Kafficho, such a
system contributes to the transmission of cultural heritage within the community and
throughout the country.
Research opportunities: Information retrieval for the Kafficho language offers opportunities
for interdisciplinary research, bringing together experts from linguistics, computer science,
anthropology, and other fields.
Overall, the motivations to engage in Information Retrieval for Kafficho language are rooted in a
commitment to cultural preservation, knowledge dissemination, empowerment, linguistic
research, community engagement, educational opportunities, and promoting multilingualism.
The main objective of this work is to design an information retrieval system for the Kafficho
language. The language is known in the literature by the names Kafa, Kefa, Kaffa, Keffa,
Kafi-noonoo, Caffino, and Kaficho, so any of these names may appear in different documents,
depending on the author's preference. To avoid confusion, the name Kafficho is used here.
Kafficho is the working language of the Keffa zone and the regular language of instruction in
schools up to grade ten. It is also taught at diploma level at Bonga teaching college.
Moreover, the information exchanged in this language is progressively spreading to different
places.
2. Methodology
The methodology refers to the general procedures that we followed to address the challenges
discussed above, together with the techniques and tools needed to accomplish the work. In
other words, these are the mechanisms arranged in a systematic manner to achieve the
objectives of this work.
To gain a deep understanding of information retrieval, we reviewed models and techniques
developed for other languages, such as English, Amharic, Afaan-Oromo, and Wolayitta. To
understand the morphology of Kafficho, we read books, guidance materials, and a dictionary,
and held discussions with linguists.
A corpus is mandatory for any natural language processing research. However, like other local
under-resourced languages, Kafficho has no organized or standard corpus. We therefore
collected data from different sources and treated it as a corpus for experimental purposes.
Sample texts from different disciplines were collected from teaching materials (student
textbooks), from papers such as (Kidane, 2012), (Kassa, 2012), (Teso, 2009), and (Leikola,
2014), from Kafi Televizhinee news articles, and from a dictionary. Linguists participated in
the preparation of the corpus.
Any NLP research should be measurable in terms of performance, which is the purpose of the
evaluation in the experimental phase. The performance of information retrieval was evaluated
using the precision, recall, and F-measure metrics on test collections.
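As an illustration of the three metrics named above, the following sketch computes precision,
recall, and F-measure per query and then averages them over queries. The averaging scheme
(a simple mean over queries) is an assumption, since the text does not state how the figures
were aggregated; the function names and the toy relevance judgments are hypothetical and are
not the data used in this work.

```python
def precision_recall_f(retrieved, relevant):
    """Return (precision, recall, F-measure) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def macro_average(per_query):
    """Average each of the three metrics over all queries."""
    n = len(per_query)
    return tuple(sum(result[i] for result in per_query) / n for i in range(3))


# Toy runs: (retrieved ids, relevant ids) for two hypothetical queries.
runs = [
    (["d1", "d2"], ["d1", "d2", "d3", "d4"]),  # precision 1.0, recall 0.5
    (["d5", "d6", "d7", "d8"], ["d5"]),        # precision 0.25, recall 1.0
]
per_query = [precision_recall_f(ret, rel) for ret, rel in runs]
avg_p, avg_r, avg_f = macro_average(per_query)
print(round(avg_p, 3), round(avg_r, 3), round(avg_f, 3))
```

Note that the mean of per-query F-measures is in general not equal to the harmonic mean of the
averaged precision and recall, so the three reported averages need not satisfy the F-measure
formula among themselves.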
2.4 Tools and techniques
For the coding part, we used the Python programming language. Python is well suited to NLP
application development, and the researchers have more experience and familiarity with it than
with other languages. As techniques, we used tokenization, normalization, stop word removal,
and term weighting.
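The preprocessing techniques listed above can be sketched in Python as follows. This is only
a minimal illustration under stated assumptions: the stop-word list is a hypothetical
placeholder (a real list would be compiled with Kafficho linguists), normalization is reduced
to lowercasing, the sample sentence is English standing in for Kafficho text, and stemming is
omitted because the language-specific rules are not given here.

```python
import re

# Hypothetical stop-word list; placeholder words, not real Kafficho stop words.
STOP_WORDS = {"the", "of", "and"}


def tokenize(text):
    """Split text into alphabetic tokens, keeping apostrophes inside words."""
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text)


def normalize(tokens):
    """Normalize tokens; here this is simply lowercasing (an assumption)."""
    return [token.lower() for token in tokens]


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word list."""
    return [token for token in tokens if token not in stop_words]


def preprocess(text):
    """Run tokenization, normalization, and stop word removal in order."""
    return remove_stop_words(normalize(tokenize(text)))


print(preprocess("The History of the Language, and its Speakers."))
```

Normalization runs before stop word removal so that capitalized stop words (such as a
sentence-initial word) are still matched against the lowercase list.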
3. Result
We checked the performance of Kafficho information retrieval using the three metrics on a test
collection of 220 documents with 15 queries. Kafficho text retrieval registered 87% precision,
28% recall, and 35% F-measure. This result is very good for the selected model, the vector
space model.
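For readers unfamiliar with the vector space model, the sketch below shows one common way to
rank documents against a query: tf-idf term weights and cosine similarity. The specific
weighting formula (raw term frequency times log(N/df)) is an assumption, since the exact
scheme used in the experiments is not stated; the function names and the toy token lists are
hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(tokenized_docs):
    """Return one sparse tf-idf vector (dict) per document, plus the idf table."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(doc).items()}
        for doc in tokenized_docs
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query_tokens, doc_vectors, idf):
    """Score every document against the query, best match first."""
    query = {t: tf * idf[t] for t, tf in Counter(query_tokens).items() if t in idf}
    scores = [(i, cosine(query, vec)) for i, vec in enumerate(doc_vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy, already preprocessed documents (placeholder tokens).
docs = [["coffee", "forest", "coffee"], ["forest", "river"], ["school", "language"]]
vectors, idf = tf_idf_vectors(docs)
ranking = rank(["coffee"], vectors, idf)
print(ranking[0][0])  # index of the best-matching document
```

Query terms that never occur in the collection are dropped, because they have no idf weight
and cannot match any document.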
4. Discussion
The result shown in the table above indicates that information retrieval for Kafficho performs
very well. Precision and recall hold an approximately inverse relationship: higher precision
is often coupled with lower recall [70]. The precision was 0.87 (87%), but the recall was
0.28 (28%). This shows that the two basic metrics mostly trade off against each other.
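The trade-off described above can be made concrete with a small sketch. It assumes that a
document is returned only when its similarity score reaches a cut-off threshold; the scores,
the relevance judgments, and the function name are hypothetical and do not come from the
experiments in this work.

```python
def precision_recall_at_threshold(scored_docs, relevant, threshold):
    """Precision and recall when only documents scoring >= threshold are returned."""
    retrieved = {doc for doc, score in scored_docs if score >= threshold}
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical similarity scores for one query, and its relevant documents.
scored_docs = [("d1", 0.9), ("d2", 0.7), ("d3", 0.5), ("d4", 0.3), ("d5", 0.1)]
relevant = {"d1", "d2", "d4"}

strict = precision_recall_at_threshold(scored_docs, relevant, 0.6)
loose = precision_recall_at_threshold(scored_docs, relevant, 0.2)
print(strict, loose)
```

With the strict threshold every returned document is relevant but one relevant document is
missed; with the loose threshold all relevant documents are found at the cost of returning a
non-relevant one.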
In summary, on the test collection of 220 documents with 15 queries, Kafficho text retrieval
achieved 87% precision, 28% recall, and 35% F-measure.
Recommendation
This research is the first attempt on the Kafficho language. We hope that other researchers
will work on this language and achieve better performance than ours. Because of the
morphological complexity of the language, coming up with enough rules was a cumbersome task.
Consequently, the researcher recommends the following points.
To enhance the performance (precision and recall) of the Kafficho IR system, integrate mechanisms for handling synonyms, polysemous terms, and query expansion into the vector space model (VSM). Beyond the VSM, identifying the model that works best for Kafficho retrieval is another research area.
Develop a Kafficho-specific search engine.
Build a comprehensive Kafficho language corpus.
Provide a feedback mechanism for users to rate the relevance of search results, so that it can help to continuously improve the performance of the information retrieval system for the Kafficho language.
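As one possible starting point for the query expansion recommended above, the sketch below
adds known synonyms to a query before it is matched against the index. It is only an
illustration: the synonym table is a hypothetical placeholder with English words, and a real
table would have to be built with Kafficho linguists.

```python
# Hypothetical synonym table; placeholder entries, not real Kafficho vocabulary.
SYNONYMS = {
    "car": ["automobile"],
    "teacher": ["instructor", "educator"],
}


def expand_query(tokens, synonyms=SYNONYMS):
    """Return the query tokens followed by their synonyms, without duplicates."""
    expanded = []
    for token in tokens:
        for candidate in [token, *synonyms.get(token, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query(["teacher", "school"]))
```

Expansion of this kind mainly targets recall, since documents that use a synonym of the query
term can now be matched; it may lower precision when a synonym is ambiguous.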
Acknowledgement
First of all, I would like to thank the Almighty God and my mother St. Mary
for giving me the strength and capability to complete this work.
To my friends and the linguistic experts, thank you for giving your time and
for politely sharing your knowledge and relevant resources for analyzing
Kafficho morphology. Especially to Kero Kochito, Belachewu, Kifle, and
Magnecho: I have no words for your guidance.
References
Kassa, T. (2012). Case system of Kafinoonoo. Thesis submitted to the Department of
Linguistics in partial fulfillment of the requirements for the degree of Master of Arts in
General Linguistics, School of Graduate Studies. June.
Kidane, W. (2012). Addis Ababa University, College of Language Studies, Humanities, Journalism
and Communication, Department of Foreign Languages and Literature.
Moral, C., de Antonio, A., Imbert, R., & Ramírez, J. (2014). A survey of stemming algorithms in
information retrieval. Information Research, 19(1), 76–80.
https://doi.org/10.9790/0661-17367680