Enhancing Quranic Question Answering Systems Through Dataset Expansion and Model Optimization

Uploaded by

Baraa Hekal

Enhancing Quranic Question Answering Systems Through Dataset Expansion and Model Optimization
1st Islam Oshallah, Computer Science, MSA University, 6th October, Giza, Egypt, [email protected]
2nd Mohamed Basem, Computer Science, MSA University, 6th October, Giza, Egypt, [email protected]
3rd Ali Hamdi, Computer Science, MSA University, 6th October, Giza, Egypt, [email protected]
4th Ammar Mohammed, Computer Science, MSA University, 6th October, Giza, Egypt, [email protected]

Abstract: Understanding the deep meanings of the Quran and bridging the language gap between Modern Standard Arabic and Classical Arabic is essential to improving the Question and Answer (QA) system for the Holy Quran. In this study, we focus on fine-tuning large language models (LLMs) to enhance the Quran QA system. The original Quran QA 2023 shared task dataset had a limited number of questions, which resulted in weak model retrieval performance. To address this challenge, we first expanded the dataset by rephrasing and generating additional questions, increasing it from 251 to 629 diversified questions. Each question was paraphrased twice, resulting in a comprehensive dataset of 1,895 categorized questions (single-answer, multi-answer, and zero-answer types). After expanding the dataset, we translated the questions into English to incorporate models that support English. The fine-tuned models, including Falcon 7B, Bloom 3B, Phi 3.5 Mini, T5, and ELECTRA Large, demonstrated significant improvements in performance. The best-performing model, ELECTRA Large, achieved MAP@10 of 0.31, MRR of 0.43, Recall@10 of 0.46, and Precision@10 of 0.18.

Keywords: Quran Question Answering, Passage Retrieval, Modern Standard Arabic, Dataset Expansion, Fine-Tuning.

I. INTRODUCTION
The Holy Quran is revered by Muslims worldwide as a divine source of guidance and knowledge. However, its accessibility is often hindered by linguistic barriers, particularly for non-Arabic speakers and even for native Arabic speakers unfamiliar with its Classical Arabic form. The distinct linguistic style of the Quran, rooted in Classical Arabic, differs significantly from Modern Standard Arabic, presenting challenges in comprehension and interpretation for contemporary readers.
The Quran QA 2023 shared task addresses these challenges by advancing research in question-answering (QA) systems tailored specifically for the Quranic text. This shared task comprises two subtasks: Passage Retrieval (the task addressed in this paper), which focuses on accurately identifying relevant Quranic passages in response to a query, and Machine Reading Comprehension, which aims to generate precise and contextually appropriate answers. These subtasks tackle the linguistic complexity of the Quran while leveraging modern computational approaches to improve retrieval and comprehension accuracy.
This work contributes to the shared task by enhancing the quality of datasets and refining the performance of QA models. Through these efforts, the paper aims to bridge the language gap, improve access to Quranic knowledge, and facilitate a deeper understanding for diverse audiences, including researchers, educators, and lay users. Our approach combines state-of-the-art techniques in natural language processing with a focus on addressing the unique characteristics of Quranic Arabic, setting the foundation for future advancements in this domain.

II. RELATED WORK
The task of Quranic question answering, particularly Quranic passage retrieval, has become a focal point of research due to the linguistic complexity and contextual richness of the Quran. Written in Classical Arabic, the unique characteristics of the Quran, such as intricate syntax, multiple interpretations, and rhetorical depth, present significant challenges for QA systems. These challenges
are further compounded by the necessity to bridge the linguistic gap between Modern Standard Arabic and Classical Arabic, as well as to manage unanswerable queries using robust zero-answer detection mechanisms.
Transformer-based models like AraBERT, CAMeLBERT, and AraELECTRA have shown promise in addressing Arabic NLP tasks, particularly for their ability to model semantic and contextual nuances. However, their effectiveness is often hindered by limited, imbalanced datasets, which restrict their generalizability to unseen queries. Sarhan and Elkomy tackled this issue by employing a hybrid architecture combining dual-encoder and cross-encoder models. They utilized transfer learning on external datasets, such as TyDi QA and Tafseer, to address data scarcity and boost model performance. Their ensemble approach achieved a MAP score of 25.05 percent, showcasing improved prediction stability and passage retrieval accuracy. Despite these advancements, their work emphasized the need for further dataset augmentation and enhanced fine-tuning techniques to fully capture the complexities of Quranic QA.
Alawwad et al. explored ensemble learning and fine-tuning techniques on datasets like Tafseer and TyDi QA to improve passage retrieval performance. They introduced thresholding mechanisms to handle unanswerable questions, a crucial feature for realistic QA systems. Their results demonstrated the benefits of ensemble methods in low-resource settings but also highlighted the dependency on external datasets and the limitations posed by the quality and size of existing resources. Their findings underscored the importance of developing more diverse and specialized datasets to enable consistent and accurate model predictions.
Mahmoudi et al. proposed a multitask transfer learning approach that incorporated both unsupervised and supervised fine-tuning with models like AraELECTRA and AraBERT. To address the contextual richness of Quranic text, they used advanced embedding techniques such as TSDAE and SimCSE to produce high-quality sentence embeddings. These embeddings enhanced the ability of models to perform passage retrieval and comprehension tasks. However, their study pointed out that further advancements would require larger, more diverse datasets to adequately address the complexity of indirect and context-dependent queries in Quranic QA.
An innovative contribution came from Alawwad et al. in their work for the Quran QA 2023 Shared Task. They tackled Quranic passage retrieval using translation mechanisms, which converted Arabic queries into English, enabling the use of advanced English-based transformer models, such as OpenAI embeddings and sentence transformers. By leveraging English-based resources, they mitigated the limitations of Arabic-language datasets. Their system also incorporated a paraphrasing module to generate multiple query variations, further improving retrieval precision. This approach significantly enhanced retrieval metrics, including MAP and MRR, demonstrating the effectiveness of translation-based methods in overcoming linguistic challenges. However, their findings also highlighted challenges with inference-based methods for indirect queries and the limited capability of their models to handle unanswerable questions effectively.

III. METHODOLOGY
Our methodology involves three key phases:
1) Dataset Preparation: Review and expansion of the original dataset, increasing the question pool and ensuring diversity.
2) Translation: Questions were translated into English to enhance accessibility and model compatibility.
3) Model Fine-Tuning: Fine-tuning large language models (LLMs), including Falcon 7B, Bloom 3B, and T5, to optimize performance for the Quran QA task.
Extensive experiments were performed to compare model performance.

A. Dataset Preparation
1) Database Collection: The dataset used in this study was meticulously curated from various reliable sources to ensure diversity, accuracy, and relevance. Our goal was to gather a comprehensive dataset that could improve the model's ability to handle a wide range of question-answer pairs, thus enhancing the overall performance of Quran question-answering (QA) systems. To achieve this, we expanded the dataset significantly by integrating data from the following trustworthy sources:
• Quran QA 2022 Shared Task Dataset: This foundational dataset, curated by the Quran-QA team for competitors in the shared task, served as the starting point for our experiments. It contains 251 structured questions paired with annotated Quranic passages, covering both factoid and non-factoid queries. The dataset is split into training, testing, and development subsets.
• Tafseer Book PDF: An extensive resource titled 1000 Questions and Answers in the Holy Quran, based on Tafseer, was used to extract relevant question-passage pairs. The extracted data underwent a rigorous cleaning process to ensure high quality and seamless integration.
• List of Plants Citation in Quran and Hadith: This resource, provided by the Qur'anic Botanic Garden, added a unique dimension to the dataset by offering context-specific references to plants mentioned in the Quran and Hadith.
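As a minimal sketch of how question-passage pairs from several sources might be held in one structure, the Python snippet below merges records and assigns each question one of the three types named in the abstract (single-answer, multi-answer, zero-answer). The `QARecord` fields, the `merge_sources` and `answer_type` helpers, the placeholder passage identifiers, and the rule that the type follows from the number of annotated passages are all assumptions made for illustration; the paper does not describe its data schema or its deduplication rule.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QARecord:
    """One question with the identifiers of its annotated passages (hypothetical schema)."""
    question: str
    passage_ids: List[str] = field(default_factory=list)
    source: str = "unknown"


def answer_type(record: QARecord) -> str:
    """Assumed rule: the type depends only on how many distinct passages are annotated."""
    n = len(set(record.passage_ids))
    if n == 0:
        return "zero-answer"
    if n == 1:
        return "single-answer"
    return "multi-answer"


def merge_sources(*sources: List[QARecord]) -> List[QARecord]:
    """Concatenate sources, keeping the first record for each normalized question text."""
    seen: Dict[str, QARecord] = {}
    for records in sources:
        for rec in records:
            # Normalize whitespace and case so trivially different copies collapse.
            key = " ".join(rec.question.split()).lower()
            seen.setdefault(key, rec)
    return list(seen.values())


# Placeholder data: questions and passage ids are invented for the example.
shared_task = [QARecord("Example question A", ["P1"], "shared_task")]
tafseer = [
    QARecord("Example  question A", ["P1", "P2"], "tafseer"),  # duplicate of A
    QARecord("Example question B", [], "tafseer"),
]
merged = merge_sources(shared_task, tafseer)
types = [answer_type(r) for r in merged]
```

Keeping the first occurrence means the shared-task annotation wins over later sources when the same question appears twice; a real pipeline might instead merge the passage lists.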
2) Dataset Expansion: The original 251-question dataset from the Quran QA 2022 Shared Task was enhanced by rephrasing and generating additional questions. This process involved expanding the dataset to 629 questions using the Tafseer Book PDF and the List of Plants Citation in Quran and Hadith. All of these questions were rephrased twice, resulting in a robust dataset of 1,895 questions, which were categorized into single-answer, multi-answer, and zero-answer types.
This process is illustrated in Figure 1, which shows the transformation of the original dataset through the "DS Expansion and Manipulation" process into a more comprehensive version that feeds into large language models for fine-tuning.
The expanded dataset allowed us to fine-tune multiple pre-trained transformer models, with a focus on improving performance in Quranic passage retrieval. By varying the phrasing, the dataset's flexibility and adaptability were improved, enabling models to handle different question formats and vocabulary more effectively. This ultimately enhanced the generalization and performance of the models in Quranic Question Answering tasks.
3) Database Cleaning: The collected data was carefully cleaned to ensure high quality and reliability. The cleaning process included the following steps:
• Removal of irrelevant or low-quality question-passage pairs: Question-passage pairs that were unclear or irrelevant to the task were removed to ensure that the dataset consisted only of high-quality data.
• Standardization of question formats: To ensure consistency across the dataset, question formats were standardized, making it easier for the models to process and interpret the data.
By applying these cleaning steps, we ensured the integrity of the dataset for optimal model training and performance.

B. Dataset Translation
To ensure compatibility with large language models (LLMs), we translated the dataset into English. The Quranic passages were translated into English using the widely recognized translation by Marmaduke Pickthall, an esteemed Islamic scholar. Additionally, the questions were translated into English using Google Translate. The main reason for translating the data into English is that most LLMs handle English text more reliably than Arabic, which allows for more effective training and performance on Quranic question-answering tasks.
This approach enabled the creation of a multilingual dataset that improves the ability of LLMs to process and answer questions accurately, enhancing the overall performance of the Quranic QA system.

[Figure 1 flow: Old Dataset (174 Q Train, 52 Q Test, 25 Q Development) -> Expand number of Q from 1000 Q&A in the Holy Quran pdf -> New Train Dataset: +800 Q&A -> Extracted and Cleaned Relevant Question-Passage Pairs -> Train Dataset Increased to: 629 Questions -> Paraphrased Twice: 1895 Questions -> Translated to English -> Final Train Dataset: 1895 Questions (English)]
Fig. 1. Workflow for Dataset Expansion and Manipulation.

C. Model Fine-Tuning
In this study, we fine-tuned several large language models (LLMs) for the Quran Question Answering (QA) task. The models used include Falcon 7B, Bloom 3B, Phi 3.5 Mini, T5, and ELECTRA Large. Each of these models is built on a transformer architecture and has
been fine-tuned for different aspects of question answering. Additionally, we utilized a cross-encoder model for better interaction between question and passage pairs, ensuring more accurate and contextually relevant answers. A cross-encoder operates by jointly processing the question and passage as a single input, learning to predict the relevance of the passage to the question, which significantly improves performance on tasks requiring passage retrieval.
Below are detailed descriptions of each model, including their architecture, parameter size, and key features.

D. Falcon QAMaster (Falcon 7B)
Parameters: 7 billion parameters
Model Overview: Falcon 7B is a transformer-based language model designed for a variety of natural language processing (NLP) tasks. The QAMaster variant is fine-tuned specifically for question answering tasks. This model has been optimized to handle both fact-based and context-dependent queries, making it suitable for the Quran QA task.
Architecture and Training: Falcon 7B uses a standard transformer architecture with 7 billion parameters, making it one of the larger models in terms of scale. It is trained on a variety of QA datasets, which include both extractive and generative question answering tasks. This model excels in answering fact-based questions and general knowledge queries, which makes it well-suited for a Quranic QA task.
Key Features:
• Fine-tuned specifically for QA tasks.
• Uses a dense attention mechanism typical of transformer-based models.
• Optimized for handling long-form text and complex queries.

E. Bloom 3B (Bloom-3B Squad)
Parameters: 3 billion parameters
Model Overview: Bloom 3B is part of the Bloom family of models, which are designed for multilingual NLP tasks. This specific variant, Bloom 3B Squad, has been fine-tuned on the SQuAD dataset, which focuses on extractive question answering. Bloom's multilingual capabilities allow it to handle text in multiple languages, making it highly suitable for diverse linguistic applications.
Architecture and Training: Bloom is a large transformer model with a focus on multilingual understanding. It leverages a GPT-style architecture (decoder-only transformer) but has been trained on a multilingual corpus to handle a wide range of languages. The model is fine-tuned on the SQuAD dataset, making it effective for extractive QA tasks.
Key Features:
• Multilingual capabilities, supporting a wide range of languages.
• Fine-tuned on SQuAD, an extractive QA dataset.
• Large-scale pre-training with 3 billion parameters.

F. Flan-T5 Large (Flan-T5 Large Squad2)
Parameters: approximately 780 million parameters
Model Overview: Flan-T5 is a variant of the T5 model that has been instruction-tuned to perform a variety of NLP tasks. The Flan-T5 Large Squad2 variant has been fine-tuned on the SQuAD 2.0 dataset, which includes unanswerable questions, allowing the model to determine when a question has no answer. This makes Flan-T5 highly effective for nuanced QA tasks.
Architecture and Training: Flan-T5 is based on the T5 (Text-to-Text Transfer Transformer) architecture, which treats every NLP task as a text generation problem. The model has been fine-tuned on a diverse set of tasks, including question answering, summarization, and text classification. It excels at understanding complex instructions and is capable of handling both fact-based and unanswerable questions.
Key Features:
• Instruction-tuned, capable of handling a wide variety of NLP tasks.
• Fine-tuned on the SQuAD 2.0 dataset, which includes unanswerable questions.
• Approximately 780 million parameters, supporting performance on complex tasks.

G. Phi-3.5 Mini (Phi-3.5 Mini Instruct SQuAD V1)
Parameters: approximately 3.8 billion parameters
Model Overview: Phi-3.5 Mini is a smaller, instruction-following model built to handle tasks that require understanding and following specific instructions. Fine-tuned on the SQuAD V1 dataset, Phi-3.5 Mini is adept at answering fact-based questions with high precision. It is well-suited for tasks that involve instruction-based QA, where the model needs to comprehend the question and extract the relevant answer.
Architecture and Training: Phi-3.5 Mini follows the instruction-following paradigm, where the model has been trained to interpret tasks as instructions and provide appropriate responses. It has been fine-tuned on the SQuAD V1 dataset, making it particularly strong in extractive question answering.
Key Features:
• Optimized for instruction-following tasks, improving response quality.
• Fine-tuned on the SQuAD V1 dataset, excelling in extractive QA.
• Approximately 3.8 billion parameters, allowing it to balance performance and efficiency.
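To make the instruction-following idea concrete, the sketch below renders one question-passage pair as a single instruction-style input whose expected reply is the 1/0 relevance label used elsewhere in this paper. The template wording and the `build_prompt` helper are hypothetical; the paper does not state which prompt format, if any, was used with Phi-3.5 Mini.

```python
def build_prompt(question: str, passage: str) -> str:
    """Render a question-passage pair as one instruction-style input.

    The wording is an assumed template, not the authors' prompt.
    """
    return (
        "Instruction: Decide whether the passage answers the question. "
        "Reply with 1 (relevant) or 0 (not relevant).\n"
        f"Question: {question.strip()}\n"
        f"Passage: {passage.strip()}\n"
        "Answer:"
    )


# Placeholder texts; real inputs would be a translated question and a passage.
prompt = build_prompt("  Example question?  ", "Example passage text.")
```

Placing the question and the passage in the same input string mirrors the joint processing that a cross-encoder performs, as opposed to encoding the two texts separately.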
H. ELECTRA Large (ELECTRA Large SQuAD2)
Parameters: 335 million parameters
Model Overview: ELECTRA is a pre-trained transformer model that uses a discriminative pre-training method (replaced token detection) to produce more efficient and effective models. The ELECTRA Large SQuAD2 model has been fine-tuned on the SQuAD 2.0 dataset, which includes both answerable and unanswerable questions, making it suitable for complex QA tasks.
Architecture and Training: ELECTRA uses a generator-discriminator architecture. Instead of predicting masked tokens as in BERT-style masked language modeling, ELECTRA trains a discriminator to distinguish real tokens from fake ones generated by a generator model. This method allows ELECTRA to achieve high performance with fewer parameters compared to other models like BERT. The model is fine-tuned on the SQuAD 2.0 dataset, making it highly capable of handling nuanced QA tasks.
Key Features:
• Efficient pre-training technique, leading to faster convergence and better generalization.
• Fine-tuned on SQuAD 2.0, handling both answerable and unanswerable questions.
• 335 million parameters, providing a good trade-off between performance and computational efficiency.

I. Explanation of the Workflow Diagram
Figure 2 illustrates the end-to-end workflow for fine-tuning and retrieval tasks in our Quran QA system. The diagram shows how the input question (Q-Text) is processed along with positive and negative passages to generate the final answer during the fine-tuning phase.
1. Question-Text (Q-Text): In the training phase, the input question is provided as part of the dataset. This is a question from the Quran QA dataset that the model is being trained to answer from the Quranic passages.
2. Positive and Negative Passages: The model uses passages from the Quran that are either relevant (positive passages) or irrelevant (negative passages) to the given question. These passages are labeled with "1" indicating relevance and "0" indicating irrelevance during training. This helps the model learn how to distinguish relevant passages.
3. Randomizer and Combined Data: The system combines both positive and negative passages using a randomizer. The purpose of this step is to create a balanced training set where positive and negative passages are mixed and ready to be used for model fine-tuning.
4. Cross-Encoder Fine-Tuning: The combined data, with the relevant labels, is passed into a large language model (LLM) using the cross-encoder architecture. During this phase, models like Falcon 7B, Bloom 3B, Phi 3.5 Mini, T5, and ELECTRA are fine-tuned using the Quran QA dataset. These models are pre-trained on other datasets, but they are specifically fine-tuned on this task to improve their ability to retrieve the most relevant passages for a given question.
5. LLM Output and Retrieval: The fine-tuned model generates output in the form of relevance scores for each passage based on its understanding of the relationship between the input question and the passages. This allows the system to rank the passages and select the most relevant ones.

[Figure 2 flow: Q-Text -> Positive passages (relevance 1, e.g. Passage1, Passage3) and Negative passages (relevance 0, e.g. Passage6, Passage14) -> Randomizer -> Combined Data for each question (e.g. Passage3: 1, Passage6: 0) -> LLM using Cross-Encoder (also fed by Test Cases) -> LLM Output -> Retrieval -> Answers (e.g. Passage2, Passage5, Passage9)]
Fig. 2. Workflow for LLM training

IV. RESULTS
The evaluation results demonstrate the impact of dataset expansion, question translation, and model fine-tuning on improving the performance of Question Answering systems for the Holy Quran. This section discusses the results presented in Tables 1 through 4,
which compare the performance of the models based on MAP@10, MRR, Recall@10, and Precision@10 for our fine-tuned approach.

A. Model Performance: MAP@10
Table 1 compares the Mean Average Precision (MAP@10) of each model after fine-tuning. Electra-Large achieved the best performance among the models, with MAP@10 reaching 0.31, showcasing the effectiveness of fine-tuning and dataset expansion. Other models, such as Flan-T5 and Falcon, also achieved significant improvements.

TABLE I
COMPARISON OF MULTIPLE MODEL VERSIONS BASED ON MAP@10 EVALUATION METRICS.

Model          Baseline  Ours
Electra-Large  0.04      0.31
Bloom          0.04      0.14
Flan-T5        0.01      0.26
Falcon         0.03      0.26

B. Model Performance: MRR
Table 2 presents the Mean Reciprocal Rank (MRR) results for each model. Electra-Large achieved the highest MRR, reaching 0.43.

TABLE II
COMPARISON OF MULTIPLE MODEL VERSIONS BASED ON MRR EVALUATION METRICS.

Model          Baseline  Ours
Electra-Large  0.11      0.43
Bloom          0.14      0.24
Flan-T5        0.07      0.35
Falcon         0.10      0.40

C. Recall Performance
Table 3 compares the recall performance of the models at Recall@5 and Recall@10. Electra-Large showed a significant improvement, with Recall@5 reaching 0.34 and Recall@10 reaching 0.46.

TABLE III
COMPARISON OF MULTIPLE MODEL VERSIONS BASED ON RECALL@5 AND RECALL@10 EVALUATION METRICS.

Model          Rec@5 Base  Rec@5 Ours  Rec@10 Base  Rec@10 Ours
Electra-Large  0.05        0.34        0.08         0.46
Bloom          0.07        0.25        0.15         0.29
Flan-T5        0.01        0.33        0.08         0.38
Falcon         0.05        0.32        0.08         0.40

D. Precision Performance
Table 4 shows precision performance at Precision@5 and Precision@10. Electra-Large demonstrated the highest improvement, with Precision@5 rising to 0.21 and Precision@10 rising to 0.19.

TABLE IV
COMPARISON OF MULTIPLE MODEL VERSIONS BASED ON PRECISION@5 AND PRECISION@10 EVALUATION METRICS.

Model          Prec@5 Base  Prec@5 Ours  Prec@10 Base  Prec@10 Ours
Electra-Large  0.05         0.21         0.04          0.19
Bloom          0.07         0.14         0.07          0.10
Flan-T5        0.02         0.20         0.03          0.14
Falcon         0.04         0.20         0.04          0.12

V. ACKNOWLEDGMENT
Heartfelt gratitude is extended to AiTech AU, AiTech for Artificial Intelligence and Software Development (https://aitech.net.au), for funding this research, providing technical support, and enabling its successful completion.

VI. CONCLUSION
This study presents a significant advancement in improving Quranic passage retrieval for Question Answering systems by fine-tuning pretrained Large Language Models on an expanded dataset of 1,895 questions, including translated and diverse queries. Models such as Electra-Large, Flan-T5, Falcon, and Bloom were fine-tuned using transfer learning techniques, yielding substantial improvements in key performance metrics like MAP@10, MRR, Recall, and Precision.
The results highlight the critical role of dataset expansion and model architecture in building robust QA systems. Electra-Large achieved the best overall performance, with MAP@10 reaching 0.31, MRR at 0.43, and Recall@10 at 0.46. Other models, such as Flan-T5 and Falcon, also demonstrated significant gains across all metrics, showcasing the effectiveness of fine-tuning strategies. These improvements ensure more accurate and contextually relevant responses to user queries, significantly enhancing the reliability of the QA system.
Future work will explore leveraging cutting-edge advancements, such as multi-modal learning and further dataset diversification, to improve model performance and generalizability. This study contributes to Natural Language Processing research while providing a powerful tool for Muslims worldwide, facilitating more effective and accurate engagement with the Holy Quran.
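For readers who want to reproduce the kind of figures reported in Section IV, the following sketch shows one common way to compute Precision@k, Recall@k, reciprocal rank, and average precision at k for a single question; MRR and MAP@10 are then the means of the last two over all questions. The function names and passage identifiers are illustrative. Two points are assumptions rather than details taken from the paper: average precision is normalized by min(|relevant|, k), and a question with no relevant passage scores 0, whereas the official shared-task scorer may treat zero-answer questions differently.

```python
from typing import List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked passages that are relevant."""
    return sum(1 for p in ranked[:k] if p in relevant) / k


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant passages that appear in the top k."""
    if not relevant:
        return 0.0  # assumption: zero-answer questions score 0 here
    return sum(1 for p in ranked[:k] if p in relevant) / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant passage, or 0 if none is retrieved."""
    for rank, p in enumerate(ranked, start=1):
        if p in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Mean of precision values at each relevant hit within the top k."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, p in enumerate(ranked[:k], start=1):
        if p in relevant:
            hits += 1
            total += hits / rank
    # Assumed normalization; other conventions divide by len(relevant).
    return total / min(len(relevant), k)


# Toy ranking with invented passage ids (assumed unique within the list).
ranked = ["P3", "P6", "P1", "P9"]
relevant = {"P1", "P3"}
p_at_2 = precision_at_k(ranked, relevant, 2)          # one hit in the top 2
r_at_2 = recall_at_k(ranked, relevant, 2)             # one of two relevant found
rr = reciprocal_rank(ranked, relevant)                # first hit at rank 1
ap_at_4 = average_precision_at_k(ranked, relevant, 4) # (1/1 + 2/3) / 2
```

In this toy case the relevant passages sit at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2, roughly 0.83, while Precision@2 and Recall@2 are both 0.5.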
REFERENCES
[1] Alawwad, H., Alawwad, L., Alharbi, J., Alharbi, A.: AHJL at Qur'an QA 2023 shared task: Enhancing passage retrieval using sentence transformer and translation. In: Proceedings of ArabicNLP 2023, pp. 702–707 (2023)
[2] Aljamel, A., Khalil, H., Aburawi, Y.: Comparative study of fine-tuned BERT-based models and RNN-based models. Case study: Arabic fake news detection. The International Journal of Engineering and Information Technology (IJEIT) 12(1), 56–64 (2024)
[3] Antoun, W., Baly, F., Hajj, H.: AraBERT: Transformer-based model for Arabic language understanding. arXiv preprint arXiv:2003.00104 (2020). URL https://arxiv.org/abs/2003.00104
[4] Antoun, W., Baly, F., Hajj, H.: AraELECTRA: Pre-training text discriminators for Arabic language understanding. In: Proceedings of the Sixth Arabic Natural Language Processing Workshop, pp. 191–195. Association for Computational Linguistics (2021). URL https://aclanthology.org/2021.wanlp-1.21/
[5] Ashor, Q.: 1000 QAs from the Holy Qur'an. Noor Book (2023). URL https://quranpedia.net/book/451/1/259
[6] Clark, J., et al.: TyDi QA: A benchmark for information-seeking question answering in typologically diverse languages. Transactions of the ACL (2020)
[7] Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4171–4186. Association for Computational Linguistics (2019). URL https://aclanthology.org/N19-1423/
[8] Elkomy, M.: Quran QA 2022 dataset (2022). GitHub Repository
[9] Elkomy, M., Sarhan, A.: TCE at Qur'an QA 2023 shared task: Low resource enhanced transformer-based ensemble approach for Qur'anic QA. In: Proceedings of ArabicNLP 2023, pp. 728–742. Association for Computational Linguistics, Singapore (Hybrid) (2023)
[10] Essam, M., Deif, M., Elgohary, R.: Deciphering Arabic question: A dedicated survey on Arabic question analysis methods, challenges, limitations and future pathways. Artificial Intelligence Review 57(9), 1–37 (2024)
[11] Qur'anic Botanic Garden: List of plants citation in Quran and Hadith v5.pdf (2024)
[12] Hamdi, A., Shaban, K., Zainal, A.: A review on challenging issues in Arabic sentiment analysis. Journal of Computer Science (2016)
[13] Hamdi, A., Shaban, K., Zainal, A.: Clasenti: A class-specific sentiment analysis framework. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 17(4), 1–28 (2018)
[14] Hillman, J., Baydoun, E.: Quality assurance and relevance in academia: A review. Springer (2019)
[15] ImruQays: Quran-classical-arabic-english parallel texts dataset on Hugging Face (2024). URL https://huggingface.co/datasets/ImruQays/Quran-Classical-Arabic-English-Parallel-texts
[16] Inoue, G., Habash, N.: CAMeLBERT: A language model for Arabic. In: Proceedings of the Sixth Arabic Natural Language Processing Workshop, pp. 270–278. Association for Computational Linguistics (2021). URL https://aclanthology.org/2021.wanlp-1.29/
[17] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019). URL https://arxiv.org/abs/1907.11692
[18] Liu, Z., Li, Y., Chen, N., Wang, Q., Hooi, B., He, B.: A survey of imbalanced learning on graphs: Problems, techniques, and future directions. arXiv preprint arXiv:2308.13821 (2023)
[19] Mahmoudi, G., Eetemadi, S., Morshedzadeh, Y.: A multi-task transfer learning approach for Qur'an-related question answering. In: Proceedings of the First Arabic Natural Language Processing Conference (ArabicNLP 2023). ACL Anthology (2023)
[20] Malhas, M., et al.: Qur'an QA 2023 shared task: Overview of passage retrieval and reading comprehension tasks over the Holy Qur'an. In: ArabicNLP-WS 2023, pp. 1–13. Association for Computational Linguistics (2023)
[21] Malhas, R.: Arabic question answering on the Holy Qur'an. Ph.D. thesis (2023)
[22] Mobassir: Quran QA dataset on Kaggle (2024). URL https://www.kaggle.com/datasets/mobassir/quranqa/code
[23] Qamar, F., Latif, S., Latif, R.: A benchmark dataset with larger context for non-factoid question-answering over Islamic text. Preprint submitted to Elsevier (2024)
[24] Rashad, M.: Quran-tafseerbook dataset on Hugging Face (2024)
[25] Sardar, Z.: Reading the Qur'an: The contemporary relevance of the sacred text of Islam. Oxford University Press (2017)
[26] Sun, L., Xia, C., Yin, W., Liang, T., Yu, P.S., He, L.: Mixup-transformer: Dynamic data augmentation for NLP tasks. arXiv preprint arXiv:2010.02394 (2020)
[27] Zheng, H., Shen, L., Tang, A., Luo, Y., Hu, H., Du, B., Tao, D.: Learn from model beyond fine-tuning: A survey. arXiv preprint arXiv:2310.08184 (2023)