1. Introduction
Research around building a healthcare agent for Hajj pilgrims is essential due to the healthcare challenges faced during the event. Hajj draws millions of Muslims from around the world, leading to a massive congregation in a confined space. This congregation presents significant health challenges, including the risk of infectious diseases spreading rapidly due to close proximity and crowded conditions, as well as the potential for heat-related illnesses, accidents, and other medical emergencies [
1,
2,
3,
4,
5].
A healthcare agent customized for Hajj pilgrims can address several critical needs. Firstly, it will provide pilgrims with accurate and up-to-date information on preventive measures, such as vaccination requirements, hygiene practices, and crowd management strategies, to minimize the risk of disease transmission. Secondly, it will offer guidance on managing common health issues encountered during the pilgrimage, such as dehydration, heatstroke, and musculoskeletal injuries. Additionally, the healthcare agent will facilitate access to medical assistance by providing information on nearby healthcare facilities, emergency contacts, and virtual consultations with healthcare professionals. This highlights the importance of a virtual agent in this context. Given the sheer scale of pilgrims and their diverse backgrounds and languages, a virtual agent offers a scalable and easy-to-access solution to deliver healthcare information and support. Pilgrims can access the virtual agent via their smartphones or other devices, allowing for widespread dissemination of critical health-related guidance. This accessibility is particularly valuable in emergency situations when immediate access to reliable healthcare information and assistance can be life-saving.
The advent of LLMs has revolutionized various natural language processing (NLP) tasks, ranging from text generation to question answering. With the growing complexity of human–computer interactions, there is an increasing demand for LLMs that are not only powerful but also finely tuned to specific domains and trustworthy. LLMs have shown immense value in the medical field. Their uses span from medical writing and documentation to medical education. With their advanced capabilities, they can analyze data thoroughly, assisting in translational medicine and drug development. Additionally, LLMs improve tasks like medical reporting, diagnostics, and treatment planning, resulting in a better overall patient experience [
6,
7]. Following the global pandemic, significant advancements have been observed in the implementation of medical chatbots as conversational agents for patients. Traditionally, these chatbots were designed to handle specific tasks such as answering user queries on specific medical subjects by using a pre-defined database and incorporating user feedback to enhance their responses [
8].
GPT-3 is one of the LLMs that is built upon the Transformer architecture, which was introduced by Vaswani et al. in the paper “Attention is All You Need” [
9]. This architecture relies on self-attention mechanisms to capture long-range dependencies in sequences efficiently. GPT employs an unaltered Transformer decoder, distinguishing itself by the absence of an encoder attention component [
10]. This distinction is evident in the visual representations provided in the diagrams above. Unlike BERT, which utilizes Transformer encoder blocks, GPT, GPT-2, and GPT-3 are constructed using Transformer decoder blocks. Notably, GPT-3 underwent training with extensive Internet text datasets totaling 570 GB, marking it as the most substantial neural network upon its release, boasting an impressive 175 billion parameters, a hundredfold increase from GPT-2. GPT-3 comprises 96 attention blocks, each housing 96 attention heads. As Transformers do not inherently understand the order of tokens, positional embeddings are added to the token embeddings to give the model information about the position of each token. This helps the model understand the order of tokens in a sequence. The attention mechanism which is used in Transformers is called scaled dot-product attention.
The retrieval-augmented generation (RAG) module can be utilized for uncertainty validation. The RAG module serves two primary functions:
Knowledge retrieval: When the model encounters uncertain or ambiguous input, the RAG module retrieves relevant knowledge from specific external resources. This retrieval process enables the model to augment its understanding of the topic at hand and generate more informed responses.
Validation of uncertain text: After retrieving relevant knowledge, the RAG module validates the uncertain text generated by the GPT-3.5 Turbo model against the retrieved information. By cross-referencing the model’s output with external knowledge sources, the RAG module assesses the accuracy and credibility of the generated text, identifying and correcting any inaccuracies or inconsistencies before finalizing the response.
The knowledge retrieval process within the RAG module involves several key steps. Firstly, the module analyzes the input text, identifying keywords, entities, and contextual cues that signal the need for additional information or clarification. Next, it formulates structured queries based on the identified context, aiming to retrieve pertinent knowledge vectors from the inherited databases. These queries are carefully crafted to extract relevant information aligned with the topic at hand, ensuring that the retrieved knowledge is both informative and contextually appropriate.
Research conducted in the field of health promotion and communication consistently underscores the effectiveness of strategic messaging and communication campaigns in driving awareness and promoting positive health behavior among different populations [
11]. By leveraging technology like a medical chatbot, healthcare providers can efficiently manage the influx of inquiries and deliver prompt assistance to those in need. This automated system can contribute to enhancing the overall healthcare experience during mass gatherings like Hajj, ensuring a safer and more organized environment. When individuals have access to a reliable and credible source of information, they are more likely to trust and utilize the health-related information provided by that source. This, in turn, increases their likelihood of effectively reducing potential health threats. A thorough investigation was carried out, involving 280 pilgrims from 28 different countries, in order to assess the perception of health risks associated with the Hajj pilgrimage [
12]. The results of this study demonstrate a decline in the awareness level among pilgrims when it comes to these risks [
13]. Hence, it emphasizes the immediate need for a thorough strategy that includes increasing awareness, implementing surveillance systems, enforcing hygiene standards, providing healthcare services, and promoting international cooperation [
14].
The objective of this study is to present a medical chatbot that specifically caters to the needs of Hajj pilgrims. By using real-world data and integrating a synthetic dataset, an AI-powered multilingual chatbot is developed. This chatbot effectively interacts with pilgrims, providing them with medical advice and addressing their common inquiries. For pilgrims, having access to accurate and current medical information is essential for providing reliable suggestions and treatment options. The data used are sourced from trustworthy sources and are consistently updated. This not only saves time for healthcare providers but also improves the overall experience for pilgrims. The LLM’s advanced natural language understanding capabilities enable it to offer pilgrims highly accurate and relevant information. The contributions of the presented work can be summarized as follows:
Domain-specific fine-tuning of LLM: We fine-tune a large language model (LLM) specifically for the domain of healthcare and cultural sensitivities relevant to Hajj pilgrims. This fine-tuning process ensures that the model is capable of understanding and generating relevant responses within the context of healthcare conversations during the pilgrimage.
Introducing the HajjHealthQA dataset: To facilitate the development and evaluation of our healthcare chatbot, we introduce the HajjHealthQA dataset. This dataset contains a diverse collection of questions, answers, and conversations relevant to healthcare issues faced by Hajj pilgrims. We also employ synthetic data augmentation techniques (
https://github.com/AbeerMostafa/HajjHealthQA-Dataset (accessed on 1 March 2024)).
RAG module for uncertainty validation: We add a retrieval-augmented generation (RAG) module to validate uncertain information provided by the chatbot. This mechanism enhances the reliability and accuracy of the chatbot’s responses by cross-referencing generated text with external knowledge sources.
Training a secondary AI agent on the HealthVer dataset: We train two separate models as part of our framework, one on the HajjHealthQA dataset for Hajj-specific healthcare inquiries and another on the HealthVer dataset for medical information verification. The latter is used to verify that the medical information generated by our chatbot is supported by medical evidence.
Prompt engineering for case study specifics: We employ prompt engineering techniques tailored to the specific case study of building a healthcare chatbot for Hajj pilgrims. This ensures that the chatbot’s responses are optimized for relevance, accuracy, and cultural appropriateness within the context of Hajj-related healthcare scenarios.
Multilingual support: To accommodate the linguistic diversity of Hajj pilgrims, our chatbot offers multilingual support, allowing users to interact in their preferred language.
3. HajjHealthQA Dataset
The success of any healthcare chatbot hinges on the quality and diversity of its dataset. In the context of assisting pilgrims during Hajj, a critical and challenging event that draws millions of people annually, it is paramount to have a robust dataset that encompasses real-world medical scenarios and synthetic data to enhance the chatbot’s performance. In this section, we provide a detailed overview of the two datasets we collected and used in our study, highlighting their sources, characteristics, and the rationale behind their inclusion. The datasets are publicly available at (
https://github.com/AbeerMostafa/HajjHealthQA-Dataset (accessed on 1 March 2024)).
The HajjHealthQA dataset has been obtained from three primary sources: the Ministry of Health (MOH) in the Kingdom of Saudi Arabia (KSA) [
28], the World Health Organization (WHO) [
29], and the Ministry of Hajj and Umrah (MOHU) in KSA [
30]. These sources were accessed on 1 November 2023, as mentioned in the References section. Of the data, 60% are retrieved Q&A from the MOH, 20% from the WHO, and the remaining 20% from the MOHU. The HajjHealthQA dataset includes frequently asked and common questions from users, along with corresponding authoritative answers retrieved from these reputable resources. The MOH resource is the official website of the Ministry of Health in Saudi Arabia, a governmental organization subsidized and funded by the Saudi government. The MOHU portal is a verified website for the Ministry of Hajj and Umrah, a government agency responsible for facilitating the procedures for performing the rituals of Hajj and Umrah in KSA. The WHO, on the other hand, is the United Nations agency that connects nations, partners, and people to promote health, keep the world safe, serve the vulnerable, and spearhead international public health efforts.
The primary foundation of our dataset is comprised of real-world medical questions and answers sourced from reputable platforms, including the official website of the Ministry of Health in Saudi Arabia [
28] and various other trusted healthcare websites [
29,
31,
32]. This dataset is a reflection of the actual health concerns and inquiries that pilgrims may encounter during their Hajj journey. To ensure the authenticity and reliability of the data, we employed a systematic approach to curate the real medical questions and answers. Web scraping techniques using Python version 3.10.12 were applied to extract information from the official Ministry of Health website and other credible health platforms. Only verified and authoritative sources were used to compile a diverse range of questions related to common health issues faced by pilgrims during Hajj.
The real-world medical dataset comprises a vast array of health-related queries, covering topics such as preventive measures, vaccination requirements, common ailments, and emergency protocols [
28,
33,
34]. Each question is paired with its corresponding authoritative answer, often sourced directly from healthcare professionals or government health agencies. The inclusion of real-world medical data serves to ground the chatbot’s knowledge in the practical concerns of pilgrims. By drawing on actual questions posed by individuals preparing for or participating in Hajj, the chatbot can offer relevant and accurate information tailored to the specific health challenges associated with this religious pilgrimage.
Figure 1 shows examples of the collected real-world Q&A data.
In addition to real-world data, we incorporated a synthetic dataset generated using ChatGPT, a powerful language model developed by OpenAI. This dataset was designed to supplement the real medical questions and answers, providing a broader spectrum of potential queries and responses that may not be covered by the real dataset alone. Our synthetic dataset was created by prompting ChatGPT with healthcare-related questions specific to the context of Hajj. We first extracted the topics and general themes from the real questions asked by users. Based on these insights, we then formulated synthetic data that accurately reflected these themes. The model’s responses were then used to generate a diverse set of synthetic Q&A pairs.
Figure 2 shows examples of synthetic Q&A data. This process allowed us to explore hypothetical scenarios, address niche concerns, and anticipate questions that may not have been explicitly addressed in the real medical dataset. The synthetic dataset enhances the chatbot’s versatility by introducing a wide range of hypothetical medical scenarios, preventive measures, and nuanced inquiries that pilgrims might have. It complements the real dataset, providing a more comprehensive knowledge base for the chatbot to draw upon when assisting users.
Our curated dataset comprehensively addresses a spectrum of health concerns prevalent during the Hajj pilgrimage, drawing insights from research papers identifying the most common diseases [
1,
2,
3,
4,
5]. It encompasses detailed Q&A pairs covering respiratory diseases, pneumonia, influenza, asthma, sunlight effects, cardiovascular diseases, heart diseases, heat strokes, skin diseases, and meningococcal diseases. Recognizing the crowded conditions and environmental factors during Hajj, our dataset provides valuable information on respiratory diseases, pneumonia, and influenza, offering guidance on prevention and intervention. For asthma, tailored advice is given to manage and prevent attacks in the pilgrimage setting. The dataset also delves into the impact of prolonged sunlight exposure and addresses associated health risks. With a focus on cardiovascular diseases, heart diseases, and the potential for heat strokes, the dataset equips pilgrims with knowledge on prevention and management. Skin-related queries cover sun protection, hygiene practices, and managing skin conditions during the pilgrimage. Lastly, preventive measures against meningococcal diseases, including vaccination information and symptom awareness, are integrated into the dataset. This collective information ensures that the healthcare chatbot is well equipped to provide nuanced guidance on a wide array of health challenges faced by pilgrims during Hajj.
4. Methodology
In crafting a robust methodology for the development of our healthcare chatbot tailored to the needs of Hajj pilgrims, a strategic approach was undertaken. In this section, we describe the methodology employed in our research, detailing the process of fine-tuning GPT-3.5 Turbo and Llama 3 for domain-specific healthcare applications, utilizing the RAG module for uncertainty validation, evaluating text generation using quality metrics, and training a different model on a specialized dataset for medical information verification.
Our methodology workflow, as illustrated in
Figure 3, begins with the step of model fine-tuning, where we adapt the LLM to the domain of healthcare communication customized for Hajj pilgrims. This process involves iterative cycles of hyperparameter tuning to optimize the model’s performance. Following model fine-tuning, we address the aspect of medical information validation within the generated text. Initially, we add the retrieval-augmented generation (RAG) module, which retrieves knowledge from databases inherited from reputable websites such as the World Health Organization (WHO), the Ministry of Health in the Kingdom of Saudi Arabia (KSA), and the Ministry of Hajj and Umrah. This ensures that our responses are grounded in reliable medical information. Subsequently, we employ another chatbot trained on the HealthVer dataset [
35] to verify the accuracy and consistency of our responses against established medical evidence. This dual-validation approach enhances the reliability and trustworthiness of our chatbot’s output. Additionally, we incorporate prompt engineering techniques to further refine and optimize the relevance and specificity of our responses within the context of Hajj-related healthcare inquiries. Through this comprehensive workflow, we aim to develop a robust and dependable healthcare chatbot capable of providing accurate and evidence-based support to pilgrims.