Preprint 70005 Submitted

This document outlines a protocol for an exploratory survey aimed at crowdsourcing a training dataset of question-and-answer pairs related to sexually transmitted infections (STIs) to enhance AI-enabled health information tools. The study seeks to develop an open-access dataset that is contextualized for Sub-Saharan Africa, where access to accurate health information is limited. Data collection began in June 2024 and is ongoing, with the final dataset expected to be published in 2025 to improve sexual health literacy and health-seeking behaviors in the region.

Crowdsourcing a Training Dataset of Question-and-Answer Pairs for AI-Enabled Health Information Tools on Sexually Transmitted Infections:
Protocol for an Exploratory Survey (Prepri...

Preprint · December 2024


DOI: 10.2196/preprints.70005


JMIR Preprints Oseku et al

Crowdsourcing a Training Dataset of Question-and-Answer Pairs for AI-Enabled Health Information Tools
on Sexually Transmitted Infections: Protocol for an Exploratory Survey

Elizabeth Oseku, Petra Kerubo Mariaria, Henry Semakula, Clare Allelua Kahuma,
Martin Balaba, Agnes Bwanika Naggirinya, Rachel Lisa King, Rosalind Parkes-Ratanshi

Submitted to: JMIR Research Protocols
on: December 16, 2024

Disclaimer: © The authors. All rights reserved. This is a privileged document currently under peer-review/community
review. Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for
review purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this
stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

https://preprints.jmir.org/preprint/70005 [unpublished, non-peer-reviewed preprint]


Table of Contents

Original Manuscript
Supplementary Files
Figures
  Figure 1
  Figure 2

Elizabeth Oseku¹, MBChB, MSc PH; Petra Kerubo Mariaria¹, BSc, MPhil; Henry Semakula¹, BIT, EMoS; Clare Allelua
Kahuma¹, BSSE; Martin Balaba¹, MBChB, PGDip; Agnes Bwanika Naggirinya¹, MBChB, MSc, MMED; Rachel Lisa
King¹,², BA, MPH, PhD; Rosalind Parkes-Ratanshi¹,³, MBBS, MA, DFSRH, DipGUM, PhD
¹ Academy for Health Innovation Uganda, Infectious Diseases Institute, Makerere University, Kampala, UG
² Department of Epidemiology and Biostatistics, Institute for Global Health Sciences, University of California San Francisco, US
³ Department of Psychiatry, University of Cambridge, Cambridge, GB

Corresponding Author:
Elizabeth Oseku MBChB, MSc PH
Academy for Health Innovation Uganda
Infectious Diseases Institute
Makerere University
P.O. Box 22418
Kampala
UG

Abstract

Background: Sexually transmitted infections (STIs) are a significant public health concern, particularly in Sub-Saharan Africa,
where their prevalence remains high. Promoting awareness and reducing stigma are essential strategies for addressing this
challenge, but those affected often have limited access to accurate and culturally appropriate health information. Innovative
solutions are therefore essential to enhance sexual health literacy and encourage informed health-seeking behaviors. AI-enabled
tools like chatbots have emerged as promising avenues for delivering accurate and accessible health information. However, their
potential is constrained by the lack of contextualized datasets, which are crucial for ensuring their effectiveness and relevance to
diverse populations.
Objective: This study therefore aims to develop an open-access, contextualized dataset of question-and-answer pairs on sexual
health and STIs to support development and training of digital and AI-enabled health information tools.
Methods: Using a crowdsourcing approach, questions are being collected from participants aged 15 years and older via online
platforms, paper-based submissions, and in-person interactions at public events across Sub-Saharan Africa. Each question will be
anonymized and reviewed by medical professionals, who will provide accurate, evidence-based answers. The dataset will then
undergo processing, including cleaning and tagging for AI training, ensuring adherence to FAIR principles. The final dataset will
be published as open access.
Results: Data collection began on 12th June 2024 and is ongoing. The data collection process was piloted in Kigali. Data is
undergoing cleaning and processing to enhance its utility for AI applications. The final dataset will be published as open access
in 2025, contributing to the development of AI-driven health tools and promoting public health literacy.
Conclusions: This study represents a significant step toward developing accessible evidence-based health information tools, with
the potential to increase literacy levels on STIs and improve health-seeking behaviours. The Q&A dataset from this study will
enable the development of AI tools to address critical gaps in sexual health education thus fostering informed decision-making.
(JMIR Preprints 16/12/2024:70005)
DOI: https://doi.org/10.2196/preprints.70005

Original Manuscript

Original Paper

Crowdsourcing a Training Dataset of Question-and-Answer Pairs for AI-Enabled
Health Information Tools on Sexually Transmitted Infections: Protocol for an
Exploratory Survey
Elizabeth Oseku¹, MBChB, MSc PH; Petra Kerubo Mariaria¹, BSc, MPhil; Henry Semakula¹, BIT,
EMoS; Clare Kahuma Allelua¹, BSSE; Martin Balaba¹, MBChB, PGDip; Agnes Bwanika
Naggirinya¹, MBChB, MSc, MMED; Rachel Lisa King¹,², BA, MPH, PhD; Rosalind
Parkes-Ratanshi¹,³, MBBS, MA, DFSRH, DipGUM, PhD.
¹ Academy for Health Innovation Uganda, Infectious Diseases Institute, Makerere University,
Kampala, Uganda
² Department of Epidemiology and Biostatistics, Institute for Global Health Sciences, University of
California, San Francisco, California, USA
³ Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom

Corresponding Author:
Elizabeth Oseku, MBChB, MSc PH
Academy for Health Innovation Uganda
Infectious Diseases Institute
Makerere University
Kampala, Uganda
Email: [email protected]
Phone: +256782048875
ABSTRACT

Background: Sexually transmitted infections (STIs) are a significant public health concern,
particularly in Sub-Saharan Africa, where their prevalence remains high. Promoting awareness and
reducing stigma are essential strategies for addressing this challenge, but those affected often have
limited access to accurate and culturally appropriate health information. Innovative solutions are
therefore essential to enhance sexual health literacy and encourage informed health-seeking
behaviors. AI-enabled tools like chatbots have emerged as promising avenues for delivering accurate
and accessible health information. However, their potential is constrained by the lack of
contextualized datasets, which are crucial for ensuring their effectiveness and relevance to diverse
populations.

Objective: This study therefore aims to develop an open-access, contextualized dataset of question-
and-answer pairs on sexual health and STIs to support development and training of digital and AI-
enabled health information tools.

Methods: Using a crowdsourcing approach, questions are being collected from participants aged 15
years and older via online platforms, paper-based submissions, and in-person interactions at public
events across Sub-Saharan Africa. Each question will be anonymized and reviewed by medical
professionals, who will provide accurate, evidence-based answers. The dataset will then undergo
processing, including cleaning and tagging for AI training, ensuring adherence to FAIR principles.
The final dataset will be published as open access.

Results: Data collection began on 12th June 2024 and is ongoing. The data collection process was
piloted in Kigali. Data is undergoing cleaning and processing to enhance its utility for AI
applications. The final dataset will be published as open access in 2025, contributing to the
development of AI-driven health tools and promoting public health literacy.

Keywords: Sexually Transmitted Infections; Artificial Intelligence; Health Information; Dataset;
Crowdsourcing

INTRODUCTION

Background and Rationale

The World Health Organization (WHO) defines sexual health as a state of physical, emotional,
mental and social well-being in relation to sexuality; it is not merely the absence of disease,
dysfunction or infirmity [1].
Worldwide, more than 1 million Sexually Transmitted Infections (STIs) are acquired every day. In
2020, the WHO estimated that there were 374 million new infections among people aged 15-49 years with 1 of 4
STIs: chlamydia (129 million), gonorrhoea (82 million), syphilis (7.1 million) and trichomoniasis
(156 million) [2]. Viral STIs such as Herpes Simplex and Human Papillomavirus are also highly
prevalent. STIs can cause significant morbidity and mortality: they impair fertility, increase the
risk of Human Immunodeficiency Virus (HIV) infection, cause cancer, and lead to pregnancy
complications and even newborn morbidity [2]. These effects are
compounded by stigmatization.
The African Region is particularly affected, with a high prevalence of these infections [3]. The
burden of STIs in Sub-Saharan Africa (SSA) is high and increasing [4,5], accounting for
approximately 40% of the global burden [6], with an incidence rate of 241 per 1000 among adults
aged 15–49 years [7,8]. In Uganda, the 2016 Demographic and Health Survey found that 24% of
women and 14% of men (all aged 15-49 years) reported having had symptoms of an STI in the preceding 12
months [9]. Factors contributing to this high prevalence include lack of awareness,
stigma surrounding STIs, and poor access to medical care. Young adults are most vulnerable to
infection because they engage in risky practices and lack adequate knowledge of STIs [10]. Young
people in particular face considerable challenges in accessing Sexual and Reproductive Health (SRH)
services due to stigma [11].
As a key strategic action for primary prevention of STIs, WHO recommends the provision of
comprehensive, accurate and culturally relevant information and education that promotes sexual
health and wellbeing [12]. Information, education and counselling are effective approaches to
improve health-seeking behaviour for STIs since people become better able to recognize the signs
and symptoms of disease [2].
Rationale

In recent years, internet access in Africa has been expanding rapidly due to improvements in
infrastructure, including increased electricity availability, and the widespread adoption of digital
technologies. The continent has seen significant growth in both internet connectivity and mobile
phone usage, creating opportunities for data-driven strategies and innovations. According to the
GSMA Mobile Economy Sub-Saharan Africa 2024 report, internet penetration in SSA rose to
27% by the end of 2023, with over 527 million mobile service subscribers (44% of the population)
across the region, a figure expected to rise in the coming years [13]. In 2023, Uganda had
11.77 million internet users (a penetration rate of 24.6%) and 2.05 million social media users
(4.3% of the population) [14]. By January 2024, the number of internet users had risen to
13.30 million, a reported increase of 1.2 million, giving a penetration rate of approximately 27%
of the total population [15]. This increasing access to the internet, combined
with the rise of social media platforms, offers an opportunity to use these digital platforms to
disseminate information about STIs on a large scale.

A chatbot is a computer program that mimics human conversation through voice or text interactions,
usually online [16,17]. Chatbots have been found to be acceptable for use in public health due to
their convenience. For example, a mixed-methods study comprising interviews and surveys,
conducted by Chang et al. in central Taiwan, found that users’ attitudes and subjective norms were
significantly and positively associated with their intentions to use medical chatbots [18]. The paper
also suggests that the use of chatbots was acceptable due to their accessibility and anonymity [18].
Additionally, a study by Miles et al. conducted among adults in the UK suggests that chatbot
acceptability might be higher for stigmatized health issues [19]. This was particularly the
case for conditions with a high level of perceived stigma, such as STIs, because chatbots offer greater
anonymity than face-to-face consultations. This was reflected in an increased willingness to
disclose sensitive health information to chatbots in comparison to healthcare workers [19]. Research
also shows that medical practitioners recommend chatbot use for the provision of medical
information. In the US, a study involving a hundred general practitioners (GPs) showed that more
than half of the physicians (54%) agreed that health chatbots could help patients better manage their
health and improve access to and timeliness of care [20].
Chatbots can understand and interact with human language through Natural Language Processing
(NLP). NLP is a branch of computer science and artificial intelligence (AI) that uses machine
learning to help computers understand and communicate using human language [21]. Chatbots need
to be trained with a knowledge base relevant to the subject area so that they can adequately respond
to the queries of users. However, access to large datasets that have been adapted to the diverse
linguistic, cultural, epidemiological, and socio-economic realities of the African continent is a
challenge [22-24]. Therefore, to harness the opportunity that medical information chatbots present as
a public health tool, there is a need to create appropriate datasets that reflect the region’s unique
needs. These datasets must include question-and-answer pairs that cover topics relevant to the
potential users, accounting for local language use, including slang and colloquial expressions,
particularly around sensitive subjects like SRH. Generally, the terminology around sex and SRH
tends to use population-specific words that change frequently and vary by age group, location,
education level, and other social factors, and people often use these words to discuss private or intimate
matters like STIs. The training datasets should also adapt to the users’ varying educational levels and
lived experiences. This will ensure the chatbots offer accessible, accurate and empathetic responses.
By capturing these nuances, the proposed dataset will contribute to the development of chatbots that
are both effective and relevant in addressing SRH issues across the continent.
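One way to picture a single entry in such a dataset is sketched below in Python. This is an illustrative assumption, not the study's actual schema: the class name `QAPair`, its fields (`question`, `answers`, `tags`, `country`) and the helper `to_jsonl` are hypothetical, and the example question and placeholder answer are invented for the sketch. It only shows how a question in the asker's own wording, its clinician-written answers, and contextual tags could be kept together and written out as one JSON object per line.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class QAPair:
    """Hypothetical record layout for one crowdsourced question (not the study's schema)."""
    question: str                                      # anonymized text, kept in the asker's own wording
    answers: List[str] = field(default_factory=list)   # answer variants written by clinicians
    tags: List[str] = field(default_factory=list)      # topic labels added during curation
    country: Optional[str] = None                      # optional context; no personal identifiers


def to_jsonl(pairs: List[QAPair]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(p), ensure_ascii=False) for p in pairs)


# Invented example; the answer is a placeholder, not medical content.
example = QAPair(
    question="Illustrative question text, possibly using local slang",
    answers=["<answer to be written by a medical professional>"],
    tags=["transmission"],
    country="UG",
)
print(to_jsonl([example]))
```

Keeping the original wording in `question` rather than a normalized paraphrase is a deliberate choice in this sketch, since the slang and colloquial phrasing is part of what the dataset is meant to capture.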
To maximize applicability in the African setting, the proposed approach for data collection for the
datasets is crowdsourcing from populations across Africa [25]. Crowdsourcing is a shared
computing method that taps into the collective knowledge and skills of people to solve problems
that are difficult for computers but easily handled by humans, such as labelling data, speech
recognition, and software development [26]. This approach has been found to be beneficial in AI
because it enables the collection of large amounts of diverse data within a short period of time,
which reduces bias and increases data richness [27,28].
While many datasets used by large language models (LLMs) or generative AI systems are not
formally verified by medical professionals, the dataset we aim to construct will be carefully
curated and validated by healthcare professionals to ensure its medical accuracy and reliability.
This is crucial when dealing with sensitive topics like sexual health and STIs, where incorrect or
misleading information could have serious consequences [29]. Crowdsourcing will allow
us to gather authentic insights directly from the populations most affected, ensuring that the data
reflects the actual concerns and questions of users, which healthcare workers may sometimes
overlook [25]. Combining crowdsourced data with expert validation is intended to ensure that chatbots
trained on the dataset deliver accurate, medically sound information. This initiative aims to make relevant information
accessible to a broader audience, thereby enhancing public health literacy.
Study Objectives and Endpoints

Objectives

The study’s general objective is to engage a wide range of people from across Africa to collect
relevant context-specific questions about sexual health, and to use evidence-based medical
knowledge to develop answers to the questions. This will therefore produce an open-access,
contextualized question-and-answer-pair dataset on sexual health in English which can be used to
train AI-enabled health information tools.
Primary Objectives
1. To collect contextualized English language questions on sexual health and STIs from the public
using the internet and public events in SSA over a period of 6 months.
2. To provide accurate, evidence-based answers, based on the WHO Guidelines, to questions
collected on sexual health and STIs from the public in English in SSA.
3. To process and curate the question-and-answer (Q&A) pairs into a training dataset for
AI-enabled sexual health information tools.
4. To provide the public with an open-access, contextualized training dataset of Q&A pairs on
sexual health and STIs in English.
Study Endpoints

Primary Endpoint
A complete question-and-answer dataset processed and curated for use in AI-enabled information
tools.
Secondary Endpoint
At least 5000 questions on sexual health collected from the public through crowdsourcing via the
internet and at public events. At least two accurate answer formats provided for each question
collected on sexual health.
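The two secondary targets are simple counts, so progress toward them could be checked mechanically. The Python sketch below is an assumption about how such a check might look, not part of the protocol: the function name `endpoint_status` and the layout of `dataset` (a mapping from each question to its list of answer variants) are hypothetical. Only the thresholds of 5000 questions and two answer formats come from the text above.

```python
from typing import Dict, List

MIN_QUESTIONS = 5000       # "at least 5000 questions" (secondary endpoint)
MIN_ANSWER_FORMATS = 2     # "at least two accurate answer formats" per question


def endpoint_status(dataset: Dict[str, List[str]]) -> Dict[str, object]:
    """Summarize progress; `dataset` maps a question to its answer variants (hypothetical layout)."""
    under_answered = [
        question for question, answers in dataset.items()
        if len(answers) < MIN_ANSWER_FORMATS
    ]
    return {
        "questions_collected": len(dataset),
        "question_target_met": len(dataset) >= MIN_QUESTIONS,
        "questions_needing_answers": under_answered,
        "answer_target_met": not under_answered,
    }


# Toy data for illustration only.
toy = {
    "Illustrative question A": ["short answer", "detailed answer"],
    "Illustrative question B": ["short answer"],
}
status = endpoint_status(toy)
print(status["questions_collected"], status["questions_needing_answers"])
```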

METHODS

Study Setting

This study will be conducted as part of a Network activity for the Hub for Artificial Intelligence for
Maternal, Sexual and Reproductive Health (HASH) [30]. As of January 2023, the HASH Network is
composed of ten subgrantees from seven African countries. These subgrantees are students, start-ups
and established organizations who won grants from HASH to research and develop Artificial
Intelligence innovations through a competitive Request for Applications. Three of these subgrantees
are developing chatbots to relay information about sexual health, in particular, STIs in different
settings in Ethiopia, Kenya and Nigeria. To improve the accuracy and usefulness of the subgrantees’
chatbots, HASH will aim to prepare and share a Q&A pair dataset. This dataset will also be made
open access for the wider AI community who may be developing AI-enabled information tools.
Study Design

This study is an exploratory survey. Data collection will be done through crowdsourcing from
members of the public, including but not limited to students, colleagues, and professionals, aged 15
years and above, recruited from different locations and through different methods to ensure diversity in responses. This will
include those who are willing to complete an online survey about sexual health, as well as those who
attend public events or venues such as conferences, clinics, meetings, and universities.
Participants will submit questions related to any aspect of sexual health, particularly STIs, through
three modes: an online platform developed on Slido (Cisco, California) [31], a widely used tool that
allows for anonymous question submission and feedback; anonymous question submission on paper;
or face-to-face interaction with a medical doctor. A web link to the online platform will be shared
with online communities for their submission. At physical public events or locations, a designated
area will be set up for question collection and a link to the online platform will also be provided. A
medical doctor will be available to provide accurate answers to questions posed via the three modes.
Figure 1 shows the Slido interface used for anonymous question submission and real-time
interaction.

Figure 1: Slido interface.

There will be no limit to the number of participants enrolled in the study to pose questions,
how many questions a participant can pose, or the number of times a participant can engage to ask
questions. All questions will be posed anonymously through the Slido platform, which ensures that
the participants’ identities are not captured when submitting questions. Slido will assign each
participant ‘Anonymous’ as their name, so their personal details remain confidential, creating an
environment for open and honest participation. Figure 2 illustrates the study design.

1. Enrollment (Total N): Screen potential participants by inclusion and exclusion criteria. Obtain
informed consent/assent via paper/Slido.
2. Submission of questions by study participants (via online portal, paper submission or
face-to-face consultation).
3. Real-time online recording of physically submitted questions and online input of
accurate evidence-based answers.
4. 6-month time point: Closure of online portal and extraction of online database of Q&A pairs.
5. Processing of database, including:
   - Co-development of variations of questions and answers by a team of doctors
   - Organization of question-and-answer pairs into usable format for chatbots
   - Publication of dataset on open access

Figure 2: Schematic of Study Design
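As a rough illustration of the processing step, the Python sketch below groups near-identical submissions so that differently worded versions of the same question can be reviewed together. It is a minimal sketch under stated assumptions: the function names `normalize` and `group_duplicates` are hypothetical, and the matching rule (ignore case, repeated whitespace and trailing punctuation) is a simplification chosen for the example, not the study's cleaning procedure.

```python
import re
from typing import Dict, List


def normalize(question: str) -> str:
    """Collapse whitespace, lowercase, and drop trailing punctuation so trivial variants compare equal."""
    text = re.sub(r"\s+", " ", question).strip().lower()
    return text.rstrip("?.! ")


def group_duplicates(questions: List[str]) -> Dict[str, List[str]]:
    """Group raw submissions by normalized form, preserving each submission's original wording."""
    groups: Dict[str, List[str]] = {}
    for raw in questions:
        key = normalize(raw)
        if key:  # skip empty submissions
            groups.setdefault(key, []).append(raw.strip())
    return groups


# Invented submissions for illustration only.
submissions = ["What is an STI?", "  what is an  STI ", "", "How is syphilis treated?"]
for key, variants in group_duplicates(submissions).items():
    print(key, "<-", variants)
```

The original wordings are kept alongside the normalized key because, in this sketch, variation in phrasing is treated as data to retain rather than noise to discard.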

Participants' Selection and Withdrawal

The following eligibility criteria are designed to select participants for whom
participation in the study is considered appropriate.
Inclusion Criteria

Participants must meet the following inclusion criteria to be eligible for enrollment into the
study:
1. Must be aged at least 15 years.
2. Must be able to speak, write and comprehend English.
3. Must be willing to give consent or assent for their question(s) on sexual health to be
anonymously discussed, responded to and shared publicly.
4. Must be willing and able to comply with the determined modes of
crowdsourcing.
5. Must provide evidence of personally signed informed consent, or assent in the case of minors,
indicating that the participant (or a legal representative) has been informed of all
pertinent aspects of the study.
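The inclusion criteria above amount to a conjunction of checks, which the following Python sketch expresses. It is illustrative only: the class `Screening`, its field names and the function `is_eligible` are hypothetical, and the constant `ADULT_AGE = 18` is an assumption about where assent gives way to consent, since the protocol text here does not state that age.

```python
from dataclasses import dataclass

MIN_AGE = 15     # inclusion criterion 1
ADULT_AGE = 18   # ASSUMPTION: below this age, assent is documented instead of consent


@dataclass
class Screening:
    """Hypothetical screening answers for one prospective participant."""
    age: int
    understands_english: bool          # criterion 2
    agrees_to_public_sharing: bool     # criterion 3
    can_use_a_collection_mode: bool    # criterion 4
    signed_consent: bool = False       # criterion 5 (adults)
    signed_assent: bool = False        # criterion 5 (minors)


def is_eligible(s: Screening) -> bool:
    """Return True only if every inclusion criterion is met."""
    documented = s.signed_assent if s.age < ADULT_AGE else s.signed_consent
    return (
        s.age >= MIN_AGE
        and s.understands_english
        and s.agrees_to_public_sharing
        and s.can_use_a_collection_mode
        and documented
    )


print(is_eligible(Screening(16, True, True, True, signed_assent=True)))
```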

Exclusion Criteria
Participants presenting with any of the following will not be included in the study:

1. People who are deaf, unable to speak or blind and concurrently unable to read and write.
2. People who are mentally impaired or very sick.
Strategies for Recruitment and Retention

Invitations for participants to contribute to crowdsourcing will be extended through
advertisement via internet media or through signposts and word of mouth at physical gatherings
such as relevant health or AI conferences, meetings, and other public events/locations. The link
to the online platform will be shared widely on the social media platforms of HASH and its
partners and networks so as to attract as many participants and questions as possible. At physical
public events or locations, a designated area with medical doctors will be set up for question
collection and advertised by signposting and word of mouth. A link to the online platform will
also be provided at the physical locations. For an informative and robust dataset, Q&A sessions
from previously held SRH conferences will be added.

Study visits will be self-directed in that after a participant provides informed consent or assent as
the preliminary step, they are eligible to return as many times as they desire to their preferred
data collection mode (online platform, paper, face-to-face consultation) to submit anonymous
questions, check answers as provided by qualified health workers, or review other participants’
anonymous questions and answers.
Enrolled participants will submit their questions through one of three modes: the online
platform, anonymous questions on paper, or face-to-face
interaction with a medical doctor. For the benefit of all participants and to centralize the data, all
questions collected outside of the online platform will also be recorded on the platform.
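Centralizing paper and face-to-face submissions on the platform can be pictured as mapping every incoming record to one common, identifier-free shape. The Python sketch below is an assumption about how that might be done, not the study's procedure: the function `to_platform_record`, the field names, and the rule of keeping only `question` and `mode` are hypothetical. The `"Anonymous"` author label mirrors the name Slido is described as assigning to participants.

```python
from typing import Dict

MODES = {"online", "paper", "face_to_face"}   # the three collection modes
ALLOWED_FIELDS = {"question", "mode"}         # ASSUMPTION: only these fields are retained


def to_platform_record(raw: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `raw` reduced to allowed fields and labelled as anonymous."""
    if raw.get("mode") not in MODES:
        raise ValueError(f"unknown collection mode: {raw.get('mode')!r}")
    if not raw.get("question", "").strip():
        raise ValueError("empty question")
    record = {key: value for key, value in raw.items() if key in ALLOWED_FIELDS}
    record["author"] = "Anonymous"
    return record


# Invented paper submission; any stray identifying fields are dropped.
paper_slip = {"question": "Illustrative question text", "mode": "paper", "name": "example name"}
print(to_platform_record(paper_slip))
```

Using an allow-list rather than deleting known identifier fields means that an unexpected field is dropped by default, which is the safer failure mode for anonymous data.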
Participants will be able to ask more than one question through whichever mode they choose and
may return multiple times to ask questions whenever they desire. Since the study aims to
promote health literacy, all participants will be able to view the answers to their questions as
provided by a qualified health worker when they visit the platform. They will also be able to
view questions and answers from other participants and contribute to discussions so as to
stimulate more questions.
Participants will not receive any financial compensation for their participation, but they will
benefit from receiving an accurate evidence-based answer to their question from a qualified
health worker. Since access to accurate STI information is the major concern of this study, every
effort will be made to give timely and accurate responses. All questions and their respective
answers will be added to the online portal to make them accessible to all participants.
There will be no follow-up of study participants and all questions will be logged anonymously.
From its launch, the online platform will be open to questions for 6 months. While participants
will be able to interact with the online platform independently, there will be contact information
on the platform and on the informed consent/assent document through which participants can
communicate with the study team in case of any challenges.
Withdrawal

Participants may withdraw from the study at any time at their own request, or they may be
withdrawn at any time at the discretion of the investigator or sponsor for safety or behavioral
reasons, or the inability of the participant to comply with the protocol-required procedures.
Reasons for Withdrawal

An investigator may terminate a study participant’s participation in the study if the participant
meets an exclusion criterion (either newly developed or not previously recognized) that
precludes further study participation.
Handling of participant withdrawals

Since there are no follow-up visits, participant withdrawal will not necessitate an effort to
contact the participant. If the participant withdraws from the study, no further evaluations should
be performed, and no additional data should be collected. The study team may retain and
continue to use any data collected before such withdrawal of consent.
However, should a participant wish to withdraw a question after submission, they are free to log
back in and remove it themselves if they entered it directly onto the online platform. If the
question was submitted on paper or through a face-to-face consultation, the participant
should contact the study team and request withdrawal of that particular question.
Assessment of Safety

Safety
Safety monitoring for this study will focus on unanticipated problems involving risks to
participants, including unanticipated problems that meet the definition of a serious adverse event
(SAE).

Unanticipated Problems
Unanticipated problems involving risks to participants are defined as any incident, experience,
or outcome that meets all of the following criteria:

● Unexpected in terms of nature, severity, or frequency given (a) the research procedures that are
described in the protocol-related documents, such as the Institutional Review Board (IRB)-approved
research protocol and informed consent/assent document; and (b) the characteristics of
the participant population being studied;

● Related or possibly related to participation in the research (possibly related means there is a
reasonable possibility that the incident, experience, or outcome may have been caused by a
procedure involved in the research); and

● Suggests that the research places participants or others at a greater risk of harm (including
physical, psychological, economic, or social harm) than was previously known or recognized.

Adverse Event Reporting

Adverse Events
As this is an exploratory survey collecting anonymous questions about sexual health, we do not
anticipate any adverse events.

Because the study does not test any investigational drug or intervention, causality assessment is
not applicable. Should any adverse events nevertheless occur, they will be reported to the IRB and
the Uganda National Council for Science and Technology (UNCST) annually in aggregate.
The investigators will generate and submit annual reports summarizing these adverse events.

Data Processing

No participant sample size has been set; instead, the study aims to create a dataset of 5,000
English Q&A pairs. All submitted questions will receive accurate evidence-based answers from
medical doctors. A team of at least two medical doctors will collaborate to develop appropriate
answers utilizing information available in the public domain from reputable medical sources.
The answers will be phrased to respond to questions comprehensively as a chatbot would. The
medical doctors will work closely with a data scientist to ensure that adequate numbers,
variations and formats of answers are created to support different scenarios that could be
encountered by a chatbot. During this process, some additional variations of questions may be
created by the team to make the dataset more comprehensive.
When data collection is complete, the database of Q&A from the online portal will be extracted
and processed into a training dataset that can be utilized for the development of AI-enabled
information tools like chatbots. This processing will include labeling each data entry with
relevant tags (e.g., prevention, treatment, symptoms) to facilitate easy retrieval of information.
The dataset will be stored in JSON format so that, where relevant and possible, it can be shared
using the Fast Healthcare Interoperability Resources (FHIR) standard [32]. This will ensure that the
dataset can be easily integrated into existing health information technology (IT) systems. Data
cleaning and annotation will also be done with OpenRefine to speed up these processes and
ensure high-quality results.
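
As a rough illustration of the tagging and JSON export described above, the following Python sketch builds tagged Q&A entries and serializes them. The field names (id, language, question, answer, tags), the tag vocabulary, and the placeholder text are assumptions made for illustration only; the protocol does not specify a schema, and the sketch does not attempt to model a FHIR resource.

```python
import json

# Illustrative tag vocabulary taken from the examples in the text;
# the real dataset may use additional or different tags.
ALLOWED_TAGS = {"prevention", "treatment", "symptoms"}


def make_record(record_id, question, answer, tags):
    """Build one Q&A entry (hypothetical schema), rejecting unknown tags."""
    unknown = set(tags) - ALLOWED_TAGS
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    return {
        "id": record_id,
        "language": "en",
        "question": question.strip(),
        "answer": answer.strip(),
        "tags": sorted(set(tags)),
    }


def filter_by_tag(records, tag):
    """Return the entries labeled with the given tag."""
    return [record for record in records if tag in record["tags"]]


# Placeholder text only; no real participant questions are shown here.
records = [
    make_record(1, " Placeholder question A? ", "Placeholder answer A.", ["symptoms"]),
    make_record(2, "Placeholder question B?", "Placeholder answer B.", ["prevention", "treatment"]),
]

# Serialize the whole dataset as a JSON array.
dataset_json = json.dumps(records, indent=2, ensure_ascii=False)
```

Keeping tags as a sorted list of controlled terms is one possible design choice; it makes retrieval by theme a simple membership check, as in filter_by_tag.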

Quality Control and Quality Assurance

During the study, periodic monitoring may be conducted to ensure that the protocol and Good
Clinical Practices (GCPs) are being followed. The monitors may review source documents (the
online data collection platform) to confirm that the data recorded in the training dataset is
accurate.

All participant questions will be permissible on the online platform. The medical doctors will
source their answers from reputable medical sources such as the WHO, Centre for Disease
Control (CDC), Medscape, UpToDate, etc. For information that may be country-specific, the
medical doctors shall source responses from official governmental or nationally recognized
documents. Answers to questions will be co-developed by a team of at least three medical
doctors in close collaboration with a Data Scientist to ensure appropriateness for use by a
chatbot.
Processing of the data into a training dataset will be done by a qualified Data Scientist who will
be a part of the study team from the design of the online platform. This will ensure that the data
collected and processed will be appropriate for the use case.

Additionally, the study site may be subject to review by the IRB and/or to inspection by
appropriate regulatory authorities from time to time.
To ensure data quality, quality control will be done as follows:

● Data validation – This will be through checking for missing values, duplicates and errors
in data entry.

● Data cleaning – This will be done by removing inaccurate, inconsistent, or irrelevant data.
Any outliers or anomalies in the dataset will also be identified.

● Data standardization – This will be through harmonizing the data format and units to make
them consistent across all data samples.

● Data transformation – This will include various rephrasing approaches to aid in generating
a quality set of questions that will be answered by multiple medical doctors and nurses.
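
The validation, cleaning, and standardization steps listed above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions: each entry is taken to be a dictionary with "question" and "answer" keys (a hypothetical layout), duplicates are detected by case-insensitive comparison of whitespace-normalized questions, and the study's actual tooling (e.g., OpenRefine) may work differently.

```python
import re


def collapse_whitespace(text):
    """Standardize spacing: collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def quality_control(rows):
    """Return (clean_rows, issues) after validation and de-duplication.

    issues is a list of (row_index, reason) pairs for rows that were dropped.
    """
    clean, issues, seen = [], [], set()
    for index, row in enumerate(rows):
        question = collapse_whitespace(row.get("question") or "")
        answer = collapse_whitespace(row.get("answer") or "")
        # Validation: flag rows with a missing question or answer.
        if not question or not answer:
            issues.append((index, "missing value"))
            continue
        # Cleaning: drop questions already seen, ignoring letter case.
        key = question.lower()
        if key in seen:
            issues.append((index, "duplicate question"))
            continue
        seen.add(key)
        clean.append({"question": question, "answer": answer})
    return clean, issues


# Placeholder rows only; no real participant data.
sample_rows = [
    {"question": "Placeholder  question?", "answer": "Placeholder answer."},
    {"question": "placeholder question?", "answer": "Another placeholder."},
    {"question": "Other placeholder?", "answer": ""},
    {"question": None, "answer": "Orphan answer."},
]
clean_rows, issues = quality_control(sample_rows)
```

Returning the dropped rows together with a reason, rather than silently discarding them, leaves a record that reviewers can check during monitoring.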
Data Handling/Record Retention

When the dataset is completed, in adherence to FAIR (Findability, Accessibility, Interoperability,
Reusability) principles [33], we shall use Harvard Dataverse to host the dataset. Harvard
Dataverse is a free, self-service data repository open to all researchers from any
discipline, both inside and outside of the Harvard community, where researchers can share,
archive, cite, access, and explore research data. Each Dataverse collection is a customizable
collection of datasets (or a virtual repository) for organizing, managing, and showcasing datasets
[34].
Record Retention

The investigator has ultimate responsibility for the collection and reporting of all questions
entered through the different data collection methods – online, hard copy case report forms
(CRFs) and face-to-face consultations (source documents) – and for ensuring that they are accurate,
authentic/original, attributable, complete, consistent, legible, timely (contemporaneous),
enduring and available when required. Any corrections to entries made in the source documents
must be dated, initialed and explained (if necessary) and should not obscure the original entry.
To enable evaluations and/or audits, the investigator agrees to keep records, including the
identity of all participating patients (sufficient information to link records, e.g., CRFs), all
original signed informed consent/assent documents, copies of all safety reporting forms, source
documents, and detailed records of treatment disposition, and adequate documentation of
relevant correspondence (e.g., letters, meeting minutes, telephone calls reports).

Investigator records must be kept for as long as required by applicable local regulations (UNCST
generally requires a minimum of 5 years). When more than one requirement can be applied,
records must be maintained for the longest period provided.
Confidentiality

Data will be collected through online submission, questions on paper, or through face-to-face
consultation with medical doctors. Each participant will submit at least one question and all
questions will be recorded anonymously. Questions submitted via paper and face-to-face
consultation will also be transcribed onto the online platform for record purposes. Access to the
online platform will be password protected and limited to only study staff.

Hard copy consent/assent forms, handwritten questions and any other source documents will be
kept in locked cabinets. Questions asked by physical participants will be entered into the online
platform regularly to ensure that it is up to date. The database for the online platform will be
stored in a secure cloud location owned by the Infectious Diseases Institute (IDI). Access to the
database will be given to authorized personnel only (members of the immediate study team) and
a log of authorized personnel will be stored in the trial master file. No participant-identifying
information will be disclosed in any publication or at any conference activities arising from the
study.

Protocol Deviations

A protocol deviation is any noncompliance with the clinical study protocol, GCP, or Manual of
Procedures requirements. Noncompliance may be on the part of the participant, investigator, or
study staff. As a result of deviations, corrective actions are to be developed by the study staff
and implemented promptly.

All deviations from the protocol must be addressed in study participant source documents and
promptly reported to the local IRB, according to their requirements.

Ethics

Ethical approval has been obtained from the Infectious Diseases Institute Research Ethics
Committee (IDI-REC-2024-91) and the Uganda National Council for Science and Technology
(HS5173ES). The study also obtained a waiver of parental consent for minors from the Research
Ethics Committee because even though the study posed minimal risk to participants, in the
African cultural setting, parents may feel uncomfortable consenting to their children being
requested to ask questions regarding Sexually Transmitted Infections. Therefore, participating
minors will enroll with individual assent only.

Ethical Conduct of the Study


The study will be conducted in accordance with legal and regulatory requirements, as well as the
general principles outlined in the International Ethical Guidelines for Biomedical Research
Involving Human Participants [35] and the Declaration of Helsinki [36]. In addition, the study
will be conducted as per the protocol, GCP guidelines, and applicable local regulatory
requirements and laws.
All questions will be submitted anonymously and there will be no way to trace a question back
to a participant.
Participant Information and Consent

Participants will provide written informed consent/assent before accessing the online platform,
writing their question(s) on paper and/or before having a face-to-face consultation with a
Medical Doctor. This consent/assent will be given via online forms or hard copy forms. The
record of consent/assent will be stored electronically for the online forms or via hard copy
records for those participants who will sign physically.

All questions will be submitted anonymously, and all parties will ensure the protection of participant
personal data and will not include participant names on any forms, reports, publications, or in
any other disclosures, except where required by legal requirements.
The informed consent/assent document used in this study, and any changes made during the
study, will prospectively be approved by the IRB. The investigator will retain the original of each
participant's signed consent/assent document.

Publication of Study Results

The final developed training dataset will be published in an open-access repository to ensure
wide access [34]. We aim to publish in high-impact, peer-reviewed journals with a focus on open
access. We also aim to present the research findings at both local and international conferences.
All parties will ensure the protection of participant personal data and will not include participant
names on any forms, reports, publications, or in any other disclosures, except where required by
legal requirements.

RESULTS

Recruitment and data collection for this study began in June 2024 and will continue until
December 2024. The process was piloted at the AfricAI conference held in Kigali, Rwanda,
in 2023. A quick-response (QR) code for Slido was generated and shared with participants
at a booth and on fliers around the conference venue. Further, the QR code was printed on small
chocolates in session rooms. These advertising efforts played a crucial role in raising awareness
and expanding the reach of the pilot study, highlighting the value of sustained promotional
strategies for maximizing participation. During the pilot, medical doctors were available on-site
to answer participants' questions in real time. In total, 140 questions were
collected, demonstrating the feasibility of the crowdsourcing approach for gathering data. The
questions were broader and more wide-ranging than anticipated, including, for example, questions
asking for relationship advice. The pilot revealed the need for a large, supportive team to
actively engage with participants, answer questions promptly, and monitor the process. This
team-based approach ensured that participants felt encouraged and supported throughout and
professionals could consult internally. The importance of real-time monitoring and support
during data collection was also evident, as it enabled the resolution of any challenges
encountered and ensured the smooth execution of the pilot. The pilot served as a crucial testing
ground for refining recruitment strategies, participant engagement, and logistical requirements,
paving the way for the success of the main study.

Integration of pilot lessons and current progress

The lessons learnt from the pilot study were incorporated into the main study, which has since
collected over 5,360 question-and-answer pairs, including the 140 questions from the pilot
phase, as of 5th December 2024 from various sources such as online submissions, paper-based
entries from public events and face-to-face consultations with medical doctors. The collected
data is concurrently undergoing rigorous processing, which involves cleaning and
tagging to facilitate its use in training AI tools. The data has been grouped into eight STI
disease areas covering the common STI topics: General STIs, HIV, Syphilis, Gonorrhea,
Chlamydia, Hepatitis B, Trichomoniasis and Herpes Simplex Virus. The questions have been
categorized into key themes such as prevention, treatment, symptoms, and other subcategories to
enhance usability for AI-enabled health information systems. Eleven workshops have been held
with health workers of various cadres including medical doctors, nurses, counsellors and
pharmacists to develop accurate answers for each STI topic and category, as well as to add
questions to the dataset from their experience in the clinic.

Upon completion, the dataset will be hosted on the Harvard Dataverse open-access repository
[34], with publication planned for 2025. This effort aims to create a valuable resource for AI
developers and public health initiatives, particularly in the African context.

DISCUSSION

Strengths
This study has several strengths that contribute to its innovation, reliability and broader impact. It
leverages an innovative crowdsourcing design to gather questions from a diverse and
geographically dispersed participant pool, ensuring data diversity and relevance. This will allow
the inclusion of real-world concerns, enhancing the cultural and contextual accuracy of the
resulting dataset. The involvement of health workers of different cadres in the processing of the
dataset will also ensure medical accuracy of the answers to the collected questions. This aspect
of collaborative expertise is a cornerstone of this study and will enhance the reliability and utility
of the final dataset.
Additionally, anonymity and privacy are emphasized, encouraging participants to share authentic
and nuanced data on sensitive topics such as STIs without fear of judgment. The study will also
publish its dataset as an open-access resource, adhering to FAIR principles [33], which will support
HASH subgrantees in their chatbot development and benefit the broader AI and public health
community. Furthermore, the study offers a scalable model that can be adapted to other health
domains or regions, leveraging the increasing internet penetration across Africa. Together, these
strengths enhance the study’s potential impact and utility.

Limitations

Despite its innovative approach, the study faces several limitations. Selection bias may arise as it
excludes individuals without internet access, English literacy, or the confidence to ask questions
about sexual health. This could potentially limit the dataset’s representativeness for marginalized
populations. The exclusive focus on English further narrows the scope, overlooking the unique
concerns of non-English speakers. Future studies should therefore consider adding
multilingual support to capture the questions and opinions of non-English speakers.
Additionally, there is a risk of misinterpretation, as some questions may not fully capture the
participants’ intended concerns, leading to mismatched answers. The study's anonymous nature,
while ensuring privacy, limits opportunities for clarification. Lastly, the six-month data collection
window may not be sufficient to account for changes in public engagement over time, potentially
affecting the comprehensiveness of the dataset. These limitations underscore areas for
improvement in future iterations of the study.

Potential risks and benefits

Risks
Overall, this study is considered to have minimal risk of injury to participants. In addition, the
study has been designed to provide multiple options for participants to ask questions so as to
maintain privacy and anonymity, and to minimize embarrassment and stigma.

Benefits
The dataset on sexual health that will be developed through this study will be used to enrich the
training data of the HASH subgrantees who are developing chatbots. This will enhance the
ability of their chatbots to respond to various queries that will be presented during interaction
with end users and therefore support these members of the HASH Network towards making their
research successful.
Additionally, because the dataset will be open access, it will be made available to all users who
desire a training dataset for their AI-enabled information tools on sexual health. This will
help meet a significant need of African AI developers, namely large datasets. It
will also contribute towards combating misinformation and improving public health literacy in
Africa through increasing access to accurate medical information.
Finally, during the crowdsourcing process, participants will receive accurate medical answers to
their questions and have access to other questions and answers gathered through the
crowdsourcing process. This will improve the knowledge of participants.

Comparison with Prior Work

This study builds upon the existing literature and initiatives that investigate the use of AI tools,
especially chatbots, within the public health domain to address challenges like STIs. Prior
work, like the review by Phiri and Munoriyarwa (2023) [24,37], emphasizes the promise that
health chatbots hold in Africa for making health information more accessible. Yet, they and other
authors frequently point to a common limitation that holds back this promise: a lack of
contextualized data of sufficient quantity and quality to train the chatbots. Similarly, DataKind UK and
Raheema [22] discuss the importance of "decolonizing" the data that AI is trained on so that it
reflects the lived experiences of groups that are underrepresented in both the health and tech
sectors. Public health can benefit from the greater accessibility and reduced stigma that chatbots
offer, especially when it comes to sensitive topics like STIs [19]. When it comes to the
acceptability of chatbots in a medical context, Chang et al. [18] point out that these
conversational agents are viewed more favorably when they are perceived as reliable and that
much clinical work is itself quite conversational.

Although there have been some strides made recently, there remains a significant absence of
datasets that are fine-tuned to the many forms of African speech and the continent's almost 2,000
different languages. This is pointed out in studies by Javaid [28] and Ochieng and Awosiku [23].
The absence of such datasets in a key demographic region means that AI tools will perform
comparatively poorly in this part of the world. Although Gottlieb et al. [7] and the WHO [1] have
brought attention to the STI epidemic in SSA, highlighting its high occurrence and public health
impact, there has not been much focus on the potential of AI to advance the awareness and
prevention of these infections. This absence seems largely related to the unavailability of
comprehensive and open-access training datasets. Our study attempts not only to outfit AI with
the means to be more useful in this dimension but also to shed light on the sexual health concerns
that the populations of this region find most urgent. In contrast to earlier datasets that frequently
depended on clinical or structured health information, this research leverages a participatory
methodology. This ensures that the datasets we create are inclusive and contextually relevant, as
we engage in a meaningful way with diverse communities.

In summary, this study contributes significantly to the field of innovation by developing a
comprehensive Q&A dataset that is tailored to different cultures for training AI tools to address
STIs. By employing crowdsourcing as recommended by B.S. et al. [27] and the Implementation
Research and Innovation Support [25], the study ensures the inclusion of genuine, community-
specific concerns, capturing the diverse linguistic and contextual nuances. This approach
addresses some important gaps in our knowledge and offers a model that can be scaled up and
down, as necessary, to apply to other health areas and geographical locations.

CONCLUSION

This study represents a significant step toward developing accessible, evidence-based health
information tools, with the potential to increase literacy levels on STIs and improve health-
seeking behaviors. The Q&A dataset from this study will enable the development of AI tools to
address critical gaps in sexual health education, thus fostering informed decision-making. The
open-access nature of the dataset will encourage collaboration while providing a resource for
researchers and developers worldwide.

Future projects should prioritize expanding linguistic diversity and accessibility for underserved
populations to ensure broader applicability and equity in health information dissemination.

Acknowledgments

This study was made possible through the financial support provided by the Academy for Health
Innovation, Uganda, with funding from the International Development Research Centre (IDRC)
and the Swedish International Development Cooperation Agency (Sida), Grant No. 109804-001.
We are grateful to all those who have participated in the study and provided their time and
questions, as well as those who reviewed and generated answers including Dr Elizabeth Oseku,
Dr Martin Balaba, Dr Hope Mackline, Dr Annet Onzia, Annet Nanungi Kabuye, Brenda Dawa,
Mable Nanozi, Ana Beatrice Magoba, Futumu Sadik, Mike Mugude, Lillian Rutaisire, Dr Derek
Ngabirano, and Martin Sejja.

Authors’ Contributions

Conceptualization: RPR (lead), EO (supporting), RLK (supporting)
Data curation: EO (lead), HS (equal)
Formal analysis: HS (lead)
Funding acquisition: RPR (lead)
Investigation: RPR (lead), EO (equal), MB (equal), HS (supporting)
Methodology: RPR (lead), EO (equal)
Project administration: EO (lead), HS (supporting), CKA (supporting)
Resources: EO (lead), CKA (equal)
Software: HS (lead), CKA (equal)
Supervision: RPR (lead), ABN (equal), RLK (supporting)
Validation: RPR (lead), ABN (equal)
Visualization: PKM (lead), HS (equal)
Writing – original draft: EO (lead), RPR (equal), PKM (equal), ABN (equal), HS (supporting)
Writing – review & editing: EO (lead), PKM (equal), RPR (equal), ABN (supporting), MB
(supporting), RLK (supporting), CKA (supporting), HS (supporting)

Conflicts of Interest

None declared.

Abbreviations

AI Artificial Intelligence
CDC Centre for Disease Control
CRF Case Report Form
FAIR Findability, Accessibility, Interoperability, Reusability
FHIR Fast Healthcare Interoperability Resources
GCP Good Clinical Practice
GP General practitioner
HASH Hub for Artificial Intelligence in Maternal, Sexual and Reproductive Health
HIV Human Immunodeficiency Virus
IDI Infectious Diseases Institute
IDRC International Development Research Centre
IRB Institutional Review Board
IT Information Technology
LLM Large Language Model
NLP Natural Language Processing
PI Principal Investigator
Q&A Question and Answer
QR Quick response
SAE Serious Adverse Event
Sida Swedish International Development Cooperation Agency
SRH Sexual Reproductive Health
SSA Sub-Saharan Africa
STI Sexually Transmitted Infection
UNCST Uganda National Council for Science and Technology
WHO World Health Organization

References

1. World Health Organization. Sexual Health. 2024. Accessed February 20, 2024.
https://www.who.int/health-topics/sexual-health#tab=tab_1
2. World Health Organization. Sexually Transmitted Infections (STIs) Factsheet. July 10, 2023.
Accessed February 20, 2024.
https://www.who.int/news-room/fact-sheets/detail/sexually-transmitted-infections-(stis)
3. World Health Organization Regional Office for Africa. GLOBAL HEALTH SECTOR STRATEGY
ON SEXUALLY TRANSMITTED INFECTIONS 2016–2021: IMPLEMENTATION
FRAMEWORK FOR THE AFRICAN REGION. Report of the Secretariat.; 2017.
http://www.who.int/hiv/pub/toolkits/stis_strategy[1]en.pdf
4. Du M, Yan W, Jing W, et al. Increasing incidence rates of sexually transmitted infections from
2010 to 2019: an analysis of temporal trends by geographical regions and age groups from the
2019 Global Burden of Disease Study. BMC Infect Dis. 2022;22(1). doi:10.1186/s12879-022-
07544-7
5. Awuoche HC, Joseph RH, Magut F, et al. Prevalence and risk factors of sexually transmitted
infections in the setting of a generalized HIV epidemic—a population-based study, western
Kenya. Int J STD AIDS. 2024;35(6):418-429. doi:10.1177/09564624241226487
6. Shewarega ES, Fentie EA, Asmamaw DB, et al. Sexually transmitted infections related care-
seeking behavior and associated factors among reproductive age women in East Africa: a
multilevel analysis of demographic and health surveys. BMC Public Health. 2022;22(1).
doi:10.1186/s12889-022-14120-w
7. Gottlieb SL, Low N, Newman LM, Bolan G, Kamb M, Broutet N. Toward global prevention of
sexually transmitted infections (STIs): The need for STI vaccines. Vaccine. 2014;32(14):1527-
1535. doi:10.1016/j.vaccine.2013.07.087
8. Sani AS, Abraham C, Denford S, Ball S. School-based sexual health education interventions to
prevent STI/HIV in sub-Saharan Africa: a systematic review and meta-analysis. BMC Public
Health. 2016;16(1):1-26. doi:10.1186/s12889-016-3715-4
9. Uganda Bureau of Statistics. GOVERNMENT OF UGANDA Uganda Demographic and Health
Survey 2016.; 2018. www.DHSprogram.com
10. Nigussie T, Yosef T. Knowledge of sexually transmitted infections and its associated factors
among polytechnic college students in Southwest Ethiopia. Pan African Medical Journal.
2020;37(68):1-11. doi:10.11604/pamj.2020.37.68.22718
11. Ninsiima LR, Chiumia IK, Ndejjo R. Factors influencing access to and utilisation of youth-
friendly sexual and reproductive health services in sub-Saharan Africa: a systematic review.
Reprod Health. 2021;18(1):135. doi:10.1186/s12978-021-01183-y
12. World Health Organization. Global Health Sector Strategies on, Respectively, HIV, Viral Hepatitis and
Sexually Transmitted Infections for the Period 2022-2030.; 2022. Accessed February 20, 2024.
https://apps.who.int/iris/rest/bitstreams/1451670/retrieve
13. GSMA Intelligence. GSMA The Mobile Economy Sub-Saharan Africa 2024.; 2024. Accessed
December 12, 2024.
https://www.gsma.com/solutions-and-impact/connectivity-for-good/mobile-economy/wp-content/uploads/2024/11/GSMA_ME_SSA_2024_Web.pdf
14. Kemp S. Digital 2023: Uganda. DataReportal. February 14, 2023. Accessed February 20,
2024. https://datareportal.com/reports/digital-2023-uganda
15. Mubiru A. Uganda Registers 1.2 Million New Users in One Year. The New Vision Newspaper.
March 26, 2024.
16. Wikipedia Contributors. Chatbot. Wikipedia.
17. Wilson L, Marasoiu M. The Development and Use of Chatbots in Public Health: Scoping
Review. JMIR Hum Factors. 2022;9(4):e35882. doi:10.2196/35882
18. Chang IC, Shih YS, Kuo KM. Why would you use medical chatbots? interview and survey. Int J
Med Inform. 2022;165:104827. doi:10.1016/j.ijmedinf.2022.104827
19. Miles O, West R, Nadarzynski T. Health chatbots acceptability moderated by perceived stigma
and severity: A cross-sectional survey. Digit Health. 2021;7. doi:10.1177/20552076211063012
20. Palanica A, Flaschner P, Thommandram A, Li M, Fossat Y. Physicians’ Perceptions of Chatbots
in Health Care: Cross-Sectional Web-Based Survey. J Med Internet Res. 2019;21(4):e12887.
doi:10.2196/12887
21. Stryker C, Holdsworth J. What is NLP (natural language processing)? IBM Topics.
22. DataKind UK, Raheema A. Decolonising Data. Medium. Published online September 29, 2021.
Accessed February 20, 2024. https://medium.com/datakinduk/decolonising-data-1d7976aaa12f
23. Ochieng H, Awosiku O. Chatbots in Africa. Digital Health Africa Learning Series: Article 2.0.
September 16, 2023. Accessed December 12, 2024.
https://medium.com/@dhealthafrica/an-overview-of-healthcare-chatbots-in-africa-fb72a31f6297
24. Phiri M, Munoriyarwa A. Health Chatbots in Africa: Scoping Review. J Med Internet Res.
2023;25:e35573. doi:10.2196/35573
25. Mills R, Mangone E, Lesh N, Mohan D, Baraitser P. IRIS Learning Brief 3 How Might Chatbots
Support Reproductive Health? Findings from Three New Studies.; 2023. Accessed February 20,
2024. https://www.opml.co.uk/sites/default/files/migrated_bolt_files/iris-learning-brief3-v2.pdf
26. Lin Y, Jiang Y, Li Y, Zhou Y. Privacy-preserving batch-based task assignment over spatial
crowdsourcing platforms. Computer Networks. 2024;241:110196.
doi:10.1016/j.comnet.2024.110196
27. B.S. A, Soni N, Dixit S. Crowdsourcing – A Step Towards Advanced Machine Learning.
Procedia Comput Sci. 2018;132:632-642. doi:10.1016/j.procs.2018.05.062
28. Javaid S. Top 6 Data Collection Methods for AI & Machine Learning in 2024. AI Multiple
Research. January 3, 2024. Accessed February 20, 2024.
https://research.aimultiple.com/data-collection-methods/
29. Raza MM, Venkatesh KP, Kvedar JC. Generative AI and large language models in health care:
pathways to implementation. NPJ Digit Med. 2024;7(1). doi:10.1038/s41746-023-00988-4
30. Hub for Artificial Intelligence in Maternal, Sexual and Reproductive Health. Hub for Artificial
Intelligence in Maternal, Sexual and Reproductive Health (HASH). February 2023. Accessed
December 12, 2024. https://hash.theacademy.co.ug/
31. Cisco Systems Inc. Slido. 2024. Accessed December 12, 2024. https://www.slido.com/
32. Tech Target Editorial Staff. Breaking Down the Fast Healthcare Interoperability Resource
(FHIR). TechTarget HealthTech Analytics. December 27, 2023. Accessed December 12, 2024.
https://www.techtarget.com/healthtechanalytics/feature/Breaking-Down-the-Fast-Healthcare-Interoperability-Resource-FHIR
33. Wilkinson MD, Dumontier M, Aalbersberg IjJ, et al. The FAIR Guiding Principles for scientific
data management and stewardship. Sci Data. 2016;3. doi:10.1038/sdata.2016.18
34. The President & Fellows of Harvard College. Harvard Dataverse. Harvard Dataverse. 2024.
Accessed December 12, 2024. https://dataverse.harvard.edu/
35. Council for International Organizations of Medical Sciences. International Ethical Guidelines
for Biomedical Research Involving Human Subjects. 2002. Accessed December 12, 2024.
https://media.tghn.org/medialibrary/2011/04/CIOMS_International_Ethical_Guidelines_for_Biomedical_Research_Involving_Human_Subjects.pdf
36. World Medical Association Inc. Declaration of Helsinki: Ethical Principles for Medical
Research Involving Human Subjects. 2008. Accessed December 12, 2024.
https://www.wma.net/wp-content/uploads/2016/11/DoH-Oct2008.pdf


Supplementary Files


Figures


Slido interface.


Schematic of study design.
