KAI: An AI-powered Chatbot To Support Therapy
Bachelor Thesis Project
Specialization in Computer Science
Mariama C. Djalo D.
Date: 23/01/2023
Director: Javier Béjar Alonso
Department: Computer Science
Degree: Bachelor of Computer Science
Center: FACULTAT D’INFORMÀTICA DE BARCELONA (FIB)
University: UNIVERSITAT POLITÈCNICA DE CATALUNYA (UPC) – BarcelonaTech
Contents
1 Context 8
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2 Cognitive Behavioral Therapy . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2.1 Self-Directed Therapy . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2.2 Cognitive Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2.3 Cognitive Distortions . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 Artificial Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.1 NLP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.2 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.3 Text Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4 Justification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.6 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.7 Risks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.8 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.9 Stakeholders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2 Project Planning 18
2.1 Description of tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.1 Project Management . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.2 Project Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.1.3 Project Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.4 Data Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.5 Project Experimentation . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.6 Project Development . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.7 Project Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.1.8 Thesis Defense Preparation . . . . . . . . . . . . . . . . . . . . . . . 22
2.2 Risk management: alternative plans . . . . . . . . . . . . . . . . . . . . . . . 22
2.3 Budget . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.1 Costs Per Activity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.2 Generic Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3.3 Contingency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3.4 Incidental Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3.5 Management control . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 Deviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.4.1 Deviations in the project development . . . . . . . . . . . . . . . . . 29
2.4.2 Deviations in the project documentation . . . . . . . . . . . . . . . . 30
2.4.3 Deviations in the budget . . . . . . . . . . . . . . . . . . . . . . . . . 30
3 Identification of Laws and Regulations 31
3.1 Academic Regulations for the Degree Final Project . . . . . . . . . . . . . . 31
3.2 GDPR Privacy Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.1 Company’s contact details . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.2 The Purposes and Legal Basis for Processing . . . . . . . . . . . . . . 32
3.2.3 Sharing of user’s personal data . . . . . . . . . . . . . . . . . . . . . 32
3.2.4 Sharing of user’s personal data to a third country . . . . . . . . . . . 32
3.2.5 Period of time storage of user’s personal data . . . . . . . . . . . . . 33
3.2.6 User’s Rights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 The EU Regulatory Environment of Medical Device Software Development . 33
4 Sustainability report 35
4.1 Self assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2 Environmental dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3 Economic dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.4 Social dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5 Technical Competences 37
6 Dialogue Flow 38
6.1 Identifying Potential Cognitive Distortions . . . . . . . . . . . . . . . . . . . 38
6.2 Challenging Potential Cognitive Distortions . . . . . . . . . . . . . . . . . . 38
6.3 Chatbot’s Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
6.3.1 Relaxation Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
7 Dataset 40
7.1 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
7.1.1 CountVectorizer and TfidfTransformer . . . . . . . . . . . . . . . . . 40
7.1.2 Balancing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
8 Model Experimentation 42
8.1 Multinomial Naive Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
8.1.1 alpha Hyperparameter . . . . . . . . . . . . . . . . . . . . . . . . . . 42
8.2 Multinomial Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . 43
8.2.1 penalty Hyperparameter . . . . . . . . . . . . . . . . . . . . . . . . . 44
8.2.2 C Hyperparameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
8.2.3 solver Hyperparameter . . . . . . . . . . . . . . . . . . . . . . . . . . 44
8.3 Support Vector Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
8.3.1 decision function shape Hyperparameter . . . . . . . . . . . . . . . . 45
8.4 K-Nearest Neighbour . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
8.4.1 n neighbors Hyperparameter . . . . . . . . . . . . . . . . . . . . . . . 46
8.4.2 weights Hyperparameter . . . . . . . . . . . . . . . . . . . . . . . . . 46
8.4.3 p Hyperparameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
8.5 Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
8.6 Model Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
8.6.1 Hyperparameters Tuning . . . . . . . . . . . . . . . . . . . . . . . . . 47
8.6.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
9 Implementation 50
9.1 Model Experimentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
9.2 Chatbot Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
9.2.1 Rule-Based Chatbot . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
9.3 Dialogue example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
9.3.1 Dialogue Flow Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
9.3.2 Rule-Based Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
10 Conclusions 54
10.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
10.2 Reflections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
B Dialogue Flow 57
List of Figures
1 A drawing that depicts how a situation causes feelings, from the book A
Therapist’s Guide to Brief Cognitive Behavioral Therapy [1]. . . . . . . . . . 10
2 Diagram illustrating the structure of the Cognitive Model, from the book A
Therapist’s Guide to Brief Cognitive Behavioral Therapy [1]. . . . . . . . . . 10
3 A diagram showing the different subfields of Artificial Intelligence, from EuropeanValley [2], a website. . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4 ML model workflow overview, from Google Machine Learning Education [3],
a website. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5 Gantt Chart illustrating the project’s schedule following a Waterfall model.
[Own Creation] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6 Results of the Hyperparameters Tuning [Own Creation]. . . . . . . . . . . . 48
7 Model Experimentation’s Results [Own Creation] . . . . . . . . . . . . . . . 49
8 Screenshot showing the first part of the Dialogue flow [Own Creation] . . . . 52
9 Screenshot showing the second part of the Dialogue flow [Own Creation] . . 52
10 Screenshot showing how the chatbot correctly detects the keywords and responds accordingly [Own Creation] . . . . . . . . . . . . . . . . . . . . . . 53
11 15 major cognitive distortions by PositivePsychology, a website [4] . . . . . . 56
12 Chatbot’s Dialogue Flow [Own Creation] . . . . . . . . . . . . . . . . . . . . 57
13 EU Regulatory Environment for MDSW Development. [5] . . . . . . . . . . 58
14 Number of examples grouped by cognitive distortions [Own Creation] . . . . 60
15 There are numerous alternative hyperplanes that could be used to divide the two groups of data points (left image). In SVM, the goal is to find the hyperplane with the greatest margin, that is, the greatest separation between data points of the two classes (right image). Images obtained from Towards Data Science [6], a website. . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
List of Tables
1 Task Table containing a summary of all task information. T and GEPT mean Tutor and GEP Tutor, respectively. [Own Creation] . . . . . . . . . . 24
2 Salary of the different roles, extracted from PayScale, a compensation software company [7], multiplied by 1.35 to include the cost of social security [Own creation]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3 Budget Structure of the project [Own creation] . . . . . . . . . . . . . . . . . 28
4 Final version of the task table [Own creation] . . . . . . . . . . . . . . . . . 29
5 Final version of the budget structure [Own creation] . . . . . . . . . . . . . . 30
6 Standard and guidance documents useful to demonstrate MDSW compliance
with MDR. [5] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Acknowledgement
First of all, I want to thank my family for supporting me in my worst times and for sharing my best moments with me. I also want to thank my friends for accompanying me through this journey and for being part of my good memories from college. Finally, I must express my gratitude to my director for guiding me during the development of the project; I really appreciate his support.
Abstract
This project attempts to bridge the wide gap between people who struggle with mental health and people who actually receive treatment through the design and implementation of Kai, an AI-powered chatbot that supports Cognitive Behavioral Therapy (CBT). CBT is based on identifying cognitive distortions (negative thought patterns) and challenging them to improve mood and overall mental health. This was done using Text Classification, a Natural Language Processing (NLP) technique, to identify potential cognitive distortions in text.
During the project, a model experimentation was carried out to compare different supervised machine learning models and choose the best one for the text classification task. The dataset needed to train the models was generated by manually writing labelled examples of the 15 major cognitive distortions. The model experimentation was done entirely in Python.
Furthermore, the dialogue flow of the chatbot was designed following CBT guidelines, and the chatbot was implemented in Python using the Tkinter framework for the interface. Finally, two tests were performed to check the correct functioning of the chatbot.
Resum
Aquest projecte intenta reduir l’enorme bretxa existent entre les persones que
pateixen problemes de salut mental i les que realment reben tractament amb el disseny
i la implementació de Kai, un xatbot impulsat per IA que dóna suport a la teràpia
cognitivoconductual (TCC).
La TCC es basa en la identificació de les distorsions cognitives (pensaments negatius)
per qüestionar-les amb l’objectiu de millorar l’estat d’ànim i la salut mental. Per això
es va utilitzar la Classificació de Text, una tècnica de Processament del Llenguatge
Natural (PLN) per identificar potencials distorsions cognitives en el text.
Durant el projecte es va fer una experimentació de models per comparar diferents
models supervisats d’aprenentatge automàtic per triar el millor per a la classificació
de textos. El conjunt de dades necessari per entrenar els models es va generar pro-
porcionant manualment exemples etiquetats de les 15 distorsions cognitives principals.
L’experimentació del model es va fer íntegrament en Python.
A més, el disseny del flux de diàleg del chatbot es va fer seguint les directrius
del TCC i la implementació del chatbot es va fer a Python utilitzant TKinter, un
framework per a la interfície. Finalment, es van realitzar dos tests per comprovar el
funcionament correcte del chatbot.
Resumen
Este proyecto intenta reducir la enorme brecha existente entre las personas que
sufren problemas de salud mental y las que realmente reciben tratamiento con el diseño
y la implementación de Kai, un chatbot impulsado por IA que da soporte a la terapia
cognitivo-conductual (TCC).
La TCC se basa en la identificación de las distorsiones cognitivas (pensamientos neg-
ativos) para luego cuestionarlas con el objetivo de mejorar el estado de ánimo y la salud
mental. Para ello se utilizó la Clasificación de Texto, una técnica de Procesamiento
del Lenguaje Natural (PLN) para identificar potenciales distorsiones cognitivas en el
texto.
Durante el proyecto se realizó una experimentación de modelos para comparar dis-
tintos modelos supervisados de aprendizaje automático con el fin de elegir el mejor para
la clasificación de textos. El conjunto de datos necesario para entrenar los modelos se
generó proporcionando manualmente ejemplos etiquetados de las 15 distorsiones cognitivas principales. La experimentación del modelo se realizó íntegramente en Python.
Además, el diseño del flujo de diálogo del chatbot se hizo siguiendo las directrices
del TCC y la implementación del chatbot se hizo en Python utilizando TKinter, un
Framework para la interfaz. Por último, se realizaron dos tests para comprobar el
correcto funcionamiento del chatbot.
1 Context
This is a Bachelor Thesis Project done at the Barcelona School of Informatics (FIB), under the supervision of Javier Béjar Alonso, for the Computer Science Degree with a specialization in Computing.
1.1 Introduction
According to the WHO [8], depression is a leading cause of disability and, among people aged 15 to 29, suicide is the fourth most common cause of death. There is still a substantial disparity between those who require care and those who have access to it, despite the fact that many mental health conditions can be adequately treated at low cost [9].
Therapeutic and mental health chatbots are an intriguing way to bridge this gap. A chatbot is a software tool designed to simulate human conversation.
It is widely known that a few sessions of cognitive behavioral therapy (CBT) can be extremely beneficial in the treatment of anxiety and depression. However, many people do not have access to a CBT therapist because they cannot afford it, it is not covered by their insurance, or there are none nearby. Attending therapy can also be difficult due to lack of time, for instance because of work or childcare. In some cases, however, a therapist may not be required. There are numerous options for doing CBT without a therapist, such as self-help books and internet-based treatment, or conducting your own research on the material. Self-directed CBT has been shown in numerous studies to be very effective [10]. This type of therapy promotes independence and self-therapy. The methodology of CBT is based on identifying cognitive distortions (negative thought patterns) and challenging them by replacing them with alternative, more positive thoughts in order to improve mood and overall mental health.
The goal of this project is to combine the two previously mentioned approaches to help people dealing with mental health issues. KAI, a chatbot that provides therapy support for self-directed CBT, will be designed and implemented as part of the project. KAI will detect cognitive distortions in text using Text Classification, an NLP technique, to assist in the application of CBT.
This project addresses the significant gap between people who are struggling with mental health issues and need assistance and those who actually have access to mental health care. To achieve this, various supervised machine learning techniques for text classification are trained to identify cognitive distortions in chatbot conversations and used to aid self-directed CBT.
The project is aimed at people who meet all of the following conditions:
• People who have mild to moderate symptoms of mental health issues and can function normally.¹

¹ Those who are severely depressed or have severe mental health problems will most likely require one-on-one therapy with a professional.
• People who do not have access to or prefer not to speak with a therapist due to privacy
concerns and/or a fear of being judged as a result of the stigma associated with mental
health issues.
• People who prefer to be autonomous and prefer to learn and treat themselves with
self-directed therapy.
Bear in mind that the chatbot is not a therapist and thus does not provide mental health diagnoses; rather, it is a tool that assists with the automatic detection of potential cognitive distortions and guides the user through the process of identifying and challenging them. In other words, KAI is a self-directed, AI-based CBT tool that helps people learn how to deal with cognitive distortions.
Figure 1: A drawing that depicts how a situation causes feelings, from the book A Therapist’s
Guide to Brief Cognitive Behavioral Therapy [1].
Figure 2: Diagram illustrating the structure of the Cognitive Model, from the book A Ther-
apist’s Guide to Brief Cognitive Behavioral Therapy [1].
1.3 Artificial Intelligence
Artificial intelligence (AI) is another important concept in this project and is explained in this section. AI refers to computer systems that mimic human intelligence, for instance in learning, problem-solving, decision-making, and understanding language. AI applications include face recognition, chatbots, and self-driving cars.
1.3.1 NLP
Natural language processing (NLP), as illustrated in Figure 3, is a subfield of artificial intelligence. NLP focuses on enabling computers to comprehend text and spoken language in much the same way humans do.
Figure 3: A diagram showing the different subfields of Artificial Intelligence, from EuropeanValley [2], a website.
In unsupervised learning, the model is not given labeled training examples; instead, it
must use methods like clustering to determine the underlying structure of the data.
In machine learning, features are the input variables or characteristics used to describe and predict the output or target variable. Feature selection is the process of selecting a subset of the most relevant and informative features for building a model.
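As a small illustration (with invented toy data, not the project's dataset), scikit-learn's SelectKBest can score word features with a chi-squared test and keep only the most informative ones:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Toy corpus with two invented classes (0 and 1).
texts = ["always a failure", "never good enough", "she hates me", "he hates me"]
labels = [0, 0, 1, 1]

X = CountVectorizer().fit_transform(texts)   # bag-of-words counts
selector = SelectKBest(chi2, k=2).fit(X, labels)

kept = int(selector.get_support().sum())     # number of features retained
print(kept)
```

Here `k=2` keeps the two words whose counts are most strongly associated with the class labels.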
Overfitting occurs when a model learns the detail and noise in the training data to the extent that it negatively affects its performance on new data: the model picks up concepts from random fluctuations in the training data that do not apply to new data, harming its ability to generalize.
Nonparametric and nonlinear models, which have more flexibility when learning a target function, are more susceptible to overfitting. As a result, many nonparametric machine learning algorithms incorporate parameters or techniques to restrict the amount of detail the model can learn.
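The effect can be demonstrated with a quick sketch (illustrative models and random data, unrelated to the project's experiments): an unrestricted decision tree fitted to pure noise memorizes the training set perfectly, while a depth-limited tree cannot.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # random features
y = rng.integers(0, 2, size=200)     # random labels: pure noise

deep = DecisionTreeClassifier(random_state=0).fit(X, y)
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

deep_acc = deep.score(X, y)          # unrestricted tree memorizes the noise
shallow_acc = shallow.score(X, y)    # restricted tree cannot memorize it all
print(deep_acc, shallow_acc)
```

Neither model would beat chance on fresh noise, which is exactly the generalization failure described above.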
Data Crowdsourcing
Data crowdsourcing is the process of gathering data from a large group of people, frequently via an internet platform. It can be a useful method for gathering a lot of data quickly and inexpensively.
Figure 4: ML model workflow overview, from Google Machine Learning Education [3], a
website.
1.4 Justification
The use of AI in mental healthcare and psychiatry, particularly therapeutic chatbots, is still
in its early stages, with limited research data and datasets available to fully explore the
field’s true potential.
The most relevant research was published by Shickel et al. [11] in 2019. The authors used supervised machine learning techniques such as SVM, XGBoost, and RNNs to automatically find cognitive distortions in text, obtaining a weighted F1 score of 0.68.
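For reference, the weighted F1 score averages the per-class F1 scores, weighting each class by its number of true instances; with scikit-learn it is computed as follows (toy labels, unrelated to the cited study's data):

```python
from sklearn.metrics import f1_score

# Invented toy predictions for three classes.
y_true = ["a", "a", "b", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c", "c"]

# Per-class F1 scores are combined, weighted by each class's true count.
score = f1_score(y_true, y_pred, average="weighted")
print(round(score, 4))
```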
Because the detection of cognitive distortions is a novel machine learning task, there are currently no publicly available datasets containing text labeled with distortions. The authors gathered their data through a real-world online therapy program and crowdsourcing in order to resolve this difficulty.
Toledo et al. [12], on the other hand, took a different and very innovative approach: they generated cognitive distortion responses to support CBT interactions. This is extremely important because Cognitive Behavioural Therapy's (CBT's) core idea is the ability to change distorted or negative beliefs (cognitive distortions) into more realistic alternatives (positive thoughts). The authors used Transformer models to generate the responses.
It is also worth mentioning that there are self-help apps and chatbots for mental health on the market, many of which use CBT, including in Spain². Wysa and Woebot are the most well-known.
As mentioned before, there are currently no publicly available datasets. Crowdsourcing and manual data generation are two options for resolving this issue. Since crowdsourcing is a paid service, I will opt for the second option: I will create the dataset myself, by providing sufficient examples of each of the 15 major cognitive distortions. To ensure that the data is accurate, examples will be drawn from, or inspired by, psychological books, websites, and articles specializing in psychology.
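Such a hand-built dataset can be stored as plain text–label pairs, for example in CSV form. The rows below are invented placeholders illustrating the format, not entries from the actual dataset:

```python
import csv
import io

# Hypothetical examples; each label names one of the 15 major distortions.
rows = [
    ("If I fail this exam, my whole career is ruined", "catastrophizing"),
    ("I should always be productive", "should statements"),
    ("My colleague frowned, she must be disappointed in me", "mind reading"),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["text", "label"])  # one column for the text, one for the label
writer.writerows(rows)

header = buf.getvalue().splitlines()[0]
print(header)
```

A file in this shape can be loaded directly by standard ML tooling for training.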
Unfortunately, there is no public information available on how to design a CBT chatbot. Therefore, the design will be done from scratch with the aid of a very helpful introductory manual on CBT [1] that gives a better understanding of the structure of a cognitive behavioral therapy session.
1.5 Scope
The project’s objectives, functional requirements, and potential risks and obstacles are all
discussed in this section.
1.5.1 Objectives
The main goal of this project is to design and implement an AI-powered chatbot that supports therapy, to assist people suffering from mental health issues such as anxiety or depression. Kai will use CBT techniques and, with the help of AI, will automatically detect cognitive distortions. To achieve this goal, the project has been divided into several sub-objectives:
Theoretical part
3. Investigate the best supervised machine learning techniques for text classification.
² This fact confirms that it is perfectly legal to make therapy-support chatbots.
Practical Part
1.6 Requirements
There are some requirements that must be met in order to ensure the final project’s quality:
• Data pre-processing. Data pre-processing is essential for ensuring that the models
perform optimally.
• Optimization of the code. Optimize the code of all implemented methods to improve efficiency.
• Avoid bias. Avoiding bias is key to ensure that the models are accurate.
1.7 Risks
There may be some risks that prevent the project from progressing smoothly:
• Deadline of the project. The project has a completion deadline that must be met, which forces difficult decisions. As a result, strong organizational skills and the ability to meet deadlines are essential for finishing the project on time.
• Bugs in libraries. Some libraries may have bugs in certain functions, resulting in incorrect code.
1.8 Methodology
In order to have more flexibility, I will use a hybrid of the Waterfall and Agile workflow methodologies for this project. Waterfall is a project management methodology based on a sequential design process, while Agile prioritizes iterative development, enabling sprint work and the resolution of issues as they arise. The Agile-Waterfall hybrid combines the best features of both: Agile allows you to check for bugs, test the code, and correct it progressively without having to wait until the entire implementation is completed, while Waterfall allows you to keep track of all the dependencies between tasks to better organize the project.
The Kanban framework, which falls under the Agile methodology, will be used. Since the 1950s, the Japanese term "kanban", meaning "visual board" or "sign", has been used to refer to a process definition: Toyota invented it and used it as the first just-in-time factory scheduling system. The capitalized term "Kanban", on the other hand, is associated with the "Kanban Method", first defined in 2007.
Kanban boards are used to efficiently display and control workflows. The essential elements are:
- Kanban Cards, used to represent tasks visually. Each card contains details on the work and its progress, including the due date, the person assigned to it, the description, etc.
- Kanban Columns: on the board, each column corresponds to a distinct step of the workflow. Cards move through the columns until they are fully completed. For this project, the columns are:
1. To Do: all of the tasks that have not been started yet.
2. In progress: all of the tasks that are still in progress.
3. Tested: tasks that have been completed but still need testing to ensure they work properly.
4. Completed: all completed and tested tasks.
There are many project management tools that follow the Kanban methodology, but I am going to use Trello because I believe it is the best option for small or one-person teams, such as freelancers (which is my case).
On the other hand, a Gantt chart with a Waterfall workflow will also be used to keep
track of the dependencies and the required time for each task.
I will use a GitHub repository as a version control tool; since the code is securely kept in the cloud, I can restore earlier versions in the event of serious errors.
I am planning to use GitHub Flow as my Git branching strategy. Its branches are organized into a main branch, where the production-ready code is kept, and additional branches, referred to as feature branches, where work on new features and bug fixes is done before being merged back into the main branch. Smaller teams, like mine, benefit most from this approach.
For the practical part, the cross-validation method will be used to choose the optimal hyperparameters for the models and to check their performance and whether there is bias. I will also set the random state to a fixed integer to prevent getting different outcomes each time I run a model.
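This selection step can be sketched with scikit-learn's GridSearchCV, which runs cross-validation over a hyperparameter grid; the data, pipeline, and grid below are toy assumptions for illustration only:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Invented toy corpus with two distortion labels.
texts = ["i always fail", "nothing ever works", "i never do anything right",
         "she must hate me", "he thinks i am stupid", "they all dislike me"]
labels = ["overgeneralization", "overgeneralization", "overgeneralization",
          "mind reading", "mind reading", "mind reading"]

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("nb", MultinomialNB())])

# Cross-validated search over the smoothing hyperparameter alpha.
grid = GridSearchCV(pipe, {"nb__alpha": [0.1, 1.0]}, cv=2)
grid.fit(texts, labels)

best_alpha = grid.best_params_["nb__alpha"]
print(best_alpha)
```

The same pattern extends to the other models and their hyperparameter grids.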
Last but not least, whenever I have questions or run into problems, I will email or meet with my tutor online. In-person meetings will be scheduled if I encounter serious issues with the project or if communicating with my tutor in person is more convenient.
1.9 Stakeholders
In this section, the stakeholders who will benefit from the completion of this project are enumerated.
The project's completion is important not only for people suffering from mental illnesses, but also for hospitals, clinics, educational institutions, and communities in general, which could use the chatbot to treat and improve the mental health of patients or community members, thereby improving people's overall well-being.
- The project's completion is especially important for people struggling with mental health.
2 Project Planning
The project will last approximately 579 hours spread over 126 days, beginning on September 20th, 2022 and ending on January 23rd, 2023. Since the date of the project defense has not yet been determined, the previous deadline is the earliest possible. The plan is to work an average of 5 hours per day, although some flexibility may be required due to exams or personal issues.
This section begins with a description of the tasks, followed by the resources required for the project's development, and finally an explanation of risk management. Furthermore, Table 1 summarizes all of the defined tasks, together with their dependencies and required resources, and Figure 5 shows the project schedule.
• PM3 - Time planning.
• PM5 - Meetings.
- Description: Meetings with the project's tutor will be scheduled as needed (e.g. when doubts or critical problems that impede the proper development of the project arise). I have added a "reserved time" for the meetings to make a better estimate of the total hours required for the project.
- Resources: Tutor.
- Approximate duration: 1 hour a week (18 hours in total).
• PR2 - ML research.
2.1.3 Project Theory
In the theoretical part, I will study the structure of a CBT therapy session in order to properly design the dialogue flow. Furthermore, I will consider which ML models are best for both text classification and small datasets.⁴
This part is divided into the following tasks:
• PT1 - Design.
• PT2 - Choose.
- Description: Choose between the top 3-5 supervised ML models for text classi-
fication and small datasets.
- Resources: PC, Books, Research Papers, Articles.
- Approximate duration: 10 hours.
2.1.5 Project Experimentation
In the experimentation section, the ML model workflow is completed as described in the previous section, and the best model is selected after analyzing the metrics.
In summary, the tasks are divided into the following:
• PE3 - Choose the best ML model. Analyse the performance of every model and choose the best one for the text classification. This part will require 2 hours.
All these tasks will be done in Colaboratory, better known as "Colab", a product from Google Research. Colab is well suited for machine learning, data analysis, and education, since it enables anyone to write and execute arbitrary Python code in the browser. Colab notebooks are kept in Google Drive, which provides secure cloud storage. Furthermore, a PC, books, research papers, articles, programming languages, and GitHub will be needed.
• PDEV1 - Implementation.
• PDEV2 - Testing.
- Description: Test the correct functioning of the chatbot. This will be done during and after the implementation in order to make sure all parts work correctly on time.
- Resources: PC, Github, Colab and programming languages.
- Approximate duration: 60 hours.
2.1.7 Project Documentation
To avoid having to do everything at the end, the project documentation will be completed concurrently with the project's development (after the research part is done). For these tasks we need a PC, Overleaf/Texifier and Trello to keep track of the progress. The Project Documentation has been broken down into the following tasks:
• PDOC1 - Annotation of events: Annotation of all the events that occur during the project development. This task will be done intermittently and will approximately require 10 hours.
• PDOC2 - Revision of the events: Once the project development is done, it is time to check all the documentation produced during the project in order to better organize the ideas, correct mistakes and structure the final document correctly. This task will require 20 hours.
• PDOC3 - Write Final Document: After PDOC2 is done, the writing of the final documentation begins. This task will require approximately 60 hours.
would give examples (that may or may not apply to their real-life situation) that
would most likely differ from mine due to differences in backgrounds, for example.
– Resources to reuse: PC, programming languages, and TeamGantt.
– Estimated delay: between 1-2 weeks.
• Bugs in libraries [Medium Risk]. Third-party libraries will be used during project
development, and they may contain bugs. Waiting until the library is updated, which
should hopefully fix the bug, is one possible solution. However, due to the tight
deadline, this option is out of the question. As a result, coding the function from scratch
and testing its correct operation would be required, increasing the overall duration of
the project.
Table 1: Task Table containing a summary of all task information. T and GEPT mean Tutor and GEP Tutor respectively.
[Own Creation]
Figure 5: Gantt Chart illustrating the project’s schedule following a Waterfall model. [Own Creation]
2.3 Budget
In this section the economic cost of the project is discussed. First, the staff cost is described and analysed; then the generic and indirect costs are calculated. Furthermore, the mechanism for controlling potential budget deviations is explained. Finally, in Figure 3 we can see that the budget estimation is 17066,08€.
• Project Manager. The project manager is in charge of the project’s planning and
development; in other words, the project manager oversees the project’s progress.
• Research ML. The researcher is responsible for investigating the best supervised
machine learning models for the project and selecting the best hyperparameters for
optimization.
The Project Manager role will be played by the tutors and the rest of the roles by me.
This section computes the Total Personnel Cost Per Activity (CPA). Each task or activity (previously defined in 2.1) is associated with the cost of the staff involved in that task. In this project there are 7 roles, each one with a different hourly salary, which translates into the cost per hour shown in Table 2.
Role Gross Annual Salary (€) Price per hour (€)
Project Manager 52899,6 25,425
Software Engineer 46556,9 22,375
Tester 39533 19
Research ML 47239,4 22,7
Research psychologist 71436,3 34,35
Technical Writer 41600 20
ML engineer 47239,4 22,7
Table 2: Salary of the different roles extracted from PayScale, a compensation software company [7], multiplied by 1.35 to include the cost of social security [Own creation].
The CPA is computed by multiplying the hours required per task/activity by the cost per hour of the role involved in the activity. The total CPA is the sum of the CPA of every task of the Gantt Chart. As shown in Figure 3, the total cost of recruitment (CPA) is 13366,05€.
Amortisation (€) = Resource Price · (1 / Years of Use) · (1 / Days of Work) · (1 / Hours per Day) · Hours Used  (1)
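As an illustration, Formula 1 can be evaluated in a few lines of Python. The figures below (a 1000€ PC amortised over 4 years of 220 working days at 8 hours per day) are hypothetical, not the project's actual values:

```python
# Hypothetical figures for Formula 1 (not the project's actual values)
resource_price = 1000.0  # EUR, e.g. the price of a PC
years_of_use = 4
days_of_work = 220       # working days per year
hours_per_day = 8
hours_used = 540         # total duration of the project

amortisation = (resource_price * (1 / years_of_use) * (1 / days_of_work)
                * (1 / hours_per_day) * hours_used)
print(round(amortisation, 2))  # EUR attributable to the project
```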
The indirect costs are identified to make the budget more realistic. Since I’ll be working
from home for the project (unless an extraordinary in-person meeting with the tutor is
required), the transportation cost is zero. On the other hand, internet costs around 70€ per
month, and electricity costs 100€ per month. The total Generic Cost, as shown in Figure 3,
is 1223,11€.
2.3.3 Contingency
Unexpected events are common during the development of a project, and one must plan
ahead to account for them. As a result, a contingency plan is created in order to avoid
potential delays during the planning process. Since contingency margins in the IT sector
typically range from 10% to 20%, I decided to have a 15% contingency margin for this
project, which amounts to 2188,37€.
2.3.4 Incidental Costs
Incidental costs define all potential risks that could cause project delays. The most extreme
risk of the project in this case is detecting bias in the machine learning models and thus
having to generate more data alternatively, as explained in previous sections. The project is
delayed as a result of this risk. Total Incidental Costs are 288,54€.
2.4 Deviations
The project’s methodology hasn’t changed; the hybrid mode between waterfall and agile
technique is well suited for the project, and it’s because of this that the previously mentioned
deviations haven’t had a significant impact on the project’s proper development.
The Gantt chart, which follows a Waterfall methodology, was used throughout the project's development to determine the dependencies between tasks and the order in which to schedule them. On the other hand, the project's coding and testing phases used an Agile methodology with Kanban boards to keep track of all the tasks.
As can be seen in Table 4, there were two significant changes: one that affected the
project development and the other that affected the project documentation. Both changes
had an impact on the budget. The changes will be described in the section that follows.
Additionally, the time required to do the project development was reduced from 200
hours to 146 hours.
3 Identification of Laws and Regulations
Understanding the laws and regulations that have an impact on the design and development
of the chatbot is one of the most crucial components of the project thesis.
In the following sections, all the information that must be included in a privacy notice is
explained.
Article 13(1)(b) [16] of the GDPR also requires providing: "the contact details of the data protection officer, where applicable". A data protection officer (DPO) is required for firms of a certain size or those that consistently handle sensitive personal data.
• To carry out or enter into a contract with them, you must process their personal data.
• Failure to process their personal data could endanger their lives or the life of another
person.
• Processing their personal data is something you’re doing in the public interest.
The app falls under the category ”You have a legitimate interest in processing
their personal data” [16] since it collects user data in order to identify potential cognitive
distortions based on user input.
3.2.5 Storage period of users' personal data
Article 13(2)(a) [16] of the GDPR requires informing users of: "the period for which the personal data will be stored, or if that is not possible, the criteria used to determine that period". It is crucial to comply with the GDPR's prohibition on keeping personal data longer than necessary. In the case of the app, user data is never stored.
– The software’s output is then validated clinically to guarantee that it is accurate
and dependable in the context of the clinical setting.
• Usability requirements. SW developers must make sure that as many user errors as possible are prevented via the user interface. IEC 62366 Medical devices — Part 1: Application of usability engineering to medical devices must be followed when planning and conducting usability tests. From a cybersecurity and safety standpoint, each and every one of the discovered user errors must be taken into account in the risk analysis and incorporated into the risk management strategy and report. As with any other risk, preventive steps must be taken if the possibility of user errors cannot be entirely removed; these include additional training or particular warnings in the user handbook.
• Cybersecurity requirements. These mainly regard patient data protection and protection from other cyber threats.
As with any other MD, it is advised that MDSW developers use approved techniques and standardized procedures like the ones listed below in order to comply with the relevant GSPRs. Table 6 provides an exhaustive list of the currently existing standards and guidelines that MDSW developers are advised to follow in order to comply with the applicable GSPRs, together with their most recent updates.
Footnote 5: International Organization for Standardization (ISO)
Footnote 6: International Electrotechnical Commission (IEC)
Footnote 7: American National Standards Institute (ANSI) / Association for the Advancement of Medical Instrumentation (AAMI)
4 Sustainability report
The consequences of climate change are well known, and some are already being felt. It is therefore important for individuals and companies to think about the future of the planet and cooperate in order to reduce pollution urgently. Thus, it is important to assess the footprint of a project to see how it impacts the environment. Assessing the economic impact is also important, as it helps us optimize costs and savings. Finally, keeping track of the social impact of a company or project is also very important; new technologies in particular have changed the lives of millions of people, including minorities and people in developing countries.
137 gr CO2/kWh · 0.2 kW · 540 h (total duration of the project) = 14796 gr of CO2  (3)
One approach to reduce the impact would be to execute the training and evaluation in parallel to reduce their duration. The scikit-learn Python library includes the n_jobs parameter for determining the number of jobs to run in parallel for cross-validation; setting it above 1 (or to -1 to use all cores) enables parallel execution.
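A minimal sketch of this, assuming scikit-learn is available; the toy data and parameter grid below are illustrative, not the project's actual setup:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the vectorized phrases (illustrative only)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# n_jobs=-1 runs the cross-validation fits for the different candidate
# hyperparameters in parallel on all available CPU cores
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=5,
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_)
```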
Regarding the exploitation, as mentioned before, most people go to therapy in the traditional way. Since people can access KAI without the need to leave home, the project helps in reducing pollution because they no longer need to take a means of transport to attend a session. So we can conclude that KAI is more environmentally friendly.
Regarding the risks, the project does not pose any; in fact, the project helps reduce people's ecological footprint, as mentioned previously.
4.3 Economic dimension
Regarding PPP, in section 2.3 the estimated costs of the project are identified and calculated, and the budget is shown. The hours required for the project have been revised and finally reduced, resulting in cost savings.
Regarding the exploitation, nowadays most people go to in-person therapy with a professional therapist, which is really expensive: in Spain the average price for one session is 50€, and taking into account that on average a person needs between 8 and 20 sessions, the final cost amounts to 400-1000€. Online therapy sessions are getting popular, giving people the flexibility to receive support and help without the need to travel. This option is usually more economical than traditional (in-person) therapy. Self-help therapy chatbots currently available in the market work on a free basis, but if you want access to more content you need to subscribe; furthermore, if you want access to a therapist, rates vary depending on the therapist or work on a subscription basis. Since KAI is free, it will help people embark on their self-help journey in therapy with the guidance of the chatbot in a more affordable way.
In the future, the project will have an almost inevitable cost: human resources. This cost could be reduced with the automation of tasks and with the availability of datasets.
Regarding the risks, the project is very dependent on data; if the quality of the data is not good enough, it could lead to very inaccurate predictions.
Regarding the exploitation, the project will help to close the gap between those who need and those who receive mental health care. Users will also have 24/7 access to support. More importantly, it will allow people to have access to mental health support in a more affordable, autonomous, and time-efficient way. This project is aimed at people who have mild to moderate symptoms of depression or anxiety. People who struggle with severe depression and/or anxiety are advised against using the chatbot.
Regarding the risks, as mentioned before, the project is not aimed at people who suffer from severe mental issues; instead it is a tool for people with mild symptoms of anxiety and/or depression to do self-guided CBT.
5 Technical Competences
During the development of the project thesis, the following technical competences from the computing specialization were addressed:
CCO2.1
To demonstrate knowledge about the fundamentals, paradigms and the own
techniques of intelligent systems, and analyse, design and build computer sys-
tems, services and applications which use these techniques in any applicable
field. [Quite]
During the project, an application (a chatbot) using machine learning has been designed and developed.
CCO2.2
Capacity to acquire, obtain, formalize and represent human knowledge in a com-
putable way to solve problems through a computer system in any applicable field,
in particular in the fields related to computation, perception and operation in
intelligent environments. [Quite]
This competence was achieved through the acquisition of human knowledge and its representation through machine learning to detect possible cognitive distortions.
CCO2.3
To develop and evaluate interactive systems and systems that show complex in-
formation, and its application to solve person-computer interaction problems.
[A little]
This competence was achieved with the design and development of the chatbot, which extracts and shows complex information (possible cognitive distortions) from user input, a form of human-computer interaction (conversational user interface).
CCO2.4
To demonstrate knowledge and develop techniques about computational learn-
ing; to design and implement applications and system that use them, including
these ones dedicated to the automatic extraction of information and knowledge
from large data volumes. [In depth]
The study included extensive research on the top machine learning methods for topic classification. In addition, a chatbot that automatically detects potential cognitive distortions using a machine learning technique was designed and developed.
6 Dialogue Flow
The design of the chatbot's dialogue flow is described in this section. The dialogue flow is divided into three main parts, as shown in Figure 12: identifying potential cognitive distortions, challenging potential cognitive distortions, and chatbot's feedback. The design of the dialogue flow is based on Module 9 and Module 10 of A Therapist's Guide to Brief Cognitive Behavioral Therapy [1].
• Asking how the user has been feeling. If the user's answer is positive, the support to therapy ends and the dialogue starts over. Otherwise, the dialogue flow continues.
• Asking the user what made them feel this negative feeling.
• Asking the user what they were thinking when they were in the situation that made them feel bad, in order to detect an automatic thought. This is the part where the ML model that detects potential cognitive distortions is used.
• Asking the user how they would rate the negative feeling/mood that they mentioned in the first step.
6.3 Chatbot’s Feedback
The chatbot provides feedback to the user as the final stage of the conversation flow. This
is accomplished by providing the user with a definition of the potential cognitive distortion
that has been identified during the conversation, along with some helpful advice. If the
chatbot notices that the user’s mood hasn’t improved, there is also a relaxing exercise (a
link to a video).
7 Dataset
The dataset of the project collects example phrases of 15 cognitive distortions (see Figure 11 for the list of cognitive distortions with their definitions). The dataset is composed of 595 rows and 2 columns. The first column contains example phrases of cognitive distortions and the second column indicates the type of cognitive distortion. Since not all automatic thoughts are negative or cognitive distortions, examples that are not cognitive distortions have been added too.
As mentioned in section 2.1.3, the data has been generated by collecting examples of cognitive distortions from trustworthy sources (books, articles, official psychology pages, etc.).
7.1 Preprocessing
In this section the preprocessing done to the dataset is explained.
7.1.2 Balancing
When there are significantly fewer samples from one or more classes than from the other classes, the dataset is said to be unbalanced. This might happen if the data were gathered from a real-world situation where there might not be an equal distribution of examples among the various classes. The dataset is clearly unbalanced, as can be seen in Figure 14 (there are almost 50 examples of Emotional Reasoning but only a little more than 20 examples of Always Being Right).
Machine learning algorithms may encounter difficulties when given unbalanced datasets
because they may be biased in favor of the dominant class and may not adequately repre-
sent the minority class. This may result in models that are not generalizable to real-world
scenarios and poor performance on the minority class.
The SMOTE methodology is used for balancing in this project. SMOTE (Synthetic
Minority Oversampling Technique) is an oversampling technique used in machine learning to
overcome the problem of unbalanced datasets. Instead of just duplicating existing examples,
it generates synthetic examples of the minority class to balance the distribution of classes.
To generate synthetic examples, SMOTE first selects a minority class example and finds
its K nearest minority class neighbors. It then interpolates a new synthetic example between
the selected example and one of its neighbors, by sampling from a line between the two
examples. This process is repeated until the desired amount of oversampling is achieved.
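The steps above can be sketched directly in NumPy; this minimal implementation is a simplification of the actual SMOTE algorithm, using a toy 2-D minority class, and is for illustration only:

```python
import numpy as np

def smote_sample(X_min, k=5, n_new=10, seed=None):
    """Minimal SMOTE sketch: interpolate between minority-class examples."""
    rng = np.random.default_rng(seed)
    new_points = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))           # pick a random minority example
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]     # its k nearest minority neighbors
        j = rng.choice(neighbors)              # pick one of them
        gap = rng.random()                     # interpolation factor in [0, 1)
        new_points.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(new_points)

# Toy minority class: six examples in a 2-D feature space
X_min = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]], dtype=float)
synthetic = smote_sample(X_min, k=3, n_new=4, seed=0)
print(synthetic.shape)  # four new synthetic samples
```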
8 Model Experimentation
Each machine learning model that was used in the model experimentation is introduced
in the subsections that follow, along with an explanation of how each hyperparameter was
tuned. Additionally, the model experimentation results are presented and analyzed.
P(A|B) = P(B|A) · P(A) / P(B)  (4)
where:
• P (B|A) is the probability of condition B being true given that event A has occurred.
smoothing option is frequently set to 1, which increases the count of each feature by one.
This can be mathematically expressed with Formula 5 where:
• P (f eature|class) is the probability of the feature occurring given the class label.
• count(feature, class) is the number of times the feature has been observed in the train-
ing data for the class label.
• num features is the total number of unique features in the training data.
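In scikit-learn's MultinomialNB this corresponds to the alpha hyperparameter, where alpha=1.0 gives add-one (Laplace) smoothing. A toy sketch with made-up example phrases (not taken from the project's dataset):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up toy examples, NOT the project's dataset
texts = ["I always mess everything up", "nothing ever works for me",
         "the meeting went fine", "today was a normal day"]
labels = ["distortion", "distortion", "neutral", "neutral"]

vec = CountVectorizer()
X = vec.fit_transform(texts)

# alpha=1.0 is Laplace (add-one) smoothing: every feature count is
# incremented by one so unseen words never get zero probability
clf = MultinomialNB(alpha=1.0)
clf.fit(X, labels)
pred = clf.predict(vec.transform(["I always fail at everything"]))
print(pred)
```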
log( p(y = k) / (1 − p(y = k)) ) = b0 + b1 · x1 + b2 · x2 + ... + bn · xn  (6)
Maximum likelihood estimation (MLE) is the most used technique for calculating the beta parameters, or coefficients, of this model. This method tests several beta values repeatedly in search of the best fit for the log odds. Logistic regression aims to maximize this function after each of these iterations in order to determine the optimal parameter estimates.
Footnote 8: The probability of success divided by the probability of failure.
Once the optimal coefficient (or coefficients, if there are multiple independent variables) has been identified, the conditional probabilities for each observation can be computed, logged, and summed to obtain a predicted probability.
There are several advantages to using multinomial logistic regression, including its ability to handle multiple categories and to model the relationships between the independent variables and the dependent variable. It is crucial to keep in mind, however, that the model assumes the independent variables are unrelated to one another, which may not always hold true in real life.
8.2.2 C Hyperparameter
According to the scikit-learn API library [13] the C hyperparameter is the "Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization".
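The effect of C can be verified empirically: a smaller C (stronger regularization) shrinks the learned coefficients. A sketch on synthetic data, illustrative rather than the project's setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (illustrative only)
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Smaller C means stronger regularization and hence smaller coefficients
strong_reg = LogisticRegression(C=0.01, max_iter=1000).fit(X, y)
weak_reg = LogisticRegression(C=100.0, max_iter=1000).fit(X, y)
print(np.abs(strong_reg.coef_).sum(), np.abs(weak_reg.coef_).sum())
```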
The hyperplane is defined by a weight vector (w) and a bias term (b). The equation
of the hyperplane is given by Formula 7 where x is a feature vector and w and b are the
parameters of the hyperplane.
w·x+b=0 (7)
The distance of a point x from the hyperplane is given by Formula 8 where ||w|| is the
norm of the weight vector.
distance = (w · x + b) / ||w||  (8)
An SVM's objective is to determine the hyperplane with the greatest margin (the distance between the hyperplane and the nearest data points from either class), i.e. the hyperplane that best separates the classes.
The SVM algorithm uses a method known as the "kernel trick" to find the hyperplane. The input data is mapped into a higher-dimensional space using a kernel function, making it simpler to locate the hyperplane. The type of data and the complexity of the problem determine which kernel function is employed; commonly used kernel functions are the linear, polynomial, and radial basis functions.
After locating the hyperplane, the SVM can be used to categorize additional data points
by determining how far they are from the hyperplane. The point is categorized as belonging
to one class if the distance is positive, and to the other class if the distance is negative.
SVMs have several advantages over other classification algorithms. They are robust to
noise and can handle high-dimensional data. They have also been extensively investigated
and employed in a wide range of applications. They also have a strong mathematical foun-
dation.
f(x) = w · x + b  (9)
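The ideas above map directly onto scikit-learn's SVC, whose kernel parameter selects the kernel function; a small sketch on synthetic data (illustrative only, not the project's configuration):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for the vectorized text data (illustrative only)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# The kernel parameter selects the kernel function used by the "kernel trick"
scores = {}
for kernel in ["linear", "poly", "rbf"]:
    clf = SVC(kernel=kernel, C=1.0).fit(X, y)
    scores[kernel] = clf.score(X, y)  # training accuracy per kernel
print(scores)
```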
In the KNN algorithm, a new data point is classified based on the majority class of its
”nearest neighbors”. The number of neighbors, ”K”, is a hyperparameter that is specified by
the user.
To classify a new data point, the KNN algorithm follows these steps:
• Calculate the distance between the new data point and all the training data points.
• Select the K training data points that are closest to the new data point.
8.4.3 p Hyperparameter
According to the scikit-learn API library [13] the p hyperparameter is the "Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan distance (l1), and euclidean distance (l2) for p = 2. For arbitrary p, minkowski distance (lp) is used".
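A minimal sketch of KNeighborsClassifier with these hyperparameters on synthetic data (illustrative, not the project's configuration):

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (illustrative only)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# p=2 selects the Euclidean distance; p=1 would select the Manhattan distance
knn = KNeighborsClassifier(n_neighbors=5, p=2).fit(X, y)
acc = knn.score(X, y)  # training accuracy
print(acc)
```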
Gini impurity is a measure of the probability of misclassifying a randomly chosen element in a set, and is defined with Formula 10, where p(i) is the proportion of elements in the set that belong to class i and n is the number of classes in the set.
Gini impurity = 1 − Σ_{i=1}^{n} p(i)^2  (10)
Entropy is a measure of the amount of uncertainty in a set, and is defined with Formula 11, where p(i) is the proportion of elements in the set that belong to class i and n is the number of classes in the set.
Entropy = − Σ_{i=1}^{n} p(i) · log(p(i))  (11)
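Formulas 10 and 11 can be computed with a few lines of NumPy (natural logarithm assumed for the entropy; the label lists are toy examples):

```python
import numpy as np

def gini_impurity(labels):
    # Formula 10: 1 - sum_i p(i)^2
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Formula 11: -sum_i p(i) * log(p(i)), natural log assumed
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

pure = ["a", "a", "a", "a"]    # one class: no impurity, no uncertainty
mixed = ["a", "a", "b", "b"]   # 50/50 split: maximum impurity
print(gini_impurity(pure), gini_impurity(mixed))
print(entropy(pure), entropy(mixed))
```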
On the other hand, for the Support Vector Machine a very high C value was the best value, meaning there is less regularization. Finally, for the KNN (K-Nearest Neighbor), the Euclidean distance was the best hyperparameter (p hyperparameter = 2).
8.6.2 Evaluation
For evaluating the performance of the different models, the analysis considers four metrics: Accuracy, Precision, Recall and F1-Score. Note that the results to be analyzed are the weighted averages of the metrics mentioned before, to take into account the number of instances of each class in the testing data. The weighted average is especially preferred when there is an imbalance in the testing data.
Accuracy is a simpler metric that does not take into account the cost of a misclassification. Accuracy is expressed with Formula 12.
Accuracy = Number of correct predictions / Total number of predictions  (12)
Precision measures how many positive predictions are actually positive (True Positives). This metric is particularly appropriate when the cost of a False Positive is high. For instance, in the case of spam detection, a false positive means that an email that is actually important (not spam) has been classified as spam and thus the user loses valuable information. Precision is expressed with Formula 13.
Precision = Number of True Positives / Total predicted positives  (13)
Recall measures how many of the actual positives are predicted as positive. This metric is the most appropriate when there is a high cost of False Negatives. For example, if a person with cancer is labelled as not having cancer, the consequences could be disastrous. Recall is expressed with Formula 14.
Recall = Number of True Positives / Total actual positives  (14)
The F1-Score gives us a balance between Recall and Precision and is expressed with
Formula 15.
Footnote 9: Fitting the training data refers to the process of adjusting a model's parameters so that it accurately predicts the output values for the input values in the training set.
F1-Score = 2 · (Precision · Recall) / (Precision + Recall)  (15)
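Formulas 12 to 15 can be checked by hand on a toy binary example (the labels below are made up for illustration):

```python
# Toy binary labels (hypothetical): 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

accuracy = correct / len(y_true)                    # Formula 12
precision = tp / (tp + fp)                          # Formula 13
recall = tp / (tp + fn)                             # Formula 14
f1 = 2 * precision * recall / (precision + recall)  # Formula 15
print(accuracy, precision, recall, f1)
```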
As we can see in Figure 7, all the models performed better on the precision metric; in other words, they have a lower False Positive rate. Models with high precision can be considered "conservative" since they only predict a positive sample when they are very confident that it is actually positive. The Support Vector Machine is particularly conservative since it has the highest precision, with a 0.75 score.
Furthermore, all of the models scored lower on the F1-Score. This could be an indication that the models struggle to correctly identify positive examples; in other words, recall is low. This can also be caused by an imbalanced class distribution (especially in the training set).
Overall, the KNN (K-Nearest Neighbour) had the worst performance. This could be an indication that the generated dataset is complex (footnote 10), since KNN is better suited to small and simpler datasets.
On the other hand, both Random Forest and Multinomial Naive Bayes curiously had the same performance, except for the GridSearchCV where Multinomial Naive Bayes had a slightly better score. Both models are overall the best. The final model chosen for integration into the chatbot implementation is Random Forest. The tiebreaker between the two models was decided by taking into account the following:
• Complexity: Random forests are more complex than multinomial Naive Bayes, as
they involve building and training multiple decision trees. On more complicated
datasets, though, they can frequently attain higher accuracy.
• Feature Importance: Random forests can determine the relevance of each feature in the model (footnote 11), which is helpful for feature selection and for understanding the model's decision-making process. This kind of information is not provided by Multinomial Naive Bayes. This is crucial for understanding how the model "made the decision" or "justified" choosing a specific cognitive distortion.
Footnote 10: A complex dataset is one that has a large number of examples, a large number of features, or a high degree of complexity or non-linearity in the relationships between the input variables and the output variable.
Footnote 11: The importance of each feature in a random forest model can be determined by examining the amount by which the model's accuracy declines when the values of that feature are randomly permuted. This is known as permutation importance, and it provides a way to quantify how important each feature is to the predictions made by the model.
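As a sketch, both the built-in impurity-based importances and the permutation importance described in footnote 11 are available in scikit-learn (the data here is synthetic and illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic stand-in data (illustrative only)
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Built-in impurity-based importances: one value per feature, summing to 1
print(rf.feature_importances_)

# Permutation importance: the drop in accuracy when one feature's values
# are randomly shuffled, as described in footnote 11
result = permutation_importance(rf, X, y, n_repeats=5, random_state=0)
print(result.importances_mean)
```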
9 Implementation
In this section, the implementation of the project is explained. The implementation has been divided between the model experimentation and the chatbot implementation (the interface and the dialogue flow). The implementation has been done entirely in Python and Jupyter Notebooks. I decided to use the Python programming language because it has extensive libraries, especially for machine learning, it is easy to learn and use and, due to the simplicity of its syntax, it is also quicker to code.
The code has the following structure:
1. Importing Libraries: The first step is importing all the required libraries. The re library is the package that handles regular expressions, as mentioned before. WordNet from the NLTK library is also going to be used. WordNet is a lexical English database that defines semantic relationships between words (useful to find the meaning of words, synonyms, antonyms, etc.). The main purpose of WordNet in the project is to build up a dictionary of synonyms for the keywords. This allows me to avoid having to manually introduce every possible alternative word a user could use to match a specific keyword. itertools is a module that allows us to handle iterators in an efficient way.
2. Building the Keyword List: This part consists of building the list of keywords that
the chatbot needs to look for.
3. Building a dictionary of Intents: Once we have the list of keywords, the next step
is building a dictionary of intents to match intents with keywords.
5. Matching Intents and Generating Responses: In this part the user input is taken and evaluated to see if it contains any keyword. This is done with the RegEx search function. If there is no keyword match, the dialogue flow continues (see Figure 12).
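The keyword-matching step can be sketched as follows; the keywords, intents and responses below are hypothetical stand-ins for the actual dictionaries built in steps 2 and 3:

```python
import re

# Hypothetical stand-ins for the project's keyword and response dictionaries
keywords = {"greet": [r"\bhello\b", r"\bhi\b"],
            "thanks": [r"\bthanks?\b"]}
responses = {"greet": "Hello! How have you been feeling?",
             "thanks": "You're welcome!"}

def match_intent(user_input):
    # Check the user input against every pattern of every intent
    for intent, patterns in keywords.items():
        for pattern in patterns:
            if re.search(pattern, user_input, re.IGNORECASE):
                return intent
    return None  # no keyword match: continue the regular dialogue flow

intent = match_intent("Hi Kai!")
print(responses.get(intent))
```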
As we can see in Figure 8, the conversation is started by Kai, who greets the user and asks how they feel. After receiving the user's answer, the chatbot detects that the feeling is negative with the aid of the SentimentAnalyzer of the NLTK library. Then Kai asks the user to explain more and what they were thinking at the time. This last answer is really important since it is the automatic thought that is going to be used by the model to detect the potential cognitive distortion.
Figure 8: Screenshot showing the first part of the Dialogue flow [Own Creation]
As we can see in Figure 9, the second part of the dialogue starts which consists in chal-
lenging the Potential Cognitive Distortion by asking questions to make the user rationalize
and think of an alternative way of seeing things. This is a fundamental part of the CBT
therapy. Towards the end of the conversation, Kai tells the user the potential cognitive
distortion detected by the model and gives some tips. If the mood is not improved, which
means that the user doesn’t feel any better after the session, a relaxation exercise is provided.
Figure 9: Screenshot showing the second part of the Dialogue flow [Own Creation]
9.3.2 Rule-Based Test
As seen in Figure 10, the chatbot correctly detects the keywords and responds accordingly.
Figure 10: Screenshot showing how the chatbot correctly detects the keywords and responds accordingly [Own Creation]
10 Conclusions
The main objective of the project was designing and developing a chatbot to support therapy for people who have mild to moderate symptoms of mental health issues. To achieve that, thorough research has been done for the design of the dialogue flow following the Cognitive Behavioral Therapy (CBT) methodology.
Finding an appropriate dataset for training the machine learning models for the automatic detection of potential cognitive distortions in text presented a significant challenge for the project. Unfortunately, there is no publicly accessible dataset, so it was decided to generate one by gathering examples of cognitive distortions from reliable sources. The resulting dataset was small and could have been biased. SMOTE and cross-validation were used to mitigate this.
Furthermore, for the hyperparameter tuning, a review was carried out of the different
hyperparameters that could significantly impact the performance of the various models
during model experimentation. Based on the performance achieved in the model
experimentation (see Figure 7), we can conclude that the objective of detecting cognitive
distortions was accomplished.
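As an illustration of this kind of tuning, the following sketch runs a grid search over a TF-IDF plus linear SVM pipeline with scikit-learn. The toy corpus and the parameter grid are invented for demonstration and are not the project's actual data or grid:

```python
# Illustrative hyperparameter search (hypothetical corpus and grid).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

texts = ["I always fail at everything", "Everyone must like me",
         "It was a pleasant day", "I enjoyed the walk"] * 5
labels = [1, 1, 0, 0] * 5  # 1 = contains a distortion, 0 = neutral

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("svm", LinearSVC())])
grid = GridSearchCV(
    pipe,
    {"svm__C": [0.1, 1, 10],                 # regularization strength
     "tfidf__ngram_range": [(1, 1), (1, 2)]},  # unigrams vs. uni+bigrams
    cv=3,
)
grid.fit(texts, labels)
print(grid.best_params_)
```

Each combination in the grid is scored by cross-validation, which is how the impact of each hyperparameter on model performance can be compared systematically.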
Finally, the chatbot was developed in Python and the GUI with Tkinter, Python's
standard GUI toolkit. Additionally, to make the dialogue flow more dynamic, a set of rules
has been defined so that the chatbot responds accordingly when it detects that the user is
performing certain actions, for instance greeting.
10.2 Reflections
I was inspired to do this project because I thought it would be interesting to apply AI in
health care. After completing it, I have come to the conclusion that the hype surrounding
machine learning had blinded me. In reality, machine learning models are highly dependent
on data, both in terms of quantity and quality.
It is particularly challenging to apply AI successfully in industries like healthcare, where
data is limited and there are specific regulations with which we must comply. Another
challenge is that most machine learning models end up being "black boxes": it is difficult to
understand how the model makes its decisions based on the data. Furthermore, the fact that
we must train the models can be time-consuming, exceedingly error-prone, and frequently
results in failure due to poor data quality. In fact, according to TechRepublic [20], about
85% of AI projects in business settings fail, and the most common causes include bad data
quality and problems with data labelling.
If we look at the definition of AI, it is a discipline that strives to design computer
systems that mimic human intelligence without the need for human intervention (except
that the data must be obtained, cleaned, and in some cases manually labelled). While
working on the project, I couldn't help but wonder whether babies and young children need
to see many dogs in order to recognize them. How do people identify things?
Understanding how our brains truly function and how we learn is perhaps the first step
toward developing "real" AI. After all, using massive amounts of data to train models may
not be the best strategy; perhaps there is another way. Perhaps neuroscience holds the key
to real AI development.
A 15 Major Cognitive Distortions
B Dialogue Flow
C EU Regulatory Environment for MDSW
[Figure: overview of the EU regulatory environment for Medical Device Software (MDSW)
across the software lifecycle (requirements, design, coding, testing, deployment,
maintenance), including the risk management system (RMS), the Information Security
Management System (ISMS), and the cybersecurity management plan; reproduced from the
Asphalion white paper [5].]
Clinical Evaluation:
- MDCG 2020-1: Guidance on clinical evaluation (MDR) / Performance evaluation (IVDR) of medical device software
- IMDRF/SaMD WG/N41 FINAL:2017: Software as a Medical Device (SaMD): Clinical Evaluation
Cybersecurity:
- ISO/IEC 27000:2018(en) (series): Information technology — Security techniques — Information security management systems — Overview and vocabulary
- ISO 27799:2016: Health informatics — Information security management in health using ISO/IEC 27002
- IEC/CD 81001-5-1 (draft 2021): Health software and health IT systems safety, effectiveness and security — Part 5-1: Security — Activities in the product lifecycle
- MDCG 2019-16 rev.1: Guidance on cybersecurity for medical devices
- IMDRF/CYBER WG/N60 FINAL:2020: Principles and Practices for Medical Device Cybersecurity
Usability:
- IEC 62366-1:2015 (*): Medical devices — Application of usability engineering to medical devices
- ISO 9241-210:2010: Ergonomics of human-system interaction — Human-centered design for interactive systems
- ANSI/AAMI HE75:2009/(R)2018 (*): Human factors engineering — Design of medical devices
- AAMI TIR50:2014 (*): Post-market surveillance of use error management
Software lifecycle:
- EN 62304:2006/AC:2008 (*): Medical device software — Software life-cycle processes
- IEC 82304-1:2016: Health software — Part 1: General requirements for product safety
- ISO/IEC 14764:2006: Software Engineering — Software Life Cycle Processes — Maintenance
- IMDRF/MC/N35 FINAL:2015: Statement regarding Use of IEC 62304:2006 "Medical device software — Software life cycle processes"
(*) Although they are not software-specific, these standards are highly relevant for the development of MDSW.
Table 6: Standard and guidance documents useful to demonstrate MDSW compliance with
MDR. [5]
E Number of examples grouped by cognitive distortions
Figure 15: There are numerous alternative hyperplanes that could separate the two groups
of data points (left image). In SVM, the goal is to find the hyperplane with the greatest
margin, that is, the greatest separation between data points of both classes (right image).
Images obtained from Towards Data Science [6].
References
[1] J. A. Cully and A. L. Teten. A Therapist’s Guide to Brief Cognitive Behavioral Therapy.
Department of Veterans Affairs South Central MIRECC, Houston, 2008.
[2] J. Calvo. Aprendizaje por transferencia: NLP. Blog Europeanvalley. [online]
Available at: https://www.europeanvalley.es/noticias/
aprendizaje-por-transferencia-nlp/, 2022. [Accessed 26-Sep-2022].
[3] Introduction — Machine Learning — Google Developers — develop-
ers.google.com. https://developers.google.com/machine-learning/guides/
text-classification. [Accessed 26-Sep-2022].
[4] MA. Courtney E. Ackerman. CBT Techniques: 25 Cognitive Behavioral Ther-
apy Worksheets — positivepsychology.com. https://positivepsychology.com/
cbt-cognitive-behavioral-therapy-techniques-worksheets/. [Accessed 27-Sep-
2022].
[5] Asphalion - Scientific and Regulatory Affairs consultancy — asphalion.com. https:
//www.asphalion.com. [Accessed 06-Dec-2022].
[6] Rohith Gandhi. Support Vector Machine — Introduction to Machine Learn-
ing Algorithms — towardsdatascience.com. https://towardsdatascience.com/
support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47.
[Accessed 01-Jan-2023].
[7] Payscale - Salary Comparison, Salary Survey, Search Wages — payscale.com. https:
//www.payscale.com. [Accessed 06-Oct-2022].
[8] Mental health — who.int. https://www.who.int/health-topics/mental-health#
tab=tab_1. [Accessed 25-Sep-2022].
[9] The treatment gap in mental health care - PubMed — pubmed.ncbi.nlm.nih.gov. https:
//pubmed.ncbi.nlm.nih.gov/15640922/. [Accessed 11-Jan-2023].
[10] Michiyo Hirai and George A. Clum. A meta-analytic study of self-help interventions for
anxiety problems. Behavior Therapy, 37(2):99–111, June 2006.
[11] Benjamin Shickel, Scott Siegel, Martin Heesacker, Sherry Benton, and Parisa Rashidi.
Automatic detection and classification of cognitive distortions in mental health text,
September 2019.
[12] Ignacio de Toledo Rodriguez, Giancarlo Salton, and Robert Ross. Formulating auto-
mated responses to cognitive distortions for CBT interactions, 2021.
[13] Daniel M Low, Laurie Rumker, John Torous, Guillermo Cecchi, Satrajit S Ghosh, and
Tanya Talkar. Natural language processing reveals vulnerable mental health support
groups and heightened health anxiety on reddit during covid-19: Observational study.
Journal of medical Internet research, 22(10):e22635, 2020.
[14] Degree final project — FIB - Barcelona School of Informatics —
fib.upc.edu. https://www.fib.upc.edu/en/studies/bachelors-degrees/
bachelor-degree-informatics-engineering/degree-final-project. [Accessed
29-Nov-2022].
[16] Art. 13 GDPR - Information to be provided where personal data are col-
lected from the data subject - GDPR.eu — gdpr.eu. https://gdpr.eu/
article-13-personal-data-collected/. [Accessed 05-Dec-2022].