Abstract

Churn refers to the discontinuation of a contract; consequently, customer churn occurs when existing customers stop being customers. Predicting customer churn is a challenging task in customer retention, but with the advances made in artificial intelligence and machine learning, the feasibility of predicting customer churn has increased. Prior studies have demonstrated that machine learning can be utilized to forecast customer churn. The aim of this thesis was to develop and implement a machine learning model to predict customer churn and identify the customer features that have a significant impact on churn. This study has been conducted in cooperation with the Swedish insurance company Bliwa, which expressed interest in gaining an increased understanding of why customers choose to leave.
Three models, logistic regression, random forest, and gradient boosting, were used and evaluated. Bayesian optimization was used to optimize the models. After obtaining an indication of their predictive performance through evaluation with cross-validation, it was concluded that LightGBM, the gradient boosting implementation, provided the best result in terms of PR-AUC, making it the most effective approach for the problem at hand.
Subsequently, a SHAP analysis was carried out to gain insights into which customer features have an impact on whether or not a customer churns. The outcome of the SHAP analysis revealed specific customer features that had a significant influence on churn. This knowledge can be utilized to proactively implement measures aimed at reducing the probability of churn.
Sammanfattning (Summary in Swedish, translated)

Predicting customer churn is a challenging task in customer retention, but with the advances made in artificial intelligence and machine learning, the ability to predict customer churn has increased. Prior studies have shown that machine learning can be used to forecast customer churn. The aim of this study was to develop and implement a machine learning model to predict customer churn and to identify the customer features that have a significant impact on whether a customer chooses to leave or not. This study was conducted in cooperation with the Swedish insurance company Bliwa, which expressed interest in gaining a better understanding of why customers choose to leave.

Three models, logistic regression, random forest and gradient boosting, were used and evaluated. Bayesian optimization was used to optimize these models. After evaluating predictive performance using cross-validation, it was concluded that LightGBM gave the best result in terms of PR-AUC and was therefore considered the most effective method for the problem at hand.

Subsequently, a SHAP analysis was carried out to provide insights into which customer features affect whether a customer is at risk of leaving or not. The result of the SHAP analysis showed that certain customer features stood out and appeared to have a significant impact on customer churn. This knowledge can be used to take proactive measures to reduce the probability of churn.
Contents

1 Introduction
  1.1 Purpose
  1.2 Research Questions
  1.3 Delimitations
  1.4 Thesis Structure

2 Background
  2.1 Related Work
  2.2 The Insurance Industry
    2.2.1 Current Trends
    2.2.2 Customer Churn

4 Methodology
  4.1 Data
    4.1.1 Data Collection and Description
    4.1.2 Data Preprocessing
    4.1.3 Data Analysis
  4.2 Variable Selection and Feature Engineering
  4.3 Machine Learning
    4.3.1 Hold-Out Set
    4.3.2 K-Fold Cross-Validation
    4.3.3 Modeling
    4.3.4 Model Optimization
    4.3.5 Evaluation and Comparison
  4.4 Attributes Impacting Decision (SHAP-Analysis)

5 Results
  5.1 Model Performance
  5.2 SHAP-analysis

6 Discussion
  6.1 Business Analysis
  6.2 Predictive Accuracy
  6.3 Considerations Concerning Future Implementation
  6.4 Conclusion
Chapter 1
Introduction
Customer Relationship Management (CRM) concerns the relationship between an organization and its customers. During the twentieth century, executives and academics became increasingly concerned with the topic of CRM. CRM is a broad subject, but it has four key elements: customer identification, customer development, customer attraction and customer retention. Customer retention is a central concern of CRM, and it is about meeting customers' expectations and demands so that they become loyal to the organization. When these demands are not met, the opposite can occur: customers churn. Customer churn is when existing customers stop being customers. In order to manage customer churn, the customers who are at risk of churning should be identified and then convinced to stay [1]. The cost of acquiring new customers can be 12 times higher than the cost of retaining existing customers [2]. This makes customer retention an important topic for many businesses.

With businesses having access to an ever-increasing amount of data, there is a growing interest in using data-driven operations and business analysis to obtain valuable and relevant insights. A key concern in this area is identifying valuable and tangible metrics that have a business impact, as well as the methodology for extracting and calculating these metrics through various models. One such model, which can be used in strategic business development, is a churn prediction model: a predictive binary classification model that estimates the likelihood of individual customers discontinuing their services [3]. This makes churn prediction a matter of significance within a company's data-driven CRM strategy concerning customer retention. By analyzing the feature importance of the model's classifications, it is possible to determine which features have a significant impact on churn [4].
1.1 Purpose
This study has an exploratory approach with two main objectives. The first is to shed light on how churn can be suitably modeled within the insurance industry, based on related work. The second is to investigate the feasibility of implementing such a churn prediction model and to identify which attributes have a significant impact on churn.
2. Which of the models random forest, logistic regression and gradient boosting yields the best results regarding predictive performance?

3. Which customer features affect customer churn the most?
1.3 Delimitations
The following delimitations have been made:
• This work is based on one associated cluster of customer data provided by Bliwa.
• The models that will be evaluated are logistic regression, random forests and gradient boost-
ing.
• Customer costs will not be considered.
• This thesis treats confidential customer information, hence all sensitive information will be
anonymized throughout the report.
Chapter 2
Background
This chapter provides a description of previous work in the field of churn prediction, which has
shaped the machine learning framework of this project. Additionally, it highlights current trends
that impact the insurance industry, along with the definition of customer churn.
2.1 Related Work

Lalwani et al. [3] conducted a customer churn prediction study, evaluating a variety of machine learning algorithms: random forest, decision trees, support vector machine, naive Bayes, logistic regression, and boosting and extra-tree classifiers such as XGBoost, AdaBoost and CatBoost. They found that XGBoost and AdaBoost performed best in terms of accuracy and AUC score. Random forest obtained an accuracy of 78.04% and an AUC score of 82%, logistic regression an accuracy of 80.45% and an AUC score of 82%, and XGBoost an accuracy of 80.8% and an AUC score of 84%.
Peng et al. [4] conducted research on customer churn prediction using GA-XGBoost, an XGBoost model optimized with a genetic algorithm. They also integrated the model with a SHAP analysis in order to uncover the actual reasons behind churn, which they argued was more aligned with business needs.
Ahmad et al. [5] used tree-based algorithms to predict customer churn: decision tree, the GBM tree algorithm, XGBoost and random forest. In their comparative analysis, they found XGBoost to perform best in terms of AUC. However, they noted that using an optimization algorithm for feature selection could further improve performance.
Vafeiadis et al. [6] performed a comparative study of machine learning models for customer churn prediction, using decision tree, support vector machine, logistic regression and naive Bayes classifiers. Their results showed that a support vector machine boosted with AdaBoost yielded the best performance, and they noted that feature selection strategies could improve performance further.
Coussement et al. [7] analyzed customer churn using random forest, logistic regression and support vector machine. Initially, the performance of the support vector machine was almost equal to that of random forest and logistic regression, but after hyperparameter optimization the support vector machine was superior to the others in terms of AUC score and classification accuracy.
Y. Huang et al. [8] used a variety of classifiers in a churn prediction project. The results showed that random forest performed best, with around 55% PR-AUC and around 87.5% AUC score. When restricting the evaluation to the top U customers most likely to churn, they obtained 71.55% PR-AUC and 93.26% AUC score. However, they noted that optimization techniques for feature extraction could improve performance.
2.2 The Insurance Industry

2.2.1 Current Trends

Since the beginning of 2022, the Riksbank and other central banks have tightened monetary policy by raising key interest rates to dampen high inflation and rising inflation expectations. The higher interest rates have led to a slowdown in economic development, and the high inflation affects the insurance industry in several different ways, especially if it persists for a long time. For non-life insurance companies, it means higher repair costs that make claims handling more expensive; for life insurance and occupational pension companies, it means larger future payments of value-guaranteed defined-benefit pensions. The extent to which premiums will rise as a result of the high inflation depends, among other things, on the competitive situation. A slowdown in the economy affects the amount of premiums paid for non-life insurance, because demand for insurance typically declines when consumption and investment decline. The growth in premiums paid for non-life insurance is therefore expected to decrease in the coming years when premiums are adjusted for inflation [9].
2.2.2 Customer Churn

Customer churn can be divided into three types:

• Deliberate/active - the customer chooses to quit the contract in order to change provider.
• Incidental/rotational - the customer chooses to quit the contract without switching to another provider.
• Non-voluntary/passive - the company chooses to discontinue the contract.
Chapter 3
This chapter presents the machine learning framework used in this project. It includes a theoretical description of the different methods and algorithms employed throughout the project. In the subsequent chapter, a description is provided of how this framework was applied to the problem at hand.
Feature engineering typically involves four main steps. First, candidate features are brainstormed by studying relevant literature to gather inspiration. Then, the most appropriate features are extracted based on the specific problem and the characteristics of the available data. The third step involves selecting the relevant features to be used in training the model. Finally, the model is evaluated using the selected features [12].
3.2 Models
3.2.1 Logistic Regression
Logistic regression is a discriminative classifier in which the response variable is modeled through the log-odds of belonging to a certain class of a binary or multi-class response. Logistic regression makes a number of assumptions, such as independence of the observations and a linear relationship between the explanatory variables and the log-odds. A transformation is applied to the linear predictor to obtain class probabilities that are continuous over the output classes and bounded between 0 and 1. This transformation is called the sigmoid function, where z corresponds to the log-odds (the logit). The sigmoid function is given by
\[
\sigma(z) = \frac{1}{1 + \exp(-z)}.
\]
In the case of binary classification, the logistic regression model can be phrased as a linear combination of weighted input features plus a bias term:
\[
p(y^{(i)} = 1 \mid x^{(i)}, w) = \frac{1}{1 + \exp(-w^{T} x^{(i)} - b)}
\]
and, respectively,
\[
p(y^{(i)} = 0 \mid x^{(i)}, w) = 1 - \frac{1}{1 + \exp(-w^{T} x^{(i)} - b)}.
\]
The objective is to determine the set of weights that minimizes the negative log-likelihood over the training set, using optimization techniques such as gradient descent or stochastic gradient descent. The loss function, also referred to as the cross-entropy, measures the difference between the predicted class probabilities and the true labels and is the object of the minimization. The loss function is given by
\[
L(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ p_i \log(y_i) + (1 - p_i)\log(1 - y_i) \right]
\]
as per [13].
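As a complement to the formulas above, the following is a minimal NumPy sketch (not part of the thesis implementation) of the sigmoid, the class probabilities and the cross-entropy loss, trained with plain gradient descent; the data, names and values are purely illustrative.

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    # p(y = 1 | x) as a sigmoid of the weighted features plus a bias term
    return sigmoid(X @ w + b)

def cross_entropy(y_true, p_pred, eps=1e-12):
    # negative log-likelihood averaged over the m training examples
    p_pred = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

# toy data: labels generated from a known weight vector
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = (sigmoid(X @ w_true) > 0.5).astype(float)

w, b, lr = np.zeros(3), 0.0, 0.1
for _ in range(500):                      # plain gradient descent on the loss
    p = predict_proba(X, w, b)
    grad_w = X.T @ (p - y) / len(y)       # gradient of the cross-entropy w.r.t. the weights
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

print(cross_entropy(y, predict_proba(X, w, b)))
```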
In the case of classification, random forest uses multiple trees and takes a majority vote over their concluding leaf nodes when predicting the label. A decision tree is a tree-like structure where the node at the top is considered the root, corresponds to the most important predictor variable, and is recursively split until a decision (concluding) node is reached. The decision tree algorithm is a greedy, top-down algorithm that takes the locally best split at each step and partitions the data set into smaller subsets. To decide which feature to split on at each decision node, the entropy, a measure of the impurity of the class labels in a node, is computed [13]:
\[
\mathrm{Entropy} = -p\log_2(p) - q\log_2(q),
\]
where p and q are the proportions of the two classes in the node.
Figure 3.1: Algorithm of Random Forests
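To make the entropy criterion and the majority-vote idea concrete, here is a small hedged sketch using NumPy and scikit-learn; the data set and parameter values are illustrative and do not reflect the configuration used in the thesis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def node_entropy(labels):
    # Entropy = -p*log2(p) - q*log2(q), where p and q are the class proportions in the node
    p = np.mean(labels)
    q = 1 - p
    return -sum(x * np.log2(x) for x in (p, q) if x > 0)

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
print(node_entropy(y))                    # impurity of the root node before any split

# a forest of trees whose individual predictions are combined by majority vote
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)
print(forest.predict(X[:5]))
```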
Instead of solving the optimization problem directly, each h_m can be regarded as a greedy step in a gradient-descent-like optimization of F*. Hence, each model h_m is trained on a new data set D = {(x_i, r_{mi})}_{i=1}^{N}, where the r_{mi} are the pseudo-residuals, calculated as
\[
r_{mi} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)}.
\]
Subsequently, the value of the step size ρ_m is determined through a line-search optimization problem. Gradient boosting is an algorithm that can suffer from over-fitting, meaning that it generalizes poorly to new, unseen data, if the iterative process is not regularized properly. To control this additive process, various regularization hyperparameters can be considered [15].
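The additive, residual-fitting process and some of its regularization hyperparameters can be illustrated with LightGBM's scikit-learn interface, the library also used later in this thesis. The sketch below assumes a reasonably recent LightGBM version; the data and parameter values are illustrative, not the tuned values used in the study.

```python
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.72, 0.28],
                           random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

# each boosting iteration fits a new tree to the pseudo-residuals of the current
# ensemble; the parameters below regularize that additive process
model = lgb.LGBMClassifier(
    n_estimators=500,
    learning_rate=0.05,      # shrinks the contribution of each new tree
    num_leaves=31,
    min_child_samples=50,    # limits how small a leaf may become
    reg_lambda=1.0,          # L2 regularization on leaf values
)
model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)],
          callbacks=[lgb.early_stopping(50)])   # stop adding trees when validation loss stalls
print(model.predict_proba(X_val)[:5, 1])
```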
3.3 Model Optimization
Machine learning models have hyperparameters, which are manually set parameters, as opposed to model parameters, which are internal coefficients found through training. Hyperparameters often have a known effect on the associated model, but it can be ambiguous how to choose the optimal ones, especially since models can have a variety of hyperparameters that may interact in non-linear ways. Consequently, hyperparameter optimization is the problem of determining such a set of optimal hyperparameters for the learning algorithm. This optimization process includes defining a search space, which can be thought of as an n-dimensional geometric space where each hyperparameter corresponds to a different dimension and the scale of the dimension is the value that the hyperparameter takes. Each point in this search space is a vector that represents one specific model configuration, with values for each hyperparameter. The goal is hence to find the vector that generates the best model performance [16].
Bayesian optimization seeks the point x* that minimizes f globally, where X represents the search space of x. The aim of Bayesian optimization is to combine the prior distribution of f(x), p(f), with the sample information to determine the posterior of this function. Subsequently, the posterior information is utilized to determine where f(x) is minimized, according to a criterion represented by an acquisition function, a_{p(f)} : X → R [17]. The acquisition function is usually an inexpensive function that can be evaluated at a given point and that is proportional to how beneficial evaluating f at x is expected to be for the minimization problem. The acquisition function is then optimized to obtain the next sample point, in order to maximize the expected utility [18]. A generic version of the Bayesian optimization loop is sketched below.
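Since the original algorithm listing appears as a figure in the thesis, the following is only a generic, hedged sketch of the loop it describes: fit a Gaussian-process surrogate to the observations, maximize an expected-improvement acquisition function over cheap candidate points, evaluate f at the chosen point, and repeat. The one-dimensional objective and all values are illustrative.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def f(x):                                   # stand-in for an expensive objective
    return np.sin(3 * x) + 0.1 * x ** 2

def expected_improvement(x_cand, gp, y_best):
    mu, sigma = gp.predict(x_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma               # improvement over the incumbent (minimization)
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X_obs = rng.uniform(-3, 3, size=(3, 1))     # a few initial evaluations of f
y_obs = f(X_obs).ravel()

for _ in range(20):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X_obs, y_obs)   # posterior over f
    candidates = rng.uniform(-3, 3, size=(1000, 1))                     # cheap acquisition search
    ei = expected_improvement(candidates, gp, y_obs.min())
    x_next = candidates[np.argmax(ei)].reshape(1, -1)                   # next point to evaluate
    X_obs = np.vstack([X_obs, x_next])
    y_obs = np.append(y_obs, f(x_next).ravel())

print(X_obs[np.argmin(y_obs)], y_obs.min())
```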
Often the evaluation of the acquisition function is cheap compared to the evaluation of f, such that the optimization effort is insignificant [17].
The value of k should be chosen so that the resulting performance estimate has neither overly high variance nor high bias [20].
The confusion matrix below has the target class on the rows and the predicted class on the columns:

                 Predicted True     Predicted False
  Actual True    True Positive      False Negative
  Actual False   False Positive     True Negative
The results can be interpreted and evaluated with the measures accuracy, precision and recall (where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives). Accuracy is the ratio of correctly predicted labels and is given by
\[
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.
\]
Precision is a measure of how good the model is at predicting the observed true class as true:
\[
\mathrm{Precision} = \frac{TP}{TP + FP}.
\]
Recall provides a notion of the coverage of the positive class:
\[
\mathrm{Recall} = \frac{TP}{TP + FN}
\]
as per [21].
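A short sketch showing that the formulas above match scikit-learn's built-in metrics; the toy labels are illustrative.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]           # 1 = churn, 0 = no churn (toy labels)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# confusion_matrix returns counts in the order tn, fp, fn, tp for binary labels
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy :", (tp + tn) / (tp + tn + fp + fn), accuracy_score(y_true, y_pred))
print("precision:", tp / (tp + fp), precision_score(y_true, y_pred))
print("recall   :", tp / (tp + fn), recall_score(y_true, y_pred))
```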
3.4.3 ROC-AUC
ROC-AUC is a performance metric used to evaluate binary classification models [22]. The ROC curve is a plot of the true positive rate against the false positive rate at different classification thresholds. The AUC is the area under the ROC curve; it ranges from 0 to 1, where 0.5 corresponds to a random classifier. A higher AUC indicates better performance of the model in distinguishing between the two classes.
3.4.4 PR-AUC
PR-AUC is another performance metric that evaluates a binary classifier’s ability to accurately
predict positive instances [23]. It measures the area under the precision-recall curve, which plots
the precision against the recall. PR-AUC is particularly useful for evaluating models on imbalanced
data sets, where the number of positive instances is much smaller than the number of negative
instances [23].
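Both areas can be computed with scikit-learn, as in the hedged sketch below; the scores are synthetic and only illustrate the API.

```python
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)                                   # toy labels
y_score = np.clip(y_true * 0.3 + rng.normal(0.4, 0.25, size=1000), 0, 1)  # toy churn scores

print("ROC-AUC:", roc_auc_score(y_true, y_score))

# PR-AUC as the area under the precision-recall curve
precision, recall, _ = precision_recall_curve(y_true, y_score)
print("PR-AUC :", auc(recall, precision))
# average precision is a closely related summary of the same curve
print("AP     :", average_precision_score(y_true, y_score))
```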
where w ∈ {0, 1}^N, N is the number of input features, φ_0 = f(h_x(0)) and φ_i is the value of the feature attribution:
\[
\phi_i = \sum_{A \subseteq F \setminus \{i\}} \frac{|A|!\,(N - |A| - 1)!}{N!} \left[ f_x(A \cup \{i\}) - f_x(A) \right].
\]
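For tree ensembles such as LightGBM, these Shapley values can be computed efficiently with the shap library's TreeExplainer, as in this illustrative sketch; the model and data are toys, and the exact shape returned by shap_values can differ between shap versions.

```python
import lightgbm as lgb
import shap
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
model = lgb.LGBMClassifier(n_estimators=200, random_state=0).fit(X, y)

# TreeExplainer computes the feature attributions phi_i efficiently for tree models
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# bar plot of mean |SHAP| per feature, and a beeswarm-style summary of individual points
shap.summary_plot(shap_values, X, plot_type="bar")
shap.summary_plot(shap_values, X)
```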
Chapter 4
Methodology
This chapter provides a descriptive account of the di↵erent steps of the methodology and outlines
the actions undertaken in each step.
4.1 Data
4.1.1 Data Collection and Description
The data used in this project was provided by the insurance company Bliwa. The provided data was hashed and all confidential information was excluded in order to avoid the risk of connecting an anonymized ID to its real identity. Furthermore, in the subsequent chapters of this report, some values in figures and tables may be missing or anonymized in order to prevent the disclosure of information regarding the insurance company's customers.

The data set consisted of approximately 300,000 customers with their corresponding personal and insurance-related features from the past 5 years. Every customer had 4 distinct personal features and 5 insurance features, which are confidential and, as such, cannot be disclosed.
At this point, a definition of churn was formulated in order to categorize each customer as either churned or not. The definition weighed in both the timeframe and the number of churned insurances. However, due to the confidentiality of the treated information, the exact definition cannot be disclosed. This definition of churn resulted in a churn rate of 28.11% for the data set.
Subsequently, the data set was modified to meet the models' requirements, which included numeric values in each column, and feature engineering was performed to retrieve additional information that could assist the models in predicting the outcome. To transform the data set accordingly, Scikit-learn's ordinal encoding was used to convert categorical features into integers for the tree models, while one-hot encoding was used to transform categorical values for logistic regression.
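A hedged sketch of the two encodings with scikit-learn; the column names are hypothetical, since the real features are confidential, and some constructor arguments (such as sparse output handling) depend on the scikit-learn version.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# toy customer table with hypothetical column names
df = pd.DataFrame({"region": ["north", "south", "south", "east"],
                   "product": ["A", "B", "A", "C"]})

# integer codes for the tree models, which only require numeric columns
ordinal = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
X_trees = ordinal.fit_transform(df)

# one-hot encoding for logistic regression, which assumes no ordering between categories
onehot = OneHotEncoder(handle_unknown="ignore")
X_logreg = onehot.fit_transform(df)        # sparse matrix by default
print(X_trees.shape, X_logreg.shape)
```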
Regarding feature engineering, several additional features were created based on the original set of features. This was accomplished by using aggregations such as minimum, maximum, mean, count and mode (the most frequent class of a feature). This was done in order to condense several insurances into one set of features at the customer level and to extract further information that the original features did not contain, which could be utilized by the models when making predictions. After performing feature engineering, there were 28 new features, bringing the total to 32 different personal and insurance-related features.
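A small pandas sketch of this kind of customer-level aggregation; the table and column names are hypothetical and only illustrate the min/max/mean/count/mode idea.

```python
import pandas as pd

# toy insurance-level table; column names are hypothetical, not Bliwa's actual features
insurances = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "premium":     [120.0, 80.0, 200.0, 150.0, 90.0],
    "product":     ["A", "B", "A", "A", "C"],
})

# condense several insurances into one row of features per customer
customer_features = insurances.groupby("customer_id").agg(
    premium_min=("premium", "min"),
    premium_max=("premium", "max"),
    premium_mean=("premium", "mean"),
    n_insurances=("premium", "count"),
    product_mode=("product", lambda s: s.mode().iloc[0]),  # most frequent class
)
print(customer_features)
```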
4.3.3 Modeling
The modeling was performed by iterating over the cross-validation folds, training each model type on the 4 training folds while evaluating metrics on the remaining validation fold. The models used were a dummy baseline that always predicted the majority class, as well as random forest, logistic regression and gradient boosting. The baseline, logistic regression and random forest were implemented using Scikit-learn, while the gradient boosting model used LightGBM from Microsoft.
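The fold-wise training and evaluation can be sketched as below with scikit-learn and LightGBM; the synthetic data and the hyperparameters are illustrative, not the actual setup used in the thesis, and PR-AUC is approximated here by average precision on each validation fold.

```python
import lightgbm as lgb
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.72, 0.28],
                           random_state=0)

models = {
    "baseline": DummyClassifier(strategy="most_frequent"),   # always predicts the majority class
    "logreg":   LogisticRegression(max_iter=1000),
    "forest":   RandomForestClassifier(n_estimators=300, random_state=0),
    "lightgbm": lgb.LGBMClassifier(random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, model in models.items():
    scores = []
    for train_idx, val_idx in cv.split(X, y):
        model.fit(X[train_idx], y[train_idx])                # train on the 4 training folds
        proba = model.predict_proba(X[val_idx])[:, 1]         # evaluate on the remaining fold
        scores.append(average_precision_score(y[val_idx], proba))
    print(name, np.mean(scores))
```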
Chapter 5
Results
This chapter presents the obtained results regarding model performance and the SHAP analysis.
Table 5.3: Model performance of LightGBM when dividing into risk quantiles
5.2 SHAP-analysis
Figure 5.1 shows a bar plot of the features which had the most impact on the model output
regarding churn. The features are arranged on the y-axis in the order of importance, with the
most important at the top and the least important at the bottom.
The scatter plot can be interpreted as follows: as previously mentioned, the features are displayed on the y-axis in order of importance. The SHAP value, which indicates the change in log-probability of the outcome, is shown on the x-axis. Each point on the plot represents one data point, i.e., one customer. The color of the point corresponds to the original value of that feature, with red indicating a high feature value and blue indicating a low feature value. For boolean features, only two colors occur, whereas for numerical features the entire spectrum is available. When examining the graph, if a large proportion of the blue data points lie to the right of the vertical zero-axis and, at the same time, a large proportion of the red data points lie to the left of it, this means that a low feature value implies a higher risk of churn, and vice versa.
Figure 5.2: Scatter plot of SHAP analysis
Figure 5.3: Violin plot of SHAP analysis
Chapter 6
Discussion
This chapter provides a discussion on the obtained results, focusing specifically on their relation to
business analysis and the comparison of predictive accuracy. Additionally, considerations regarding
future implementations are presented, and a conclusion is drawn concerning the research questions.
Regarding which customer features affect customer churn the most, it can be seen in Figure 5.1 that features 3, 23 and 27-30 seem to have the most impact on the model output regarding whether a customer is going to churn or not, according to the SHAP analysis.

When reviewing the scatter plot (Figure 5.2) and the violin plot (Figure 5.3) of the SHAP analysis, it can be observed that a lower value of features 23, 27 and 29 caused the model to assign a higher probability of churn, and vice versa. For features 3 and 30, a higher value caused the model to assign a higher probability of churn, and vice versa. As for feature 28, the result is difficult to interpret.
The feature importance results can guide proactive actions such as targeted marketing campaigns or personalized offers for customers with high churn risk based on these features. Although all features in this report are anonymized, a comment can be made on the relevance of the results regarding feature importance: the features that had the most impact on churn and non-churn, respectively, were sufficiently evident and comprehensible to be able to direct proactive actions.
The quantile-based results are comparable to those of Y. Huang et al. [8], who used a similar method. Identifying high-risk customers is crucial for the company to take proactive actions before they potentially churn. By dividing customers into different risk groups based on quantiles, the company can take corresponding proactive actions depending on risk group. Metrics can therefore be weighed against the cost of action and the potential efficiency of these actions.
Regarding predictive accuracy, these results align with those of related studies. Achieving even higher predictive accuracy may be challenging due to the difficulty of capturing, through data, what causes people to churn in reality. The reasons for churn can be diverse, ranging from competitors' marketing campaigns to changes in family relationships, which are hard to predict and capture in terms of data that the model can interpret. However, very high accuracy is not necessarily required for the model to be useful, as its goal can be to act as a guideline or direction to identify specific groups of customers at high risk of churning. If the goal is to prevent high-risk customers from churning through proactive measures, mislabeling a customer as churn and providing proactive actions would not cause significant issues if they were not actually going to churn.
More extensive feature engineering could also improve the performance of the model. This includes trying to extract more customer and insurance features from the data system that were not used in this project. It also involves having a more extensive discussion with Bliwa and potential experts on the topic of customer churn to understand what makes people churn and what kind of features could be created to capture this information in a manner the model can interpret. This extends beyond the data already in their system and could include external factors such as inflation rates and other environmental factors.
6.4 Conclusion
To summarize this project, it can be concluded that it is feasible to create and implement a relatively accurate customer churn prediction model which determines the likelihood of a customer churning and provides the corresponding probability. Additionally, the model offers insights into why a specific customer is likely to churn or not, in terms of the feature importance of the model's outcome.

Regarding predictive performance, it could be concluded that random forest and LightGBM performed quite equally, but LightGBM scored slightly better and appears to be the best approach to the described problem, which also answers the research question of which model yields the best results regarding predictive performance.

In order to address the question regarding the customer behaviours and characteristics that have the greatest impact on customer churn, it can be concluded that lower values of features 27, 29 and 23, as well as higher values of features 30 and 3, have the most significant effect on customer churn.
Bibliography
[1] C. Huigevoort and R. Dijkman, "Customer churn prediction for an insurance company,"
Eindhoven University of Technology, 2015.
[4] K. Peng and Y. Peng, “Research on telecom customer churn prediction based on ga-xgboost
and shap,” Journal of Computer and Communications, vol. 10, no. 11, pp. 107–120, 2022.
[5] A. K. Ahmad, A. Jafar, and K. Aljoumaa, “Customer churn prediction in telecom using
machine learning in big data platform,” Journal of Big Data, vol. 6, no. 1, pp. 1–24, 2019.
[13] K. Kirasich, T. Smith, and B. Sadler, “Random forest vs logistic regression: binary
classification for heterogeneous datasets,” SMU Data Science Review, vol. 1, no. 3, p. 9,
2018.
[14] L. Breiman, “Bagging predictors,” Machine Learning, vol. 24, pp. 123–140, 1996.
[15] C. Bentéjac, A. Csörgő, and G. Martínez-Muñoz, "A comparative analysis of gradient
boosting algorithms," Artificial Intelligence Review, vol. 54, pp. 1937–1967, 2021.
[16] J. Brownlee, "Hyperparameter optimization with random search and grid search," Machine
Learning Mastery, 2020.
https://machinelearningmastery.com/hyperparameter-optimization-with-random-search-and-grid-search/.
[17] A. Klein, S. Falkner, S. Bartels, P. Hennig, and F. Hutter, “Fast bayesian optimization of
machine learning hyperparameters on large datasets,” Artificial Intelligence and Statistics,
pp. 528–536, 2017.
[18] J. Wilson, F. Hutter, and M. Deisenroth, “Maximizing acquisition functions for bayesian
optimization,” Advances in Neural Information Processing Systems, vol. 31, 2018.
[19] D. Berrar, “Cross-validation,” Artificial Intelligence and Statistics, 2018.
https://doi.org/10.1016/B978-0-12-809633-8.20349-X.
[20] I. K. Nti, O. Nyarko-Boateng, and J. Aning, "Performance of machine learning algorithms
with different k values in k-fold cross-validation," International Journal of Information
Technology and Computer Science, vol. 13, no. 6, pp. 61–71, 2021.
[21] A. Luque, A. Carrasco, A. Martín, and A. de Las Heras, "The impact of class imbalance in
classification performance metrics based on the binary confusion matrix," Pattern
Recognition, vol. 91, pp. 216–231, 2019.
[22] S. Narkhede, “Understanding auc-roc curve,” Towards Data Science, vol. 26, no. 1,
pp. 220–227, 2018.
[23] H. R. Sofaer, J. A. Hoeting, and C. S. Jarnevich, “The area under the precision-recall curve
as a performance metric for rare binary events,” Methods in Ecology and Evolution, vol. 10,
no. 4, pp. 565–577, 2019.
[24] D. Wang, S. Thunéll, U. Lindberg, L. Jiang, J. Trygg, and M. Tysklind, "Towards better
process management in wastewater treatment plants: Process analytics based on SHAP
values for tree-based machine learning methods," Journal of Environmental Management,
vol. 301, p. 113941, 2022.
TRITA – XXX-XXX 20XX:XX
www.kth.se