Article
Machine Learning for an Enhanced Credit Risk Analysis:
A Comparative Study of Loan Approval Prediction Models
Integrating Mental Health Data
Adnan Alagic 1, Natasa Zivic 2,*, Esad Kadusic 3, Dzenan Hamzic 1, Narcisa Hadzajlic 1, Mejra Dizdarevic 1 and Elmedin Selmanovic 4
Abstract: The number of loan requests is rapidly growing worldwide, representing a multi-billion-dollar business in the credit approval industry. Large data volumes extracted from the banking transactions that represent customers’ behavior are available, but processing loan applications is a complex and time-consuming task for banking institutions. In 2022, over 20 million Americans had open loans, totaling USD 178 billion in debt, although over 20% of loan applications were rejected. Numerous statistical methods have been deployed to estimate loan risks, opening the question of whether machine learning techniques can better predict the potential risks. To study the machine learning paradigm in this sector, a mental health dataset and a loan approval dataset presenting survey results from 1991 individuals are used as inputs to experiment with the credit risk prediction ability of the chosen machine learning algorithms. Giving a comprehensive comparative analysis, this paper shows how the chosen machine learning algorithms can distinguish between normal and risky loan customers who might never pay their debts back. The results from the tested algorithms show that XGBoost achieves the highest accuracy of 84% on the first dataset, surpassing gradient boost (83%) and KNN (83%). On the second dataset, random forest achieved the highest accuracy of 85%, followed by decision tree and KNN with 83%. Alongside accuracy, the precision, recall, and overall performance of the algorithms were tested, and a confusion matrix analysis was performed, producing numerical results that emphasized the superior performance of XGBoost and random forest in the classification tasks on the first dataset, and of XGBoost and decision tree on the second dataset. Researchers and practitioners can rely on these findings to inform their model selection process and enhance the accuracy and precision of their classification models.

Keywords: machine learning; prediction; supervised learning; classification; business intelligence; boosting algorithms; credit risk; loan approval
1. Introduction
Conventional methods of credit risk assessment to approximate the likelihood of potential losses rely mainly on credit scores and reports. Such reports usually do not provide comprehensive information about the borrower’s creditworthiness since multiple factors, like financial indicators, demographic data, and customer behavior, like transactions and spending history, also play a significant role in credit risk assessments. To handle such a large number of factors and provide more comprehensive and large-scale assessments,
supervised machine learning algorithms can be used. The study conducted by Singh
Saini, Bhatnagar, and Rani in 2023 illustrates that the Random Forest Classifier exhibited
the highest accuracy at 98.04%, surpassing the K-Nearest Neighbors Classifier (78.49%)
and Logistic Regression (79.60%). These findings underscore the significant potential of
machine learning algorithms, as highlighted in their research, to enhance the loan approval
process and diminish the risk of loan defaults [1].
In America alone, more than 20 million people were burdened with open loans in 2022, accumulating a total debt of USD 178 billion, although over 20% of loan applications were met with rejection. Such situations lead to missed opportunities for both parties involved. A bank’s profit or loss depends mostly on loans, i.e., on whether the customers are paying back the loan or defaulting. By predicting the loan defaulters, the bank can
reduce its Non-Performing Assets [2]. Loan candidates differ by a large number of factors
ranging from financial habits to demographics. Machine learning algorithms can be used
as a significant tool to incorporate additional factors to identify potential risks and suitable
loan candidates, making the process of decision making for banking industries easier and
more reliable [3,4].
Various cultures notably influence financial habits, which makes demographics an
important factor in loan management. Hence, different countries usually need unique
approaches for credit management. In 2008 in Spain, for instance, the high level of exposure
to the real estate market and substantial borrowing needs placed the sector in a position of
vulnerability to a deterioration in economic activity in Spain and borrowing conditions on
international capital markets. The period from 2007 to 2011 was therefore characterized by a rapid slowdown in lending up to mid-2009, followed by a decline that subsequently deepened, going from annual growth of over 17% in 2007 to a contraction of 3.8% in 2011.
In turn, there was a sharp rise in bad loans, particularly loans to the real estate and
construction sector, in a context of worsening financial conditions for real estate businesses,
whose situation progressively deteriorated, given the impossibility of freeing themselves
from their financial burdens by liquidating their real estate assets [5]. The research shows
that, in the UK, people from ethnic minorities are more likely to be denied loans, even those with good incomes and credit scores [6].
On the other hand, financial behavior is generally affected by mental health conditions.
Reports show that 56% of those with mental health problems experienced financial hardship
when managing their credit, compared with only 28% of those without mental health
problems [7].
The primary motivation for this research stems from the increasing importance of
assessing borrowers’ creditworthiness in the modern financial landscape.
The goal of this study is to investigate whether machine learning algorithms can
enhance the credit risk prediction process, ultimately benefiting both financial institutions
and borrowers. To achieve this objective, experiments using a mental health dataset and a
loan approval dataset are conducted. Specifically, this paper aims to:
• Propose an intelligent credit risk prediction system that integrates mental health data
into supervised machine learning algorithms;
• Conduct a comprehensive evaluation of multiple classification techniques to identify
the optimal methodology that minimizes overfitting and maximizes performance in
credit risk predictions;
• Analyze the key factors that influence loan approvals and explore their interdependencies with the response variable; and
• Establish a framework for future research endeavors that can enhance the accuracy of
predictive models, while also shedding light on the ethical considerations associated
with the utilization of mental health data.
By addressing these objectives, this study contributes to the ongoing efforts to improve
credit risk assessment and lending practices, aiming to provide more accurate assessments
and reduce the potential financial burden on both borrowers and lending institutions.
This paper is meticulously structured to guide readers through a logical progression
of the research. It commences with Section 1, where the context, motivations, objectives,
and contributions of the study are presented. Following the introduction, Section 3 offers a
detailed description of the two datasets used in this research—the mental health dataset and
the loan approval dataset—highlighting their relevance to the study. Section 4 introduces
the selected machine learning algorithms and explains the reasons for choosing them,
thereby laying the foundation for the subsequent analytical components. Section 5 outlines
the steps involved in the data acquisition, analysis, and model evaluation, offering valuable
insights into the research process. Progressing to Section 6, a comprehensive analysis of
the outcomes for both datasets is presented, providing an in-depth examination of the
findings. Both Sections 2 and 7 conduct a thorough comparison with prior studies on
similar topics, providing a crucial research context. Section 8 succinctly summarizes the
findings, contributions, and implications of this study, offering a comprehensive overview
of its significance. This well-organized and structured approach ensures that readers can
seamlessly navigate the paper, gaining a comprehensive understanding of the research
presented.
2. Related Research
In their 2023 study, Bhargav and Sashirekha introduced a novel approach by leverag-
ing random forest classifiers to evaluate diverse machine learning methods for forecasting
loan approvals. Drawing from Kaggle’s repository, they employed loan prediction datasets
to scrutinize accuracy and loss metrics. The random forest method presented a precision of
79.44% and a loss of 21.03%, surpassing the conventional decision tree algorithm, which
yielded a precision of 67.28% and a loss of 32.71% in a sample of 20 instances. The subse-
quent statistical examination via an independent sample t-test resulted in a p-value of 0.33,
indicating no noteworthy differences between the techniques at a 95% confidence level.
This investigation suggests that random forest emerges as a more adept predictor of loan
approval compared to the decision tree model [8].
Wang and colleagues (2023) [9] devised a novel stacking-based model aimed at eval-
uating the risks in financial institutions, determining the most effective model through
performance comparisons. Their work extended to crafting a bank approval model using
deep learning on imbalanced data, employing a convolutional neural network for feature
extraction, and implementing counterfactual augmentation for achieving balanced sam-
pling results. The fine-tuning of the auto finance prediction model, grounded in bank model
features, resulted in a substantial 6% boost in the joint loan approval, as substantiated by
the experiments conducted on real-world data.
Abdullah and colleagues (2023) [10] explored a range of machine learning techniques
to forecast nonperforming loans within financial institutions in emerging countries. By
examining the data from 322 banks spanning 15 nations, their comprehensive analysis
revealed that advanced machine learning models, particularly random forest, surpassed
linear methods, achieving an accuracy of 76.10%. Notably, bank diversification surfaced as
the pivotal predictor, surpassing macroeconomic factors in the prediction of nonperforming
loans.
In their 2020 study, Alsaleem and Hasoon examined the performance of machine learn-
ing algorithms in the assessment of bank loan risks, focusing on conventional methods with
the aim of achieving higher accuracy. Notably, they observed that multilayer perceptron
(MLP) outperformed the random forest, naive Bayes (NB), and DTJ48 algorithms in catego-
rizing bank loan risks. The evaluation of the model’s performance was conducted using
traditional metrics on a dataset comprising 1000 loans and their corresponding repayment
status [11].
In Section 7, additional research papers are analyzed and discussed, focusing on their
contributions and shortcomings that are resolved within the scope of this research.
People with mental health issues can find it challenging to maintain steady employment or earn a stable income, making it difficult for them to meet their financial obligations and be approved for loans.
Lenders evaluate various factors, such as credit score, income, and the debt-to-income
ratio, to determine the borrower’s ability to repay the loan. Mental health is also one of the
crucial factors that can significantly impact a person’s financial situation.
Mental health problems, such as depression, post-traumatic disorder, anxiety, and
others, can lead to periods of instability, resulting in missed payments, affecting a person’s
decision-making ability, leading to impulsive spending or poor financial choices, or even
bankruptcy. If an applicant has a history of mental health conditions or presents signs of
instability, a lender can perceive him or her as a higher-risk borrower, leading to either a
denial of the loan application or stricter loan terms, such as higher interest rates or collateral
requirements.
The research shows that mental health issues in the workplace can lead to increased
absenteeism, reduced productivity, and decreased employee engagement. Around 70% of adults with depression in the USA are employed, and depression results in an estimated 35 million missed work days annually, costing employers USD 105 billion due to reduced productivity. To mitigate this problem, Mental Health First Aid should be introduced to
improve mental health literacy among employees. It is the responsibility of employers to
provide comprehensive benefits packages that support mental health and flexible working
arrangements [13–15].
However, it is essential to note that mental health should not be a barrier to accessing
loans or financial resources. Instead, financial institutions should adopt inclusive policies
and practices that recognize the diverse needs of borrowers, including those with mental
health conditions. This can include providing tailored financial education and counseling
services, flexible loan terms, or partnering with mental health professionals to support
borrowers in managing their finances.
By understanding these factors and their impacts on loan approval outcomes, lenders
can make more informed decisions when considering loan applications. Such factors are
shown in Table 2.
Delving deeper into the data, a noteworthy distribution within this output variable
was encountered. The examination revealed that there were approximately 40,000 instances
where loans were not approved (No), signifying a substantial portion of loan applications
that did not meet the necessary criteria for approval. This was a critical statistic that
prompted further investigations into the factors contributing to these disapprovals.
Conversely, there were approximately 30,996 instances of loans that were approved
(Yes), suggesting a substantial number of successful loan applications. These data points
are equally important, as they indicate the effectiveness of the loan approval process in
facilitating financial support for those in need.
Understanding the balance between these two categories—false (No) and true (Yes)—
was pivotal for the analysis. It allowed us to assess the overall performance of the loan
approval system, identify areas for improvement, and ultimately enhance the financial
well-being of our clients.
4.4. KNN
The K-nearest neighbors (KNN) algorithm [21] is a classification algorithm used in
fields like image recognition, speech recognition, and natural language processing. Its basic
concept is to determine the class of a new data point based on the classes of its K-nearest
neighbors in the training dataset. Predictions with the k-nearest neighbors method rest on the assumption that neighboring objects have similar target values. In other words, KNN measures the distance between a new data point and each point in the training dataset, making it a distance-based approach that uses metrics like the Euclidean, Manhattan, and Chebyshev distances.
There are modifications to the algorithm, like KNN-SI (KNN with sparse interactions), which uses sparse matrices to represent the pairwise interactions between features in a dataset, reducing the computational complexity and enhancing the accuracy of
the KNN algorithm [22]. Another modification of the KNN algorithm can be the weighted
KNN, which improves the accuracy of the algorithm by giving more weight to the closest
neighbors [23].
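To make the distance-metric and weighting choices concrete, the sketch below trains scikit-learn’s KNeighborsClassifier under each metric; the synthetic data, K = 5, and the use of distance weighting are illustrative assumptions rather than the study’s actual configuration:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data; the real experiments use the two survey datasets
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for metric in ['euclidean', 'manhattan', 'chebyshev']:
    # weights='distance' gives closer neighbors more influence (weighted KNN);
    # the default weights='uniform' treats all K neighbors equally
    knn = KNeighborsClassifier(n_neighbors=5, metric=metric, weights='distance')
    knn.fit(X_train, y_train)
    print(metric, round(knn.score(X_test, y_test), 3))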
4.5.1. AdaBoost
Adaptive boosting, or AdaBoost, is a popular binary classification algorithm that iteratively reweights the training set, meaning that various weights are assigned to each training example and the predictions of multiple weak learners are combined to obtain the final prediction.
AdaBoost can adapt to the complexity of the data, which was ideal for the case of this study,
and handle noisy or imbalanced datasets. In each iteration, the algorithm assigns higher
weights to misclassified examples, allowing it to focus on the most difficult examples and
learn from its own mistakes. AdaBoost can also work with weak learners, like decision
trees, neural networks, and support vector machines.
However, when AdaBoost becomes too complex and memorizes the training data, it
becomes more susceptible to overfitting, which can be prevented by limiting the number of
iterations. Adaptive boosting with differentiable loss functions (AdaBoostDL) is among the
several extensions of AdaBoost that use the loss function to train weak learners. In such
a way, the non-differentiable data are handled and the stability of the algorithm can be
improved [24,25].
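A minimal sketch of this setup is shown below, reusing the training split from the previous sketch; the depth-1 tree as the weak learner and the 100-iteration cap are illustrative assumptions, with the cap playing the role of the overfitting control mentioned above:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# A decision stump serves as the weak learner; n_estimators caps the number
# of boosting iterations, limiting the risk of memorizing the training data.
# (On scikit-learn versions before 1.2, the parameter is base_estimator.)
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=100, learning_rate=0.5, random_state=42)
ada.fit(X_train, y_train)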
4.5.3. XGBoost
Extreme gradient boosting (XGBoost) is introduced as an extension of gradient boost-
ing that uses a combination of gradient descent and second-order Taylor expansion to
improve the accuracy and speed of the algorithm. The algorithm starts with an initial
model and then adds new models to the ensemble iteratively, to upgrade the performance
of the current ensemble. The new models are trained to predict the negative gradients of
the loss function concerning the current predictions.
One of the key innovations of XGBoost is the use of the second-order Taylor expansion
to approximate the loss function. This allows XGBoost to model the curvature of the loss
function and improve the accuracy of the predictions. Additionally, XGBoost includes sev-
eral regularization techniques and can handle missing values and sparse data without the
need for preprocessing. It also supports parallel processing and can be run on distributed
systems, which makes it suitable for large-scale machine-learning tasks [25,28].
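For reference, the second-order approximation that XGBoost optimizes at boosting round t can be sketched as follows (standard notation from the XGBoost literature; here ℓ is the per-example loss, f_t the new tree, and Ω its regularization term):

$$\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ g_i\, f_t(\mathbf{x}_i) + \tfrac{1}{2}\, h_i\, f_t^2(\mathbf{x}_i) \right] + \Omega(f_t), \qquad g_i = \frac{\partial \ell(y_i, \hat{y}_i^{(t-1)})}{\partial \hat{y}_i^{(t-1)}}, \quad h_i = \frac{\partial^2 \ell(y_i, \hat{y}_i^{(t-1)})}{\partial (\hat{y}_i^{(t-1)})^2}$$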
In recent years, several improvements and extensions of XGBoost have been proposed.
For example, LightGBM, introduced by Ke et al. in 2017 [29], is a similar algorithm that
uses a different approach to handling the gradient and Hessian matrices. LightGBM is
designed to be even faster and more memory efficient than XGBoost, and has achieved a
state-of-the-art performance for several benchmark datasets.
The process of selecting machine learning algorithms for this study involved a rigorous
evaluation of their technical characteristics and suitability for the credit risk prediction task.
The criteria for algorithm selection encompassed considerations, such as their performance
in prior credit risk analysis studies, ability to handle diverse and high-dimensional datasets,
and relevance to the financial domain.
Two gradient boosting algorithms, XGBoost and GradientBoost, were chosen for
their exceptional robustness in determining complex relationships in the datasets. Their
ensemble-based methodology allowed for sequential improvements in the predictive accu-
racy, making them adept at discerning intricate patterns in the credit-related data.
K-nearest neighbors (KNN) was included due to its effectiveness in capturing patterns
based on proximity, particularly in scenarios where spatial correlations among data points
were significant. Its non-parametric nature makes it especially useful when the underlying
distribution of the data is not explicitly known.
For versatility in handling both categorical and numerical features, we incorporated
both random forest and decision tree algorithms into our ensemble. The former enhanced
the predictive accuracy and robustness against overfitting.
The simplicity and efficiency of naive Bayes made it a valuable addition to our ensem-
ble. It operated well under the assumption of feature independence, offering a probabilistic
classification approach that complemented the other models.
AdaBoost was chosen for its ability to adapt to the weaknesses of individual models.
Through iterative training, AdaBoost enhanced the predictive performance and contributed
to the ensemble’s overall resilience.
The rationale behind assembling this diverse set of algorithms lies in harnessing
their complementary strengths and characteristics. This approach aims to synergize the
predictive capabilities of the ensemble, ensuring a comprehensive exploration of the credit
risk prediction landscape. The detailed technical criteria considered during the algorithm
selection contribute to the methodological rigor of this study, providing a foundation for
robust and insightful analyses.
5. Methodology
This section presents a detailed description of the methodology used to obtain the
results for the loan approval prediction. The code, implemented in Python version 3.11.0, covered various steps, including data preprocessing, model selection, training,
evaluation, and visualization. Each step is explained in the subsequent text, highlighting
the reasoning behind the selected choices and techniques.
Furthermore, the dataset was sourced from a GitHub repository and loaded into
a Pandas DataFrame. The dataset served as the basis for the loan approval prediction
research in this paper.
By applying the isnull() method to the DataFrame (df) and then summing the resulting
Boolean values, the count of missing values for each column was obtained. This allowed
us to identify the features that had missing values and assess the extent of missingness in
the dataset.
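A minimal sketch of this check follows; the file name is a placeholder, as the text only identifies the source as a GitHub repository:

import pandas as pd

df = pd.read_csv('loan_data.csv')  # placeholder file name
missing_counts = df.isnull().sum()  # number of missing values per column
print(missing_counts)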
Once the missing values were identified, we needed to proceed with handling them.
Depending on the specific characteristics of the dataset and the nature of the missing
values, different strategies could be employed. In this research, the following approach
was adopted:
Numeric Features: for numeric features with missing values, the missing values were replaced with the mean value of the respective column, assuming that the missing values were missing at random and that the mean value was a suitable estimate. A minimal sketch of this step, with placeholder attribute names mirroring the categorical example below:
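# Sketch: mean imputation for numeric columns (placeholder attribute names)
numeric_features = ['numeric_attribute_1', 'numeric_attribute_2']
for feature in numeric_features:
    df[feature].fillna(df[feature].mean(), inplace = True)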
Categorical Features: for categorical features with missing values, the missing values
were replaced with the most frequent category (mode) of the respective column assuming
that the missing values could be imputed to the most common category:
categorical_features = ['categorical_attribute_1', 'categorical_attribute_2']
for feature in categorical_features:
    df[feature].fillna(df[feature].mode()[0], inplace = True)
By employing these strategies, we ensured that the missing values for both the numeric
and categorical features were appropriately handled, maintaining the integrity of the
dataset.
After handling the missing values, the label encoding technique was used. The
encoded features were then used as inputs to train the machine-learning models.
To enhance the effectiveness of the data analysis and machine learning models, the
one-hot encoding technique was used, which was particularly valuable when dealing with
categorical variables, as it helped to convert them into a numerical format that could be
comprehended by machine learning algorithms.
In the dataset preprocessing stage of this research, the focus was on the ‘Age’ variable
and a set of categorical columns. The goal was to transform these categorical columns into
a structured format that could be seamlessly integrated into the analytical workflow.
dfX = pd.concat([dataset["Age"], pd.get_dummies(dataset[categorical_columns])], axis = 1)
dfY = dataset["obs_consequence"]
dfX
In the first dataset, categorical columns, such as ‘Gender’, ‘Country’, ‘self_employed’, ‘fam-
ily_history’, ‘treatment’, ‘work_interfere’, ‘no_employees’, ‘remote_work’, ‘tech_company’, ‘benefits’,
‘care_options’, ‘wellness_program’, ‘seek_help’, ‘anonymity’, ‘leave’, ‘mental_health_consequence’,
‘phys_health_consequence’, ‘coworkers’, ‘supervisor’, ‘mental_health_interview’, and ‘phys_health_
interview’ were created. Additionally, there was a column called ‘mental_vs_physical.’
In the second dataset, the categorical columns were ‘Married/Single’, ‘House_Ownership’,
‘Car_Ownership’, ‘Profession’, and ‘City’. These columns provided categorical information
about individuals in this dataset.
The process was begun by isolating the ‘Age’ column, as well as the categorical
columns that required encoding. These categorical columns contained valuable information,
but needed to be converted for machine-learning compatibility.
With the ‘Age’ column and the one-hot encoded representations of the categorical
columns prepared, the pd.concat function was used to combine these DataFrames horizon-
tally (along the columns axis). This resulted in a new data frame named dfX.
The power of one-hot encoding was represented by the pd.get_dummies function.
This function efficiently transformed the categorical columns into binary columns, where
each unique category became a new binary column. If a row belonged to a particular
category, the corresponding binary column received a value of 1; otherwise, it received a
value of 0.
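A toy illustration of this behavior, with hypothetical values rather than rows from the actual datasets:

import pandas as pd

toy = pd.DataFrame({'Gender': ['Male', 'Female', 'Male']})
print(pd.get_dummies(toy['Gender']))
# Two binary indicator columns ('Female', 'Male') result, with exactly one
# indicator set per row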
The resulting dfX DataFrame presented a mixture of numerical and binary columns
at this stage, where the binary columns represented the various categories present in the
original categorical columns. This transformation enabled us to effectively include these
categorical variables in the machine learning models.
As a final step, the target variable, ‘obs_consequence’, was also prepared and stored in a separate data frame named dfY. This separation of the predictors (dfX) and the target variable (dfY) adhered to the best practices for machine learning.
In the provided code snippet, the preprocessing LabelEncoder() function from scikit-
learn was utilized to perform label encoding. The categorical features that needed to be
encoded in the encoded_features list were specified. By iterating over these features, the
label encoding transformation of each feature in the dataset could be applied.
The code snippet below demonstrates the application of label encoding to the categor-
ical features in the dataset:
label_encoder = preprocessing.LabelEncoder()
encoded_features = ['attribute_1', 'attribute_2', 'attribute_3', 'attribute_4']
for feature in encoded_features:
    df[feature] = label_encoder.fit_transform(df[feature])
Stratified sampling is also crucial when working with small datasets or datasets where
certain classes are underrepresented. By maintaining the class distribution during the
train–test split, each class was adequately represented in both sets, enabling the model to
learn from and evaluate diverse instances.
As seen in Figure 1, since the loan approval dataset has a relatively balanced distribu-
tion between the two classes (approved and not approved), the need for stratification is
reduced.
Stratification is more crucial when dealing with highly imbalanced datasets, where
one class significantly outnumbers the other. In such cases, it helps ensure that both classes
are represented adequately in the training and testing sets.
To ensure the reproducibility of the experiments, it was essential to set a random seed
when performing the train–test split. This allowed us to obtain the same split each time the
code was run, facilitating a consistent evaluation and comparison of the models.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)
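Where stratification is required, the same call accepts a stratify argument; a minimal sketch using the variable names above:

# Stratified variant: preserves the class ratio of y in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size = 0.2, random_state = 42, stratify = y)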
Figure 1. Distribution of target variables.
By setting the ‘random_state’ parameter to a specific value (e.g., 42), it was ensured that the train–test split was deterministic, yielding the same split every time the code was used.

Taking into account these corner cases enhanced the robustness and reliability of the model evaluation. It ensured that the model was evaluated for the representative data, considered class imbalance, and allowed for the reproducibility of the experimental results.

5.5. Model Training

The chosen machine learning model was trained using the fit method, which was a fundamental function in scikit-learn that allowed the model to learn from the provided training data. The training data, consisting of the encoded features (X_train) and the corresponding risk flag labels (y_train), were provided as inputs for the fit method for each machine learning algorithm in this research:

model.fit(X_train, y_train)
5.6. Model Evaluation

To evaluate the models’ performances after training, various evaluation metrics were used to assess their predictive capabilities. The evaluation metrics provided insights into the models’ accuracy, precision, recall, and F1 score (all described in Section 6: Results), providing a comprehensive understanding of their performance outcomes. During the model evaluation stage, the prediction method of the trained model was used to generate predictions (y_pred) for the testing set (X_test). These predictions were compared to the true labels (y_test) to calculate the evaluation metrics.
y_pred = model.predict(X_test)
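Since the same fit-and-predict cycle was repeated for every algorithm, the experiments can be organized as a loop over candidate models; the sketch below uses default hyperparameters, which are an assumption, as the paper does not list its settings:

from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

models = {
    'naive Bayes': GaussianNB(),
    'KNN': KNeighborsClassifier(),
    'decision tree': DecisionTreeClassifier(random_state=42),
    'random forest': RandomForestClassifier(random_state=42),
    'AdaBoost': AdaBoostClassifier(random_state=42),
    'gradient boost': GradientBoostingClassifier(random_state=42),
    'XGBoost': XGBClassifier(random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)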
In addition, a confusion matrix was generated to support the analysis of true negatives (TNs), false positives (FPs), false negatives (FNs), and true positives (TPs), thereby enabling a more comprehensive understanding of the model’s performance.
To enhance the interpretability of the confusion matrix, the seaborn library was im-
ported to create a heatmap visualization. The heatmap displayed the confusion matrix with
annotations for each category, offering a visual representation of the model’s predictive
accuracy and potential misclassifications.
from sklearn.metrics import confusion_matrix

labels = ['Negative prediction', 'Affirmative prediction']  # display names for the heatmap axes
confusion_mat = confusion_matrix(y_test, y_pred)  # computed on the encoded class labels
In the code, a subplot with a specified figure size was created to accommodate the
heatmap. The sns.heatmap function was used to generate the heatmap, with the following
parameters:
• confusion_mat: the confusion matrix to be visualized.
• annot = True: enabled the annotation of each cell in the heatmap with the correspond-
ing count.
• fmt = ‘d’: formatted the annotations as integers.
• cmap = ‘Blues’: specified the color map for the heatmap.
• xticklabels = labels: set the labels for the x-axis tick marks to the specified labels.
• yticklabels = labels: set the labels for the y-axis tick marks to the specified labels.
• ax = ax: specified the subplot to which the heatmap was plotted.
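Putting these parameters together, the plotting step can be reconstructed as the following sketch; the figure size and axis titles are assumptions, as they are not given in the text:

import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(figsize=(6, 5))  # assumed figure size
sns.heatmap(confusion_mat, annot=True, fmt='d', cmap='Blues',
            xticklabels=labels, yticklabels=labels, ax=ax)
ax.set_xlabel('Predicted class')
ax.set_ylabel('Actual class')
plt.show()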
The resulting heatmap provided a clear and concise visualization of the confusion
matrix, allowing us to observe the distribution of correct and incorrect predictions across
different categories. By examining the heatmap, the patterns and potential areas of im-
provement in the model’s performance could be identified.
6. Results
The results obtained from the model evaluation, including the accuracy, F1 score,
precision, recall, and the confusion matrix, are printed by the console providing valuable in-
sights into the effectiveness of the chosen machine learning algorithm for the loan approval
prediction.
The base metric used for model evaluations is often accuracy, describing the number
of correct predictions over all the predictions:
$$\text{Accuracy} = \frac{TN + TP}{TN + FP + FN + TP}$$
The subsequent metric is precision, which measures how many of the positive predic-
tions made are correct (true positives):
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall is a measure of how many of the positive cases the classifier correctly predicts, over all the positive cases in the data:

$$\text{Recall} = \frac{TP}{TP + FN}$$
The F1 score is a measure combining both precision and recall. It is generally described
as the harmonic mean of the two:
$$\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
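All four metrics can be computed directly from the held-out predictions; a minimal sketch with scikit-learn, reusing y_test and y_pred from Section 5:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print('Accuracy :', accuracy_score(y_test, y_pred))
print('Precision:', precision_score(y_test, y_pred))
print('Recall   :', recall_score(y_test, y_pred))
print('F1 score :', f1_score(y_test, y_pred))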
The confusion matrix visualization allowed for a comprehensive analysis of the
model’s classification accuracy and error rates. The practical realization of the algorithms
was carried out in the Python programming language, with several established libraries,
such as Pandas, Numpy, and Sklearn, being utilized for data processing. Upon loading the
dataset, preprocessing was performed to enhance the algorithm’s efficacy, leading to more
favorable results.
The results of the analysis shed light on the performance of various classification
models based on the key evaluation metrics. Accuracy, which measures the percentage
of correctly classified instances out of the total instances, provides an initial assessment
of a model’s overall effectiveness. In this study, XGBoost emerged as the top-performing
model with an impressive accuracy of 84%. It was closely followed by gradient boost
and random forest, which achieved accuracy scores of 83%. AdaBoost also demonstrated
a strong performance with an accuracy rate of 81%. However, KNN and decision tree
algorithms yielded lower accuracy scores of 80% and 75%, respectively.
The discrepancies in the accuracy scores between the models can be attributed to their
underlying algorithms. Naive Bayes relies on the assumption of feature independence,
which may not function in complex real-world datasets. Consequently, this assumption
could have contributed to naive Bayes’ lower accuracy score. On the other hand, AdaBoost
is known to be sensitive to noisy data and outliers, which could have affected its accuracy
in this study.
While accuracy is a crucial metric, it alone does not provide a comprehensive picture
of a model’s performance. Precision, which quantifies the proportion of true positives
among instances classified as positive, is another vital measure. The findings of this paper
indicate that XGBoost and random forest algorithms exhibit the highest precision scores,
achieving 62% and 60%, respectively. Gradient boost also demonstrated a strong precision
performance with a score of 47%. In contrast, the KNN and naive Bayes models had the
lowest precision scores, scoring 23% and 17%, respectively.
The recall metric evaluates a model’s ability to correctly identify positive instances
among all actual positive instances. Naive Bayes outperformed the other models with the
highest recall score of 91%, followed by decision tree with 21% and gradient boost with
16%. In contrast, the random forest and KNN models exhibited the lowest recall scores,
with values of 7%. These low recall scores indicate that the random forest and KNN models
struggle to effectively capture the underlying patterns in the data.
Finally, the F1 score, which considered both precision and recall, was examined. The
F1 score provides a balanced measure of a model’s performance. Naive Bayes achieved the
highest F1 score of 28%, followed by gradient boost with 24% and decision tree with 22%.
In contrast, the random forest and KNN models presented the lowest F1 scores, achieving
values of 12% and 11%, respectively.
Based on these evaluation metrics, XGBoost emerged as the best-performing model,
closely followed by gradient boost and random forest. However, it is important to consider
that the choice of the most suitable model depends on the specific problem and dataset at
hand. Therefore, a thorough evaluation across multiple metrics and a comparative analysis
of alternative models are essential to make an informed decision.
Additionally, computational resources should also be taken into account, especially
when dealing with large datasets. Some models can be computationally expensive, and
their performance needs to be balanced against the available resources. Furthermore, it is
important to interpret the analysis results cautiously. For instance, a model that achieves
high accuracy, precision, and recall scores on the training set might not perform well on the
testing set, indicating overfitting. Techniques, such as cross-validation and regularization,
can help ensure that the model adapts well to the new data.
To evaluate the effectiveness of a classification model, it is common practice to utilize
various metrics derived from the confusion matrix. The confusion matrix provides a
comprehensive overview of the model’s performance by presenting four key metrics: true
negatives (TNs), false positives (FPs), false negatives (FNs), and true positives (TPs). In our
analysis, the results of the confusion matrix are presented in Table 5 and Figures 2 and 3.
The observed confusion matrix shows a noteworthy rate of false negatives (1.59%) and
a relatively higher rate of true positives (4.76%). These findings suggest that naive Bayes
exhibits high sensitivity when identifying positive cases, implying a lower specificity for
distinguishing negative cases.
Turning our attention to the K-nearest neighbors (KNN) model, a high number of true
positives (78.97%) and a lower rate of false positives (3.97%) were observed. This indi-
cates that KNN is proficient at identifying positive cases, but can occasionally misclassify
negative cases as positive.
When analyzing the confusion matrix results for the decision tree model, the per-
centages of true negatives (3.97%), false positives (11.90%), false negatives (13.10%), and
true positives (71.03%) were observed. These findings suggest that the decision tree
model demonstrates a relatively high true-positive rate and notable false-positive and
false-negative rates in its predictions.
In the case of random forest, the second highest rate of true positives (81.35%) and a low
rate of false positives (1.59%) were observed. However, the model also exhibited a relatively
high rate of false negatives (15.48%), indicating that, while it exceled at correctly identifying
positive cases, it struggled with missing some actual positive instances, potentially affecting
its overall performance in specific applications.
Figure 2. Graphical presentation of the confusion matrix values for the mental health dataset.

Figure 3. Graphical presentation of the confusion matrix values for the mental health dataset (boosting algorithms).

Shifting the focus to AdaBoost, a relatively high rate of false negatives (14.68%) and a relatively high rate of true positives (78.57%) were found. This suggests that the AdaBoost
model is effective in correctly identifying positive cases, but may need improvement in
reducing the number of false negatives to enhance its overall performance.
Similarly, the gradient boost model displayed a relatively high rate of false negatives
(13.89%) and a high rate of true positives (79.76%). These results indicate that the model
excels at correctly identifying positive cases, but can benefit from further optimizations to
reduce the occurrence of false negatives and enhance its overall performance.
Finally, when examining XGBoost, the smallest rate of false positives (1.19%) and
the highest rate of true positives (81.75%) were observed. This suggests that XGBoost is
particularly effective at minimizing false-positive errors and excels at correctly identifying
positive cases, making it a strong candidate for tasks where precision is crucial.
Comparing the confusion matrix metrics across these classification models, it can
be observed that each model has its strengths and weaknesses. While some models can
excel in correctly identifying positive cases, they can miss out on some negative cases.
On the other hand, some models can accurately identify negative cases, but also tend
to misclassify negative cases as positive. These variations highlight the importance of carefully considering the specific characteristics of the dataset and the desired classification goals when selecting an appropriate model.
The analysis of feature importance in the dataset, as seen in Figure 4, conducted using a random forest classifier, provided valuable insights into the factors that wielded the most substantial influence when it came to predicting mental health consequences. These findings are essential for understanding the dynamics of mental health in the workplace and can inform strategies for better support and intervention.

Figure 4. Graphical presentation of feature importance using the random forest algorithm for the mental health dataset.
The mental_vs_physical feature is at the top of the list of influential features, with
an importance value of approximately 6.28%. This suggests that employees who are
uncertain or unaware of their company’s stance regarding mental health benefits are more
likely to experience mental health problems. This uncertainty appears to be a significant
contributing factor.
The mental_health_consequence feature follows closely behind, with an importance
value of about 4.78%. This feature indicates that individuals who perceive no mental
health consequences in their workplace environment are less likely to face such issues. It
underscores the roles of awareness and perception in shaping mental health outcomes.
Age, a fundamental demographic factor, is another significant predictor, with an
importance value of around 4.01%. This suggests that age plays a role in determining
mental health outcomes, with different age groups experiencing varying levels of mental
health challenges. It highlights the need for age-sensitive mental health support strategies.
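Scores like these can be read directly from a fitted random forest; a minimal sketch, assuming model is the fitted classifier and dfX the predictor DataFrame from Section 5:

import pandas as pd

# feature_importances_ holds one score per input column and sums to 1.0
importances = pd.Series(model.feature_importances_, index=dfX.columns)
print(importances.sort_values(ascending=False).head(10))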
Regarding recall, which measures a model’s capability to capture all pertinent positive
instances, the highest recall score is 83%. Considering these findings presented in Table 6,
it can be deduced that the random forest algorithm is the best-performing algorithm across all four evaluation metrics—accuracy, precision, recall, and F1 score.
The confusion matrix results, as shown in Figures 5 and 6 and Table 7, present XGBoost
as the top-performing algorithm, boasting a high true-positive rate of 53.96%. However, it
is important to note that XGBoost also exhibits the highest false-negative rate of 39.01%,
which means it excels at correctly identifying positive cases but can miss some true positives, potentially requiring further optimizations in certain scenarios.
Figure 5. Graphical presentation of the confusion matrix values for the loan approval dataset.
The gradient boost model achieved a true-positive rate of 47.05% and a true-negative
rate of 11.75%. Similarly, the AdaBoost model exhibited a true-positive rate of 47.21% and
a true-negative rate of 11.16%. This indicates that both models are fairly adept at correctly
identifying positive cases while also maintaining a reasonable ability to correctly identify
negative cases.
Figure 6. Graphical presentation of the confusion matrix values for the loan approval dataset (boosting algorithms).
In contrast, the naive Bayes model displayed the poorest performance in terms of
true-positive and true-negative rates, achieving a meager true-positive rate of only 26.06%
and a true-negative rate of 30.18%. On the other hand, the decision tree and random forest
models showcased relatively robust performances, with true-positive rates of 46.56% and
49.99%, respectively.
On the whole, considering the evaluation of true-positive and true-negative rates,
as well as false-positive and false-negative rates, the decision tree and random forest
algorithms emerged as the optimal algorithms in this comparative analysis. These models
demonstrated high rates of correctly identifying positive and negative instances, while also
exhibiting relatively low rates of misclassifications. These findings suggest that decision tree
and random forest models have the most favorable balance between accurately detecting
positive and negative instances while minimizing erroneous classifications.
The analysis of feature importance in the dataset, as shown in Figure 7, utilizing a random forest model, highlighted key factors that significantly influenced the outcomes. Age emerged as the most dominant feature, with an importance score of approximately 16.28%, indicating its substantial impact on the predicted results.
Figure 7. Graphical presentation of feature importance using the random forest algorithm for the loan approval dataset.
Other features, such as Car_Ownership (both yes and no categories), and various pro-
fessions, like flight attendant, physician, drafter, psychologist, web designer, and secretary,
also played a role in influencing the outcomes to a lesser extent.
In summary, age stood out as the primary driver, while certain professions and car
ownership status also exhibited an influence on the results. These findings provide valuable
insights for understanding the factors affecting the predicted outcomes of the dataset.
7. Discussion
Sujatha et al. (2021) [30], Tumuluru et al. (2022) [31], and Mamun et al. (2022) [32] used
different machine learning algorithms and techniques to predict loan approvals. However,
despite the differences in the methodologies, all three studies used supervised learning
algorithms to predict loan approvals.
Sujatha et al. (2021) [30] used four machine learning algorithms, namely, logistic
regression, decision tree, random forest, and KNN, to predict loan approvals. With the use
of data preprocessing techniques in their study, such as missing value imputation, feature
scaling, and encoding categorical variables, the logistic regression algorithm achieved
the highest accuracy of 84.55%, followed by the random forest, decision tree, and KNN
algorithms, with accuracy scores of 80.49%, 70.73%, and 65.04%, respectively. The authors
attribute the high accuracy to the fact that logistic regression is a linear model, which is
more suitable for this kind of problem where there is a clear boundary between the classes.
Tumuluru et al. (2022) [31] also used data preprocessing techniques, such as feature
scaling, normalization, and one-hot encoding, as well as four machine learning algorithms,
namely, logistic regression, random forest, k-nearest neighbor, and support vector machine,
to predict loan approvals. They found that random forest had the highest accuracy of
81%, followed by logistic regression (77%), SVM (73.2%), and KNN (68%). The authors
attributed the success of the random forest algorithm to the fact that it was an ensemble
learning algorithm that combined multiple decision trees, which could achieve a better
performance than individual decision trees.
Mamun et al. (2022) [32] used six machine learning algorithms, namely, XGBoost,
AdaBoost, LightGBM, decision tree, random forest, and KNN, to predict loan approvals.
They used data preprocessing techniques, such as feature scaling, missing value imputation,
and encoding categorical variables. The authors found that LightGBM had the highest
accuracy of 91.89%, followed by random forest with an accuracy of 91.88%, then XGBoost,
AdaBoost, and KNN with accuracies of 91%, 91.87%, and 91.67%, respectively. The lowest
accuracy of 84.97% belonged to the decision tree algorithm. The authors attributed the high
accuracy of the random forest algorithm to its ability to handle both linear and non-linear
relationships between the features and the target variable.
These findings are consistent with the results obtained in this study. However, it is
important to note that, in this paper, the models were evaluated using additional metrics,
such as recall, precision, and F1 score, which provided a more comprehensive evaluation
of the models’ performances.
One limitation of the abovementioned studies was that they did not consider the
interpretability of the models. While machine learning models show high accuracy out-
comes when predicting loan approvals, their lack of interpretability makes it difficult to
understand how they make decisions. As a result, it can be challenging to explain to
customers why their loan applications are accepted or rejected.
Another limitation was that these studies did not consider the impact of the imbalanced
dataset. In the loan approval prediction, the number of rejected loan applications was often
higher than the number of approved applications. This imbalance can affect the accuracy of
the models and can lead to biased predictions. Future research should explore the methods
for addressing imbalanced datasets and improving the interpretability of the models.
Although the accuracy of the models varied depending on the used algorithms, all
three studies achieved high accuracy results when predicting loan approvals. The findings
of these studies can help banks and financial institutions make informed decisions and
reduce the risk of defaults. However, the interpretability and the impact of imbalanced
datasets need to be considered in future research.
When handling mental health data in the context of credit risk prediction in the
European Union, it is imperative to adhere to the General Data Protection Regulation
(GDPR), a comprehensive legal framework effective from 25 May 2018. Several key articles
within the GDPR are particularly relevant to the processing of sensitive personal data,
including mental health information.
Article 6 of the GDPR addresses the lawfulness of processing personal data. Consent
(Article 6(1)(a)), the necessity of processing for the performance of a contract (Article
6(1)(b)), and compliance with a legal obligation (Article 6(1)(c)) are examples of legal bases
applicable to the processing of mental health data.
Article 9 specifically deals with the processing of special categories of personal data,
including health data. Article 9(2)(a) permits processing with explicit consent, while Article 9(2)(b) allows processing that is necessary for carrying out obligations in the field of employment and social security.
Article 5 outlines the principles of data processing, including data minimization
(Article 5(1)(c)) and the integrity and confidentiality of processing (Article 5(1)(f)), both
crucial considerations when handling mental health data.
Recital 22 provides additional context regarding the conditions for processing special
categories of personal data. It emphasizes the need for explicit consent and underscores
the importance of safeguarding individual rights.
While the GDPR does not explicitly mention credit risk prediction, its conditions
regarding the processing of special categories of personal data, such as mental health
information, are applicable across various contexts. Therefore, organizations, including
banks, must adhere to the principles and requirements outlined in these relevant articles
of the GDPR when implementing systems involving mental health data for credit risk
prediction.
8. Conclusions
This study aimed to demonstrate the potential to revolutionize the evaluation of
borrowers’ creditworthiness by financial institutions by integrating mental health data as
an important predictor variable in the loan approval process. The supervised machine
learning algorithms used in this research showcased superior performances in accurately
identifying individuals with a higher risk of defaulting on their loans. For instance, the
XGBoost algorithm achieved the highest accuracy of 84% in the first dataset, surpassing
gradient boost (83%) and KNN (83%). In the second dataset, the random forest algorithm
achieved the highest accuracy of 85%, followed by the decision tree and KNN algorithms
with 83%.
In comparison to the studies discussed in the previous section, this study utilized a
similar data preprocessing approach, using techniques such as feature scaling and one-hot
encoding. However, missing-value imputation was not performed, as the data used in
this paper did not contain any missing values. Additionally, this study used a different
set of machine learning algorithms, including naive Bayes and AdaBoost, which achieved
lower accuracy scores than the other models. This could be attributed to the limitations of
the underlying algorithms, as previously discussed.
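As a self-contained illustration of the two preprocessing steps named above, the sketch below combines standard scaling and one-hot encoding in a single scikit-learn ColumnTransformer; all column names and values are hypothetical and do not reflect the datasets used in this study.

```python
# Self-contained sketch of the preprocessing steps named above; all column
# names and values are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "income": [42000, 58000, 31000],
    "loan_amount": [10000, 25000, 7000],
    "employment_status": ["employed", "self-employed", "employed"],
})

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["income", "loan_amount"]),    # feature scaling
    ("cat", OneHotEncoder(handle_unknown="ignore"),          # one-hot encoding
     ["employment_status"]),
])
X_ready = preprocess.fit_transform(df)
print(X_ready.shape)  # (3, 4): two scaled columns plus two one-hot columns
```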
The findings of this paper align with the previous studies showing that machine
learning algorithms can achieve high accuracy results when predicting loan approvals.
However, it is important to consider the limitations of the models, such as their lack of
interpretability and the impact of imbalanced datasets. Future studies should focus on
developing more accurate predictive models by incorporating additional variables related
to the mental health states of borrowers, such as stress levels, anxiety, and depression.
Another important area of research pertains to the ethical considerations surrounding
the use of mental health data in the loan approval process. While the integration of
these data can improve decision making, it raises valid concerns regarding privacy and
discrimination.
Adhering to the General Data Protection Regulation (GDPR) is crucial when handling
mental health data for credit risk predictions in the European Union. Key GDPR articles,
including those addressing the lawfulness of processing personal data (Article 6) and
special categories, like health data (Article 9), emphasize obtaining explicit consent and
safeguarding individual rights. Organizations, including banks, must follow these GDPR
provisions when utilizing mental health data for credit risk predictions, despite the absence
of explicit mentions of credit risk predictions in the GDPR.
Therefore, future investigations should also address the development of ethical guide-
lines and policies to ensure the responsible use of mental health data in the loan approval
process.
Future research should also explore methods for improving the interpretability of the
models and for addressing imbalanced datasets, thereby strengthening the models'
performance.
In conclusion, the numerical results obtained in this study highlight the superior
performance of supervised machine learning algorithms, such as random forest, in
accurately predicting loan default risks based on mental health data. These findings indicate
the potential benefits for financial institutions of adopting machine learning algorithms in
their loan approval processes, particularly when evaluating borrowers with mental health
issues. By embracing these advancements, financial institutions can enhance their risk
assessment capabilities and make more informed lending decisions.
Author Contributions: Conceptualization, E.K., D.H., A.A., M.D. and N.H.; methodology, E.K., A.A.,
M.D. and N.H.; software, A.A., D.H. and M.D.; validation, D.H., E.K., A.A. and M.D.; formal analysis,
D.H., E.K., N.Z., A.A., M.D., N.H. and E.S.; investigation E.K., A.A. and M.D.; resources, E.K., A.A.,
M.D. and N.H.; data curation, D.H., A.A., M.D. and N.H.; writing—original draft preparation, A.A.,
N.H. and M.D.; writing—review and editing, D.H., E.K., N.Z., A.A., M.D. and N.H.; visualization,
D.H., E.K. and N.H.; supervision, D.H., E.K., N.Z. and E.S.; project administration, E.K. and N.H.;
funding acquisition, N.Z. and E.S. All authors have read and agreed to the published version of the
manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Data are contained within the article.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Prabaljeet, S.S.; Atush, B.; Lekha, R. Loan Approval Prediction Using Machine Learning: A Comparative Analysis of Classification
Algorithms. 2023. Available online: https://ieeexplore.ieee.org/document/10182799/authors#authors (accessed on 19 December
2023).
2. Yash, D.; Prashant, R.; Pratik, C. Loan Approval Prediction Using Machine Learning. 2021. Available online: https://www.irjet.
net/archives/V8/i5/IRJET-V8I5331.pdf (accessed on 19 December 2023).
3. Mohammad, A.S.; Amit, K.G.; Tapas, K. An Approach for Prediction of Loan Approval Using Machine Learning Algorithm. 2020.
Available online: https://ieeexplore.ieee.org/document/9155614 (accessed on 19 December 2023).
4. Almheiri, A.S. Automated Loan Approval System for Banks. Rochester Institute of Technology, Dubai. 2023. Available online:
https://scholarworks.rit.edu/cgi/viewcontent.cgi?article=12535&context=theses (accessed on 19 December 2023).
5. Banco de España, Eurosistema. Report on the Financial and Banking Crisis in Spain, 2008–2014. 2017. Available online:
https://repositorio.bde.es/bitstream/123456789/15112/1/InformeCrisis_Completo_web_en.pdf (accessed on 19 December
2023).
6. Bhutta, N.; Hizmo, A.; Ringo, D. How Much Does Racial Bias Affect Mortgage Lending? Evidence from Human and Algorithmic
Credit Decisions. Available online: https://www.federalreserve.gov/econres/feds/files/2022067pap.pdf (accessed on
19 December 2023).
7. Roberts, R. Mental Health and Money: A Practical Guide; Money and Mental Health Policy Institute: London, UK, 2019.
8. Bhargav, P.; Sashirekha, K. A Machine Learning Method for Predicting Loan Approval by Comparing the Random Forest and
Decision Tree Algorithms. 2023. Available online: https://sifisheriessciences.com/journal/index.php/journal/article/view/414/397
(accessed on 19 December 2023).
9. Wang, Y.; Wang, M.; Yong, P.; Chen, J. Joint loan risk prediction based on deep learning-optimized stacking model. Eng. Rep. 2023,
e12748. [CrossRef]
10. Abdullah, M.; Chowdhury, M.A.F.; Uddin, A.; Moudud-Ul-Huq, S. Forecasting nonperforming loans using machine learning. J.
Forecast. 2023, 42, 1664–1689. [CrossRef]
11. Alsaleem, M.Y.E.; Hasoon, S.O. Predicting bank loan risks using machine learning algorithms. AL-Rafidain J. Comput. Sci. Math.
2020, 14, 159–168. [CrossRef]
12. World Health Organization. Mental Disorders. 2019. Available online: https://www.who.int/health-topics/mental-disorders#
tab=tab_1 (accessed on 19 December 2023).
13. National Alliance on Mental Illness. Mental Health by the Numbers. 2021. Available online: https://www.nami.org/mhstats
(accessed on 19 December 2023).
14. Mental Health America. The State of Mental Health in America. 2021. Available online: https://mhanational.org/sites/default/
files/2021%20State%20of%20Mental%20Health%20in%20America_0.pdf (accessed on 19 December 2023).
15. Mental Health First Aid USA. About Mental Health First Aid. 2021. Available online: https://www.mentalhealthfirstaid.org/
about/ (accessed on 19 December 2023).
16. Javed, K.; Hamid, F. A comparative study of decision tree algorithms for nonlinear and complex relationships between input
features and output variables. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2015, 29, 65–74.
17. Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Classification and Regression Trees; Taylor & Francis: Abingdon, UK, 1984.
18. Niculescu-Mizil, A.; Caruana, R. Predicting good probabilities with supervised learning. In Proceedings of the 22nd International
Conference on Machine Learning, Bonn, Germany, 7–11 August 2005.
19. Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R.P.; de Freitas, N. Taking the human out of the loop: A review of Bayesian
optimization. In Proceedings of the IEEE; IEEE: Piscataway, NJ, USA, 2016.
20. Kaviani, P.; Dhotre, M.S. Short Survey on Naive Bayes Algorithm. Int. J. Adv. Res. Comput. Sci. Manag. 2017, 4, 44839. [CrossRef]
21. Jena, B. Gender Recognition of Speech Signal using KNN and SVM. SSRN Electron. J. 2021. [CrossRef]
22. Zhan, Y.; Liu, J.; Gou, J.; Wang, M. A video semantic detection method based on locality-sensitive discriminant sparse representa-
tion and weighted KNN. J. Vis. Commun. Image Represent. 2016, 41, 65–73. [CrossRef]
23. Syaliman, K.U.; Labellapansa, A. Improving the Accuracy of Features Weighted k-Nearest Neighbor Using Distance Weigh; SciTePress:
Setúbal, Portugal, 2019.
24. Freund, Y.; Schapire, R.E. Boosting: Foundations and Algorithms; The MIT Press: Cambridge, MA, USA, 2013.
25. Shahri, N.H.N.B.M.; Lai, S.B.S.; Mohamad, M.B.; Rahman, H.A.B.A.; Bin Rambli, A. Comparing the Performance of AdaBoost,
XGBoost, and Logistic Regression for Imbalanced Data. Math. Stat. 2021, 9, 379–385. [CrossRef]
26. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [CrossRef]
27. Masui, T. All You Need to Know about Gradient Boosting Algorithm—Part 1. Regression. 2022. Available online:
https://towardsdatascience.com/all-you-need-to-know-about-gradient-boosting-algorithm-part-1-regression-2520a34a502
(accessed on 19 December 2023).
28. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
29. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision
tree. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9
December 2017.
30. Sujatha, C.N.; Gudipalli, A.; Pushyami, B.H.; Karthik, N.; Sanjana, B.N. Loan Prediction Using Machine Learning and Its
Deployment on Web Application. In Proceedings of the 2021 Innovations in Power and Advanced Computing Technologies
(i-PACT), Kuala Lumpur, Malaysia, 27–29 November 2021.
31. Tumuluru, P.; Burra, L.R.; Loukya, M.; Bhavana, S.; CSaiBaba, H.M.H.; Sunanda, N. Comparative Analysis of Customer Loan
Approval Prediction using Machine Learning Algorithms. In Proceedings of the Second International Conference on Artificial
Intelligence and Smart Energy (ICAIS-2022), Coimbatore, India, 23–25 February 2022.
32. Mamun, M.A.; Farjana, A.; Mamun, M. Predicting Bank Loan Eligibility Using Machine Learning Models and Comparison
Analysis. In Proceedings of the 7th North American International Conference on Industrial Engineering and Operations
Management, Orlando, FL, USA, 12–14 June 2022.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.