Chameera De Silva *
doi: 10.20944/preprints202502.2059.v1
Keywords: Automated Credit Risk Assessment (ACRA); Explainable AI (XAI); Predictive Performance; Human-in-the-Loop Decision-Making; Financial Technology (FinTech); Findex Dataset
Copyright: This open access article is published under a Creative Commons CC BY 4.0
license, which permits free download, distribution, and reuse, provided that the author
and preprint are cited in any reuse.
Preprints.org (www.preprints.org) | NOT PEER-REVIEWED | Posted: 25 February 2025 doi:10.20944/preprints202502.2059.v1
Disclaimer/Publisher’s Note: The statements, opinions, and data contained in all publications are solely those of the individual author(s) and
contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting
from any ideas, methods, instructions, or products referred to in the content.
Article
Abstract: The financial sector operates on a delicate balance of continuous cash flows between firms
and customers, inherently exposing itself to credit risk. Effective and precise credit risk assessment is
paramount for maintaining financial stability and mitigating potential losses. Leveraging the rapid
advancements in artificial intelligence (AI) and machine learning (ML), this study proposes an
automated, real-time credit risk evaluation framework. Utilizing the World Bank Global Findex
dataset, encompassing transactional data from 123 countries, the research focuses on extracting and
analyzing financial features critical for risk prediction. A suite of machine learning algorithms,
including Support Vector Machines (SVM), Logistic Regression, and Decision Trees, is implemented
to model credit risk. These models achieve remarkable performance, with accuracy reaching 99% and
an F1-score of 1.00, underscoring their robustness in identifying at-risk profiles. Interpretability is
ensured using Explainable AI (XAI) tools, highlighting key financial features such as debit card
usage, mobile banking adoption, and deposit patterns as the most influential predictors of credit risk.
This study emphasizes the integration of domain expertise with ML models, advocating for a human-
in-the-loop approach to ensure reliability and trust in automated systems. By focusing on model
interpretability and transparency, the proposed solution aligns with regulatory requirements and
enhances decision-making processes in the financial sector. This work highlights the transformative
potential of AI in credit risk management, offering a scalable, data-driven framework for global
financial institutions.
Keywords: automated credit risk assessment (ACRA); explainable AI (XAI); predictive performance; human-in-the-loop decision-making; financial technology (FinTech); Findex dataset
1. Introduction
The financial sector is pivotal to a country’s growth and development Attigeri et al. (2019).
Institutions such as banks and insurance companies directly manage the rotation of money, making
cash flow their central business model. However, this inherently introduces credit risk, which is
pervasive in the finance industry. Types of credit risk include default risk, institutional risk, and
concentration risk, among others. Non-recovery or partial recovery of loans negatively impacts a
bank’s balance sheet and income statement, reducing interest income, increasing provisions for non-
performing assets (NPAs), escalating capital requirements, and ultimately lowering profits. These
challenges have led to the deterioration of asset quality, particularly in public sector financial
institutions Kadanda and Raj (2018). Consequently, rising NPAs are a significant concern, and
identifying their determinants before loans turn into NPAs is critical Bawa et al. (2018). Therefore, a
systematic approach to identifying, understanding, and assessing key risk parameters is essential for
the early prediction of default behavior Attigeri et al. (2019). Evaluating risks proactively will ensure
the safe and efficient operation of financial institutions.
With advancements in technology, including the widespread adoption of AI and ML, efforts
have been made to automate this process. Numerous studies have developed machine learning-based
models for achieving high accuracy in predicting NPAs and identifying potential credit risks by
analyzing key financial parameters Chow (2017); Yao and Li (2022). However, many of these models
function as black boxes, offering little to no interpretability regarding how predictions are made. This
lack of transparency raises challenges in understanding and trusting the results, particularly in
high-stakes financial decision-making.
This study proposes a machine learning-based solution to address credit risk assessment in the
financial sector. It not only offers a data-driven approach to tackling credit risk but also emphasizes
the importance of model interpretability. By integrating domain knowledge and human expertise
with the predictions generated by machine learning models, this study ensures that these insights are
both actionable and trustworthy. Using the World Bank Global Findex dataset Demirgüç-Kunt and
Klapper (2012), which covers data from 123 countries, this research prepares a dataset for machine
learning applications, followed by training classification models using the preprocessed data. The
project evaluates the predictive performance and testing accuracies of various machine learning
models to identify the most effective approaches. A key focus of this research is the interpretability
of these classification models. By examining how the models process data and generate predictions,
the study highlights the critical role of domain knowledge and the integration of human-in-the-loop
decision-making in credit risk assessment within the financial sector. This emphasis on
interpretability aims to bridge the gap between complex machine learning algorithms and actionable
insights, ensuring transparency, reliability, and trust in predictive systems. The human-in-the-loop
solution enhances trust and authenticity in predictive systems by combining machine-generated
insights with human expertise Bussmann et al. (2020). The specific objectives of this study on applying
interpretability methods in credit risk assessment in finance are as follows:
• To identify the most effective interpretability methods for credit risk assessment in finance,
striking an optimal balance between interpretability and predictive accuracy.
• To explore how trade-offs between interpretability and accuracy vary across different credit risk
assessment tasks in finance.
• To examine the role of domain knowledge in enhancing the interpretability of credit risk
assessment models in financial contexts.
• To develop a robust framework for integrating human-in-the-loop decision-making into credit risk
assessment workflows using interpretable models.
By addressing these objectives, this study seeks to advance the field of credit risk assessment by
fostering greater transparency, reliability, and collaboration between machine learning systems and
human decision-makers.
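The human-in-the-loop objective above can be sketched as a simple routing rule: confident model predictions are handled automatically, while ambiguous ones are deferred to a human reviewer. This is an illustration only, not the paper's framework; the threshold values and function name are assumptions.

```python
# Minimal human-in-the-loop routing sketch (illustrative; thresholds
# and names are assumptions, not taken from the study).

def route_application(risk_probability, low=0.2, high=0.8):
    """Decide how to handle one credit application given its predicted risk."""
    if risk_probability >= high:
        return "auto-reject"   # model is confident the applicant is high risk
    if risk_probability <= low:
        return "auto-approve"  # model is confident the applicant is low risk
    return "human-review"      # uncertain case: defer to a loan officer

decisions = [route_application(p) for p in (0.05, 0.50, 0.95)]
```

The width of the uncertain band would in practice be tuned with domain experts, since it controls how much work is routed to loan officers.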
The research questions this research aims to answer are:
• What are the most effective interpretability methods for credit risk assessment in finance, in terms
of balancing interpretability and accuracy?
• How do the interpretability and accuracy trade-offs differ across different credit risk assessment
tasks in finance?
• How does domain knowledge impact the interpretability of credit risk assessment models in
finance?
• How can human-in-the-loop decision making be effectively incorporated into credit risk
assessment using interpretable models in finance?
2. Literature Review
Credit risk management and risk prediction are critical tasks for financial institutions such as
banks. Leveraging AI and ML for these tasks introduces a modern and efficient approach to
enhancing predictive security against credit risks. However, a major challenge in using AI and ML
for automating banking processes is the lack of interpretability and explainability in these systems.
Humans have a tendency to look for “Why” every time they experience an unexpected “What”
Molnar (2022). For machine learning models to be trusted, particularly in the financial sector, they
must provide clear reasoning for their predictions. While interpretability may not be essential for all
ML applications, it is crucial in finance, where the decisions made by these systems have direct and
significant impacts on people’s lives. Therefore, ensuring transparency and interpretability in AI-
driven credit risk assessment systems is vital for building trust and accountability.
Balancing these two aspects, interpretability and predictive accuracy, is critical for fostering trust and advancing the application of state-of-the-art ML models in the finance sector.
Demajo et al. (2020) discuss the growing achievements in AI and the surge of enthusiasm in
financial technology, particularly focusing on the application of ML and AI in credit scoring and other
financial processes. Credit scoring models powered by ML are decision-making tools that help
lenders evaluate whether to approve loan applications based on the likelihood of the applicant
fulfilling their financial obligations Demajo et al. (2020). These models offer significant benefits, such
as reducing loan processing times, enabling loan officers to focus on specific cases, minimizing
human subjectivity, lowering costs, and decreasing default risks.
Despite their advantages, these predictive models, often considered "black-box" methods,
present a challenge: the lack of transparency in their decision-making process. As a result, financial
experts are unlikely to trust model predictions without adequate justification Demajo et al. (2020).
This has led to the emergence of XAI, a field aimed at demystifying black-box models to enhance
understanding of the reasoning behind their predictions.
However, XAI in finance presents unique challenges, such as determining the intended audience
for explanations (experts or users), identifying the best representation format for explanations, and
developing robust methods to evaluate their effectiveness Demajo et al. (2020). To address these
challenges, Demajo et al. (2020) highlight the importance of application-grounded analysis in
evaluating XAI models. This involves engaging domain experts to assess the accuracy and quality of
explanations by performing real-world tasks. For example, in credit scoring, loan officers—who
possess deep knowledge of loan requirements and banking regulations—are considered key experts
for evaluating model explanations. Building on this, these models serve as intelligent systems that
automate financial processes and predictions, streamlining operations and enhancing efficiency.
Developing predictive models for financial applications involves training classification models, such
as Decision Trees and Deep Learning algorithms, to accurately assess risks and make informed
decisions. The following section delves into the existing literature on the use of Deep Learning and
Decision Tree models for credit risk assessment, highlighting the challenges associated with their
training, implementation, and integration into financial systems.
2.3. Deep Learning and Decision Tree Models for Credit Risk Assessment
The financial sector generates and processes vast amounts of data due to the high volume of
financial transactions, making it intrinsically linked to big data. Advances in big data technologies,
coupled with improved data availability and computational power, have prompted many banks and
lending institutions to modernize their business models to remain competitive Addo et al. (2018).
According to Addo et al. (2018), effective credit risk prediction, monitoring, model reliability,
and streamlined loan processing are essential for transparent and informed decision-making in the
sector. Through a study utilizing real-world data, they present a binary classification approach
leveraging machine learning and deep learning models. Their findings reveal that tree-based models
demonstrate greater stability compared to multilayer artificial neural network models, highlighting
their suitability for financial applications where consistency and reliability are paramount. The
integration of technology and big data in the financial sector has enabled the development of
predictive models to automate loan processing and credit risk assessments. In their research, Addo
et al. (2018) employ regression models such as Linear, Logistic, and Multinomial regression, each
with specific parameters, stopping criteria, and activation functions. Additionally, they utilize
classification models such as Random Forest and Gradient Boosting. They also explore deep learning
approaches, emphasizing the use of multilayer neural networks. The study presents four
architectures: convolutional neural networks (CNNs), recurrent neural networks (RNNs), recursive
neural networks, and standard deep neural networks. Each of these models featured unique
parameters and architectures. The study evaluates model performance using metrics such as AUC
and RMSE, as summarized in Table 1.
Chow (2017) highlights the growing adoption of machine learning in the corporate sector, noting
that over 40% of large corporations are already leveraging it to enhance their marketing strategies.
These companies attribute approximately 38% of their sales growth to machine learning and,
remarkably, 76% believe that it will play a pivotal role in driving their future sales performance.
Table 1. Models’ performances on the test dataset using AUC and RMSE values Addo et al. (2018).
potential in the finance sector while also identifying challenges in deploying these systems in such a
sensitive domain Buckley et al. (2021). While interpretability and explainability have been
acknowledged as critical factors by some authors, few studies have thoroughly addressed these
challenges in the context of financial technology. Additionally, the integration of human judgment
into the prediction and decision-making processes, known as the human-in-the-loop approach, remains
an under-explored area in credit risk assessment research Buckley et al. (2021). The present study
proposes a novel approach to credit risk assessment using machine learning techniques that not only
enhance the interpretability and explainability of black-box models but also incorporate human-in-
the-loop decision-making to address these gaps.
Figure 1. Average ranks of credit scoring models for the Nemenyi test Yao and Li (2022).
3. Methodology
The dataset used in this study is the World Bank "Global Findex 2021" dataset Demirgüç-Kunt
and Klapper (2012), which contains survey responses on demographics, income, and finances from
customers in 123 countries. It is used both for analysis and for training the machine learning models
for credit risk assessment. This section covers the data pre-processing steps and their rationale,
followed by an analysis of the data, including graphs that illustrate the relationships between features
and their relevance to the prediction model. Originally containing 127 variables, the dataset was refined
by handling null values and removing duplicate columns, leaving the independent features listed
below with their descriptions and data types.
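The null handling and redundant-column removal described above can be sketched with pandas. The tiny frame below is illustrative, not the real 127-variable dataset, and filling nulls with 0 is one plausible treatment consistent with the response coding described later, where 0 marks a null cell.

```python
# Minimal sketch of the preprocessing steps: fill null cells and drop a
# column whose information is already captured by another feature.
# Column values here are synthetic stand-ins for the Findex fields.
import pandas as pd

df = pd.DataFrame({
    "account":  [1.0, None, 1.0, 0.0],
    "borrowed": [1.0, 0.0, None, 1.0],
    "fin20":    [1.0, 0.0, None, 0.0],  # subsumed by "borrowed", so redundant
})

df = df.fillna(0)                 # treat null cells as 0
df = df.drop(columns=["fin20"])   # keep the most comprehensive feature only
```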
Figure 2. Number of customers having an account and a mobile money account (account, account fin, account mob).
In Figure 3, the first graph displays the distribution of the borrowed feature, representing
customers who have ever borrowed money from a financial institution, family, friends, or for medical
purposes. The second graph shows fin20 (customers who borrowed for medical reasons), the third
shows fin22a (customers who borrowed from a financial institution), and the fourth shows fin22b
(those who borrowed from family or friends). Since borrowed captures all these scenarios, we kept this
column and dropped fin20, fin22a, and fin22b.
Figure 3. Number of customers who ever borrowed money (borrowed, fin20, fin22a, fin22b).
In Figure 4, we analyze the saved column. The first graph shows the number of customers who
have ever saved money in any form. The second graph represents the fin16 column, showing
customers who saved for old age. The third graph corresponds to fin17a, indicating customers who
saved using an account, while the fourth graph shows fin17b, representing customers who saved
through informal methods. According to the World Bank’s feature descriptions, the saved column
combines the information from fin16, fin17a, and fin17b. As a result, we retained the saved column and
removed the other three.
This approach was applied to other sets of duplicate features, where we kept the feature that
provided the most comprehensive information and removed the redundant ones. The groups of
duplicate features analyzed are as follows:
Figure 4. Number of customers who ever saved money (saved, fin16, fin17a, fin17b).
From these groups, one feature was retained while the others were dropped. The features
retained in the dataset are listed in Tables 2 and 3. Continuing with the analysis, we graphed the
number of customers with mobile phones and internet connections.
Table 2. All the features in the dataset after treating the null values. (Table part I).
Table 3. All the features in the dataset after treating the null values (Table part II).
fin13a    Use mobile money account two or more times per month    float64
fin13b    Use mobile money account to store money                 float64
fin13c    Use mobile money account to borrow money                float64
fin13d    Use mobile money account without help                   float64
fin14 2   Paid digitally for in-store purchase after COVID-19     float64
fin14c    Paid online or in cash at delivery                      float64
fin14c 2  Paid online for the first time after COVID-19           float64
fin24     Main source of emergency funds in 30 days               int64
fin24a    Difficulty of emergency funds in 30 days                float64
fin24b    Difficulty of emergency funds in 7 days                 float64
fin26     Sent domestic remittances                               float64
fin28     Received domestic remittances                           float64
Figure 6 shows graphs for customers who have a credit card, have used it, and paid their credit
card bill on time. The relevant features for this information are fin7, fin8a, and fin8b.
The responses to most questions in the questionnaire are formatted as follows:
• 0 = the cell was null in the raw dataset
Figure 5. Number of customers who have a debit card and have used it (fin2, fin4, fin4a).
Figure 6. Number of customers who have a credit card, have used it, and paid the bill.
Figure 7. Number of customers who have a credit card, have used it, and paid the bill.
4. Results
This section presents the results of the research, including the performance of machine learning
models trained to predict credit risk based on financial features. It also includes the accuracy of these
models, along with the outcomes from explainable AI techniques for interpreting the black-box
models.
The machine learning models applied in this study are Support Vector Machines, Logistic
Regression, and Decision Trees Mahesh (2020); Das and Behera (2017). These models are used for
binary classification to assess credit risk, based on a set of
features derived from the financial history and survey responses of customers. The dataset is split
into independent (X) and dependent (Y) variables, and further divided into training and testing sets
using a 70/30 split for model training and testing.
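The setup above can be sketched with scikit-learn: a 70/30 train/test split and the three classifiers named in the text. Synthetic data stands in for the Findex features here, so the scores will not match the paper's reported results.

```python
# Sketch of the training setup: 70/30 split, then SVM, Logistic Regression,
# and Decision Tree classifiers evaluated on held-out test accuracy.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 39-feature Findex dataset.
X, y = make_classification(n_samples=1000, n_features=39, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)  # 70/30 split as in the study

models = {
    "SVM": SVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}
accuracies = {
    name: accuracy_score(y_test, clf.fit(X_train, y_train).predict(X_test))
    for name, clf in models.items()
}
```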
The classification models are evaluated on their test accuracy, which is provided in Table 4.
These trained models are crucial for automating the process of assessing credit risk for financial
institutions.
The SVM classifier is a supervised machine learning algorithm used for both classification and
regression tasks Mahesh (2020); Das and Behera (2017). In this research, SVM is applied to a binary
classification problem, with the goal of creating the optimal decision boundary to separate
n-dimensional space into distinct classes Mahesh (2020). For the test dataset, the F1-scores for the SVM
model are 1.00 for both classes, indicating perfect classification performance. The precision and recall
values for both classes are also recorded as 1.00. The confusion matrix and classification report for
the SVM model’s predictions on the test dataset are shown in Table 4.
The machine learning model used for predicting credit risk with the World Bank data is logistic
regression Mahesh (2020); Das and Behera (2017); Choudhary and Gianey (2017). Logistic regression
is a supervised machine learning technique commonly used for classification tasks Choudhary and
Gianey (2017). It is a statistical method that analyzes datasets with multiple independent variables to
determine the outcome. The purpose of using logistic regression in this research is to find the best-
fitting model that describes the relationship between the independent and dependent features in the
dataset. The dataset contains 39 independent features and 1 dependent feature, with the 39
independent features used to predict the outcome through the logistic regression model.
Similar to the SVM model, the classification report for the Logistic Regression model shows F1-
scores of 1.00 for both classes. The precision and recall values for both classes are also 1.00. The
confusion matrix and classification report for Logistic Regression, along with those for SVM, are
presented in Table 4. Given that this is a binary classification problem, the third machine learning
model used is the Decision Tree Classifier. The accuracy of the Decision Tree model on the test dataset
was recorded at 98.21%. The classification report for the Decision Tree model shows an F1-score of
0.99 for class "0" and 0.97 for class "1", with corresponding precision and recall values of 0.99 and
0.97, respectively. The confusion matrix and classification report for the Decision Tree predictions are
also provided in Table 4.
Figure 10. SHAP bar plot (i) and heatmap (ii) for the SVM algorithm.
Figure 11. SHAP waterfall plot (i, ii) and summary plot (iii) for the SVM algorithm.
Figure 12. SHAP bar plot (i) and heatmap (ii) for the Logistic Regression model.
Figure 13. SHAP waterfall plot (i, ii) and summary plot (iii) for Logistic Regression.
5. Discussion
This section discusses the essential interpretations based on the key findings of the research. To
answer the first research question and understand the reasoning behind the predictive performance
of the machine learning models for classifying the dataset against the credit risk and finding the most
effective interpretability methods for credit risk assessment, we apply explainable artificial
intelligence techniques to visualize the processing of the black-box machine learning models. There are several
interpretability methods and tools for interpreting the black-box machine learning models Adadi and
Berrada (2018). These tools range from the least to the most complex. Among them, SHAP provides
an additivity guarantee: by summing a record's SHAP values with the base value, or the prediction
average, we get the exact prediction value. This feature is not present in other
explainable AI models.
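The additivity property mentioned above can be verified by hand for a linear model f(x) = w·x + b, where the SHAP value of feature i has the closed form w_i(x_i − mean(x_i)). The data and coefficients below are synthetic; only the property itself is demonstrated.

```python
# Verify SHAP additivity for a linear model: the record's SHAP values plus
# the base value (the average prediction) reproduce the exact prediction.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                 # 200 records, 4 features
w, b = np.array([0.5, -1.2, 2.0, 0.3]), 0.7   # synthetic linear model weights and bias

base_value = (X @ w + b).mean()               # average prediction over the data
x = X[0]                                      # one record to explain
shap_values = w * (x - X.mean(axis=0))        # closed-form SHAP for a linear model

reconstructed = base_value + shap_values.sum()
exact = float(x @ w + b)
```

Because base_value equals w·mean(X) + b, the weighted mean terms cancel and the reconstruction is exact, which is precisely the guarantee SHAP offers for arbitrary models.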
The SHAP bar plot for SVM from Figure 10 shows the influence of features on the prediction
results given by the model. As per the SHAP bar plot, "fin11" (not having a bank account) is the
most influential feature for the prediction, followed by "basic fin" and "account," where the "basic
fin" feature is the combination of multiple features "fin2," "fin4," "fin5," "fin6," "fin7," "fin8,"
"fin8a," "fin8b," "fin9," "fin9a," "fin10," "fin10a," and "fin10b." These features capture ownership and
use of a debit card, use of a mobile phone for banking, and bank deposits. A bank account provides
a record of income, expenses, and savings patterns. Without it, lenders lack critical insights into an
individual’s financial behavior, such as cash flow stability or savings habits. Not having an account
can restrict access to credit, contributing to higher credit risk. Debit card use leaves a clear financial
footprint, which helps lenders evaluate spending patterns and financial discipline. The frequency and
volume of debit card transactions can indicate how well a customer manages cash flow and avoids
overdrafts, providing a measure of financial responsibility. Unlike credit cards, debit card use
suggests that customers rely on available funds rather than borrowing, potentially indicating lower
reliance on credit and reduced credit risk. Similarly, mobile phone users are likely to access financial
services more easily, such as tracking expenses, making payments, or receiving alerts, reducing the
likelihood of missed payments or overdrafts. Customers who actively use debit cards, manage
financial transactions via mobile phones, and make regular deposits are likely to exhibit strong
financial discipline, stable income, and access to modern financial tools. This reduces the likelihood
of default. These features provide lenders with a holistic view of a customer’s financial habits,
enabling more accurate credit risk assessment and tailored decision-making. Likewise, regular
deposits indicate consistent income streams, which are essential for assessing repayment capacity.
The size, frequency, and source of deposits provide insights into financial stability. Any deviations
in these behaviors, such as a drop in deposits or irregular mobile banking activity, may act as early
indicators of financial distress, allowing preemptive action.
The summary plot for SVM in Figure 11 (iii) shows the influence of the first 20 features on the
predictive power of the machine learning model. The waterfall plot in Figure 11 (i) and (ii) is
visualized for 2 different predictions and shows the influence of features in the final prediction for
that particular record. The last SHAP graph for the SVM algorithm is the heatmap in Figure 10 (ii),
which shows the influence of features in the prediction at different instances of the predictions. By
analyzing all the SHAP plots, it is clear that "fin11" has the greatest influence on the prediction,
closely followed by "basic fin" and "account." This suggests that basic financial features and the
status of the financial account play a crucial role in predicting credit risk for an application.
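A model-agnostic way to rank feature influence, similar in spirit to the mean-|SHAP| ordering shown in the bar plots, is permutation importance, which measures how much shuffling a single feature degrades model performance. The data and model below are synthetic stand-ins, not the paper's setup.

```python
# Rank feature influence via permutation importance: shuffle one column at
# a time and measure the drop in score; larger drops mean more influence.
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, n_informative=2,
                           n_redundant=0, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
ranking = result.importances_mean.argsort()[::-1]  # most influential feature first
```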
Figure 12 (i) shows the SHAP bar plot for the logistic regression machine learning model. As with
the SVM model, the bar plot identifies "fin11" as the feature with the highest influence on the
prediction, closely followed by "basic fin" and "account." The summary plot for the logistic
regression model is shown in Figure 13
(iii), which shows the top 20 features having maximum influence on the prediction of the logistic
regression model. Observing the summary plot for SVM and logistic regression models, we can say
that the top four features for both models are the same: "fin11," "basic fin," "account," and
"anydigpayment." The results indicate that the classification heavily relies on the customer's financial
data, such as having an account, debit card, credit card, timely credit card payments, and the use of
mobile phones and the internet for digital payments and internet banking. Additionally, features
influencing the rejection of an application or predicting it as risky include "saved," "inc q,"
"financially worries," and "emergencyfunds." Figure 13 (i) and (ii) show the waterfall plot for two
different predictions in the dataset and the positive and negative influence of the features on the final
prediction for that single record. Figure 12 (ii) shows the heatmap for the influence of features on the
final predictions at multiple different instances in the dataset and predictions.
The third classification machine learning model used is the Decision Tree. For explainability, the
SHAP model utilizes shap.TreeExplainer to generate decision plots. The decision plot for the Decision
Tree model trained on the World Bank dataset is shown in Figure 14. The multiple lines in the
decision plot represent the contributions of various features to the final classification made by the
Decision Tree classifier. Another SHAP visualization used for the Decision Tree classifier is the force
plot, which highlights the contributions of individual features to a single classification. The force plot
for the Decision Tree classifier can be seen in Figure 15, which shows the contribution of different
features for the final classification.
Comparing the results from the SHAP analyses for all three machine learning models, the
primary feature influencing the predictions—or in the case of the Decision Tree classifier, the root
node of the tree—is identified as "fin11." This feature, derived through PCA from the group of
features "fin11a" to "fin11h," captures the reason for not having an account at any financial
institution. The SHAP analysis highlights the critical role of having an account in assessing credit
risk. "fin11" is closely followed by other influential features such as "basic fin," "saved," "inc q,"
"financially worries," and others, forming a list of the top 10 contributors to the predictive power of
the models. This finding addresses the second research question of this study by demonstrating how
the features driving the predictions reveal the underlying meaning of the model’s outcomes. The
inherently subjective nature of interpreting black-box models highlights the importance of domain
expertise in accurately understanding and contextualizing the results from machine learning models.
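The derivation of a single feature from a related group via PCA, as described above for "fin11" (built from "fin11a" through "fin11h"), can be sketched as follows. The binary columns below are synthetic stand-ins for those survey fields.

```python
# Combine a group of related columns into one derived feature with PCA,
# keeping only the first principal component.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=(300, 8)).astype(float)  # stand-in for fin11a..fin11h

pca = PCA(n_components=1)
fin11 = pca.fit_transform(group)  # a single derived column replaces eight
```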
The third research question of this project examines how domain knowledge influences the
interpretability of credit risk assessment models in finance. This question is addressed through the
inherently subjective nature of interpretability. As discussed, the impact of features on predictions
can vary depending on the specific prediction task, and interpreting this impact effectively requires
domain expertise. For instance, if a model predicts a customer’s financial instability in repaying a
loan, the contribution of features like credit score and previous bill payments holds a different
meaning compared to their contribution in predicting a customer’s need for a loan. Understanding
and contextualizing these differences demands domain knowledge and a human-in-the-loop
decision-making process in credit risk assessment using the machine learning models presented
in this research paper.
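The human-in-the-loop process described above can be sketched minimally as a routing rule: confident model outputs are acted on automatically, while ambiguous cases are escalated to a domain expert together with the top contributing features for context. The threshold value and feature names below are assumptions for illustration, not prescriptions from the paper.

```python
def route_decision(prob_default, top_features, threshold=0.8):
    """Auto-decide only when the model is confident; otherwise escalate
    to a human reviewer along with explanation context."""
    if prob_default >= threshold:
        return {"decision": "reject", "by": "model"}
    if prob_default <= 1 - threshold:
        return {"decision": "approve", "by": "model"}
    # Ambiguous zone: hand off to a domain expert with the features
    # that drove the prediction, so interpretation stays context-aware.
    return {"decision": "review", "by": "human", "context": top_features}

print(route_decision(0.95, ["fin11", "saved"]))   # confident: model decides
print(route_decision(0.55, ["fin11", "inc_q"]))   # ambiguous: human reviews
```

The design choice here is deliberate: the expert sees not just the score but the contributing features, which is where the domain knowledge discussed above comes into play.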
6. Conclusion
The rapid advancement of technology continues to drive innovation across various industries,
including the finance sector. Automation is becoming a key focus, reducing reliance on human effort
and intelligence in many tasks. The finance industry, which revolves around credit and money, is
particularly susceptible to credit risk, such as loan defaults, market crashes, and inflation impacts.
These risks require proactive and accurate assessment to mitigate potential threats. This research
highlights the potential of leveraging machine learning and artificial intelligence to automate credit
risk assessment. By establishing assumptions and rules, machine learning models can streamline this
critical process. However, the adoption of these technologies brings challenges, particularly
regarding trust in the models’ predictions and decision-making processes.
To address these concerns, this study presents a machine learning-based solution for automated
credit risk assessment, paired with XAI tools and a human-in-the-loop approach. The findings
emphasize that credit risk assessment is a nuanced and subjective task requiring domain knowledge
to interpret predictions meaningfully. Incorporating human expertise ensures the authenticity of
decisions and increases trust in the system. The models developed in this study achieve an
accuracy of 99% and an F1-score of 1.00, demonstrating their effectiveness. Additionally, XAI tools
such as SHAP provide insight into the models' internal processing, highlighting the influence of
each feature on predictions and validating the models' performance.
While this research focuses on presenting machine learning models for credit risk assessment
and identifying effective interpretability methods, the full implementation and integration of such
systems into financial institutions remain a task for future exploration. Future research could build
on these findings to develop comprehensive systems that combine automation, interpretability, and
expert decision-making for robust credit risk management.