
Customer Credit Risk: Application and Evaluation of Machine Learning and Deep Learning Models


Prathibha Kiran Yemmanuru, Jones Yeboah, Isaac Kofi Nti
School Of Information Technology, University Of Cincinnati, USA
[email protected], [email protected]
ORCID: 0000-0001-9257-4295

2024 IEEE 3rd International Conference on Computing and Machine Intelligence (ICMI) | 979-8-3503-7297-7/24/$31.00 ©2024 IEEE | DOI: 10.1109/ICMI60790.2024.10585896

Abstract - Financial institutions rely on credit risk evaluation or credit scoring models for decision-making, as these models analyze the creditworthiness of customers. Machine learning models can predict with high accuracy, and they are widely utilized in this sector. This study selected a dataset from openml.org, and after data pre-processing, visualization, and exploratory data analysis, seven algorithms were applied. The models were evaluated using various performance metrics. Support Vector Machines (SVM) achieved the highest accuracy (80.67%) with a good Recall of 93.55%. Also, when dimensions were reduced to 10 (from 20) through Principal Component Analysis, SVM demonstrated the highest accuracy (83.57%), an F1 score of 84.70%, a Recall of 88.84%, and an ROC AUC Score of 83.44%. By generating synthetic records using SMOTE, the open-source algorithm Extreme Gradient Boosting achieved the highest accuracy score of 83.3% and an ROC AUC of 83.29%. Future work may involve tuning the hyperparameters of these algorithms to improve other performance metrics.

Keywords – Customer classification, ML algorithms, credit scoring

I. INTRODUCTION

Credit risk evaluation, or credit scoring, is crucial for any financial institution. Financial institutions use it extensively to analyze creditworthiness [1] before extending any credit to customers. Misclassifying customers into incorrect categories can result in bad credits and financial losses. With strict regulations imposed on financial institutions, customer classification models must be accurate. These institutions base decisions on the output of these models and are significantly impacted by misclassification. Numerous contributing factors, such as age, existing loans, credit amount and marital status, make the models complex.

Traditional statistical models have been used for credit risk evaluation for decades. These models are also called 'credit scoring' models. They use methods such as Discriminant Analysis (DA) and Logistic Regression (LR) and can classify with high accuracy. Machine learning has slowly entered this segment. Algorithms such as Artificial Neural Networks (ANN), Deep Learning algorithms and Support Vector Machines (SVM) are now widely used to classify customer credits.

Numerous studies have applied Machine Learning and Deep Learning algorithms to credit risk evaluation. Azaria et al. [1] utilized various non-parametric and parametric statistics to compare different Machine Learning classifiers used on customer loans. The Deep Neural Network achieved the highest Area Under the Curve (AUC) of 0.638.

In another study [2], Bayesian, Random Forests (RF), SVM, and Naïve Bayes classifiers were applied to publicly available German credit scoring data in a comparative study. Different feature selection algorithms, such as Information-Gain, Gain-Ratio, and Chi-Square, were employed in conjunction with the classifiers. Random Forests with Chi-Square feature selection provided the highest accuracy, and the result of the Decision Tree (C5.0) with Chi-Square was also noteworthy.

Marceau et al. [3] applied Machine Learning and Deep Learning methods to highly unbalanced 24-month credit card data. They used AutoML and Neural Architecture Search (NAS) techniques, and the Extreme Gradient Boosting (XGBoost) algorithm achieved the highest AUC of 0.78. Logistic Regression, Decision Trees, and Random Forest algorithms were applied to microfinance institution data in [4], with RF achieving 83% accuracy. Yao [5] employed Bagging Classification and Regression Trees (CART) and AdaBoost CART on the Australian Credit Card data from the UCI repository, which has 14 features. AdaBoost CART achieved the highest accuracy of 85.86%.

Adiputra and Wanchai [6] applied an ensemble model of XGBoost and Random Forests to telecom and insurance customer data and compared the results with those of similar studies. This model achieved better F1 scores (0.85 for telecom data and
Authorized licensed use limited to: Indian Ins of Science Edu & Research. Downloaded on August 10,2024 at 10:29:04 UTC from IEEE Xplore. Restrictions apply.
0.947 for insurance data) and AUC scores (0.857 and 0.99 for telecom and insurance data, respectively) than the other studies. In another study [7], Random Forests, Decision Tree, Logistic Regression, and AdaBoost algorithms were applied to a Kaggle credit card dataset. RF achieved the highest accuracy of 94.4% in identifying frauds.

Zhu et al. [8] proposed a hybrid Deep Learning model that combined a Convolutional Neural Network (CNN) with the Relief algorithm. They applied this hybrid model to consumer credit data collected from a Chinese finance company and compared it with Random Forest and Logistic Regression models. Relief-CNN achieved an accuracy of 91.64% and outperformed the other two models.

This study applies the traditional Logistic Regression algorithm along with the powerful Extreme Gradient Boosting open-source library. Seven different ML algorithms are applied to the dataset:
- Conventional Logistic Regression (Maximum Entropy Based) [9]
- Multi-Layer Perceptron (Neural Network)
- the boosting algorithm Extreme Gradient Boosting
- Random Forest Classifier (Ensemble Based-Bagging)
- Decision Tree Classifier (Tree Based)
- K Nearest Neighbors (Group Based)
- Support Vector Machines

Performance metrics are used to compare the models.

II. METHODOLOGY

Experimental setup – The analysis is performed on a Dell Inspiron 7506 laptop with 16 GB of memory, an Intel Core i7 processor and a 512 GB SSD. Python in Jupyter Notebook is used for this study, with the Pandas, NumPy, Matplotlib, Scikit-learn and Seaborn libraries.

The methodology of this study is divided into three phases:
Phase 1. Data Processing
Phase 2. Application of Machine Learning Algorithms
Phase 3. Performance Metrics and Model Evaluation

Each phase is explained in detail below.

Fig.1 Methodology of this study

A. Phase I – Data processing

There are many open-source datasets available on the internet for customer credit risk evaluation. The dataset chosen for this study is credit_customers.csv, taken from openml.org [10]. It contains 1000 records with 21 features. Seven features are of float datatype, and 14 features are of object datatype.

The target feature is 'class'. It holds the value 'good' for customers who can be given loans and 'bad' for customers who can put the financial institution at risk.

The features in the dataset are checking_status, other_parties, housing, other_payment_plans, residence_since, job, duration, age, credit_history, purpose, credit_amount, savings_status, employment, foreign_worker, installment_commitment, own_telephone, personal_status, num_dependents, property_magnitude, existing_credits and class.

Data Pre-processing – Exploratory Data Analysis (EDA) is conducted on the dataset to understand the features and their relationships. No duplicates or missing values are found. The data is checked for outliers; since the outliers did not appear to be true outliers, they are not treated. Two new features, 'gender' and 'marital status', are extracted from 'personal status' and appear to be highly correlated (0.94). To avoid multicollinearity, they are removed. One-hot encoding with pandas.get_dummies is applied to the 14 categorical features. The target feature 'class' is converted into numerical values. StandardScaler is used to scale the data.

Data Partitioning – The dataset is split into training and test data in a 70:30 ratio. The training set has 700 records and the test set has 300 records.

B. Phase II – Application of ML algorithms

Seven different algorithms are chosen for this study: Logistic Regression, Multi-Layer Perceptron (MLP), Extreme Gradient Boosting, Random Forest Classifier, Decision Tree Classifier, K Nearest Neighbors and Support Vector Machines. They are applied to the dataset, and performance metrics are used to compare the models. Details of the algorithms are explained below.

Logistic Regression – This is a traditional classification algorithm that predicts the target class using probability. Although it is used for classification, it is called regression [11]. It takes the output of a linear regression function and applies a sigmoid function to it to identify the target class.

Multi-Layer Perceptron – MLP is a feedforward Artificial Neural Network with input, hidden and output layers. It learns a non-linear mapping [12] between input and output.

Extreme Gradient Boosting – This is a powerful open-source library that implements gradient boosting. Extreme Gradient Boosting has been used to win many Kaggle competitions [13] because of its predictive accuracy.

Random Forests – This is a supervised learning algorithm that contains many decision trees. Its outcome is obtained by aggregating the predictions made by the individual decision trees [14].

Decision Tree – A Decision Tree is a non-parametric algorithm [15] that can handle multi-output problems and uses decision rules to predict the output.
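The sketch below makes the idea of decision rules concrete. It is a minimal from-scratch, CART-style tree that picks threshold splits by minimising Gini impurity. It is a simplified illustration, not the scikit-learn DecisionTreeClassifier used in this study. The helper names (`gini`, `best_split`, `build_tree`, `predict`, `to_rules`) and the toy records are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature index, threshold) pair with the lowest weighted Gini
    impurity, or None if no split improves on the parent node."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # a split must send records to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow the tree recursively. A leaf is a class label; an internal node is
    a tuple (feature index, threshold, left subtree, right subtree)."""
    split = best_split(rows, labels)
    if depth == max_depth or split is None:
        return Counter(labels).most_common(1)[0][0]  # majority class at the leaf
    f, t = split
    left = [(row, y) for row, y in zip(rows, labels) if row[f] <= t]
    right = [(row, y) for row, y in zip(rows, labels) if row[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )


def predict(node, row):
    """Follow the decision rules from the root down to a leaf."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def to_rules(node, names, indent=""):
    """Render the learned tree as nested if/else decision rules."""
    if not isinstance(node, tuple):
        return [f"{indent}predict {node!r}"]
    f, t, left, right = node
    return (
        [f"{indent}if {names[f]} <= {t}:"]
        + to_rules(left, names, indent + "    ")
        + [f"{indent}else:"]
        + to_rules(right, names, indent + "    ")
    )


# Hypothetical toy records: (duration_months, credit_amount) -> class
FEATURES = ["duration_months", "credit_amount"]
ROWS = [(6, 1200), (12, 2500), (24, 4000), (36, 9000),
        (48, 12000), (9, 1500), (42, 7000), (15, 2000)]
LABELS = ["good", "good", "good", "bad", "bad", "good", "bad", "good"]

tree = build_tree(ROWS, LABELS)
print("\n".join(to_rules(tree, FEATURES)))
```

On these toy records the tree learns a single rule that splits duration_months at 24. Real implementations add further refinements. For example, scikit-learn places thresholds at midpoints between observed values rather than at the values themselves, and it also supports sample weights, cost-complexity pruning and multi-output targets.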
K-Nearest Neighbors – KNN is a simple algorithm to implement. It uses proximity [16] to decide the outcome: a record is assigned the majority class among its k nearest neighbors in feature space.

Support Vector Machines – SVM is a powerful algorithm that finds the optimal hyperplane [17] separating the target classes.

C. Phase III – Evaluation metrics

Five metrics are used to evaluate the models: Accuracy, F1 score, Receiver Operating Characteristic Area Under the Curve (ROC AUC), Recall and Precision. In the definitions below, TP = true positives, TN = true negatives, FP = false positives and FN = false negatives.

Accuracy (1) is the ratio of correct predictions to the total number of predictions made:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$

Precision is the proportion of predicted positive labels that are actually positive, $TP/(TP + FP)$. A good model has a precision close to one.

Recall is the proportion of actual positive cases that the model classifies as positive, $TP/(TP + FN)$.

F1 score is the harmonic mean of Precision and Recall.

ROC AUC score indicates how effectively the classification algorithm can differentiate between the positive and negative classes. The higher the ROC AUC score, the better the performance of the model.

III. RESULTS

A. Data Visualization

Bar charts, histograms, pie charts, box plots and count plots are used to visualize the data. The heatmap generated for the dataset is shown in Fig.2. The features checking_status (positively correlated), purpose (positively correlated), savings_status (positively correlated), personal_status (positively correlated) and duration (negatively correlated) are the significant features in identifying customer class.

Fig.2 Heatmap generated for the dataset

B. Model Evaluation

The evaluation metrics obtained for the seven Machine Learning models are presented in Table 1. The best evaluation metrics for the different techniques are highlighted in blue in Tables 1, 2 and 3.

The models performed better with one-hot encoding than with label encoding, so one-hot encoding is used for the subsequent experiments. With one-hot encoding, Support Vector Machines achieved the highest accuracy of 80.67%, with a good recall of 93.55%. This suggests that SVM separates the target classes well in feature space.

Due to the data imbalance, the Synthetic Minority Oversampling Technique (SMOTE) is applied to create
synthetic records and balance the data. With SMOTE, Extreme Gradient Boosting attained the highest accuracy score of 83.3% and an ROC AUC of 83.29%.

Principal Component Analysis is conducted on the data to reduce the dimensions from 21 to 10. With reduced dimensions, SVM achieved the highest accuracy (83.57%), an F1 score of 84.70%, a recall of 88.84%, and an ROC AUC score of 83.44%.

SVM performed well compared to the other ML models; its evaluation metrics are shown in Fig. 3. However, its lower ROC AUC score of 70.27% with one-hot encoding shows that the algorithm needs tuning with better hyperparameters to improve its predictive performance.

Fig.3 Evaluation metrics of SVM

Table.1 Accuracy and F1 scores of algorithms expressed in percentages

| Algorithm | Accuracy (Label Enc) | Accuracy (1-Hot Enc) | Accuracy (SMOTE) | Accuracy (PCA) | F1 (Label Enc) | F1 (1-Hot Enc) | F1 (SMOTE) | F1 (PCA) |
|---|---|---|---|---|---|---|---|---|
| RF | 78.00 | 80.33 | 82.62 | 81.67 | 85.90 | 80.33 | 82.62 | 82.77 |
| DT | 67.67 | 74.33 | 77.14 | 74.76 | 76.63 | 87.31 | 83.45 | 75.00 |
| MLP | 75.00 | 77.67 | 83.10 | 83.10 | 83.07 | 82.13 | 77.03 | 83.75 |
| XGB | 78.33 | 78.33 | 83.33 | 80.95 | 85.39 | 84.81 | 83.29 | 81.90 |
| KNN | 74.00 | 76.33 | 80.00 | 80.48 | 83.19 | 85.26 | 83.95 | 81.11 |
| LR | 74.67 | 77.33 | 81.19 | 83.33 | 84.10 | 84.86 | 80.19 | 83.95 |
| SVM | 75.33 | 80.67 | 80.24 | 83.57 | 84.71 | 84.82 | 81.84 | 84.70 |

Table.2 Recall scores of algorithms expressed in percentages

| Algorithm | Recall (Label Enc) | Recall (1-Hot Enc) | Recall (SMOTE) | Recall (PCA) |
|---|---|---|---|---|
| RF | 92.63 | 93.55 | 85.58 | 86.05 |
| DT | 73.27 | 81.57 | 74.88 | 73.95 |
| MLP | 84.79 | 86.18 | 82.33 | 85.12 |
| XGB | 87.56 | 86.64 | 85.12 | 84.19 |
| KNN | 88.94 | 91.71 | 79.07 | 81.86 |
| LR | 92.63 | 87.56 | 82.79 | 85.12 |
| SVM | 94.47 | 93.55 | 86.51 | 88.84 |

Table.3 Precision and ROC AUC scores of algorithms expressed in percentages

| Algorithm | Precision (Label Enc) | Precision (1-Hot Enc) | Precision (SMOTE) | Precision (PCA) | ROC AUC (Label Enc) | ROC AUC (1-Hot Enc) | ROC AUC (SMOTE) | ROC AUC (PCA) |
|---|---|---|---|---|---|---|---|---|
| RF | 80.08 | 81.85 | 81.42 | 79.74 | 66.19 | 69.67 | 82.55 | 81.56 |
| DT | 80.30 | 82.71 | 79.31 | 76.08 | 63.14 | 68.49 | 77.20 | 74.78 |
| MLP | 81.42 | 83.48 | 84.29 | 82.43 | 67.10 | 70.80 | 83.11 | 83.05 |
| XGB | 83.33 | 83.93 | 82.81 | 79.74 | 70.89 | 71.63 | 83.29 | 80.87 |
| KNN | 78.14 | 78.97 | 81.34 | 80.37 | 61.94 | 63.92 | 80.02 | 80.44 |
| LR | 77.01 | 82.25 | 80.91 | 82.81 | 60.17 | 69.08 | 81.15 | 83.29 |
| SVM | 76.78 | 82.19 | 77.50 | 80.93 | 59.89 | 70.27 | 80.09 | 83.44 |

IV. DISCUSSION

Machine Learning offers effective algorithms for classification. This study discussed several important studies that used Machine Learning classification algorithms for customer credit risk evaluation. Seven algorithms are applied in this study to a publicly available dataset:
- the Deep Learning algorithm Multi-Layer Perceptron
- Logistic Regression
- the ensemble algorithms Extreme Gradient Boosting (boosting) and Random Forests (bagging)
- Decision Trees
- K Nearest Neighbors
- Support Vector Machines

Support Vector Machines achieved the highest accuracy, while Extreme Gradient Boosting performed best on the dataset balanced with SMOTE. The results are summarized in Tables 1, 2 and 3.

V. CONCLUSION

Credit risk evaluation is important for financial institutions, and the models used for credit scoring must be accurate in their predictions. This study has applied Machine Learning as well as Deep Learning algorithms to classify customer credit risk. The results showed that Support Vector Machines performed best for classification. The open-source Extreme Gradient Boosting algorithm also performed well on a balanced dataset. In the future, we intend to tune the hyperparameters of these algorithms to improve the performance metrics.

REFERENCES

[1] Azaria Natasha, Dedy Dwi Prastyo, and Suhartono, "Credit scoring to classify consumer loan using machine learning," AIP Conference Proceedings, Dec. 2019, doi: https://doi.org/10.1063/1.5139802.

[2] S. K. Trivedi, "A study on credit scoring modeling with different feature selection and machine learning approaches," Technology in Society, vol. 63, p. 101413, Nov. 2020, doi: https://doi.org/10.1016/j.techsoc.2020.101413.

[3] L. Marceau, L. Qiu, N. Vandewiele, and E. Charton, "A comparison of Deep Learning performances with other machine learning algorithms on credit scoring unbalanced data," Feb. 2020, doi: https://doi.org/10.48550/arXiv.1907.12363.
[4] B. Dushimimana, Y. Wambui, T. Lubega, and P. E. McSharry, "Use of Machine Learning Techniques to Create a Credit Score Model for Airtime Loans," Journal of Risk and Financial Management, vol. 13, no. 8, p. 180, Aug. 2020, doi: https://doi.org/10.3390/jrfm13080180.

[5] P. Yao, "Credit Scoring Using Ensemble Machine Learning," Jan. 2009, doi: https://doi.org/10.1109/his.2009.264.

[6] I Nyoman Mahayasa Adiputra and Paweena Wanchai, "Customer Churn Prediction Using Weight Average Ensemble Machine Learning Model," Jun. 2023, doi: https://doi.org/10.1109/jcsse58229.2023.10202105.

[7] "Customer behavior-based fraud detection of credit card using a random forest algorithm | IEEE Conference Publication | IEEE Xplore," ieeexplore.ieee.org. https://ieeexplore.ieee.org/document/10169484 (accessed Dec. 10, 2023).

[8] B. Zhu, W. Yang, H. Wang, and Y. Yuan, "A hybrid deep learning model for consumer credit scoring," 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD), May 2018, doi: https://doi.org/10.1109/icaibd.2018.8396195.

[9] E. Bozkurt, "Machine Learning Classification Algorithms with Codes," Analytics Vidhya, Mar. 15, 2021. https://medium.com/analytics-vidhya/machine-learning-classification-algorithms-with-codes-5a8af4491fcb

[10] "OpenML," www.openml.org. https://www.openml.org/search?type=data&sort=runs&status=active&id=31

[11] "Understanding Logistic Regression," GeeksforGeeks, May 09, 2017. https://www.geeksforgeeks.org/understanding-logistic-regression/

[12] C. Bento, "Multilayer Perceptron Explained with a Real-Life Example and Python Code: Sentiment Analysis," Medium, Sep. 30, 2021. https://towardsdatascience.com/multilayer-perceptron-explained-with-a-real-life-example-and-python-code-sentiment-analysis-cb408ee93141

[13] Nvidia, "What is XGBoost?," NVIDIA Data Science Glossary. https://www.nvidia.com/en-us/glossary/data-science/xgboost/

[14] O. Mbaabu, "Introduction to Random Forest in Machine Learning," Section, Dec. 11, 2020. https://www.section.io/engineering-education/introduction-to-random-forest-in-machine-learning/

[15] scikit-learn, "1.10. Decision Trees — scikit-learn 0.22 documentation," Scikit-learn.org, 2009. https://scikit-learn.org/stable/modules/tree.html

[16] IBM, "What is the k-nearest neighbors algorithm? | IBM," www.ibm.com, 2023. https://www.ibm.com/topics/knn#:~:text=The%20k%2Dnearest%20neighbors%20algorithm%2C%20also%20known%20as%20KNN%20or

[17] A. Sasidharan, "Support Vector Machine Algorithm," GeeksforGeeks, Jan. 20, 2021. https://www.geeksforgeeks.org/support-vector-machine-algorithm/