Customer Credit Risk Application and Evaluation of Machine Learning and Deep Learning Models

Abstract – Financial institutions rely on credit risk evaluation or credit scoring models for decision-making, as these models analyse the creditworthiness of customers. Machine learning models can predict with high accuracy, and they are widely utilized in this sector. This study selected a dataset from openml.org, and after data pre-processing, visualization, and exploratory data analysis, seven algorithms were applied. The models were evaluated using various performance metrics. Support Vector Machines (SVM) achieved the highest accuracy (80.67%) with a good recall of 93.55%. Also, when dimensions were reduced to 10 (from 20) through Principal Component Analysis, SVM demonstrated the highest accuracy (83.57%), an F1 score of 84.70%, a recall of 88.84%, and an ROC AUC score of 83.44%. By generating synthetic records using SMOTE, the open-source algorithm Extreme Gradient Boosting achieved the highest accuracy score of 83.3% and an ROC AUC of 83.29%. Future work may involve tuning the hyperparameters of these algorithms to improve other performance metrics.

Keywords – Customer classification, ML algorithms, credit scoring

I. INTRODUCTION

Credit risk evaluation, or credit scoring, is crucial for any financial institution. It is extensively utilized by financial institutions to analyze creditworthiness [1] before extending any debt to customers. Misclassifying customers into incorrect categories can result in bad credit and financial losses. With strict regulations imposed on financial institutions, customer classification models must be accurate, as these institutions base decisions on the output of these models and are significantly impacted by misclassification. Numerous contributing factors, such as age, existing loans, credit amount, and marital status, make the models complex.

Traditional statistical models have been used for credit risk evaluation for decades. These models, also called 'credit scoring' models, have relied on methods such as Discriminant Analysis (DA) and Logistic Regression (LR), and they can classify with high accuracy. Machine learning has slowly entered this segment, and algorithms like Artificial Neural Networks (ANN), Deep Learning algorithms, and Support Vector Machines (SVM) are being widely used to classify customer credits.

There have been numerous studies that have applied Machine Learning and Deep Learning algorithms to credit risk evaluation models. Azaria et al. [1] utilized various non-parametric and parametric statistics to compare different Machine Learning classifiers used on customer loans. The Deep Neural Network achieved the highest Area Under the Curve (AUC) of 0.638.

In other studies [2], Bayesian, Random Forests (RF), SVM, and Naïve Bayes classifiers were applied to publicly available German credit scoring data, with a comparative study conducted. Different feature selection algorithms, such as Information-Gain, Gain-Ratio, and Chi-Square, were employed in conjunction with the classifiers. Random Forests with Chi-Square provided the highest accuracy, and the result of the Decision Tree (C5.0) with Chi-Square was also noteworthy.

Louis et al. [3] applied Machine Learning and Deep Learning methods to highly unbalanced 24-month credit card data. AutoML and Neural Architecture Search (NAS) techniques were utilized, with the Extreme Gradient Boosting (XGBoost) algorithm achieving the highest AUC of 0.78. Logistic Regression, Decision Trees, and Random Forest algorithms were applied to microfinance institution data in [4], with RF achieving 83% accuracy. Ping et al. [5] employed Bagging Classification and Regression Trees (CART) and AdaBoost CART on Australian Credit Card data from the UCI repository with 14 features, and AdaBoost CART achieved the highest accuracy of 85.86%.

I Nyoman et al. [6] applied an ensemble model of XGBoost and Random Forests to telecom and insurance customer data and compared the results with those of similar studies. This model achieved better F1 scores (0.85 for telecom data and
Authorized licensed use limited to: Indian Ins of Science Edu & Research. Downloaded on August 10,2024 at 10:29:04 UTC from IEEE Xplore. Restrictions apply.
0.947 for insurance data) and AUC scores (0.857 and 0.99 for telecom and insurance data, respectively) compared to other studies. In other studies [7], Random Forests, Decision Tree, Logistic Regression, and AdaBoost algorithms were applied to a Kaggle credit card dataset, with RF achieving the highest accuracy of 94.4% in identifying frauds.

Bing et al. [8] proposed a hybrid Deep Learning model that combined a Convolutional Neural Network (CNN) and the Relief algorithm. They applied this hybrid model to consumer credit data collected from a Chinese finance company and compared it with Random Forest and Logistic Regression models. Relief-CNN achieved an accuracy of 91.64% and yielded superior results compared to the other two models.

The data is checked for outliers, and as the outliers did not appear to be true outliers, they are not treated. Two new features, 'gender' and 'marital status', extracted from 'personal status', appear to be highly correlated (0.94). To avoid multicollinearity, they are removed. One-hot encoding using Pandas.get_dummies is applied to the 14 categorical features. The target feature 'class' is converted into numerical values. Standard Scaler is used to scale the data.

Data Partitioning – Training data and test data are obtained by splitting the dataset in a 70:30 ratio; the training set has 700 records and the test set has 300 records.
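The pre-processing and partitioning steps above can be sketched with pandas and scikit-learn. This is a minimal illustration only: the toy frame and its column names (`credit_amount`, `purpose`, `class`) are stand-ins for the openml.org credit data, not the study's exact schema.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Illustrative stand-in for the openml.org credit data (not the real schema).
df = pd.DataFrame({
    "credit_amount": [1169, 5951, 2096, 7882, 4870, 9055],
    "purpose": ["radio/tv", "radio/tv", "education",
                "furniture", "car", "education"],
    "class": ["good", "bad", "good", "good", "bad", "good"],
})

# One-hot encode the categorical features, as the study does with
# pd.get_dummies for its 14 categorical columns.
X = pd.get_dummies(df.drop(columns="class"), columns=["purpose"])

# Convert the target feature 'class' into numerical values.
y = df["class"].map({"bad": 0, "good": 1})

# 70:30 train/test split (700 vs. 300 records in the actual study).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Scale with StandardScaler, fitting only on the training set to
# avoid leaking test-set statistics.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training partition only, then reusing it on the test partition, keeps the evaluation honest; fitting it on the full dataset would leak information into the test set.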
algorithm is based on the predictions [14] made by the decision trees.

Decision Tree – Decision Tree is a non-parametric algorithm [15] that can handle multi-output problems and uses decision rules to predict the output.

K-Nearest Neighbors – KNN is an extremely simple algorithm to implement; it uses proximity [16] to decide the outcome.

Support Vector Machines – SVM is a powerful algorithm that generates the optimal hyperplane [17] to differentiate the target classes.

Accuracy (1) is the ratio of correct predictions to the total number of predictions made: Accuracy = (TP + TN) / (TP + TN + FP + FN), where TP = true positives, TN = true negatives, FP = false positives, and FN = false negatives.

Precision is the ability of the model to predict positive labels; a model is expected to have this score close to one.

Recall defines how well a model can classify the positive class when the outcome is positive.

F1 score is the harmonic mean of Precision and Recall.

ROC AUC score indicates how effectively the classification algorithm can differentiate between positive and negative classes; the higher the ROC AUC score, the better the performance of the model.

Experimental setup – A Dell Inspiron 7506 laptop with an Intel Core i7 processor, 16 GB of RAM, and a 512 GB SSD is used to perform the analysis. Jupyter Notebook and Python are used for this study, along with the Pandas, NumPy, Matplotlib, Scikit-learn, and Seaborn libraries.

III. RESULTS

A. Data Visualization

Bar charts, histograms, pie charts, box plots, and count plots are used to visualize the data. The heatmap generated for the dataset is shown in Fig. 2.

B. Model Evaluation

The evaluation metrics obtained for the seven Machine Learning models applied to the dataset are presented in Table 1. The best evaluation metrics for the different techniques are highlighted in blue in Tables 1, 2, and 3.

The models performed better with one-hot encoding compared to label encoding, so one-hot encoding is subsequently used on the dataset. With one-hot encoding, Support Vector Machines achieved the highest accuracy of 80.67%. This demonstrates that SVM is a powerful algorithm for distinguishing target classes in feature space, achieving a good recall of 93.55%.

Due to the data imbalance, the Synthetic Minority Oversampling Technique (SMOTE) is applied to create
synthetic records and balance the data. With SMOTE, Extreme Gradient Boosting attained the highest accuracy score of 83.3% and an ROC AUC of 83.29%.

Principal Component Analysis is conducted on the data to reduce the dimensions from 21 to 10. With the reduced dimensions, SVM achieved the highest accuracy (83.57%), an F1 score of 84.70%, a recall of 88.84%, and an ROC AUC score of 83.44%.

[Table rows recovered from the garbled extraction; the column headers and one leading row were lost:
KNN – 78.14, 78.97, 81.34, 80.37, 61.94, 63.92, 80.02, 80.44
LR – 77.01, 82.25, 80.91, 82.81, 60.17, 69.08, 81.15, 83.29
SVM – 76.78, 82.19, 77.50, 80.93, 59.89, 70.27, 80.09, 83.44]

SVM has performed well compared to the other ML models, and the evaluation metrics of SVM are shown in Fig. 3. However, the lower ROC AUC score of 70.27% shows that the algorithm must be tuned with the best hyperparameters to improve the prediction accuracy.
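The PCA-plus-SVM configuration reported above can be sketched with scikit-learn. This is a minimal, assumption-laden example: the synthetic dataset generated here merely stands in for the 1,000-record, 21-feature credit data, so any scores it prints will not match the paper's; only the pipeline shape (scale, reduce to 10 components, fit SVM, compute the four metrics) mirrors the study.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import (accuracy_score, f1_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 1,000-record, 21-feature credit dataset,
# with a mild class imbalance like the original data.
X, y = make_classification(n_samples=1000, n_features=21,
                           n_informative=8, weights=[0.3, 0.7],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Scale, reduce 21 -> 10 dimensions with PCA, then fit an SVM,
# mirroring the paper's PCA + SVM configuration.
model = make_pipeline(StandardScaler(),
                      PCA(n_components=10),
                      SVC(probability=True, random_state=42))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # needed for ROC AUC

# The same metrics the paper reports: Accuracy, F1, Recall, ROC AUC.
print(f"accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f"f1:       {f1_score(y_test, y_pred):.4f}")
print(f"recall:   {recall_score(y_test, y_pred):.4f}")
print(f"roc_auc:  {roc_auc_score(y_test, y_prob):.4f}")
```

Wrapping the scaler, PCA, and classifier in one pipeline ensures the PCA basis is fitted only on the training fold, which matters when this setup is later combined with cross-validated hyperparameter tuning as the paper suggests.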
IV. DISCUSSION
[4] B. Dushimimana, Y. Wambui, T. Lubega, and P. E. McSharry, “Use of
Machine Learning Techniques to Create a Credit Score Model for Airtime
Loans,” Journal of Risk and Financial Management, vol. 13, no. 8, p. 180,
Aug. 2020, doi: https://doi.org/10.3390/jrfm13080180.
[5] P. Yao, “Credit Scoring Using Ensemble Machine Learning,” Jan. 2009,
doi: https://doi.org/10.1109/his.2009.264.
[8] B. Zhu, W. Yang, H. Wang, and Y. Yuan, “A hybrid deep learning model
for consumer credit scoring,” 2018 International Conference on Artificial
Intelligence and Big Data (ICAIBD), May 2018, doi:
https://doi.org/10.1109/icaibd.2018.8396195.