
Proceedings of the International Conference on Inventive Computing and Informatics (ICICI 2017)

IEEE Xplore Compliant - Part Number: CFP17L34-ART, ISBN: 978-1-5386-4031-9

Comparative Study of Individual and Ensemble


Methods of Classification for Credit Scoring

Pradeep Singh
Department of Computer Science and Engineering
National Institute of Technology Raipur
G. E. Road, Raipur – 492010, Chhattisgarh, India
[email protected]

Abstract— Credit scoring is the primary method for classifying loan applicants into two classes, namely credible payers and defaulters. In general, the credit score is the primary indicator of a person's creditworthiness. Banks and other money lenders use credit scoring to build a probabilistic predictive model, called a scorecard, for estimating the probability of default. In the current global scenario, credit scoring is a major tool for risk evaluation and risk management in all existing and emerging economies. With the introduction of the Basel II Accord, credit scoring has gained much significance in the retail credit industry. In this paper, we performed an extensive comparative study of credit scoring classifiers in order to identify the best one. We used two different categories of classifiers, i.e. individual and ensemble. Identifying optimal machine-learning methods for credit scoring applications is a crucial step towards reliable assessment of a person's creditworthiness. Accuracy, AUC, F-measure, precision and recall are used for the evaluation of the results.

Keywords—Taiwan credit card; machine learning; credit scoring

I. INTRODUCTION
The aim of credit scoring is to identify the lending capability of individual consumers. With the acquisition of goods on credit, the market has grown enormously in recent years; therefore, reliable credit scoring models have received extraordinary attention from commercial institutions, because they lower both operating risk and cost. Even a small improvement in identifying correct creditworthiness would significantly increase the gain of financial institutions. This paper addresses the problem of credit scoring using Taiwan credit data. Scoring refers to the evaluation of the creditworthiness or repayment capability of new applicants. It quantifies the risks associated with credit requests with respect to applicant characteristics such as income, age and occupation. In recent years, machine learning techniques have been established as distinguished and effective methods for credit scoring [1], [2]. In most studies, only a single classification model is built for credit scoring. Therefore, it is beneficial to compare the performance measures of various individual and ensemble classification models for credit scoring on similar criteria. The aim of this study is to compare the performance of a wide range of classification techniques in the credit scoring context; therefore we have selected individual classifiers and homogeneous ensembles to assess the relative effectiveness of these techniques on the credit scoring datasets.

II. RELATED WORK
Over the past decade, various classification techniques have been developed for credit scoring; in this regard, we studied the latest developments in credit scoring classification. In 2015, Lessmann et al. [3] introduced a benchmarking study with 41 classifiers on eight credit scoring datasets and various ensemble selection techniques. Scorecard accuracy was measured on six indicators. An analysis of the financial effects of the various scorecards was also performed, but they did not analyze the Taiwan credit data.

In 2015, Ala'raj and Abbod [4] applied a classifier consensus method in order to combine multiple classifier systems (MCS). The five well-known base classifiers used are support vector machines, neural networks, random forests, Naïve Bayes and decision trees. The proposed method was compared with two benchmark classifiers, namely Multivariate Adaptive Regression Splines (MARS) and Logistic Regression (LR). The accuracy results, along with the other performance metrics, demonstrate the improvement in the proposed model's predictive performance.

An organized literature survey concerning theories and applications of binary classification methods is discussed in Ref. [5]. The results reveal the importance and use of these methods in creating ratings.

An ensemble classification technique based on a supervised clustering mechanism with a weighted voting approach has been proposed in [6]. The proposed method improves the accuracy of credit rating.

A comparative study was performed on base classifiers with the help of ensemble methods [7]. An experimental study using various base classifiers for different ensemble methods to improve the credit scoring task has been carried out.

The authors of [8] have proposed a boosted decision tree approach using an ensemble based on the hyper-parameters of XGBoost tuned with Bayesian optimization.

978-1-5386-4031-9/17/$31.00 ©2017 IEEE 968



The proposed approach outperforms the base models on several performance measures.

The survey above shows that current research on classification algorithms and their application to the problem of credit scoring uses various accuracy parameters. Unfortunately, less attention has been given to an extensive comparison of the various algorithms in order to identify the best learners for this problem.

III. EXPERIMENTAL SETUP AND DATA SETS
Credit scoring is a group of decision models and their underlying techniques which support lenders when providing credit to customers. To complete the extensive study, we have experimented with individual classifiers and ensembles. In the next section we describe the methodology used for credit card scoring.

A. Implementation
The implementation of the various machine learning algorithms is categorized into two categories, i.e. individual and ensemble. Figure 1 illustrates the implementation of individual and ensemble based classifiers. 10-fold cross validation is used in all the classification experiments: the original dataset is divided into 10 independent folds, 9 of the 10 folds are combined and used as the training set, and the remaining fold is used as the testing set. Thereafter, every classifier is applied separately to these data sets. The experiments were performed using the default parameters of the WEKA machine learning API [9].

B. Individual Classifiers
In individual classification, all the different categories of classifiers are used for evaluation. The categories are mainly Bayesian learners, MLP, simple logistic, lazy learners, rule-based learners and decision tree based learners.

C. Ensemble Classifiers
Ensemble methods use multiple learning algorithms to obtain better predictive performance than the constituent learning algorithms. Ensembles perform better when there is significant diversity among the models. Ensembles are a divide-and-conquer approach used to improve performance. Studying techniques to build good ensemble classifiers is among the most active fields of research in supervised learning. The principal finding is that ensemble methods are generally more accurate than the individual classifiers of which they are composed. The principle of ensemble classifiers is that a combination of "weak learners" can be merged to collectively form a "strong learner": individually, each classifier is a "weak learner," while the combined classifiers form a "strong learner" [10].

The various ensembles used for creating the models are bagging with fast decision tree base classifiers, boosting, tree based ensembles, random forest, etc.

IV. DATASET CHARACTERISTICS
Our empirical study includes the Taiwan retail credit scoring dataset. The Taiwan credit (TC) data set is obtained from the UCI Machine Learning Repository. A summary of the data set is provided in Table 1; the data set is binary class and slightly imbalanced towards the "good" class, i.e. the customers who are not defaulting. Our study performed classification on the original data set.

V. PERFORMANCE INDICATORS
A variety of indicators to measure predictive accuracy are available. We consider the following accuracy indicators in our study: the percentage correctly classified (ACC), the area under the receiver-operating-characteristic (ROC) curve [11], the TP rate, FP rate, precision, recall and F-measure. We chose these indicators because they assess the predictive performance of a scorecard from different angles. Table 2 shows the configuration of a confusion matrix and the formulations of the accuracy parameters used to assess the classifiers in our study.

Precision = tp / (tp + fp)   (1)

Precision is given by equation (1), where tp = true positives and fp = false positives. Precision focuses on the agreement of the data labels with the positive labels given by the classifier.

Recall = tp / (tp + fn)   (2)

In equation (2), fn = false negatives. Recall evaluates the effectiveness of a classifier in identifying positive labels.

F-measure = 2 × (precision × recall) / (precision + recall)   (3)

The F-measure (F1 score) can be interpreted as the harmonic mean of precision and recall.

Accuracy = (tp + tn) / (tp + tn + fp + fn)   (4)

Accuracy shows the overall effectiveness of a classifier.

AUC measures the classifier's ability to avoid false classification. Table 3 lists all the machine learning algorithms used in this study and their abbreviations.

TABLE I. SUMMARY OF CREDIT SCORING DATA SET

Name | Independent Variables | Instances | Good/Bad (%) | k-fold cross validation
TC   | 24                    | 30000     | 78/22        | 10
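As an illustration of equations (1)–(4), all four indicators can be computed directly from the confusion-matrix counts. The sketch below is ours, not part of the original experiments, and the counts passed in the usage line are hypothetical:

```python
def scorecard_metrics(tp, fp, fn, tn):
    """Compute the Section V accuracy indicators from confusion-matrix counts."""
    precision = tp / (tp + fp)                                  # eq. (1)
    recall = tp / (tp + fn)                                     # eq. (2)
    f_measure = 2 * precision * recall / (precision + recall)   # eq. (3)
    accuracy = (tp + tn) / (tp + tn + fp + fn)                  # eq. (4)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}

# Hypothetical counts for a 30000-instance data set with a 22% "bad" class
m = scorecard_metrics(tp=3500, fp=2000, fn=3100, tn=21400)
```

Note that with an imbalanced class distribution such as TC's 78/22 split, accuracy alone is misleading (always predicting "good" already scores 78%), which is why precision, recall, F-measure and AUC are reported alongside it.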


TABLE II. CONFUSION MATRIX

                    | Actual positive     | Actual negative
Predicted positive  | TP (True positive)  | FP (False positive)
Predicted negative  | FN (False negative) | TN (True negative)

In this experiment, ten-fold cross validation is used for the credit scoring prediction analysis. Tenfold cross validation overcomes sampling bias and yields stable and reliable results. In k-fold validation, k partitions are made; one is used for testing and the others are used for training. Denoting the testing partitions T1, T2, …, Tk and the corresponding training partitions P1, P2, …, Pk, k is typically 10 or 30; k = 10 has been used for all our experiments.

TABLE III. LEARNERS ABBREVIATIONS USED IN THE PAPER

Abbreviation | Classifier
ADB          | AdaBoost M1
ADTREE       | Alternating decision tree
ATSELECT     | Attribute selected classifier
BAGGING      | Bagging
BAYSNET      | Bayes network learning
CBAL         | Class balanced
CVC          | Classification via clustering
CVR          | Classification via regression
CONRULE      | Conjunctive rule
IB1          | Nearest-neighbour classifier
J48          | C4.5 decision tree
JRIP         | Repeated Incremental Pruning to Produce Error Reduction
LOGI         | Multinomial logistic regression model
MLP          | Multilayer perceptron
MBOOSTAB     | MultiBoostAB
NAVEBAYES    | Naïve Bayes classifier
NB TREE      | Decision tree with naive Bayes
ONER         | 1R classifier
RACEI        | Race incremental
RC           | Random committee
RFOREST      | Random forest
SPEGASOS     | Stochastic variant of Pegasos
TS           | Threshold selector
VFI          | Voting feature intervals
VPTER        | Voted perceptron

VI. RESULTS AND DISCUSSIONS
We have performed extensive experiments in order to identify the best algorithms for Taiwan credit (TC) scoring. We perform our study on the Taiwan credit card data set, which has 30000 instances and twenty-four attributes. The results are presented in Table 4. From the table it can be seen that J48 obtained 80.73% accuracy and 0.669 AUC, whereas Naïve Bayes achieved 65.57% accuracy and 0.745 AUC. Classification via clustering is the worst performer among all twenty-five classifiers; it achieved only 0.49 AUC. The Multilayer Perceptron (MLP) outperformed all the other models. Random forest performed second best, with an AUC of 0.768. The top four learners are the Multilayer perceptron, the multinomial logistic regression model, random forest and the bagging model. It is well known that the higher the AUC value, the better the classifier. Table 4 compares the benchmarking performance of individual and ensemble learners on the basis of the mentioned performance indicators. Column nine shows the comparison of the ROC area under the curve for all classifiers. The observations validate the findings of [12], according to which the four best-performing classifiers have been highlighted. Random forest and bagging were the two classifiers from the ensemble category which performed exceptionally well. The multinomial logistic regression model and the Multilayer Perceptron (MLP) performed well in the category of individual learners.

VII. CONCLUSIONS
This paper examines twenty-five major classification techniques, including individual and ensemble learners, and compares their predictive accuracy, in particular AUC, among them. Extensive experiments on Taiwan credit scoring are presented, for the first time, to estimate the various parameters. Among the twenty-five learning techniques, the results show little difference in the classification accuracy of the Multilayer perceptron, logistic regression, random forest and the bagging model. Artificial neural networks perform classification more accurately than the other methods. In terms of area under the curve, MLP (AUC: 0.792) showed the best performance. Based on predictive accuracy, random forest and bagging are the best among the ensemble learners. Studying the impact of various pre-processing techniques could be an important extension of this work.
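The k-fold partitioning scheme described above, where each fold Ti serves once as the test set and the remaining folds form the training set Pi, can be sketched in a few lines. This is our illustrative code, not the WEKA implementation actually used in the experiments:

```python
def k_fold_partitions(n_instances, k=10):
    """Yield (train_indices, test_indices) pairs: fold i is the test set T_i,
    and the remaining k-1 folds concatenated form the training set P_i."""
    indices = list(range(n_instances))
    folds = [indices[i::k] for i in range(k)]  # k near-equal disjoint folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Ten splits over the 30000 TC instances; every instance tested exactly once
splits = list(k_fold_partitions(30000, k=10))
```

Each classifier is then trained on Pi and evaluated on Ti, and the ten resulting scores are averaged, which is the procedure Figure 1 depicts.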


[Figure 1 is a flowchart: the original data set is split by 10-fold cross validation into training and testing data; the ensemble classifier algorithms and the individual classification algorithms are trained on the training data; each classifier is evaluated on the testing data; the process is repeated 10 times and the results are averaged.]

Figure 1: Proposed model for credit score classification
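The "weak learners merge into a strong learner" principle behind the ensemble branch of Figure 1 can be illustrated with a plain majority vote over base-classifier predictions. This is a minimal sketch of the idea only; the three base-classifier outputs below are hypothetical, not taken from the experiments:

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label predictions (one list per base classifier)
    into a single ensemble prediction by per-instance majority vote."""
    n = len(predictions_per_model[0])
    combined = []
    for i in range(n):
        votes = Counter(model[i] for model in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Three hypothetical base classifiers labelling five applicants (1 = defaulter)
base = [[1, 0, 1, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 1, 0, 1]]
ensemble = majority_vote(base)  # -> [1, 0, 1, 0, 0]
```

Each base classifier errs on different instances, so the vote can be correct even where any single model is wrong, which is why diversity among the models matters.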

978-1-5386-4031-9/17/$31.00 ©2017 IEEE 971


Proceedings of the International Conference on Inventive Computing and Informatics (ICICI 2017)
IEEE Xplore Compliant - Part Number: CFP17L34-ART, ISBN: 978-1-5386-4031-9

TABLE IV. RESULTS OF THE EXPERIMENT: INDIVIDUAL AND ENSEMBLE CLASSIFIERS FOR TAIWAN CREDIT SCORING

Classifier | Classified instances | ACC   | TP rate | FP rate | Precision | Recall | F-measure | ROC
ADB        | 24385                | 81.28 | 0.278   | 0.035   | 0.691     | 0.278  | 0.396     | 0.730
ADTREE     | 24554                | 81.84 | 0.357   | 0.051   | 0.668     | 0.357  | 0.465     | 0.765
ATSELECT   | 24576                | 81.92 | 0.325   | 0.040   | 0.695     | 0.325  | 0.443     | 0.707
BAGGING    | 24480                | 81.60 | 0.371   | 0.058   | 0.646     | 0.371  | 0.472     | 0.764
BAYSNET    | 23570                | 78.56 | 0.509   | 0.136   | 0.516     | 0.509  | 0.512     | 0.756
CBAL       | 24221                | 80.70 | 0.349   | 0.063   | 0.613     | 0.349  | 0.445     | 0.669
CVC        | 16921                | 56.40 | 0.360   | 0.378   | 0.213     | 0.360  | 0.268     | 0.491
CVR        | 24621                | 82.07 | 0.354   | 0.047   | 0.682     | 0.354  | 0.467     | 0.771
CONRULE    | 24402                | 81.34 | 0.273   | 0.033   | 0.701     | 0.273  | 0.393     | 0.618
IB1        | 21936                | 73.12 | 0.393   | 0.173   | 0.393     | 0.393  | 0.393     | 0.610
J48        | 24221                | 80.73 | 0.349   | 0.063   | 0.613     | 0.349  | 0.445     | 0.669
JRIP       | 24528                | 81.76 | 0.373   | 0.056   | 0.653     | 0.373  | 0.475     | 0.661
LOGI       | 24599                | 81.99 | 0.380   | 0.055   | 0.662     | 0.380  | 0.483     | 0.767
MLP        | 24666                | 82.22 | 0.538   | 0.097   | 0.612     | 0.538  | 0.572     | 0.792
MBOOSTAB   | 24385                | 81.28 | 0.278   | 0.035   | 0.691     | 0.278  | 0.396     | 0.716
NAVEBAYES  | 19671                | 65.57 | 0.715   | 0.361   | 0.360     | 0.715  | 0.479     | 0.745
NB TREE    | 23996                | 79.98 | 0.451   | 0.101   | 0.559     | 0.451  | 0.499     | 0.747
ONER       | 24576                | 81.92 | 0.325   | 0.040   | 0.695     | 0.325  | 0.443     | 0.642
RACEI      | 24309                | 81.03 | 0.320   | 0.050   | 0.643     | 0.320  | 0.427     | 0.742
RC         | 24228                | 80.76 | 0.347   | 0.061   | 0.616     | 0.347  | 0.444     | 0.732
RFOREST    | 24539                | 81.79 | 0.371   | 0.055   | 0.657     | 0.371  | 0.474     | 0.768
SPEGASOS   | 24581                | 81.93 | 0.326   | 0.041   | 0.695     | 0.326  | 0.444     | 0.643
TS         | 23491                | 78.30 | 0.558   | 0.153   | 0.509     | 0.558  | 0.532     | 0.766
VFI        | 23878                | 79.59 | 0.516   | 0.124   | 0.541     | 0.516  | 0.528     | 0.748
VPTER      | 23346                | 77.82 | 0.030   | 0.009   | 0.478     | 0.030  | 0.056     | 0.511
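The ROC column of Table 4 reports the AUC, which for a scored sample equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one (the Mann–Whitney rank statistic). A stdlib sketch of that computation, with hypothetical scores and labels rather than data from the experiments:

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    score ties count as half a correct pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical default-probability scores and true labels (1 = defaulter)
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,   0]
auc = auc_from_scores(scores, labels)  # 8 of the 9 pairs ranked correctly -> 8/9
```

Because the pairwise comparison ignores any fixed classification threshold, AUC is insensitive to the 78/22 class imbalance in a way plain accuracy is not, which is why the paper uses it to rank the classifiers.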

We believe in the importance of expert knowledge in credit scoring, and the fusion of such expert knowledge into the machine learning domain can be a promising topic for future research.

References

[1] T. Bellotti, J. Crook, "Support vector machines for credit scoring and discovery of significant features," Expert Syst. Appl., vol. 36, pp. 3302–3308, 2009. doi:10.1016/j.eswa.2008.01.005.
[2] I.-F. Chen, "Evaluate the performance of cardholders' repayment behaviors using artificial neural networks and data envelopment analysis," in Proc. 6th Int. Conf. Networked Computing and Advanced Information Management (NCM), pp. 478–483, 2010.
[3] S. Lessmann, B. Baesens, H. V. Seow, L. C. Thomas, "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research," Eur. J. Oper. Res., vol. 247, pp. 124–136, 2015. doi:10.1016/j.ejor.2015.05.030.
[4] M. Ala'raj and M. F. Abbod, "Classifiers consensus system approach for credit scoring," Knowledge-Based Syst., vol. 104, pp. 89–105, 2015.
[5] F. Louzada, A. Ara, G. B. Fernandes, "Classification methods applied to credit scoring: Systematic review and overall comparison," Surv. Oper. Res. Manag. Sci., 2016. doi:10.1016/j.sorms.2016.10.001.
[6] H. Xiao, Z. Xiao, Y. Wang, "Ensemble classification based on supervised clustering for credit scoring," Appl. Soft Comput., vol. 43, pp. 73–86, 2016. doi:10.1016/j.asoc.2016.02.022.
[7] J. Abellán, J. G. Castellano, "A comparative study on base classifiers in ensemble methods for credit scoring," Expert Syst. Appl., vol. 73, pp. 1–10, 2017. doi:10.1016/j.eswa.2016.12.020.
[8] Xia, C. Liu, Y. Li, N. Liu, "A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring," Expert Syst. Appl., vol. 78, pp. 225–241, 2017. doi:10.1016/j.eswa.2017.02.017.
[9] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I. H. Witten, "The WEKA data mining software," SIGKDD Explor. Newsl., vol. 11, no. 1, p. 10, 2009. Available: http://portal.acm.org/citation.cfm?doid=1656274.1656278
[10] L. Rokach, "Ensemble-based classifiers," Artif. Intell. Rev., vol. 33, no. 1–2, pp. 1–39, 2010.
[11] L. F. Carvalho, G. Fernandes, M. V. O. De Assis, J. J. P. C. Rodrigues, A. M. Lemes Proença, "Digital signature of network segment for healthcare environments support," IRBM, vol. 35, pp. 299–309, 2014. doi:10.1016/j.patrec.2005.10.010.
[12] B. Baesens, T. Van Gestel, S. Viaene, M. Stepanova, J. Suykens, J. Vanthienen, "Benchmarking state-of-the-art classification algorithms for credit scoring," J. Oper. Res. Soc., vol. 54, pp. 627–635, 2003. doi:10.1057/palgrave.jors.2601545.

