Singh 2017
Pradeep Singh
Department of Computer Science and Engineering
National Institute of Technology Raipur
G. E. Road, Raipur – 492010, Chhattisgarh, India
[email protected]
Abstract— Credit scoring is the primary method for classifying loan applicants into two classes, namely credible payers and defaulters. In general, the credit score is the primary indicator of a person's creditworthiness. Banks and other money lenders use this technique to build a probabilistic predictive model, called a scorecard, for estimating the probability of default. In the current global scenario, credit scoring is a major tool for risk evaluation and risk management in all existing and emerging economies. With the introduction of the Basel II Accord, credit scoring has gained much significance in the retail credit industry. In this paper, we perform an extensive comparative study of classifiers for credit scoring in order to identify the best classifier. We use two categories of classifiers, individual and ensemble. Identifying optimal machine-learning methods for credit scoring applications is a crucial step towards a stable assessment of a person's creditworthiness. Accuracy, AUC, F-measure, precision and recall are used to evaluate the results.

Keywords—Taiwan credit card; machine learning; credit scoring

I. INTRODUCTION
The aim of credit scoring is to identify the lending capability of individual consumers. The market for goods acquired on credit has grown enormously in recent years; reliable credit scoring models have therefore received extraordinary attention from commercial institutions, because they lower both operating risk and cost. Even a small improvement in identifying correct credit worth would significantly increase the gains of financial institutions. This paper addresses the problem of credit scoring using the Taiwan credit data. Scoring refers to the evaluation of the creditworthiness or capability of new applicants. It quantifies the risks associated with credit requests with respect to applicant characteristics such as income, age and occupation. In recent years, it has been established that machine learning techniques are effective methods for credit scoring [1], [2]. In most studies, only a single classification model is built for credit scoring. It is therefore beneficial to compare the performance of various individual and ensemble classification models for credit scoring on the same criteria. The aim of this study is to compare the performance of a wide range of classification techniques in the credit scoring context; we have therefore selected individual classifiers and homogeneous ensembles to assess the relative effectiveness of these techniques on the credit scoring datasets.

II. RELATED WORK
Over the past decade, various classification techniques have been developed for credit scoring; in this regard, we studied the latest developments in credit scoring classification. In 2015, Lessmann et al. [3] introduced a benchmarking study with 41 classifiers on eight credit scoring datasets and various ensemble selection techniques. The accuracy of the scorecards was measured on six indicators. The financial effects of the various scorecards were also analyzed, but the Taiwan credit data was not included.

In 2015, Ala’raj and Abbod [4] applied a classifier consensus method to combine multiple classifier systems (MCS). The five well-known base classifiers used are support vector machines, neural networks, random forests, Naïve Bayes and decision trees. The proposed method was compared with two benchmark classifiers, namely Multivariate Adaptive Regression Splines (MARS) and Logistic Regression (LR). The accuracy results, along with the other performance metrics, demonstrate the improved predictive performance of the proposed model.

A systematic literature survey of the theory and applications of binary classification methods is presented in Ref. [5]. The results reveal the importance and use of these methods for creating ratings.

An ensemble classification technique [6] has been proposed that is based on a supervised clustering mechanism with a weighted voting approach. The proposed method improves the accuracy of credit rating.

A comparative study of base classifiers in ensemble methods was performed in [7]. An experimental study using various base classifiers for different ensemble methods was carried out to improve the credit scoring task.

The authors of [8] have proposed a boosted decision tree approach, an ensemble approach in which the hyper-parameters of XGBoost are tuned with Bayesian optimization.
The proposed approach outperforms the base models on several performance measures.

The survey above shows that current research on classification algorithms and their application to credit scoring uses a variety of accuracy measures. Unfortunately, little attention has been given to an extensive comparison of various algorithms in order to identify the best learners for this problem.

IV. DATASET CHARACTERISTICS
Our empirical study uses the Taiwan retail credit scoring data set. The Taiwan credit (TC) data set is obtained from the UCI Machine Learning Repository. A summary of the data set is provided in Table 1; it is a binary-class data set and is slightly imbalanced towards the "good" class, i.e. the customers who do not default. Our study classifies the original data set.
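Because the data are imbalanced towards the "good" class, accuracy alone can look favourable for a classifier that never flags a defaulter, which is one reason the evaluation also reports precision, recall, F-measure and AUC. The following is a minimal sketch of these measures in plain Python, not the implementation used in the study. It assumes that the defaulter class is encoded as the positive label 1; the function names `confusion_counts`, `classification_measures` and `auc` are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_measures(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


def auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count one half).
    Quadratic in the number of samples; fine for illustration only."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (hypothetical, not from the Taiwan data): 8 "good", 2 defaulters.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_good = [0] * 10
print(classification_measures(y_true, always_good))
```

On the toy labels, the classifier that always predicts "good" reaches 0.8 accuracy while its recall on defaulters is 0, which illustrates why the additional measures matter on imbalanced credit data.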
[Figure: flowchart of the evaluation procedure. Start; original data set; 10-fold validation split into training data and testing data; evaluate the classifier; repeat 10 times; average the results; End.]
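The procedure in the flowchart can be read as 10-fold cross-validation whose results are averaged over 10 repetitions. The sketch below illustrates that reading in plain Python; it is an assumption-laden illustration, not the authors' setup. The helper names (`k_fold_indices`, `repeated_cross_validation`, `fit_majority`, `predict_majority`) are hypothetical, the folds are not stratified, accuracy is the only measure computed, and a trivial majority-class classifier stands in for the classifiers compared in the paper.

```python
import random
from statistics import mean


def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    if n < k:
        raise ValueError("need at least k samples")
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_cross_validation(X, y, fit, predict, k=10, repeats=10, seed=0):
    """Average accuracy over `repeats` runs of k-fold cross-validation.

    `fit(X_train, y_train)` returns a model and
    `predict(model, X_test)` returns a list of predicted labels.
    """
    rng = random.Random(seed)
    run_scores = []
    for _ in range(repeats):
        fold_scores = []
        for test_idx in k_fold_indices(len(X), k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            preds = predict(model, [X[i] for i in test_idx])
            correct = sum(p == y[i] for p, i in zip(preds, test_idx))
            fold_scores.append(correct / len(test_idx))
        run_scores.append(mean(fold_scores))  # one score per 10-fold run
    return mean(run_scores)                   # average over all runs


def fit_majority(X_train, y_train):
    """Stand-in classifier: remember the most frequent training label."""
    return max(set(y_train), key=y_train.count)


def predict_majority(model, X_test):
    """Predict the remembered label for every test sample."""
    return [model] * len(X_test)


# Toy data (hypothetical): 80 "good" customers and 20 defaulters.
X = [[i] for i in range(100)]
y = [0] * 80 + [1] * 20
print(repeated_cross_validation(X, y, fit_majority, predict_majority))
```

Setting `repeats=1` gives a single 10-fold run. Any classifier can be plugged in through the `fit` and `predict` arguments, and a stratified split may be preferable on imbalanced data.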
TABLE IV. RESULTS OF THE EXPERIMENT WITH INDIVIDUAL AND ENSEMBLE CLASSIFIERS FOR TAIWAN CREDIT SCORING
We believe in the importance of expert knowledge of credit scoring; the fusion of such expert knowledge into the machine learning domain can be a promising topic for future research.

References
[1] T. Bellotti, J. Crook, “Support vector machines for credit scoring and discovery of significant features,” Expert Syst. Appl., vol. 36, pp. 3302–3308, 2009. doi:10.1016/j.eswa.2008.01.005.
[2] I-Fei, C.I.-F.C., “Evaluate the performance of cardholders’ repayment behaviors using artificial neural networks and data envelopment analysis,” Networked Comput. Adv. Inf. Manag. (NCM), 2010 Sixth Int. Conf., pp. 478–483, 2010.
[3] S. Lessmann, B. Baesens, H. V. Seow, L. C. Thomas, “Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research,” Eur. J. Oper. Res., vol. 247, pp. 124–136, 2015. doi:10.1016/j.ejor.2015.05.030.
[4] M. Ala’raj and M. F. Abbod, “Classifiers consensus system approach for credit scoring,” Knowledge-Based Syst., vol. 104, pp. 89–105, 2015.
[5] F. Louzada, A. Ara, G. B. Fernandes, “Classification methods applied to credit scoring: Systematic review and overall comparison,” Surv. Oper. Res. Manag. Sci., 2016. doi:10.1016/j.sorms.2016.10.001.
[6] H. Xiao, Z. Xiao, Y. Wang, “Ensemble classification based on supervised clustering for credit scoring,” Appl. Soft Comput. J., vol. 43, pp. 73–86, 2016. doi:10.1016/j.asoc.2016.02.022.
[7] J. Abellán, J. G. Castellano, “A comparative study on base classifiers in ensemble methods for credit scoring,” Expert Syst. Appl., vol. 73, pp. 1–10, 2017. doi:10.1016/j.eswa.2016.12.020.
[8] Xia, C. Liu, Y. Li, N. Liu, “A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring,” Expert Syst. Appl., vol. 78, pp. 225–241, 2017. doi:10.1016/j.eswa.2017.02.017.
[9] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I. H. Witten, “The WEKA data mining software,” SIGKDD Explor. Newsl., vol. 11, no. 1, p. 10, 2009. Available from: http://portal.acm.org/citation.cfm?doid=1656274.1656278
[10] L. Rokach, “Ensemble-based classifiers,” Artif. Intell. Rev., vol. 33, no. 1–2, pp. 1–39, 2010.
[11] L. F. Carvalho, G. Fernandes, M. V. O. De Assis, J. J. P. C. Rodrigues, A. M. Lemes Proen, “Digital signature of network segment for healthcare environments support,” Irbm, vol. 35, pp. 299–309, 2014. doi:10.1016/j.patrec.2005.10.010.
[12] B. Baesens, T. Gestel, S. Viaene, M. Stepanova, J. Suykens, J. Vanthienen, “Benchmarking State-of-the-Art Classification Algorithms for Credit Scoring,” J. Oper. Res. Soc., vol. 54, pp. 627–635, 2003. doi:10.1057/palgrave.jors.2601545.