Machine Learning Models and Bankruptcy Prediction

Article history: Received 22 September 2015; Revised 4 March 2017; Accepted 3 April 2017; Available online 10 April 2017

JEL Classification: C45; C52; C63; G33; L25

Keywords: Bankruptcy prediction; Machine learning; Support vector machines; Boosting; Bagging; Random forest

Abstract

There has been intensive research from academics and practitioners regarding models for predicting bankruptcy and default events, for credit risk management. Seminal academic research has evaluated bankruptcy using traditional statistics techniques (e.g. discriminant analysis and logistic regression) and early artificial intelligence models (e.g. artificial neural networks). In this study, we test machine learning models (support vector machines, bagging, boosting, and random forest) to predict bankruptcy one year prior to the event, and compare their performance with results from discriminant analysis, logistic regression, and neural networks. We use data from 1985 to 2013 on North American firms, integrating information from the Salomon Center database and Compustat, analysing more than 10,000 firm-year observations. The key insight of the study is a substantial improvement in prediction accuracy using machine learning techniques, especially when, in addition to the original Altman's Z-score variables, we include six complementary financial indicators. Based on Carton and Hofer (2006), we use new variables, such as the operating margin, change in return-on-equity, change in price-to-book, and growth measures related to assets, sales, and number of employees, as predictive variables. Machine learning models show, on average, approximately 10% more accuracy in relation to traditional models. Comparing the best models, with all predictive variables, the machine learning technique related to random forest led to 87% accuracy, whereas logistic regression and linear discriminant analysis led to 69% and 50% accuracy, respectively, in the testing sample. We find that bagging, boosting, and random forest models outperform the other techniques, and that all prediction accuracy in the testing sample improves when the additional variables are included. Our research adds to the discussion of the continuing debate about superiority of computational methods over statistical techniques such as in Tsai, Hsu, and Yen (2014) and Yeh, Chi, and Lin (2014). In particular, for machine learning mechanisms, we do not find SVM to lead to higher accuracy rates than other models. This result contradicts outcomes from Danenas and Garsva (2015) and Cleofas-Sánchez, García, Marqués, and Sánchez (2016), but corroborates, for instance, Wang, Ma, and Yang (2014), Liang, Lu, Tsai, and Shih (2016), and Cano et al. (2017). Our study supports the applicability of the expert systems by practitioners as in Heo and Yang (2014), Kim, Kang, and Kim (2015) and Xiao, Xiao, and Wang (2016).

© 2017 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.eswa.2017.04.006
406 F. Barboza et al. / Expert Systems With Applications 83 (2017) 405–417
Researchers and practitioners have sought to improve bankruptcy forecasting models using various quantitative approaches. For example, Ohlson (1980) was one of the first researchers to apply logistic regression analysis to default estimation. In contrast to the model of Altman (1968), which generates a score by which to classify observations between good and bad payers, Ohlson's model (Ohlson, 1980) determines the default probability of the potential borrower.

Given the relative ease of running discriminant analysis and logistic regression, several subsequent studies have sought to perform similar tests (e.g. Hillegeist, Keating, Cram, and Lundstedt (2004), Upneja and Dalbor (2001), Griffin and Lemmon (2002), and Chen, Chollete, and Ray (2010)). However, Begley, Ming, and Watts (1996) argued that the popular models based on Altman (1968) and Ohlson (1980) had become inaccurate and suggested the need for enhancements in the modelling of default risk.

Academics and practitioners are exploring artificial intelligence and machine learning tools to assess credit risk amid advances in computer technology. Since credit risk analysis is similar to pattern-recognition problems, algorithms can be used to classify the creditworthiness of counterparties (Kruppa, Schwarz, Arminger, & Ziegler, 2013; Pal, Kupka, Aneja, & Militky, 2016), thus improving upon traditional models based on simpler multivariate statistical techniques such as discriminant analysis and logistic regression. Other methods have also been developed, offering new alternatives for credit risk analysis. Among these, we highlight machine learning methods. Support vector machines (SVMs) (Cortes & Vapnik, 1995), for example, generate functions similar to discriminant analysis, but they are not subject to a series of assumptions and so are less restrictive. Other machine learning methods with wide applicability to predictive models have also been proposed, including default models such as boosting, bagging, and random forest models. Artificial neural networks (ANN) have been applied in many contexts as well. The incorporation of these machine learning algorithms seems promising. For example, Nanni and Lumini (2009) used Australian, German, and Japanese financial datasets to find that machine learning techniques, such as ensemble methods, lead to better classification than standalone methods.

Although many studies have analysed corporate solvency using modern computational techniques, Wang et al. (2014) found that the results did not identify the best method, since model performance depended on the specific characteristics of the classification problem and on the data structure (Duéñez Guzmán & Vose, 2013). Furthermore, Wang, Hao, Ma, and Jiang (2011) used ensemble methods (bagging, boosting, and stacking) coupled with base learners (logistic regression, decision trees, ANN, and SVM) to find that bagging outperformed boosting for all credit databases they analysed.

Several studies have dealt with the discussion of strengths and weaknesses of machine learning in many different disciplines, such as Subasi and Ismail Gursoy (2010) and de Menezes, Liska, Cirillo, and Vivanco (2017) in medicine; Laha, Ren, and Suganthan (2015); Maione et al. (2016) and Cano et al. (2017) in chemistry; Bernard, Chang, Popescu, and Graf (2017) in education; and Cleofas-Sánchez, García, Marqués, and Sánchez (2016); Heo and Yang (2014); Kim, Kang, and Kim (2015) and Gerlein, McGinnity, Belatreche, and Coleman (2016) in finance. However, our study does contribute to this debate.

First, our study focuses on the comparison of traditional statistical methods and machine learning techniques for predicting corporate bankruptcy. Although some papers have studied credit default and machine learning (Danenas & Garsva, 2015; du Jardin, 2016; Tsai, Hsu, & Yen, 2014; Wang et al., 2014; Zhou et al., 2014), new studies, exploring different models, contexts and datasets, are relevant, since results regarding the superiority of models are still inconclusive. The debate over the best models for predicting failure will probably continue in the short and medium terms, as new techniques are frequently being suggested and, particularly for the study of corporate bankruptcy, failure events are subject to myriad variables. In this context, for instance, with the advancement of technology, data scraping will allow the observation of new variables that could be relevant inputs to machine learning models and lead to different results.

Second, the variety of techniques and the applicability to practitioners can also be considered contributions of the study. By using raw data and considering standardized computer settings for the machine learning techniques, all our models can be easily replicated, not only by academics, but also by market practitioners. In this context, these models can be implemented in real-world situations to address, for instance, the case of investors that could better understand and analyse strategic credit decisions, and the case of lender institutions that can improve their credit risk controls, based on results of machine learning models. Finally, we analyse a large database of corporate failure in the United States, by integrating data from 1985 and 2013 from the Salomon Center and Compustat. The use of a broad database of public companies, with more than 10,000 firm-year data records in the test set, is unusual in machine learning studies of corporate credit risk and can reveal relevant information of corporate bankruptcy in the North American environment. More specifically, various papers, such as Wang et al. (2014); Yeh, Chi, and Lin (2014); Zhao et al. (2014), and Xiao, Xiao, and Wang (2016), use a smaller number of observations of specific banks or credit card companies. Although results of these studies can convey information on adequacy of machine learning models, they are usually confined to specific characteristics of some financial institutions and their clients. In this context, results of our analysis can be more general, allowing for the understanding of default, not in a specific bank, but rather in the North American market for corporate loans. We highlight that, to the best of our knowledge, we did not find, in the machine learning literature, studies of corporate bankruptcy that investigate a similar number of observations, with all the techniques employed in our study.

Our work investigated the performance of different classification techniques by considering various machine learning algorithms applied to the practical problem of default prediction. In a comparative study, we used data from a training set of defaulted and non-defaulted firms covering 1985 to 2005 and a validation set covering 2006 to 2013, thus obtaining a confusion matrix. Overall accuracy indicators and area under the receiver operating characteristic (ROC) curve (AUC) were employed as performance metrics to compare the models. To evaluate the significance of the variables used in this study, its results were compared with those produced when the same models used only the Z-score variables. All the models showed lower accuracy when the number of variables was reduced, and the models with fewer variables produced higher type I and type II error rates.

The rest of the paper proceeds as follows. In Section 2, we briefly discuss the main machine learning models. In Section 3, we present the study's method and data. We discuss the classification results of the models in Section 4. In Section 5, we present final comments, discuss the implications of the study, including the strengths and weaknesses of the paper, and offer suggestions for future research.

2. Theoretical background

Machine learning methods are considered to be among the most important of the recent advances in applied mathematics, with significant implications for classification problems (Tian, Shi, & Liu, 2012). Machine learning techniques assess patterns in observations of the same classification and identify features that differentiate the observations of different groups. Machine learning studies are found across a wide range of research fields, including medicine (Noble, 2006; Subasi & Ismail Gursoy, 2010), engineering (Oskoei & Hu, 2008), and computing (Osuna, Freund, & Girosi, 1997). In this study, machine learning mechanisms are designed to distinguish between bankrupt and non-bankrupt companies based on firm characteristics such as profitability, liquidity, leverage, size, and growth measures. We compare applications of SVM, boosting, bagging, and random forest methods with artificial neural networks, logistic regression, and discriminant analysis. This section briefly reviews each of these mechanisms, considering each one's specific goals, mathematical modelling, and learning algorithms.

The solution of the credit analysis problem – specifically, of application scoring – involves an identification of the category (e.g. good vs. bad borrower, bankrupt vs. non-bankrupt firm) to which each observation belongs. The procedure is based on the definition of potential discriminant variables and the identification of weights or coefficients that can be used in mathematical functions that could segregate the groups.

2.1. Support vector machines

Following Noble (2006), the SVM optimisation model is based on the transformation of a mathematical function by another function, called the 'kernel', by which one identifies the greatest distance between the most similar observations that are oppositely classified.

A common criterion is whether the groups are completely separable, as this would allow the SVM to build a model with 100% accuracy. In finance, doing so is virtually impossible because economic variables are influenced by noise in empirical data and are often biased. For classification problems involving partially separable groups, the SVM method allows the inclusion of a margin of error (Zhou et al., 2014).

In general, the number of variables is not a constraint on the optimisation problem (Trustorff, Konrad, & Leker, 2010). The algorithm associated with the quantitative model establishes a classification mechanism, calibrating parameters using a training set (i.e. the algorithm learns from the training data). The resulting classification scheme can then be applied to predict the grouping or classification of new observations. The validation set is usually evaluated by comparing the classification given by the model with the actual group to which the observation belongs. The validation and training sets are independent: no observations are common between them (Yu, Yue, Wang, & Lai, 2010).

From Li, Wang, and He (2013), the optimisation problem can be summarised as

    \text{Minimise } \frac{1}{2} w^{T} w + C \sum_{i=1}^{M} \xi_i,    (1)

subject to

    y_i \left[ w^{T} \phi(x_i) + b \right] \ge 1 - \xi_i,    (2)

where i = 1, 2, \ldots, M, \xi_i \ge 0 are the margins of error related to classification cost C, y_i are the classifications in the training set, and \phi(x) transforms space R^{M}. One advantage of this technique is that \phi(x) does not need to be known, since a kernel function (K(x) = K(x_i, x_j)) is applied so that K(x) = \phi(x_i)^{T} \phi(x_j).

The kernel function is predetermined in the algorithm and a solution to the optimisation problem (Eqs. (1) and (2)). The traditional kernel functions are

    K(x_i, x_j) = \langle x_i, x_j \rangle,    (3)

and

    K(x_i, x_j) = e^{-\gamma \| x_i - x_j \|^{2}},    (4)

where \gamma is a positive constant. Eq. (3) is called the linear kernel and (4) is the 'radial basis function' (RBF). The linear kernel does not provide strong predictability in non-separable datasets, due to the complexity of the empirical analysis, but the results are easily interpreted by users. Meanwhile, although the RBF kernel is difficult to analyse, or even discuss, it provides superior predictions in non-separable cases.

The SVM method is discussed in detail in Cortes and Vapnik (1995); Min and Lee (2005), and Yu et al. (2010).

2.2. Bagging

Bagging, also known as 'bootstrap aggregating', is a technique involving independent classifiers that uses portions of the data and then combines them through model averaging, providing the most efficient results concerning a collection (Breiman, 1996). Bagging creates random new subsets of data through sampling, with replacement, from a given dataset, generating confidence-interval estimates (Figini, Savona, & Vezzoli, 2016). The objective of bagging is to reduce the overfitting of a class within the model. Rather than using the collection to check if the model is overfitted, the training set is recombined to produce better classifiers.

Our bagging algorithm, based on Breiman (1996), follows these steps:

1. A random bootstrap set, t, is selected from the parent dataset.
2. Classifiers C_t are configured on the dataset from step 1.
3. Steps 1 and 2 are repeated for t = 1, \ldots, T.
4. Each classifier determines a vote,

    C(x) = T^{-1} \sum_{t=1}^{T} C_t(x),    (5)

where x is the data of each element from the training set. In the last step, the class that receives the largest number of votes is chosen as the classifier for the dataset.

2.3. Boosting

The boosting technique consists of the repeated use of a base prediction rule or function on different sets of the initial set. Boosting builds on other classification schemes and assigns a weight to each training set, which is then incorporated into the model (Begley et al., 1996). The data are then reweighted. Boosting can apply the base classifier to find a model that better classifies the set, identified by a low error rate for the training set.

A derived algorithm, AdaBoost (adaptive boost), has proved successful for classification prediction (e.g. Kim & Upneja (2014)). AdaBoost initialises the weights of all m observations at 1/m. Thus, the first sample is uniformly generated from the initial observations. After the training set, X_i, is extracted from X, a classifier Y_i is trained on X_i. The error rate is calculated, considering the number of observations of the training set. The new weight for each observation is based on the effectiveness of the classifier Y_i. If the error rate is greater than a random guess, the test set is discarded, and another set is generated with the original weights (initially 1/m). If the error rate is satisfactory, the weights of the observation are updated according to the importance of the classifier. These new weights are then used to generate another sample from the initial observations. Our algorithm follows Heo and Yang (2014):

1. A distribution of weights, w_1(i) = 1/m, is created, where i = 1, 2, \ldots, m; and w_t is the iterative weighting (t = 1, \ldots, T),

    w_{t+1}(i) = \frac{w_t(i)\, e^{\alpha_t (2 I(y_i = h_t) - 1)}}{\sum_{i=1}^{m} w_t(i)\, e^{\alpha_t (2 I(y_i = h_t) - 1)}},    (6)
The weak classifiers obtained for t = 1, \ldots, T are then combined into the final classifier of the boosting technique.

Table 1
Summary of relevant studies in the paper context. ACC means the accuracy of the model presented; Obs. Amount is the dataset size; Attrib. shows how many explanatory variables were applied.

2.4. Random forest

The random forest (RF) technique can provide better results than boosting can (Kruppa et al., 2013). It is particularly robust and allows for the presence of outliers and noise in the training set (Yeh et al., 2014). Finally, RF identifies the importance of each variable in the classification results. Therefore, it provides not only the classification of observations, but also information about the determinants of separation among groups (Maione et al., 2016). The RF technique follows an approach similar to bagging, as it repeatedly generates classification functions based on subsets. However, RF randomly selects a subset of characteristics from each node of the tree, avoiding correlation in the bootstrapped sets (Booth, Gerding, & McGroarty, 2014; Cano et al., 2017; Yeh et al., 2014). The forest is built for several subsets that generate the same number of classification trees. The preferred class is defined by a majority of votes, thus providing more precise forecasts and, most importantly, avoiding data overfitting (Breiman, 2001).

Our RF algorithm follows Yeh et al. (2014):

1. Create random subsets of the parent set, composed of an arbitrary number of observations and different features.
2. Each subset from step 1 produces a decision tree, and all elements of the set have a label (correct or not).
3. For each element, the forest takes a large number of votes. The class with the most votes is chosen as the preferred classification of the element.

A more detailed discussion on random forests can be found in Breiman (2001).
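The three steps above can be sketched as follows. The study itself used R's randomForest package; this is an illustrative Python version (not the authors' code) that bootstraps rows, restricts each tree to a random feature subset, and takes a majority vote. Depth-one trees (stumps) stand in for full decision trees to keep the sketch short.

```python
import numpy as np

rng = np.random.default_rng(0)

def grow_stump(X, y, feats):
    """Fit a depth-one tree using only the sampled feature subset."""
    best = (np.inf, feats[0], 0.0, 1)
    for j in feats:
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = int((pred != y).sum())
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best[1:]  # (feature, threshold, sign)

def random_forest(X, y, n_trees=101, n_feat=1):
    trees = []
    for _ in range(n_trees):
        rows = rng.integers(0, len(y), size=len(y))                 # step 1: bootstrap rows
        feats = rng.choice(X.shape[1], size=n_feat, replace=False)  # random feature subset
        trees.append(grow_stump(X[rows], y[rows], feats))           # step 2: one tree per subset
    return trees

def forest_predict(trees, X):
    votes = sum(np.where(X[:, j] <= thr, s, -s) for j, thr, s in trees)
    return np.where(votes >= 0, 1, -1)                              # step 3: majority vote

# toy sample: both columns carry the class signal (-1 = bankrupt, +1 = solvent)
X = np.array([[0.1, 1.0], [0.2, 2.0], [0.3, 3.0],
              [0.8, 8.0], [0.9, 9.0], [1.0, 10.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
trees = random_forest(X, y)
print(forest_predict(trees, X))  # majority vote over the forest
```

Restricting each tree to a random feature subset is what decorrelates trees grown on overlapping bootstrap samples, the property the text credits for RF's resistance to overfitting.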
Table 2
Amount of failure (F) and non-failure (NF) companies by year and per industry: (1) Agriculture, ..., (10) Wholesale trade.

2.5. Artificial neural networks

Artificial neural networks (ANN) are classification models whose structure shares analogies with human neural processing (Park, Kim, & Lee, 2014; Tsai et al., 2014; Freund and Schapire, 1997; Kim and Kang, 2010; and Kruppa et al., 2013). For instance, Zhao et al. (2014) used German credit data to build an ANN-based model and found that this
model was somewhat better than traditional prediction mechanisms.

The ANN algorithm we use is similar to that used in Wang et al. (2011) and Zhao et al. (2014). The model is a structure (network) created in layers with linkages among nodes (neurons). Input variables determine the first layer of the modelling system, and the final layer provides the output (dependent) variable. Here, the dependent variable is the classification of 'bankrupt' (one year before filing date) and 'non-bankrupt' companies. Since default probability is also important in the model, we use a real number between 0 (bankrupt) and 1 (non-bankrupt).

Three problems with the ANN technique have been identified by Zhao et al. (2014): (i) its performance for unbalanced data is poor because it tends to classify more observations in classes with more data, reducing the test set's forecasting performance; (ii) model accuracy improves as the training set becomes larger, but the validation is insufficient to provide a satisfactory error rate; and (iii) selecting the hidden layers is difficult, given the relationship between computing time (i.e. more time is required for more layers) and higher predictability. We adopt a scheme similar to that in Zhao et al. (2014) to address these issues and apply ANN.

2.6. Discriminant analysis and logistic regression

Multivariate discriminant analysis (MDA) was a breakthrough in credit risk assessment that occurred when Altman (1968) presented a study of bankruptcy among manufacturing firms that achieved relevant classification results. The method is based on the minimisation of the variance among observations of the same group and the maximisation of the distance between observations of different groups (Mahmoudi & Duman, 2015). The method produces a score, and an observation is classified into a group depending on the score relative to an arbitrary cut-off value. The restrictive assumptions of MDA, such as requiring normally distributed variables and sensitivity to outliers, made the logistic regression (LR) a more popular alternative as a multivariate model for application scoring and, subsequently, for credit risk modelling (du Jardin, 2016). Not only are the assumptions less restrictive, but LR also produces a result in the [0, 1] interval that can be interpreted as the probability of a given observation being a member of a specific group (de Menezes et al., 2017). Several studies (e.g. Kruppa et al., 2013; Trustorff et al., 2010) have shown that these traditional multivariate methods are not as efficient or accurate as are more recent machine learning techniques for credit risk classification.

3. Data and method

We collected financial data on American and Canadian companies covering 1985 to 2013 using Compustat. Information on firm insolvency was collected from NYU's Salomon Center database. A subset covering 1985 to 2005 was extracted to provide the training set, which included information on 449 companies that filed for bankruptcy during this period as well as information on the same number of non-bankruptcy firms. Insolvent firms in the training set include all companies in the database that filed for bankruptcy during this period and for which financial data were available three years prior to filing. The solvent firms were randomly chosen and were limited to companies that did not file for bankruptcy during the entire period (1985–2005) and for which financial data for at least two consecutive years were available. We selected the same number of solvent and insolvent firms, following Altman (1968), who also considered a balanced set. Table 2 shows the number of solvent and insolvent firms in each year and industry.

We chose the predictive variables based on two important studies: the seminal paper by Altman (1968) and a review of organisational performance by Carton and Hofer (2006). Five variables
Table 3
Predictive variables of the default classification model for one year prior to bankruptcy. If bankruptcy filing occurred less than one semester after fiscal year end, data were collected from the previous fiscal year. Some measures require changes over time to compute these variables; we use data up to three years prior to bankruptcy.

Variable  Formula
X1        Net working capital / Total assets
X2        Retained earnings / Total assets
X3        Earnings before interest and taxes / Total assets
X4        (Market value of share × number of shares) / Total debt
X5        Sales / Total assets
OM        Earnings before interest and taxes / Sales
GA        (Total assets_t − Total assets_{t−1}) / Total assets_{t−1}
GS        (Sales_t − Sales_{t−1}) / Sales_{t−1}
GE        (Number of employees_t − Number of employees_{t−1}) / Number of employees_{t−1}
CROE      ROE_t − ROE_{t−1}, where ROE = Net income / Common stockholders' equity
CPB       Price-to-Book_t − Price-to-Book_{t−1}, where P/B = Market value per share / Book value per share
follow the relevant financial dimensions in Altman (1968): liquidity (X1), profitability (X2), productivity (X3), leverage (X4), and asset turnover (X5). To evaluate the potential impact of other dimensions in predicting bankruptcy, we also included indicators with a greater influence on financial performance models in the short term: growth of assets (GA), growth in sales (GS), growth in the number of employees (GE), operational margin (OM), change in return on equity (CROE), and change in price-to-book ratio (CPB) (Carton & Hofer, 2006). Data were rearranged as variables (see Table 3).

The validation set contains a randomly chosen group of 133 bankrupt firms and 13,300 companies considered solvent from 2006 (which is not included in the training set) to 2013. For bankrupt companies, we included all those firms with data available in the database at least one year before filing. If the event occurred during the first half of the fiscal year, the data were collected from the second preceding fiscal year. In another group, all 13,167 solvent (non-bankrupt) companies were selected from a random year within this period for the test set.

All variables were included in the models at their original values. No transformation, such as normalization, was conducted. Although this procedure may reduce the predictive power of the models, we aimed to analyse the adequacy of machine learning techniques without relying on specific or special treatment of data in the sample. The use of originally available data without any transformation was also followed, for example, by Cleofas-Sánchez et al. (2016); Heo and Yang (2014); Tsai et al. (2014). For a brief visualization of the descriptive statistics, Table 4 presents a summary of the full sample. We also analysed the potential impact of missing values and found no relevant difference in the data, as depicted in Table 4.

Eight techniques were applied:

• Bagging,
• Boosting,
• Random forest (RF),
• SVM with two kernels: linear (SVM-Lin) and radial basis function (SVM-RBF),
• Artificial neural networks (ANN),
• Logistic regression (Logit), and
• MDA.

We implemented the models using the R statistical software packages. Specifically, this study used ada, e1071, mboost, randomForest, MASS, aod, and nnet to implement bagging, SVM, boosting, random forest, MDA, Logit, and ANN, respectively. For the machine learning models, the regression trees were used as base learners for bagging and random forest, while recursive partitioning trees were applied by SVM. It is important to note that these learners are presented in the default package settings.

It is important to analyse, especially for the use of traditional statistical techniques, namely, logistic regression, potential correlations among variables. Table 5 depicts correlations in different scenarios, since firm-year data from non-defaulting companies are chosen randomly.

Almost all correlations are not relevant, except those between X1 (liquidity) and X2 (profitability), X2 and X3 (productivity) and, to a lesser degree, X1 and X3, in the training sample with eleven variables. The other samples do not show a relevant high correlation between variables. We investigated the database and found that the high correlation observed in the specific training sample derives from an outlier related to a defaulted firm. Since the number of bankruptcies is relatively small, and the study using 11 variables reduces the number of observations with non-missing data, the correlation was sensitive to the outlier in this particular sample. We chose to maintain the correlated variables in the models, since they were also used in the seminal paper from Altman (1968) and the results of predictions using different models can be compared. It is important to highlight that the new metrics proposed in this work did not correlate with any other variable, suggesting that they would be potential candidates to contribute to a better prediction of bankruptcy. To check for robustness, we conducted a study of the correlation matrix, excluding the outlier, and a study excluding one correlated variable (X2). The prediction results do not substantially change. The results are presented in Tables 8–10, respectively, of the Supplementary Material.

The ROC curve was calculated for all models for the training and validation sets by using the ROCR package, providing a critical analysis of the evolution of machine learning. The AUC also provided a criterion of accuracy for the validation set: the AUC had to be more than 0.5 for the model to be acceptable, and the closer it was to 1, the stronger its predictive power.
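The AUC criterion described above can be made concrete with a small sketch. The study used R's ROCR package; the Python function below is an illustrative stand-in (with made-up scores and labels), computing AUC as the probability that a randomly chosen bankrupt firm receives a higher default score than a randomly chosen solvent one.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-statistic definition."""
    pos = [s for s, c in zip(scores, labels) if c == 1]
    neg = [s for s, c in zip(scores, labels) if c == 0]
    # each concordant pair counts 1, each tied pair counts 0.5
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0, 0]                 # 1 = bankrupt, 0 = solvent
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]   # hypothetical model scores
print(auc(scores, labels))  # 11/12 ≈ 0.917: above 0.5, so acceptable by the stated criterion
```

One mis-ranked pair (the bankrupt firm scored 0.4 below the solvent firm scored 0.5) pulls the AUC below 1, illustrating how the measure penalises ranking errors rather than a single cut-off.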
Table 4
Descriptive statistics (Minimum, 1st. quarter, median, mean, 3rd quarter, maximum, and standard deviation (SD) of the full sample. First, data including missing values (NA’s). Second, without NA. Third, only data from bankruptcy
firms. Fourth, only from non-bankrupt firms.
Variable X1 X2 X3 X4 X5 GA GS GE OM CR CPB BK
Min −15415 −134863 −23957.5 0.00 −17.195 −1 −58.66 −1 −30175.7 −166842.9 −107120.79 −1
1st Qu 0.031 −0.38 −0.04 0.73 0.462 −0.04 −0.02 −0.06 −0.017 −0.06 −0.59 1
Median 0.212 0.09 0.064 1.83 1.006 0.07 0.1 0.02 0.06 0 −0.03 1
−1.071 −15.24 −0.731 −3.554 −0.32 −0.41
Variable X1 X2 X3 X4 X5 GA GS GE OM CR CPB BK
Min −3800.375 −24638 −23957.5 0.00 −11.5385 −0.9995 −50.286 −1 −30175.7 −166842.9 −107120.79 −1
1st Qu 0.062 −0.198 −0.006 0.68 0.6257 −0.0377 −0.023 −0.0563 −0.005 −0.06 −0.55 1
Median 0.238 0.144 0.072 1.6 1.1 0.0665 0.093 0.0217 0.061 0 −0.03 1
Mean −0.036 −2.858 −0.215 6.85 1.2895 0.2375 0.856 0.186 −2.65 −0.53 −0.62 0.9941
3rd Qu 0.418 0.344 0.126 4.17 1.6258 0.1991 0.24 0.1406 0.122 0.04 0.4 1
Max 16.238 140.582 35.917 188244 434.9835 2405 15054 2699 394.474 26773.44 18555.57 1
SD 18.38568 129.8665 54.37543 428.4017 2.299579 8.833552 55.07403 7.770179 107.917 395.095 273.8783 0.1088771
Failures X1 X2 X3 X4 X5 GA GS GE OM CR CPB BK
Min −4.057 −33.5285 −2.8514 0.0 0 0 0 0.0 0 0 0 −0.8786 −0.9832 −0.912 −20.2059 −538.1912 −878.61 −1
1st Qu −0.07863 −0.52582 −0.06963 0.0694 0.6171 −0.1645 −0.08415 −0.14348 −0.07995 −1.9442 −1.2222 −1
Median 0.0725 −0.1574 0.0004 0.1874 1.1184 −0.0375 0.00765 −0.0414 0.0004 −0.3272 −0.2987 −1
Mean −0.02921 −0.60389 −0.05185 0.406 1.2459 0.1238 0.42396 0.77861 −0.27476 −3.9153 −2.8037 −1
3rd Qu 0.1873 0.01365 0.04097 0.4324 1.611 0.1251 0.24672 0.08282 0.03692 −0.0128 0.3538 −1
Max 0.779 1.9423 0.4904 6.5718 7.8175 13.4138 48.1326 311.5 0.3839 363.9792 69.6994 −1
SD 0.440936 2.201727 0.2224851 0.6662472 0.9397091 0.8645904 2.923344 13.30826 1.464162 39.35201 38.36061 0
Non−Failures X1 X2 X3 X4 X5 GA GS GE OM CR CPB BK
Min −3800.375 −24638.000 −23957.500 0.00 −11.5385 −0.9995 −50.286 −1.0000 −30175.700 −166842.90 −107120.79 1
1st Qu 0.063 −0.199 −0.006 0.68 0.6284 −0.0376 −0.023 −0.0563 −0.005 −0.06 −0.55 1
Median 0.240 0.1450 0.072 1.62 1.1019 0.0665 0.093 0.0220 0.061 0.00 −0.03 1
Mean −0.037 −2.874 −0.216 6.89 1.2916 0.2383 0.860 0.1847 −2.666 −0.52 −0.62 1
3rd Qu 0.418 0.346 0.126 4.20 1.6272 0.1996 0.240 0.1413 0.121 0.04 0.41 1
Max 16.238 140.582 35.917 188244.00 434.9835 2405.0000 15054.000 2699.0000 394.474 26773.44 18555.57 1
SD 18.44065 130.2547 54.53805 429.6827 2.305391 8.859838 55.2385 7.759349 108.239527 396.270799 274.689357 0
412 F. Barboza et al. / Expert Systems With Applications 83 (2017) 405–417
Table 5
Correlation matrices for main datasets: full sample, training and testing samples using Altman’s 5 variables, and training and testing samples for all 11 variables.
Full sample X1 X2 X3 X4 X5 GA GS GE OM CR CPB
X1
X2 0.72∗ ∗ ∗
X3 0.09∗ ∗ ∗ 0.47∗ ∗ ∗
X4 0 0 0
X5 −0.19∗ ∗ ∗ −0.29∗ ∗ ∗ −0.25∗ ∗ ∗ 0
GA 0 0 0 0.01∗ ∗ ∗ −0.01∗
GS 0 0 0 0 0 0.07∗ ∗ ∗
GE 0 0 0 0 0 0.03∗ ∗ ∗ 0.02∗ ∗ ∗
OM 0.01∗ ∗ ∗ 0.01∗ ∗ ∗ 0.01∗ 0 0.01∗ ∗ ∗ −0.01∗ ∗ 0 0
CR 0 0 0 0 0 0 0 0 0
CPB 0 0 0 0 0 0 0 0 0 0
BK 0 0 0 0 0 0 0 0 0 0 0
Train X1 X2 X3 X4 X5 Test X1 X2 X3 X4 X5
X1 X1
X2 0.68∗ ∗ ∗ X2 0.59∗ ∗ ∗
X3 0.19∗ ∗ ∗ 0.38∗ ∗ ∗ X3 0.24∗ ∗ ∗ 0.35∗ ∗ ∗
X4 0 0 0 X4 0 0 0
X5 −0.26∗ ∗ ∗ −0.32∗ ∗ ∗ −0.26∗ ∗ ∗ 0 X5 −0.07∗ ∗ ∗ −0.15∗ ∗ ∗ −0.22∗ ∗ ∗ 0
BK 0 0 0 0 0 BK 0 0 0 0 0
Train X1 X2 X3 X4 X5 GA GS GE OM CR CPB
X1
X2 0.71∗ ∗ ∗
X3 0.12∗ ∗ ∗ 0.71∗ ∗ ∗
X4 0 0 0
X5 −0.31∗ ∗ ∗ −0.45∗ ∗ ∗ −0.28∗ ∗ ∗ −0.02∗ ∗ ∗
GA 0 0 0 0.01∗ ∗ 0
GS 0 0 0 0 0.01∗ 0.07∗ ∗ ∗
GE 0 0 0 0 0 0.07∗ ∗ ∗ 0.05∗ ∗ ∗
OM 0.01∗ ∗ ∗ 0.01∗ ∗ ∗ 0.01∗ ∗ −0.02∗ ∗ ∗ 0.01∗ ∗ ∗ −0.01∗ ∗ 0 0
CR 0 0 0 0 0 0 0 0 0
CPB 0 0 0 0 0 0 0 0 0 0
BK 0 0 0 0.01∗ 0 0 0 −0.01∗ ∗ 0 0 0
Test X1 X2 X3 X4 X5 GA GE GS OM CR CPB
X1
X2 0.58∗ ∗ ∗
X3 0.30∗ ∗ ∗ 0.26∗ ∗ ∗
X4 0.01 0 -0.05∗ ∗ ∗
X5 -0.36∗ ∗ ∗ -0.31∗ ∗ ∗ 0.01 -0.06∗ ∗ ∗
GA 0 0 0 0 0.01
GE 0 0 0 0 -0.01 0.05∗ ∗ ∗
GS 0 0 0 0 0 0.07∗ ∗ ∗ 0.01
OM 0.13∗ ∗ ∗ 0.09∗ ∗ ∗ 0.29∗ ∗ ∗ -0.05∗ ∗ ∗ 0.04∗ ∗ ∗ 0 0 0
CR 0 0 0 0 0 0 0 0 0
CPB 0 -0.01 -0.03∗ ∗ 0.01 -0.02∗ 0 0 0.01 -0.01 0
BK 0 -0.01 0 0.02∗ ∗ -0.01 0 0 0 -0.01 0 0
Note: ∗ p < 0.05; ∗∗ p < 0.01; and ∗∗∗ p < 0.001.
Two commonly applied performance rates (Kim & Upneja, 2014; Wang, Ma, Huang, & Xu, 2012) were calculated: the true positive rate (TPR), or sensitivity, and the true negative rate (TNR), or specificity, which are equivalent to 1 − type I error and 1 − type II error, respectively. The predictive power or accuracy (ACC) was calculated as the number of accurate classifications divided by the total number of elements in the validation set. These indicators are equivalent to those proposed by Altman (1968); hence, we can compare them with his outcomes directly. The variables are given by:

Sensitivity = TPR = 1 − Type I Error = TP / (TP + FN)    (8)

Specificity = TNR = 1 − Type II Error = TN / (TN + FP)    (9)

where TP is True Positive, that is, bankrupt firms classified correctly, and TN is True Negative, that is, non-bankrupt firms classified correctly. Sensitivity has values close to 1 when the type I error is low, and specificity is close to 1 when the type II error is low. For bankruptcy, there is a preference for higher sensitivity because misclassifying bankrupt firms translates into losses for lenders, whereas specificity is the threshold for gain. Fig. 1 illustrates our methodology.

4. Results

Table 6 shows the outcomes for the traditional and machine learning models in the training and test sets. We used a standard MacBook Air (4GB DDR3L RAM, 64GB of flash storage, 1.7GHz Intel Core i5 processor, and Mac OS X as the operating system) with R software version 3.1.1 installed, and all packages cited below.

The bagging and RF techniques show high accuracy in the training phase. This outcome was expected, since both use decision trees, which can cause model overfitting in the training set. However, this does not mean that they are good models, as
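Eqs. (8) and (9) and the accuracy measure reduce to a few lines of code; the sketch below (in Python, with invented confusion-matrix counts) is only illustrative:

```python
# Sensitivity (TPR), specificity (TNR), and accuracy (ACC) from
# confusion-matrix counts; the counts below are invented examples.

def rates(tp, tn, fp, fn):
    sensitivity = tp / (tp + fn)                # TPR = 1 - type I error
    specificity = tn / (tn + fp)                # TNR = 1 - type II error
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # ACC
    return sensitivity, specificity, accuracy

tpr, tnr, acc = rates(tp=87, tn=85, fp=15, fn=13)
print(tpr, tnr, acc)  # 0.87 0.85 0.86
```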
Fig. 1. Graphical abstract: 11 variables were selected, including the Z-score variables and the growth measures, change variables, and margins suggested by Carton and Hofer (2006). Machine learning models used: support vector machines with linear and radial basis function kernels, boosting, bagging, and random forest. Traditional models used: artificial neural networks, logistic regression, and multivariate discriminant analysis. The results are presented as confusion matrices and ROC curves.
Table 6
Results for the 13,300 firms tested in our models. Type I error means the portion of bankrupt companies that were predicted to be non-bankrupt. Type II error means the portion of non-bankrupt companies that were predicted to be bankrupt. AUC is the area under the ROC curve, and ACC is the total estimated accuracy.
Training sample
Model TP TN FP FN Type I Error (%) Type II Error (%) AUC (%) ACC (%)
[Fig. 2. ROC curves (true positive rate vs. false positive rate) for SVM-Lin, SVM-RBF, Boosting, Bagging, Random Forest, NeuralNet, Logit, and LDA.]
evidenced by the significantly poorer outcomes of the validation set. Table 6 shows the metrics (AUC, ACC, type I and type II errors) discussed above when the models were applied to the test set. Machine learning models outperform traditional models measured on the validation set, except SVM-Lin. The linear structure applied to separate the two classes (bankrupt vs. non-bankrupt) causes this weak performance, since MDA, which uses a linear process, displays a similar accuracy problem. Thus, non-linear techniques are necessary when more variables are used to predict bankruptcy.

While the ANN model had a lower type I error (6.8%), followed by SVM-Lin (7.5%), their type II errors are considerably higher than those of the other machine learning models (i.e. ANN [27.2%] and SVM-Lin [28.7%] as opposed to boosting [13.3%], bagging [14.3%], and RF [12.9%]). Boosting provides the most accurate AUC rate (92.9%), but RF returns the lowest type II error (12.9%) and the best total accuracy rate (87.1%). The machine learning models' errors and prediction rates are all better than those of the MDA model. While Logit performs acceptably in classifying bankrupt firms (11.3% type II error), its misclassification of solvent companies indicates a poor model (23.8%).

Bagging shows reasonable performance and may provide an interesting alternative to the more computationally intense machine learning systems. Its type II error was the second-best rate among the eight models tested (1.4% above RF), and it was third-best for AUC and ACC. Fig. 2 shows the ROC curve of each model. The machine learning models show significant superiority over MDA and Logit, except SVM-Lin, which is most similar to MDA, due to their inherently linear models, as discussed above. While it is difficult to confirm a single preferred technique from the curves, bagging, boosting, and RF are the most promising candidates.

We performed several tests with the selection of variables. Using only the original Z-score (Altman, 1968) variables leads to significantly inferior performance in terms of predictive power in this more recent database. The success rate was comparable to that for the model that used only the variables selected by Carton and Hofer (2006). Our outcomes show the importance of using more explanatory variables, since bankruptcy may be reflected by many different indicators. Table 7 shows the results using only the variables from Altman (1968). In this case, 14 companies were added to the test set to replace companies that lacked sufficient data to calculate growth rates; therefore, data from only one year before the event were used. The 14,553 healthy companies were resampled randomly, following the proposed methodology.

The results show reduced performance among all forecasting models. While ANN and SVM-RBF slightly improved in AUC (by 1% and 4%, respectively), ANN decreased in type I error. The SVM-Lin model, although presenting reduced accuracy, improved its prediction of bankrupt companies, which also occurred with the MDA model. This is most likely due to the reduction in variables allowing a more linear cluster and thus a better performance for these linear models. Bagging achieved the highest AUC rate. However, the type I errors of the three most precise models (boosting, bagging, and RF) were significantly impacted (approximately 23%), whereas MDA (19%) and Logit (9.5%) outperformed these models for this measure. Type II error shows the opposite result, with a large separation (≈ 20% for Logit and ≈ 36% for MDA), generating a significantly better ACC. Following the methodology outlined here, we also investigated different periods: training in pre-crisis periods and validation in crisis periods (and vice versa); training for five consecutive years and tests one year later; and training for three
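A schematic analogue of this kind of model comparison can be set up as follows. This is a sketch using scikit-learn on synthetic data, whereas the study ran R packages on Compustat/Salomon Center data; the library, default model settings, and any resulting numbers are illustrative assumptions only:

```python
# Illustrative sketch: compare ensemble learners against logistic
# regression by held-out AUC on synthetic data (not the study's data).
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=11, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "Boosting": GradientBoostingClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "Logit": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")  # held-out (validation) AUC
```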
Table 7
Predictions of the models using only the variables of the popular Altman (1968) model. All methods show a loss of predictability, sensitivity, and specificity, which underscores the importance of including new variables in the models.
Model TP TN FP FN Type I Error (%) Type II Error (%) AUC (%) ACC (%)
consecutive years and validation in the three subsequent years. Detailed results are omitted for brevity. The various outcomes were inferior to those of the model described here, perhaps indicating that historical data are not relevant to current events. However, in all tests without exception, the three machine learning techniques – boosting, bagging, and RF – showed better outcomes, with the latter usually being the best.

5. Conclusions

Bankruptcy prediction is associated with credit risk, which has been thrust into the spotlight due to the recent financial crisis. Machine learning models have been very successful in finance applications, and many studies examine their use in bankruptcy prediction. The Altman and Ohlson models are still relevant, due not only to their predictive power but also to their simple, practical, and consistent frameworks. Few studies can improve on their results concerning forecasting accuracy or the simplicity of the models.

Regarding accuracy ratios, our results show that the traditional models (MDA, LR, and ANN) have lower predictive capacity (between 52% and 77%) than the machine learning models (71% to 87%), corroborating Wang et al. (2012), Breiman (2001), Kim and Upneja (2014), and Chen et al. (2010). New studies can adapt these machine learning techniques for other credit risk studies, such as general default events not limited to bankruptcy.

However, machine learning models are not perfect. The SVM-Lin results show that it is difficult to address non-separable datasets, which produces more misclassifications. SVM-RBF, a non-linear kernel, reduced error rates but had weaker performance than other machine learning models. Moreover, SVM models took significantly more computational processing time than the others did; however, this was not an important issue because no model took more than one minute to run. Although bagging, boosting, and RF incorporate similar procedures, RF generally produced better accuracy and error rates. Output variability is a critical problem typically found in ANNs, whereas the machine learning models produced stable solutions.

We now highlight some strengths of the study. First, we argue that significant predictive accuracy was achieved by using machine learning models under restrictive conditions such as the existence of highly correlated variables, outliers, and missing values. Since we use raw data, without making any transformations or adjustments to the variables, the results suggest that machine learning techniques can easily be applied and generate substantial classification accuracy, compared to traditional mechanisms such as linear discriminant analysis, logistic regression, and artificial neural networks.

Another differential element of the study is the use of metrics that reflect growth or change in some variables, which are not usually incorporated into predictive models of bankruptcy. Following arguments from Carton and Hofer (2006), we argue that failure of firms is likely to follow from difficulties over time and not just in the year prior to bankruptcy. Instead of using time series or survival analysis approaches, our procedure easily incorporates information about the short-term evolution of a variable in a machine learning model. Results show that, by adding variables that depict a dynamic yet simple behaviour of firms, machine learning techniques may improve predictive accuracy.

We also highlight that our use of an extensive database of defaults is not common in papers related to corporate bankruptcy. Many studies that investigate default are restricted to a specific credit product of a specific financial institution. Since we use broad training and testing samples, the results can be useful in analysing bankruptcy in the overall North American corporate credit market. The number of observations also reduces problems of overfitting. Accuracy rates close to 90% in the testing sample represent evidence of the suitability of machine learning models for analysing corporate default.

It is also important to discuss the weaknesses of the study. As we used the default parameters of the algorithms implemented in R packages, we did not take advantage of the full capacity of the models. Nonetheless, even using simple settings of the algorithms, the results show a relevant superiority of machine learning techniques.

In addition, our study did not focus on feature selection, which is a common procedure in recent studies that explore a high number of variables. However, as is usual in studies regarding bankruptcy, one has access to a very limited number of variables. In particular, even though we use a large number of observations, the databases supplied a limited number of variables. Therefore, the impact of feature selection would not be prominent in our study. Pal et al. (2016) argue that feature selection in the finance context "depends upon the individual judgment of the analyst or group decision-making. This makes the theoretical basis for the feature selection limited and less reliable". Finally, another limitation of the study is that it does not consider different classification costs, similar to Cleofas-Sánchez et al. (2016), Liang, Lu, Tsai, and Shih (2016), and Mahmoudi and Duman (2015). We find that, especially for the prediction of bankruptcy, accuracy should not be the only performance metric, and future research should focus on adjusting classification models by considering the different impacts of type I and type II errors.

Credit risk applications – specifically, default prediction – should be further investigated, particularly in efforts to obtain models related to macroeconomic variables. Several papers have found relationships between default and macroeconomic variables (e.g. Ali & Daly, 2010; Bonfim, 2009; Chen & Wu, 2014; Yurdakul, 2014). We chose not to use these macroeconomic data as inputs in the models because the effects of our metrics in terms of firm-specific measures produced relevant outcomes. This choice may constitute a limitation that could be explored in a subsequent study. However, given the scope of this paper, further research is needed to incorporate the impact of macroeconomic variables such as sustainability, governance, sovereign risk, credit spreads, and firm performance. Another limitation of this study is its validation procedure. The overfitting analysis of machine learning models is not well explained in the literature. Questions about which tools and theories are most appropriate for such analyses remain open to further investigation.
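The point about unequal error costs can be sketched with a simple expected-cost criterion; the cost ratio and error counts below are hypothetical illustrations of ours, not estimates from the data:

```python
# Cost-sensitive view: a type I error (a missed bankruptcy) is assumed
# to be ten times costlier than a type II error (a healthy firm flagged).
# All figures are invented for illustration.

def expected_cost(fn, fp, n, cost_type1=10.0, cost_type2=1.0):
    """Average misclassification cost over n scored firms."""
    return (cost_type1 * fn + cost_type2 * fp) / n

# Two hypothetical models with identical accuracy (90 errors in 1000):
model_a = expected_cost(fn=10, fp=80, n=1000)  # few missed bankruptcies
model_b = expected_cost(fn=40, fp=50, n=1000)  # many missed bankruptcies
print(model_a, model_b)  # 0.18 0.45 -> same accuracy, different cost
```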
We can highlight major implications of the study, from two different perspectives. First, for academics, we tested machine learning models using an unusually large sample for the study of corporate bankruptcy. We use a very representative sample, including several sectors, with more than 13,000 observations. Thus, considering the large sample and the high predictive accuracy, results are not restricted to a specific bank or credit portfolio and can be useful to analyse failure in North American corporate loans. Second, for practitioners, the results of the study add to a growing literature (e.g. Danenas & Garsva, 2015; Kruppa et al., 2013; López Iturriaga & Sanz, 2015) related to the better performance of machine learning techniques when compared to traditional approaches that are widespread in the credit industry, such as logistic regression. These results should encourage decision makers to test and consider the use of machine learning models on their databases. Although practitioners could be concerned about explanatory reasons to validate their model, the complexity of the bankruptcy phenomenon would suggest that machine learning could be an important tool to aid credit risk analysis. If the goal of the decision maker is to predict and not necessarily explain (Efron & Hastie, 2016), then the use of estimates of prediction error should be the focus, and the relative contribution of predictors would not be a matter of concern. In this context, results show that machine learning could be a powerful ally in making decisions about corporate loans.

Future studies should extend the analysis to incorporate the growth rates and/or time effects of all variables, including the growth measures themselves, to evaluate the impact of time on default events. The outcomes should be applied to individual financial institutions, while considering specific institutional aspects such as ratings, credit losses, economic capital, and credit spreads. For practitioners, this study's outcomes are interesting in that they reveal how using computational learning techniques can enhance the predictive power of credit risk models. Banks and risk managers can investigate these machine learning models, which could improve their credit risk analysis and thus help them achieve better profitability with lower credit risk exposure.

Acknowledgements

This document is a collaborative effort. We thank Santander Bank, CAPES Foundation – Ministry of Education of Brazil (Process number 1766/2014-07), Centro Estadual de Educação Tecnológica Paula Souza (CEETEPS), and CNPq (Process numbers 409725/2013-7 and 310666/2016-3) for their financial support. The authors also thank the Credit & Debt Markets Research Program of the Salomon Center, New York University, and Brenda Kuehne (Credit & Debt Markets Research Specialist) for the list of bankrupt firms.

Supplementary material

Supplementary material associated with this article can be found, in the online version, at 10.1016/j.eswa.2017.04.006.

References

Ali, A., & Daly, K. (2010). Macroeconomic determinants of credit risk: Recent evidence from a cross country study. International Review of Financial Analysis, 19(3), 165–171.
Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance, 23(4), 589–609.
Begley, J., Ming, J., & Watts, S. (1996). Bankruptcy classification errors in the 1980s: An empirical analysis of Altman's and Ohlson's models. Review of Accounting Studies, 1(4), 267–284.
Bernard, J., Chang, T.-W., Popescu, E., & Graf, S. (2017). Learning style identifier: Improving the precision of learning style identification through computational intelligence algorithms. Expert Systems with Applications, 75, 94–108.
Bonfim, D. (2009). Credit risk drivers: Evaluating the contribution of firm level information and of macroeconomic dynamics. Journal of Banking & Finance, 33(2), 281–299.
Booth, A., Gerding, E., & McGroarty, F. (2014). Automated trading with performance weighted random forests and seasonality. Expert Systems with Applications, 41(8), 3651–3661.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Calderoni, L., Ferrara, M., Franco, A., & Maio, D. (2015). Indoor localization in a hospital environment using random forest classifiers. Expert Systems with Applications, 42(1), 125–134.
Cano, G., Garcia-Rodriguez, J., Garcia-Garcia, A., Perez-Sanchez, H., Benediktsson, J. A., Thapa, A., & Barr, A. (2017). Automatic selection of molecular descriptors using random forest: Application to drug discovery. Expert Systems with Applications, 72, 151–159.
Carton, R., & Hofer, C. (2006). Measuring organizational performance. Edward Elgar Publishing.
Chen, J., Chollete, L., & Ray, R. (2010). Financial distress and idiosyncratic volatility: An empirical investigation. Journal of Financial Markets, 13(2), 249–267.
Chen, P., & Wu, C. (2014). Default prediction with dynamic sectoral and macroeconomic frailties. Journal of Banking & Finance, 40, 211–226.
Cleofas-Sánchez, L., García, V., Marqués, A., & Sánchez, J. (2016). Financial distress prediction using the hybrid associative memory with translation. Applied Soft Computing, 44, 144–152.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
Danenas, P., & Garsva, G. (2015). Selection of support vector machines based classifiers for credit risk domain. Expert Systems with Applications, 42(6), 3194–3204.
Efron, B., & Hastie, T. (2016). Computer age statistical inference.
Figini, S., Savona, R., & Vezzoli, M. (2016). Corporate default prediction model averaging: A normative linear pooling approach. Intelligent Systems in Accounting, Finance and Management, 23(1–2), 6–20.
Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139.
Gerlein, E. A., McGinnity, M., Belatreche, A., & Coleman, S. (2016). Evaluating machine learning classification for financial trading: An empirical approach. Expert Systems with Applications, 54, 193–207.
Griffin, J. M., & Lemmon, M. L. (2002). Book-to-market equity, distress risk, and stock returns. The Journal of Finance, 57(5), 2317–2336.
Duéñez Guzmán, E. A., & Vose, M. D. (2013). No free lunch and benchmarks. Evolutionary Computation, 21(2), 293–312.
Heo, J., & Yang, J. Y. (2014). AdaBoost based bankruptcy forecasting of Korean construction companies. Applied Soft Computing, 24, 494–499.
Hillegeist, S. A., Keating, E. K., Cram, D. P., & Lundstedt, K. G. (2004). Assessing the probability of bankruptcy. Review of Accounting Studies, 9(1), 5–34.
du Jardin, P. (2016). A two-stage classification technique for bankruptcy prediction. European Journal of Operational Research, 254(1), 236–252.
Kim, M.-J., & Kang, D.-K. (2010). Ensemble with neural networks for bankruptcy prediction. Expert Systems with Applications, 37(4), 3373–3379.
Kim, M.-J., Kang, D.-K., & Kim, H. B. (2015). Geometric mean based boosting algorithm with over-sampling to resolve data imbalance problem for bankruptcy prediction. Expert Systems with Applications, 42(3), 1074–1082.
Kim, S. Y., & Upneja, A. (2014). Predicting restaurant financial distress using decision tree and AdaBoosted decision tree models. Economic Modelling, 36, 354–362.
Kruppa, J., Schwarz, A., Arminger, G., & Ziegler, A. (2013). Consumer credit risk: Individual probability estimates using machine learning. Expert Systems with Applications, 40(13), 5125–5131.
Laha, D., Ren, Y., & Suganthan, P. (2015). Modeling of steelmaking process with effective machine learning techniques. Expert Systems with Applications, 42(10), 4687–4696.
Li, H., & Sun, J. (2009). Gaussian case-based reasoning for business failure prediction with empirical data in China. Information Sciences, 179(1–2), 89–108.
Li, S., Wang, M., & He, J. (2013). Prediction of banking systemic risk based on support vector machine. Mathematical Problems in Engineering, 2013, 1–5.
Liang, D., Lu, C.-C., Tsai, C.-F., & Shih, G.-A. (2016). Financial ratios and corporate governance indicators in bankruptcy prediction: A comprehensive study. European Journal of Operational Research, 252(2), 561–572.
López Iturriaga, F. J., & Sanz, I. P. (2015). Bankruptcy visualization and prediction using neural networks: A study of U.S. commercial banks. Expert Systems with Applications, 42(6), 2857–2869.
Mahmoudi, N., & Duman, E. (2015). Detecting credit card fraud by modified Fisher discriminant analysis. Expert Systems with Applications, 42(5), 2510–2516.
Maione, C., de Paula, E. S., Gallimberti, M., Batista, B. L., Campiglia, A. D., Jr, F. B., & Barbosa, R. M. (2016). Comparative study of data mining techniques for the authentication of organic grape juice based on ICP-MS analysis. Expert Systems with Applications, 49, 60–73.
de Menezes, F. S., Liska, G. R., Cirillo, M. A., & Vivanco, M. J. (2017). Data classification with binary response through the boosting algorithm and logistic regression. Expert Systems with Applications, 69, 62–73.
Min, J., & Lee, Y. (2005). Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters. Expert Systems with Applications, 28(4), 603–614.
Nanni, L., & Lumini, A. (2009). An experimental comparison of ensemble of classifiers for bankruptcy prediction and credit scoring. Expert Systems with Applications, 36(2), 3028–3033.
Noble, W. S. (2006). What is a support vector machine? Nature Biotechnology, 24(12), 1565–1567.
Ohlson, J. A. (1980). Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting Research, 18(1), 109–131.
Oskoei, M., & Hu, H. (2008). Support vector machine-based classification scheme for myoelectric control applied to upper limb. IEEE Transactions on Biomedical Engineering, 55(8), 1956–1965.
Osuna, E., Freund, R., & Girosi, F. (1997). Training support vector machines: An application to face detection. In Proceedings of the 1997 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 130–136). IEEE.
Pal, R., Kupka, K., Aneja, A. P., & Militky, J. (2016). Business health characterization: A hybrid regression and support vector machine analysis. Expert Systems with Applications, 49, 48–59.
Park, H., Kim, N., & Lee, J. (2014). Parametric models and non-parametric machine learning models for predicting option prices: Empirical comparison study over KOSPI 200 index options. Expert Systems with Applications, 41(11), 5227–5237.
Subasi, A., & Ismail Gursoy, M. (2010). EEG signal classification using PCA, ICA, LDA and support vector machines. Expert Systems with Applications, 37(12), 8659–8666.
Tian, Y., Shi, Y., & Liu, X. (2012). Recent advances on support vector machines research. Technological and Economic Development of Economy, 18(1), 5–33.
Trustorff, J.-H., Konrad, P. M., & Leker, J. (2010). Credit risk prediction using support vector machines. Review of Quantitative Finance and Accounting, 36(4), 565–581.
Tsai, C.-F., Hsu, Y.-F., & Yen, D. C. (2014). A comparative study of classifier ensembles for bankruptcy prediction. Applied Soft Computing, 24, 977–984.
Upneja, A., & Dalbor, M. C. (2001). An examination of capital structure in the restaurant industry. International Journal of Contemporary Hospitality Management, 13(2), 54–59.
Wang, G., Hao, J., Ma, J., & Jiang, H. (2011). A comparative assessment of ensemble learning for credit scoring. Expert Systems with Applications, 38(1), 223–230.
Wang, G., Ma, J., Huang, L., & Xu, K. (2012). Two credit scoring models based on dual strategy ensemble trees. Knowledge-Based Systems, 26, 61–68.
Wang, G., Ma, J., & Yang, S. (2014). An improved boosting based on feature selection for corporate bankruptcy prediction. Expert Systems with Applications, 41(5), 2353–2361.
Xiao, H., Xiao, Z., & Wang, Y. (2016). Ensemble classification based on supervised clustering for credit scoring. Applied Soft Computing, 43, 73–86.
Yeh, C.-C., Chi, D.-J., & Lin, Y.-R. (2014). Going-concern prediction using hybrid random forests and rough set approach. Information Sciences, 254, 98–110.
Yu, L., Yue, W., Wang, S., & Lai, K. (2010). Support vector machine based multiagent ensemble learning for credit risk evaluation. Expert Systems with Applications, 37(2), 1351–1360.
Yurdakul, F. (2014). Macroeconomic modelling of credit risk for banks. Procedia - Social and Behavioral Sciences, 109, 784–793.
Zhao, Z., Xu, S., Kang, B. H., Kabir, M. M. J., Liu, Y., & Wasinger, R. (2014). Investigation and improvement of multi-layer perceptron neural networks for credit scoring. Expert Systems with Applications, in press.
Zhou, L., Lai, K. K., & Yen, J. (2014). Bankruptcy prediction using SVM models with a new approach to combine features selection and parameter optimisation. International Journal of Systems Science, 45(3), 241–253.