Bankruptcy Prediction Using Machine Learning
Nanxi Wang
1. Introduction
Machine learning is a subfield of computer science. It allows computers to build analytical models of data and find hidden insights automatically, without being explicitly programmed. It has been applied to many aspects of modern society, ranging from DNA sequence classification, credit card fraud detection, and robot locomotion to natural language processing. Many of these applications are classification tasks, and bankruptcy prediction is a typical example of a classification problem.
Machine learning grew out of pattern recognition. Earlier work on this topic (machine learning for bankruptcy prediction) used models including logistic regression, genetic algorithms, and inductive learning.
Logistic regression is a statistical method that allows researchers to build a predictive function from a sample. The model is best used for understanding how several independent variables influence a single outcome variable [1]. Though useful in some ways, logistic regression is also limited.
The genetic algorithm is based on natural selection and evolution. It can be used to extract rules in propositional and first-order logic, and to choose appropriate sets of if-then rules for complicated classification problems [2].
The main category of inductive learning is the decision tree algorithm. It identifies patterns in training data or earlier knowledge and extracts generalized rules, which are then used in problem solving [2].
To see whether the accuracy of bankruptcy prediction can be further improved, we propose three recent models: support vector machine (SVM), neural network, and autoencoder.
The support vector machine is a supervised learning method that is especially effective in high-dimensional settings, and it is memory efficient because it uses only a subset of the training points in the decision function. It also allows different kernel functions to be specified for the decision function [3]. Its mathematical formulation is a convex optimization problem, which guarantees convergence to a single global optimum.
Neural networks, unlike conventional programs, are expressive models that learn from examples. They contain multiple hidden layers and are therefore capable of learning very complicated relationships between inputs and outputs, and they operate significantly faster than conventional techniques. However, with limited training data, overfitting can degrade the ultimate accuracy. To prevent this, a technique called dropout temporarily and randomly removes units (hidden and visible) from the network during training [4].
The autoencoder, also known as the Diabolo network, is an unsupervised learning algorithm that sets the target values equal to the inputs. By doing so, it can represent certain functions with far less computation, which improves accuracy, and it reduces the amount of training data required to learn these functions [5].
This paper is structured as follows. Section 2 describes the motivation for this idea. Section 3 describes relevant previous work. Section 4 formally describes the three models. In Section 5 we present our experimental results, including a parallel comparison among the three chosen models and a longitudinal comparison with the three older models. Section 6 concludes, and Section 7 lists the references.
2. Motivation
The three models we choose (SVM, neural network, autoencoder) are relatively newly developed but have already been applied in many fields.
SVM has been used successfully in many real-world problems such as text categorization, object tracking, and bioinformatics (protein classification, cancer classification). Text categorization is especially helpful in daily life: web searching and email filtering provide great convenience and work efficiency.
Neural networks learn from examples rather than from hand-crafted algorithms, so they have been widely applied to problems where algorithmic methods are hard or impossible to apply [6]. Fingerprint recognition is one exciting application: people can now use their unique fingerprints as keys to unlock their phones and payment accounts, free from troublesome long passwords.
3. Related Work
Machine learning enables computers to find insights in data automatically. The idea of using machine learning to predict bankruptcy was previously explored in Predicting Bankruptcy with Robust Logistic Regression by Richard P. Hauser and David Booth [1]. That paper uses robust logistic regression, which finds the maximum trimmed correlation between the samples remaining after overly influential samples are removed and the model estimated by logistic regression [1]. This model has its limitations: its value relies heavily on the researchers' ability to include the correct independent variables. In other words, if researchers fail to identify all the relevant independent variables, logistic regression will have little predictive value [7]. Its overall accuracy is 75.69% on the training set and 69.44% on the testing set.
Another work, The Discovery of Experts' Decision Rules from Qualitative Bankruptcy Data Using Genetic Algorithms (2003) by Myoung-Jong Kim and Ingoo Han, uses the same dataset as we do. They apply older models: inductive learning (decision trees), genetic algorithms, and neural networks without dropout. Since the length of the genomes in a genetic algorithm is fixed, a given problem cannot always be encoded easily, and the genetic algorithm gives no guarantee of finding the global optimum. The problem with inductive learning is its one-step-ahead node splitting without backtracking, which may generate a suboptimal tree. Decision trees can also be unstable, because small variations in the data may result in a completely different tree being generated [3]. And the absence of dropout in the neural network model increases the possibility of overfitting, which affects accuracy. The overall accuracies are 89.7%, 94.0%, and 90.3%, respectively.
The models we choose either contain a newly developed technique, such as dropout, or are entirely new models that have hardly been utilized in bankruptcy prediction.
4. Model Description
This section describes the proposed three models.
4.1. Support Vector Machine
Given training vectors $x_i$, $i = 1, \ldots, n$, in two classes, and a label vector $y \in \{1, -1\}^n$, the support vector machine solves the following primal problem:

$$\min_{\omega, b, \zeta} \; \frac{1}{2} \omega^T \omega + C \sum_{i=1}^{n} \zeta_i$$

subject to

$$y_i \left( \omega^T \phi(x_i) + b \right) \geq 1 - \zeta_i, \quad \zeta_i \geq 0, \quad i = 1, \ldots, n.$$

Its dual is

$$\min_{\alpha} \; \frac{1}{2} \alpha^T Q \alpha - e^T \alpha$$

subject to

$$y^T \alpha = 0, \quad 0 \leq \alpha_i \leq C, \quad i = 1, \ldots, n,$$

where $e$ is the vector of all ones, $C > 0$ is the upper bound, $Q$ is an $n \times n$ positive semidefinite matrix with $Q_{ij} \equiv y_i y_j K(x_i, x_j)$, and $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$ is the kernel. Here the function $\phi$ implicitly maps the training vectors into a higher-dimensional space.

The decision function is

$$\operatorname{sgn} \left( \sum_{i=1}^{n} y_i \alpha_i K(x_i, x) + \rho \right)$$ [3].
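To make the formulation concrete, the following is a minimal sketch of training such a classifier with scikit-learn's SVC [3]. The synthetic feature matrix, the RBF kernel, and the value of C are illustrative assumptions, not the exact configuration used in our experiments.

```python
# Minimal C-SVC sketch with scikit-learn; data and hyperparameters are assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(250, 6))            # placeholder for 6 encoded attributes
y = np.where(X.sum(axis=1) > 0, 1, -1)   # placeholder labels in {1, -1}

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# C is the upper bound on the dual variables alpha_i; the RBF kernel supplies K(x_i, x_j).
clf = SVC(C=1.0, kernel="rbf")
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```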
4.2. Neural Network with Dropout
Consider a neural network with $L$ hidden layers. Let $z^{(l)}$ denote the vector of inputs into layer $l$, $y^{(l)}$ the vector of outputs from layer $l$ (with $y^{(0)} = x$ the input), and let $w^{(l)}$ and $b^{(l)}$ be the weights and biases at layer $l$. The feed-forward operation can be described as

$$z_i^{(l+1)} = w_i^{(l+1)} y^{(l)} + b_i^{(l+1)},$$

$$y_i^{(l+1)} = f \left( z_i^{(l+1)} \right),$$

where $f$ is the activation function [4]. With dropout, the outputs $y^{(l)}$ are first multiplied elementwise by a vector of independent Bernoulli random variables, so that a random subset of units is temporarily removed at each training step [4].
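As a small illustration of these equations, the NumPy sketch below performs one feed-forward step with dropout; the layer sizes, the ReLU activation, and the dropout rate are assumptions made for the example.

```python
# One feed-forward step with dropout in NumPy; sizes, activation, and rate are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(y_prev, W, b, rate=0.5, training=True):
    """Compute y^(l+1) = f(w^(l+1) y~(l) + b^(l+1)) with a Bernoulli dropout mask."""
    if training:
        r = rng.binomial(1, 1.0 - rate, size=y_prev.shape)  # keep each unit with prob 1 - rate
        y_prev = r * y_prev                                  # thinned outputs y~(l)
    else:
        y_prev = (1.0 - rate) * y_prev                       # scale outputs at test time [4]
    z = W @ y_prev + b                                       # z^(l+1)
    return np.maximum(z, 0.0)                                # f = ReLU (assumed)

y0 = rng.normal(size=6)                  # six input features
W1 = 0.1 * rng.normal(size=(10, 6))      # weights for a 10-unit hidden layer
b1 = np.zeros(10)
print(dropout_forward(y0, W1, b1))
```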
4.3. Autoencoder
Consider an n/p/n autoencoder.
In Figure 4, let $F$ and $G$ denote sets, $n$ and $p$ be positive integers with $0 < p < n$, $\mathcal{A}$ be a class of functions from $G^p$ to $F^n$, and $\mathcal{B}$ be a class of functions from $F^n$ to $G^p$.

Define $X = \{x_1, \ldots, x_m\}$ as a set of training vectors in $F^n$. When there are external targets, let $Y = \{y_1, \ldots, y_m\}$ denote the corresponding set of target vectors in $F^n$. And $\Delta$ is a distortion function (e.g. $L_p$ norm, Hamming distance) defined over $F^n$.

For any $A \in \mathcal{A}$ and $B \in \mathcal{B}$, the input vector $x \in F^n$ becomes the output vector $A \circ B(x) \in F^n$ through the autoencoder. The goal is to find $A \in \mathcal{A}$ and $B \in \mathcal{B}$ that minimize the overall distortion function:

$$\min E(A, B) = \min \sum_{t=1}^{m} E(x_t) = \min \sum_{t=1}^{m} \Delta \left( A \circ B(x_t), x_t \right)$$ [9].
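A minimal sketch of this optimization follows, assuming a linear n/p/n autoencoder and the squared L2 norm as the distortion $\Delta$; the sizes, learning rate, and step count are illustrative assumptions.

```python
# Linear n/p/n autoencoder trained by gradient descent on the squared-error distortion.
# Sizes, learning rate, and step count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 6, 2, 250                 # 0 < p < n, m training vectors
X = rng.normal(size=(m, n))         # placeholder training vectors x_t in F^n

B = 0.1 * rng.normal(size=(p, n))   # encoder B: F^n -> G^p
A = 0.1 * rng.normal(size=(n, p))   # decoder A: G^p -> F^n
lr = 1e-4

for step in range(5000):
    H = X @ B.T                     # codes B(x_t)
    R = H @ A.T - X                 # residuals A∘B(x_t) - x_t
    # E(A, B) = sum_t Delta(A∘B(x_t), x_t) with Delta = squared L2 norm
    gA = 2 * R.T @ H                # gradient of E with respect to A
    gB = 2 * (R @ A).T @ X          # gradient of E with respect to B (through the codes)
    A -= lr * gA
    B -= lr * gB

print("final distortion:", float(((X @ B.T @ A.T - X) ** 2).sum()))
```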
For the decision tree used in our comparison, let the data at node $m$ be $Q$ with $N_m$ samples. A candidate split $\theta$ partitions the data into subsets $Q_{\mathrm{left}}(\theta)$ (with $n_{\mathrm{left}}$ samples) and $Q_{\mathrm{right}}(\theta)$ (with $n_{\mathrm{right}}$ samples), and the quality of the split is computed using an impurity function $H$:

$$G(Q, \theta) = \frac{n_{\mathrm{left}}}{N_m} H \left( Q_{\mathrm{left}}(\theta) \right) + \frac{n_{\mathrm{right}}}{N_m} H \left( Q_{\mathrm{right}}(\theta) \right)$$

The split that minimizes the impurity is selected, $\theta^* = \operatorname{argmin}_\theta G(Q, \theta)$. Then recurse on the subsets $Q_{\mathrm{left}}(\theta^*)$ and $Q_{\mathrm{right}}(\theta^*)$ until reaching the maximum allowable depth, $N_m < \mathrm{min\_samples}$, or $N_m = 1$ [3].
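The split criterion can be illustrated with a short sketch; the toy feature values, the labels, and the choice of Gini impurity for $H$ are assumptions made for the example.

```python
# Evaluating candidate splits with G(Q, theta); toy data, Gini impurity assumed for H.
import numpy as np

def gini(labels):
    """Impurity H(Q) = 1 - sum_k p_k^2 over the class proportions in Q."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    prob = counts / labels.size
    return 1.0 - float(np.sum(prob ** 2))

def split_quality(x, y, threshold):
    """G(Q, theta) = n_left/N_m * H(Q_left(theta)) + n_right/N_m * H(Q_right(theta))."""
    left, right = y[x <= threshold], y[x > threshold]
    return left.size / y.size * gini(left) + right.size / y.size * gini(right)

x = np.array([0.1, 0.4, 0.5, 0.8, 0.9])   # one feature at node m
y = np.array([0, 0, 1, 1, 1])             # class labels
candidates = (x[:-1] + x[1:]) / 2          # midpoints as candidate thresholds
theta_star = min(candidates, key=lambda t: split_quality(x, y, t))
print("theta* =", theta_star)              # argmin_theta G(Q, theta)
```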
5. Experimental Results
The data we use, shown in Table 1, is the Qualitative Bankruptcy database created by A. Martin, J. Uthayakumar, and M. Nadarajan in February 2014 [10]. The attributes are industrial risk, management risk, financial flexibility, credibility, competitiveness, and operating risk.
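As a sketch of the preprocessing, the snippet below loads and encodes the dataset. The file name and the P/A/N attribute codes and B/NB class labels follow the UCI page [10]; the integer encoding itself is an assumption of this example.

```python
# Loading and encoding the Qualitative Bankruptcy data (sketch; file name per UCI [10],
# integer encoding of the qualitative values is an assumption of this example).
import pandas as pd

cols = ["industrial_risk", "management_risk", "financial_flexibility",
        "credibility", "competitiveness", "operating_risk", "class"]
df = pd.read_csv("Qualitative_Bankruptcy.data.txt", header=None, names=cols)

value_map = {"P": 1, "A": 0, "N": -1}              # positive / average / negative
X = df[cols[:-1]].replace(value_map).to_numpy()
y = (df["class"] == "B").astype(int).to_numpy()    # 1 = bankruptcy, 0 = non-bankruptcy
print(X.shape, y.mean())
```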
Table 3. Accuracy of the Neural Network Model with and without Dropout.

variation                              accuracy
without dropout                        0.9867 (loss 0.0462)
with dropout (dropout rate = 0.1)      0.9867 (loss 0.0292)
with dropout (dropout rate = 0.3)      0.9933 (loss 0.0300)
with dropout (dropout rate = 0.4)      0.9933 (loss 0.0401)
with dropout (dropout rate = 0.5)      0.9933 (loss 0.0278)
with dropout (dropout rate = 0.7)      0.9933 (loss 0.0428)
with dropout (dropout rate = 0.8)      0.9867 (loss 0.0318)

Table 4. Accuracy of the Neural Network Model with Two, Three, and Four Layers.

variation                                       accuracy
two layers with dropout (dropout rate = 0.5)    0.9933 (loss 0.0278)

Table 5. Accuracy of the Neural Network Model with Truncate 50 or 100 and with Four Layers.

variation        accuracy
truncate = 50    0.9899
truncate = 100   0.9933

Table 6. Accuracy of the Neural Network Model Compared with SVM and Decision Tree.

model            accuracy
As shown in Table 4 and Table 5, we can conclude that adding layers increases accuracy. Figure 5 and Figure 6 plot the results in Table 5.
6. Conclusions
Support vector machines, neural networks with dropout, and autoencoders are three relatively new models applied to bankruptcy prediction problems. Their accuracies outperform those of the three older models (robust logistic regression, inductive learning, genetic algorithms). The improvements include better control of overfitting, a higher probability of finding the global optimum, and the ability to handle large feature spaces. This paper compared and summarized the progress of machine learning models in bankruptcy prediction, and examined the performance of relatively new models that have rarely been applied in this field.
However, the three models also have drawbacks. SVM does not directly provide probability estimates; obtaining them requires an expensive five-fold cross-validation.
Also, if the data sample is not big enough, especially when it is outnumbered by the number of features, SVM is likely to perform poorly [4]. With dropout, training a neural network takes two to three times longer than training a standard neural network. An autoencoder captures as much information as possible, not necessarily the most relevant information, which can be a problem when the most relevant information makes up only a small percentage of the input. Solutions to these drawbacks are yet to be found.
References
[1] Hauser, R.P. and Booth, D. (2011) Predicting Bankruptcy with Robust Logistic Regression. Journal of Data Science, 9, 565-584.
[2] Kim, M.-J. and Han, I. (2003) The Discovery of Experts' Decision Rules from Qualitative Bankruptcy Data Using Genetic Algorithms. Expert Systems with Applications, 25, 637-646.
[3] Pedregosa, F., et al. (2011) Scikit-Learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
[4] Srivastava, N., et al. (2014) Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 15, 1929-1958.
[5] Dev, D. (2017) Deep Learning with Hadoop. Packt Publishing, Birmingham, 52.
[6] Nielsen, F. (2001) Neural Networks—Algorithms and Applications. https://www.mendeley.com/research-papers/neural-networks-algorithms-applications-5/
[7] Robinson, N. (n.d.) The Disadvantages of Logistic Regression. http://classroom.synonym.com/disadvantages-logistic-regression-8574447.html
[8] Sima, J. (1998) Introduction to Neural Networks. Technical Report No. 755.
[9] Baldi, P. (2012) Autoencoders, Unsupervised Learning, and Deep Architectures. Journal of Machine Learning Research, 27, 37-50.
[10] Martin, A., Uthayakumar, J. and Nadarajan, M. (2014) Qualitative Bankruptcy Data Set, UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets/qualitative_bankruptcy