
An Application of Support Vector Machine to

Companies’ Financial Distress Prediction*

Xiao-Feng Hui and Jie Sun

School of Management, Harbin Institute of Technology, Harbin 150001,
Heilongjiang Province, China
[email protected]
[email protected]

Abstract. Because of the importance of predicting companies' financial distress, this paper applies the support vector machine (SVM) to the early warning of financial distress. Taking listed companies' three-year data before special treatment (ST) as sample data, and adopting cross-validation and grid-search techniques to find good parameters for the SVM model, an empirical study is carried out. By comparing the experiment results of SVM with Fisher discriminant analysis, Logistic regression and back-propagation neural networks (BP-NNs), it is concluded that the financial distress early-warning model based on SVM achieves a better balance among fitting ability, generalization ability and model stability than the other models.

1 Introduction
Bankruptcy of an enterprise not only makes stockholders, creditors, managers, employees, and other interested parties suffer economic losses, but, when many enterprises go bankrupt, the economic development of the whole country is also greatly shocked. In general, most enterprises that went bankrupt had experienced a period of financial distress, but they could not detect the distress at an early stage and take effective measures in time to prevent bankruptcy. Therefore, from the perspective of management, it is important to explore a more effective financial distress prediction model that signals an early warning for enterprises likely to get into financial distress, so that managers can take strategic actions to avoid deterioration of their financial state and bankruptcy. Besides, from the view of financial institutions, an effective financial distress prediction model can help them detect customers with high default risk at an early stage, improving their efficiency of commercial credit assignment [1].
Financially distressed enterprises usually show characteristic symptoms, which can generally be indicated by the data in financial statements and the financial ratios derived from them. With the development of all kinds of classification and prediction techniques, from univariate analysis to multiple discriminant analysis (MDA), and from statistical methods to machine learning methods, the research literature on enterprises' financial distress prediction has become more and more abundant.

* Sponsored by National Natural Science Foundation of China (No. 70573030).

V. Torra et al. (Eds.): MDAI 2006, LNAI 3885, pp. 274 – 282, 2006.
© Springer-Verlag Berlin Heidelberg 2006

Beaver (1966), one of the first researchers to study bankruptcy prediction, investigated the predictability of 14 financial ratios using 158 samples consisting of failed and non-failed firms [2]. Beaver's study was followed by Altman's model (1968), based on MDA, to classify companies into known categories. According to Altman, bankruptcy could be explained quite completely by a combination of five financial ratios (selected from an original list of 22) [3]. The Logit model is widely used for two-class classification problems, and Ohlson was the first to apply it to predicting financial distress, in 1980 [4].
The most widely used machine learning method in the field of financial distress prediction is neural networks (NNs), which have a strong capability of identifying and representing nonlinear relationships in a data set. Odom and Sharda (1990) made an early attempt to use NNs for financial distress prediction. They used the same financial ratios as Altman's study and took the MDA model as the benchmark [5]. From then on, many scholars (Fletcher and Goss, 1993; Carlos Serrano-Cinca, 1996; Parag C. P., 2005; etc.) compared NNs with MDA and Logit, bringing much positive support for the conclusion that NNs can predict financial distress more accurately than those benchmarks [6][7][8].
Generally, statistical methods have the advantages of a simple model structure and ease of understanding and use, but they rest on restrictive assumptions such as linearity, normality and independence of input variables, which limits the effectiveness and validity of prediction. By contrast, NNs are not constrained by those assumptions and have a strong ability to fit nonlinear relationships between descriptive variables and outcome variables. But NNs also have disadvantages, such as an unfixed structure, over-fitting, the need for many samples, and the black-box effect.
The support vector machine (SVM) is a relatively new machine learning technique, originally developed to resolve the local minima and over-fitting problems that are the main sources of trouble for NNs [9], [10], [11]. Shin K.-S. (2005) and Min J. H. (2005) each applied SVM to predicting corporate bankruptcy with Korean data and got satisfying results [12], [13]. Other applications of SVM by Kim K. J. (2003) and Tay F. E. H. et al. (2001) also showed that it is a promising classification and prediction method [14], [15].
This paper applies SVM to predicting the financial distress of Chinese listed companies and compares the results of SVM with those obtained by Fisher discriminant analysis, Logistic regression and NNs. The rest of the paper is divided into five sections. Section 2 briefly describes SVM theory. Section 3 covers data collection and preprocessing. Section 4 gives the modeling process and experiment results. Section 5 discusses and analyzes the experiment results. Section 6 concludes.

2 Theory of SVM
SVM, put forward by Vapnik in the 1990s, is a relatively new machine learning technique developed on the basis of statistical learning theory [9]. Previous research has shown that SVM has the following merits in terms of learning ability and generalization ability.

1) SVM is based on the principle of structural risk minimization rather than empirical risk minimization, so it can better avoid the problem of over-fitting.
2) The SVM algorithm is unlikely to get trapped in local optima, because its training problem is convex and any local optimal solution is also the global optimal solution.
3) In practice, when the number of samples is relatively small, SVM can often get better results than other classification and prediction techniques.
A simple description of the SVM algorithm is provided as follows [9], [10], [11], [12], [13]. Suppose $D = \{x_i, y_i\}_{i=1}^{N}$ is a training data set with input vectors $x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(n)})^T \in \mathbb{R}^n$ and target labels $y_i \in \{-1, +1\}$, in which $N$ is the number of training samples. When the training samples are linearly separable, the SVM algorithm finds an optimal separating hyperplane $w \cdot x + b = 0$ that not only separates the training samples without error but also maximizes the margin width between the two parallel bounding hyperplanes on opposite sides of the separating hyperplane.
In the nonlinearly separable case, SVM first uses a nonlinear function $\Phi(x)$ to map the input space to a high-dimensional feature space. A nonlinear optimal separating hyperplane $w \cdot \Phi(x) + b = 0$ with the largest margin width can then be found by the same technique as in the linear model. The data instances nearest to the separating hyperplane are called support vectors, and the other data instances are irrelevant to the bounding hyperplanes. Because most problems are nonlinearly separable, and the linearly separable case is just the special case $\Phi(x) = x$, only the SVM theory for the nonlinearly separable case is stated here. According to Vapnik's original formulation, the SVM classifier should satisfy the following conditions:

$$\begin{cases} w^T \Phi(x_i) + b \geq +1 & \text{if } y_i = +1 \\ w^T \Phi(x_i) + b \leq -1 & \text{if } y_i = -1 \end{cases} \quad (1)$$

which is equivalent to

$$y_i [w^T \Phi(x_i) + b] \geq 1 \quad (i = 1, \ldots, N) \quad (2)$$

where $w$ is the weight vector and $b$ is the bias.
Because the margin width between the two bounding hyperplanes equals $2 / \|w\|$, the construction of the optimal separating hyperplane with the largest margin width can be defined as the following optimization problem:

$$\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{N} \xi_i$$
$$\text{s.t.} \quad y_i [w^T \Phi(x_i) + b] \geq 1 - \xi_i, \quad \xi_i \geq 0 \quad (i = 1, 2, \ldots, N) \quad (3)$$

in which $\xi_i$ are slack variables. The feature space generally cannot be linearly separated; if the separating hyperplane were constructed perfectly, without even one training error, the over-fitting phenomenon would easily appear, so slack variables are needed to allow a small amount of misclassification. $C \in \mathbb{R}^+$ is a tuning parameter weighting the importance of classification errors against the margin width.
This problem is transformed into its dual problem because the results of the dual problem are easier to interpret than those of the primal one:
$$\max_{\alpha} \; e^T \alpha - \frac{1}{2} \alpha^T Q \alpha$$
$$\text{s.t.} \quad 0 \leq \alpha_i \leq C \; (i = 1, 2, \ldots, N), \quad y^T \alpha = 0 \quad (4)$$

In the optimization problem above, $e$ is the $N$-dimensional vector of all ones, and $Q$ is an $N \times N$ positive semi-definite matrix with $Q_{ij} = y_i y_j K(x_i, x_j)$, in which $K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$ is called the kernel function. The three common types of kernel function are the polynomial, radial basis and sigmoid kernel functions. The $\alpha_i$ are Lagrange multipliers; a multiplier exists for each training data instance, and the data instances corresponding to non-zero $\alpha_i$ are the support vectors. Solving this optimization problem, the ultimate SVM classifier is constructed as

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b \right) \quad (5)$$
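As an illustrative sketch (not from the paper), the decision function of Eq. (5) can be reproduced from the dual coefficients of a trained classifier. The example below uses scikit-learn, whose SVC class wraps the LIBSVM library cited later in the text; the toy data and parameter values are assumptions chosen only for demonstration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Toy two-class data standing in for healthy / distressed companies
X = np.vstack([rng.randn(20, 2) + 2, rng.randn(20, 2) - 2])
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="rbf", C=32.0, gamma=0.25).fit(X, y)

def decision(x):
    # f(x) = sgn( sum_i alpha_i * y_i * K(x, x_i) + b ), Eq. (5).
    # dual_coef_ stores alpha_i * y_i for the support vectors only;
    # all other training points have alpha_i = 0 and drop out of the sum.
    sv = clf.support_vectors_
    k = np.exp(-0.25 * np.sum((sv - x) ** 2, axis=1))  # RBF kernel, gamma = 0.25
    return np.sign(clf.dual_coef_[0] @ k + clf.intercept_[0])

x_new = np.array([1.5, 0.5])
print(decision(x_new) == clf.predict([x_new])[0])  # the manual sum agrees
```

Only the support vectors enter the sum, which is why SVM prediction stays cheap even when the training set is large.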

3 Data Collection and Preprocessing

3.1 Data Collection

The data used in this research was obtained from the China Stock Market & Accounting Research Database. Companies specially treated (ST)¹ by the China Securities Supervision and Management Committee (CSSMC) are considered companies in financial distress, and those never specially treated are regarded as healthy ones. From the data between 2000 and 2005, 135 pairs of companies listed on the Shenzhen Stock Exchange and the Shanghai Stock Exchange were selected as the initial dataset. Taking the year in which a company is specially treated as the benchmark year (t-0), the years (t-1), (t-2) and (t-3) respectively represent one, two and three years before ST. After eliminating companies with missing or outlier data, the final numbers of sample companies are 75 pairs at year (t-1), 108 pairs at year (t-2) and 91 pairs at year (t-3).

¹ The most common reason that Chinese listed companies are specially treated by the CSSMC is that they have had negative net profit in two consecutive years. They will also be specially treated if they deliberately publish financial statements with serious falsehoods and misstatements, but the ST samples chosen in this study are all companies specially treated because of negative net profit in two consecutive years.

3.2 Data Scaling

To avoid features in greater numeric ranges dominating those in smaller numeric ranges, and to avoid numerical difficulties during calculation [16], all data are scaled to the range [-1, 1] according to formula (6):

$$x' = a + \frac{x - \min_x}{\max_x - \min_x} \times (b - a) \quad (6)$$

where $\min_x$ and $\max_x$ are respectively the minimum and maximum values of feature $x$, and $a$ and $b$ are respectively the minimum and maximum values after scaling; here $a = -1$, $b = 1$.
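A minimal sketch of the scaling in formula (6), mapping each feature into [a, b] = [-1, 1]; the function name is illustrative, not from the paper:

```python
def scale_feature(values, a=-1.0, b=1.0):
    """Map a list of feature values into [a, b] by formula (6)."""
    lo, hi = min(values), max(values)
    return [a + (x - lo) / (hi - lo) * (b - a) for x in values]

print(scale_feature([0.0, 5.0, 10.0]))  # [-1.0, 0.0, 1.0]
```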

3.3 Choice of Financial Ratios

Companies at different stages before financial distress usually show different symptoms, indicated by different financial ratios. Unlike other studies, which use the same set of financial ratios to construct predictive models for each year before financial distress, this study, aiming to improve predictive ability, uses a different set of financial ratios for each of the three years before ST. Each year's set was selected from 35 original financial ratios by the statistical method of stepwise discriminant analysis. Based on the sample data, the chosen financial ratio sets for years (t-1), (t-2) and (t-3) are listed in Table 1.
From Table 1, it can be seen that financially distressed companies at stage (t-3) mainly showed abnormality in activity ratios and debt ratios. At stage (t-2) the profitability of financially distressed companies began to evidently differ from that of healthy ones. At stage (t-1), activity ratios, debt ratios, profitability and growth ability all further deteriorated.

Table 1. Financial ratio sets of the three years before ST

(t-1): Total asset turnover; asset-liability ratio; earnings per share; total asset growth rate
(t-2): Accounts payable turnover; current asset turnover; fixed asset turnover; asset-liability ratio; return on total assets; return on current assets; return on equity
(t-3): Current asset turnover; fixed asset turnover; ratio of cash to current liabilities; asset-liability ratio; proportion of current liabilities

4 Model Construction and Experiment Results

4.1 Construction of SVM Model

Constructing the SVM model involves choosing the kernel function and searching for the values of the model parameters. This study uses the radial basis kernel function because it is the most widely used kernel function and usually gives better results than the other kernel functions (see reference [13] for detailed reasons). The radial basis kernel function is given by formula (7):

$$K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2), \quad \gamma > 0 \quad (7)$$
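As a quick sketch, the radial basis kernel of formula (7) can be computed directly; the function below is illustrative, with γ passed as a parameter:

```python
import math

def rbf_kernel(xi, xj, gamma):
    # K(xi, xj) = exp(-gamma * ||xi - xj||^2), gamma > 0, formula (7)
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.25))  # 1.0 (zero distance)
```

The kernel equals 1 when the two vectors coincide and decays toward 0 as their squared distance grows, at a rate controlled by γ.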

Construction of the SVM then requires two parameters to be identified: the tuning parameter $C$ and the kernel parameter $\gamma$. Improper settings of these parameters lead to training problems such as over-fitting or under-fitting. To search for proper values, this study follows the cross-validation and grid-search techniques recommended by Chih-Wei Hsu et al. Trying exponentially growing sequences of $C$ and $\gamma$ ($C = 2^{-5}, 2^{-4}, \ldots, 2^{15}$; $\gamma = 2^{-15}, 2^{-14}, \ldots, 2^{5}$), the search finds a good pair of parameter values that makes the 5-fold cross-validation accuracy highest. Though grid-search is an exhaustive method, its computational time is not much greater than that of more advanced search methods, such as approximating the cross-validation rate [16]. The grid-search on $C$ and $\gamma$ for the SVM model at year (t-1) is shown in Fig. 1.

Fig. 1. Grid-search on C and γ for SVM model at year (t-1)



The good pair of model parameters for year (t-1) is thus $(2^5, 2^{-2})$, i.e. $C = 32$, $\gamma = 0.25$. By the same method, the model parameters for years (t-2) and (t-3) are respectively defined as $(0.5, 1)$ and $(128, 0.0625)$.
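The grid-search with 5-fold cross-validation described above can be sketched as follows. This is an illustrative example on synthetic data (the paper's company data is not public), using scikit-learn's wrapper around LIBSVM; the grid is a subsample of the paper's full $C = 2^{-5} \ldots 2^{15}$, $\gamma = 2^{-15} \ldots 2^{5}$ range to keep the run fast.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.RandomState(1)
# Synthetic two-class data standing in for the financial-ratio samples
X = np.vstack([rng.randn(40, 4) + 1, rng.randn(40, 4) - 1])
y = np.array([1] * 40 + [-1] * 40)

param_grid = {
    "C": [2.0 ** k for k in range(-5, 16, 4)],      # subsample of 2^-5 .. 2^15
    "gamma": [2.0 ** k for k in range(-15, 6, 4)],  # subsample of 2^-15 .. 2^5
}
# 5-fold cross-validation accuracy decides the winning (C, gamma) pair
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The exhaustive grid is trivially parallelizable, which is part of why Hsu et al. recommend it despite its brute-force character.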

4.2 Experiment Results

In order to make a comparative study, Fisher discriminant analysis, Logistic regression and BP-NNs were also carried out in the experiment. The leave-one-out method was used to test the validity of the different models, because leave-one-out accuracy can objectively reflect a model's ability to predict financial distress for companies outside the training samples. If the total number of samples is N, calculating the leave-one-out accuracy requires N rounds of training and testing for each predictive method. SPSS 11.5 was used for Fisher discriminant analysis and Logistic regression. MATLAB 6.5 was used for BP-NNs, whose structures for years (t-1), (t-2) and (t-3) are respectively 4-8-1, 7-11-1 and 5-8-1, with the learning rate set to 0.1. The LIBSVM software developed by Prof. Chih-Jen Lin at National Taiwan University was used for SVM modeling and testing. The experiment results are listed in Table 2.
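The leave-one-out procedure can be sketched as below — an illustrative run on synthetic data rather than the paper's samples, so the accuracy figure is not comparable to Table 2:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

rng = np.random.RandomState(2)
X = np.vstack([rng.randn(25, 3) + 1.5, rng.randn(25, 3) - 1.5])
y = np.array([1] * 25 + [-1] * 25)

# N training/testing rounds: each sample is held out once and predicted
# by a model trained on the remaining N - 1 samples.
correct = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = SVC(kernel="rbf", C=32.0, gamma=0.25).fit(X[train_idx], y[train_idx])
    correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])
loo_accuracy = correct / len(y)
print(f"leave-one-out accuracy: {loo_accuracy:.3f}")
```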

Table 2. The training and testing accuracy of different models at different years

Year   Measure (%)              MDA    Logit   NNs    SVM
(t-1)  Training accuracy        89.5   87.6    92.2   91.5
       Leave-one-out accuracy   88.2   86.9    88.2   88.9
       Drop percentage          1.45   0.80    4.34   2.84
(t-2)  Training accuracy        86.1   86.1    89.6   87.5
       Leave-one-out accuracy   85.2   84.7    84.3   85.6
       Drop percentage          1.05   1.63    5.92   2.17
(t-3)  Training accuracy        77.5   79.1    80.9   80.2
       Leave-one-out accuracy   75.3   77.5    74.2   78.6
       Drop percentage          2.84   2.02    8.28   2.00

5 Discussion and Analysis


From Table 2, it is clear that the predictive ability of each model declines from year (t-1) to year (t-3), which indicates that the nearer to the time when financial distress breaks out, the more information content the financial ratios contain, and hence the stronger each model's predictive ability.
Besides, whichever year it is, SVM has the highest leave-one-out accuracy. Whether judged by training accuracy or leave-one-out accuracy, SVM performs better than Fisher discriminant analysis and Logistic regression. The training accuracy of SVM is a little lower than that of BP-NNs in each year, but its leave-one-out accuracy is higher, which shows that SVM has better generalization ability than BP-NNs and can better avoid over-fitting.
Furthermore, from the point of view of model stability, Fisher discriminant analysis, Logistic regression and SVM have better stability, while BP-NNs has relatively worse stability, judged by the drop percentage, which equals the difference between training accuracy and leave-one-out accuracy divided by training accuracy. Compared with Fisher discriminant analysis and Logistic regression, SVM has worse model stability at years (t-1) and (t-2), but at year (t-3) it performs a little better. Compared with BP-NNs, SVM has much better model stability.
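As a worked check of this definition, the drop percentages reported in Table 2 for year (t-1) follow directly:

```python
def drop_percentage(train_acc, loo_acc):
    # (training accuracy - leave-one-out accuracy) / training accuracy, in %
    return (train_acc - loo_acc) / train_acc * 100

print(round(drop_percentage(91.5, 88.9), 2))  # 2.84, SVM at year (t-1)
print(round(drop_percentage(92.2, 88.2), 2))  # 4.34, BP-NNs at year (t-1)
```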
So SVM achieves a better balance among fitting ability, generalization ability and model stability. By finding the support vectors for financial distress prediction within the training samples, SVM is suitable for predicting financial distress for companies outside the training sample, and it can keep its predictive accuracy relatively stable when the training samples change within a certain range.

6 Conclusion
SVM is a relatively new machine learning and classification technique with a strict theoretical basis. This paper applies SVM to companies' financial distress prediction, building on a statement of its basic theory. In the empirical experiment, three years' data of 135 pairs of Chinese listed companies were selected as the initial sample data, stepwise discriminant analysis was used to select the financial ratio sets, and cross-validation and grid-search techniques were utilized to define good parameters for the SVM model. By comparing the experiment results of the SVM financial distress prediction model with Fisher discriminant analysis, Logistic regression and BP-NNs, it is concluded that SVM achieves a better balance among fitting ability, generalization ability and model stability than the other three models. SVM, which is not only a theoretically sound classifier but also gives satisfying results in empirical application, is therefore a promising method for the practice of financial distress prediction and deserves wider use in the domain of financial decision making.

Acknowledgements
The authors gratefully thank anonymous reviewers for their comments and editors for
their work.

References
1. Gestel T. V., Baesens B., Suykens J. A.: Bayesian Kernel Based Classification for Finan-
cial Distress Detection, Vol. 1. European Journal of Operational Research (2005) 1-2
2. Beaver W.: Financial Ratios as Predictors of Failure. Journal of Accounting Research
(1966) 71-111
3. Altman E. I.: Financial Ratios Discriminant Analysis and the Prediction of Corporate
Bankruptcy, Vol. 23. Journal of Finance (1968) 589–609
4. Ohlson J. A.: Financial Ratios and Probabilistic Prediction of Bankruptcy, Vol. 18. Journal
of Accounting Research (1980) 109-131
5. Odom M., Sharda R.: A Neural Networks Model for Bankruptcy Prediction. Proceedings
of the IEEE International Conference on Neural Network (1990) 163–168
6. Fletcher D., Goss E.: Forecasting with Neural Networks: an Application Using Bankruptcy
Data, Vol. 24. Information and Management (1993) 159–167
7. Carlos Serrano-Cinca: Self Organizing Neural Networks for Financial Diagnosis, Vol. 17.
Decision Support Systems (1996) 227-238
8. Parag C. P.: A Threshold Varying Artificial Neural Network Approach for Classification
and Its Application to Bankruptcy Prediction Problem, Vol. 32. Computers & Operations
Research (2005) 2561-2582
9. Cristianini N., Shawe-Taylor J.: An Introduction to Support Vector Machines and Other
Kernel-based Learning Methods. Cambridge University Press, England (2000)
10. Vapnik V.: The Nature of Statistical Learning Theory. Springer, Berlin (1995)
11. Vapnik V.: Statistical Learning Theory. Wiley, New York (1998)
12. Shin K.-S., Lee T. S., Kim H.-J.: An Application of Support Vector Machines in Bankruptcy Prediction Model, Vol. 28. Expert Systems with Applications (2005) 127–135
13. Min J. H., Lee Y.-C.: Bankruptcy Prediction Using Support Vector Machine with Optimal Choice of Kernel Function Parameters, Vol. 28. Expert Systems with Applications (2005) 128–134
14. Kim K. J.: Financial Time Series Forecasting Using Support Vector Machines, Vol. 55. Neurocomputing (2003) 307–319
15. Tay F. E. H., Cao L.: Application of Support Vector Machines in Financial Time Series Forecasting, Vol. 29. Omega (2001) 309–317
16. Hsu C.-W., Chang C.-C., Lin C.-J.: A Practical Guide to Support Vector Classification. Technical Report. http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
