Classification Models
[Figure: overlapping score distributions for Class 1 and Class 2 along the score axis]
A discriminant function is optimized to minimize the common area of the two distributions
Canonical discriminant functions
A canonical discriminant function is a linear
combination of the discriminating variables which are
formed to satisfy certain conditions
The coefficients for the first function are derived so
that the group means on the function are as different
as possible
The coefficients for the second function are derived to
maximize the difference between group means under
the added condition that values on the second function
are not correlated with the values on the first function
A third function is defined in a similar way, having coefficients that maximize the group differences while being uncorrelated with the previous functions, and so on
The maximum number of unique functions is
Min(Groups − 1, Number of discriminating variables)
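As a quick sanity check, the bound can be expressed in a line of code (a trivial sketch; the function name is ours):

```python
# Number of unique canonical discriminant functions:
# min(number of groups - 1, number of discriminating variables)
def max_canonical_functions(n_groups: int, n_variables: int) -> int:
    return min(n_groups - 1, n_variables)

# Two groups (survived/failed) and nine predictor ratios -> a single function
print(max_canonical_functions(2, 9))  # -> 1
```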
Fisher’s Discriminant functions
T = W + B, i.e. W = T − B, where T, W and B are the total, within-groups and between-groups sums of squares and cross-products matrices
Multivariate Wilks' lambda statistic: Λ = |W| / |T|
[Figure: density of the F(1,64) distribution with the 5% critical value 3.99 marked; the observed F-values for the three ratios CA/TA, (CA-Cash)/TA and CA/Loans (0.027, 0.981 and 2.072) all fall below it]
Tests of equality of group means
The tests of equality of the group means indicate that
for the first three predictor variables there does not
seem to be any significant difference in group means
F-values < 3.99, the 5% critical value for F(1,64)
Significance > 0.05
The result is confirmed by the Wilks' lambda values
close to 1
As the results indicate low univariate discriminant
power for these variables, some or all of them may be
excluded from analysis in order to get a parsimonious
model
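The univariate screening above can be sketched as a per-variable one-way ANOVA; the group samples below are simulated stand-ins, not the Spanish banking data:

```python
# Univariate screening of predictors: a one-way ANOVA F-test per variable,
# as in the "tests of equality of group means" table. Data below are
# hypothetical; with 2 groups and 66 banks the reference distribution
# would be F(1, 64).
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
survived = rng.normal(0.40, 0.10, size=37)   # e.g. CA/TA for survivors
failed = rng.normal(0.45, 0.10, size=29)     # e.g. CA/TA for failed banks

F, p = f_oneway(survived, failed)
# Keep the variable only if F exceeds the 5% critical value (3.99 here)
print(F, p, F > 3.99)
```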
Pooled Within-Groups Correlation Matrix

           CA/TA  (C-C)/TA  CA/Loa  Res/Loa   NI/TA  NI/TEC  NI/Loa  CS/Sal  CF/Loa
CA/TA      1.000
(C-C)/TA    .760     1.000
CA/Loa      .917      .641   1.000
Res/Loa     .013     -.230    .099    1.000
NI/TA       .038     -.007    .058     .174   1.000
NI/TEC     -.023     -.016   -.035     .033    .956   1.000
NI/Loa      .048     -.015    .072     .194    .999    .947   1.000
CS/Sal     -.087     -.147   -.104    -.288   -.565   -.419   -.570   1.000
CF/Loa     -.007     -.013    .014     .116    .223    .181    .225   -.372   1.000
Correlations between predictor variables
The variables Current assets/Total assets and Current assets/Loans are highly correlated (Corr = 0.917)
The variables explain the same variation in the data
Including both variables in the discriminant function does not improve the explanatory power but may lead to a multicollinearity problem in estimation
Only one of the variables should be selected into the set of explanatory variables
For the same reason, only one of the variables Net income/Total assets, Net income/Total equity capital and Net income/Loans should be selected
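A minimal pruning rule along these lines, using the correlations quoted above (the helper `prune` and the 0.9 threshold are our choices):

```python
# A simple rule for pruning near-duplicate predictors: if two variables
# correlate above a threshold, keep only the first. Variable names follow
# the deck; the correlations are the relevant entries of the pooled
# within-groups matrix above.
corr = {
    ("CA/TA", "CA/Loans"): 0.917,
    ("NI/TA", "NI/TEC"): 0.956,
    ("NI/TA", "NI/Loans"): 0.999,
}

def prune(variables, corr, threshold=0.9):
    kept = []
    for v in variables:
        # keep v only if it is not highly correlated with any kept variable
        if all(abs(corr.get((k, v), corr.get((v, k), 0.0))) < threshold for k in kept):
            kept.append(v)
    return kept

print(prune(["CA/TA", "CA/Loans", "NI/TA", "NI/TEC", "NI/Loans"], corr))
# -> ['CA/TA', 'NI/TA']
```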
Summary of Canonical Discriminant Functions

Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          .417a        100.0           100.0          .542
a. First 1 canonical discriminant functions were used in the analysis.

Wilks' Lambda
Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .706            20.899       8    .007
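The chi-square value is consistent with Bartlett's approximation, which can be checked directly (assuming n = 66 banks, p = 8 predictors and g = 2 groups, as in the case data):

```python
# Bartlett's chi-square approximation for testing Wilks' lambda:
# chi2 = -(n - 1 - (p + g)/2) * ln(Lambda), with df = p*(g-1) = 8
import math

n, p, g = 66, 8, 2
wilks_lambda = 0.706
chi2 = -(n - 1 - (p + g) / 2) * math.log(wilks_lambda)
print(round(chi2, 3))  # close to the reported 20.899
```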
Canonical Discriminant Function Coefficients

                  Function 1
                  Standardized   Unstandardized
CA/TA             -1.318         -11.825
(CA-Cash)/TA        .625           6.940
CA/Loans            .612           4.601
Reserves/Loans     -.228          -5.510
NI/TA              1.134          85.998
NI/TEC            -1.264          -4.456
CofS/Sales          .780           5.884
CF/Loans           -.180          -7.864
Constant                          -3.957

The standardized coefficients give the relative contribution of each variable to the discriminant function
Functions at group centroids

Class   Function 1
0       -.563
1        .718

Unstandardized canonical discriminant functions evaluated at group means
Example of classifying an observation by the canonical discriminant function

Obs. 1          Value     Coeff.     Score
Constant                  -3.957    -3.957
CA/TA           0.4611   -11.825    -5.453
(CA-Cash)/TA    0.3837     6.940     2.663
CA/Loans        0.4894     4.601     2.252
Res/Loans       0.0077    -5.510    -0.042
NI/TA           0.0057    85.998     0.490
NI/TEC          0.0996    -4.456    -0.444
CofS/Sales      0.8799     5.884     5.177
CF/Loans        0.0092    -7.864    -0.072
Total Score                          0.614

The score 0.614 is closer to the group centroid for Group 1 (Failed), 0.718, than to the group centroid for Group 0 (Survived), -0.563, so the observation is classified into the closest group: Failed
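The same computation in a few lines of Python, using the unstandardized coefficients and centroids from the slides:

```python
# Reproducing the classification of observation 1 with the unstandardized
# coefficients and group centroids from the slides.
ratios = [0.4611, 0.3837, 0.4894, 0.0077, 0.0057, 0.0996, 0.8799, 0.0092]
coeffs = [-11.825, 6.940, 4.601, -5.510, 85.998, -4.456, 5.884, -7.864]
constant = -3.957
centroids = {"Survived (0)": -0.563, "Failed (1)": 0.718}

score = constant + sum(x * b for x, b in zip(ratios, coeffs))
print(round(score, 3))  # 0.614, as in the slide

# Assign to the group whose centroid is closest to the score
group = min(centroids, key=lambda g: abs(score - centroids[g]))
print(group)  # Failed (1)
```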
Fisher’s discriminant function
coefficients
Survived Failed
                 Predicted class
                 Survived        Failed
True   Survived  28 (75.7 %)      9 (24.3 %)
class  Failed     4 (13.8 %)     25 (86.2 %)
Summary of classifications with different
classification methods (Estimation sample)
Error types
Classifying a survivor as failed
(4) \; p(i \mid j) = \int_{R_i} f_j(x)\,dx

Probability of correct classification

(6) \; F_j = 1 - e_j = 1 - \sum_{i=1,\, i \neq j}^{k} \int_{R_i} p_j f_j(x)\,dx = \int_{R_j} p_j f_j(x)\,dx
The maximization problem for optimal allocation

We obtain the last equality because

\sum_{i=1,\, i \neq j}^{k} p(i \mid j) = \sum_{i=1}^{k} p(i \mid j) - p(j \mid j) = 1 - p(j \mid j)

(9) \; R_1 = \left\{ x : \frac{f_1(x)}{f_2(x)} \ge \frac{p_2}{p_1} \right\}, \quad R_2 = \left\{ x : \frac{f_1(x)}{f_2(x)} < \frac{p_2}{p_1} \right\}
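Rule (9) can be tried numerically; the two normal densities and the priors below are hypothetical, chosen only to illustrate the likelihood-ratio comparison:

```python
# A numerical check of rule (9): assign x to R1 when the likelihood ratio
# f1(x)/f2(x) is at least the prior ratio p2/p1. Densities and priors
# here are hypothetical.
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

p1, p2 = 0.6, 0.4                        # prior probabilities
f1 = lambda x: normal_pdf(x, 0.0, 1.0)   # group 1 density
f2 = lambda x: normal_pdf(x, 2.0, 1.0)   # group 2 density

def classify(x):
    return 1 if f1(x) / f2(x) >= p2 / p1 else 2

print(classify(0.5), classify(1.8))  # points near each group mean
```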
Let

\tilde I = \int_{\tilde R_1} p_1 f_1(x)\,dx + \int_{\tilde R_2} p_2 f_2(x)\,dx, \qquad I = \int_{R_1} p_1 f_1(x)\,dx + \int_{R_2} p_2 f_2(x)\,dx

\tilde R_1 = (\tilde R_1 \cap R_1) \cup (\tilde R_1 \cap R_2)
\tilde R_2 = (\tilde R_2 \cap R_1) \cup (\tilde R_2 \cap R_2)

We can therefore write (10) as

(12) \; \tilde I = \int_{\tilde R_1 \cap R_1} p_1 f_1(x)\,dx + \int_{\tilde R_1 \cap R_2} p_1 f_1(x)\,dx + \int_{\tilde R_2 \cap R_1} p_2 f_2(x)\,dx + \int_{\tilde R_2 \cap R_2} p_2 f_2(x)\,dx
The optimal partitioning – Proof…
But the more we shift the critical level FRc to the right, the more often we will accept H0 even if it is false: there will be firms in our clientele that should not be there
These firms are distressed, even though we have failed to detect this because of the high FRc. This latter error is denoted Type II
Because of the high FRc the test has low power: the probability of failing to reject a false null hypothesis is unduly high
The probabilities of Type I and Type II errors depend on the significance level α, the properties of the test statistic (here: FR) and the statistical properties of the database
Statistical experts warn against slavish use of the standard Type I significance test in a statistical context.
Other techniques in financial classification
Logistic regression
The recursive partitioning algorithm (RPA)
Mathematical programming
Linear programming models
Quadratic programming models
Neural network classifiers
Logistic Regression
Logistic regression is part of a category of statistical
models called generalized linear models
Whereas discriminant analysis can only be used with
continuous independent variables, logistic regression
allows one to predict a discrete outcome, such as
group membership, from a set of variables that may be
continuous, discrete, dichotomous, or a mix of any of
these
Generally, the dependent or response variable is
dichotomous, such as presence/absence or
success/failure.
Logistic Regression...
The Model:

\pi(x) = \frac{e^{\alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n}}{1 + e^{\alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n}}

where \alpha is the constant of the equation and the \beta s are the coefficients of the predictor variables

An alternative form of the logistic regression equation is:

\mathrm{logit}[\pi(x)] = \log\!\left[\frac{\pi(x)}{1 - \pi(x)}\right] = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n
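Both forms can be evaluated directly; the coefficient values below are hypothetical:

```python
# The two forms of the logistic model: probability via the logistic
# function, and its logit (log-odds), which is linear in the predictors.
# Coefficients alpha and betas are hypothetical.
import math

alpha = -1.0
betas = [0.8, -0.5]

def prob(x):
    z = alpha + sum(b * xi for b, xi in zip(betas, x))
    return math.exp(z) / (1 + math.exp(z))

def logit(p):
    return math.log(p / (1 - p))

x = [2.0, 1.0]
p = prob(x)                  # pi(x)
print(round(p, 4))
print(round(logit(p), 4))    # recovers alpha + beta1*x1 + beta2*x2 = 0.1
```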
Logistic Regression...
Stepwise regression
Variables are entered into the model in the order specified by the researcher, or logistic regression can test the fit of the model after each coefficient is added or deleted
Used in the exploratory phase of research where no a priori assumptions regarding the relationships between the variables are made; the goal is then to discover relationships
[Figure: empirical cumulative distributions F1(x1) and F2(x1) plotted over the nine cases ordered by x1; maximum absolute difference D(x1*) = 0.6]
Recursive Partitioning Algorithm, an
example...
[Figure: empirical cumulative distributions F1(x2) and F2(x2) plotted over the nine cases ordered by x2; maximum absolute difference D(x2*) = 0.8]
Recursive Partitioning Algorithm, an
example...
The maximum value of the absolute difference
between the cumulative distributions is now 0.8,
corresponding to value x2 = 6
Thus the best discrimination based on variable x2 is
achieved by assigning the five cases with x2 less than
or equal to 6 into class 2 and the other four cases into
class 1.
By this partitioning, only one of the nine cases is
misclassified, i.e. variable x2 is superior to variable x1,
in terms of univariate discrimination power.
Recursive Partitioning Algorithm, an
example...
Mathematically, the best univariate discriminator is
found by comparing the maximum distances D(x1) and
D(x2) and selecting the variable with the maximum
D(xj)
As the maximum D(xj) is
Max(D(x1), D(x2)) = Max(0.6, 0.8) = 0.8 = D(x2)
x2 is the variable with the greatest univariate
discrimination power and the first splitting is done in
the way suggested by the second predictor variable
Recursive Partitioning Algorithm, an
example...
As one of the two subgroups contains cases from both
classes, an additional partitioning of the subgroup
consisting of observations 4, 6, 7, 8 and 9 is possible
The maximum distance in this second partitioning is
1.0 corresponding to value x1 = 2
The optimal partitioning now is to assign the case with
x1 equal to 2 into class 1 and the other four cases into
class 2
All the nine cases are now correctly assigned in pure
classes
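The split criterion can be sketched as follows; D(x) is the maximum absolute difference between the classes' empirical cumulative distributions, and the data below are hypothetical rather than the nine cases of the example:

```python
# RPA split selection for one variable: evaluate the absolute difference
# between the two classes' empirical cumulative distributions at every
# candidate cut-off and return the maximum D and the cut-off attaining it.
def best_split(class1_values, class2_values):
    cuts = sorted(set(class1_values) | set(class2_values))
    def ecdf(values, t):
        return sum(v <= t for v in values) / len(values)
    d, cut = max(
        (abs(ecdf(class1_values, t) - ecdf(class2_values, t)), t) for t in cuts
    )
    return d, cut

# Hypothetical data: class 1 mostly low values, class 2 mostly high
print(best_split([1, 2, 3, 7], [4, 5, 6, 8, 9]))  # -> (0.75, 3)
```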
Recursive Partitioning Algorithm, an
example... The decision tree
X2
├─ x2 ≤ 6 → split on X1
│    ├─ x1 ≤ 2 → Class 1
│    └─ x1 > 2 → Class 2
└─ x2 > 6 → Class 1
The Linear Programming classification
model by Freed and Glover (1981)
Given observations xi and groups Gj, find the linear
transformation a, and the appropriate boundaries bjL
and bjU, to 'properly' categorize each xi
Bounds bjL and bjU represent respectively the lower
and upper boundaries for points assigned to group j.
Thus the task is to determine a linear predicting or
weighting scheme a and breakpoints bjL and bjU, such
that
bjL ≤ xka ≤ bjU ⇔ xk ∈ Gj
and
b1L < b1U < b2L < b2U < ... < bgU
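One way to cast this as a solvable LP (a sketch of a minimize-sum-of-deviations variant, with the cut-off normalized to 1 and hypothetical data; not necessarily Freed and Glover's exact formulation):

```python
# A Freed-Glover-style LP classifier for two groups: choose weights a so
# that group-1 scores fall at or below a cut-off (fixed at 1 here as a
# normalization) and group-2 scores at or above it, minimizing the total
# boundary violation. Data are hypothetical and linearly separable.
import numpy as np
from scipy.optimize import linprog

g1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # group 1 observations
g2 = np.array([[2.0, 2.0], [3.0, 2.0], [2.0, 3.0]])   # group 2 observations
n1, n2, p = len(g1), len(g2), g1.shape[1]
n = n1 + n2

# Decision variables: [a_1..a_p, d_1..d_n]; minimize the sum of deviations d
c = np.concatenate([np.zeros(p), np.ones(n)])

# Group 1: x.a - d_i <= 1 ; group 2: -x.a - d_i <= -1 (i.e. x.a + d_i >= 1)
A = np.zeros((n, p + n))
A[:n1, :p] = g1
A[n1:, :p] = -g2
A[np.arange(n), p + np.arange(n)] = -1.0
b = np.concatenate([np.ones(n1), -np.ones(n2)])

bounds = [(None, None)] * p + [(0, None)] * n   # a free, d >= 0
res = linprog(c, A_ub=A, b_ub=b, bounds=bounds)
print(res.status, round(res.fun, 6))  # separable data: zero total deviation
```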
The Linear Programming classification model
by Freed and Glover (1981) …
[Figure: feed-forward neural network classifier with the predictor variables x1, x2, x3, x4 as the input layer, connected to an output layer]
3. Case: Bankruptcy prediction in
the Spanish banking sector
Reference: Olmeda, Ignacio and Fernández, Eugenio:
"Hybrid classifiers for financial multicriteria decision
making: The case of bankruptcy prediction",
Computational Economics 10, 1997, 317-335.
Sample: 66 Spanish banks
37 survivors
29 failed
Sample was divided into two sub-samples
Estimation sample, 34 banks, for estimating the model
parameters
Holdout sample, 32 banks, for validating the results
Case: Bankruptcy prediction in the
Spanish banking sector
Input variables
Current assets/Total assets
(Current assets-Cash)/Total assets
Current assets/Loans
Reserves/Loans
Net income/Total assets
Net income/Total equity capital
Net income/Loans
Cost of sales/Sales
Cash flow/Loans
Empirical results

                 Predicted class
                 Survived        Failed
True   Survived  17 (94.44 %)     1 (5.56 %)
class  Failed     3 (21.43 %)    11 (78.57 %)
Summary of classifications
(Estimation sample)
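The percentages in the results table can be recomputed from the raw counts:

```python
# Per-class correct-classification rates from the confusion matrix above
confusion = {("Survived", "Survived"): 17, ("Survived", "Failed"): 1,
             ("Failed", "Survived"): 3, ("Failed", "Failed"): 11}

for true_class in ("Survived", "Failed"):
    total = confusion[(true_class, "Survived")] + confusion[(true_class, "Failed")]
    correct = confusion[(true_class, true_class)]
    print(true_class, f"{100 * correct / total:.2f} %")
# Survived 94.44 % ; Failed 78.57 %
```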
Error types
Classifying a survivor as failed
Classifying a failed as survivor
Many methods may be calibrated to take into account
the relative severity of the two types of errors
Fisher’s discriminant function
coefficients
Survived Failed