STQS4113 Set1 Sem 1 2020/2021: Discriminant Analysis in Banking Data
APPLIED MULTIVARIATE ANALYSIS (MULTIVARIAT GUNAAN)
GROUP PROJECT
THEME
DISCRIMINANT ANALYSIS IN BANKING DATA
LECTURER
DR. ZAMIRA HASANAH BINTI ZAMZURI
GROUP MEMBERS
INTRODUCTION
(... continued)

Title: Evaluating the Likelihood of Using Linear Discriminant Analysis as A Commercial Bank Card Owners Credit Scoring Model
Objectives:
- To evaluate and test the statistical technique of discriminant analysis on credit card customers' data of a Greek commercial bank and examine whether it is possible to create a model evaluating the credibility of prospective credit-card customers.
- To determine whether there exists a relationship between the on-time payments of credit card owners of a Commercial Bank and their demographic characteristics.
Focus: Bank card owners credit scoring model
Author(s): Mylonakis & Diacogiannis (2010)

Title: Discriminant Analysis of Seasonal Agricultural Loan Repayment by Small-scale Farmers in Transkei
Objective:
- To identify the factors associated with seasonal loan success or default amongst small-scale farm clients of the Agricultural Bank of Transkei in 1991.
Focus: Loan repayment
Author(s): Lugemwa & Darroch (1995)
METHODOLOGY
DATA
In Titus and Dennis (2014), data were collected through a questionnaire. A total of 1,100 respondents were selected from the Nigerian banking public using quota sampling; however, only 594 questionnaires were used for analysis. The questionnaire covered seven independent variables, each measured by several items on five-point Likert scales: service quality, security, perceived risk, input factor, price, service product characteristics and individual factors. Demographic variables and the use of Hi-Tech banking services were also included. The dependent variable, frequency of use of Hi-Tech banking services, was measured with a frequency-type question whose responses were separated into two categories so that discriminant analysis could be applied.

In Ramayah, Fauziah and Koay (2006), data were likewise collected through questionnaires. A total of 230 respondents, customers of various banks operating in Penang, were selected at various locations using convenience sampling, a non-probability method. Of the 194 questionnaires collected, only 180 were used for analysis because some were incomplete. The questionnaire covered six variables: prior experience using the internet, perceived usefulness, perceived ease of use, security, awareness of services and benefits, and price. Demographic variables of users and non-users of internet banking were also included. The sample was randomly divided into an analysis sample and a holdout sample in a 65:35 ratio. Both the electronic banking and the internet banking studies include security and price factors to investigate whether these contribute to the classification.
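A random analysis/holdout split like the 65:35 one used by Ramayah, Fauziah and Koay (2006) can be sketched in Python as below. This is a minimal illustration that assumes simple (unstratified) random assignment and rounding to the nearest whole case; the function name `split_analysis_holdout`, the seed and the respondent IDs are hypothetical.

```python
import random

def split_analysis_holdout(cases, analysis_share=0.65, seed=None):
    """Randomly split cases into an analysis sample and a holdout sample.

    analysis_share=0.65 mirrors a 65:35 ratio. Simple random assignment
    and rounding to the nearest case are assumptions of this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(cases)            # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    n_analysis = round(len(shuffled) * analysis_share)
    return shuffled[:n_analysis], shuffled[n_analysis:]

# Hypothetical respondent IDs 1..180 (one per usable questionnaire)
analysis, holdout = split_analysis_holdout(range(1, 181), seed=42)
print(len(analysis), len(holdout))  # 117 63
```

The same function with `analysis_share=0.8` would reproduce a 400/100 split of 500 cases.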
In Mbanasor and Nto (2009), data were collected from both primary and secondary sources using two sets of structured questionnaires and an information schedule, covering both the banks and the livestock farmers who were granted credit in the 2006 farming season. A total of 62 credit beneficiaries were randomly selected from the Aba, Ohafia and Umuahia zones, and 13 independent variables were used in the analysis.

In Nguyen, Do and Nguyen (2017), data were collected from credit officers in commercial banks in Vietnam. Of the records for 550 consumer credit customers, only those of 500 borrowers were used for analysis. The predictor variables are age, number of dependents, years at present job (YAPJ), salary and loan amount. The 500 customers were randomly divided into two groups: an analysis sample of 400 customers, used to estimate the discriminant function, and a holdout sample of 100 customers, used to check the validity of the model.
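To make the analysis/holdout logic concrete, the sketch below estimates Fisher's two-group linear discriminant function on an analysis sample and then measures the hit ratio (share of correctly classified cases) on a holdout sample. It is written in plain Python for self-containment and assumes equal prior probabilities and misclassification costs, so the cut-off is the midpoint of the two projected group means. The data, the two predictors and the function names (`fit_fisher_lda`, `classify`) are hypothetical and are not taken from any of the reviewed studies.

```python
from statistics import mean

def _solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_fisher_lda(x1, x2):
    """Estimate Fisher's two-group linear discriminant function.

    x1, x2: lists of predictor vectors for group 1 and group 2.
    Returns (weights, cutoff), assuming equal priors and costs.
    """
    p = len(x1[0])
    m1 = [mean(row[j] for row in x1) for j in range(p)]
    m2 = [mean(row[j] for row in x2) for j in range(p)]
    # Pooled within-group covariance matrix
    s = [[0.0] * p for _ in range(p)]
    for group, m in ((x1, m1), (x2, m2)):
        for row in group:
            d = [row[j] - m[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    dof = len(x1) + len(x2) - 2
    s = [[v / dof for v in row] for row in s]
    # Discriminant weights: w = S^-1 (m1 - m2)
    w = _solve(s, [a - b for a, b in zip(m1, m2)])
    # Cut-off at the midpoint of the projected group means
    cutoff = sum(wj * (a + b) / 2 for wj, a, b in zip(w, m1, m2))
    return w, cutoff

def classify(w, cutoff, x):
    """Return 1 if the discriminant score exceeds the cut-off, else 2."""
    score = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if score > cutoff else 2

# Hypothetical analysis sample: [years at present job, dependents]
group1 = [[8, 1], [10, 2], [12, 1], [9, 0], [11, 2]]   # e.g. good borrowers
group2 = [[2, 4], [3, 3], [1, 5], [4, 4], [2, 3]]      # e.g. bad borrowers
w, cutoff = fit_fisher_lda(group1, group2)

# Hypothetical holdout sample: (predictors, true group)
holdout = [([10, 1], 1), ([3, 4], 2), ([7, 2], 1), ([2, 2], 2), ([5, 1], 2)]
hits = sum(classify(w, cutoff, x) == g for x, g in holdout)
hit_ratio = hits / len(holdout)
print(hit_ratio)  # 0.8
```

Evaluating on cases that were not used to estimate the weights is what distinguishes the holdout hit ratio from the (usually optimistic) analysis-sample hit ratio.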
The total sample used by Mylonakis and Diacogiannis (2010) consists of the personal data and payment consistency records of 1,767 customers of a Greek commercial bank (X-BANK) over a five-year period. The first 2.5-year sub-period was used to establish the model with a sample of 829 customers, 318 of whom were inconsistent payers; the second 2.5-year sub-period was used to validate the model with a sample of 938 customers, 360 of whom were inconsistent payers. The dependent variable distinguishes consistent from inconsistent payments, and there are 14 explanatory variables: gender, marital status, dependants, date of birth, home ownership or renting, years living in own property, profession code, car ownership, insurance holding, bills sent to home, automatic payment of bills, C card, X-BANK staff and bank group staff.

The data used by Lugemwa and Darroch (1995) are seasonal input loan applications from 38 representative borrowers who received credit for the 1990/91 agricultural season from the Agricultural Bank of Transkei. The variables include loan approval date, previous loan history, age of the borrower, gender of the borrower, farmer status, agricultural training of the borrower, farm size, agro-ecological classification, assets-to-liabilities ratio and distance from the borrower's residence to the nearest Agricultural Bank in Transkei.

These studies show that the data may be separated into a training set and a testing set before discriminant analysis is applied.
METHOD
Titus and Dennis (2014) applied Pearson correlation, Cronbach's alpha reliability analysis, the Durbin-Watson statistic and factor analysis before using the canonical discriminant function. Ramayah, Fauziah and Koay (2006) used linear discriminant analysis (LDA). These two studies therefore allow a comparison of two different discriminant analysis approaches, applied to electronic banking and to internet banking respectively. Mbanasor and Nto (2009) used a canonical discriminant function to derive the linear combination of socio-economic characteristics that best discriminated creditworthiness potential. The squared canonical correlation, Wilks' lambda with its associated chi-square statistic, and the percentage of livestock credit beneficiaries correctly classified into their groups were used to test the significance of the discriminant function.
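The chi-square statistic associated with Wilks' lambda is commonly obtained from Bartlett's approximation, chi-square = -[n - 1 - (p + g)/2] ln(lambda) with p(g - 1) degrees of freedom, where n is the number of cases, p the number of predictors and g the number of groups. The sketch below assumes this approximation (the reviewed study does not state which formula it used). With two groups there is a single discriminant function, and lambda equals 1 minus the squared canonical correlation. The inputs are hypothetical: n = 62 and p = 13 echo the data description above, but the squared canonical correlation of 0.5 is invented for illustration.

```python
import math

def bartlett_chi_square(wilks_lambda, n, p, g):
    """Bartlett's chi-square approximation for testing Wilks' lambda.

    n: number of cases, p: number of predictors, g: number of groups.
    Returns (chi_square, degrees_of_freedom).
    """
    if not 0 < wilks_lambda <= 1:
        raise ValueError("Wilks' lambda must lie in (0, 1]")
    chi_sq = -(n - 1 - (p + g) / 2) * math.log(wilks_lambda)
    return chi_sq, p * (g - 1)

# Two groups => one discriminant function, so lambda = 1 - squared canonical correlation
r_c_squared = 0.5                 # hypothetical value
lam = 1 - r_c_squared
chi_sq, df = bartlett_chi_square(lam, n=62, p=13, g=2)
print(round(chi_sq, 2), df)  # 37.08 13
```

The statistic is then compared with the chi-square critical value (or converted to a p-value) for the stated degrees of freedom; a smaller lambda gives a larger statistic and stronger evidence that the groups differ.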
Lugemwa and Darroch (1995) used linear discriminant analysis (LDA) to estimate the factors associated with seasonal loan success or default, and Wilks' lambda to show the proportion of the total variance in the discriminant scores not explained by differences among groups. For credit scoring, two different discriminant analysis approaches were thus also carried out. These studies also show that correlation can aid the selection of variables for discriminant analysis, while Wilks' lambda can be used to test the outcome of the analysis.
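The interpretation of Wilks' lambda as the share of variance in the discriminant scores not explained by group differences can be checked directly: for a single discriminant function, lambda is the within-group sum of squares of the scores divided by their total sum of squares, and 1 minus lambda is the squared canonical correlation. The sketch below assumes one function (two groups); the function name and the scores are hypothetical.

```python
def wilks_lambda_from_scores(scores_by_group):
    """Wilks' lambda for one discriminant function: within-group sum of
    squares of the scores divided by their total sum of squares."""
    all_scores = [s for group in scores_by_group for s in group]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    ss_within = 0.0
    for group in scores_by_group:
        m = sum(group) / len(group)
        ss_within += sum((s - m) ** 2 for s in group)
    return ss_within / ss_total

# Hypothetical discriminant scores for failed and successful loans
failed = [-1.5, -0.5, -1.0]
successful = [0.5, 1.5, 1.0]
lam = wilks_lambda_from_scores([failed, successful])
print(round(lam, 4), round(1 - lam, 4))  # 0.1429 0.8571
```

A value such as the 0.72 reported by Lugemwa and Darroch (1995) would therefore mean that about 72% of the score variance is still within-group variation.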
FINDING
Titus and Dennis (2014) found a strong positive Pearson correlation between most of the variables and, based on the Durbin-Watson statistic, no autocorrelation in any variable. After factor analysis was applied, Cronbach's alpha for all items and for each individual variable fell within the acceptable limit. Based on the test of equality of group means, all seven variables contributed to the classification on a univariate basis. Wilks' lambda for the canonical discriminant function showed that the discriminant analysis was dependable, and 65.7% of the original grouped cases were correctly classified. Since both the maximum chance criterion and the proportional chance criterion were lower than 65.7%, the discriminant analysis was judged reliable. The seven variables were significant in classifying retail bank customers as users or non-users of electronic banking. Higher service quality, well-developed service products, higher user input, individual factors such as awareness, expected security of operation, lower perceived risk and lower prices paid by customers positively influence consumer involvement in electronic banking.
banking service and benefits and also had less security concern as compared to non-
users of internet banking.
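The maximum and proportional chance criteria used by Titus and Dennis (2014) can be computed from the group sizes alone: the maximum chance criterion is the share of the largest group (the accuracy of assigning everyone to it), and the proportional chance criterion is the sum of squared group shares (the expected accuracy of assigning cases at random in proportion to group size). The sketch below assumes these standard definitions; the group sizes are hypothetical (chosen only to sum to 594), while 65.7% is the hit ratio reported above.

```python
def chance_criteria(group_sizes):
    """Return (maximum chance criterion, proportional chance criterion)."""
    n = sum(group_sizes)
    shares = [size / n for size in group_sizes]
    c_max = max(shares)                          # everyone to the largest group
    c_pro = sum(share ** 2 for share in shares)  # random assignment by group share
    return c_max, c_pro

# Hypothetical split of 594 respondents into users and non-users
c_max, c_pro = chance_criteria([300, 294])
hit_ratio = 0.657
print(hit_ratio > c_max and hit_ratio > c_pro)  # True
```

When the groups are nearly equal in size, both criteria sit close to 50%, so a hit ratio must be clearly above one half to show that the function adds information.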
The classification accuracy of the linear discriminant analysis for the internet banking data is higher than that of the canonical discriminant analysis for the electronic banking data. In addition, the internet banking analysis used two data sets, an analysis sample and a holdout sample, whereas the electronic banking analysis used the whole data set. This suggests that separating the data into training and testing sets may yield a more precise model. Ramayah, Fauziah and Koay (2006) found that most individuals who refuse to use internet banking do so because of concerns about security and privacy. This agrees with the hypothesis accepted in the electronic banking analysis, namely that the expected security of operation influences consumer involvement in electronic banking. Users of both electronic banking and internet banking were also more aware of these banking services.
correlation between age and years at present job, with a coefficient of 0.770, which is higher than the 0.7 threshold. Therefore, age was removed from the discriminant function. Based on the test of equality of group means, YAPJ and the number of dependents may best discriminate between the two groups of borrowers. The squared canonical correlation of 0.4956 indicates that 49.56% of the variance in the dependent variable is explained by the estimated discriminant function. Wilks' lambda of 0.505 showed that the computed discriminant function is statistically significant (significance reported as 0.000). The estimated discriminant function was therefore significant at the 1% level and could forecast financial health with 72.3% accuracy.

The studies by Titus and Dennis (2014) and Nguyen, Do and Nguyen (2017) differ in how they handle multicollinearity. In Titus and Dennis (2014), although the Pearson correlations between the variables are high, the Durbin-Watson statistic showed no autocorrelation, which was taken to indicate that multicollinearity did not occur. In contrast, Nguyen, Do and Nguyen (2017) detected multicollinearity from the Pearson correlation coefficients between the independent variables.
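Screening predictors with pairwise Pearson correlations, as Nguyen, Do and Nguyen (2017) did with a 0.7 threshold, can be sketched as follows. The sketch only flags pairs whose absolute correlation exceeds the threshold; deciding which variable of a flagged pair to drop (age, in that study) remains a judgement call. The function names and the borrower data are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_collinear_pairs(data, threshold=0.7):
    """Return (name_a, name_b, r) for predictor pairs with |r| > threshold."""
    flagged = []
    for a, b in combinations(data, 2):
        r = pearson(data[a], data[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical borrower data
data = {
    "age":        [25, 32, 40, 47, 55, 38],
    "yapj":       [1, 6, 12, 20, 28, 10],
    "dependents": [0, 2, 1, 3, 0, 2],
}
print(flag_collinear_pairs(data))  # flags only the age/yapj pair
```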
In Mylonakis and Diacogiannis (2010), the discriminant model estimated on the initial sample of 829 customers has a low R² (16.15%), yet almost all of the discriminant function's coefficients are statistically significant at the 5% level. The best results were obtained from a model with ten explanatory variables, applied to the validation sample of 938 customers with a cut-off point of 0.398. With this model, ‘bad’ payers were correctly classified in 59% of instances and ‘prompt’ payers in 54%. The study did not report an overall classification accuracy. The low R² value indicates that the independent variables have a limited capability to explain customer credibility through the discriminant model.

In Lugemwa and Darroch (1995), credit history is the major factor associated with seasonal agricultural loan success. The LDA model correctly identified 76% of failed loans and 69% of successful loans, with an overall classification accuracy of 74%. Finally, the Wilks' lambda of 0.72 indicates that a considerable amount of discriminatory information was not accounted for by the selected variables.
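The group-specific and overall classification rates reported above can be reproduced from a classification table. The sketch below assumes the model outputs a score where higher values mean a higher likelihood of being a ‘bad’ payer, and classifies a case as ‘bad’ when its score reaches the cut-off; the scale of the 0.398 cut-off in the original study is not stated here, so both the scores and the labels are hypothetical. The overall rate is the size-weighted average of the group rates, which is why it always lies between them (as 74% lies between 69% and 76%).

```python
def classification_rates(scores, labels, cutoff):
    """Classify a case as 'bad' when its score is at or above the cut-off,
    then return per-group and overall correct-classification rates."""
    correct = {"bad": 0, "prompt": 0}
    totals = {"bad": 0, "prompt": 0}
    for score, actual in zip(scores, labels):
        predicted = "bad" if score >= cutoff else "prompt"
        totals[actual] += 1
        correct[actual] += int(predicted == actual)
    per_group = {g: correct[g] / totals[g] for g in totals}
    overall = sum(correct.values()) / sum(totals.values())
    return per_group, overall

# Hypothetical validation scores and actual payment behaviour
scores = [0.62, 0.45, 0.30, 0.51, 0.20, 0.35, 0.41, 0.15]
labels = ["bad", "bad", "bad", "prompt", "prompt", "prompt", "prompt", "prompt"]
per_group, overall = classification_rates(scores, labels, cutoff=0.398)
print({g: round(r, 3) for g, r in per_group.items()}, overall)
# {'bad': 0.667, 'prompt': 0.6} 0.625
```

Moving the cut-off trades one group's accuracy against the other's, which is why studies such as Mylonakis and Diacogiannis (2010) report the rate for each group.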
The canonical discriminant analysis used by Mbanasor and Nto (2009) has a higher overall classification accuracy than those of the other studies. Its Wilks' lambda, the lowest of the studies, also indicates that its variables contribute more to the discriminant function. The selection of variables also differs, because a stepwise discriminant procedure was carried out. The method of Nguyen, Do and Nguyen (2017) likewise differs slightly from the others, in that a log transformation was applied because the data were not normally distributed. The results reported by Mylonakis and Diacogiannis (2010) and Lugemwa and Darroch (1995) include the percentage of correct classification in each group. Unlike the other three studies, Mylonakis and Diacogiannis (2010) conclude that it is not possible to establish a model evaluating the credibility of prospective bank card customers using linear discriminant analysis.
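The effect of a log transformation on a right-skewed predictor, such as the one applied by Nguyen, Do and Nguyen (2017), can be illustrated by comparing sample skewness before and after the transformation. Which variables were transformed in that study is not stated here, so the loan amounts below are hypothetical; the sketch uses the moment estimator of skewness, g1 = m3 / m2^1.5, and requires strictly positive values because the logarithm is undefined at zero.

```python
import math

def skewness(values):
    """Sample skewness using the moment estimator g1 = m3 / m2**1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed loan amounts (each double the previous one)
loan_amounts = [1000, 2000, 4000, 8000, 16000, 32000]
logged = [math.log(v) for v in loan_amounts]
print(skewness(loan_amounts), skewness(logged))
```

Here the raw amounts have a skewness of about 1.1, while the logged values are equally spaced and therefore symmetric, giving a skewness of essentially zero.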
CONCLUSION
Most of the studies show that discriminant analysis can classify bank customers on different aspects, such as insolvency, use of e-banking, use of internet banking, creditworthiness potential and seasonal loan repayment, based on their demographic or socio-economic characteristics. Using the estimated function can make consumer credit disbursement decisions faster, more accurate and less costly. It can be concluded that the discriminant models used in these studies to evaluate bank customers' demographic information, such as economic and personal data, can improve prediction accuracy and show potential value and practicality in everyday business.
REFERENCE
Mbanasor, J.A. and Nto, P. 2009. Discriminant Analysis of Livestock Farmers Credit Worthiness Potential under Rural Banking Scheme in Abia State, Nigeria. Nigeria Agricultural Journal 39(1).
Nguyen, T.D., Do, T.T.H. and Nguyen, B.N. 2017. The Application of Discriminant Model in Managing Credit Risk for Consumer Loans in Vietnamese Commercial Bank. Asian Social Science 13(2): 176-186.
Ramayah, T., Fauziah, M. and Koay, P.L. 2006. Classifying Users and Non-Users of Internet Banking in Northern Malaysia. Journal of Internet Banking and Commerce. ISSN: 1204-5357.