
STQS4113 Set1 Sem 1 2020/2021: Discriminant Analysis in Banking Data

This document discusses six research articles that used discriminant analysis in banking. The articles examined factors like electronic banking usage, internet banking acceptance, agricultural loan repayment, credit risk assessment, and credit card spending behavior. The document outlines the objectives, data collection methods, variables, and sample sizes of each study. The key factors analyzed across multiple studies included security, price, demographic characteristics, and payment history.

STQS4113 Set1 Sem 1 2020/2021

MULTIVARIAT GUNAAN (APPLIED MULTIVARIATE)

GROUP PROJECT

THEME
DISCRIMINANT ANALYSIS IN BANKING DATA

LECTURER
DR. ZAMIRA HASANAH BINTI ZAMZURI

GROUP MEMBERS

HO SOON CHENG A163749
TAN JING YI A164077
PUNG CEAU THUNG A164291
SEE JUN WEE A165208
LOH JUN HENG A165754
KHAW ZHI QI A165859

INTRODUCTION

Discriminant analysis is a technique widely used to classify observations into the
levels of a categorical outcome. It separates observations into known groups on the
basis of a set of predictor variables. In the banking sector, the method can be
applied to identify which category a customer belongs to, so that the bank can take
the most appropriate action towards that customer. This report discusses six articles
that apply discriminant analysis in the banking sector.
Table 1 Title, objective and field

| Author | Title | Objective | Field |
|---|---|---|---|
| Titus & Dennis (2014) | A Discriminant Analysis of Electronic Banking in Nigeria | (1) To employ discriminant analysis to classify retail bank customers on the basis of users and non-users. (2) To identify which variables contribute to the classification. (3) To find out whether users and non-users of e-banking differ in opinion with respect to the variables that define e-banking usage. | Electronic banking |
| Ramayah, Fauziah & Pei Ling (2006) | Classifying Users and Non-Users of Internet Banking in Northern Malaysia | To gauge the extent of acceptance/adoption of Internet banking in Malaysia and the main concerns that are hindering it. | Internet banking |
| Mbanasor & Nto (2009) | Discriminant Analysis of Livestock Farmers Credit Worthiness Potentials under Rural Banking Scheme in Abia State, Nigeria | (1) To analyse the livestock rural credit worthiness potential under rural banking scheme in Abia State. (2) To examine the socio-economic characteristics of livestock farmers who are beneficiaries of rural banks credit in Abia State. | Credit worthiness potentials |
| Nguyen, Do & Nguyen (2017) | The Application of Discriminant Model in Managing Credit Risk for Consumer Loans in Vietnamese Commercial Bank | To determine the consumer credit customers' insolvency by using demographic and socio-economic characteristics and two-group discriminant analysis. | Credit risk for customer loans |
| Mylonakis & Diacogiannis (2010) | Evaluating the Likelihood of Using Linear Discriminant Analysis as A Commercial Bank Card Owners Credit Scoring Model | (1) To evaluate and test the statistical technique of discriminant analysis on credit card customers' data of a Greek commercial bank and examine whether it is possible to create a model evaluating the credibility of prospective credit-card customers. (2) To determine whether there exists a relationship between the on-time payments of credit card owners of a commercial bank and their demographic characteristics. | Bank card owners credit scoring model |
| Lugemwa & Darroch (1995) | Discriminant Analysis of Seasonal Agricultural Loan Repayment by Small-scale Farmers in Transkei | To identify the factors associated with seasonal loan success or default amongst small-scale farm clients of the Agricultural Bank of Transkei in 1991. | Loan repayment |


METHODOLOGY

DATA

Titus and Dennis (2014) collected data through a questionnaire. A total of 1100
respondents were selected from the Nigerian banking public using quota sampling, but
only 594 questionnaires were used for analysis. The questionnaire covered seven
independent variables, each measured by several items on five-point Likert scales:
service quality, security, perceived risk, input factor, price, service product
characteristics and individual factors. Demographic variables and the use of Hi-Tech
banking services were also included. The dependent variable, the frequency of use of
Hi-Tech banking services, was measured with a frequency-type question and recoded into
two categories so that discriminant analysis could be applied. Ramayah, Fauziah and
Koay (2006) also collected data through questionnaires. A total of 230 respondents
were selected by convenience sampling, a non-probability method, from customers of
various banks at various locations in Penang. Of the 194 questionnaires collected,
only 180 were used for analysis because the rest were incomplete. The questionnaire
covered six variables: prior experience using the internet, perceived usefulness,
perceived ease of use, security, awareness of services and benefits, and price.
Demographic variables of users and non-users of internet banking were also included.
The sample was randomly divided into an analysis sample and a holdout sample in a
65:35 ratio. Both the electronic banking and the internet banking studies include
security and price as factors in order to investigate whether they contribute to the
classification.

In the study by Mbanasor and Nto (2009), data were collected from both primary and
secondary sources using two sets of structured questionnaires and an information
schedule, covering both the banks and the livestock farmers who were granted credit in
the 2006 farming season. A total of 62 beneficiaries were randomly selected from the
Aba, Ohafia and Umuahia zones, and 13 independent variables were used in the analysis.
Nguyen, Do and Nguyen (2017) collected data from credit officers in commercial banks
in Vietnam. Out of 550 consumer credit customer records, only the information on 500
borrowers was used for analysis. The predictor variables are age, number of
dependents, years at present job (YAPJ), salary and loan amount. The 500 customers
were randomly divided into an analysis sample and a holdout sample. The former,
comprising 400 customers, was used to estimate the discriminant function, while the
latter, comprising 100 customers, was used to check the validity of the model.
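
The random division into an analysis sample and a holdout sample can be sketched as follows. This is only a minimal illustration, not the procedure of any of the articles: the function name `split_analysis_holdout`, the fixed seed and the integer customer IDs are assumptions made for the example, and only the 400/100 sizes come from Nguyen, Do and Nguyen (2017).

```python
import random


def split_analysis_holdout(records, n_analysis, seed=0):
    """Randomly split records into an analysis sample and a holdout sample.

    The seed is fixed only so that this illustration is reproducible.
    """
    if not 0 <= n_analysis <= len(records):
        raise ValueError("n_analysis must be between 0 and len(records)")
    shuffled = list(records)  # copy, so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_analysis], shuffled[n_analysis:]


# Hypothetical customer IDs standing in for the 500 borrower records.
customers = list(range(1, 501))
analysis, holdout = split_analysis_holdout(customers, n_analysis=400)
print(len(analysis), len(holdout))
```

The discriminant function would then be estimated on `analysis` only, and its hit ratio checked on `holdout`, so that the reported accuracy is not inflated by reusing the same cases.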

The total sample used by Mylonakis and Diacogiannis (2010) consists of the personal
data and payment consistency records of 1767 customers of a Greek commercial bank
(X-BANK) over a five-year period. The first sub-period (2.5 years) was used to
establish the model, with a sample of 829 customers that contained 318 inconsistent
payers; the second sub-period (2.5 years) was used to validate the model, with a
sample of 938 customers that contained 360 inconsistent payers. The dependent variable
is payment consistency (consistent or inconsistent), and there are 14 explanatory
variables: gender, marital status, dependants, date of birth, home owner or renter,
years living in own property, profession code, owns a car, holds an insurance, bills
sent at home, automatic payment of bills, C card, XBANK staff and bank group staff.
Lugemwa and Darroch (1995) used data on seasonal input loan applications by 38
representative borrowers who received credit for the 1990/91 agricultural season from
the Agricultural Bank of Transkei. The variables include loan approval date, previous
loan history, age of the borrower, gender of the borrower, farmer status, agricultural
training of the borrower, farm size, agro-ecological classification, assets to
liability ratio and distance from the borrower's residence to the nearest Agricultural
Bank in Transkei. These studies show that the data may be separated into a training
set and a testing set before discriminant analysis is applied.

METHOD

Titus and Dennis (2014) applied Pearson correlation, Cronbach's alpha reliability
analysis, the Durbin-Watson coefficient and factor analysis before using a canonical
discriminant function. Ramayah, Fauziah and Koay (2006) used linear discriminant
analysis (LDA). The two studies therefore allow a comparison between two different
discriminant analyses, one applied to electronic banking and the other to internet
banking. Mbanasor and Nto (2009) used a canonical discriminant function to derive the
linear combination of socio-economic characteristics that best discriminated credit
worthiness potential. The squared canonical correlation, Wilks' lambda with its
associated chi-square statistic, and the percentage of livestock credit beneficiaries
correctly classified into their groups were used to test the significance of the
discriminant function.

Nguyen, Do and Nguyen (2017) used an independent-samples t-test to compare the mean
scores of the predictors between the default and non-default groups. A
Kolmogorov-Smirnov test was applied to assess the normality of the data. In addition,
Pearson correlation was used to check for multicollinearity between variables. A
direct method of discriminant analysis was then used to determine the status of
customers. Mylonakis and Diacogiannis (2010) compared the theoretical frequencies of
the bank's customer demographic characteristics with the corresponding actual
frequencies in the sample to determine the customers' classification according to
their payment consistency. LDA was then applied to the data to produce a credit
scoring model, and each customer's credit score was computed on a validation sample.
The credit score was compared with a cut-off point to classify each customer as
'good' or 'bad'.
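
The idea of scoring a customer with a linear discriminant function and comparing the score with a cut-off point can be sketched in a few lines. The code below is a textbook two-group Fisher discriminant for two predictors, not the model of Mylonakis and Diacogiannis (2010): the function names, the made-up data and the choice of the midpoint between the group centroids as the cut-off (which assumes equal priors and equal misclassification costs) are all assumptions for illustration.

```python
def mean_vector(rows):
    """Column means of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_covariance_2d(group_a, group_b):
    """Pooled within-group covariance matrix for two predictors."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (group_a, group_b):
        m = mean_vector(rows)
        for x in rows:
            d = [x[0] - m[0], x[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    total[i][j] += d[i] * d[j]
    dof = len(group_a) + len(group_b) - 2
    return [[v / dof for v in row] for row in total]


def fit_two_group_lda(group_a, group_b):
    """Return (weights, cutoff); scores above the cutoff go to group_a."""
    s = pooled_covariance_2d(group_a, group_b)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0:
        raise ValueError("pooled covariance matrix is singular")
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # Fisher's weights: inverse pooled covariance times the mean difference.
    weights = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
               inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    centroid_a = sum(w * m for w, m in zip(weights, ma))
    centroid_b = sum(w * m for w, m in zip(weights, mb))
    # Assumed cut-off: midpoint of the two group centroids.
    return weights, (centroid_a + centroid_b) / 2


def classify(x, weights, cutoff):
    """Label a case 'good' if its discriminant score exceeds the cutoff."""
    score = sum(w * v for w, v in zip(weights, x))
    return "good" if score > cutoff else "bad"


# Made-up (years at present job, number of dependents) pairs.
good = [(8, 1), (10, 2), (7, 0), (12, 1), (9, 2)]
bad = [(2, 3), (1, 4), (3, 2), (2, 5), (4, 3)]
weights, cutoff = fit_two_group_lda(good, bad)
print(classify((11, 1), weights, cutoff))
print(classify((1, 4), weights, cutoff))
```

In practice the cut-off can be moved away from the midpoint, for example when the two groups differ in size or when misclassifying a 'bad' payer is more costly than misclassifying a 'good' one.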

Lugemwa and Darroch (1995) used linear discriminant analysis (LDA) to estimate the
factors associated with loan success or default. Wilks' lambda was used to show the
proportion of the total variance in the discriminant scores not explained by
differences among groups. For credit scoring, two different discriminant analyses
were also carried out. These studies also show that correlation can be used to aid
the selection of variables for discriminant analysis, while Wilks' lambda can be
applied to test the outcome of the discriminant analysis.
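
For reference, the statistic described above has a standard textbook definition, stated here in general form rather than quoted from any of the six articles. The symbols are notation introduced for this sketch: W and B are the within-group and between-group sums-of-squares-and-cross-products matrices, R_c is the canonical correlation, N is the sample size, p the number of predictors and g the number of groups.

```latex
\[
\Lambda = \frac{|\mathbf{W}|}{|\mathbf{W} + \mathbf{B}|},
\qquad 0 \le \Lambda \le 1 .
\]
% With two groups there is a single discriminant function, so
\[
\Lambda = 1 - R_c^{2} .
\]
% Bartlett's chi-square approximation for testing significance:
\[
\chi^{2} = -\left[\,N - 1 - \frac{p + g}{2}\,\right]\ln \Lambda ,
\qquad \text{df} = p\,(g - 1) .
\]
```

A value of lambda close to 0 means that most of the variance in the discriminant scores is explained by group differences, while a value close to 1 means that the groups are barely separated.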

FINDING

Titus and Dennis (2014) found a strong positive correlation between most of the
variables using Pearson correlation, and no autocorrelation in any variable based on
the Durbin-Watson coefficient. After factor analysis was applied, the Cronbach's alpha
statistics for all items and for the individual variables fell within the acceptable
limit. Based on the test of equality of group means, the seven variables contributed
to the classification on a univariate basis. Wilks' lambda for the canonical
discriminant function showed that the discriminant analysis was dependable. Overall,
65.7% of the original grouped cases were correctly classified. Since both the maximum
chance criterion and the proportional chance criterion were lower than 65.7%, the
discriminant analysis is considered reliable. The seven variables were significant in
classifying retail bank customers as users or non-users of electronic banking.
Consumer involvement in electronic banking is influenced by a higher level of service
quality, well-developed service products, higher user input, individual factors such
as awareness, expected security of operation, lower perceived risk and lower prices
paid by the customer.
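
The two chance criteria used as benchmarks above have simple definitions, sketched below. The group sizes and the hit ratio in the example are hypothetical and are not taken from Titus and Dennis (2014); the function names are likewise chosen only for this illustration.

```python
def maximum_chance(group_sizes):
    """Share of the largest group: the accuracy of always guessing that group."""
    return max(group_sizes) / sum(group_sizes)


def proportional_chance(group_sizes):
    """Sum of squared group proportions: the accuracy expected when cases
    are assigned at random in proportion to the group sizes."""
    total = sum(group_sizes)
    return sum((n / total) ** 2 for n in group_sizes)


# Hypothetical group sizes (users, non-users) and hit ratio.
sizes = [240, 160]
hit_ratio = 0.70

c_max = maximum_chance(sizes)
c_pro = proportional_chance(sizes)
beats_chance = hit_ratio > c_max and hit_ratio > c_pro
print(round(c_max, 3), round(c_pro, 3), beats_chance)
```

A discriminant function is only useful if its hit ratio is clearly above both values, since either benchmark can be reached without looking at the predictors at all.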

Ramayah, Fauziah and Koay (2006) conducted discriminant analysis to test whether the
six variables can help to discriminate between users and non-users of internet
banking. The predictive accuracy is 95.8% for the analysis sample and 93.3% for the
holdout sample. In validating the model with the holdout sample, the hit ratio is
greater than the maximum chance and proportional chance values, and Press's Q
statistic is significant at the 0.01 level. It can therefore be concluded that the
model is good and accurate. Squaring the canonical correlation (0.7848) shows that
61.6% of the variance in the dependent variable is accounted for by the model. The
overall hit ratio also exceeded the proportional chance criterion (62.5%) by 25%,
which confirms the predictive validity of the discriminant function. The squared
canonical correlation of 0.616 is statistically significant, with Wilks' lambda =
0.384 and p-value = 0.000. The results also showed that, compared with non-users,
internet banking users had more prior internet banking experience, had more positive
views on ease of use, were more aware of internet banking services and benefits, and
had fewer security concerns.
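
Press's Q, mentioned above, tests whether the number of correct classifications is larger than would be expected by chance. The sketch below uses the usual textbook formula; the sample size and the number of correct classifications are hypothetical and are not the holdout figures of Ramayah, Fauziah and Koay (2006), and the constant name is an assumption of this example.

```python
def press_q(n_total, n_correct, n_groups):
    """Press's Q = (N - n*K)^2 / (N*(K - 1)).

    N is the sample size, n the number of correctly classified cases and
    K the number of groups.
    """
    if n_total <= 0 or n_groups < 2:
        raise ValueError("need a positive sample size and at least two groups")
    return (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))


# Chi-square critical value with 1 degree of freedom at the 0.01 level.
CHI2_CRITICAL_1DF_001 = 6.635

# Hypothetical holdout sample: 45 of 50 cases classified correctly.
q = press_q(n_total=50, n_correct=45, n_groups=2)
print(q, q > CHI2_CRITICAL_1DF_001)
```

Q is compared with the chi-square critical value for one degree of freedom, so a value above 6.635 is significant at the 0.01 level.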
The accuracy of the linear discriminant analysis applied to the internet banking data
is higher than that of the canonical discriminant analysis applied to the electronic
banking data. In addition, the internet banking analysis used two sets of data, an
analysis sample and a holdout sample, whereas the electronic banking analysis used
the whole data set. This suggests that separating the data into training and testing
sets may yield a more precise model. Ramayah, Fauziah and Koay (2006) found that most
individuals refuse to use internet banking because they are concerned about security
and privacy issues. This is similar to the hypothesis accepted in the electronic
banking study, namely that the expected security of operation influences consumer
involvement in electronic banking. Users of both electronic banking and internet
banking are also more aware of these banking services.

Mbanasor and Nto (2009) used a stepwise discriminant procedure, which identified only
nine variables as significant in discriminating credit worthiness potential. The
results further show that farm income, value of assets, loan amount and total
expenditure are the most valuable variables in determining loan applicants' credit
worthiness potential in the area, based on the total discriminant score. The
estimated centroid for credit-worthy livestock farmers was 1.27 while that for
non-credit-worthy livestock farmers was 3.65, which is interpreted as implying that
the higher the composite score of a livestock farmer, the higher the probability that
the farmer will be classified as credit worthy. The study reports a high canonical
correlation coefficient of 0.910, a Wilks' lambda of 0.173 and a chi-square test
significant at the one percent level, which indicate that the discriminant function
captures a significant amount of the information required to discriminate between
the groups. The percentage of cases correctly classified is 95.16%.

According to Nguyen, Do and Nguyen (2017), the t-test showed a significant difference
in group means for age, years at present job, number of dependents and loan amount.
In the K-S test, the p-value of 0.000 for all predictor variables is smaller than
0.05, indicating that the distributions are not normal, so a log transformation was
carried out. Pearson correlation showed a strong positive correlation between age and
years at present job, with a coefficient of 0.770, which is higher than 0.7. The
variable age was therefore removed from the discriminant function. Based on the test
of equality of group means, YAPJ and number of dependents may best discriminate
between the two groups of borrowers. The squared canonical correlation of 0.4956
indicates that 49.56% of the variance in the dependent variable is explained by the
estimated discriminant function. A Wilks' lambda of 0.505 showed that the
discriminant function is statistically significant (p-value = 0.000). The estimated
discriminant function was thus significant at the 1% level and could forecast
financial health with 72.3% accuracy. The studies by Titus and Dennis (2014) and
Nguyen, Do and Nguyen (2017) differ in how they handle multicollinearity. Although
the Pearson correlations between variables in Titus and Dennis (2014) are high, the
Durbin-Watson coefficient indicated no autocorrelation, which was taken to mean that
multicollinearity did not occur. Nguyen, Do and Nguyen (2017), in contrast, detected
multicollinearity from the Pearson coefficients between independent variables.
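
The correlation-based screening described above, in which a predictor pair with a Pearson coefficient above 0.7 is treated as collinear, can be sketched as follows. The data values are made up for illustration and do not come from Nguyen, Do and Nguyen (2017); the function names and the dictionary layout are assumptions of this example.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(data, threshold=0.7):
    """Return (name_a, name_b, r) for predictor pairs with |r| > threshold."""
    flagged = []
    for a, b in combinations(data, 2):
        r = pearson_r(data[a], data[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Made-up predictor values for eight hypothetical borrowers.
predictors = {
    "age": [25, 32, 41, 38, 52, 29, 47, 35],
    "yapj": [2, 6, 15, 10, 25, 3, 18, 9],
    "dependents": [2, 0, 3, 1, 2, 0, 1, 3],
}
flagged = highly_correlated_pairs(predictors)
print(flagged)
```

One variable from each flagged pair would then be dropped before the discriminant function is estimated, as was done with age in the study.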

In Mylonakis and Diacogiannis (2010), the discriminant model fitted to the initial
sample of 829 customers has a low R² (16.15%). Yet almost all of the discriminant
function's coefficients are statistically significant at the 5% level. The best
results were obtained from a model with ten explanatory variables, applied to the
validation sample of 938 customers with a cut-off point of 0.398. With this model,
'bad' payers were classified correctly in 59% of instances and 'prompt' payers in
54%. The study did not report an overall classification accuracy. The low R²
indicates the limited capability of the independent variables to explain the level
of customer credibility through the discriminant model. In the study of Lugemwa and
Darroch (1995), credit history is the major factor associated with seasonal
agricultural loan success. The LDA model correctly identified 76% of failed loans and
69% of successful loans, with an overall classification accuracy of 74%. Lastly, the
Wilks' lambda of 0.72 indicates that a considerable amount of discriminatory
information had not been accounted for by the selected variables.
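
The relationship between the per-group percentages and the overall classification accuracy reported above can be made concrete with a small confusion table. The counts below are hypothetical and are not taken from either study; the function name and the nested-dictionary layout are assumptions of this sketch.

```python
def classification_rates(confusion):
    """Return (per-group correct rates, overall accuracy).

    confusion[actual][predicted] holds the number of cases.
    """
    per_group = {}
    correct = total = 0
    for actual, row in confusion.items():
        group_total = sum(row.values())
        if group_total == 0:
            raise ValueError(f"group {actual!r} has no cases")
        hits = row.get(actual, 0)
        per_group[actual] = hits / group_total
        correct += hits
        total += group_total
    return per_group, correct / total


# Hypothetical counts: 40 failed loans and 20 successful loans.
confusion = {
    "failed": {"failed": 30, "successful": 10},
    "successful": {"failed": 8, "successful": 12},
}
per_group, overall = classification_rates(confusion)
print(per_group, round(overall, 3))
```

The overall accuracy is a size-weighted average of the per-group rates, which is why it cannot be recovered from the per-group percentages alone when the group sizes are not reported.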

The canonical discriminant analysis used by Mbanasor and Nto (2009) has a higher
overall classification accuracy than the others. Its Wilks' lambda, the lowest among
the studies, also indicates that its variables account for more of the discrimination
between groups. Its selection of variables also differs because a stepwise
discriminant procedure was used. The method used by Nguyen, Do and Nguyen (2017)
differs slightly from the others in that a log transformation was carried out because
the distributions were not normal. The outcomes reported by Mylonakis and
Diacogiannis (2010) and Lugemwa and Darroch (1995) include the percentage of correct
classification in each group. Unlike the other three studies, Mylonakis and
Diacogiannis (2010) concluded that it is not possible to establish a model for
evaluating the credibility of prospective bank card customers using linear
discriminant analysis.

CONCLUSION
Most of the studies show that discriminant analysis can be used to classify bank
customers in different respects, such as insolvency, use of e-banking, use of
internet banking, credit worthiness potential and seasonal loan repayment, based on
their demographic or socio-economic characteristics. By using the estimated function,
consumer credit disbursement decisions can be made faster, more accurately and at
lower cost. It can be concluded that the discriminant models used in these studies to
evaluate bank customers' demographic characteristics, such as economic and personal
data, can improve prediction accuracy and demonstrate their potential value and
practicality in everyday business.

REFERENCE

Lugemwa, W.H. & Darroch, M.A.G. 1995. Discriminant Analysis of Seasonal
Agricultural Loan Repayment by Small-scale Farmers in Transkei. Agrekon
34(4): 231-234.

Mbanasor, J.A. & Nto, P. 2009. Discriminant Analysis of Livestock Farmers Credit
Worthiness Potential under Rural Banking Scheme in Abia State, Nigeria.
Nigeria Agricultural Journal 39(1).

Mylonakis, J. & Diacogiannis, G. 2010. Evaluating the Likelihood of Using Linear
Discriminant Analysis as A Commercial Bank Card Owners Credit Scoring
Model. International Business Research 3: 9-20.

Nguyen, T.D., Do, T.T.H. & Nguyen, B.N. 2017. The Application of Discriminant
Model in Managing Credit Risk for Consumer Loans in Vietnamese
Commercial Bank. Asian Social Science 13(2): 176-186.

Ramayah, T., Fauziah, M. & Pei Ling, K. 2006. Classifying Users and Non-Users of
Internet Banking in Northern Malaysia. Journal of Internet Banking and
Commerce. ISSN: 1204-5357.

Titus, C.O. & Dennis, C.A. 2014. A Discriminant Analysis of Electronic Banking in
Nigeria. Journal of Emerging Trends in Economics and Management Sciences
5(2): 194-200.
