Discriminant Analysis
Purpose of Discriminant Analysis
To classify objects (people, customers, things, etc.) into one of two or more
groups based on a set of features that describe the objects (e.g., gender,
age, income, weight, preference score, etc.)
In general, we assign an object to one of a number of predetermined groups
based on observations made on the object.
Groups are known or predetermined and have no order (i.e., nominal
scale). We are looking for two things:
Which set of features can best determine group membership of the object?
What is the classification rule or model to best separate those groups?
Discriminant analysis is a useful way to answer the following questions:
Are the groups different?
On which of the variables are they different?
Is it possible to predict which group a person belongs to using these
variables?
Example
A mortgage company loan officer wants to
decide whether to approve an applicant’s
mortgage loan
Past data contains information on people who
have successfully repaid the loan and those who
have defaulted
Information available on these two groups: age,
income, marital status, outstanding debt and
ownership of certain durable goods
Discriminant analysis
A technique for analysing marketing data
where the criterion (dependent) variable is categorical and
the predictor (independent) variables are interval in nature
Discriminant Function
A linear combination of the independent
variables that will best discriminate
between the categories of the dependent variable (groups)
Groups in DA
Two-group discriminant analysis
Discriminant analysis where the criterion variable
has 2 categories
Multiple discriminant analysis
Discriminant analysis technique where the
criterion variable involves three or more
categories
Examples
The dependent variable is the choice of a PC
brand (A or B) and the independent variables are
the ratings of PC attributes on a 5-point
scale, such as price, battery life, weight…
Do heavy, medium and light users of soft
drinks differ in terms of their consumption of
frozen foods?
Distinguishing characteristics of consumers
who respond to direct mail solicitations
Objectives of DA
Developing a discriminant function: a linear combination of
the predictor (independent) variables that will best
discriminate between the categories of the criterion (dependent)
variable (groups)
Examining whether significant differences exist among the
groups, in terms of the predictor variables
Determining which predictor variable contributes most to the
intergroup differences
Classification of cases to one of the groups based on the values
of the predictor variables
Evaluation of the accuracy of the classification
DA model
Discriminant analysis model involves a linear
combination of the following form
$$D = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$$
where D is the discriminant score
b’s are the discriminant coefficients or weights
X’s are the predictor or independent variables
The coefficients (b’s) are estimated in such a way that the
groups differ as much as possible on the values of the
discriminant function
Statistically, it means that the ratio of between-group sum
of squares to within-group sum of squares for the
discriminant scores is at a maximum
Comparing Regression and DA
                                    Regression   Discriminant Analysis
Similarities
  Number of dependent variables     One          One
  Number of independent variables   Multiple     Multiple
Differences
  Nature of dependent variables     Metric       Categorical / Binary
  Nature of independent variables   Metric       Metric
Conducting DA
Formulate the problem
Correlation matrix (excerpt): Income with Travel, r = 0.19745
[Table: Wilks' Lambda, F and significance for each predictor variable]
The significance of the F ratios indicates that, when the predictors are considered individually,
income, importance attached to vacation and household size significantly differentiate
between those who took a vacation and those who did not
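For a single predictor, Wilks' lambda is the within-group sum of squares divided by the total sum of squares, and the associated F ratio is the one-way ANOVA F. The sketch below assumes that definition; the two groups of values and the function name `univariate_wilks_f` are hypothetical, and the p-value (from an F distribution with g - 1 and n - g degrees of freedom) is left out.

```python
# Univariate Wilks' lambda and F ratio for one predictor across g groups.
# ASSUMPTION: the values below are hypothetical, not the vacation data from the slides.


def univariate_wilks_f(groups):
    """Return (Wilks' lambda, F, df_between, df_within) for one predictor.

    lambda = SS_within / SS_total; for a single predictor this F equals the
    one-way ANOVA F = (SS_between / (g - 1)) / (SS_within / (n - g)).
    """
    values = [v for grp in groups for v in grp]
    n, g = len(values), len(groups)
    grand = sum(values) / n
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_within = 0.0
    for grp in groups:
        m = sum(grp) / len(grp)
        ss_within += sum((v - m) ** 2 for v in grp)
    ss_between = ss_total - ss_within
    wilks = ss_within / ss_total
    f = (ss_between / (g - 1)) / (ss_within / (n - g))
    return wilks, f, g - 1, n - g


group_1 = [1.0, 2.0, 3.0]  # hypothetical predictor values
group_2 = [4.0, 5.0, 6.0]  # hypothetical predictor values
lam, f, df1, df2 = univariate_wilks_f([group_1, group_2])
print(f"Wilks' lambda = {lam:.4f}, F({df1}, {df2}) = {f:.2f}")
```

A small lambda (close to 0) means most of the variation lies between the groups, which is what a large, significant F reflects.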
Discriminant Function
As there are 2 groups, there will be 1 discriminant function.
The function explains 100% of the variance and has a canonical correlation of 0.8007:
$$r^2 = (0.8007)^2 \approx 0.64$$
This indicates that 64% of the variance in the dependent variable (whether the family took a vacation)
is explained by this model
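The two numbers on this slide can be checked directly. The general rule that at most min(g - 1, p) discriminant functions can be extracted is a standard result assumed here rather than stated on the slides; the helper name `n_discriminant_functions` is hypothetical.

```python
# Number of discriminant functions and the squared canonical correlation.


def n_discriminant_functions(n_groups, n_predictors):
    """At most min(g - 1, p) discriminant functions can be extracted."""
    return min(n_groups - 1, n_predictors)


canonical_r = 0.8007          # canonical correlation reported on the slide
r_squared = canonical_r ** 2  # share of variance explained by the function

print(n_discriminant_functions(2, 5))  # 1  (two groups, five predictors)
print(round(r_squared, 4))             # 0.6411
```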
Significance of Discriminant Function
Further interpretation of the discriminant analysis makes sense only if the estimated
discriminant function is statistically significant.
In SPSS, the statistic provided is Wilks’ Lambda and its
corresponding chi-square transformation.
Wilks' Lambda   Chi-square   DF   Sig.
0.3589          26.13        5    0.0001
This is significant at the 0.05 level (95% confidence). Thus the null hypothesis
that the group means on the discriminant function are equal is rejected,
indicating significant discrimination, so the results can be interpreted.
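The chi-square in this table can be reproduced with Bartlett's approximation, V = -(n - 1 - (p + g)/2) ln(Lambda), with p(g - 1) degrees of freedom. The slides do not state the sample size; n = 30 below is an assumption, chosen because with p = 5 predictors and g = 2 groups it reproduces the reported 26.13. `chi2_sf` is a hand-written helper using the closed-form chi-square upper tail for integer degrees of freedom; in practice `scipy.stats.chi2.sf` does the same job. For two groups, Wilks' lambda also equals 1 - r², which matches the canonical correlation of 0.8007.

```python
import math


def bartlett_chi_square(wilks_lambda, n, p, g):
    """Bartlett's approximation: V = -(n - 1 - (p + g) / 2) * ln(lambda), df = p * (g - 1)."""
    v = -(n - 1 - (p + g) / 2) * math.log(wilks_lambda)
    return v, p * (g - 1)


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over r = 0 .. df/2 - 1 of (x/2)^r / r!
        term = total = math.exp(-x / 2)
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) plus a finite series of terms in sqrt(x)
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


wilks = 0.3589  # Wilks' lambda from the slide
n_cases = 30    # ASSUMPTION: sample size is not stated on the slides
chi_sq, df = bartlett_chi_square(wilks, n_cases, p=5, g=2)
print(f"chi-square = {chi_sq:.2f}, df = {df}, p = {chi2_sf(chi_sq, df):.5f}")
print(f"1 - r^2 = {1 - 0.8007 ** 2:.4f}")  # two-group check against Wilks' lambda
```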
Interpreting results
Standardized Canonical Discriminant Function Coefficients
Variable    Function 1
Income      0.74301
Travel      0.09611
Vacation    0.23329
Hsize       0.46911
Age         0.20922
These coefficients apply to the standardized values of the variables; classifying cases
from raw values requires the unstandardized coefficients. All the coefficients are positive,
suggesting that higher family income, household size, importance attached to vacation,
attitude towards travel and age are more likely to result in the family taking a vacation
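A short sketch of how the standardized coefficients are used: they multiply standardized (z-score) inputs, and their absolute sizes give a rough ordering of each predictor's contribution, an ordering that is less reliable when predictors are correlated. The household's z-scores and the helper names are hypothetical, and the sketch assumes the inputs were standardized the same way the coefficients were.

```python
# Standardized coefficients from the slide, applied to standardized (z-score) inputs.
STD_COEFFS = {
    "Income": 0.74301,
    "Travel": 0.09611,
    "Vacation": 0.23329,
    "Hsize": 0.46911,
    "Age": 0.20922,
}


def rank_by_importance(coeffs):
    """Order predictors by absolute standardized coefficient, largest first."""
    return sorted(coeffs, key=lambda name: abs(coeffs[name]), reverse=True)


def standardized_score(coeffs, z_values):
    """Discriminant score from standardized inputs: sum of coefficient * z-score."""
    return sum(coeffs[name] * z_values[name] for name in coeffs)


print(rank_by_importance(STD_COEFFS))
# ['Income', 'Hsize', 'Vacation', 'Age', 'Travel']

# ASSUMPTION: hypothetical household one standard deviation above average on every predictor
household_z = {name: 1.0 for name in STD_COEFFS}
print(round(standardized_score(STD_COEFFS, household_z), 5))  # 1.75074
```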
Classification
Group Centroids
Group   Function 1
1        1.29118
2       -1.29118
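The slides do not spell out the assignment rule; a common one, assumed here, is to place each case in the group whose centroid is closest to its discriminant score. With two equal-sized groups and centroids at ±1.29118 this reduces to a cutting score of 0. The scores passed to the hypothetical `classify` helper are illustrative only.

```python
# Classify a discriminant score by the nearest group centroid (centroids from the slide).
CENTROIDS = {1: 1.29118, 2: -1.29118}


def classify(score, centroids=CENTROIDS):
    """Assign a case to the group whose centroid is closest to its discriminant score."""
    return min(centroids, key=lambda group: abs(score - centroids[group]))


# With two equal-sized groups, nearest-centroid assignment is the same as comparing
# the score with the midpoint (cutting score) of the two centroids.
cut = (CENTROIDS[1] + CENTROIDS[2]) / 2
print(cut)                # 0.0
print(classify(1.75074))  # 1
print(classify(-0.4))     # 2
```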
Correct classification: 83%
Given 2 groups of equal size, one would expect a hit ratio of 1/2 = 0.50, or 50%, by
chance. The discriminant function's hit ratio of 83% is more than 25% above this chance
level, so the validity of the discriminant function is judged satisfactory
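The chance benchmark and the 25% rule can be written out as follows. The proportional chance criterion (the sum of squared group proportions) and the reading of the rule as "hit ratio at least 1.25 times chance" are common rules of thumb assumed here, not stated on the slides; the group sizes and confusion matrix are hypothetical. With equal group sizes the proportional criterion reduces to 1/g, the 50% used above.

```python
def hit_ratio(confusion):
    """Share of cases on the diagonal of a square confusion matrix (rows = actual group)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def proportional_chance(group_sizes):
    """Proportional chance criterion: sum of squared group proportions."""
    n = sum(group_sizes)
    return sum((size / n) ** 2 for size in group_sizes)


def beats_chance_by_25_percent(hit, chance):
    """ASSUMED rule of thumb: hit ratio should be at least 25% higher than chance."""
    return hit >= 1.25 * chance


print(hit_ratio([[40, 10], [5, 45]]))  # 0.85 (hypothetical confusion matrix)
chance = proportional_chance([15, 15])  # hypothetical equal group sizes
print(chance)                                    # 0.5
print(beats_chance_by_25_percent(0.83, chance))  # True, since 0.83 >= 0.625
```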