Discriminant Analysis
(NIT Bhopal)
Assignment for Data Analysis for Managers
Discriminant Analysis
Course Coordinator: Dr. Gaurika Malhotra
Submitted by:
Mohit Saxena
192121006
MBA 2019-2021
Discriminant Analysis
Assumptions
The assumptions of discriminant analysis are the same as those for MANOVA.
1. Sample size:
a. The analysis is quite sensitive to outliers, and the size of the smallest group must be
larger than the number of predictor variables.
2. Multivariate Normality:
a. Independent variables are normal for each level of the grouping variable
3. Homogeneity of variance/covariance (homoscedasticity):
a. Variances among group variables are the same across levels of predictors.
b. Can be tested with Box's M statistic.
c. It has been suggested that
i. linear discriminant analysis be used when covariances are equal, and
ii. quadratic discriminant analysis may be used when covariances are not
equal.
4. Multicollinearity:
a. Predictive power can decrease with an increased correlation between predictor
variables.
5. Independence:
a. Participants are assumed to be randomly sampled, and a participant's score on one
variable is assumed to be independent of scores on that variable for all other
participants.
It has been suggested that discriminant analysis is relatively robust to slight violations of these
assumptions, and it has also been shown that discriminant analysis may still be reliable when
using dichotomous variables (where multivariate normality is often violated).
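The homogeneity-of-covariance assumption (3) can be checked in code. The sketch below implements Box's M with its standard chi-square approximation; the two simulated groups are illustrative data, not the assignment's dataset.

```python
# Sketch: Box's M test for equality of group covariance matrices (assumption 3b).
# Uses the standard chi-square approximation; the sample data are made up.
import numpy as np
from scipy import stats

def box_m(groups):
    """groups: list of (n_i, p) arrays, one per level of the grouping variable."""
    k = len(groups)
    p = groups[0].shape[1]
    ns = np.array([g.shape[0] for g in groups])
    N = ns.sum()
    covs = [np.cov(g, rowvar=False) for g in groups]               # S_i (unbiased)
    pooled = sum((n - 1) * S for n, S in zip(ns, covs)) / (N - k)  # pooled S
    M = (N - k) * np.log(np.linalg.det(pooled)) - sum(
        (n - 1) * np.log(np.linalg.det(S)) for n, S in zip(ns, covs))
    # Small-sample correction factor for the chi-square approximation
    c = (np.sum(1 / (ns - 1)) - 1 / (N - k)) * (2 * p**2 + 3 * p - 1) / (
        6 * (p + 1) * (k - 1))
    chi2 = M * (1 - c)
    df = p * (p + 1) * (k - 1) / 2
    return chi2, stats.chi2.sf(chi2, df)   # test statistic, p-value

rng = np.random.default_rng(0)
g1 = rng.normal(size=(60, 3))
g2 = rng.normal(size=(60, 3))
chi2, pval = box_m([g1, g2])   # identical populations -> expect a large p-value
```

A significant result (small p-value) would point toward quadratic rather than linear discriminant analysis, per assumption 3c.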
Model
Hypothesis
H0: The group means for two or more groups are equal.
H1: The group means for two or more groups are not equal.
Discrimination rules
➢ Maximum likelihood:
Assigns x to the group that maximizes population (group) density.
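The maximum-likelihood rule can be sketched directly: assign x to the group whose density at x is largest. The group means and covariances below are illustrative values, not estimates from the assignment's data.

```python
# Sketch: maximum-likelihood classification with multivariate normal group
# densities. The group parameters are made-up illustrative values.
from scipy.stats import multivariate_normal

groups = {
    "A": multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]]),
    "B": multivariate_normal(mean=[3, 3], cov=[[1, 0], [0, 1]]),
}

def classify(x):
    # Assign x to the group whose density f_g(x) is largest.
    return max(groups, key=lambda g: groups[g].pdf(x))

classify([0.5, 0.2])  # near A's mean -> "A"
classify([2.8, 3.1])  # near B's mean -> "B"
```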
➢ Classification matrix
AKA confusion or prediction matrix, the classification matrix contains the number of
correctly classified and misclassified cases.
It serves as a yardstick for measuring how accurately the model classifies a case
into one of the groups: it shows what percentage of the existing data points
are correctly classified by the model developed in DA.
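A minimal sketch of building the classification matrix and the hit ratio by hand, using made-up labels:

```python
# Sketch: classification (confusion) matrix and hit ratio; labels are illustrative.
from collections import Counter

actual    = ["good", "good", "bad", "bad", "good", "bad"]
predicted = ["good", "bad",  "bad", "bad", "good", "good"]

# cell (a, p) counts cases whose actual group is a and predicted group is p
matrix = Counter(zip(actual, predicted))
hits = sum(n for (a, p), n in matrix.items() if a == p)   # diagonal cells
hit_ratio = hits / len(actual)   # share of cases correctly classified
```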
➢ F values
These are calculated from a one-way ANOVA, with the grouping variable serving as the
categorical independent variable.
Each predictor, in turn, serves as the metric dependent variable in the ANOVA.
➢ Box’s M Test:
Box’s M tests the null hypothesis that the covariance matrices do not differ
between groups formed by the dependent variable.
If Box’s M Test is not significant (p > 0.05), it indicates that the equal-covariance
assumption required for DA holds true.
➢ Discriminant scores
The unstandardized coefficients are multiplied by the values of the variables.
These products are summed and added to the constant term to obtain the discriminant
scores.
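This sum-of-products computation can be sketched directly. The coefficients and constant below are made-up numbers for illustration:

```python
# Sketch: discriminant scores = (unstandardized coefficients x predictor
# values) summed, plus the constant. All numbers here are illustrative.
import numpy as np

X = np.array([[5.0, 2.0],      # each row: one case's predictor values
              [1.0, 4.0]])
coefs = np.array([0.8, -0.3])  # unstandardized discriminant coefficients
constant = 1.5

scores = X @ coefs + constant  # products summed, constant term added
```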
➢ Eigenvalues
The eigenvalue is an index of overall fit. For each discriminant function, the eigenvalue is
the ratio of the between-group to the within-group sum of squares.
Larger eigenvalues imply better-discriminating functions.
➢ Wilks’ lambda
AKA the U statistic, Wilks’ λ for each predictor is the ratio of the within-group sum of
squares to the total sum of squares.
Its value varies between 0 and 1.
Large values of λ (near 1) indicate that group means do not seem to be different.
Small values of λ (near 0) indicate that the group means seem to be different.
Wilks’ λ = 1 − R²,
where R² is the squared canonical correlation.
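Both statistics come from the same sum-of-squares decomposition. The sketch below computes them for a single predictor across two groups, on made-up data:

```python
# Sketch: eigenvalue (between/within SS) and Wilks' lambda (within/total SS)
# for one predictor across two groups; the data are illustrative.
import numpy as np

groups = [np.array([1.0, 2.0, 3.0]),   # predictor values in group 1
          np.array([6.0, 7.0, 8.0])]   # predictor values in group 2

grand = np.concatenate(groups).mean()
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_total = ss_within + ss_between

eigenvalue = ss_between / ss_within   # larger -> better separation
wilks = ss_within / ss_total          # near 0 -> group means differ
```

Here the groups are well separated, so the eigenvalue is large and λ is near 0, consistent with the interpretation above.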
Discriminant Analysis
The dataset has 244 observations on four variables. The three psychological variables are
outdoor interests, social, and conservative.
The categorical variable is job type, with three levels: 1) customer service, 2) mechanic, and
3) dispatcher.
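An analysis of this shape can be sketched with scikit-learn. The data below are synthetic stand-ins for the 244-case job dataset (three made-up group centroids, not the real values), so only the workflow, not the numbers, mirrors the assignment:

```python
# Sketch: linear discriminant analysis with three groups and three
# psychological predictors. The centroids and spread are made-up values
# standing in for the real job dataset.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(42)
means = {1: [4, 1, 2], 2: [2, 2, 4], 3: [1, 4, 3]}   # illustrative centroids
X = np.vstack([rng.normal(means[g], 0.8, size=(80, 3)) for g in (1, 2, 3)])
y = np.repeat([1, 2, 3], 80)                          # job type labels

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
scores = lda.transform(X)                      # scores on the two functions
eigen_ratios = lda.explained_variance_ratio_   # discrimination per function
hit_ratio = (lda.predict(X) == y).mean()       # classification-matrix accuracy
```

With three groups there are at most two discriminant functions, which is why the results section below interprets two dimensions.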
4. Box’s M Test
5. Eigen Values
6. Wilks’ Lambda
8. Results
As observed, the customer service employees tend to be at the more social (negative) end of dimension
1; the dispatchers tend to be at the opposite end, with the mechanics in the middle. On dimension 2 the results
are not as clear; however, the mechanics tend to be higher on the outdoor dimension and customer service
employees and dispatchers lower.