Discriminant Analysis


Maulana Azad National Institute of Technology

(NIT Bhopal)

Assignment
for Data Analysis for Managers

Discriminant Analysis

Course Coordinator:
Dr. Gaurika Malhotra

Submitted by:
Mohit Saxena
192121006
MBA 2019-2021

Discriminant Analysis

Linear discriminant analysis (LDA)


• AKA normal discriminant analysis (NDA), or discriminant function analysis
• It is a generalization of Fisher's linear discriminant, a method used in statistics to find a
linear combination of features that characterizes or separates two or more classes of objects
or events.
• The resulting combination may be used as a linear classifier, or, more commonly, for
dimensionality reduction before later classification.
• LDA is closely related to analysis of variance (ANOVA) and regression analysis, which
also attempt to express one dependent variable as a linear combination of other features or
measurements.
o However, ANOVA uses categorical independent variables and a continuous
dependent variable, whereas discriminant analysis has continuous independent
variables and a categorical dependent variable (i.e. the class label).
• Logistic regression and Probit regression are more similar to LDA than ANOVA is, as they
also explain a categorical variable by the values of continuous independent variables.
o These other methods are preferable in applications where it is not reasonable to
assume that the independent variables are normally distributed, which is a
fundamental assumption of the LDA method.
• LDA is also closely related to principal component analysis (PCA) and factor analysis in
that they both look for linear combinations of variables which best explain the data.
o LDA explicitly attempts to model the difference between the classes of data.
o PCA, in contrast, does not take into account any difference in class, and
o factor analysis builds the feature combinations based on differences rather than
similarities.
o Discriminant analysis is also different from factor analysis in that it is not an
interdependence technique: a distinction between independent variables and
dependent variables (also called criterion variables) must be made.
• LDA works when the measurements made on independent variables for each observation
are continuous quantities. When dealing with categorical independent variables, the
equivalent technique is discriminant correspondence analysis.
• Discriminant analysis is used when groups are known a priori (unlike in cluster analysis).
o Each case must have a score on one or more quantitative predictor measures, and a
score on a group measure.
o Discriminant function analysis is classification - the act of distributing things into
groups, classes or categories of the same type.
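
As a concrete sketch of the points above (using scikit-learn and synthetic data; all values are illustrative), LDA can serve both as a linear classifier and as a dimensionality-reduction step before classification:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic data: 3 classes of objects described by 4 continuous features
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X, y)

# Used as a linear classifier
print("Training accuracy:", lda.score(X, y))

# Used for dimensionality reduction: at most (classes - 1) = 2 axes
X_reduced = lda.transform(X)
print("Reduced shape:", X_reduced.shape)  # (300, 2)
```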


Multiple Discriminant Analysis

• MDA is a multivariate dimensionality reduction technique. It has been used to predict
signals as diverse as neural memory traces and corporate failure.
• MDA is not directly used to perform classification. It merely supports classification by
yielding a compressed signal amenable to classification.
• The method projects the multivariate signal down to an M−1 dimensional space where M
is the number of categories.
• MDA is useful because most classifiers are strongly affected by the curse of dimensionality.
o When signals are represented in very-high-dimensional spaces, the classifier's
performance is catastrophically impaired by the overfitting problem.
o This problem is reduced by compressing the signal down to a lower-dimensional
space as MDA does.
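
As a sketch of the dimensionality point (scikit-learn, synthetic data; whether compression actually helps depends on the data), the snippet below compares a k-nearest-neighbour classifier on a 100-dimensional signal with the same classifier after MDA-style compression to M − 1 = 2 dimensions:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# High-dimensional synthetic signal: M = 3 categories, 100 features
X, y = make_classification(n_samples=200, n_features=100, n_informative=5,
                           n_classes=3, random_state=0)

# Classifier applied directly in 100 dimensions
knn = KNeighborsClassifier()
print("k-NN, raw signal:      ", cross_val_score(knn, X, y, cv=5).mean())

# Compress to M - 1 = 2 dimensions first, then classify
pipe = make_pipeline(LinearDiscriminantAnalysis(n_components=2),
                     KNeighborsClassifier())
print("k-NN after compression:", cross_val_score(pipe, X, y, cv=5).mean())
```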


Assumptions

The assumptions of discriminant analysis are the same as those for MANOVA.

1. Sample size:
a. The analysis is quite sensitive to outliers, and the size of the smallest group must be
larger than the number of predictor variables.
2. Multivariate Normality:
a. Independent variables are normal for each level of the grouping variable
3. Homogeneity of variance/covariance (homoscedasticity):
a. Variances among group variables are the same across levels of predictors.
b. Can be tested with Box's M statistic.
c. It has been suggested that
i. linear discriminant analysis be used when covariances are equal, and
ii. quadratic discriminant analysis may be used when covariances are not
equal.
4. Multicollinearity:
a. Predictive power can decrease with an increased correlation between predictor
variables.
5. Independence:
a. Participants are assumed to be randomly sampled, and a participant's score on one
variable is assumed to be independent of scores on that variable for all other
participants.

It has been suggested that discriminant analysis is relatively robust to slight violations of these
assumptions, and it has also been shown that discriminant analysis may still be reliable when
using dichotomous variables (where multivariate normality is often violated).
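
Box's M is an SPSS output rather than a standard SciPy function. As a rough substitute, the univariate checks below (Shapiro-Wilk for normality within each group, Levene for equality of variances across groups) probe the same assumptions one variable at a time; the data are made up:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Made-up scores on one predictor for two groups
group_a = rng.normal(loc=0.0, scale=1.0, size=50)
group_b = rng.normal(loc=1.0, scale=1.0, size=50)

# Normality within each group (univariate stand-in for multivariate normality)
print("Shapiro, group A: p =", stats.shapiro(group_a).pvalue)
print("Shapiro, group B: p =", stats.shapiro(group_b).pvalue)

# Equality of variances across groups (univariate stand-in for Box's M)
print("Levene: p =", stats.levene(group_a, group_b).pvalue)
```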


Model

A discriminant function is a latent variable constructed as a linear combination of the
independent variables, such that

D = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk


where
D = discriminant score
b's = discriminant coefficients or weights
X's = predictors or independent variables

▪ The discriminant function is also known as the canonical root.


▪ This discriminant function is used to classify subjects/cases into one of two (or more)
groups on the basis of the observed values of the predictor variables. The variables entered
into the analysis are the independent variables.
▪ The coefficients, or weights (b), are estimated so that the groups differ as much as possible
on the values of the discriminant function.
▪ Discriminant analysis creates an equation that minimizes the possibility of misclassifying
cases into their respective groups or categories.
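
A tiny sketch of the scoring equation in Python; the coefficient and predictor values below are hypothetical, not estimates from any real data:

```python
import numpy as np

# Hypothetical estimated weights for D = b0 + b1*X1 + b2*X2 + b3*X3
b0 = -1.2                        # constant
b = np.array([0.5, -0.8, 0.3])   # discriminant coefficients b1..b3

x = np.array([2.0, 1.5, 4.0])    # one case's values of X1..X3
D = b0 + b @ x                   # the case's discriminant score
print("D =", D)
```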

Hypothesis
H0: The group means for two or more groups are equal.
H1: The group means for two or more groups are not equal.

The mean of the discriminant scores for a group is referred to as its centroid.


Discrimination rules

➢ Maximum likelihood:
Assigns x to the group that maximizes the population (group) density.

➢ Bayes Discriminant Rule:


Assigns x to the group that maximizes πi fi(x), where πi represents the prior probability of
that classification, and fi(x) represents the population density.
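
In scikit-learn's LDA, for instance, the priors πi can be supplied directly; the sketch below (synthetic Gaussian data) shows a prior of 0.7 tilting a borderline case toward group 0:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two Gaussian groups with a shared covariance structure
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(2.0, 1.0, size=(50, 2))])
y = np.repeat([0, 1], 50)

# Predictions maximize pi_i * f_i(x); the priors argument supplies the pi_i
lda = LinearDiscriminantAnalysis(priors=[0.7, 0.3]).fit(X, y)
print(lda.predict([[1.0, 1.0]]))  # midway point; the larger prior tilts it to group 0
```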

➢ Fisher's linear discriminant rule:


Maximizes the ratio between SSbetween and SSwithin, and finds a linear combination of the
predictors that best predicts group membership.

➢ Canonical discriminant analysis for k classes


Canonical correlation measures the extent of association between the discriminant scores
and the groups.
Canonical discriminant analysis (CDA) finds axes (k − 1 canonical coordinates, k being the
number of classes) that best separate the categories. These linear functions are uncorrelated
and define, in effect, an optimal k − 1 space through the n-dimensional cloud of data that
best separates (the projections in that space of) the k groups.

➢ Classification matrix
AKA confusion or prediction matrix, the classification matrix contains the number of
correctly classified and misclassified cases.
It serves as a yardstick for measuring the accuracy of the model in classifying individuals/
cases into one of the groups: it tells what percentage of the existing data points are
correctly classified by the model developed in DA.
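
A short sketch using scikit-learn's confusion_matrix on hypothetical actual/predicted group labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted group memberships for 10 cases
actual    = np.array([0, 0, 0, 1, 1, 1, 0, 1, 0, 1])
predicted = np.array([0, 0, 1, 1, 1, 0, 0, 1, 0, 1])

cm = confusion_matrix(actual, predicted)   # rows: actual, columns: predicted
print(cm)
print("Hit ratio:", np.trace(cm) / cm.sum())  # share correctly classified
```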

➢ Discriminant function coefficients


The discriminant function coefficients (unstandardized) are the multipliers of variables,
when the variables are in the original units of measurement.

➢ F values
These are calculated from a one-way ANOVA, with the grouping variable serving as the
categorical independent variable.
Each predictor, in turn, serves as the metric dependent variable in the ANOVA.


➢ Box’s M Test:
Box’s M tests the null hypothesis that the covariance matrices do not differ between the
groups formed by the dependent variable.
If Box’s M is non-significant (p > 0.05), the null hypothesis is retained, indicating that the
equal-covariance assumption required for (linear) DA holds.

➢ Discriminant scores
The unstandardized coefficients are multiplied by the values of the variables.
These products are summed and added to the constant term to obtain the discriminant
scores.

➢ Eigenvalues
The eigenvalue is an index of overall fit. For each discriminant function, the eigenvalue is
the ratio of the between-group to the within-group sum of squares.
Larger eigenvalues imply better-discriminating functions.
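
The ratio definition can be computed directly; the discriminant scores below are made up for illustration:

```python
import numpy as np

# Made-up discriminant scores for two groups
g1 = np.array([1.9, 2.1, 2.4, 1.8])
g2 = np.array([-0.9, -1.2, -1.1, -0.8])
grand_mean = np.concatenate([g1, g2]).mean()

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in (g1, g2))
ss_within = sum(((g - g.mean()) ** 2).sum() for g in (g1, g2))

print("Eigenvalue =", ss_between / ss_within)  # large -> strong separation
```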

➢ Pooled within-group correlation matrix


The pooled within-group correlation matrix is computed by averaging the separate
covariance matrices for all the groups.

➢ Wilks’ lambda
AKA the U statistic, Wilks’ λ for each predictor is the ratio of the within-group sum of
squares to the total sum of squares.
Its value varies between 0 and 1.
Large values of λ (near 1) indicate that the group means do not seem to differ.
Small values of λ (near 0) indicate that the group means seem to differ.
Wilks’ λ = 1 − R²,
where R is the canonical correlation.
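
The same sums of squares give Wilks’ λ for a single predictor directly; the group values below are hypothetical:

```python
import numpy as np

# Made-up values of one predictor across two groups
g1 = np.array([5.0, 6.0, 5.5, 6.5])
g2 = np.array([5.2, 6.1, 5.6, 6.4])
all_values = np.concatenate([g1, g2])

ss_total = ((all_values - all_values.mean()) ** 2).sum()
ss_within = sum(((g - g.mean()) ** 2).sum() for g in (g1, g2))

print("Wilks' lambda =", ss_within / ss_total)  # near 1 -> means barely differ
```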


Practical using SPSS

Discriminant Analysis

The dataset has 244 observations on four variables. The three psychological (predictor)
variables are outdoor, social, and conservative.
The categorical variable is job type, with three levels: 1) customer service, 2) mechanic,
and 3) dispatcher.
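
For readers working outside SPSS, the same fit can be sketched in Python with scikit-learn; the inline data and column names below merely mirror the structure of the example and are not the real 244-case dataset:

```python
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Made-up stand-in for the job dataset (values and names are illustrative)
df = pd.DataFrame({
    "outdoor":      [10, 12, 11, 20, 22, 21, 15, 16, 14],
    "social":       [22, 25, 24, 10, 12, 11, 15, 14, 16],
    "conservative": [ 5,  6,  5, 12, 14, 13,  9, 10,  9],
    "job":          [ 1,  1,  1,  2,  2,  2,  3,  3,  3],  # 1=cust. service, 2=mechanic, 3=dispatcher
})

X = df[["outdoor", "social", "conservative"]]
lda = LinearDiscriminantAnalysis().fit(X, df["job"])
print(lda.score(X, df["job"]))  # proportion of cases correctly classified
```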

Step 1: Select grouping variable : Job

Step 2: Set min and max number of groups

Step 3: Select independent variables: Outdoor, Social, Conservative

Step 4: Under statistics option → select as follows


Step 5: Under Classification option → select as follows → Continue → OK

Discriminant Analysis will be generated as follows

1. Mean and Std Deviation of each independent variable


2. Cases included and excluded, if any.

3. Pooled within-groups matrix: correlations between the 3 independent variables

The predictors are not closely correlated → no multicollinearity problem → discrimination can proceed

4. Box’s M Test

Box’s M significant (p < 0.05) → null hypothesis of equal covariance matrices rejected; as noted under Assumptions, DA is fairly robust to this violation


5. Eigenvalues

Fn 1 is better suited → its eigenvalue is higher and it explains a larger share of the variance

6. Wilks’ Lambda

Fn 1 is better suited → its Wilks’ lambda is lower

7. Discriminant Function Coefficients derived and output


8. Results


The discriminant functions are:

Fn 1: discriminant_score_1 = 0.517*conservative + 0.379*outdoor – 0.831*social.

Fn 2: discriminant_score_2 = 0.926*outdoor + 0.213*social – 0.291*conservative.
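
Applying the two reported functions to a case is a matrix product; the case's (standardized) predictor values below are hypothetical:

```python
import numpy as np

# Rows: function 1 and function 2; columns: outdoor, social, conservative
W = np.array([[0.379, -0.831,  0.517],
              [0.926,  0.213, -0.291]])

case = np.array([0.5, -1.0, 0.8])  # hypothetical standardized predictor values
print(W @ case)  # the case's scores on discriminant functions 1 and 2
```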

As observed, the customer service employees tend to be at the more social (negative) end of dimension
1; the dispatchers tend to be at the opposite end, with the mechanics in the middle. On dimension 2 the results
are not as clear; however, the mechanics tend to be higher on the outdoor dimension and customer service
employees and dispatchers lower.
