Session 16: Discriminant Analysis


Linear Discriminant Analysis

Dr. Rajiv Kumar


IIM Kashipur

Note: The content used in this PPT has been compiled from various sources.
Introduction

Linear discriminant analysis (LDA) is a technique for analyzing data when the criterion or dependent variable is categorical and the predictor or independent variables are continuous or interval in nature.
Introduction…

The objectives of discriminant analysis are as follows:

• Development of discriminant functions, or linear combinations of the predictor or independent variables, that best discriminate between the categories of the criterion or dependent variable (groups).
• Examination of whether significant differences exist among the groups in terms of the predictor variables.
• Determination of which predictor variables contribute most to the intergroup differences.
• Classification of cases to one of the groups based on the values of the predictor variables.
• Evaluation of the accuracy of classification.
Introduction…

• When the criterion variable has two categories, the technique is known as two-group discriminant analysis.
• When three or more categories are involved, the technique is referred to as multiple discriminant analysis.
• The main distinction is that in the two-group case it is possible to derive only one discriminant function, whereas in multiple discriminant analysis more than one function may be computed. In general, with G groups and k predictors, it is possible to estimate up to the smaller of G - 1 and k discriminant functions (see the sketch after this list).
• The first function has the highest ratio of between-groups to within-groups sum of squares. The second function, uncorrelated with the first, has the second highest ratio, and so on. However, not all the functions may be statistically significant.
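
For example, with the iris data used later in this deck, G = 3 species and k = 4 predictors, so at most min(3 - 1, 4) = 2 discriminant functions can be estimated. A minimal sketch with R's built-in iris data set (my addition, using MASS::lda) confirms this:

library(MASS)
# iris: G = 3 groups, k = 4 predictors => min(G - 1, k) = 2 functions
m <- lda(Species ~ ., data = iris)
ncol(m$scaling)   # number of discriminant functions: 2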
Geometric Interpretation

Figure: A Geometric Interpretation of Two-Group Discriminant Analysis
Discriminant Analysis Model

The discriminant analysis model involves linear combinations of the following form:

D = b0 + b1X1 + b2X2 + b3X3 + . . . + bkXk

where:
D = discriminant score
b's = discriminant coefficients or weights
X's = predictor or independent variables

• The coefficients, or weights (b), are estimated so that the groups differ as much as possible on the values of the discriminant function.
• This occurs when the ratio of between-group sum of squares to within-group sum of squares for the discriminant scores is at a maximum.
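
To make the linear combination concrete, here is a minimal sketch with purely hypothetical weights and predictor values (none of these numbers come from the slides):

b0 <- -2.0                      # hypothetical constant term
b  <- c(X1 = 0.5, X2 = 1.2)     # hypothetical weights b1, b2
x  <- c(X1 = 3.0, X2 = 1.5)     # one case's predictor values (hypothetical)
D  <- b0 + sum(b * x)           # D = b0 + b1*X1 + b2*X2
D                               # the case's discriminant score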
Statistics Associated with Discriminant Analysis

Canonical correlation: Canonical correlation measures the extent of association between the discriminant scores and the groups. It is a measure of association between the single discriminant function and the set of dummy variables that define the group membership.

Centroid: The centroid is the mean value of the discriminant scores for a particular group; there are as many centroids as there are groups, one for each group. The means for a group on all the functions are the group centroids.

Classification matrix: Sometimes also called the confusion or prediction matrix, the classification matrix contains the number of correctly classified and misclassified cases.
Statistics Associated with Discriminant Analysis…

Discriminant function coefficients: The discriminant function coefficients (unstandardized) are the multipliers of the variables when the variables are in the original units of measurement.

Discriminant scores: The unstandardized coefficients are multiplied by the values of the variables. These products are summed and added to the constant term to obtain the discriminant scores.

Eigenvalue: For each discriminant function, the eigenvalue is the ratio of between-group to within-group sums of squares. Large eigenvalues imply superior functions.
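
As a hedged illustration: once an LDA model has been fitted with MASS::lda (as in the R example later in this deck), the squares of the singular values stored in model$svd reflect each function's between-group to within-group variance ratio, so their relative sizes show each function's share of the discriminating power:

ev <- model$svd^2   # squared singular values, one per discriminant function
ev / sum(ev)        # each function's share ('proportion of trace' in the lda printout)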
Statistics Associated with Discriminant Analysis…

Standardized discriminant function coefficients: The standardized discriminant function coefficients are the discriminant function coefficients used as the multipliers when the variables have been standardized to a mean of 0 and a variance of 1.

Structure correlations: Also referred to as discriminant loadings, the structure correlations represent the simple correlations between the predictors and the discriminant function.

Total correlation matrix: If the cases are treated as if they were from a single sample and the correlations computed, a total correlation matrix is obtained.

Wilks' λ: Sometimes also called the U statistic, Wilks' λ for each predictor is the ratio of the within-group sum of squares to the total sum of squares. Its value varies between 0 and 1. Large values of λ (near 1) indicate that the group means do not seem to be different; small values of λ (near 0) indicate that the group means seem to be different.
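
As a hedged sketch, an overall Wilks' λ for the iris example used later can be obtained from a one-way MANOVA on the same predictors (this uses base R's manova, not shown in the original slides; df is the data frame loaded in the R example below):

fit <- manova(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species, data = df)
summary(fit, test = "Wilks")   # small lambda => group means appear to differ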
Conducting Discriminant Analysis

If N1 = N2: Cut-off Point = (Centroid1 + Centroid2)/2

Otherwise: Cut-off Point = (N1 x Centroid2 + N2 x Centroid1)/(N1 + N2)

where N1, N2 = number of training cases in group 1 and group 2, respectively
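
A minimal sketch of this rule with hypothetical centroids and group sizes (the values are illustrative only):

centroid1 <- -1.2; n1 <- 40   # hypothetical group 1 centroid and size
centroid2 <-  0.9; n2 <- 60   # hypothetical group 2 centroid and size
if (n1 == n2) {
  cutoff <- (centroid1 + centroid2) / 2
} else {
  cutoff <- (n1 * centroid2 + n2 * centroid1) / (n1 + n2)
}
cutoff   # cases scoring above this point are assigned to group 2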
LDA in R (1 of 3)

library(readxl)   # to read the Excel file
library(MASS)     # provides lda()
df <- read_excel("E:/BC2/IrisData.xlsx")
#print(df)
model <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = df)
model
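
If the Excel file is not at hand, the same model can be fitted from R's built-in iris data set; assuming the spreadsheet mirrors it (an assumption on my part, based on the identical column names), this one line is equivalent:

model <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris)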
LDA in R (2 of 3)

Confusion matrix and Hit Ratio (Accuracy) of the model:

predicted_class <- predict(model, newdata = df)$class
predicted_class
# Here we consider training data. One can use test data.
table(df$Species, predicted_class)   # Confusion Matrix

Output (Confusion Matrix):

             setosa versicolor virginica
setosa           50          0         0
versicolor        0         48         2
virginica         0          1        49

Hit Ratio (Accuracy) = Correct Predictions / Total Predictions
                     = 147/(147 + 3)
                     = 98%
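
The hit ratio can also be computed directly from the predictions (a one-line addition, not in the original slides):

mean(predicted_class == df$Species)   # 0.98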
LDA in R (3 of 3)

Predicting a case: Sepal.Length = 5.8, Sepal.Width = 2.7, Petal.Length = 4.1, Petal.Width = 1

l1 <- list("Sepal.Length" = 5.8, "Sepal.Width" = 2.7, "Petal.Length" = 4.1, "Petal.Width" = 1)
predicted_class <- predict(model, l1)$class   # One can use a data frame instead of a list
predicted_class

Output: the predicted class for this case.
Logistic Regression vs. Discriminant Analysis

1. Logistic regression: no assumption of MV normality. Discriminant analysis: assumes MV normality.
2. Logistic regression: can accommodate large numbers of predictors more easily. Discriminant analysis: a large number of predictors violates MV normality and cannot be accommodated as easily.
3. Logistic regression: categorical predictors are OK (e.g., dummy codes). Discriminant analysis: predictors must be continuous, interval level.
4. Logistic regression: less powerful when assumptions are met. Discriminant analysis: more powerful when assumptions are met.
5. Logistic regression: few assumptions, typically met in practice. Discriminant analysis: many assumptions, rarely met in practice.
6. Logistic regression: categorical IVs can be dummy coded. Discriminant analysis: categorical IVs create problems.

MV = Multivariate, IV = Independent Variable
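
As a hedged illustration of the two approaches side by side, the sketch below fits both models to a two-group subset of the built-in iris data (this comparison code is my addition, not from the slides):

# Two-group subset: drop setosa so the response is binary
df2 <- droplevels(subset(iris, Species != "setosa"))
logit_fit <- glm(Species ~ Sepal.Length + Petal.Length,
                 family = binomial, data = df2)                         # logistic regression
lda_fit <- MASS::lda(Species ~ Sepal.Length + Petal.Length, data = df2) # discriminant analysis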
Primary Scales of Measurement

Nominal
  Basic characteristics: numbers identify and classify objects.
  Examples: Social Security numbers, numbering of football players, brand numbers, store types, gender (male, female).
  Permissible statistics: percentages, mode.

Ordinal
  Basic characteristics: numbers indicate the relative positions of the objects but not the magnitude of differences between them.
  Examples: quality rankings, rankings of teams in a tournament, preference rankings, market position, social class, satisfaction, intention.
  Permissible statistics: percentile, median.

Interval
  Basic characteristics: differences between objects can be compared; the zero point is arbitrary.
  Examples: temperature (Fahrenheit, centigrade), attitudes, opinions, index numbers.
  Permissible statistics: range, mean, standard deviation.

Ratio
  Basic characteristics: the zero point is fixed; ratios of scale values can be computed.
  Examples: length, weight, age, income, costs, sales, market shares.
  Permissible statistics: geometric mean, harmonic mean.
