Session 16-Discriminant Analysis
Note: The content used in this PPT has been copied from various sources.
Introduction
When the criterion variable has two categories, the technique is known
as two-group discriminant analysis.
When three or more categories are involved, the technique is referred
to as multiple discriminant analysis.
The main distinction is that the two-group case yields only one
discriminant function, whereas multiple discriminant analysis may yield
more than one. In general, with G groups and k predictors, at most
min(G - 1, k) discriminant functions can be estimated.
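The limit on the number of functions can be checked with a short sketch. It assumes R's built-in iris data (G = 3 species, k = 4 measurements) as a stand-in for the workbook used later in the deck; the variable names are illustrative only.

```r
library(MASS)  # provides lda()

# Assumption: the built-in iris data replaces the Excel file used in the slides
G <- nlevels(iris$Species)   # number of groups
k <- ncol(iris) - 1          # number of predictors (all columns except Species)

# Upper bound on the number of discriminant functions
max_functions <- min(G - 1, k)

# Number of functions that lda() actually returns
model <- lda(Species ~ ., data = iris)
n_functions <- ncol(model$scaling)

max_functions
n_functions
```

With three groups and four predictors, both values should be 2, which is why the iris model reports LD1 and LD2 only.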
The first function has the highest ratio of between-groups to within-
groups sum of squares. The second function, uncorrelated with the
first, has the second highest ratio, and so on. However, not all the
functions may be statistically significant.
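A minimal sketch of this ordering, again assuming the built-in iris data: the `svd` component of an `lda` fit holds, per function, the ratio of between-group to within-group standard deviations, so its squares show how much of the separation each function carries. This is a descriptive share, not a significance test.

```r
library(MASS)

# Assumption: built-in iris data used for illustration
model <- lda(Species ~ ., data = iris)

# Squared singular values are proportional to each function's
# between-groups / within-groups ratio
prop_trace <- model$svd^2 / sum(model$svd^2)
names(prop_trace) <- colnames(model$scaling)

# Share of the total separation carried by LD1 and LD2
round(prop_trace, 4)
```

The first entry is the largest by construction; a small share for a later function suggests, but does not prove, that it adds little.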
Geometric Interpretation
The coefficients, or weights (b), are estimated so that the groups differ as much as
possible on the values of the discriminant function.
This occurs when the ratio of between-group sum of squares to within-group sum of
squares for the discriminant scores is at a maximum.
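The criterion can be illustrated numerically. The sketch below, which assumes the built-in iris data and a hypothetical helper `ss_ratio()`, computes the between-group to within-group sum-of-squares ratio for the first discriminant score and for each raw predictor on its own.

```r
library(MASS)

# Hypothetical helper: between-group SS divided by within-group SS
ss_ratio <- function(x, g) {
  group_mean <- ave(x, g)                      # each observation's group mean
  between <- sum((group_mean - mean(x))^2)     # between-group sum of squares
  within  <- sum((x - group_mean)^2)           # within-group sum of squares
  between / within
}

# Assumption: built-in iris data used for illustration
model  <- lda(Species ~ ., data = iris)
scores <- predict(model)$x                     # discriminant scores (LD1, LD2)

ratio_ld1  <- ss_ratio(scores[, "LD1"], iris$Species)
ratio_vars <- sapply(iris[, 1:4], ss_ratio, g = iris$Species)

ratio_ld1
ratio_vars
```

Because a single predictor is just one possible linear combination, the ratio for LD1 cannot be smaller than the ratio for any individual variable.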
Discriminant Analysis Model
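The weights b enter a linear combination of the k predictors. A standard textbook form of the model and of the quantity being maximized is sketched below; the symbols D (discriminant score), X_i (predictors) and λ (the ratio) are notation assumed here, not taken from the slides.

```latex
D = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k
\qquad\text{with}\qquad
\lambda = \frac{SS_{\text{between}}(D)}{SS_{\text{within}}(D)}
\;\longrightarrow\; \max_{b_1,\dots,b_k}
```

The intercept b_0 shifts every score by the same amount and so does not affect λ; only the weights b_1, ..., b_k determine how well the groups are separated.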
Statistics Associated with Discriminant Analysis
LDA in R (1of3)
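library(readxl)  # read_excel()
library(MASS)    # lda()
# Load the iris data from the Excel workbook
df <- read_excel('E:/BC2/IrisData.xlsx')
# print(df)
# Fit a linear discriminant model of Species on the four measurements
model <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = df)
model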
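To connect the fitted object to the weights discussed earlier, the following sketch rebuilds the discriminant scores by hand. It assumes the built-in iris data has the same columns as IrisData.xlsx, and it mirrors the centering that `predict()` applies to an `lda` fit (the prior-weighted mean of the group means).

```r
library(MASS)

# Assumption: the built-in iris data stands in for IrisData.xlsx
model <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
             data = iris)

# Raw coefficients (weights b) of each discriminant function
model$scaling

# Center the predictors at the prior-weighted mean of the group means,
# then apply the weights to obtain the discriminant scores
X <- as.matrix(iris[, 1:4])
center <- colSums(model$prior * model$means)
manual_scores <- scale(X, center = center, scale = FALSE) %*% model$scaling

# Scores computed by MASS, for comparison
pred_scores  <- predict(model)$x
max_abs_diff <- max(abs(manual_scores - pred_scores))
max_abs_diff
```

The difference should be zero up to floating-point error, which shows that each score is simply a weighted sum of the centered predictors.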
LDA in R (2of3)
Confusion matrix and Hit Ratio (Accuracy) of the model
predicted_class <- predict(model, newdata = df)$class
predicted_class
# Here we predict on the training data. One can use test data instead.
table(df$Species, predicted_class)  # Confusion matrix: rows = actual, columns = predicted
Output (Confusion Matrix):
             setosa  versicolor  virginica
setosa           50           0          0
versicolor        0          48          2
virginica         0           1         49
Hit Ratio (Accuracy) = Total Correct Predictions / (Total Correct Predictions + Total Incorrect Predictions)
= 147 / (147 + 3)
= 98%
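The hit ratio can also be computed directly from the confusion matrix instead of by hand. This sketch assumes the built-in iris data in place of the Excel file. Its leave-one-out step is an addition not shown in the slides, included as one way to avoid scoring the model on its own training data.

```r
library(MASS)

# Assumption: built-in iris data replaces df read from IrisData.xlsx
model <- lda(Species ~ ., data = iris)
predicted_class <- predict(model, newdata = iris)$class

# Confusion matrix: rows = actual species, columns = predicted species
cm <- table(Actual = iris$Species, Predicted = predicted_class)
cm

# Hit ratio = correctly classified cases (diagonal) / all cases
hit_ratio <- sum(diag(cm)) / sum(cm)
hit_ratio

# Leave-one-out cross-validation: each case is classified by a model
# fitted without that case
loo <- lda(Species ~ ., data = iris, CV = TRUE)
loo_hit_ratio <- mean(loo$class == iris$Species)
loo_hit_ratio
```

The training-data hit ratio reproduces the 147/150 = 0.98 figure above. The cross-validated figure is usually the more honest estimate of performance on new cases.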
LDA in R (3of3)
Output
Logistic Regression Vs. Discriminant Analysis
Scale | Basic characteristics | Common examples | Marketing examples | Permissible statistics
Interval | Differences between objects can be compared; zero point is arbitrary | Temperature (Fahrenheit, centigrade) | Attitudes, opinions, index numbers | Range, mean, standard deviation
Ratio | Zero point is fixed; ratios of scale values can be computed | Length, weight | Age, income, costs, sales, market shares | Geometric mean, harmonic mean