Linear Discriminant Analysis
• LDA assumes that the observations are drawn from a Gaussian distribution with a
common covariance matrix in each class, and so can provide some improvements
over logistic regression when this assumption approximately holds. Conversely,
logistic regression can outperform LDA if these Gaussian assumptions are not met.
• The objective of discriminant analysis is to derive discriminant functions, which are
linear combinations of the independent variables, that discriminate between the
categories of the dependent variable as cleanly as possible.
Both discriminant function analysis (DFA) and logistic regression can answer similar research questions.
Logistic regression may be better when the dependent variable is binary (e.g., Yes/No, Pass/Fail, Healthy/Ill,
life/death) and the independent variables can be nominal, ordinal, interval, or ratio. Discriminant analysis may
be better when the dependent variable has more than two groups/categories.
http://claudiaflowers.net/rsch8140/discriminant.htm
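To make the comparison above concrete, here is a minimal sketch (assuming scikit-learn and NumPy are available; the data are synthetic and purely illustrative) that fits both classifiers on two-class data drawn from Gaussians with a common covariance matrix, the setting in which LDA's assumptions hold.

```python
# Minimal sketch: compare LDA and logistic regression on synthetic data
# drawn from two Gaussians with a shared covariance matrix.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.4], [0.4, 1.0]])            # common covariance matrix
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X1 = rng.multivariate_normal([1.5, 1.0], cov, size=200)
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
logreg = LogisticRegression().fit(X_tr, y_tr)

print("LDA accuracy:     ", lda.score(X_te, y_te))
print("Logistic accuracy:", logreg.score(X_te, y_te))
```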
Introduction
• Discriminant analysis is the appropriate statistical technique
when the dependent variable is a categorical variable and the
independent variables are metric variables.
• In many cases, the dependent variable consists of two groups or
classifications, for example, male versus female or high versus
low.
• In other instances, more than two groups are involved, such as
low, medium, and high classifications.
• Discriminant analysis is capable of handling either two groups or
multiple (three or more) groups.
• When the criterion variable has two categories, the technique
is known as two-group discriminant analysis.
• Linearly separable suggests that the groups can be separated by a linear combination
of features that describe the objects.
Visualization (two outcomes)
• If there are only two independent variables, the separators between the groups
become lines.
Visualization (3 outcomes)
• If there are three features, the separator is a plane; when the number of
independent variables is more than three, the separators become hyperplanes.
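As a rough illustration of the two-predictor case, the sketch below (assuming scikit-learn and matplotlib; the data are synthetic) recovers and plots the separating line from a fitted LDA model's coefficients.

```python
# Sketch: with two predictors the LDA separator is a line; we recover it
# from the fitted coefficients w1*x1 + w2*x2 + b = 0 and plot it.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 1.0, size=(100, 2)),
               rng.normal([3, 2], 1.0, size=(100, 2))])
y = np.repeat([0, 1], 100)

lda = LinearDiscriminantAnalysis().fit(X, y)
w1, w2 = lda.coef_[0]
b = lda.intercept_[0]

x1 = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
x2 = -(w1 * x1 + b) / w2          # solve the boundary equation for x2

plt.scatter(X[:, 0], X[:, 1], c=y)
plt.plot(x1, x2, "k--", label="LDA decision line")
plt.legend()
plt.show()
```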
• The discriminant analysis model involves linear combinations of the
following form:
D = b0 + b1X1 + b2X2 + … + bkXk
where
• D = discriminant score
• b's = discriminant coefficients or weights
• X's = predictors or independent variables
• The coefficients, or weights (b), are estimated so that the groups differ
as much as possible on the values of the discriminant function.
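The following sketch (scikit-learn assumed; the toy data are illustrative) shows that the fitted discriminant score is exactly this linear combination of the predictors, i.e. the intercept plus the weighted sum of the X's.

```python
# Sketch: the discriminant score D = b0 + b1*X1 + ... + bk*Xk; for a
# two-class fit, decision_function returns exactly this value per sample.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, size=(50, 3)),
               rng.normal(2, 1, size=(50, 3))])
y = np.repeat([0, 1], 50)

lda = LinearDiscriminantAnalysis().fit(X, y)
b = lda.coef_[0]                  # weights b1..bk
b0 = lda.intercept_[0]            # constant term b0

D_manual = X @ b + b0
D_sklearn = lda.decision_function(X)
print(np.allclose(D_manual, D_sklearn))   # True
```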
Pros & Cons
• Cons
• Old algorithm
• Newer algorithms often give much better prediction
• Pros
• Simple
• Fast and portable
• Still beats some algorithms (e.g., logistic regression) when its
assumptions are met
• Good to use when beginning a project
Main idea: find projection to a line such that samples from different classes
are well separated
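One common way to realize this idea is Fisher's criterion, in which the projection direction is w = Sw⁻¹(μ1 − μ0), where Sw is the within-class scatter matrix. The NumPy sketch below (synthetic, illustrative data) computes that direction and projects both classes onto the resulting line.

```python
# Sketch of Fisher's idea: find w = Sw^-1 (mu1 - mu0) and project the
# samples onto that single direction so the classes separate well.
import numpy as np

rng = np.random.default_rng(3)
X0 = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=100)
X1 = rng.multivariate_normal([2, 1], [[1, 0.3], [0.3, 1]], size=100)

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
# within-class scatter: sum of the two per-class scatter matrices
Sw = np.cov(X0, rowvar=False) * (len(X0) - 1) + np.cov(X1, rowvar=False) * (len(X1) - 1)

w = np.linalg.solve(Sw, mu1 - mu0)     # Fisher projection direction
z0, z1 = X0 @ w, X1 @ w                # 1-D projections of each class
print("projected class means:", z0.mean(), z1.mean())
```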
Algorithm to solve numerical problem
1. Compute the global mean (M) using the samples.
2. Compute the statistics like Mean Vector and Covariance Matrix for
samples.
3. Compute within-class scatter matrix C. ( Termed as pooled within
group matrix)
4. Create Discriminant Functions (F1 and F2) by using formula
Discriminant function:
f_i = μ_i C⁻¹ x_kᵀ − (1/2) μ_i C⁻¹ μ_iᵀ + ln(p_i)
where μ_i is the mean vector of class i, C is the pooled within-group covariance
matrix, x_k is the sample being classified, and p_i is the prior probability of class i.
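A minimal NumPy sketch of steps 1–4 above follows; the toy data are illustrative (not the classroom problem), and the pooled within-group covariance is computed with the standard estimator.

```python
# Minimal sketch of the numerical procedure: global mean, class means,
# pooled within-group covariance C, then the discriminant functions
# f_i(x) = mu_i C^-1 x - 0.5 * mu_i C^-1 mu_i + ln(p_i).
import numpy as np

# toy data: two classes, two features (illustrative only)
X = np.array([[2.0, 3.0], [3.0, 3.5], [2.5, 4.0], [3.5, 4.5],
              [6.0, 7.0], [7.0, 7.5], [6.5, 8.0], [7.5, 8.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

classes = np.unique(y)
n, g = len(y), len(classes)

global_mean = X.mean(axis=0)                              # step 1: global mean M
means = {c: X[y == c].mean(axis=0) for c in classes}      # step 2: class mean vectors
priors = {c: np.mean(y == c) for c in classes}

# step 3: pooled within-group covariance matrix C
C = sum((X[y == c] - means[c]).T @ (X[y == c] - means[c]) for c in classes) / (n - g)
C_inv = np.linalg.inv(C)

# step 4: discriminant functions F1 and F2; classify by the largest score
def f(x):
    return {c: means[c] @ C_inv @ x - 0.5 * means[c] @ C_inv @ means[c] + np.log(priors[c])
            for c in classes}

x_new = np.array([3.0, 4.0])
scores = f(x_new)
print(scores, "-> predicted class:", max(scores, key=scores.get))
```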
CLASSROOM PROBLEM
SOLUTION
LDA versus QDA
• LDA (Linear Discriminant Analysis) is used when a linear decision boundary
between the classes is required.
• QDA (Quadratic Discriminant Analysis) is used to find a non-linear (quadratic)
decision boundary between the classes.
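For a side-by-side feel of the two methods, here is a small sketch (scikit-learn assumed, synthetic data) where the two classes have different covariance matrices, the situation QDA is designed to handle.

```python
# Sketch: LDA pools one covariance matrix across classes (linear boundary);
# QDA fits a covariance per class (quadratic boundary), which can help when
# the classes have different spreads.
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

rng = np.random.default_rng(4)
X0 = rng.multivariate_normal([0, 0], [[1.0, 0.0], [0.0, 1.0]], size=200)
X1 = rng.multivariate_normal([2, 2], [[3.0, 1.0], [1.0, 0.5]], size=200)  # different covariance
X = np.vstack([X0, X1])
y = np.repeat([0, 1], 200)

lda = LinearDiscriminantAnalysis().fit(X, y)
qda = QuadraticDiscriminantAnalysis().fit(X, y)
print("LDA training accuracy:", lda.score(X, y))
print("QDA training accuracy:", qda.score(X, y))
```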