
Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) is a supervised machine learning algorithm used for classification. It finds the linear combination of features that best separates two or more classes of objects. LDA assumes that features are normally distributed within each class, and that classes have equal covariance matrices. LDA can be used for binary or multi-class classification problems. It works by finding the projection of high-dimensional data onto a line that separates the classes as much as possible.

Uploaded by Naman Jain
© All Rights Reserved

Linear Discriminant Analysis

LDA versus Cluster Analysis

• Cluster analysis is an unsupervised algorithm, whereas LDA is a supervised algorithm.
• In clustering, you just adjust parameters and apply the algorithm directly to your sample; in discriminant analysis, you train the system using a training set, and the system then classifies new data based on what it learned from the training set.
• Cluster analysis (CA) groups objects on the basis of closeness, whereas discriminant analysis (DA) groups objects on the basis of difference.
LDA versus Logistic Regression

• If n is small and the distribution of the predictors X is approximately normal in each of the classes, the linear discriminant model is more stable than the logistic regression model.

• LDA assumes that the observations are drawn from a Gaussian distribution with a
common covariance matrix in each class, and so can provide some improvements
over logistic regression when this assumption approximately holds. Conversely,
logistic regression can outperform LDA if these Gaussian assumptions are not met.
• The objective of discriminant analysis is to develop discriminant functions: linear combinations of the independent variables that best discriminate between the categories of the dependent variable.
Both discriminant function analysis (DFA) and logistic regression can answer similar research questions.
Logistic regression may be better when the dependent variable is binary (e.g., Yes/No, Pass/Fail, Healthy/Ill, Life/Death) and the independent variables can be nominal, ordinal, ratio, or interval. Discriminant analysis may be better when the dependent variable has more than two groups/categories.

http://claudiaflowers.net/rsch8140/discriminant.htm
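This trade-off can be sketched empirically; the example below assumes scikit-learn and NumPy are available, and the Gaussian data, class sizes, and seed are invented for illustration. On data that satisfies LDA's common-covariance Gaussian assumption, both models should do well:

```python
# Sketch: comparing LDA and logistic regression on synthetic data that
# satisfies LDA's assumptions (Gaussian classes, shared covariance).
# All data values here are invented for illustration.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Two classes drawn from unit-covariance Gaussians with different means
X = np.vstack([rng.normal([0, 0], 1.0, (100, 2)),
               rng.normal([2, 2], 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

lda_acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
log_acc = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
print(f"LDA: {lda_acc:.3f}, logistic regression: {log_acc:.3f}")
```

When the Gaussian assumption is violated (heavy tails, outliers), the logistic regression accuracy would typically be the more robust of the two.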
Introduction
• Discriminant analysis is the appropriate statistical technique
when the dependent variable is a categorical variable and the
independent variables are metric variables.
• In many cases, the dependent variable consists of two groups or
classifications, for example, male versus female or high versus
low.
• In other instances, more than two groups are involved, such as
low, medium, and high classifications.
• Discriminant analysis is capable of handling either two groups or
multiple (three or more) groups.
• When the criterion variable has two categories, the technique is known as two-group discriminant analysis.
• When three or more categories are involved, the technique is referred to as multiple discriminant analysis.
• If we can assume that the groups are linearly separable, we can use the linear discriminant model (LDA).
• Linearly separable suggests that the groups can be separated by a linear combination of features that describe the objects.
Visualization (two outcomes)
• With only two independent variables, the separators between object groups become lines.
Visualization (3 outcomes)
• With three features, the separator is a plane; when the number of independent variables is more than three, the separators become hyperplanes.
• The discriminant analysis model involves linear combinations of the
following form:

D = b0 + b1X1 + b2X2 + b3X3 + . . . + bkXk

where
• D = discriminant score
• b's = discriminant coefficients or weights
• X's = predictors or independent variables

• The coefficients, or weights (b), are estimated so that the groups differ
as much as possible on the values of the discriminant function.
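A minimal numeric sketch of evaluating this linear combination; the intercept, weights, and predictor values below are invented for illustration, not estimated from data:

```python
# Evaluating the discriminant function D = b0 + b1*X1 + ... + bk*Xk
# for one observation. All numeric values are hypothetical.
import numpy as np

b0 = -1.5                       # intercept (hypothetical)
b = np.array([0.8, -0.4, 1.2])  # discriminant weights b1..bk (hypothetical)
x = np.array([2.0, 1.0, 0.5])   # one observation X1..Xk (hypothetical)

D = b0 + b @ x                  # discriminant score
print(D)                        # approximately 0.3
```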
Pros & Cons
• Cons
• Old algorithm
• Newer algorithms often give much better prediction
• Pros
• Simple
• Fast and portable
• Still beats some algorithms (e.g., logistic regression) when its assumptions are met
• Good to use when beginning a project
Main idea: find projection to a line such that samples from different classes
are well separated
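For two classes, this projection idea can be sketched with the classical Fisher direction w = Sw⁻¹(μ1 − μ2), where Sw is the within-class scatter; the synthetic data and seed below are invented for illustration:

```python
# Sketch of Fisher's idea: project samples onto the direction that best
# separates the two class means relative to the within-class scatter.
# The data and seed are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)
X1 = rng.normal([0, 0], 1.0, (50, 2))  # class 1 samples
X2 = rng.normal([3, 1], 1.0, (50, 2))  # class 2 samples

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
# Within-class scatter matrix (sum of per-class scatters)
Sw = (np.cov(X1, rowvar=False) * (len(X1) - 1)
      + np.cov(X2, rowvar=False) * (len(X2) - 1))
w = np.linalg.solve(Sw, mu1 - mu2)      # Fisher direction

# Project both classes onto the line defined by w
z1, z2 = X1 @ w, X2 @ w
print(z1.mean(), z2.mean())             # well-separated projected means
```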
Algorithm to solve numerical problem
1. Compute the global mean (M) using all the samples.
2. Compute the mean vector and covariance matrix for each class of samples.
3. Compute the within-class scatter matrix C (termed the pooled within-group covariance matrix).
4. Create the discriminant functions (F1 and F2) using the formula below.
The discriminant function is

f_i = \mu_i C^{-1} x_k^T - \frac{1}{2} \mu_i C^{-1} \mu_i^T + \ln(p_i)

where \mu_i is the mean vector of class i, C is the pooled within-group covariance matrix, x_k is the observation to classify, and p_i is the prior probability of class i.
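The four steps above can be sketched as follows for two classes; the sample values, the new observation, and the class-frequency priors are assumptions for illustration:

```python
# Sketch of the numerical LDA procedure for two classes.
# The sample data and the new observation are invented for illustration.
import numpy as np

X1 = np.array([[2.95, 6.63], [2.53, 7.79], [3.57, 5.65], [3.16, 5.47]])
X2 = np.array([[2.58, 4.46], [2.16, 6.22], [3.27, 3.52]])

# Step 1: global mean of all samples
M = np.vstack([X1, X2]).mean(axis=0)

# Step 2: class mean vectors and covariance matrices
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
C1 = np.cov(X1, rowvar=False, bias=True)
C2 = np.cov(X2, rowvar=False, bias=True)

# Step 3: pooled within-group covariance matrix C (weighted by class size)
n1, n2 = len(X1), len(X2)
C = (n1 * C1 + n2 * C2) / (n1 + n2)
Cinv = np.linalg.inv(C)

# Step 4: discriminant functions
# f_i(x) = mu_i C^{-1} x^T - (1/2) mu_i C^{-1} mu_i^T + ln(p_i)
p1, p2 = n1 / (n1 + n2), n2 / (n1 + n2)  # priors taken as class frequencies

def f(x, mu, p):
    return mu @ Cinv @ x - 0.5 * mu @ Cinv @ mu + np.log(p)

# Classify a new observation by the larger discriminant score
x_new = np.array([2.81, 5.46])
label = 1 if f(x_new, mu1, p1) > f(x_new, mu2, p2) else 2
print(f(x_new, mu1, p1), f(x_new, mu2, p2), label)
```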
CLASSROOM PROBLEM
SOLUTION
LDA versus QDA

LDA (Linear Discriminant Analysis):
• Used when a linear boundary between classes is adequate.
• Assumes a common covariance matrix across all response classes.
• The distribution of observations in each response class is normal with a class-specific mean (µk) and common covariance (Σ).

QDA (Quadratic Discriminant Analysis):
• Used to find a non-linear (quadratic) boundary between classes.
• Allows a different covariance matrix for each response class.
• The distribution of observations in each response class is normal with a class-specific mean (µk) and class-specific covariance (Σk).
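A sketch of this contrast using scikit-learn's LinearDiscriminantAnalysis and QuadraticDiscriminantAnalysis; the synthetic data, seed, and covariance scales below are invented for illustration:

```python
# Contrasting LDA and QDA on synthetic data where the two classes have
# clearly different covariances, which favours QDA. Data is invented.
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (200, 2)),    # tight class 0
               rng.normal(1.5, 2.0, (200, 2))])   # spread-out class 1
y = np.array([0] * 200 + [1] * 200)

lda = LinearDiscriminantAnalysis().fit(X, y)      # one pooled covariance
qda = QuadraticDiscriminantAnalysis().fit(X, y)   # per-class covariance
print(lda.score(X, y), qda.score(X, y))
```

Because the class covariances differ, the quadratic boundary fitted by QDA should match the data at least as well as LDA's linear one here.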
