9 - Linear Discriminant Analysis
@IIT Roorkee
INTRODUCTION
• Linear Discriminant Analysis (LDA) is a statistical and machine learning technique used for dimensionality reduction, classification, and pattern recognition.
• It is primarily employed when you have labeled data (supervised learning) and want to separate different classes based on the features.
LDA is based on the concept of finding a linear combination of features that best separates two or more
classes of objects or events. It essentially looks for the axes (linear discriminants) in the feature space that
provide the greatest separation between the different classes.
• Dimensionality Reduction: LDA can reduce the dimensionality of the dataset. It finds a lower-dimensional space (e.g., projecting data from a 3D space onto a 2D space) where the different classes are better separated.
• Linear Separability: LDA assumes that the different classes can be linearly separated. This means that the boundary between classes is a straight line (in two dimensions), a plane (in three dimensions), or a hyperplane in higher dimensions.
• Maximizing Between-Class Separation: LDA works by maximizing the distance between the means of different classes (between-class variance) while minimizing the variance within each class (within-class variance). A brief usage sketch follows below.
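As a concrete illustration (not part of the original slides; the dataset and parameter choices here are my own assumptions), the following minimal Python sketch uses scikit-learn's LinearDiscriminantAnalysis on the Iris dataset, showing LDA used both as a classifier and as a dimensionality-reduction step:

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Iris: 150 samples, 4 features, 3 classes (labeled data, so LDA applies)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# For 3 classes there are at most C - 1 = 2 discriminant directions
lda = LinearDiscriminantAnalysis(n_components=2)
lda.fit(X_train, y_train)

print("Test accuracy:", lda.score(X_test, y_test))  # LDA as a classifier
X_projected = lda.transform(X_train)                # LDA as dimensionality reduction
print("Projected shape:", X_projected.shape)        # (n_train_samples, 2)

The sections below walk through the mathematics that such an implementation carries out internally.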
Key Mathematical Concepts in LDA
Let's assume we have a dataset X with N samples, each having d features, and the data is classified into C classes. We will
represent the data as follows:
• X∈R^(N×d), where each row is a sample and each column is a feature.
• yi∈{1,2,…,C} is the class label of sample i.
1. Mean Vectors:
For each class k, we calculate the mean vector μk, which represents the average position of the data points in that class in the feature space:

μk = (1/Nk) Σ_{xi ∈ class k} xi

Where:
• Nk is the number of samples in class k.
• xi is a feature vector for sample i in class k.
• μk is the mean vector for class k.
We also compute the overall mean vector of the dataset:

μ = (1/N) Σ_{i=1}^{N} xi

Where:
• N is the total number of samples across all classes.
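In code, the class means and the overall mean are simple averages. The sketch below is a toy example with made-up data (not taken from the slides), using NumPy; its variables are reused in the sketches for the later steps:

import numpy as np

# Hypothetical toy data: N = 12 samples, d = 3 features, C = 3 classes
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
y = np.array([0]*4 + [1]*4 + [2]*4)

classes = np.unique(y)
mean_vectors = {k: X[y == k].mean(axis=0) for k in classes}  # mu_k for each class k
overall_mean = X.mean(axis=0)                                # mu over all N samples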
2. Scatter Matrices:
Within-class scatter matrix (SW): Measures the spread (covariance) of points within each class. For class k, the scatter of points within that class is

Sk = Σ_{xi ∈ class k} (xi − μk)(xi − μk)^T

and the within-class scatter matrix is obtained by summing up the scatter within each class:

SW = Σ_{k=1}^{C} Sk

Where:
• C is the number of classes,
• Nk is the number of samples in class k,
• (xi − μk) is the deviation of each sample from the mean of its class.
Between-class scatter matrix (SB): Measures the separation between the means of different classes. The between-class scatter matrix captures how the mean of each class differs from the overall mean:

SB = Σ_{k=1}^{C} Nk (μk − μ)(μk − μ)^T

Where:
• Nk is the number of samples in class k,
• μk is the mean vector of class k,
• μ is the overall mean vector of the dataset.
The idea is to maximize SB (separation between classes) and minimize SW (spread within classes).
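Continuing the toy example from the previous sketch (same hypothetical X, y, mean_vectors, and overall_mean), the two scatter matrices can be accumulated class by class:

d = X.shape[1]
S_W = np.zeros((d, d))   # within-class scatter SW
S_B = np.zeros((d, d))   # between-class scatter SB
for k in classes:
    X_k = X[y == k]
    diff = X_k - mean_vectors[k]                     # (xi - mu_k) for samples in class k
    S_W += diff.T @ diff                             # add the per-class scatter S_k
    mean_diff = (mean_vectors[k] - overall_mean).reshape(d, 1)
    S_B += X_k.shape[0] * (mean_diff @ mean_diff.T)  # N_k (mu_k - mu)(mu_k - mu)^T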
3. Optimization Objective
To find the optimal projection that maximizes the separation between the classes, we solve for the projection matrix W that maximizes the following criterion (Fisher's criterion):

J(W) = |W^T SB W| / |W^T SW W|

• Solve the eigenvalue problem for SW^(-1)SB to get its eigenvalues and corresponding eigenvectors.
• The eigenvectors corresponding to the largest eigenvalues define the directions in which the classes are best separated (at most C − 1 of them have nonzero eigenvalues). These eigenvectors form the columns of the projection matrix W.
• Once W is computed, we project the original data X onto the new subspace using:

X′ = X W

The new data X′ is the lower-dimensional representation of the original data with maximum class separability.
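To finish the toy example from the sketches above, the projection matrix W is built from the leading eigenvectors of SW^(-1)SB and the data is then projected onto those directions:

# Eigen-decomposition of SW^(-1) SB (it is not symmetric, so keep only the real parts)
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(eigvals.real)[::-1]         # sort directions by decreasing eigenvalue

n_components = len(classes) - 1                # at most C - 1 useful directions
W = eigvecs.real[:, order[:n_components]]      # d x (C - 1) projection matrix

X_prime = X @ W                                # X' = X W
print(X_prime.shape)                           # (12, 2): lower-dimensional representation

A library implementation such as scikit-learn's solves an equivalent problem with more numerically stable routines, so results may differ slightly from this direct construction.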
Applications of LDA
• Face Recognition: LDA is used for dimensionality reduction before classifying faces in
face recognition systems.
• Text Classification: It helps categorize documents or emails into different classes like
spam or non-spam.
• Medical Diagnosis: LDA is applied to predict the category of diseases based on patient
data.
• Credit Scoring: Banks use LDA for distinguishing between good and bad credit risks.
Limitations of LDA
• Assumes Linearly Separable Data: LDA works best when data is linearly separable. It may perform poorly when the decision boundary is nonlinear.
• Normality Assumption: LDA assumes that the features follow a Gaussian distribution, which may
not always hold true in practice.
• Equal Covariance Matrices: It assumes that the covariance matrices of the classes are equal,
which may not be the case in all applications.