
Dimensionality Reduction

PCA and LDA

Basic Math for PCA

What is Eigendecomposition?

Think of a matrix as a machine that performs transformations on vectors. It can stretch, shrink,
rotate, or otherwise alter a vector.

● Eigenvectors: Special vectors that, when fed into this matrix-machine, only get scaled (stretched, shrunk, or possibly flipped), while staying along the same line; their direction is fundamentally preserved.
● Eigenvalues: The factor by which each eigenvector gets scaled is its corresponding
eigenvalue.

Eigendecomposition is the process of breaking down a matrix into a set of eigenvectors and
their corresponding eigenvalues.

Why is this useful?

1. Understanding Transformations: Eigendecomposition reveals the core directions along which a matrix acts, and by how much it stretches or shrinks things along those directions.

2. Simplifying Operations: Many matrix operations (like powers, inverses) become easier
with eigendecomposition because you're essentially just scaling the eigenvectors.

3. Applications: Used in countless fields:

○ Principal Component Analysis (PCA): Reducing data dimensions by keeping the directions of the most variation (eigenvectors with largest eigenvalues).
○ Image compression: Identifying less important directions to save space.
○ Vibration analysis: Finding natural modes of vibration in structures.

Example

Let's consider this matrix:

A = [2 1]
[1 2]

1. Find Eigenvalues: Solve the equation: det(A - λI) = 0 where 'λ' represents
eigenvalues, and 'I' is the identity matrix. For our matrix, this gives us eigenvalues λ1 = 3
and λ2 = 1.

2. Find Eigenvectors: For each eigenvalue, solve the equation: (A - λI) v = 0 where
'v' is the eigenvector.

○ For λ1 = 3, we get the eigenvector v1 = [1 1]


○ For λ2 = 1, we get the eigenvector v2 = [-1 1]

Representation

We can now represent our matrix A as:

A = V * D * V^-1

Where:

● V: Matrix whose columns are the eigenvectors ([v1 v2])
● D: Diagonal matrix with eigenvalues on the diagonal
● V^-1: The inverse of V
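
This representation can be verified numerically; a minimal NumPy sketch (just a check, not part of the original example's code):

import numpy as np

A = np.array([[2, 1],
              [1, 2]])

# Eigendecomposition: entries of d are eigenvalues, columns of V are eigenvectors
d, V = np.linalg.eig(A)
print(d)   # eigenvalues 3 and 1 (order may vary)
print(V)   # columns proportional to [1, 1] and [-1, 1]

# Reconstruct A = V * D * V^-1
D = np.diag(d)
print(np.allclose(A, V @ D @ np.linalg.inv(V)))  # True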

Remember

● Eigendecomposition isn't always possible (some matrices don't diagonalize this way).
● A matrix represents a transformation; eigenvectors are the directions of this
transformation, and eigenvalues are how much scaling happens along those directions.

PCA and Eigendecomposition: The Connection

1. Goal of PCA: PCA aims to reduce the dimensionality of data while retaining the most
important directions of variation.

2. Variance as Information: Eigenvectors of the data's covariance matrix represent the directions of greatest variance. Eigenvalues tell us how much variance is captured along each direction.

3. Key Idea: PCA projects the data onto the eigenvectors with the largest eigenvalues.
These eigenvectors become the new "principal components" of the data. Since
eigenvalues represent variance, this keeps directions of high variation and discards
those with less variation.

Example with Python

Let's use the Iris dataset, a common example for PCA:


import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Load the Iris dataset


iris = datasets.load_iris()
X = iris.data[:, :2] # For this example, we'll use only the first 2 features
y = iris.target

# Center the data


X_meaned = X - np.mean(X, axis=0)

# Calculate the covariance matrix and eigendecomposition


cov_mat = np.cov(X_meaned, rowvar=False)
eig_vals, eig_vecs = np.linalg.eig(cov_mat)

# Sort components by explained variance (eigenvalues)


idx = np.argsort(eig_vals)[::-1]
eig_vecs = eig_vecs[:, idx]
eig_vals = eig_vals[idx]

# Select the top 2 components (with only 2 input features, this keeps all of the
# variance and simply rotates the data onto its principal axes)
W = eig_vecs[:, :2]

# Project the data onto the new space


X_reduced = np.dot(X_meaned, W)

# Plot the results


plt.figure(figsize=(8,6))
plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, cmap='viridis')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA on Iris Dataset')
plt.show()

See the code in action on this Colab:
https://colab.research.google.com/drive/1QIvSmD42RDqO1whluBp6VHbHIP8aN9Mm?usp=sharing

A modified code snippet that visualizes the original (mean-centered) data together with the principal component directions (eigenvectors) using Matplotlib:

import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Load the Iris dataset


iris = datasets.load_iris()
X = iris.data[:, :2] # For this example, we'll use only the first 2 features
y = iris.target

# Center the data


X_meaned = X - np.mean(X, axis=0)

# Calculate the covariance matrix and eigendecomposition


cov_mat = np.cov(X_meaned, rowvar=False)
eig_vals, eig_vecs = np.linalg.eig(cov_mat)

# Sort components by explained variance (eigenvalues)


idx = np.argsort(eig_vals)[::-1]
eig_vecs = eig_vecs[:, idx]
eig_vals = eig_vals[idx]

# Select the top 2 components (with only 2 input features, this keeps all of the
# variance and simply rotates the data onto its principal axes)
W = eig_vecs[:, :2]

# Project the data onto the new space


X_reduced = np.dot(X_meaned, W)

# Define scaling factors for visualizing eigenvectors (adjust as needed)


scale1 = 3 # Adjust for better visualization of eigenvector 1
scale2 = 2 # Adjust for better visualization of eigenvector 2

# Plot the results


plt.figure(figsize=(8,6))
plt.scatter(X_meaned[:, 0], X_meaned[:, 1], c=y, cmap='viridis', alpha=0.7, label='Data (centered)')

# Plot the eigenvectors with arrows


plt.arrow(0, 0, scale1*eig_vecs[:, 0][0], scale1*eig_vecs[:, 0][1], color='red', linewidth=2,
label='PC1')
plt.arrow(0, 0, scale2*eig_vecs[:, 1][0], scale2*eig_vecs[:, 1][1], color='blue', linewidth=2,
label='PC2')

plt.xlabel('Feature 1 (centered)')
plt.ylabel('Feature 2 (centered)')
plt.title('Centered Iris Data with Eigenvectors (Principal Directions)')
plt.legend()
plt.show()

Explanation:

1. We define scaling factors (scale1 and scale2) to visually represent the eigenvectors
with a reasonable length. You might need to adjust these values depending on your data
spread.
2. We use plt.arrow to draw arrows starting from the origin (0, 0) with scaled lengths
based on the eigenvectors and colored differently for better distinction.
3. We add labels ('PC1' and 'PC2') to the plotted arrows for clarity.

Hand Span - Height - Another example with two features that have a high degree of correlation, allowing for reduction to a single feature using PCA.

Scenario: Imagine a dataset where one feature represents the height of a person and the other
represents their hand span. These features are likely to be highly correlated, meaning taller
people generally have wider hand spans and vice versa.

Data Generation (Python Code):

import numpy as np
import matplotlib.pyplot as plt

# Define the mean and standard deviation for height and hand span
mean_height = 170
std_height = 10
mean_span = 18
std_span = 8

# Generate correlated data using a linear relationship with some noise
covariance = 0.8 # Slope coupling hand span to height; adjust for a stronger or weaker relationship
np.random.seed(10) # Set a seed for reproducible results
height = np.random.normal(mean_height, std_height, 100)
noise = np.random.normal(0, 1, 100)
hand_span = covariance * height + mean_span + std_span * noise

Explanation:

1. We define the mean and standard deviation for both height and hand span.
2. We set a coupling coefficient (0.8) that links hand span linearly to height, creating a strong correlation between the two features.
3. We use np.random.normal to generate data with specified means and standard
deviations for both height and noise.
4. The hand span is calculated using a linear equation with the defined covariance, mean,
standard deviation, and added noise to introduce some variation.

Visualization :

plt.scatter(height, hand_span)
plt.xlabel('Height')
plt.ylabel('Hand Span')
plt.title('Original Data (Height vs. Hand Span)')
plt.show()

This scatter plot will visually demonstrate the strong positive correlation between height and
hand span.
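
To quantify that relationship, a small check (assuming the height and hand_span arrays generated above) can print the Pearson correlation coefficient:

# Pearson correlation between the two features; a value near 1 indicates a strong positive correlation
print(np.corrcoef(height, hand_span)[0, 1])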

Applying PCA:

Using PCA on this data with only two features will essentially capture the single direction of
variation (the linear relationship between height and hand span). The first principal
component will represent this dominant direction, and the second component will
contain minimal variance and can be discarded with minimal information loss.

Remember

● The discarded component (second principal component) will primarily capture the noise
added to the data.
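
One way to confirm this is to fit scikit-learn's PCA with both components kept and inspect explained_variance_ratio_; a short sketch, assuming the height and hand_span arrays from above:

from sklearn.decomposition import PCA

# Stack the two features into an (n_samples, 2) matrix
X_hh = np.c_[height, hand_span]

pca_check = PCA(n_components=2).fit(X_hh)

# The first ratio should be close to 1 and the second close to 0,
# confirming that almost all variance lies along a single direction
print(pca_check.explained_variance_ratio_)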

Next, we create a sample dataset of heights and hand spans for 100 people, apply PCA for dimensionality reduction, and visualize the result. We also show the original feature matrix and what the feature matrix looks like after PCA - check the Colab for the full output.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Define parameters
num_people = 100
mean_height = 170
std_height = 10
mean_span = 18
std_span = 8
covariance = 0.8
# Generate correlated data
np.random.seed(10)
height = np.random.normal(mean_height, std_height, num_people)
noise = np.random.normal(0, 1, num_people)
hand_span = covariance * height + mean_span + std_span * noise

# Create scatter plot


plt.figure(figsize=(8, 6))
plt.scatter(height, hand_span)
plt.xlabel("Height")
plt.ylabel("Hand Span")
plt.title("Original Data (Height vs. Hand Span)")
plt.show()

# Apply PCA
pca = PCA(n_components=1) # Reduce to 1 dimension
pca.fit(np.c_[height, hand_span])

# Transform data
data_reduced = pca.transform(np.c_[height, hand_span])

# Print original and transformed feature matrices


print("Original feature matrix:\n", np.c_[height, hand_span])
print("\nTransformed feature matrix:\n", data_reduced)

# Visualize PCA results (scatter plot of transformed data)


plt.figure(figsize=(8, 6))
plt.scatter(data_reduced, np.zeros_like(data_reduced)) # Plot on x-axis only
plt.xlabel("Principal Component 1")
plt.title("Data after PCA (Reduced to 1 Dimension)")
plt.show()

This code creates the graphs shown on the Colab sheets; notice how the two original dimensions are represented by a single dimension:

Sample output - transformed feature matrix (first few of 100 rows):

Principal Component 1
15.941
-4.421
-26.499
1.409
5.433
...
Sample output - original feature matrix, Original Data (Height vs. Hand Span), first few of 100 rows:

Height     Hand Span
183.316    165.592
177.153    144.463
154.546    134.254
169.916    157.691
176.213    157.816
...
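
To see how little information the discarded second component carried, one option is to map the 1-D data back to the original feature space with inverse_transform and compare it with the original matrix; a short sketch, assuming the pca object and data_reduced array from the code above:

# Reconstruct approximate (height, hand_span) pairs from the single principal component
data_reconstructed = pca.inverse_transform(data_reduced)

# Mean absolute reconstruction error per feature; small values relative to the
# data's spread indicate that little information was lost
print(np.mean(np.abs(np.c_[height, hand_span] - data_reconstructed), axis=0))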

Bonus: an example matrix, some of its possible eigenvectors, and a step-by-step calculation of the eigenvectors.

Example Matrix and Eigenvectors


Consider the following matrix:

A = [[2, 1],  # First row
     [1, 2]]  # Second row

This matrix represents a linear transformation in two dimensions. Eigenvectors of this matrix will
tell us the directions (along which lines) this transformation stretches or shrinks vectors, and the
corresponding eigenvalues will tell us by how much it stretches or shrinks along those
directions.

Possible Eigenvectors:

There are two possible eigenvectors for this matrix:

1. v1 = [1, 1]: This eigenvector points in the direction of the line y = x.
2. v2 = [-1, 1]: This eigenvector points in the direction of the line y = -x.

Calculating Eigenvectors:

1. For v1:

○ Set up the equation (A - λI) * v1 = 0, where λ is the eigenvalue and I is the identity matrix.

○ Substitute the values:

[[2-λ, 1],  # First row of (A - λI)
 [1, 2-λ]] * [1, 1] = [0, 0]  # v1 and the zero vector

○ Solve the system of equations:
■ (2 - λ) * 1 + 1 * 1 = 0
■ 1 * 1 + (2 - λ) * 1 = 0
○ Solving, we get λ = 3.
2. For v2:

○ Repeat the process with v2:

[[2-λ, 1],  # First row of (A - λI)
 [1, 2-λ]] * [-1, 1] = [0, 0]  # v2 and the zero vector

○ Solve the system of equations:
■ (2 - λ) * (-1) + 1 * 1 = 0
■ 1 * (-1) + (2 - λ) * 1 = 0
○ Solving, we get λ = 1.

Therefore:

● This matrix has two distinct eigenvalues: λ1 = 3 and λ2 = 1.
● The corresponding eigenvectors are v1 = [1, 1] and v2 = [-1, 1].
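
These hand calculations can be double-checked with a short NumPy sketch:

import numpy as np

A = np.array([[2, 1],
              [1, 2]])
v1 = np.array([1, 1])
v2 = np.array([-1, 1])

# A @ v should equal lambda * v for each (eigenvalue, eigenvector) pair
print(A @ v1, 3 * v1)  # [3 3] [3 3]
print(A @ v2, 1 * v2)  # [-1 1] [-1 1]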

How do I create a matrix given eigenvectors and eigenvalues? An explanation with an example follows.

Key Idea

The relationship between a matrix, its eigenvalues, and eigenvectors is represented by the
following equation:

A*v=λ*v

where:

● A is the matrix
● v is an eigenvector
● λ is the corresponding eigenvalue

We can rearrange and rewrite this relationship as a matrix equation:

A = P * D * P^-1

Where:

● P is the matrix whose columns are the eigenvectors of A
● D is a diagonal matrix whose entries are the eigenvalues of A
● P^-1 is the inverse of P
Example

Let's say we have the following:

● Eigenvalues:
○ λ1 = 2
○ λ2 = 5
● Corresponding Eigenvectors:
○ v1 = [1, 1]
○ v2 = [2, -1]

Steps to form the matrix:

1. Create the eigenvector matrix (P): Place the eigenvectors as columns of this matrix:

P = [[1, 2],
     [1, -1]]

2. Create the diagonal eigenvalue matrix (D): Place the eigenvalues on the diagonal, and set the other entries to zero:

D = [[2, 0],
     [0, 5]]

3. Calculate the inverse of P (P^-1):

P^-1 = [[1/3, 2/3],
        [1/3, -1/3]]

4. Construct the matrix (A): Use the formula A = P * D * P^-1:

A = [[1, 2],    [[2, 0],    [[1/3, 2/3],
     [1, -1]] *  [0, 5]] *   [1/3, -1/3]]

A = [[4, -2],
     [-1, 3]]

Result:
The constructed matrix A is:

[[4, -2],
 [-1, 3]]

Check: A * v1 = [4 - 2, -1 + 3] = [2, 2] = 2 * v1, and A * v2 = [8 + 2, -2 - 3] = [10, -5] = 5 * v2, so the given eigenvalue/eigenvector pairs are recovered.
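
The same construction can be reproduced in NumPy; a minimal sketch:

import numpy as np

P = np.array([[1, 2],
              [1, -1]])   # eigenvectors as columns
D = np.diag([2, 5])       # eigenvalues on the diagonal

A = P @ D @ np.linalg.inv(P)
print(A)                  # [[ 4. -2.]
                          #  [-1.  3.]]

# Sanity check: A v = lambda v for both pairs
print(A @ np.array([1, 1]))    # [2. 2.]   = 2 * [1, 1]
print(A @ np.array([2, -1]))   # [10. -5.] = 5 * [2, -1]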

The decomposition itself is not unique: many different choices of P (and correspondingly D) reconstruct the same matrix A.

Sources of non-uniqueness:

1. Ordering of Eigenvectors: The eigenvectors can be arranged as columns of P in any order, as long as the eigenvalues in D are reordered to match. Different orderings give different-looking P and D matrices, yet the product P * D * P^-1 still reconstructs the same matrix A.

2. Scaling of Eigenvectors: An eigenvector can be scaled by any non-zero constant without affecting its validity. That is, if v is an eigenvector, then kv (where k is any non-zero constant) is also an eigenvector with the same eigenvalue. Scaling the columns of P changes both P and P^-1, but again the product P * D * P^-1 is unchanged.

Consequently, the same transformation A can be built from many different-looking P, D, and P^-1 matrices.

Example:

Consider the following matrix:

A = [[1, 0],
[0, 1]]

This matrix represents the identity transformation (it leaves vectors unchanged). It has an
eigenvalue of 1 with two possible eigenvectors:

● v1 = [1, 0]
● v2 = [0, 1]

However, we can also use any non-zero scalar multiples of these eigenvectors, like:

● 2v1 = [2, 0]
● 3v2 = [0, 3]

If we construct the decomposition using different combinations of eigenvectors (or their scalar multiples), we end up with different-looking P and P^-1 matrices, yet they all reconstruct the same identity transformation:

Using v1 and v2:

P = [[1, 0],
[0, 1]]
D = [[1, 0],
[0, 1]]
P^-1 = [[1, 0],
[0, 1]]
A = P * D * P^-1 = [[1, 0],
[0, 1]]

Using 2v1 and 3v2:

P = [[2, 0],
[0, 3]]
D = [[1, 0],
[0, 1]]
P^-1 = [[1/2, 0],
[0, 1/3]]
A = P * D * P^-1 = [[1, 0],
[0, 1]]

The reconstructed matrix A is identical in both cases and represents the same identity transformation, even though the intermediate matrices P and P^-1 look different due to the different choices of eigenvectors (and their scalings) used in the construction process.

While the core transformation captured by the eigenvalues and eigenvectors is unique, the
specific representation of the matrix using these components can have variations in ordering
and scaling.
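
A quick NumPy demonstration of this point, reusing the earlier matrix [[2, 1], [1, 2]] and two different scalings of its eigenvectors:

import numpy as np

D = np.diag([3, 1])          # eigenvalues of [[2, 1], [1, 2]]

P1 = np.array([[1, -1],
               [1,  1]])     # eigenvectors [1, 1] and [-1, 1] as columns
P2 = np.array([[5, -0.5],
               [5,  0.5]])   # the same eigenvectors, scaled by 5 and 0.5

A1 = P1 @ D @ np.linalg.inv(P1)
A2 = P2 @ D @ np.linalg.inv(P2)

print(np.allclose(A1, A2))   # True: both reconstruct [[2, 1], [1, 2]]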

Linear Discriminant Analysis


What is a Linear Discriminant?

A linear discriminant is a line (or a hyperplane in higher dimensions) that aims to separate data
points belonging to different classes. The main purpose of a linear discriminant is to make
classification decisions easier. For example, if a new data point falls on one side of the line, we
classify it as belonging to one class; if it falls on the other side, we classify it as belonging to the
other class.
How to Choose a Good Linear Discriminant

A good linear discriminant should maximize the separation between classes while minimizing
the spread within each class. Here's a general approach, frequently seen in techniques such as
Linear Discriminant Analysis (LDA):

1. Calculate Scatter:

○ Within-Class Scatter (Sw): Measures how spread out the data points are within
each class. We want to minimize this.
○ Between-Class Scatter (Sb): Measures the distance between the mean vectors
of each class. We want to maximize this.
2. Find Projection Direction: Find a direction (line) onto which we project the data that
maximizes the ratio of the between-class scatter to the within-class scatter (Sb / Sw).
This direction will be our linear discriminant.

Let's consider a sample dataset with two classes:

import numpy as np
import matplotlib.pyplot as plt

# Sample Data (Class 1)


class1_x = np.random.normal(loc=2, size=20)
class1_y = np.random.normal(loc=5, size=20)

# Sample Data (Class 2)


class2_x = np.random.normal(loc=6, size=20)
class2_y = np.random.normal(loc=3, size=20)

Visualization of Sample Data

plt.scatter(class1_x, class1_y, color='blue', label='Class 1')


plt.scatter(class2_x, class2_y, color='red', label='Class 2')
plt.legend()
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Original Data')
plt.show()

Finding the Linear Discriminant


We won't work through the full manual LDA derivation here; the steps would involve calculating the scatter matrices, determining the ideal projection direction, and finding the discriminant (a compact sketch of those steps follows the sample function below). For now, let's assume we've computed the following linear discriminant function:

y = -x + 8 # Sample Discriminant Function
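
For reference, here is a minimal sketch of how the within-class scatter matrix and the Fisher projection direction could be computed for the two sample classes generated above (the direction found this way will generally differ from the assumed discriminant used for illustration):

# Two-class Fisher LDA sketch (uses class1_x/class1_y and class2_x/class2_y from above)
X1 = np.c_[class1_x, class1_y]
X2 = np.c_[class2_x, class2_y]

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter: sum of the per-class scatter matrices
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Fisher direction: w is proportional to Sw^-1 (m2 - m1)
w = np.linalg.inv(Sw) @ (m2 - m1)
w = w / np.linalg.norm(w)
print(w)

# Projecting a data point x onto this direction is simply the dot product x @ w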

Visualization with Discriminant

# Plot the discriminant


x_vals = np.array([0, 10])
y_vals = -x_vals + 8
plt.plot(x_vals, y_vals, color='black', linestyle='dashed', label='Discriminant')

# Visualize original data as before


plt.scatter(class1_x, class1_y, color='blue', label='Class 1')
plt.scatter(class2_x, class2_y, color='red', label='Class 2')

plt.legend()
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Data with Linear Discriminant')
plt.show()

The dashed line in the visualization represents the linear discriminant, acting as a decision boundary between the two classes. Projecting the data onto the direction perpendicular to this line gives the one-dimensional representation in which the two classes are best separated.

● In real-world situations, calculating a good linear discriminant is often done using techniques like Linear Discriminant Analysis (LDA).
● For multi-class scenarios, multiple linear discriminants might be needed.

Python code for LDA on the Iris dataset - see Colab:
https://colab.research.google.com/drive/10ebLgJkZaOJ-1HSXKC_ifWtChaYS2O8N?usp=sharing

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Load the Iris dataset


iris = load_iris()
X = iris.data # Features
y = iris.target # Target labels

# Apply LDA (reducing to 2 dimensions for visualization)


lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

# Project data using PCA for visualization comparison (optional)


pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Plot the data with class labels


plt.figure(figsize=(8, 6))
markers = ['s', 'x', 'o'] # Markers for different classes
for i, target_name in enumerate(iris.target_names):
    plt.scatter(X_lda[:, 0][y == i], X_lda[:, 1][y == i], marker=markers[i], label=target_name)
plt.xlabel('LD1')
plt.ylabel('LD2')
plt.title('Iris Dataset with LDA (Reduced to 2 Dimensions)')
plt.legend()
plt.show()

# Optional: Plot with PCA for comparison


plt.figure(figsize=(8, 6))
for i, target_name in enumerate(iris.target_names):
    plt.scatter(X_pca[:, 0][y == i], X_pca[:, 1][y == i], marker=markers[i], label=target_name)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.title('Iris Dataset with PCA (Reduced to 2 Dimensions)')
plt.legend()
plt.show()

1. Import necessary libraries for data loading, dimensionality reduction, and plotting.
2. Load the Iris dataset using sklearn.datasets.
3. Define the features (X) and target labels (y).
4. Create an LDA instance (LinearDiscriminantAnalysis) with the desired number of
components (2 for visualization).
5. Use fit_transform on the LDA object to transform the data onto the LDA subspace
(X_lda).
6. We can also perform PCA using PCA for comparison.
7. Use matplotlib to create scatter plots of the transformed data points, colored based
on their class labels.
8. The first plot shows the data projected onto the two LDA components (LD1 and LD2).
9. The optional second plot shows the data projected onto the first two principal
components using PCA (PC1 and PC2).

● This is a basic example using two dimensions for visualization. In practice, the number of
components chosen for LDA would depend on the specific data and analysis goals.
● LDA assumes that the data within each class is roughly Gaussian-distributed (with similar covariance across classes). If this assumption is badly violated, alternative methods such as Support Vector Machines (SVMs) might be more suitable.

How does LDA reduce dimensionality?


LDA (Linear Discriminant Analysis) reduces dimensionality by finding a linear projection that maximizes the separation between different classes in the data while minimizing the spread within each class.

1. Scatter Matrices:

● Within-Class Scatter (Sw): This matrix captures the variance within each class. It
measures how "spread out" the data points belonging to the same class are. We want to
minimize this when choosing a good projection.
● Between-Class Scatter (Sb): This matrix captures the variance between the means
of different classes. It measures how "separated" the class means are in the original
space. We want to maximize this when choosing a good projection.

2. Finding the Optimal Projection:

LDA aims to find a linear transformation (projection) that projects the data onto a
lower-dimensional space while maximizing the ratio of the between-class scatter (Sb) to the
within-class scatter (Sw). This ratio is often referred to as the Fisher Ratio. Mathematically, this
can be represented as:

W = arg max_W ( |W^T * Sb * W| / |W^T * Sw * W| )

where W is the matrix whose columns define the projection directions.

3. Dimensionality Reduction:

Once the optimal projection direction is found, we can use it to project the data points onto this
new, lower-dimensional space. This reduces the number of features while still maintaining as
much information as possible relevant to class separation.
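
As a rough sketch of the mechanics (not scikit-learn's exact implementation), the projection matrix W can be taken from the leading eigenvectors of Sw^-1 * Sb; for example, on the Iris dataset:

import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
overall_mean = X.mean(axis=0)

n_features = X.shape[1]
Sw = np.zeros((n_features, n_features))   # within-class scatter
Sb = np.zeros((n_features, n_features))   # between-class scatter

for c in np.unique(y):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    Sw += (Xc - mc).T @ (Xc - mc)
    diff = (mc - overall_mean).reshape(-1, 1)
    Sb += len(Xc) * diff @ diff.T

# Eigenvectors of Sw^-1 Sb, sorted by eigenvalue; keep the top (n_classes - 1) = 2
eig_vals, eig_vecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
order = np.argsort(eig_vals.real)[::-1]
W = eig_vecs[:, order[:2]].real

X_projected = X @ W       # data reduced from 4 features to 2
print(X_projected.shape)  # (150, 2)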
Visualization:

Imagine a two-dimensional dataset with two classes. The original data points might be scattered in a way that makes it difficult to separate the classes using a simple decision boundary (e.g., a line). LDA finds a direction (projection) in the original space along which the projected class means are far apart relative to the spread within each class, effectively separating the classes in the projected space. This can allow a simpler classification decision boundary in the lower-dimensional space.

● LDA is primarily used for supervised learning tasks like classification, where class
labels are available.
● It is a linear method: it separates classes with linear (hyperplane) decision boundaries and linear projections of the features.
● The number of dimensions to which the data can be reduced is at most the number of classes minus one (and no more than the number of features), since that is the maximum rank of the between-class scatter matrix, as illustrated below.
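
For example, with the 3-class Iris dataset, scikit-learn's LinearDiscriminantAnalysis allows at most 2 components, and recent versions raise a ValueError if more are requested. A small illustrative check:

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

try:
    LinearDiscriminantAnalysis(n_components=3).fit(X, y)
except ValueError as e:
    print("Expected error:", e)   # n_components cannot be larger than n_classes - 1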

Comparison with PCA:

While both LDA and PCA are dimensionality reduction techniques, they have different
objectives:

● PCA: Aims to capture the maximum overall variance in the data, regardless of class
labels.
● LDA: Specifically targets maximizing class separation, focusing on the variance that
helps distinguish between classes.

Choosing between LDA and PCA depends on the specific problem. If class labels are available
and class separation is the primary concern, LDA can be a good choice. On the other hand, if
class labels are not available or if overall variance and data exploration are more important,
PCA might be a better option.
