PCA and LDA Assignment
EcE-52005
Assignment I
Feature extraction is one of the most important steps in a pattern recognition system. It is the process in machine learning and data analysis of identifying and extracting relevant features from raw data. These features are later used for further analysis, such as classification, recognition, or detection tasks. Feature extraction reduces the dimensionality of the data, which helps to improve the performance and efficiency of the model.
PCA was invented in 1901 by Karl Pearson as an analogue of the principal axis theorem
in mechanics. It was later independently developed and named by Harold Hotelling in the 1930s. The
main goal of PCA is to identify patterns in data. It is primarily used to reduce the
dimensionality of a data set, and it does this by detecting the correlations between variables.
Mathematical Foundation
Variance and Covariance
Variance measures how much the data points deviate from the mean.
Covariance measures how two variables change together. Positive covariance indicates
that the variables increase together, while negative covariance indicates an inverse
relationship.
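For a sample of n observations x_1, ..., x_n with mean \bar{x} (and similarly y with mean \bar{y}), these quantities can be written, using the unbiased 1/(n-1) normalization, as

\mathrm{Var}(x) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2,
\qquad
\mathrm{Cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}).

The covariance matrix used by PCA collects these pairwise covariances for all features, with the variances on its diagonal.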
Steps in PCA
(1) Standardize the data
Standardizing the data ensures that each feature contributes equally to the analysis. This involves
centering the data by subtracting the mean and scaling to unit variance.
(2) Compute the eigenvalues and eigenvectors
The eigenvalues and eigenvectors of the covariance matrix are computed. These provide the
principal components and their corresponding variances.
(3) Select the principal components
Principal components are selected based on the eigenvalues, with the top k components capturing
the most variance.
(4) Project the data
The original data is projected onto the selected principal components, reducing the
dimensionality while preserving most of the variance.
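The four steps above can be written compactly with NumPy. The sketch below is illustrative rather than definitive; the function name pca and the parameter k (the number of components to keep) are chosen here for clarity and do not come from the assignment text.

import numpy as np

def pca(X, k):
    # (1) Standardize: center each feature and scale it to unit variance
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # (2) Eigen-decompose the covariance matrix (eigh suits symmetric matrices)
    cov = np.cov(X_std, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # (3) Sort the eigenpairs by decreasing eigenvalue and keep the top k
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:k]]
    # (4) Project the standardized data onto the selected components
    return X_std @ components

For example, calling pca(X, 2) on a data matrix X with many features would return a two-column array suitable for a 2D scatter plot.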
Applications of PCA
Data compression: Reducing the number of features while retaining most of the
information.
Visualization: Plotting high dimensional data in 2D or 3D.
Noise reduction: Removing less significant components (those with smaller eigenvalues).
Feature extraction: Creating new features that summarize the original dataset.
Conclusion
Principal component analysis is a valuable tool for simplifying complex datasets,
uncovering hidden patterns, and enhancing data visualization. By transforming the data into a set
of uncorrelated principal components, PCA facilitates more efficient data analysis and
interpretation.
Steps in LDA
(1) Compute the mean vectors
Compute the mean vector for each class and the overall mean vector for the entire dataset.
(2) Compute the scatter matrices
Calculate the within-class scatter matrix and the between-class scatter matrix.
(3) Solve the eigenvalue problem
Solve the generalized eigenvalue problem defined by the two scatter matrices; the eigenvectors
corresponding to the largest eigenvalues form the linear discriminants.
(4) Project the data
Project the original data onto the new lower-dimensional space defined by the selected linear
discriminants.
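These steps can also be sketched in NumPy. As with the PCA sketch, the function name lda and the parameter k are illustrative; note that LDA yields at most (number of classes - 1) useful discriminants, and a production implementation would prefer a linear solver or pseudo-inverse over a plain matrix inverse for numerical stability.

import numpy as np

def lda(X, y, k):
    d = X.shape[1]
    mean_overall = X.mean(axis=0)              # (1) overall mean vector
    S_W = np.zeros((d, d))                     # within-class scatter matrix
    S_B = np.zeros((d, d))                     # between-class scatter matrix
    for c in np.unique(y):
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)              # (1) per-class mean vector
        # (2) accumulate within-class and between-class scatter
        S_W += (X_c - mean_c).T @ (X_c - mean_c)
        diff = (mean_c - mean_overall).reshape(-1, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)
    # (3) eigenvectors of inv(S_W) @ S_B with the largest eigenvalues
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:k]].real
    # (4) project the original data onto the selected linear discriminants
    return X @ W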
Applications of LDA
Classification: Distinguishing between different classes in supervised learning.
Dimensionality Reduction: Reducing the number of features while retaining class-
discriminatory information.
Pattern Recognition: Recognizing patterns and making decisions based on the linear
combination of features.
Conclusion
Linear Discriminant Analysis is a valuable tool for both classification and dimensionality
reduction. By focusing on maximizing the separation between classes, LDA provides a robust
method for analyzing and visualizing high-dimensional data. Its effectiveness is grounded in its
ability to project data onto a space where class distinctions are clearer, facilitating more accurate
and insightful analyses.