Need for Principal Component Analysis
1. Dimensionality Reduction
Problem: High-dimensional datasets may suffer from the curse of dimensionality, leading to
increased computational complexity, overfitting, and difficulty in visualization.
Solution: PCA reduces the number of dimensions while retaining as much variability as possible,
helping to simplify the dataset without losing critical information.
2. Visualization
Problem: Visualizing high-dimensional data is challenging, and it is often beneficial to reduce the
data to two or three dimensions for easier interpretation.
Solution: PCA projects data onto a lower-dimensional space, allowing for visualization in 2D or
3D while preserving the major trends and patterns in the data.
3. Noise Reduction
Problem: Datasets may contain noise or irrelevant features that can obscure meaningful
patterns.
Solution: By focusing on the principal components associated with the highest variance, PCA
helps to filter out noise and highlight the most significant features.
Principal Component Analysis
• Principal Component Analysis (PCA) is a dimensionality reduction
technique widely used in machine learning and statistics.
• Its primary goal is to transform a dataset into a new coordinate system
in such a way that the greatest variance lies along the first axis, the
second greatest variance along the second axis, and so on.
• This helps to reduce the dimensionality of the data while retaining as
much of its variability as possible.
• Principal component analysis (PCA) is a statistical procedure that uses
an orthogonal transformation to convert a set of observations of
possibly correlated variables into a set of values of linearly
uncorrelated variables called principal components.
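As an illustration of this transformation, the following is a minimal sketch using scikit-learn's PCA on a small synthetic dataset (the dataset and the choice of two components are assumptions made only for this example): the transformed columns are uncorrelated, and their variances decrease from the first component onward.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic dataset: 200 samples of two correlated features (illustrative only).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + rng.normal(scale=0.3, size=200)
X = np.column_stack([x1, x2])

# Transform into the new coordinate system defined by the principal components.
pca = PCA(n_components=2)
Z = pca.fit_transform(X)

print(np.round(np.cov(Z, rowvar=False), 4))  # off-diagonals ~ 0: components are uncorrelated
print(pca.explained_variance_)               # variances in decreasing order
```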
Covariance Matrix
• The covariance matrix is crucial in PCA for dimensionality reduction and
feature extraction.
• PCA aims to transform data into a new set of uncorrelated variables, called
principal components, ordered by their variance.
• The covariance matrix captures the relationships between original features,
guiding PCA to find the directions of maximum variance.
• Consider a dataset with features like height and weight; the covariance matrix
reveals how changes in one variable relate to changes in the other (illustrated in the
sketch after this list). A high covariance suggests a strong linear relationship, while a
covariance near zero indicates little or no linear relationship.
• PCA leverages this information to identify principal components, allowing for
dimensionality reduction while retaining key information, as exemplified in
transforming height and weight data into uncorrelated principal components.
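As a small illustration (the height and weight numbers below are invented for the example, not taken from the text), the covariance matrix of a two-feature dataset can be computed directly with NumPy; a large positive off-diagonal entry indicates that height and weight tend to increase together.

```python
import numpy as np

# Toy height (cm) and weight (kg) data, invented purely for illustration.
height = np.array([150, 160, 164, 170, 173, 180, 185])
weight = np.array([50, 56, 61, 66, 68, 75, 82])
data = np.column_stack([height, weight])

# Covariance matrix of the two features (rows = observations, columns = features).
S = np.cov(data, rowvar=False)
print(S)  # the off-diagonal entry is large and positive: height and weight co-vary strongly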
Eigenvectors and Eigenvalues
• In PCA, the covariance matrix represents the relationships and variability among
features.
• For example, consider a covariance matrix representing the heights and weights
of individuals; the eigenvectors would show the directions in this two-
dimensional space with the most variance, like a principal axis of height and
weight. The corresponding eigenvalues would reveal how much variance is
captured along each principal axis, guiding the selection of the most informative
principal components, as sketched below.
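Continuing the height/weight illustration, here is a minimal sketch of the eigendecomposition of a covariance matrix (the matrix entries are assumed values for a standardized dataset, not taken from the text):

```python
import numpy as np

# Covariance matrix of a hypothetical standardized height/weight dataset.
S = np.array([[1.00, 0.85],
              [0.85, 1.00]])

# eigh is suited to symmetric matrices; it returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(S)

# Sort in descending order so the first axis carries the most variance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

print(eigenvalues)         # variance captured along each principal axis
print(eigenvectors[:, 0])  # direction of maximum variance in the height/weight plane
```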
Principal Components
Principal components in PCA are the new orthogonal variables formed
as linear combinations of the original features.
They capture the maximum variance in the data, with the first principal
component explaining the most variance and each subsequent component
explaining progressively less. Their main use is dimensionality reduction:
the dataset can be represented in a lower-dimensional space while
retaining the essential information.
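As a brief illustration (using scikit-learn and a made-up dataset, both assumptions for this example), the explained-variance ratios show the first component accounting for the largest share of variance, with each subsequent component contributing less:

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up 3-feature dataset in which most variation lies along one direction.
rng = np.random.default_rng(1)
t = rng.normal(size=(300, 1))
X = np.hstack([t + 0.1 * rng.normal(size=(300, 1)),
               2 * t + 0.1 * rng.normal(size=(300, 1)),
               rng.normal(size=(300, 1))])

pca = PCA().fit(X)
print(pca.explained_variance_ratio_)  # decreasing: the first component explains the most variance

# Keep only the first two components: a lower-dimensional representation of X.
X_reduced = PCA(n_components=2).fit_transform(X)
print(X_reduced.shape)                # (300, 2)
```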
Step-by-step explanation of PCA
1. Suppose we consider a dataset having n features denoted by X1, X2, ..., Xn.
Standardize the data by subtracting the mean of each feature.
2. Compute the covariance matrix S of the standardized data.
3. Find the eigenvalues λ of S by solving the characteristic equation det(S − λI) = 0.
4. For each eigenvalue λ, find the corresponding eigenvector U = (u1, u2)^T by
solving (S − λI)U = 0.
5. Project the data along the eigenvectors with the largest eigenvalues. If the original
data has a dimensionality of n, we can reduce the dimensions to k, such that k ≤ n.
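The sketch below walks through these steps with NumPy on a hypothetical two-feature dataset; the numbers and variable names are assumptions made only for this illustration.

```python
import numpy as np

# Hypothetical two-feature dataset (rows = observations), invented for illustration.
X = np.array([[150., 50.], [160., 56.], [164., 61.],
              [170., 66.], [173., 68.], [180., 75.], [185., 82.]])

# Step 1: standardize by subtracting the mean of each feature.
X_std = X - X.mean(axis=0)

# Step 2: covariance matrix S of the standardized data.
S = np.cov(X_std, rowvar=False)

# Steps 3-4: eigenvalues and eigenvectors of S (eigh handles symmetric matrices),
# sorted so the first eigenvector corresponds to the largest eigenvalue.
eigenvalues, eigenvectors = np.linalg.eigh(S)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Step 5: project onto the top k eigenvectors (here k = 1, reducing 2 dimensions to 1).
k = 1
X_projected = X_std @ eigenvectors[:, :k]
print(X_projected.shape)  # (7, 1)
```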