Module 5.2 Principal Component Analysis - V1

Principal Component Analysis (PCA) for data dimensionality reduction

Steps to perform PCA:

1. Read data: number of predictors = 4; number of data instances = 5

F1 (Sepal length)   F2 (Sepal width)   F3 (Petal length)   F4 (Petal width)
        1                   5                  3                   1
        4                   2                  6                   3
        1                   4                  3                   2
        4                   4                  1                   1
        5                   5                  2                   3

The data has 4 predictors and 5 instances (a small number, chosen for demonstration purposes). Representing each data instance therefore requires a 4-dimensional space. Is it possible to represent the same data in a lower-dimensional space (fewer than 4 dimensions) without loss of information?

The advantage of a smaller data dimension is a reduction in computational complexity and memory requirements. PCA often also helps to speed up learning in machine learning algorithms.

2. Normalize the data by subtracting the mean of each data dimension and dividing by the standard deviation of that dimension: (x - x_mean) / x_std_dev

Normalized data:
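A minimal NumPy sketch of steps 1 and 2 (not from the original document; it assumes the sample standard deviation, ddof=1, since the document does not state which convention it uses):

```python
import numpy as np

# Step 1: the 5 x 4 data matrix (rows = instances, columns = F1..F4)
X = np.array([[1, 5, 3, 1],
              [4, 2, 6, 3],
              [1, 4, 3, 2],
              [4, 4, 1, 1],
              [5, 5, 2, 3]], dtype=float)

# Step 2: z-score normalization, (x - x_mean) / x_std_dev per column.
# ddof=1 (sample standard deviation) is an assumption here; ddof=0
# (population standard deviation) shifts the values slightly.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
print(X_norm)
```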
3. Find the covariance matrix of the normalized data matrix:
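Continuing that sketch, the covariance matrix can be computed as:

```python
# Step 3 (continuing from the snippet above): the 4 x 4 covariance
# matrix of the normalized data. rowvar=False treats columns as the
# variables; np.cov divides by n-1 by default.
C = np.cov(X_norm, rowvar=False)
print(C)
```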

4. Get the eigenvalues and eigenvectors of the covariance matrix:

The eigenvalues are: λ1 = 2.11691, λ2 = 0.855413, λ3 = 0.481689, λ4 = 0.334007

The corresponding eigenvectors are:
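A sketch of the eigen-decomposition, continuing from the snippets above. Because the exact values depend on the normalization convention assumed earlier, the computed eigenvalues may differ slightly from the figures quoted here:

```python
# Step 4 (continuing from above): eigen-decomposition of the symmetric
# covariance matrix. np.linalg.eigh returns eigenvalues in ascending
# order, so both arrays are reordered to descending.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]   # column i is the eigenvector for eigvals[i]
print(eigvals)
print(eigvecs)
```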

Explained variance by the first eigenvector = (2.11691 / [2.11691 + 0.85541 + 0.48168 + 0.33400]) × 100 = 55.88%

Explained variance by the second eigenvector = (0.85541 / [2.11691 + 0.85541 + 0.48168 + 0.33400]) × 100 = 22.58%

Explained variance by the third eigenvector = (0.48168 / [2.11691 + 0.85541 + 0.48168 + 0.33400]) × 100 = 12.71%

Explained variance by the fourth eigenvector = (0.33400 / [2.11691 + 0.85541 + 0.48168 + 0.33400]) × 100 = 8.82%
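The percentages above are simply each eigenvalue divided by the sum of all eigenvalues; continuing the sketch:

```python
# Explained variance: each eigenvalue as a percentage of the total.
explained = 100 * eigvals / eigvals.sum()
print(explained)             # for the eigenvalues quoted above: ~[55.88, 22.58, 12.71, 8.82]
print(np.cumsum(explained))  # cumulative variance retained by the first k components
```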

5. Determine the number of eigenvectors to be retained based on the explained variance, and transform the data in terms of the retained eigenvectors, which results in dimensionality reduction.

If only the first two eigenvectors are used, about 78% of the variance in the data (55.88% + 22.58% = 78.46%) is retained, and the dimensionality is reduced from 4 to 2 (a 50% reduction). In practice, the number of dimensions retained is usually chosen so that about 95% of the variance in the data is preserved.

To train a machine learning algorithm, the normalized training and test data are transformed with the 2 retained eigenvectors (in this example) by multiplying the normalized data matrix by the 4 × 2 matrix whose columns are the two leading eigenvectors: Φ = X_norm [v1 v2].
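Continuing the NumPy sketch, the projection is a single matrix multiplication; X_test below is a hypothetical held-out matrix, included only to illustrate that test data must reuse the training statistics and the same eigenvectors:

```python
# Step 5 (continuing from above): project onto the first two components.
W = eigvecs[:, :2]    # 4 x 2 matrix of the two leading eigenvectors
Phi = X_norm @ W      # 5 x 2 transformed data; columns are Phi1 and Phi2
print(Phi)

# Hypothetical test data reuses the training mean/std and the same W:
# X_test_norm = (X_test - X.mean(axis=0)) / X.std(axis=0, ddof=1)
# Phi_test = X_test_norm @ W
```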

The transformed data features Φ1 and Φ2 do not have any units.
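For comparison, a self-contained scikit-learn sketch of the same pipeline (an illustration, not the document's own code); passing a float n_components implements the 95%-variance rule of thumb mentioned above:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 5, 3, 1],
              [4, 2, 6, 3],
              [1, 4, 3, 2],
              [4, 4, 1, 1],
              [5, 5, 2, 3]], dtype=float)

# StandardScaler uses the population std (ddof=0), unlike the ddof=1
# assumption in the NumPy snippets, so the numbers can differ slightly.
X_std = StandardScaler().fit_transform(X)

pca2 = PCA(n_components=2).fit(X_std)
print(pca2.explained_variance_ratio_)  # share of variance per component
print(pca2.transform(X_std))           # the 5 x 2 transformed data

# A float n_components keeps however many components are needed to
# explain at least that fraction of the total variance.
pca95 = PCA(n_components=0.95).fit(X_std)
print(pca95.n_components_)
```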


Applications of PCA

• PCA in machine learning is used to visualize multidimensional data.
• PCA helps to compress data.
• PCA can be used to analyze patterns when dealing with high-dimensional data sets.

Advantages of Principal Component Analysis

• Easy to calculate and compute.
• Speeds up machine learning algorithms and computing processes.
• Helps prevent predictive algorithms from overfitting the data.
• Improves the performance of ML algorithms by eliminating unnecessary correlated variables.
• Helps reduce noise that cannot be eliminated otherwise.

Disadvantages of Principal Component Analysis

• PCA can be difficult to interpret: in some cases it is hard to identify the most important original features even after computing the principal components.
• Computing covariances and covariance matrices can be expensive for large, high-dimensional data sets.
• The computed principal components can be harder to read and interpret than the original features.
