

Calculation of PCA
Asked 3 years, 7 months ago Modified 2 years ago Viewed 3k times

Consider the following data set (the ten $(X, Y)$ points are reproduced in Step 1 of the answer below):

Now, we need to calculate the principal component analysis for this data. Here are the eigenvalues and eigenvectors calculated for the covariance
matrix of this data:


So the principal component is:

Now, when I tried to do this by hand, I found that the eigenvalues are 1.28 and 0.0492 (which are identical to the above solution). Surely the principal component corresponds to the eigenvalue 1.28. However, when I tried to solve for the eigenvector, the solution was [[0] [0]], as the augmented matrix of BX = 0 was [[1 0 0] [0 1 0]]. So where is the problem here? And also, how can I find the transformed data after I calculate the principal component?


Edit: the covariance matrix is

$$\begin{bmatrix} 0.61655556 & 0.61544444 \\ 0.61544444 & 0.71655556 \end{bmatrix}$$

Tags: pca

asked Feb 27, 2021 at 0:30 by John adams; edited Oct 20, 2022 at 17:32 by Ethan

I am not exactly sure how to help out without trying a few different things, because I'm not exactly sure how you have defined the covariance matrix and the principal component. There are a lot of non-standard, but still often used, definitions for both of these objects. If you could clarify, I might be able to help you out. For example, you ask about the transformed data but also refer to the calculated principal component, which, as far as I know, is the transformed data. – ARandomName Feb 27, 2021 at 6:15

I have added the definition of the covariance matrix to the question. As for the transformed data: after finding the principal components, we reduce the dimensionality of the data using them; the result is the transformed data. – John adams Feb 27, 2021 at 16:12

To calculate the principal components (i.e. the dimension-reduced data), you can decompose the original data matrix as $D = [X, Y] = U \Sigma V^*$ via an SVD. You can reconstruct the data matrix $D$ and capture some percentage of the variance in the data by taking only the first $r$ columns of $U$, the first $r$ rows of $V^*$, and the first $r$ singular values in $\Sigma$ (a diagonal matrix). Then the principal components, or as you call it the transformed data, are given by $U^* D = \Sigma V^*$; either the lhs or the rhs suffices. You can also compute the principal components by taking the eigendecomposition of $D^T D$. – ARandomName Feb 27, 2021 at 19:14
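As a concrete sketch of the SVD route described in this comment (a minimal illustration, assuming the data matrix is mean-centered first, which is standard for PCA):

import numpy as np

# the ten (X, Y) points from the question, stacked as rows of D
D = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
D = D - D.mean(axis=0)                # mean-center each column

U, S, Vt = np.linalg.svd(D, full_matrices=False)

# principal components ("transformed data"); equal to D @ Vt.T
scores = U * S                        # scales column i of U by S[i]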

1 Answer

Before you start, it can be helpful to conduct some exploratory data analysis (EDA) to get a general sense of the data you are working with and of what you are going to be doing.

Here is a plot of the data:


It looks like there is a strong linear relationship in this dataset, so it is a strong candidate for PCA: a single feature will likely capture a large portion of the variance.

Method 1: By hand (sort of)


Step 1

Start by converting your dataset into a matrix.

      X    Y
 1   2.5  2.4
 2   0.5  0.7
 3   2.2  2.9
 4   1.9  2.2
 5   3.1  3.0
 6   2.3  2.7
 7   2.0  1.6
 8   1.0  1.1
 9   1.5  1.6
10   1.1  0.9

which, as a matrix, is:

$$A = \begin{bmatrix} 2.5 & 2.4 \\ 0.5 & 0.7 \\ 2.2 & 2.9 \\ 1.9 & 2.2 \\ 3.1 & 3.0 \\ 2.3 & 2.7 \\ 2.0 & 1.6 \\ 1.0 & 1.1 \\ 1.5 & 1.6 \\ 1.1 & 0.9 \end{bmatrix}$$

Step 2

Calculate the covariance matrix.


Recall that this is specified by:

$$\begin{bmatrix} \mathrm{Var}(x) & \mathrm{Cov}(x, y) \\ \mathrm{Cov}(x, y) & \mathrm{Var}(y) \end{bmatrix}$$

In order to calculate these values, we first need the means $\bar{X}$ and $\bar{Y}$:

$$\bar{X} = \frac{2.5 + 0.5 + 2.2 + 1.9 + 3.1 + 2.3 + 2.0 + 1.0 + 1.5 + 1.1}{10} = 1.81$$

$$\bar{Y} = \frac{2.4 + 0.7 + 2.9 + 2.2 + 3.0 + 2.7 + 1.6 + 1.1 + 1.6 + 0.9}{10} = 1.91$$

We calculate the values of $\mathrm{Var}(X)$, $\mathrm{Var}(Y)$, and $\mathrm{Cov}(X, Y)$ by plugging the data into the formulas:

$$\mathrm{Var}(X) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1} = \frac{5.549}{9} = 0.616555555556$$

$$\mathrm{Var}(Y) = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n - 1} = \frac{6.449}{9} = 0.716555555556$$

$$\mathrm{Cov}(X, Y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{5.539}{9} = 0.61544444444$$
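If you would rather not grind through the squared deviations by hand, the three sums can be verified with a quick NumPy snippet (a sanity check, not part of the derivation):

import numpy as np

x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

print(((x - x.mean()) ** 2).sum())               # 5.549
print(((y - y.mean()) ** 2).sum())               # 6.449
print(((x - x.mean()) * (y - y.mean())).sum())   # 5.539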

Putting these results into the matrix, we get the covariance matrix:

$$\begin{bmatrix} 0.61655556 & 0.61544444 \\ 0.61544444 & 0.71655556 \end{bmatrix}$$

Step 3

Find the eigenvalues by taking the eigendecomposition of the covariance matrix.

Using the formula $Av = \lambda v$, we can rewrite it as $(A - \lambda I)v = 0$ and note that this equation has a nontrivial solution exactly when $\det(A - \lambda I) = 0$:

$$\det\left(\begin{bmatrix} 0.61655556 & 0.61544444 \\ 0.61544444 & 0.71655556 \end{bmatrix} - \lambda\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\right) = 0$$

$$\det\left(\begin{bmatrix} 0.61655556 & 0.61544444 \\ 0.61544444 & 0.71655556 \end{bmatrix} - \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix}\right) = 0$$

$$\det\begin{bmatrix} 0.61655556 - \lambda & 0.61544444 \\ 0.61544444 & 0.71655556 - \lambda \end{bmatrix} = 0$$

$$(0.61655556 - \lambda)(0.71655556 - \lambda) - (0.61544444)(0.61544444) = 0$$

$$\lambda^2 - 1.33311112\lambda + 0.441796314567 - 0.378771858727 = 0$$

$$\lambda^2 - 1.33311112\lambda + 0.06302445584 = 0$$

Eigenvalues: $\lambda = 0.0490834$ or $\lambda = 1.2840277$
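These roots come from the quadratic formula:

$$\lambda = \frac{1.33311112 \pm \sqrt{1.33311112^2 - 4(0.06302445584)}}{2} = \frac{1.33311112 \pm 1.2349443}{2}$$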

Step 4 (Here is where we cannot really proceed solely by hand)

Find the eigenvectors.

For the first eigenvector $v_1$, corresponding to $\lambda = 0.0490834$:

Using the formula $Av_1 = \lambda_1 v_1$:

$$\begin{bmatrix} 0.61655556 - \lambda & 0.61544444 \\ 0.61544444 & 0.71655556 - \lambda \end{bmatrix}\begin{bmatrix} v_{1,1} \\ v_{1,2} \end{bmatrix} = 0$$

$$\begin{bmatrix} 0.61655556 - 0.0490834 & 0.61544444 \\ 0.61544444 & 0.71655556 - 0.0490834 \end{bmatrix}\begin{bmatrix} v_{1,1} \\ v_{1,2} \end{bmatrix} = 0$$

$$\begin{bmatrix} 0.56747216 & 0.61544444 \\ 0.61544444 & 0.66747216 \end{bmatrix}\begin{bmatrix} v_{1,1} \\ v_{1,2} \end{bmatrix} = 0$$

Note: This is not an easy equation to solve by hand! I would recommend using MATLAB here.
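That said, one step can be recovered by hand, and it explains the $[0\;0]$ result in the question: $A - \lambda I$ is singular by construction, so its two rows are proportional and row reduction must leave a free variable; if you reach the identity matrix, rounding in $\lambda$ has crept in. Either row then gives the direction:

$$0.56747216\,v_{1,1} + 0.61544444\,v_{1,2} = 0 \;\Rightarrow\; v_1 \propto \begin{bmatrix} 0.61544444 \\ -0.56747216 \end{bmatrix} \approx \begin{bmatrix} 0.7352 \\ -0.6779 \end{bmatrix}$$

after normalizing, which matches the MATLAB output below up to sign.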

For the second eigenvector $v_2$, corresponding to $\lambda = 1.2840277$:

Using the formula $Av_2 = \lambda_2 v_2$:

$$\begin{bmatrix} 0.61655556 - \lambda & 0.61544444 \\ 0.61544444 & 0.71655556 - \lambda \end{bmatrix}\begin{bmatrix} v_{2,1} \\ v_{2,2} \end{bmatrix} = 0$$

$$\begin{bmatrix} 0.61655556 - 1.2840277 & 0.61544444 \\ 0.61544444 & 0.71655556 - 1.2840277 \end{bmatrix}\begin{bmatrix} v_{2,1} \\ v_{2,2} \end{bmatrix} = 0$$

$$\begin{bmatrix} -0.66747214 & 0.61544444 \\ 0.61544444 & -0.56747214 \end{bmatrix}\begin{bmatrix} v_{2,1} \\ v_{2,2} \end{bmatrix} = 0$$

Note: This is not an easy equation to solve by hand! I would recommend using MATLAB here.

Since it is not easy to solve for either of these eigenvectors by hand, I recommend using MATLAB.

MATLAB code:


A = [0.61655556 0.61544444; 0.61544444 0.71655556]  % covariance matrix from Step 2

[v, d] = eig(A)  % columns of v are the eigenvectors; diag(d) holds the eigenvalues

Output (the columns of v are the eigenvectors):

-0.735178655741955 0.677873398313764
0.677873398313764 0.735178655741955

Method 2 (using NumPy)


We can verify the results that we get above by using NumPy.

import numpy as np

# create a NumPy array that stores the data matrix
matrix = np.array([[2.5, 2.4], [.5, .7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                   [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, .9]])

# calculate the covariance matrix
covariance_matrix = np.cov(matrix[:, 0], matrix[:, 1])

# store the eigenvalues and eigenvectors
eigen_values, eigen_vectors = np.linalg.eig(covariance_matrix)

Output:

Eigenvalues:

array([0.0490834 , 1.28402771])

Eigenvectors (as columns, in the same order as the eigenvalues):

array([[-0.73517866, -0.6778734 ],
       [ 0.6778734 , -0.73517866]])
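To answer the second part of the question, the transformed data comes from projecting the centered data onto the leading eigenvector. A minimal sketch using the variables above (sign conventions may differ between tools):

# mean-center the data, then project onto the leading principal axis
centered = matrix - matrix.mean(axis=0)
leading = eigen_vectors[:, np.argmax(eigen_values)]  # eigenvector of the largest eigenvalue
transformed = centered @ leading                     # one coordinate per observation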

Method 3 (using scikit-learn)


Using the same matrix variable defined above in the NumPy example:


from sklearn.decomposition import PCA

# create and fit a PCA model
pca = PCA(n_components=2)
pca.fit(matrix)

# show the eigenvectors (one per row)
pca.components_

# show the eigenvalues
pca.explained_variance_

Output:

Eigenvectors (one per row, ordered by explained variance):

[[-0.6778734 -0.73517866]
 [-0.73517866 0.6778734 ]]

Eigenvalues:

[1.28402771 0.0490834 ]

Finally, we can plot vectors showing the directions and magnitudes of the principal axes. As you can see, the first component is longer because it explains more of the variance. We can also project the data onto the first principal component, representing each observation with only one feature.
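With scikit-learn, that projection is a single call on the fitted model above; keeping only the first column gives the one-feature representation (up to sign, this matches the manual NumPy projection):

# project onto the principal axes and keep the first component
transformed = pca.transform(matrix)[:, 0]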


Conclusion: In short, while you can calculate a good portion of this problem by hand, once you get to the part about calculating the eigenvectors it becomes fairly difficult. At that point, I would recommend a computational method such as MATLAB, NumPy, or scikit-learn.

answered Oct 18, 2022 at 3:42 by Ethan; edited Oct 18, 2022 at 4:50
