Calculation of PCA - Data Science Stack Exchange
Calculation of PCA
Asked 3 years, 7 months ago Modified 2 years ago Viewed 3k times
Now, we need to calculate the principal component analysis for this data. Here are the eigenvalues and eigenvectors calculated for the covariance
matrix of this data:
https://datascience.stackexchange.com/questions/90007/calculation-of-pca 1/9
10/22/24, 11:41 PM Calculation of PCA - Data Science Stack Exchange
Now, when I tried to do this by hand, I found that the eigenvalues are 1.28 and 0.0492 (which are identical to the above solution). Surely, the
principal component corresponds to the eigenvalue 1.28. However, when I tried to solve for the eigenvector, the solution was [[0] [0]], as the
augmented matrix of BX = 0 was [[1 0 0] [0 1 0]]. So where is the problem here? And how can I find the transformed data after I calculate the
principal component?
pca
Asked Feb 27, 2021 at 0:30 by John adams. Edited Oct 20, 2022 at 17:32 by Ethan.
I am not exactly sure how to help without trying a few different things, because I'm not sure how you have defined the covariance matrix and the principal component.
There are a lot of non-standard, but still often used, definitions for both of these objects. If you could clarify, I might be able to help you out. For example, you ask about
the transformed data, but also refer to the calculated principal component, which, as far as I know, is the transformed data. – ARandomName Feb 27, 2021 at 6:15
I have added the definition of the covariance matrix in the question. For the transformed data, after finding the principal components we need to reduce the
dimensionality of the data using the calculated principal components. This is the transformed data. – John adams Feb 27, 2021 at 16:12
To calculate the principal components (i.e. the dimension-reduced data), you can decompose the original data matrix as $D = [X, Y] = U \Sigma V^*$ via an SVD. You can
reconstruct the data matrix $D$ and capture some percent of the variance in the data by taking only the first $r$ columns of $U$, the first $r$ rows of $V^*$, and the first $r$ singular values in $\Sigma$
(a diagonal matrix). Then the principal components, or as you call it the transformed data, are given by $U^* D = \Sigma V^*$. Either the lhs or the rhs suffices. You can also compute the PCs
by taking the eigendecomposition of $D^T D$. – ARandomName Feb 27, 2021 at 19:14
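The SVD route described in the comment above can be sketched in NumPy. This is only an illustration under stated assumptions: rows of `D` are observations, the columns are mean-centered first, and the variable names (`Dc`, `scores`, `D_rank1`) are mine. Conventions differ on whether samples are rows or columns; with rows as observations, the projected data is $U \Sigma$, which equals the centered data times $V$.

```python
import numpy as np

# Data from the question; rows are observations, columns are X and Y (assumed layout).
D = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])

# Center each column, since PCA works on mean-centered data.
mean = D.mean(axis=0)
Dc = D - mean

# Thin SVD: Dc = U @ diag(s) @ Vt, singular values sorted in descending order.
U, s, Vt = np.linalg.svd(Dc, full_matrices=False)

# Eigenvalues of the sample covariance matrix are s**2 / (n - 1).
eigenvalues = s**2 / (len(D) - 1)

# Projected data (scores) under the rows-as-observations convention.
scores = U * s  # equivalent to Dc @ Vt.T

# Rank-r reconstruction keeps only the first r singular triplets.
r = 1
D_rank1 = (U[:, :r] * s[:r]) @ Vt[:r, :] + mean

print(eigenvalues)
```

The identity $U^* D = \Sigma V^*$ from the comment also holds for the centered matrix here, which the sketch's variables let you check directly with `np.allclose(U.T @ Dc, np.diag(s) @ Vt)`.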
Before you start anything, it can be helpful to conduct some exploratory data analysis (EDA) so that you can get a general sense of the data you are
working with and what you are going to be doing.
Here is a plot of the data:
There appears to be a strong linear relationship in this dataset, so it is a good candidate for PCA: a single component will likely capture a large
portion of the variance.
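One quick way to back up the visual impression is to compute the Pearson correlation between the two columns. A minimal sketch, assuming NumPy and the ten (X, Y) pairs from this question; the variable names are only illustrative.

```python
import numpy as np

# The ten (X, Y) observations from the question.
x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r.
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.3f}")
```

A value close to 1 is consistent with most of the variance lying along a single direction.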
       X     Y
  1   2.5   2.4
  2   0.5   0.7
  3   2.2   2.9
  4   1.9   2.2
  5   3.1   3.0
  6   2.3   2.7
  7   2.0   1.6
  8   1.0   1.1
  9   1.5   1.6
 10   1.1   0.9

$$A = \begin{bmatrix} 2.5 & 2.4 \\ 0.5 & 0.7 \\ 2.2 & 2.9 \\ 1.9 & 2.2 \\ 3.1 & 3.0 \\ 2.3 & 2.7 \\ 2.0 & 1.6 \\ 1.0 & 1.1 \\ 1.5 & 1.6 \\ 1.1 & 0.9 \end{bmatrix}$$
Step 2
$$\begin{bmatrix} \mathrm{Var}(x) & \mathrm{Cov}(x, y) \\ \mathrm{Cov}(x, y) & \mathrm{Var}(y) \end{bmatrix}$$
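We calculate the values for $\mathrm{Var}(X)$, $\mathrm{Var}(Y)$, and $\mathrm{Cov}(X, Y)$ below by plugging the values into the formulas:

$$\mathrm{Var}(X) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{5.549}{9} = 0.616555555556$$

$$\mathrm{Var}(Y) = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} = \frac{6.449}{9} = 0.716555555556$$

$$\mathrm{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{5.539}{9} = 0.61544444444$$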
Putting these results into the matrix, we get the Covariance matrix:
$$\begin{bmatrix} 0.61655556 & 0.61544444 \\ 0.61544444 & 0.71655556 \end{bmatrix}$$
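The hand calculation above can be cross-checked in a few lines. A minimal sketch, assuming NumPy; it computes the covariance matrix both from the formulas (dividing by n − 1) and with `np.cov`, and the variable names are only illustrative.

```python
import numpy as np

# Rows are observations; columns are X and Y.
data = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])
n = data.shape[0]

# Manual version: subtract the column means, then sum products and divide by n - 1.
centered = data - data.mean(axis=0)
cov_manual = centered.T @ centered / (n - 1)

# Library version: rowvar=False tells NumPy that columns are the variables.
cov_numpy = np.cov(data, rowvar=False)

print(cov_manual)
```

Both versions use the sample (n − 1) denominator, which matches the formulas in this step.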
Step 3
Find the eigenvalues by taking the eigendecomposition of the covariance matrix.
Using the formula Av = λv (where A here denotes the covariance matrix), we can rewrite it as (A − λI)v = 0 and note that this equation has a nonzero solution v only when det(A − λI) = 0
$$\det\left( \begin{bmatrix} 0.61655556 & 0.61544444 \\ 0.61544444 & 0.71655556 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right) = 0$$

$$\det\left( \begin{bmatrix} 0.61655556 & 0.61544444 \\ 0.61544444 & 0.71655556 \end{bmatrix} - \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix} \right) = 0$$

$$\det \begin{bmatrix} 0.61655556 - \lambda & 0.61544444 \\ 0.61544444 & 0.71655556 - \lambda \end{bmatrix} = 0$$

$$\lambda^2 - 1.33311112\lambda + 0.441796314567 - 0.378771858727 = 0$$

$$\lambda^2 - 1.33311112\lambda + 0.06302445584 = 0$$
Note: This is not an easy equation to solve by hand! I would recommend using MATLAB here.
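Since the characteristic equation is a quadratic in λ, the quadratic formula is one way to get its two roots without a linear algebra library. A minimal sketch in plain Python, using the rounded covariance entries from Step 2; the variable names are my own.

```python
import math

# Entries of the 2x2 covariance matrix [[a, b], [b, c]] from Step 2.
a, b, c = 0.61655556, 0.61544444, 0.71655556

# Characteristic polynomial: lambda^2 - trace * lambda + det = 0
trace = a + c
det = a * c - b * b

# Quadratic formula; the discriminant is non-negative for a symmetric matrix.
disc = math.sqrt(trace**2 - 4.0 * det)
lam_large = (trace + disc) / 2.0
lam_small = (trace - disc) / 2.0

print(lam_large, lam_small)
```

The two roots should agree with the eigenvalues reported by the library calls further down, up to rounding of the inputs.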
Since there is no easy way to solve for either of these eigenvectors by hand, I recommend using MATLAB.
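For a 2×2 symmetric matrix, the hand computation can still be written out: the first row of (A − λI)v = 0 gives (a − λ)v₁ + b·v₂ = 0, so v is proportional to (b, λ − a). The sketch below, assuming NumPy and my own variable names, does that. It also illustrates one plausible reason a hand calculation ends with only the zero vector: if λ is rounded (for example to 1.28), A − λI is no longer singular, so row reduction yields the identity.

```python
import numpy as np

# Covariance matrix from Step 2.
A = np.array([[0.61655556, 0.61544444],
              [0.61544444, 0.71655556]])

# Largest eigenvalue at full precision (eigvalsh returns ascending order).
lam = np.linalg.eigvalsh(A)[-1]

# First row of (A - lam*I) v = 0:  (a - lam) * v1 + b * v2 = 0  =>  v ∝ (b, lam - a)
v = np.array([A[0, 1], lam - A[0, 0]])
v = v / np.linalg.norm(v)  # normalize to unit length

# With the precise eigenvalue the matrix is singular (determinant ~ 0) ...
det_exact = np.linalg.det(A - lam * np.eye(2))
# ... but with a rounded eigenvalue it is not, so only v = 0 solves the system.
det_rounded = np.linalg.det(A - 1.28 * np.eye(2))

print(v, det_exact, det_rounded)
```

The sign of an eigenvector is arbitrary, so a library may return the negative of this vector.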
MATLAB code:
A = [0.61655556 0.61544444; 0.61544444 0.71655556]; % covariance matrix from Step 2
[v,d]=eig(A)
Output:
-0.735178655741955 0.677873398313764
0.677873398313764 0.735178655741955
import numpy as np
Output:
Eigen Values:
array([0.0490834 , 1.28402771])
Eigen Vectors:
array([[-0.73517866, -0.6778734 ],
[ 0.6778734 , -0.73517866]])
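Only the import line of the NumPy code survives above. A minimal sketch of a call that returns the eigenvalues in this ascending order, assuming `np.linalg.eigh` (the original answer may well have used `np.linalg.eig` instead); the variable names are mine, and the signs of the returned vectors can differ between routines.

```python
import numpy as np

# Covariance matrix from Step 2.
cov = np.array([[0.61655556, 0.61544444],
                [0.61544444, 0.71655556]])

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order;
# column i of eig_vecs is the eigenvector paired with eig_vals[i].
eig_vals, eig_vecs = np.linalg.eigh(cov)

print("Eigen Values:")
print(eig_vals)
print("Eigen Vectors:")
print(eig_vecs)
```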
Output:
Eigen Vectors:
[[-0.6778734 -0.73517866]
[-0.73517866 0.6778734 ]]
Eigen Values:
[1.28402771 0.0490834 ]
Finally, we can plot vectors that show the directions and magnitudes of the principal axes. As you can see, the first component is longer because it
explains more of the variance. We can also project the data onto the first principal component, reducing it to a single feature.
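The projection onto the first principal component (the "transformed data" asked about in the question) can be sketched as follows. This assumes NumPy, mean-centered data with rows as observations, and illustrative variable names; the sign of the projected values depends on the sign the library picks for the eigenvector.

```python
import numpy as np

# Rows are observations; columns are X and Y.
data = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])

# Center the data and compute the sample covariance matrix.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# eigh returns ascending eigenvalues, so the last column is the first principal axis.
eig_vals, eig_vecs = np.linalg.eigh(cov)
pc1 = eig_vecs[:, -1]

# Project each centered observation onto the first principal axis: one value per row.
projected = centered @ pc1

# Share of the total variance captured by the first component.
explained_ratio = eig_vals[-1] / eig_vals.sum()

print(projected)
print(explained_ratio)
```

The variance of the projected values equals the largest eigenvalue, which is a handy sanity check on the whole pipeline.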
Conclusion: In short, while you can calculate a good portion of this problem by hand, calculating the eigenvectors becomes fairly difficult.
At that point, I would recommend using a computational tool such as MATLAB, NumPy, or scikit-learn.
Answered Oct 18, 2022 at 3:42 by Ethan. Edited Oct 18, 2022 at 4:50.