Principal Component Analysis
Rohan Bansal
1 Introduction
Principal Component Analysis (PCA) is a tool used to reduce the dimensionality of a set
of variables while retaining most of the information. Working with a high-dimensional
dataset often leads to difficulties: a low-dimensional dataset is easier to analyze and
visualize, and it is cheaper to store. Moreover, a given set of variables may be correlated,
which introduces redundancy. PCA removes this redundancy by transforming the original
variables into an uncorrelated set of variables. A familiar analogy is converting a colour
image, whose red, green and blue channels are strongly correlated, into a single grayscale
channel. In simpler terms, it is always easier to work in 2D than in 3D.
In PCA, we project a d-dimensional vector of variables, say $x \in \mathbb{R}^d$, onto an
orthonormal set of $k$ vectors $b_1, b_2, \ldots, b_k \in \mathbb{R}^d$, collected as the
columns of $B = [b_1, b_2, \ldots, b_k]$. The projection of $x$, which we denote by
$\pi_B(x)$, can then be written in terms of $b_1, b_2, \ldots, b_k$:
\[
\pi_B(x) = \sum_{i=1}^{k} \lambda_i b_i, \qquad (1)
\]
where $\lambda_i = b_i^\top x$ is the coordinate of $x$ along $b_i$.
This is the basic intuition behind PCA. Our next task is to find the k-dimensional
subspace that maximizes the (uncentered) variance of the d-dimensional variables inside
that subspace [1]. The optimization problem can be written as [1]:
\[
\max_{B} \; \operatorname{tr}\!\left(B^\top x x^\top B\right) \qquad (2)
\]
subject to
\[
B^\top B = I.
\]
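To see why this yields the principal directions, consider the case $k = 1$ and write $C$ for the (uncentered) second-moment matrix of the data, e.g. $\mathbb{E}[x x^\top]$ or $XX^\top$ for a data matrix $X$ (notation introduced here only for this remark). The problem is then a Rayleigh-quotient maximization:
\[
\max_{b \,:\, b^\top b = 1} b^\top C b = \lambda_{\max}(C),
\]
attained at the leading eigenvector of $C$. Stacking the top $k$ eigenvectors as the columns of $B$ solves the general problem, which is exactly the eigenvector computation described in the next section.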
2 PCA Algorithms
2.1 Singular Value Decomposition
In principal component analysis we find the directions in the data with the most
variation, i.e. the eigenvectors corresponding to the largest eigenvalues of the
covariance matrix, and project the data onto these directions. These directions can be
obtained from the singular value decomposition of the data matrix: if $X = U \Sigma V^\top$,
the left singular vectors associated with the largest singular values are the eigenvectors
of $XX^\top$ with the largest eigenvalues. Letting $U$ denote the matrix of these top
eigenvectors, the PCA transformation is given by $Y = U^\top X$.
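The following is a minimal sketch of this transformation in Python with NumPy; the data matrix, dimensions and variable names are illustrative assumptions, not taken from the experiments below.

import numpy as np

# Illustrative data matrix with d = 6 variables (rows) and n = 100 samples (columns)
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 100))
Xc = X - X.mean(axis=1, keepdims=True)   # center each variable (drop for the uncentered form)

# Left singular vectors of the centered data are the eigenvectors of the covariance Xc Xc^T
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
Y = U[:, :k].T @ Xc                                    # PCA transformation Y = U^T X
explained_variance = S[:k] ** 2 / (Xc.shape[1] - 1)    # variance along each direction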
3 Implementation
Here, we implement PCA using SVD on a random dataset. This is done with the sklearn
library on a random dataset created using the ’random’ library; a sketch of such a script
is given below, and a sample of the dataset follows.
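The exact script is not reproduced here; this is a minimal sketch of the kind of code used, assuming scikit-learn's PCA class and the standard-library random module. The number of rows, the value range and the 12-column layout mirror the table below but are otherwise assumptions.

import random
import numpy as np
from sklearn.decomposition import PCA

random.seed(0)
# 12 columns (x1..x6, y1..y6) of random integers, one sample per row
data = np.array([[random.randint(0, 1000) for _ in range(12)] for _ in range(100)])

pca = PCA(n_components=2, svd_solver='full')   # 'full' computes an exact SVD
reduced = pca.fit_transform(data)

print(reduced.shape)                   # (100, 2)
print(pca.explained_variance_ratio_)   # fraction of variance kept by each component

Plotting explained_variance_ratio_ across components is one way to produce a variance comparison such as the one shown in Figure 2.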
 x1   x2   x3   x4   x5   x6   y1   y2   y3   y4   y5   y6
754  787  742  759  785  722  290  243  294  245  265  254
492  501  495  489  494  483  502  515  474  507  452  468
 57   63   54   53   71   56  442  459  460  444  477  483
769  744  753  733  766  782  738  756  782  795  772  750
863  894  886  884  894  876  557  572  567  537  602  546
Dataset
Figure 2: variance comparison
4 Conclusion
4.1 PCA in noisy settings
So far, we have considered the standard stochastic setting for the implementation of our
algorithms. Here, we move on to noisy settings, i.e. settings with noisy gradients and
missing data [6]. Oja's method works well when the noise is bounded. It cannot be used
when the noise is unbounded, because the maximization objective can then never be
achieved.
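As an illustration, here is a minimal sketch of the basic Oja update for the leading principal direction, with an artificially injected bounded-noise term; the step size, noise bound and covariance are arbitrary choices for the example, not values from [6].

import numpy as np

rng = np.random.default_rng(0)
d, T, eta, noise_bound = 10, 5000, 0.01, 0.1

A = rng.normal(size=(d, d))
cov = A @ A.T / d                      # ground-truth covariance of the stream

w = rng.normal(size=d)
w /= np.linalg.norm(w)                 # start from a random unit vector

for _ in range(T):
    x = rng.multivariate_normal(np.zeros(d), cov)        # one streaming sample
    noise = rng.uniform(-noise_bound, noise_bound, d)     # bounded noise on the gradient
    grad = x * (x @ w) + noise                            # noisy estimate of (x x^T) w
    w = w + eta * grad                                    # Oja update
    w /= np.linalg.norm(w)                                # renormalize to the unit sphere

top = np.linalg.eigh(cov)[1][:, -1]    # leading eigenvector of the true covariance
print(abs(w @ top))                    # close to 1 when the direction is recovered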
4.2 Kernel PCA
Kernel methods represent an important class of machine learning algorithms that enjoy
both strong theoretical guarantees and strong empirical performance [7]. Standard PCA
only allows linear dimensionality reduction. However, if the data have more complicated
structure that cannot be well represented in a linear subspace, standard PCA will not be
very helpful. Fortunately, kernel PCA allows us to generalize standard PCA to nonlinear
dimensionality reduction [8].
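A minimal sketch of this contrast, assuming scikit-learn's KernelPCA with an RBF kernel; the concentric-circles dataset and the gamma value are illustrative choices.

from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: a structure no linear projection can separate
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

linear = PCA(n_components=2).fit_transform(X)
nonlinear = KernelPCA(n_components=2, kernel='rbf', gamma=10).fit_transform(X)

# In the kernel embedding the first component already separates the two circles,
# whereas the linear projection is just a rotation of the original 2-D data.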
References
[1] R. Arora, A. Cotter, K. Livescu and N. Srebro, "Stochastic Optimization for PCA and PLS," in Allerton, 2012.
[2] R. Arora, A. Cotter and N. Srebro, "Stochastic Optimization of PCA with Capped MSG," in Advances in Neural Information Processing Systems 26 (NIPS 2013).
[3] P. Mianjy and R. Arora, "Stochastic PCA with ℓ2 and ℓ1 Regularization," in ICML, 2018.
[4] R. Arora, P. Mianjy and T. Marinov, "Stochastic Optimization for Multiview Representation Learning Using Partial Least Squares," 2016, pp. 1786-1794.
[5] O. Shamir, "Convergence of Stochastic Gradient Descent for PCA," in ICML, 2016.
[6] T. V. Marinov, P. Mianjy and R. Arora, "Streaming Principal Component Analysis in Noisy Settings," in ICML, 2018.