PCA and Sparse PCA
Principal Component Analysis
Principal component analysis (PCA) is a very useful tool for dimensionality reduction. Dimensionality reduction can be viewed
as extracting the essential information from a given dataset. PCA is used in a wide variety of fields, from computer vision to
neurobiology. One of the advantages of dimensionality reduction is that it can reveal the hidden, simplified dynamics
underlying the data.
Principal component analysis can be viewed as a best-fit subspace problem. Say we have data in a d-dimensional
space and want to find a subspace S of dimension k that is closest to the data in the minimum-squared-error sense; this
is exactly the PCA problem with k principal components. A more formal definition of the problem is as follows:
Given data vectors $u_1, \dots, u_n \in \mathbb{R}^d$, find
$$S^{*} = \operatorname*{argmin}_{S \subseteq \mathbb{R}^d,\; \dim(S) = k} \; \sum_{i=1}^{n} \lVert u_i - \pi_S(u_i) \rVert_2^2,$$
where $\pi_S(u)$ is the vector in the subspace $S$ that is closest to the vector $u$ (the orthogonal projection of $u$ onto $S$).
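As a concrete illustration (a minimal sketch in Python/NumPy, not part of the original notes), the best-fit subspace can be computed with the singular value decomposition: the top-k right singular vectors of the centered data matrix span the optimal subspace, and $\pi_S$ is the orthogonal projection onto them.

import numpy as np

def pca_best_fit_subspace(A, k):
    """Orthonormal basis of the best-fit k-dimensional subspace for the rows
    of A (n x d), minimizing the sum of squared distances to the subspace."""
    mean = A.mean(axis=0)             # center so the subspace passes through the mean
    Ac = A - mean
    _, _, Vt = np.linalg.svd(Ac, full_matrices=False)
    V_k = Vt[:k]                      # (k, d) orthonormal basis of S
    coords = Ac @ V_k.T               # (n, k) coordinates of each point in S
    return V_k, coords, mean

# Example: project 3-D points onto their best-fit 2-D subspace.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))
V_k, coords, mean = pca_best_fit_subspace(A, k=2)
projection = coords @ V_k + mean      # pi_S(u_i): closest points in the subspace
print(np.sum((A - projection) ** 2))  # the minimized squared error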
1. Dimension reduction:
Let's say there are 100,000 vectors in a 10,000-dimensional vector space, so n = 100,000 and d = 10,000. Storing this data
requires a huge amount of space, and the data is hard to transfer as well. Using PCA we can find a k-dimensional subspace
S of R^d whose basis consists of k orthonormal vectors, each of dimension d. Each of the 100,000 data vectors can then be
represented by its k coordinates in this basis, so we only need to store the k 10,000-dimensional basis vectors plus k
components (instead of d) per data vector, which improves storage and eases transfer (see the sketch after this list).
2. Denoising the signal:
PCA has found many applications in speech processing and other fields where noise is a common occurrence in
the signal. PCA can help us discard the noise by finding a subspace along the directions of maximum variability of the
data, thus minimizing the effect of noise.
3. Applications in neuroscience:
A variant of PCA is used in neuroscience to identify the specific properties of a stimulus that increase a neuron's
probability of generating an action potential. PCA is also used to identify a neuron from the shape of its action
potential. As a dimension reduction technique, PCA is well suited to detecting coordinated activity of large
neuronal ensembles.
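As a rough sketch of the storage saving described in item 1 (with smaller sizes than in the text; all names and numbers here are illustrative assumptions):

import numpy as np

# n = 1000 vectors in d = 500 dimensions, compressed to k = 20 components.
rng = np.random.default_rng(0)
n, d, k = 1000, 500, 20
A = rng.normal(size=(n, d))

mean = A.mean(axis=0)
_, _, Vt = np.linalg.svd(A - mean, full_matrices=False)
basis = Vt[:k]                        # k basis vectors, each of dimension d
codes = (A - mean) @ basis.T          # k components per data vector

stored_original = A.size                                  # n * d = 500000 numbers
stored_compressed = basis.size + codes.size + mean.size   # k*d + n*k + d = 30500 numbers
print(stored_original, stored_compressed)

# To (approximately) recover the data, expand the codes in the basis.
A_approx = codes @ basis + mean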
While PCA finds the mathematically optimal solution (in the sense of minimizing the squared error), it is sensitive to outliers
in the data, which produce large errors that PCA tries to avoid. It is therefore widespread practice to remove outliers before
computing PCA. However, in some contexts outliers can be difficult to identify; for example, in data mining algorithms like
correlation clustering, the assignment of points to clusters and outliers is not known beforehand. A recently proposed
generalization of PCA based on weighted PCA increases robustness by assigning different weights to data objects based on
their estimated relevancy. [Wikipedia]
Sparse Principal Component Analysis
A disadvantage of PCA is that the principal components are usually linear combinations of all input variables. Sparse PCA
overcomes this disadvantage by finding linear combinations that contain just a few input variables.
A sparse signal is one which has most of its components equal to or very close to 0 and only a few components with large values.
A sparse vector can be considered as one with all but a few of its components equal to zero. A vector $x^*$ is $k$-sparse if
$|\{i : x^*_i \neq 0\}| \leq k$, i.e., it has at most $k$ non-zero components. The $L_0$ norm of a vector is the number of its
non-zero components, so sparsity can formally be expressed using the $L_0$ norm: $\lVert x \rVert_0 = |\{i : x_i \neq 0\}|$.
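For illustration (a small sketch, not from the notes), the $L_0$ norm and a $k$-sparsity check can be computed directly:

import numpy as np

def l0_norm(x):
    """Number of non-zero components of x (the L0 'norm')."""
    return int(np.count_nonzero(x))

def is_k_sparse(x, k):
    """True if x has at most k non-zero components."""
    return l0_norm(x) <= k

x = np.array([0.0, 3.1, 0.0, 0.0, -0.5, 0.0])
print(l0_norm(x), is_k_sparse(x, 2))   # 2 True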
In order to incorporate sparsity into the solution, we have to add a constraint that represents the sparsity of the solution.
That constraint uses the $L_0$ norm. But the $L_0$ norm is not convex, so convex optimization techniques cannot be applied
directly to optimization problems with sparsity as a constraint. We therefore employ convex relaxations to take the sparsity
into account.
The $L_1$ norm is the best convex relaxation of the $L_0$ norm: on the unit $L_\infty$ ball it is the tightest convex lower
bound (the convex envelope) of the $L_0$ norm, and minimizing it tends to drive components exactly to zero.
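A quick sketch (Python, function names hypothetical) of why the $L_1$ norm promotes sparsity: its proximal operator is soft thresholding, which sets small components exactly to zero.

import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of lam * ||x||_1: shrinks every component toward 0
    and sets components within lam of 0 exactly to 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([2.5, -0.3, 0.05, -4.0, 0.2])
print(soft_threshold(x, lam=0.5))      # three of the five components become exactly 0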
Keeping the above analysis in consideration, the sparse PCA problem can be formalized as follows. Given a matrix
$A \in \mathbb{R}^{n \times d}$, which is the collection of data vectors, we want to decompose $A$ as $A = X + S$ by solving
$$\min_{X, S} \; \lVert A - X - S \rVert_F \quad \text{subject to} \quad \operatorname{rank}(X) \leq k \ \text{(PCA condition)} \ \text{and} \ \lVert S \rVert_0 \leq s \ \text{(sparsity condition)},$$
where $\lVert X \rVert_F$ is the Frobenius norm, given by $\lVert X \rVert_F = \left[\sum_{i=1}^{n} \sum_{j=1}^{d} X_{ij}^2\right]^{1/2}$,
which is widely used in low-rank approximation problems.
The above problem is NP-hard: the objective function is convex, but the constraints are not. The constraints can be relaxed to
$$\lVert X \rVert_* \leq \lambda \quad \text{and} \quad \lVert S \rVert_1 \leq \mu,$$
where $\lambda$ and $\mu$ are determined from $k$ and $s$, and $\lVert X \rVert_*$ is the nuclear norm, which can be used as a
convex surrogate for the rank of a matrix: $\lVert X \rVert_* = \lVert \sigma(X) \rVert_1$, the sum of the singular values of $X$,
where $X = U \Sigma V^T$ is the singular value decomposition of $X$.
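A sketch of how the relaxed problem can be solved in practice. This uses the penalized (Lagrangian) form, min $\lVert X \rVert_* + \lambda \lVert S \rVert_1$ subject to $X + S = A$, solved by a simple ADMM loop that alternates singular value thresholding and soft thresholding; the parameter defaults follow common choices from the robust PCA literature and are assumptions, not part of these notes.

import numpy as np

def low_rank_plus_sparse(A, lam=None, mu=None, n_iter=200):
    """ADMM sketch for  min ||X||_* + lam * ||S||_1  s.t.  X + S = A."""
    n, d = A.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n, d))                    # common default choice
    if mu is None:
        mu = n * d / (4.0 * np.sum(np.abs(A)) + 1e-12)    # common default choice
    soft = lambda M, t: np.sign(M) * np.maximum(np.abs(M) - t, 0.0)
    X, S, Y = np.zeros_like(A), np.zeros_like(A), np.zeros_like(A)
    for _ in range(n_iter):
        # Singular value thresholding: proximal step for the nuclear norm.
        U, sig, Vt = np.linalg.svd(A - S + Y / mu, full_matrices=False)
        X = U @ np.diag(soft(sig, 1.0 / mu)) @ Vt
        # Soft thresholding: proximal step for the L1 norm.
        S = soft(A - X + Y / mu, lam / mu)
        # Dual update enforcing X + S = A.
        Y = Y + mu * (A - X - S)
    return X, S

# Example: a rank-1 matrix corrupted by a few large sparse errors.
rng = np.random.default_rng(0)
L = np.outer(rng.normal(size=50), rng.normal(size=40))     # low-rank part
E = np.zeros_like(L)
E.flat[rng.choice(L.size, size=40, replace=False)] = 10.0  # sparse corruptions
X, S = low_rank_plus_sparse(L + E)
print(np.linalg.matrix_rank(np.round(X, 4)), np.count_nonzero(np.round(S, 4)))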
Applications: