
PCA and sparse PCA

Principal Component Analysis

Principal component analysis is a very useful tool for dimensionality reduction. Dimensionality reduction can be viewed as extracting the essential information from given data. PCA is used in a wide variety of fields, from computer vision to neurobiology. One of the advantages of dimensionality reduction is that it can reveal the hidden, simplified dynamics underlying the data.

Principal component analysis can be viewed as a best-fit subspace problem. Let's say we have data in d-dimensional space and want to find a subspace S of dimension k that is closest to the data in the minimum-squared-error sense; this is precisely the PCA problem with k principal components. A more formal definition of the problem is given as follows:

Given a dataset $a_1, a_2, \ldots, a_n \in \mathbb{R}^d$.

Define $f(S) = \sum_{i=1}^{n} \| a_i - \pi_S(a_i) \|_2^2$, where the norm used is the L2 norm, defined as $\|u\|_2 = \left( \sum_{j=1}^{d} u_j^2 \right)^{1/2}$,

where $u \in \mathbb{R}^d$, $S \subseteq \mathbb{R}^d$ is a subspace, and $\pi_S(u)$ is the vector in the subspace $S$ that is closest to the vector $u$: $\pi_S(u) = \arg\min_{v \in S} \| u - v \|_2$.

Then, the PCA problem can be formally defined as:

$\arg\min_{S} f(S)$, subject to $\dim(S) \le k$.
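As a quick illustration, here is a minimal NumPy sketch of this best-fit subspace view; the synthetic data, the choice k = 2, and the variable names are purely illustrative, and the top right singular vectors of the centered data are used as the basis of S:

```python
import numpy as np

# Toy data: n = 200 points in d = 5 dimensions (values are arbitrary).
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# Center the data so the subspace is fit to the centered cloud.
A_centered = A - A.mean(axis=0)

# The top-k right singular vectors span the best-fit k-dimensional subspace S.
k = 2
U, sigma, Vt = np.linalg.svd(A_centered, full_matrices=False)
V_k = Vt[:k].T                      # d x k orthonormal basis for S

# pi_S(a_i): project each (centered) point onto S.
projections = A_centered @ V_k @ V_k.T

# f(S): total squared error between the points and their projections.
f_S = np.sum((A_centered - projections) ** 2)
print("squared reconstruction error:", f_S)
```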

Let's look at some applications of principal component analysis:

1. Dimension reduction:
Let's say there are 100,000 vectors in a 10,000-dimensional vector space, so n = 100,000 and d = 10,000. This demands a huge amount of storage, and the data will be hard to transfer as well. Using PCA we can find a k-dimensional subspace S of $\mathbb{R}^d$ whose basis consists of k orthonormal vectors, each of dimension d. Each of the 100,000 data vectors can then be represented by its k components in this basis, so we only need to store the k 10,000-dimensional basis vectors plus k (instead of d) components per data vector, which reduces storage and eases transfer (see the sketch after this list).
2. Denoising the signal:
PCA has found many applications in speech processing and other fields where noise is a common occurrence in the signal. PCA can help us discard the noise by finding a subspace along the directions of maximum variability of the data, thus minimizing the effect of noise.
3. Applications in neuroscience:
A variant of PCA is used in neuroscience to identify the specific properties of a stimulus that increase a neuron's probability of generating an action potential. PCA is also used to infer the identity of a neuron from the shape of its action potential. As a dimension-reduction technique, PCA is well suited to detecting the coordinated activity of large neuronal ensembles.
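Here is a small sketch of the storage argument from item 1, scaled down (n = 1000, d = 500, k = 20 rather than the figures in the text) so it runs quickly; the synthetic data and the dimensions are assumptions for illustration only:

```python
import numpy as np

# Scaled-down stand-in for the example above: n = 1000 vectors in d = 500 dimensions.
rng = np.random.default_rng(1)
n, d, k = 1000, 500, 20
A = rng.normal(size=(n, k)) @ rng.normal(size=(k, d)) + 0.01 * rng.normal(size=(n, d))

A_centered = A - A.mean(axis=0)
_, _, Vt = np.linalg.svd(A_centered, full_matrices=False)
basis = Vt[:k]                       # k orthonormal basis vectors, each of dimension d

# Represent each data vector by k coefficients instead of d components.
coeffs = A_centered @ basis.T        # n x k

original_floats = n * d
compressed_floats = k * d + n * k    # basis plus per-vector coefficients
print(f"stored floats: {original_floats} -> {compressed_floats}")
```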

While PCA finds the mathematically optimal solution (in the sense of minimizing the squared error), it is sensitive to outliers in the data, which produce exactly the kind of large errors PCA tries to avoid. It is therefore common practice to remove outliers before computing PCA. In some contexts, however, outliers can be difficult to identify. For example, in data-mining algorithms like correlation clustering, the assignment of points to clusters and outliers is not known beforehand. A recently proposed generalization of PCA based on weighted PCA increases robustness by assigning different weights to data objects based on their estimated relevancy. [Wikipedia]
Sparse Principal Component Analysis

A disadvantage of PCA is that the principal components are usually linear combinations of all input variables. Sparse PCA
overcomes this disadvantage by finding linear combinations that contain just a few input variables.

What is a sparse signal?

A sparse signal is one which has most of its components equal to or very close to 0 and only a few components with large values. A sparse vector can be considered as one in which all but a few components are zero. A vector $x^*$ is $k$-sparse if $|\{ i : x^*_i \neq 0 \}| \le k$. We know that the L0 norm of a vector is the number of non-zero components of the vector. Therefore, sparsity can formally be represented using the L0 norm, $\|x\|_0 = |\{ i : x_i \neq 0 \}|$.
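A two-line check of this definition in NumPy (the vector and the choice k = 3 are arbitrary):

```python
import numpy as np

# A vector with only a few non-zero components (values chosen arbitrarily).
x = np.array([0.0, 3.2, 0.0, 0.0, -1.7, 0.0, 0.0, 0.5])

# ||x||_0 counts the non-zero components.
l0_norm = np.count_nonzero(x)

k = 3
print(f"||x||_0 = {l0_norm}, {k}-sparse: {l0_norm <= k}")
```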

Some common sparse signals:

1. Speech and music are sparse in the frequency domain.

2. Images are sparse in a wavelet basis.

In order to incorporate sparsity into the solution, we have to add a constraint which represents the sparsity of the solution. That constraint uses the L0 norm. But the L0 norm is not convex, so convex optimization techniques can't be applied directly to arrive at a solution of optimization problems with a sparsity constraint. We therefore employ convex relaxations to take the sparsity into account.

It has been observed that the L1 norm is the best convex relaxation of the L0 norm: its unit ball has vertices on the coordinate axes, so L1-constrained or L1-penalized minimization tends to produce solutions with many components exactly equal to zero.
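One way to see the effect of the L1 relaxation is soft thresholding, the proximal operator of the L1 norm, which appears inside many L1-penalized solvers; the sketch below (a hypothetical helper name and arbitrary numbers, not something specified in the text) shows how it sets small components exactly to zero:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrink entries toward 0, zeroing out the small ones.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

v = np.array([2.5, -0.3, 0.1, -1.8, 0.05])
# Entries with magnitude below the threshold become exactly 0, so the result is sparse.
print(soft_threshold(v, 0.5))
```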

Keeping the above analysis in mind, the sparse PCA problem can be formalized as follows:

Given a matrix $A \in \mathbb{R}^{n \times d}$, which is the collection of data vectors, we want to write $A = X + S$, where $\operatorname{rank}(X) \le k$ (PCA condition) and $\|S\|_0 \le s$ (sparsity condition).

Formally, the problem is:

$\min_{X, S} \| A - (X + S) \|_F^2$, such that $\operatorname{rank}(X) \le k$, $\|S\|_0 \le s$,

where $\|X\|_F$ is the Frobenius norm, given by $\|X\|_F = \left[ \sum_{i=1}^{n} \sum_{j=1}^{d} X_{ij}^2 \right]^{1/2}$, which is used widely in low-rank approximation problems.

The above problem is NP-hard. The objective function is convex, but the constraints are not convex. The constraints can be relaxed to:

$\|X\|_* \le \tau$ (in place of $\operatorname{rank}(X) \le k$) and $\|S\|_1 \le \lambda$, where $\tau$ and $\lambda$ are determined from $k$ and $s$, and $\|X\|_*$ is the nuclear norm, which can be used to approximate the rank of a matrix ($\|X\|_* = \|\sigma(X)\|_1$, where $X = U \Sigma V^T$ is the singular value decomposition of $X$ and $\sigma(X)$ is the vector of singular values).
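A possible way to attack this decomposition numerically is alternating minimization: fix $S$ and take the best rank-$k$ approximation of $A - S$ via truncated SVD, then fix $X$ and keep the $s$ largest-magnitude entries of $A - X$. The sketch below is a heuristic illustration under these assumptions, not an algorithm prescribed by the text:

```python
import numpy as np

def low_rank_plus_sparse(A, k, s, n_iter=50):
    # Heuristic alternating minimization for min ||A - (X + S)||_F^2
    # with rank(X) <= k and ||S||_0 <= s (a sketch, not a provably optimal solver).
    X = np.zeros_like(A)
    S = np.zeros_like(A)
    for _ in range(n_iter):
        # Best rank-k approximation of the residual A - S (truncated SVD).
        U, sig, Vt = np.linalg.svd(A - S, full_matrices=False)
        X = (U[:, :k] * sig[:k]) @ Vt[:k]
        # Best s-sparse approximation of A - X: keep the s largest-magnitude entries.
        R = A - X
        S = np.zeros_like(A)
        if s > 0:
            idx = np.unravel_index(np.argsort(np.abs(R), axis=None)[-s:], R.shape)
            S[idx] = R[idx]
    return X, S

# Toy usage: a rank-2 matrix corrupted by a few large sparse entries.
rng = np.random.default_rng(2)
A = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
A[3, 5] += 10.0
A[17, 11] -= 8.0
X, S = low_rank_plus_sparse(A, k=2, s=2)
print("residual norm:", np.linalg.norm(A - (X + S)))
```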

Applications:

1. Financial Data Analysis


If ordinary PCA is applied to a dataset where each input variable represents a different asset, it may generate principal components that are weighted combinations of all the assets. In contrast, sparse PCA produces principal components that are weighted combinations of only a few input assets, so their meaning is easy to interpret. Furthermore, if one uses a trading strategy based on these principal components, fewer assets imply lower transaction costs (a small code illustration follows this list).
2. Biology
Consider a dataset where each input variable corresponds to a specific gene. Sparse PCA can produce a principal
component that involves only a few genes, so researchers can focus on these specific genes for further analysis.
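For experimentation, scikit-learn provides a SparsePCA estimator (it solves an L1-penalized formulation, which differs in detail from the low-rank-plus-sparse decomposition above); here is a minimal illustration on synthetic data standing in for asset returns, with made-up dimensions and penalty strength:

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

# Synthetic data: 300 observations of 10 variables (e.g., asset returns); values are made up.
rng = np.random.default_rng(3)
factor = rng.normal(size=(300, 1))
X = 0.01 * rng.normal(size=(300, 10))
X[:, :3] += factor          # the first three variables share a common driver

pca = PCA(n_components=2).fit(X)
spca = SparsePCA(n_components=2, alpha=0.5, random_state=0).fit(X)

# Compare how many variables each component actually uses (non-zero or non-negligible loadings).
print("PCA non-zero loadings:      ", np.count_nonzero(np.round(pca.components_, 3), axis=1))
print("SparsePCA non-zero loadings:", np.count_nonzero(spca.components_, axis=1))
```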
