PCA and Sparse PCA
Principal Component Analysis
Principal component analysis (PCA) is a very useful tool for dimensionality reduction. Dimensionality reduction can be viewed
as extracting the essential information from a given dataset. PCA is used in a wide variety of fields, from computer vision to
neurobiology. One of the advantages of dimensionality reduction is that it can reveal the hidden, simplified dynamics
underlying the data.
Principal component analysis can be viewed as a best-fit subspace problem. Say we have data in a d-dimensional
space and want to find a subspace S of dimension k that is closest to the data in the minimum-squared-error sense; this
is exactly the PCA problem with k principal components. A more formal definition of the problem is as follows:
Given data vectors $u_1, \dots, u_n \in \mathbb{R}^d$, find
$$S^{*} = \operatorname*{argmin}_{S \subseteq \mathbb{R}^d,\; \dim(S) = k} \; \sum_{i=1}^{n} \lVert u_i - \pi_S(u_i) \rVert_2^2,$$
where $\pi_S(u)$ is the vector in the subspace $S$ that is closest to the vector $u$ (the orthogonal projection of $u$ onto $S$).
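As a concrete illustration (a minimal sketch in Python/NumPy, not part of the original notes), the best-fit subspace can be computed with the singular value decomposition: the top-k right singular vectors of the centered data matrix span the optimal subspace, and $\pi_S$ is the orthogonal projection onto them.

import numpy as np

def pca_best_fit_subspace(A, k):
    """Orthonormal basis of the best-fit k-dimensional subspace for the rows
    of A (n x d), minimizing the sum of squared distances to the subspace."""
    mean = A.mean(axis=0)             # center so the subspace passes through the mean
    Ac = A - mean
    _, _, Vt = np.linalg.svd(Ac, full_matrices=False)
    V_k = Vt[:k]                      # (k, d) orthonormal basis of S
    coords = Ac @ V_k.T               # (n, k) coordinates of each point in S
    return V_k, coords, mean

# Example: project 3-D points onto their best-fit 2-D subspace.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))
V_k, coords, mean = pca_best_fit_subspace(A, k=2)
projection = coords @ V_k + mean      # pi_S(u_i): closest points in the subspace
print(np.sum((A - projection) ** 2))  # the minimized squared error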
1. Dimension reduction:
Let's say there are 100,000 vectors in a 10,000-dimensional vector space, so n = 100,000 and d = 10,000. Storing this data
requires a huge amount of space, and the data is hard to transfer as well. Using PCA we can find a k-dimensional subspace
S of R^d whose basis consists of k orthonormal vectors, each of dimension d. Each of the 100,000 data vectors can then be
represented by its k coordinates in this basis, so we only need to store the k 10,000-dimensional basis vectors plus k
components (instead of d) per data vector, which improves storage and eases transfer (see the sketch after this list).
2. Denoising the signal:
PCA has found many applications in speech processing and other fields where noise is a common occurrence in
the signal. PCA can help us discard the noise by finding a subspace along the directions of maximum variability of the
data, thus minimizing the effect of noise.
3. Applications in neuroscience:
A variant of PCA is used in neuroscience to identify the specific properties of a stimulus that increase a neuron's
probability of generating an action potential. PCA is also used to identify a neuron from the shape of its action
potential. As a dimension reduction technique, PCA is well suited to detecting coordinated activity of large
neuronal ensembles.
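As a rough sketch of the storage saving described in item 1 (with smaller sizes than in the text; all names and numbers here are illustrative assumptions):

import numpy as np

# n = 1000 vectors in d = 500 dimensions, compressed to k = 20 components.
rng = np.random.default_rng(0)
n, d, k = 1000, 500, 20
A = rng.normal(size=(n, d))

mean = A.mean(axis=0)
_, _, Vt = np.linalg.svd(A - mean, full_matrices=False)
basis = Vt[:k]                        # k basis vectors, each of dimension d
codes = (A - mean) @ basis.T          # k components per data vector

stored_original = A.size                                  # n * d = 500000 numbers
stored_compressed = basis.size + codes.size + mean.size   # k*d + n*k + d = 30500 numbers
print(stored_original, stored_compressed)

# To (approximately) recover the data, expand the codes in the basis.
A_approx = codes @ basis + mean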
While PCA finds the mathematically optimal solution (in the sense of minimizing the squared error), it is sensitive to outliers
in the data, which produce large errors that PCA tries to avoid. It is therefore widespread practice to remove outliers before
computing PCA. However, in some contexts outliers can be difficult to identify; for example, in data mining algorithms like
correlation clustering, the assignment of points to clusters and outliers is not known beforehand. A recently proposed
generalization of PCA based on weighted PCA increases robustness by assigning different weights to data objects based on
their estimated relevancy. [Wikipedia]
Sparse Principal Component Analysis
A disadvantage of PCA is that the principal components are usually linear combinations of all input variables. Sparse PCA
overcomes this disadvantage by finding linear combinations that contain just a few input variables.
A sparse signal is one which has most of its components equal to or very close to 0 and only a few components with large values.
A sparse vector can be considered as one with all but a few of its components equal to zero. A vector $x^*$ is $k$-sparse if
$|\{i : x^*_i \neq 0\}| \leq k$, i.e., it has at most $k$ non-zero components. The $L_0$ norm of a vector is the number of its
non-zero components, so sparsity can formally be expressed using the $L_0$ norm: $\lVert x \rVert_0 = |\{i : x_i \neq 0\}|$.
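For illustration (a small sketch, not from the notes), the $L_0$ norm and a $k$-sparsity check can be computed directly:

import numpy as np

def l0_norm(x):
    """Number of non-zero components of x (the L0 'norm')."""
    return int(np.count_nonzero(x))

def is_k_sparse(x, k):
    """True if x has at most k non-zero components."""
    return l0_norm(x) <= k

x = np.array([0.0, 3.1, 0.0, 0.0, -0.5, 0.0])
print(l0_norm(x), is_k_sparse(x, 2))   # 2 True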
In order to incorporate sparsity into the solution, we have to add a constraint that represents the sparsity of the solution.
That constraint uses the $L_0$ norm. But the $L_0$ norm is not convex, so convex optimization techniques cannot be applied
directly to optimization problems with sparsity as a constraint. We therefore employ convex relaxations to take the sparsity
into account.
The $L_1$ norm is the best convex relaxation of the $L_0$ norm: on the unit $L_\infty$ ball it is the tightest convex lower
bound (the convex envelope) of the $L_0$ norm, and minimizing it tends to drive components exactly to zero.
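A quick sketch (Python, function names hypothetical) of why the $L_1$ norm promotes sparsity: its proximal operator is soft thresholding, which sets small components exactly to zero.

import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of lam * ||x||_1: shrinks every component toward 0
    and sets components within lam of 0 exactly to 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([2.5, -0.3, 0.05, -4.0, 0.2])
print(soft_threshold(x, lam=0.5))      # three of the five components become exactly 0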
Keeping the above analysis in consideration, the sparse PCA problem can be formalized as follows. Given a matrix
$A \in \mathbb{R}^{n \times d}$, which is the collection of data vectors, we want to decompose $A$ as $A = X + S$ by solving
$$\min_{X, S} \; \lVert A - X - S \rVert_F \quad \text{subject to} \quad \operatorname{rank}(X) \leq k \ \text{(PCA condition)} \ \text{and} \ \lVert S \rVert_0 \leq s \ \text{(sparsity condition)},$$
where $\lVert X \rVert_F$ is the Frobenius norm, given by $\lVert X \rVert_F = \left[\sum_{i=1}^{n} \sum_{j=1}^{d} X_{ij}^2\right]^{1/2}$,
which is widely used in low-rank approximation problems.
The above problem is NP-hard: the objective function is convex, but the constraints are not. The constraints can be relaxed to
$$\lVert X \rVert_* \leq \lambda \quad \text{and} \quad \lVert S \rVert_1 \leq \mu,$$
where $\lambda$ and $\mu$ are determined from $k$ and $s$, and $\lVert X \rVert_*$ is the nuclear norm, which can be used as a
convex surrogate for the rank of a matrix: $\lVert X \rVert_* = \lVert \sigma(X) \rVert_1$, the sum of the singular values of $X$,
where $X = U \Sigma V^T$ is the singular value decomposition of $X$.
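A sketch of how the relaxed problem can be solved in practice. This uses the penalized (Lagrangian) form, min $\lVert X \rVert_* + \lambda \lVert S \rVert_1$ subject to $X + S = A$, solved by a simple ADMM loop that alternates singular value thresholding and soft thresholding; the parameter defaults follow common choices from the robust PCA literature and are assumptions, not part of these notes.

import numpy as np

def low_rank_plus_sparse(A, lam=None, mu=None, n_iter=200):
    """ADMM sketch for  min ||X||_* + lam * ||S||_1  s.t.  X + S = A."""
    n, d = A.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n, d))                    # common default choice
    if mu is None:
        mu = n * d / (4.0 * np.sum(np.abs(A)) + 1e-12)    # common default choice
    soft = lambda M, t: np.sign(M) * np.maximum(np.abs(M) - t, 0.0)
    X, S, Y = np.zeros_like(A), np.zeros_like(A), np.zeros_like(A)
    for _ in range(n_iter):
        # Singular value thresholding: proximal step for the nuclear norm.
        U, sig, Vt = np.linalg.svd(A - S + Y / mu, full_matrices=False)
        X = U @ np.diag(soft(sig, 1.0 / mu)) @ Vt
        # Soft thresholding: proximal step for the L1 norm.
        S = soft(A - X + Y / mu, lam / mu)
        # Dual update enforcing X + S = A.
        Y = Y + mu * (A - X - S)
    return X, S

# Example: a rank-1 matrix corrupted by a few large sparse errors.
rng = np.random.default_rng(0)
L = np.outer(rng.normal(size=50), rng.normal(size=40))     # low-rank part
E = np.zeros_like(L)
E.flat[rng.choice(L.size, size=40, replace=False)] = 10.0  # sparse corruptions
X, S = low_rank_plus_sparse(L + E)
print(np.linalg.matrix_rank(np.round(X, 4)), np.count_nonzero(np.round(S, 4)))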
Applications: