
Principal Component Analysis (PCA)
Introduction
Principal component analysis (PCA) is a standard tool in modern
data analysis, used in diverse fields from neuroscience to
computer graphics.

It is a very useful method for extracting relevant information from
confusing data sets.

PCA is “an orthogonal linear transformation that transforms
the data to a new coordinate system such that the greatest
variance by any projection of the data comes to lie on the
first coordinate (first principal component), the second
greatest variance lies on the second coordinate (second
principal component), and so on.”
Definition
Principal component analysis (PCA) is a statistical procedure that
uses an orthogonal transformation to convert a set of observations
of possibly correlated variables into a set of values of linearly
uncorrelated variables called principal components.

The number of principal components is less than or equal to the
number of original variables.

PCA is used to reduce the dimensionality of the data without much
loss of information.
Goals

• The main goal of PCA is to identify patterns in data.
• PCA aims to detect the correlation between variables.
• It attempts to reduce the dimensionality.
• If the covariance of two dimensions is positive, both increase
together; if it is negative, as one increases the other decreases;
if it is zero, the two are uncorrelated (a short NumPy illustration
follows this list).
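
As a quick illustration of reading the sign of the covariance, here is a
minimal NumPy sketch (the arrays are made-up values, not from the slides):

import numpy as np

# Two variables that tend to increase together -> positive covariance
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9])

# np.cov returns the 2x2 covariance matrix; the off-diagonal entry
# is cov(x, y), which is positive for this data.
print(np.cov(x, y))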
Dimensionality Reduction

It reduces the dimensions of a d-dimensional dataset by
projecting it onto a k-dimensional subspace (where k < d)
in order to increase computational efficiency while retaining
most of the information.
Transformation

This transformation is defined in such a way that the first
principal component has the largest possible variance and each
succeeding component in turn has the next highest possible
variance.
Worked Example

Data (10 observations of two variables x and y):

x:  2.5  0.5  2.2  1.9  3.1  2.3  2.0  1.0  1.5  1.1
y:  2.4  0.7  2.9  2.2  3.0  2.7  1.6  1.1  1.6  0.9

Means: x̄ = 1.81, ȳ = 1.91

Covariance matrix:

C = | cov(x,x)  cov(x,y) | = | 0.6165  0.6154 |
    | cov(y,x)  cov(y,y) |   | 0.6154  0.7165 |

The eigenvalues λ satisfy det(C − λI) = 0, where I is the 2 × 2
identity matrix. Expanding the determinant gives the quadratic
(characteristic) equation

λ² − 1.333λ + 0.0630 = 0

Eigenvalues: λ1 = 0.0490, λ2 = 1.2840

Each eigenvector v = [X1, Y1]ᵀ satisfies C v = λ v, i.e.
(C − λI) v = 0. For λ1 = 0.0490 this reads:

0.6165 X1 + 0.6154 Y1 = 0.0490 X1
0.6154 X1 + 0.7165 Y1 = 0.0490 Y1

Normalized eigenvectors (as columns, the first for λ1, the second for λ2):

| −0.735  −0.678 |
|  0.678  −0.735 |

Since λ2 > λ1, the eigenvector for λ2 is the first principal component.
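
The numbers in this example can be verified with a few lines of NumPy
(a checking sketch; the variable names are mine):

import numpy as np

x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

# Covariance matrix (np.cov uses the unbiased n-1 denominator)
C = np.cov(x, y)
print(C)                 # [[0.6165 0.6154] [0.6154 0.7165]]

# Eigen-decomposition of the symmetric matrix C
eigvals, eigvecs = np.linalg.eigh(C)
print(eigvals)           # [0.0490 1.2840]
print(eigvecs)           # columns are the normalized eigenvectors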
The process of obtaining principal
components from a raw dataset
can be simplified into six steps:
1. Take the whole dataset consisting of d+1 dimensions and
ignore the labels such that our new dataset becomes d
dimensional.
2. Compute the mean for every dimension of the whole
dataset.
3. Compute the covariance matrix of the whole dataset.
4. Compute eigenvalues and the corresponding eigenvectors.
5. Sort the eigenvectors by decreasing eigenvalues and
choose k eigenvectors with the largest eigenvalues to form
a d × k dimensional matrix W.
6. Use this d × k eigenvector matrix to transform the samples
onto the new subspace.
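
The six steps translate almost line for line into NumPy. The sketch
below is illustrative (the function and variable names are mine); it
uses the row-per-sample convention, so the final projection is
X_adj @ W rather than a left-multiplication:

import numpy as np

def pca(X, k):
    """Project an n x d data matrix X onto a k-dimensional subspace."""
    # Step 2: compute the mean of every dimension
    mean = X.mean(axis=0)
    X_adj = X - mean
    # Step 3: covariance matrix of the whole dataset (d x d)
    C = np.cov(X_adj, rowvar=False)
    # Step 4: eigenvalues and corresponding eigenvectors
    eigvals, eigvecs = np.linalg.eigh(C)
    # Step 5: sort by decreasing eigenvalue, keep k eigenvectors -> d x k matrix W
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:k]]
    # Step 6: transform the samples onto the new subspace
    return X_adj @ W

# The 2-D data from the worked example, reduced to one dimension
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
print(pca(X, k=1))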
Applied to the two-dimensional example:
1. Given the original data set S = {x1, ..., xk}, produce the
mean-adjusted set DataAdjust by subtracting the mean of each
attribute Ai from every xi.
2. Project each adjusted point onto the first eigenvector:
zi = ⟨xi, yi⟩ · v1.
3. Project each adjusted point onto the second eigenvector:
zi = ⟨xi, yi⟩ · v2.
Reconstructing the original data

We did:
TransformedData = RowFeatureVector × RowDataAdjust

so we can do:
RowDataAdjust = RowFeatureVector⁻¹ × TransformedData

and:
RowDataOriginal = RowDataAdjust + OriginalMean

Because the rows of RowFeatureVector are orthonormal eigenvectors,
its inverse is simply its transpose.
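
In code, the reconstruction is one extra line on top of the sketch above
(again an illustrative sketch in the row-per-sample convention, so the
transpose of W plays the role of RowFeatureVector⁻¹):

import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
mean = X.mean(axis=0)
X_adj = X - mean

C = np.cov(X_adj, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
W = eigvecs[:, [np.argmax(eigvals)]]   # keep only the top component (k = 1)

transformed = X_adj @ W                # TransformedData
data_adjust = transformed @ W.T        # RowDataAdjust: W.T acts as the inverse
reconstructed = data_adjust + mean     # RowDataOriginal (approximate, since k < d)
print(reconstructed)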
PCA Approach

• Standardize the data.
• Perform Singular Value Decomposition to get the
eigenvectors and eigenvalues.
• Sort the eigenvalues in descending order and choose
the top k eigenvectors.
• Construct the projection matrix W from the
selected k eigenvectors.
• Transform the original dataset via W to obtain
a k-dimensional feature subspace.
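
This SVD route can also be sketched in a few lines of NumPy
(illustrative; the standardization here uses the population standard
deviation from np.std, a simplifying choice):

import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

# Standardize: zero mean and unit variance per column
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# SVD of the standardized data: the columns of Vt.T are the eigenvectors
# of the covariance matrix, and the eigenvalues are s**2 / (n - 1).
# np.linalg.svd returns the singular values already sorted in descending order.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
eigvals = s ** 2 / (len(X) - 1)

k = 1
W = Vt[:k].T                           # projection matrix from the top-k eigenvectors
print(Z @ W)                           # k-dimensional feature subspace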
Limitation of PCA

The results of PCA depend on the scaling of the variables.
A scale-invariant form of PCA has been developed.

Applications of PCA:

• Interest Rate Derivatives Portfolios
• Neuroscience
• Image Processing
Thank You
