
Contents

 Introduction
 Working
 Graphical Overview
 Math Reminder
 PCA Process
 Algorithm
 Uses and Limitations
Abstract

 In the present big data era, there is a need to process large amounts of
unlabeled data and to find patterns in it that can be used further.
 We need to discard unimportant features and keep only the
representations that are actually needed.
 High-dimensional data can be converted to low-dimensional data using
different techniques; this dimension reduction makes tasks
such as classification, visualization, communication and storage much easier.
 The loss of information should be kept small while mapping data from the
high-dimensional space to the low-dimensional space.
Data Reduction
 summarization of data with many (p) variables by a smaller set of
(k) derived (synthetic, composite) variables.

[Diagram: an n × p data matrix A is summarized by a smaller n × k matrix X of derived variables.]
Data Reduction
 “Residual” variation is the information in A that is not retained in
X.
 Data reduction is a balancing act between
 clarity of representation and ease of understanding, and
 oversimplification: loss of important or relevant
information.
Principal Component Analysis
(PCA)
 takes a data matrix of n objects by p variables, which may be
correlated, and summarizes it by uncorrelated axes (principal
components or principal axes) that are linear combinations of
the original p variables
 the first k components display as much as possible of the
variation among objects.
Working
Geometric Rationale of PCA
 objects are represented as a cloud of n points in a
multidimensional space with an axis for each of the p
variables
 the centroid of the points is defined by the mean of each
variable
 the variance of each variable is the average squared
deviation of its n values around the mean of that variable.

V_i = \frac{1}{n-1} \sum_{m=1}^{n} (X_{im} - \bar{X}_i)^2
Geometric Rationale of PCA
 degree to which the variables are linearly correlated
is represented by their covariances.

C_{ij} = \frac{1}{n-1} \sum_{m=1}^{n} (X_{im} - \bar{X}_i)(X_{jm} - \bar{X}_j)
Geometric Rationale of PCA
 objective of PCA is to rigidly rotate the axes of this
p-dimensional space to new positions (principal axes)
that have the following properties:
 ordered such that principal axis 1 has the highest
variance, axis 2 has the next highest variance, ....
, and axis p has the lowest variance
 covariance among each pair of the principal axes
is zero (the principal axes are uncorrelated).
2D Example of PCA
 variables X1 and X2 have positive covariance & each
has a similar variance.
[Scatter plot of Variable X1 vs. Variable X2, centred on the means X̄1 = 8.35 and X̄2 = 4.91.]

V1 = 6.67   V2 = 6.24   C1,2 = 3.42


Configuration is Centered
 each variable is adjusted to a mean of zero (by
subtracting the mean from each value).
[Scatter plot of the mean-centred data: Variable X1 vs. Variable X2, both now centred on zero.]
Principal Components are Computed
 PC 1 has the highest possible variance (9.88)
 PC 2 has a variance of 3.03
 PC 1 and PC 2 have zero covariance.
[Scatter plot of the same data re-plotted on the principal axes PC 1 and PC 2.]
The Dissimilarity Measure Used in PCA is
Euclidean Distance

 PCA uses Euclidean distance, calculated from the p variables, as
the measure of dissimilarity among the n objects.
 PCA derives the best possible k-dimensional (k < p)
representation of the Euclidean distances among the objects.
Generalization to p-dimensions
 In practice nobody uses PCA with only 2 variables
 The algebra for finding principal axes readily generalizes to p
variables
 PC 1 is the direction of maximum variance in the p-dimensional
cloud of points
 PC 2 is in the direction of the next highest variance, subject
to the constraint that it has zero covariance with PC 1.
Generalization to p-dimensions
 PC 3 is in the direction of the next highest variance, subject to
the constraint that it has zero covariance with both PC 1 and PC
2
 and so on... up to PC p
 each principal axis is a linear combination of the original
p variables:
 PC_i = a_i1 X_1 + a_i2 X_2 + … + a_ip X_p
 the a_ij’s are the coefficients of factor i, each multiplied by the
measured value of variable j
[Scatter plot of Variable X1 vs. Variable X2 with the principal axes PC 1 and PC 2 overlaid.]
 PC axes are a rigid rotation of the original variables
 PC 1 is simultaneously the direction of maximum variance
and a least-squares “line of best fit” (squared distances
of points away from PC 1 are minimized).
[The same scatter plot, showing that PC 1 is the least-squares line of best fit through the point cloud.]
Generalization to p-dimensions
 if we take the first k principal components, they define the k-
dimensional “hyperplane of best fit” to the point cloud
 of the total variance of all p variables:
 PCs 1 to k represent the maximum possible proportion of that
variance that can be displayed in k dimensions
 i.e. the squared Euclidean distances among points calculated
from their coordinates on PCs 1 to k are the best possible
representation of their squared Euclidean distances in the
full p dimensions.
Performance Measure PCA

 Principal Component Analysis (PCA) has been applied to analyze thru-
transmission ultrasound data taken on ceramic armor.
 As the ultrasound transducer moves along the surface of the
tile, the signal from the sound wave is measured as it reaches
the receiver, giving a time signal at each tile location.
 From the ballistics tests, it can be seen that the PCA-derived performance
measure correlates well with the penetration velocities.
Graphical Overview
PCA With Example: Algorithm

[A sequence of slides works through the PCA algorithm graphically on an example dataset; the same steps are covered in detail in the PCA Process section below.]
Math Reminder
Standard Deviation
 Statistics – analyzing data sets in terms of the
relationships between the individual points
 Standard Deviation is a measure of the spread of the data
 Calculation: the square root of the average squared deviation from the mean of the data
Variance
 Another measure of the spread of the data in a data set
 Calculation:
 Var(X) = E((X – μ)^2)
 Why have both variance and SD to measure the spread of
data?
 Variance is the original statistical measure of the spread of
data. However, its unit is a squared unit, e.g. cm^2, which is
awkward for expressing heights or other measurements.
Hence SD, the square root of variance, is used.
Covariance
 Variance – measure of the deviation from the mean for
points in one dimension e.g. heights
 Covariance as a measure of how much each of the
dimensions vary from the mean with respect to each
other.
 Covariance is measured between 2 dimensions to see if
there is a relationship between the 2 dimensions e.g.
number of hours studied & marks obtained
 The covariance between one dimension and itself is the
variance
Covariance Matrix
 Representing covariance between dimensions as a matrix,
e.g. for 3 dimensions:

      cov(x,x) cov(x,y) cov(x,z)
  C = cov(y,x) cov(y,y) cov(y,z)
      cov(z,x) cov(z,y) cov(z,z)

 The diagonal holds the variances of x, y and z.
 cov(x,y) = cov(y,x), hence the matrix is symmetrical about the
diagonal.
 n-dimensional data results in an n×n covariance matrix, as in the sketch below.
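A minimal NumPy sketch of this (the three variables and their values are invented for illustration): `np.cov` builds exactly this symmetric matrix, with the variances on the diagonal.

```python
import numpy as np

# Three dimensions (x, y, z) measured on five objects: invented example values.
x = np.array([2.5, 0.5, 2.2, 1.9, 3.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0])
z = np.array([1.0, 3.2, 1.4, 2.0, 0.8])

# Stack the variables as rows; np.cov then returns the 3x3 covariance matrix
# (variances on the diagonal, covariances off the diagonal, using n - 1).
C = np.cov(np.vstack([x, y, z]))
print(C)
print(np.allclose(C, C.T))  # True: the covariance matrix is symmetric.
```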
Covariance
 The exact value is not as important as its sign.
 A positive covariance indicates that both dimensions
increase or decrease together, e.g. as the number of hours
studied increases, the marks in that subject increase.
 A negative value indicates that while one increases the other
decreases, or vice versa, e.g. active social life at RIT vs
performance in the CS dept.
 If the covariance is zero, the two dimensions are (linearly)
uncorrelated, e.g. heights of students vs the marks
obtained in a subject.
Eigenvalue problem
 The eigenvalue problem is any problem having the following
form:
A.v=λ.v
 A: n x n matrix
 v: n x 1 non-zero vector
 λ: scalar
 Any value of λ for which this equation has a non-zero solution v is
called an eigenvalue of A, and a vector v which corresponds
to that value is called an eigenvector of A.
Eigenvalue problem
A \cdot v = \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4 \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \lambda \cdot v

 Therefore, (3, 2) is an eigenvector of the square matrix A, and 4 is
an eigenvalue of A.

 Given a matrix A, how can we calculate the eigenvectors and
eigenvalues of A? (A quick numerical check of this example follows below.)
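A quick NumPy check of this example, confirming that multiplying A by (3, 2) just rescales it by the eigenvalue 4 (a minimal sketch, not part of the original slides):

```python
import numpy as np

A = np.array([[2, 3],
              [2, 1]])
v = np.array([3, 2])

print(A @ v)                       # [12  8], which is 4 * [3  2]
print(np.allclose(A @ v, 4 * v))   # True: v is an eigenvector with eigenvalue 4
```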
Calculating eigenvectors & eigenvalues

Given A.v=λ.v
A.v-λ.I.v=0
(A - λ . I ). v = 0
Finding the roots of the characteristic equation |A - λ . I| = 0 gives the eigenvalues, and for
each of these eigenvalues there is a corresponding eigenvector.
Example …
Calculating eigenvectors & eigenvalues

 If A = \begin{pmatrix} 0 & 1 \\ -2 & -3 \end{pmatrix}, then

 |A - \lambda I| = \left| \begin{pmatrix} 0 & 1 \\ -2 & -3 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right| = \begin{vmatrix} -\lambda & 1 \\ -2 & -3-\lambda \end{vmatrix} = \lambda^2 + 3\lambda + 2 = 0

 This gives us 2 eigenvalues:

 λ1 = -1 and λ2 = -2
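The same result can be checked numerically (a minimal sketch; `np.linalg.eig` is the standard NumPy routine for this):

```python
import numpy as np

A = np.array([[0, 1],
              [-2, -3]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)   # the roots of λ² + 3λ + 2 = 0, i.e. -1 and -2 (order may vary)
# Each column of `eigenvectors` is the unit-length eigenvector for the
# corresponding eigenvalue.
```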
Properties of eigenvectors and
eigenvalues
 Note that irrespective of how much we scale (3, 2) by, applying A
still just multiplies the result by the same eigenvalue, 4.
 Eigenvectors are only defined for square matrices, and
not every square matrix has (real) eigenvectors.
 An n × n matrix has at most n linearly independent eigenvectors.
Principal Components
Analysis
PCA
 Principal components analysis (PCA) is a technique that can
be used to simplify a dataset
 It is a linear transformation that chooses a new
coordinate system for the data set such that
 greatest variance by any projection of the data set comes
to lie on the first axis (then called the first principal
component),
 the second greatest variance on the second axis, and so
on.
 PCA can be used for reducing dimensionality by eliminating
the later principal components.
PCA
 By finding the eigenvalues and eigenvectors of the
covariance matrix, we find that the eigenvectors with the
largest eigenvalues correspond to the directions along which
the data varies the most.
 These are the principal components.
 PCA is a useful statistical technique that has found
application in:
 fields such as face recognition and image compression
 finding patterns in data of high dimension
PCA process –STEP 1

 Subtract the mean from each of the data dimensions: all
the x values have the mean x̄ subtracted and all the y values have the
mean ȳ subtracted from them. This produces a data set whose
mean is zero.
 Subtracting the mean makes the variance and covariance
calculations easier by simplifying their equations; the
variance and covariance values themselves are not affected by the
mean value.
PCA process –STEP 1

DATA:              ZERO MEAN DATA:
  x     y            x      y
 2.5   2.4          .69    .49
 0.5   0.7        -1.31  -1.21
 2.2   2.9          .39    .99
 1.9   2.2          .09    .29
 3.1   3.0         1.29   1.09
 2.3   2.7          .49    .79
 2.0   1.6          .19   -.31
 1.0   1.1         -.81   -.81
 1.5   1.6         -.31   -.31
 1.1   0.9         -.71  -1.01
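A minimal NumPy sketch of this mean-centering step, using the example data above (the variable names are mine, not from the slides):

```python
import numpy as np

# The example data from the slides: one row per (x, y) observation.
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

# Subtract the mean of each column (dimension) to get the zero-mean data.
mean = data.mean(axis=0)          # [1.81, 1.91]
zero_mean_data = data - mean
print(zero_mean_data)             # matches the ZERO MEAN DATA columns above
```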
PCA process –STEP 2
 Calculate the covariance matrix (a numerical check follows below):

   cov = .616555556   .615444444
         .615444444   .716555556

 Since the non-diagonal elements in this covariance matrix
are positive, we should expect that the x and y
variables increase together.
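Continuing the sketch from Step 1, the covariance matrix above can be reproduced with `np.cov` (NumPy expects variables in rows unless `rowvar=False` is passed):

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
zero_mean_data = data - data.mean(axis=0)

# Covariance matrix of the two dimensions (columns), using the n - 1 divisor.
cov = np.cov(zero_mean_data, rowvar=False)
print(cov)   # approximately [[0.61656, 0.61544], [0.61544, 0.71656]]
```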
PCA process –STEP 3
 Calculate the eigenvectors and eigenvalues of the
covariance matrix
   eigenvalues  =  .0490833989
                   1.28402771

   eigenvectors = -.735178656  -.677873399
                   .677873399  -.735178656
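These numbers can be reproduced with NumPy's symmetric eigendecomposition (`np.linalg.eigh`); the sign and ordering of the eigenvectors may differ, which does not change the result:

```python
import numpy as np

cov = np.array([[0.616555556, 0.615444444],
                [0.615444444, 0.716555556]])

# eigh is the routine for symmetric matrices; eigenvalues come back in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
print(eigenvalues)    # approximately [0.0490834, 1.2840277]
print(eigenvectors)   # columns are the eigenvectors (possibly with flipped signs)
```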
PCA process –STEP 3
Eigenvectors are plotted as
diagonal dotted lines on the
plot.
Note they are perpendicular
to each other.
Note one of the eigenvectors
goes through the middle of
the points, like drawing a line
of best fit.
The second eigenvector gives
us the other, less important,
pattern in the data, that all
the points follow the main
line, but are off to the side
of the main line by some
amount.
PCA process –STEP 4
 Reduce dimensionality and form a feature vector: the
eigenvector with the highest eigenvalue is the principal
component of the data set.

 In our example, the eigenvector with the largest eigenvalue
was the one that pointed down the middle of the data.

 Once the eigenvectors are found from the covariance matrix,
the next step is to order them by eigenvalue, highest to
lowest. This gives you the components in order of
significance.
PCA process –STEP 4
 Now, if you like, you can decide to ignore the components
of lesser significance.

 You do lose some information, but if the discarded eigenvalues are
small, you don’t lose much:

 with p dimensions in your data,
 calculate p eigenvectors and eigenvalues,
 choose only the first k eigenvectors,
 the final data set then has only k dimensions.
PCA process –STEP 4
 Feature Vector
FeatureVector = (eig1 eig2 eig3 … eign)
 We can either form a feature vector with both of the
eigenvectors:
      -.677873399  -.735178656
      -.735178656   .677873399
or, we can choose to leave out the smaller, less significant
component and only have a single column:
      -.677873399
      -.735178656
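A sketch of this step in NumPy, continuing from Step 3 (sorting the eigenvectors by descending eigenvalue and keeping the top k; as before, signs may differ from the slides):

```python
import numpy as np

cov = np.array([[0.616555556, 0.615444444],
                [0.615444444, 0.716555556]])
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Order the eigenvector columns by eigenvalue, highest first.
order = np.argsort(eigenvalues)[::-1]
eigenvectors = eigenvectors[:, order]

k = 1                                   # keep only the most significant component
feature_vector = eigenvectors[:, :k]    # one column, roughly (-.678, -.735) up to sign
print(feature_vector)
```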
PCA process –STEP 5

 Deriving the new data (see the sketch below):
 FinalData = RowFeatureVector x RowZeroMeanData
 RowFeatureVector is the matrix with the eigenvectors in
the columns, transposed so that the eigenvectors are now
in the rows, with the most significant eigenvector at the
top.
 RowZeroMeanData is the mean-adjusted data transposed,
i.e. the data items are in the columns, with each row
holding a separate dimension.
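A minimal sketch of this multiplication in NumPy, continuing from the previous steps (the names follow the slide's RowFeatureVector / RowZeroMeanData convention):

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
zero_mean_data = data - data.mean(axis=0)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(zero_mean_data, rowvar=False))
eigenvectors = eigenvectors[:, np.argsort(eigenvalues)[::-1]]  # most significant first

row_feature_vector = eigenvectors.T        # eigenvectors as rows
row_zero_mean_data = zero_mean_data.T      # one data item per column

final_data = row_feature_vector @ row_zero_mean_data
print(final_data.T)   # matches the transformed data table below, up to sign
```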
PCA process –STEP 5

R = U S V^T

[Diagram: the samples × variables data matrix R factors into U, S and V^T, with the leading factors capturing the significant structure and the remaining factors representing noise.]
PCA process –STEP 5
 FinalData is the final data set, with data items in columns,
and dimensions along rows.
 What will this give us?
 It will give us the original data solely in terms of the vectors
we chose.
 We have changed our data from being in terms of the axes
x and y, and now it is in terms of our 2 eigenvectors.
PCA process –STEP 5
FinalData transpose: dimensions along
columns
x y
-.827970186 -.175115307
1.77758033 .142857227
-.992197494 .384374989
-.274210416 .130417207
-1.67580142 -.209498461
-.912949103 .175282444
.0991094375 -.349824698
1.14457216 .0464172582
.438046137 .0177646297
1.22382056 -.162675287
Reconstruction of original Data
 If we reduced the dimensionality, then obviously, when
reconstructing the data, we would lose those dimensions we
chose to discard. In our example, let us assume that we
kept only the first component (the x column above)…
Reconstruction of original Data

x
-.827970186
1.77758033
-.992197494
-.274210416
-1.67580142
-.912949103
.0991094375
1.14457216
.438046137
1.22382056
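A sketch of the reconstruction step (this is the standard inverse of the projection above, not spelled out in the slides): multiply the reduced data back by the kept eigenvectors and add the mean back on.

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
mean = data.mean(axis=0)
zero_mean_data = data - mean

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(zero_mean_data, rowvar=False))
eigenvectors = eigenvectors[:, np.argsort(eigenvalues)[::-1]]

k = 1
feature_vector = eigenvectors[:, :k]           # keep only PC 1
final_data = zero_mean_data @ feature_vector   # 10 x 1 reduced data

# Reconstruction: project back into the original 2-D space and re-add the mean.
reconstructed = final_data @ feature_vector.T + mean
print(reconstructed)   # close to the original data, but flattened onto the PC 1 line
```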
Algorithm
How do I do a PCA?

 Step 1 – Standardize the data
 Step 2 – Calculate the covariance matrix
 Step 3 – Compute its eigenvalues and eigenvectors
 Step 4 – Re-orient (project) the data onto the principal axes
 Step 5 – Plot the re-oriented data (see the sketch below)
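The same five steps can be run end to end with scikit-learn, shown here as a hedged sketch (assuming scikit-learn and matplotlib are available; `StandardScaler` covers Step 1 and `PCA` covers Steps 2 to 4):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Example data (the same small dataset as in the worked example).
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

scaled = StandardScaler().fit_transform(data)   # Step 1: standardize
pca = PCA(n_components=2)
reoriented = pca.fit_transform(scaled)          # Steps 2-4: covariance, eigens, projection

plt.scatter(reoriented[:, 0], reoriented[:, 1]) # Step 5: plot the re-oriented data
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()
```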
Use of PCA
It is often helpful to use a dimensionality-reduction technique
such as PCA prior to performing machine learning because:

 Reducing the dimensionality of the dataset reduces the size
of the space on which k-nearest-neighbors (kNN) must
calculate distances, which improves the performance of kNN
(see the pipeline sketch below).
 Reducing the dimensionality of the dataset reduces the
number of degrees of freedom of the hypothesis, which
reduces the risk of overfitting.
 Most algorithms will run significantly faster if they have
fewer dimensions to look at.
 Reducing the dimensionality via PCA can simplify the dataset,
facilitating description, visualization, and insight.
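As an illustration of the kNN point above, a hedged scikit-learn sketch (the digits dataset and the choice of 16 components are arbitrary examples, not from the slides):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 64-dimensional digit images, reduced to a handful of principal components
# before running kNN on the smaller space.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(StandardScaler(), PCA(n_components=16), KNeighborsClassifier())
model.fit(X_train, y_train)
print(model.score(X_test, y_test))   # accuracy on the held-out data
```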
Problems with PCA
PCA is not without its problems and limitations:
 PCA assumes approximate normality of the input space
distribution.
 PCA may still be able to produce a "good" low-dimensional
projection of the data even if the data isn't normally
distributed.
Thanks!
