MLPDF 2

The document provides steps for performing principal component analysis (PCA) and applying it to reduce the dimensions of a dataset. It describes 1) representing the dataset as a matrix, 2) standardizing the data, 3) calculating the covariance matrix and eigenvectors/eigenvalues, 4) using the eigenvalues to select principal components and reduce dimensions, and 5) an example of applying PCA to reduce a 2D dataset to 1D. Singular value decomposition (SVD) is also discussed as an alternative technique for dimensionality reduction.


Steps for PCA

• 1. Getting the dataset: take the input dataset and divide it into two subparts X and Y, where X is the training set and Y is the validation set.

• 2. Representing data into a structure: represent the dataset as a two-dimensional matrix of the independent variables X. Here each row corresponds to a data item, and each column corresponds to a feature.

• The number of columns is the dimension of the dataset. A small illustration of this layout is sketched below.
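A minimal sketch of step 2 in Python with NumPy, using the four-sample, two-feature data from the worked example later in these notes:

```python
import numpy as np

# Step 2: rows are data items, columns are features (here N = 4, n = 2).
X = np.array([[4.0, 11.0],
              [8.0, 4.0],
              [13.0, 5.0],
              [7.0, 14.0]])

print(X.shape)   # (4, 2) -> 4 samples, 2 features
```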

Steps for PCA...

• 3. Standardizing the data: in a particular column of X, the features with high variance are more important than the features with lower variance. If the importance of features is independent of the variance of the features, divide each data item in a column by the standard deviation of that column.

• The resulting matrix is Z.

• 4. Calculating the covariance of Z: find the transpose of Z and multiply it by Z; i.e. ZᵀZ is the covariance matrix of Z. A sketch of both steps is given below.
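A minimal sketch of steps 3-4 in Python. Two assumptions not spelled out in the slides: the columns are mean-centered before dividing by the standard deviation (so that ZᵀZ really measures covariance), and the product is scaled by 1/(N-1) to match the usual sample-covariance convention:

```python
import numpy as np

def standardize(X):
    # Step 3: mean-center (an added assumption) and divide each
    # column by its standard deviation.
    Z = X - X.mean(axis=0)
    return Z / Z.std(axis=0, ddof=1)

def covariance(Z):
    # Step 4: Z'Z, scaled by 1/(N-1) (an added convention) so the
    # result matches np.cov(Z, rowvar=False).
    return (Z.T @ Z) / (Z.shape[0] - 1)
```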

Steps for PCA...

• 5. Calculating the eigenvalues and eigenvectors for the covariance matrix of Z: eigenvectors are the directions of the axes with high information, and the eigenvalues associated with these eigenvectors give the amount of variance along those directions.

• 6. Sorting the eigenvectors: sort all the eigenvalues in decreasing order, and sort the eigenvectors accordingly into a matrix P of eigenvectors. The resulting matrix is named P*.

Steps for PCA...

• 7. Calculating the new features, or principal components: multiply Z by the P* matrix; the result is the matrix Z*. Each column of Z* is independent of the others.

• 8. Remove the less important features from the new dataset Z*. A sketch covering steps 5-8 is given below.
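A minimal sketch of steps 5-8 in Python, assuming Z holds one sample per row, so the projection is written Z @ P_star (the slides' "multiply the P* matrix to Z" is the same operation up to this layout convention):

```python
import numpy as np

def pca_reduce(Z, k):
    S = (Z.T @ Z) / (Z.shape[0] - 1)    # step 4: covariance of Z
    eigvals, P = np.linalg.eigh(S)      # step 5: eigenpairs (ascending)
    order = np.argsort(eigvals)[::-1]   # step 6: decreasing eigenvalues
    P_star = P[:, order]
    Z_star = Z @ P_star                 # step 7: principal components
    return Z_star[:, :k]                # step 8: keep the top k features
```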

Example

• Given the following data, use PCA to reduce the dimension from 2 to 1

Step 1: Given Data Set


Feature   Example 1   Example 2   Example 3   Example 4
x         4           8           13          7
y         11          4           5           14

No. of features, n = 2

No. of samples, N = 4

Step 2: Computation of the mean of the variables

• x̄ = (4 + 8 + 13 + 7) / 4 = 8

• ȳ = (11 + 4 + 5 + 14) / 4 = 8.5

Step 3: Computation of the covariance matrix

Example

• The ordered pairs of (x, y) are

(x,x), (x,y), (y,x), (y,y)

i) Find the covariance of all ordered pairs:

Cov(xi, xj) = (1/(N-1)) Σk=1..N (xik - x̄i)(xjk - x̄j)

Cov(x,x) = (1/(N-1)) Σk=1..N (xk - x̄)²  (both variables are the same "x")

• Cov(x,x) = (1/(4-1))((4-8)² + (8-8)² + (13-8)² + (7-8)²) = 14

• Cov(x,y) = (1/(4-1))((4-8)(11-8.5) + (8-8)(4-8.5) + (13-8)(5-8.5) + (7-8)(14-8.5)) = -11

• Cov(y,x) = Cov(x,y) = -11

• Cov(y,y) = (1/(4-1))((11-8.5)² + (4-8.5)² + (5-8.5)² + (14-8.5)²) = 23

ii) Covariance matrix, n × n (2 × 2):

S = | cov(x,x)  cov(x,y) |  =  |  14  -11 |
    | cov(y,x)  cov(y,y) |     | -11   23 |
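These numbers can be checked in a few lines of Python (np.cov uses the same 1/(N-1) convention as above):

```python
import numpy as np

x = np.array([4.0, 8.0, 13.0, 7.0])
y = np.array([11.0, 4.0, 5.0, 14.0])

S = np.cov(np.stack([x, y]))   # rows are variables, columns are samples
print(S)                       # [[ 14. -11.]
                               #  [-11.  23.]]
```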

Step 4: Eigenvalues, eigenvectors and normalized eigenvectors

• i) Eigenvalues

• Solve det(S - λI) = 0, where S is the covariance matrix, I is the identity matrix and λ is an eigenvalue.

det(S - λI) = det | 14-λ   -11  |  = 0
                  | -11   23-λ |

(14 - λ)(23 - λ) - (-11)(-11) = λ² - 37λ + 201 = 0

λ = 30.3849, 6.6151

• Since λ1 > λ2:

• λ1 = 30.3849 (first principal component) and λ2 = 6.6151

• ii) Normalized eigenvectors: solving (S - λI)e = 0 for each eigenvalue and normalizing gives e1 = [0.5574, -0.8303] for λ1 and e2 = [0.8303, 0.5574] for λ2; these are the vectors used in Step 6 below.
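A quick check of the characteristic polynomial and the eigenpairs in Python (np.linalg.eigh returns eigenvalues in ascending order and may flip the sign of an eigenvector, which is harmless):

```python
import numpy as np

# Roots of the characteristic polynomial: lambda^2 - 37*lambda + 201 = 0
print(np.roots([1.0, -37.0, 201.0]))   # approx [30.3849, 6.6151]

# The same values straight from the covariance matrix:
S = np.array([[14.0, -11.0], [-11.0, 23.0]])
eigvals, eigvecs = np.linalg.eigh(S)
print(eigvals)    # approx [6.6151, 30.3849]
print(eigvecs)    # columns are the normalized eigenvectors
```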


Step 6: Coordinate system for the principal components

• Select the mean of (x, y), i.e. (8, 8.5), as the new origin; then select e1 and e2:

e1 = [ 0.5574, -0.8303 ],  e2 = [ 0.8303, 0.5574 ]

• Then draw the lines e1 and e2 through the mean.

[Figure: the four data points plotted in the (x, y) plane, with the axes e1 and e2 drawn through the mean point (8, 8.5).]

• Then place the table values on e1.

PCA...

• If every value lies on e1, i.e. on a single dimension, computation becomes easy compared to two dimensions. A sketch of this projection is given below.
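A minimal sketch of the projection in Python, centering the data at the mean (8, 8.5) and projecting each sample onto e1 (e1 taken from Step 6; np.linalg.eigh would produce it up to sign):

```python
import numpy as np

X = np.array([[4.0, 11.0],
              [8.0, 4.0],
              [13.0, 5.0],
              [7.0, 14.0]])
e1 = np.array([0.5574, -0.8303])    # first principal component (Step 6)

# Center at the mean (8, 8.5) and project every sample onto e1:
Z_star = (X - X.mean(axis=0)) @ e1  # the 2-D data reduced to 1-D
print(Z_star)                       # one coordinate per sample along e1
```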

Applications of PCA

• PCA is mainly used as a dimensionality reduction technique in various AI applications such as computer vision, image compression, etc.

• It can also be used for finding hidden patterns when the data has high dimensions. Some fields where PCA is used are finance, data mining, psychology, etc.

Singular value decomposition (SVD)

• Singular value decomposition (SVD) is a matrix factorization technique commonly used in linear algebra. The SVD of a matrix A (m × n) is a factorization of the form:

A = U Σ Vᵀ

where U and V are orthonormal matrices:

• U is an m × m unitary matrix,

• V is an n × n unitary matrix, and

• Σ is an m × n rectangular diagonal matrix.

• The diagonal entries of Σ are known as the singular values of matrix A.

• The columns of U and V are called the left-singular and right-singular vectors of matrix A, respectively.

Singular Value Decomposition (SVD)...

• SVD of a data matrix has the following properties:

• 1. Patterns in the attributes are captured by the right-singular vectors, i.e. the columns of V.

• 2. Patterns among the instances are captured by the left-singular vectors, i.e. the columns of U.

• 3. The larger a singular value, the larger the part of the matrix A that it and its associated vectors account for.

• 4. A new data matrix with k attributes is obtained using the equation

D' = D × [v1, v2, …, vk]

• Thus, the dimensionality gets reduced to k; a sketch of this reduction is given below.

• SVD is often used in the context of text data.
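A minimal sketch of this reduction in Python, on a hypothetical 5 × 4 data matrix (the shape and values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(5, 4))    # 5 instances x 4 attributes (toy data)

# Thin SVD: D = U @ diag(s) @ Vt, singular values s in decreasing order.
U, s, Vt = np.linalg.svd(D, full_matrices=False)

# Keep the k right-singular vectors with the largest singular values
# and project: D' = D @ [v1 ... vk], as in property 4 above.
k = 2
V_k = Vt[:k, :].T              # columns of V = right-singular vectors
D_k = D @ V_k                  # new data matrix with k attributes
print(D_k.shape)               # (5, 2)
```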
