MLPDF 2
⚫ 1. Getting the dataset – get the input dataset and divide it into two subparts, X and Y, where X is the training set and Y is the validation set.
⚫ 2. Representing data into a structure – represent the dataset as a two-dimensional matrix of the independent variable X. Here each row corresponds to a data item, and each column corresponds to a feature.
⚫ 3. Standardizing the data – standardize the data from X. In a particular column, the features with high variance are more important compared to the features with lower variance. If the importance of features is independent of the variance of the feature, then divide each data item in a column by the standard deviation of the column.
• The resulting matrix is Z.
⚫ 4. Calculating the covariance of Z – find the transpose of Z, and multiply it by Z.
⚫ 5. Calculating the eigenvalues and eigenvectors – find the eigenvalues and eigenvectors of the covariance matrix of Z. Eigenvectors are the directions of the axes with high information, and the coefficients of these eigenvectors are defined as the eigenvalues.
⚫ 6. Sorting the eigenvectors – sort all the eigenvalues in decreasing order, and sort the eigenvectors accordingly into a matrix P.
⚫ 7. Calculating the new features or principal components – multiply the P* matrix by Z; the resultant matrix is Z*. Each column of the Z* matrix is independent of the others.
⚫ 8. Removing less important features – remove the less important or unimportant features from the new dataset Z*.
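The steps above can be sketched in Python with NumPy. This is a minimal illustration, not code from the notes; the helper name `pca` is assumed:

```python
import numpy as np

# Minimal sketch of steps 1-8 above; the helper name `pca` is an assumption.
def pca(X, k):
    """Reduce X (rows = samples, columns = features) to k principal components."""
    # Step 3: standardize each column.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 4: covariance of Z (Z transpose multiplied by Z).
    C = Z.T @ Z / (len(Z) - 1)
    # Step 5: eigenvalues and eigenvectors of the covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(C)
    # Step 6: sort eigenvalues in decreasing order and the eigenvectors to match.
    P = eigvecs[:, np.argsort(eigvals)[::-1]]
    # Steps 7-8: project onto the top-k eigenvectors and drop the rest.
    return Z @ P[:, :k]

# The 4x2 dataset from the worked example below.
X = np.array([[4.0, 11.0], [8.0, 4.0], [13.0, 5.0], [7.0, 14.0]])
print(pca(X, 1).shape)  # (4, 1)
```

Because the projection uses orthogonal eigenvectors of the covariance matrix, the columns of the result are uncorrelated, which is exactly the independence claimed in step 7.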
Example
• Given the following data, use PCA to reduce the dimension from 2 to 1
No. of features, n: 2
No. of samples, N: 4
• x̄ = (4 + 8 + 13 + 7)/4 = 8
• ȳ = (11 + 4 + 5 + 14)/4 = 8.5
• Data:
  Feature:  x1   x2   x3   x4
  x:         4    8   13    7
  y:        11    4    5   14
• Cov(x,x) = (1/(4-1))((4-8)² + (8-8)² + (13-8)² + (7-8)²) = 14
• Cov(x,y) = (1/(4-1))((4-8)(11-8.5) + (8-8)(4-8.5) + (13-8)(5-8.5) + (7-8)(14-8.5)) = -11
• Cov(y,x) = Cov(x,y) = -11
• Cov(y,y) = (1/(4-1))((11-8.5)² + (4-8.5)² + (5-8.5)² + (14-8.5)²) = 23
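These hand computations can be checked with NumPy, whose `np.cov` defaults to the same 1/(N-1) normalization used above:

```python
import numpy as np

x = np.array([4.0, 8.0, 13.0, 7.0])
y = np.array([11.0, 4.0, 5.0, 14.0])

# np.cov defaults to the 1/(N-1) (sample) normalization, matching the example.
S = np.cov(x, y)
print(S)  # [[ 14. -11.]
          #  [-11.  23.]]
```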
• S = [14 -11; -11 23]
• Where S is the covariance matrix, I is the identity matrix and λ is an eigenvalue.
• det(S - λI) = det [14-λ -11; -11 23-λ] = (14-λ)(23-λ) - 121 = 0
• Solving λ² - 37λ + 201 = 0 gives λ1 ≈ 30.385 and λ2 ≈ 6.615, so λ1 > λ2.
• e1 = [0.5574, -0.8303] is the unit eigenvector corresponding to λ1.
[Figure: scatter plot of the data points with the eigenvector directions e1 and e2 overlaid.]
• If every value lies on e1, i.e. on a single dimension, then computation is easier compared to two dimensions.
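The eigenvalues and eigenvector of the worked example can be verified numerically (a small check against the covariance matrix S computed above; the sign of a computed eigenvector is arbitrary, so only magnitudes are compared):

```python
import numpy as np

S = np.array([[14.0, -11.0], [-11.0, 23.0]])
# eigh returns the eigenvalues of a symmetric matrix in ascending order.
eigvals, eigvecs = np.linalg.eigh(S)

lam2, lam1 = eigvals       # lam1 is the larger eigenvalue
e1 = eigvecs[:, 1]         # eigenvector paired with lam1 (sign is arbitrary)

print(round(lam1, 3), round(lam2, 3))  # 30.385 6.615
print(np.round(np.abs(e1), 4))         # magnitudes match [0.5574, 0.8303]
```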
Applications of PCA
⚫ PCA is mainly used as a dimensionality reduction technique in various AI applications such as computer vision, image compression, etc.
⚫ It can also be used for finding hidden patterns if the data has high dimensions.
Singular Value Decomposition (SVD) is a matrix factorization technique commonly used in linear algebra. The SVD of a matrix A (m x n) is a factorization of the form:
A = UΣVᵀ
⚫ U is an m x m unitary matrix,
⚫ Σ is an m x n rectangular diagonal matrix of singular values, and
⚫ V is an n x n unitary matrix.
• 1. Patterns in the attributes are captured by the right-singular vectors, i.e. the columns of V.
• 2. Patterns among the instances are captured by the left-singular vectors, i.e. the columns of U.
• 3. The larger a singular value, the larger the part of the matrix A that it (together with its associated vectors) accounts for.
• 4. A new data matrix with k attributes is obtained using the equation D' = D × [v1, v2, ..., vk].
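As a sketch, the factorization and the k-attribute projection can be done with NumPy; the 4x2 data matrix from the PCA example is reused here purely as an assumed illustration:

```python
import numpy as np

# The 4x2 data matrix from the PCA example, reused here for illustration.
D = np.array([[4.0, 11.0], [8.0, 4.0], [13.0, 5.0], [7.0, 14.0]])

# U: 4x4 unitary, s: singular values (descending), Vt: V transpose (2x2).
U, s, Vt = np.linalg.svd(D)

# Rebuild the m x n Sigma and confirm A = U Sigma V^T.
Sigma = np.zeros_like(D)
np.fill_diagonal(Sigma, s)
assert np.allclose(U @ Sigma @ Vt, D)

# Point 4: keep k attributes via D' = D [v1 ... vk].
k = 1
D_new = D @ Vt.T[:, :k]
print(D_new.shape)  # (4, 1)
```

Keeping only the columns of V paired with the largest singular values retains, per point 3, the largest part of A that the factorization accounts for.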