Unit 3
• Dimensionality Reduction: Introduction, Subset Selection, Principal Component Analysis (PCA), Factor Analysis, Singular Value Decomposition and Matrix Factorization, Multidimensional Scaling, Linear Discriminant Analysis (LDA)
What is Dimensionality Reduction?
• The number of input features, variables, or columns present in a given dataset is known as its dimensionality, and the process of reducing the number of these features is called dimensionality reduction.
Feature Selection (Subset Selection) Methods
• 1. Filter Method
• 2. Wrapper Method
• Forward Selection
• Backward Selection
• Bi-directional Elimination
• 3. Embedded Method
• LASSO
• Elastic Net
• Ridge Regression, etc. (see the sketch below)
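As a quick illustration of the embedded approach listed above, the hedged sketch below uses scikit-learn's Lasso together with SelectFromModel to keep only the features with non-zero coefficients; the synthetic dataset, the alpha value, and the selection threshold are illustrative assumptions, not values from the notes.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel

# Synthetic data: 100 samples, 10 features, only 4 of them informative
X, y = make_regression(n_samples=100, n_features=10, n_informative=4,
                       noise=5.0, random_state=0)

# Embedded method: the L1 penalty drives irrelevant coefficients to (near) zero
lasso = Lasso(alpha=1.0).fit(X, y)
selector = SelectFromModel(lasso, prefit=True)   # keeps features with non-zero weights

X_reduced = selector.transform(X)
print("original features :", X.shape[1])
print("selected features :", X_reduced.shape[1])
print("kept feature index:", np.where(selector.get_support())[0])
```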
Feature Extraction
• Feature extraction is the process of transforming data from a space with many dimensions into a space with fewer dimensions.
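To make the idea concrete, here is a minimal sketch (not taken from the notes; the data and the projection matrix are made up) that extracts 2 new features from 5 original ones by multiplying the data with a projection matrix. PCA, discussed below, is one principled way of choosing such a matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # 100 samples in a 5-dimensional space

# Any 5x2 matrix maps the data into a 2-dimensional space;
# PCA chooses its columns as the top eigenvectors of the covariance matrix.
W = rng.normal(size=(5, 2))
Z = X @ W                            # extracted features
print(Z.shape)                       # (100, 2)
```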
• 0.5674 x1 = -0.6154 y1
• Dividing both sides by 0.5674 gives: x1 = -1.0845 y1
• So, taking y1 = 1, the eigenvector (x1, y1) will be (-1.0845, 1). This is the initial eigenvector; it still needs to be normalized to get the final value.
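As a quick check of that normalization step, the short sketch below (NumPy code added purely for illustration; only the vector (-1.0845, 1) comes from the notes) scales the vector to unit length.

```python
import numpy as np

v = np.array([-1.0845, 1.0])      # initial (unnormalized) eigenvector
v_unit = v / np.linalg.norm(v)    # divide by its Euclidean length

print(v_unit)                     # approximately [-0.7352, 0.6779]
print(np.linalg.norm(v_unit))     # 1.0
```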
• Reducing the number of variables of a data set naturally comes at the expense of accuracy, but the trick in dimensionality reduction is to trade a little accuracy for simplicity: smaller data sets are easier to explore and visualize, and machine learning algorithms can analyze the data much faster without extraneous variables to process.
• So, to sum up, the idea of PCA is simple: reduce the number of variables of a data set while preserving as much information as possible.
• What do the covariances that we have as entries of the matrix tell us about the correlations between the variables?
• It is actually the sign of the covariance that matters: if it is positive, the two variables increase or decrease together (they are correlated); if it is negative, one increases when the other decreases (they are inversely correlated).
• Now that we know that the covariance matrix is no more than a table that summarizes the correlations between all the possible pairs of variables, let's move to the next step.
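The covariance matrix described above can be computed directly. The sketch below, an illustration on made-up data rather than an example from the notes, builds it with NumPy and shows how the signs of the off-diagonal entries reflect the direction of the pairwise relationships.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)    # moves with x    -> positive covariance
z = -x + rng.normal(scale=0.5, size=200)         # moves against x -> negative covariance

data = np.vstack([x, y, z])          # shape (3 variables, 200 observations)
cov = np.cov(data)                   # 3x3 covariance matrix

print(np.round(cov, 2))
# The (x, y) entry is positive and the (x, z) entry is negative:
# the sign tells us whether two variables move together or in opposite directions.
```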
Eigenvectors and eigenvalues are the linear algebra concepts that we need to compute
from the covariance matrix in order to determine the principal components of the data.
Principal components are new variables that are constructed as linear combinations or
mixtures of the initial variables.
These combinations are done in such a way that the new variables (i.e., principal
components) are uncorrelated and most of the information within the initial variables is
squeezed or compressed into the first components.
So, the idea is that 10-dimensional data gives you 10 principal components, but PCA tries to put the maximum possible information in the first component.
Then it puts the maximum remaining information in the second, and so on, until we get the decreasing pattern that a scree plot shows.
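Because the scree plot itself is not reproduced here, the following sketch (scikit-learn on synthetic data; the dataset is an assumption made for illustration) prints the explained-variance ratio of each principal component; plotting those values against the component index gives the scree plot.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# 300 samples of 10 correlated variables (latent low-dimensional structure)
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.1 * rng.normal(size=(300, 10))

pca = PCA().fit(X)
for i, ratio in enumerate(pca.explained_variance_ratio_, start=1):
    print(f"PC{i}: {ratio:.3f}")
# The ratios decrease: the first few components carry most of the information,
# which is exactly the shape a scree plot visualizes.
```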
• As there are as many principal components as there are variables in the data, principal
components are constructed in such a manner that the first principal component accounts for
the largest possible variance in the data set.
• Organizing information in principal components this way allows you to reduce dimensionality without losing much information: you discard the components with low information and consider the remaining components as your new variables.
• An important thing to realize here is that the principal components are less interpretable and don't have any real meaning, since they are constructed as linear combinations of the initial variables.
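One common way to put the "discard the low-information components" idea into practice is to keep just enough components to cover a chosen fraction of the variance. The sketch below does this with scikit-learn; the 95% threshold and the synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 12))   # 12 observed variables, low rank

# Passing a float in (0, 1) tells PCA to keep the smallest number of
# components whose cumulative explained variance reaches that fraction.
pca = PCA(n_components=0.95).fit(X)

print("components kept :", pca.n_components_)
print("cumulative ratio:", np.cumsum(pca.explained_variance_ratio_)[-1].round(3))
X_reduced = pca.transform(X)          # the new, lower-dimensional variables
print("reduced shape   :", X_reduced.shape)
```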
Characteristic Polynomial and Characteristic Equation
Eigenvalues and Eigenvectors: Computation for 2x2 and 3x3 Square Matrices
2 x 2 Example: Compute the Eigenvalues

A = [ 1  -2 ]        so        A - λI = [ 1-λ    -2  ]
    [ 3  -4 ]                           [  3   -4-λ  ]

det(A - λI) = (1 - λ)(-4 - λ) - (-2)(3) = λ² + 3λ + 2

Set λ² + 3λ + 2 = 0, i.e. (λ + 1)(λ + 2) = 0, which gives λ = -1, -2.
3 x 3 Example: Compute the Eigenvalues

A = [ 1   2   3 ]        so        A - λIₙ = [ 1-λ    2     3  ]
    [ 0  -4   2 ]                            [  0   -4-λ    2  ]
    [ 0   0   7 ]                            [  0     0   7-λ  ]

det(A - λIₙ) = 0  →  (1 - λ)(-4 - λ)(7 - λ) = 0

λ = 1, -4, 7
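The hand computations above can be cross-checked numerically. The snippet below is only a verification aid, not part of the original notes.

```python
import numpy as np

A2 = np.array([[1, -2],
               [3, -4]])
A3 = np.array([[1,  2, 3],
               [0, -4, 2],
               [0,  0, 7]])

print(np.linalg.eigvals(A2))   # eigenvalues -1 and -2 (order may vary)
print(np.linalg.eigvals(A3))   # eigenvalues 1, -4 and 7 (order may vary)
```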
Example 3: Eigenvalues and Eigenvectors
Find the eigenvalues and eigenvectors of the matrix
A = [ 5  4  2 ]
    [ 4  5  2 ]
    [ 2  2  2 ]
Solution: The matrix A - λI₃ is obtained by subtracting λ from the diagonal elements of A. Thus

A - λI₃ = [ 5-λ    4     2  ]
          [  4    5-λ    2  ]
          [  2     2    2-λ ]

The characteristic polynomial of A is |A - λI₃|. Using row and column operations to simplify determinants, we get

|A - λI₃| = -(λ - 1)²(λ - 10)

so the eigenvalues of A are λ₁ = 10 and λ₂ = 1 (a repeated root).
Alternate Solution: since the equations of (A - λI₃)x = 0 are linearly dependent at an eigenvalue, it is enough to solve any two of them.
• λ₂ = 1
Let λ = 1 in (A - λI₃)x = 0. We get

(A - 1·I₃)x = 0, i.e.

[ 4  4  2 ] [ x1 ]   [ 0 ]
[ 4  4  2 ] [ x2 ] = [ 0 ]
[ 2  2  1 ] [ x3 ]   [ 0 ]
The solution to this system of equations can be shown to be x1 = -s - t, x2 = s, and x3 = 2t, where s and t are scalars. Thus the eigenspace of λ₂ = 1 is the space of vectors of the form

[ -s - t ]
[    s   ]
[   2t   ]

Separating the parameters s and t, we can write

[ -s - t ]        [ -1 ]        [ -1 ]
[    s   ]  =  s  [  1 ]  +  t  [  0 ]
[   2t   ]        [  0 ]        [  2 ]
Thus the eigenspace of λ = 1 is a two-dimensional subspace of R³ with basis

[ -1 ]     [ -1 ]
[  1 ]  ,  [  0 ]
[  0 ]     [  2 ]

If an eigenvalue occurs as a k-times repeated root of the characteristic equation, we say that it is of multiplicity k. Thus λ = 10 has multiplicity 1, while λ = 1 has multiplicity 2 in this example.
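For completeness, here is a small numerical cross-check of this example (again an illustration, not part of the original worked solution): NumPy should report the eigenvalue 10 once and the eigenvalue 1 twice, and both basis vectors found above are indeed eigenvectors for λ = 1.

```python
import numpy as np

A = np.array([[5, 4, 2],
              [4, 5, 2],
              [2, 2, 2]])

vals, vecs = np.linalg.eigh(A)        # eigh: A is symmetric
print(np.round(vals, 6))              # [ 1.  1. 10.]  ->  1 has multiplicity 2

# The basis vectors (-1, 1, 0) and (-1, 0, 2) found above satisfy A b = 1 * b
for b in ([-1, 1, 0], [-1, 0, 2]):
    b = np.array(b, dtype=float)
    print(np.allclose(A @ b, 1.0 * b))    # True: b is an eigenvector for λ = 1
```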
Linear Discriminant Analysis (LDA)
Data representation vs. Data Classification
Difference between PCA and LDA
• PCA finds the most accurate data representation in a lower-dimensional space.
• It projects the data in the directions of maximum variance.
• However, the directions of maximum variance may be useless for classification.
• In such cases LDA, which is also called Fisher LDA, works well.
• LDA is similar to PCA, but in addition LDA finds the axes that maximize the separation between multiple classes (see the sketch below).
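To illustrate the contrast drawn in the list above, the hedged sketch below reduces the same labeled data to one dimension with both PCA and LDA; the dataset and the simple separation score are assumptions chosen for illustration, not part of the notes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two-class data in 5 dimensions
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=2, random_state=0)

X_pca = PCA(n_components=1).fit_transform(X)                             # unsupervised: ignores y
X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)   # supervised: uses y

def separation(z, y):
    """Distance between the class means divided by the pooled spread."""
    m0, m1 = z[y == 0].mean(), z[y == 1].mean()
    s = z[y == 0].std() + z[y == 1].std()
    return abs(m0 - m1) / s

print("class separation along PCA axis:", round(separation(X_pca.ravel(), y), 2))
print("class separation along LDA axis:", round(separation(X_lda.ravel(), y), 2))
```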
LDA Algorithm
• PCA is good for dimensionality reduction.
• However, PCA can fail for classification, because it projects the points onto the direction that maximizes variance and minimizes the projection error, regardless of the class labels (a sketch of the Fisher LDA computation follows below).
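Since the algorithm itself is not spelled out in this excerpt, here is a minimal from-scratch sketch of the standard two-class Fisher LDA direction (an illustrative implementation of the usual textbook formulation, not the notes' own derivation): compute the class means and the within-class scatter matrix, then project onto S_W⁻¹(m1 - m0).

```python
import numpy as np

def fisher_lda_direction(X, y):
    """Two-class Fisher LDA: return the 1-D projection direction w ~ S_W^{-1} (m1 - m0)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)

    # Within-class scatter: sum of the scatter matrices of the two classes
    S_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)

    w = np.linalg.solve(S_w, m1 - m0)      # direction maximizing between/within class ratio
    return w / np.linalg.norm(w)

# Tiny synthetic two-class example
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(loc=[0, 0], scale=1.0, size=(50, 2)),
               rng.normal(loc=[3, 1], scale=1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

w = fisher_lda_direction(X, y)
z = X @ w                                   # 1-D projection that separates the classes
print("projection direction:", np.round(w, 3))
print("class means after projection:", round(z[y == 0].mean(), 2), round(z[y == 1].mean(), 2))
```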