
Unit-3

(Dimensionality reduction): Introduction to Dimensionality reduction, Data set representation, Matrix representation of data set, Covariance of the Data Matrix. Principal component analysis (PCA): Introduction to PCA, Geometric intuition, PCA for dimensionality reduction, PCA Limitations.
Introduction to Dimensionality reduction

• Dimensionality reduction is a technique used in machine learning and data analysis to reduce the number of features or variables under consideration.
• The aim is to simplify the dataset while retaining as much relevant information as possible.
• This is particularly useful when dealing with high-dimensional data, where the number of features is large compared to the number of samples.

There are various methods for dimensionality reduction, including:


1.Feature selection: Selecting a subset of the original features based on specific criteria such
as relevance, importance, or correlation.
2.Feature extraction: Transforming the original features into a lower-dimensional space
using techniques like principal component analysis (PCA), linear discriminant analysis
(LDA), or t-distributed stochastic neighbor embedding (t-SNE). These methods aim to
preserve the most important information while reducing the dimensionality.
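As a rough illustration of the difference between the two approaches, the sketch below (assuming scikit-learn and NumPy are installed; the data matrix X and labels y are made up) keeps a subset of the original columns with a filter-style score, and separately extracts two new components with PCA.

# A minimal sketch of feature selection vs. feature extraction
# (assumes scikit-learn and NumPy are available; X and y are made up).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))        # 100 samples, 10 original features
y = rng.integers(0, 2, size=100)      # binary labels, needed only for selection

# Feature selection: keep 3 of the original 10 columns (filter method, ANOVA F-score)
X_selected = SelectKBest(score_func=f_classif, k=3).fit_transform(X, y)

# Feature extraction: build 2 new features as linear combinations of all 10 (PCA)
X_extracted = PCA(n_components=2).fit_transform(X)

print(X_selected.shape)   # (100, 3) -- original features, just fewer of them
print(X_extracted.shape)  # (100, 2) -- new derived features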
Data Representation

There are two components of dimensionality reduction:
• Feature selection: In this, we try to find a subset of the original set of variables, or features, to get a smaller subset which can be used to model the problem. It usually involves three ways:
  • Filter
  • Wrapper
  • Embedded
• Feature extraction: This reduces the data in a high-dimensional space to a lower-dimensional space, i.e. a space with a smaller number of dimensions.

Matrix representation of data set:
• Matrices serve as a powerful tool for representing datasets.
• Consider a dataset containing information about various individuals, such as age, income, and education level.
• By organizing this data into a matrix, where each row corresponds to an individual and each column represents a different attribute, we create a structured representation that facilitates analysis.
• This tabular arrangement simplifies operations like finding averages and correlations, and performing statistical analyses.
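As a small illustration (the attribute names and values below are made up), such a dataset can be stored as a NumPy matrix in which each row is one individual and each column is one attribute.

import numpy as np

# Rows = individuals, columns = attributes (age, income in thousands, education years).
# The values are illustrative only.
columns = ["age", "income", "education_years"]
X = np.array([
    [25, 40.0, 16],
    [32, 55.5, 18],
    [47, 72.0, 12],
    [51, 60.0, 14],
])

print(X.shape)          # (4, 3): 4 individuals, 3 attributes
print(X.mean(axis=0))   # column-wise averages, one per attribute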
Covariance of the Data Matrix
The variance-covariance matrix is a square matrix whose diagonal elements are the variances of the individual variables and whose off-diagonal elements are the covariances between pairs of variables. The covariance of two variables can take any real value: positive, negative, or zero. A positive covariance suggests that the two variables have a direct relationship, a negative covariance indicates an inverse relationship, and if two variables do not vary together, they have zero covariance.
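A minimal sketch of building a variance-covariance matrix with NumPy (the small data matrix is made up; rowvar=False tells np.cov that the variables are in the columns rather than the rows).

import numpy as np

# Made-up data matrix: 5 samples (rows) x 3 variables (columns).
X = np.array([
    [2.0, 4.1, 0.5],
    [2.8, 3.9, 1.0],
    [3.6, 3.0, 1.4],
    [4.1, 2.2, 2.1],
    [5.0, 1.8, 2.4],
])

# rowvar=False: each column of X is one variable.
C = np.cov(X, rowvar=False)

print(C.shape)        # (3, 3) square variance-covariance matrix
print(np.diag(C))     # diagonal entries: variances of the three variables
print(C[0, 1])        # off-diagonal entry: covariance of variables 0 and 1 (negative here)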
Principal Component Analysis (PCA)
The Principal Component Analysis (PCA) technique was introduced by the mathematician Karl Pearson in 1901. It maps data from a higher-dimensional space to a lower-dimensional space in such a way that the variance of the data in the lower-dimensional space is maximized.
•Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of correlated variables into a set of uncorrelated variables. PCA is one of the most widely used tools in exploratory data analysis and in machine learning for predictive models.
•Principal Component Analysis (PCA) is an unsupervised learning technique used to examine the interrelations among a set of variables. It is also related to general factor analysis, where regression determines a line of best fit.
•The main goal of Principal Component Analysis (PCA) is to reduce the dimensionality of a dataset while preserving the most important patterns or relationships between the variables, without any prior knowledge of the target variables.
The first principal component captures the most variation in the data, the second principal component captures the maximum remaining variance that is orthogonal to the first principal component, and so on.
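A minimal usage sketch with scikit-learn (the data and the choice of n_components=2 are illustrative, not part of the original example): the data is standardized first, then projected onto the leading principal components, and explained_variance_ratio_ reports how much variance each component captures.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))               # made-up data: 200 samples, 5 features

X_std = StandardScaler().fit_transform(X)   # mean 0, standard deviation 1 per feature

pca = PCA(n_components=2)                   # keep the two leading components
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                      # (200, 2)
print(pca.explained_variance_ratio_)        # fraction of variance captured by each component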
Step-By-Step Explanation of PCA
Step 1: Standardization
First, we need to standardize our dataset to ensure that each variable has a mean of 0 and a
standard deviation of 1.
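In other words, each value is replaced by its z-score, z = (x − mean) / standard deviation, computed per feature. A minimal NumPy sketch (the small matrix X is made up):

import numpy as np

def standardize(X):
    # z-score each column: subtract the column mean, divide by the column standard deviation
    return (X - X.mean(axis=0)) / X.std(axis=0)

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9]])  # made-up example
Z = standardize(X)
print(Z.mean(axis=0))  # approximately 0 for each column
print(Z.std(axis=0))   # approximately 1 for each column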

Step 2: Covariance Matrix Computation

Covariance measures the strength of joint variability between two or more variables, indicating how much they change in relation to each other. To find the covariance we can use the formula:

cov(x1, x2) = Σ (x1i – x1mean)(x2i – x2mean) / (n – 1)

The value of the covariance can be positive, negative, or zero.
•Positive: as x1 increases, x2 also increases.
•Negative: as x1 increases, x2 decreases.
•Zero: no direct relation.
Step 3: Compute Eigenvalues and Eigenvectors of the Covariance Matrix to Identify Principal Components

Let A be a square n x n matrix and X be a non-zero vector for which

AX = λX

for some scalar value λ. Then λ is known as an eigenvalue of matrix A, and X is known as the eigenvector of matrix A for the corresponding eigenvalue.
It can also be written as:
AX − λX = 0
(A − λI)X = 0
where I is the identity matrix of the same shape as matrix A. The above condition holds for a non-zero X only if (A − λI) is non-invertible (i.e. a singular matrix). That means
∣A − λI∣ = 0
From this equation we can find the eigenvalues λ, and the corresponding eigenvector can then be found using the equation AX = λX.
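In practice the eigenvalues and eigenvectors are computed numerically. A minimal sketch with NumPy (the 2 x 2 matrix below is made up; np.linalg.eigh is used because a covariance matrix is symmetric):

import numpy as np

# Made-up symmetric (covariance-like) matrix A.
A = np.array([[2.0, 1.2],
              [1.2, 3.0]])

# eigh returns eigenvalues in ascending order, with eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(A)

print(eigenvalues)          # solutions of |A - lambda*I| = 0
print(eigenvectors[:, -1])  # eigenvector for the largest eigenvalue (the principal direction)

# Check that A X = lambda X holds for the largest eigenvalue.
print(np.allclose(A @ eigenvectors[:, -1], eigenvalues[-1] * eigenvectors[:, -1]))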
Consider the following dataset
x1 2.5 0.5 2.2 1.9 3.1 2.3 2.0 1.0 1.5 1.1

x2 2.4 0.7 2.9 2.2 3.0 2.7 1.6 1.1 1.6 0.9

Step 1: Standardize the Dataset

Mean for x1: x1mean = 1.81
Mean for x2: x2mean = 1.91
We will change the dataset by subtracting the mean from each value.

x1new = x1 – x1mean:  0.69  -1.31  0.39  0.09  1.29  0.49  0.19  -0.81  -0.31  -0.71
x2new = x2 – x2mean:  0.49  -1.21  0.99  0.29  1.09  0.79  -0.31  -0.81  -0.31  -1.01
Step 3: Arrange Eigenvalues
The eigenvector with the highest eigenvalue is the principal component of the dataset. So in this case, the eigenvector corresponding to λ1 (the largest eigenvalue) is the principal component.
{Basically, to complete the numerical we only have to solve up to this step, but if we have to show why that particular eigenvector is chosen, we have to follow steps 4 to 6.}
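A sketch that carries this numerical through with NumPy, under the assumption that only the mean is subtracted (as in the steps above) and that the sample covariance (division by n − 1) is used; the eigenvalues should come out to roughly 0.049 and 1.28, and the eigenvector of the larger one is the first principal component.

import numpy as np

x1 = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
x2 = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])
X = np.column_stack([x1, x2])

X_centered = X - X.mean(axis=0)          # subtract the means (1.81 and 1.91)

C = np.cov(X_centered, rowvar=False)     # 2x2 sample covariance matrix

eigenvalues, eigenvectors = np.linalg.eigh(C)
print(eigenvalues)                        # the larger one belongs to the principal component
pc1 = eigenvectors[:, np.argmax(eigenvalues)]
print(pc1)                                # first principal component (unit vector)

# Project the centered data onto the first principal component (1-D representation).
scores = X_centered @ pc1
print(scores.shape)                       # (10,)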
Q. Consider the two dimensional patterns (2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8).

Step-01
The given feature vectors are-

x1 = (2, 1)
x2 = (3, 5)
x3 = (4, 3)
x4 = (5, 6)
x5 = (6, 7)
x6 = (7, 8)
Step-02:
Calculate the mean vector (µ).
µ = ((2 + 3 + 4 + 5 + 6 + 7) / 6, (1 + 5 + 3 + 6 + 7 + 8) / 6) = (4.5, 5)
Step-03:
Subtract mean vector (µ) from the given feature vectors.
x1 – µ = (2 – 4.5, 1 – 5) = (-2.5, -4)
x2 – µ = (3 – 4.5, 5 – 5) = (-1.5, 0)
x3 – µ = (4 – 4.5, 3 – 5) = (-0.5, -2)
x4 – µ = (5 – 4.5, 6 – 5) = (0.5, 1)
x5 – µ = (6 – 4.5, 7 – 5) = (1.5, 2)
x6 – µ = (7 – 4.5, 8 – 5) = (2.5, 3)

Step-04:
Calculate the covariance matrix.
The covariance matrix is given by
Covariance matrix = (m1 + m2 + m3 + m4 + m5 + m6) / 6
where mi = (xi – µ)(xi – µ)^T for each mean-subtracted feature vector.
On adding the above matrices and dividing by 6, we get

Covariance matrix M = | 2.92  3.67 |
                      | 3.67  5.67 |
Step-05:

Calculate the eigen values and eigen vectors of the covariance matrix.
λ is an eigen value for a matrix M if it is a solution of the characteristic equation |M – λI| = 0.

Substituting the covariance matrix M, we get
(2.92 – λ)(5.67 – λ) – (3.67 x 3.67) = 0
16.56 – 2.92λ – 5.67λ + λ² – 13.47 = 0
λ² – 8.59λ + 3.09 = 0

Solving this quadratic equation, we get λ = 8.22, 0.38.

Thus, the two eigen values are λ1 = 8.22 and λ2 = 0.38.

Clearly, the second eigen value is very small compared to the first eigen value, so the second eigen vector can be left out.
The eigen vector corresponding to the greatest eigen value is the principal component for the given data set.
So, we find the eigen vector corresponding to eigen value λ1.

We use the following equation to find the Eigen vector-


MX = λX
where-
•M = Covariance Matrix
•X = Eigen vector
•λ = Eigen value

Substituting the values of M and λ1 into MX = λX, we get
2.92X1 + 3.67X2 = 8.22X1
3.67X1 + 5.67X2 = 8.22X2

On simplification, we get
5.3X1 = 3.67X2 ………(1)
3.67X1 = 2.55X2 ………(2)

From (1) and (2), X1 = 0.69X2

From (2), taking X2 = 3.67 gives X1 = 2.55, so the eigen vector, and hence the principal component for the given data set, is X = (2.55, 3.67), i.e. any vector proportional to (0.69, 1).
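The arithmetic above can be checked numerically. A small sketch (it uses division by N, as in Step-04, so the result should match the hand-computed 2.92, 3.67 and 5.67 entries; the eigh output may differ from the hand result by rounding or by an overall sign):

import numpy as np

X = np.array([[2, 1], [3, 5], [4, 3], [5, 6], [6, 7], [7, 8]], dtype=float)
M = np.cov(X, rowvar=False, bias=True)     # bias=True: divide by N (= 6), as in Step-04
print(M)                                   # approximately [[2.92, 3.67], [3.67, 5.67]]

eigenvalues, eigenvectors = np.linalg.eigh(M)
print(eigenvalues)                         # roughly 0.38 and 8.2, matching lambda2 and lambda1
print(eigenvectors[:, -1])                 # principal component, proportional to (0.69, 1) up to sign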
PCA Limitations

Principal Component Analysis (PCA) is a dimensionality reduction technique in machine learning, but it has some limitations:
•Assumptions: PCA assumes linear relationships between correlated features, and the components it produces are constrained to be orthogonal to each other.
•Sensitivity to outliers: PCA is sensitive to outliers, which can distort the principal components and affect the accuracy of the results.
•Missing data: PCA assumes that the feature set has no missing values.
•Scale of features: PCA is sensitive to the scale of the features, so features should be standardized first.
•Interpretability: the principal components are linear combinations of the original features and can be difficult to interpret.
•Information loss: PCA always leads to some loss of information when reducing dimensions.
•Categorical features: PCA is only suitable for continuous numerical data, not for categorical or discrete features.
•Non-linear relationships: PCA is not well suited to capturing non-linear relationships.
