Principal Component Analysis: by Eesha Tur Razia Babar
PRINCIPAL COMPONENT ANALYSIS
BY EESHA TUR RAZIA BABAR
2 PCA
• The goal is to transform a high-dimensional dataset (one with a large number of features) into a low-dimensional one (with a smaller number of features) without losing too much information.
• Datasets can include images or simple structured (tabular) data.
• PCA helps deal with the curse of dimensionality, which leads to complex models and makes data hard to visualize.
• It also helps remove multicollinearity, a situation in which some input features are correlated with each other and provide redundant information.
• In short, PCA reduces the number of features/variables of a dataset while preserving as much information as possible (a minimal usage sketch follows below).
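For concreteness, here is a minimal sketch of this idea in Python using scikit-learn; the data, variable names, and the choice of two components are illustrative assumptions, not taken from these slides.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Illustrative data only: 100 samples with 8 features, two of them redundant.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=100)   # built-in multicollinearity

X_std = StandardScaler().fit_transform(X)        # put all features on the same scale
pca = PCA(n_components=2)                        # keep only 2 new features
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                  # (100, 2): fewer features
print(pca.explained_variance_ratio_)    # how much information each component keeps
```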
3 HOW PCA WORKS
4 PCA STEPS
5 STEP 1: NORMALIZATION
• After normalization, all the variables will be transformed to the same scale (a standardization sketch follows below).
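A minimal standardization sketch in Python/NumPy; the data values here are made up for illustration.

```python
import numpy as np

# Hypothetical raw data: rows are samples, columns are variables on different scales.
X = np.array([[170.0, 65.0],
              [160.0, 58.0],
              [180.0, 75.0],
              [175.0, 68.0]])

# z-score standardization: subtract each column's mean and divide by its standard
# deviation, so every variable ends up with mean 0 and standard deviation 1.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0).round(6))   # ~[0, 0]
print(X_std.std(axis=0))             # [1, 1]
```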
6 STEP 2: COVARIANCE MATRIX COMPUTATION
• The goal of this step is to understand whether there is any relationship between the input variables.
• Correlated variables = redundant information (a computation sketch follows below).
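A sketch of the covariance computation in NumPy on synthetic data; the variables and the injected correlation are assumptions for illustration.

```python
import numpy as np

# Synthetic data with one deliberately redundant feature.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
X[:, 2] = X[:, 0] + 0.2 * rng.normal(size=50)    # feature 2 largely duplicates feature 0

# Standardize, then compute the covariance matrix of the variables
# (rowvar=False tells NumPy that columns, not rows, are the variables).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
C = np.cov(X_std, rowvar=False)

print(C.round(2))   # large off-diagonal entries flag correlated (redundant) variables
```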
• Eigenvectors and eigenvalues are the linear algebra concepts that we need to compute from the
covariance matrix
• In this context, the eigenvectors are also called principal components.
• Principal components are new variables constructed as linear combinations, or mixtures, of the initial variables.
• The combinations are done in such a way that the new variables (i.e., the principal components) are uncorrelated.
• Most of the information within the initial variables is squeezed or compressed into the first components (the sketch below checks both of these properties).
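The following NumPy sketch, on made-up data, checks both claims: the principal component scores are uncorrelated, and the first component carries most of the variance.

```python
import numpy as np

# Synthetic correlated data.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [1.0, 1.0]])
X = X - X.mean(axis=0)

C = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)          # eigh: symmetric matrices, ascending order
order = np.argsort(eigvals)[::-1]             # reorder: highest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Z = X @ eigvecs                               # principal component scores

print(np.cov(Z, rowvar=False).round(3))       # ~diagonal: the new variables are uncorrelated
print((eigvals / eigvals.sum()).round(3))     # first component holds most of the variance
```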
10 SCREE PLOT
11 NOTE THAT
• Principal components are less interpretable and don’t have any real meaning since they
are constructed as linear combinations of the initial variables.
• Geometrically, principal components represent the directions of the data that explain
a maximal amount of variance.
• In simple terms, they are the lines that capture most of the information in the data.
12 HOW PCA CONSTRUCTS THE PRINCIPAL
COMPONENTS
• There are as many principal components as there are variables in the data.
• The first principal component accounts for the largest possible variance, the second for the largest remaining variance, and so on.
13 EIGENVECTORS AND EIGENVALUES
• Eigenvectors of the covariance matrix are the directions of the axes along which there is the most variance (the most information); we call these directions the principal components.
• Eigenvalues are simply the coefficients attached to the eigenvectors, and they give the amount of variance carried by each principal component.
• We construct a total of n principal components, where n is the number of dimensions of the dataset.
• By sorting the principal components by their eigenvalues, highest to lowest, you get the principal components in order of significance.
• In the last step, we decide how many principal components to keep, choose the ones of greater significance, and arrange their eigenvectors into a matrix that we call the feature vector. These are our features in the new, lower-dimensional space (see the sketch below).
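A NumPy sketch of this procedure on synthetic data: eigendecompose the covariance matrix, sort by eigenvalue, keep the top k eigenvectors as the feature vector matrix, and project the data onto it. All names and values are illustrative.

```python
import numpy as np

# Synthetic data: 100 samples, 4 features, one of them largely redundant.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
X[:, 3] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=100)
X_centered = X - X.mean(axis=0)

C = np.cov(X_centered, rowvar=False)           # covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)           # eigenvalues/eigenvectors (ascending)

order = np.argsort(eigvals)[::-1]              # sort: highest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                          # number of components to keep
feature_vector = eigvecs[:, :k]                # the "feature vector" matrix (d x k)
X_new = X_centered @ feature_vector            # data in the new, lower-dimensional space

print(eigvals.round(3))                        # variance carried by each component
print(X_new.shape)                             # (100, 2)
```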
15 EIGEN VALUES AND EIGEN VECTOR EXAMPLE
16 EXAMPLE 1: STEP 1: FIND MEAN
17 EXAMPLE 1: STEP 2: FIND COVARIANCE
18 EXAMPLE 1: STEP 3: FIND EIGEN VALUE
19 EXAMPLE 1: STEP 4: FIND EIGEN VALUE
20 EXAMPLE 1: STEP 4: FIND EIGEN VALUE
21 EXAMPLE 1: STEP 4: FIND EIGEN VECTOR
22 EXAMPLE 1: STEP 4: FIND PRINCIPAL COMPONENT
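The numbers from the original example slides did not survive in this text, so the sketch below reruns the same steps (find the mean, the covariance matrix, the eigenvalues and eigenvectors, and the principal component) on a small made-up 2-D dataset.

```python
import numpy as np

# Illustrative 2-D dataset (not the values from the slides).
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

# Step 1: find the mean of each variable and subtract it.
mean = X.mean(axis=0)
X_centered = X - mean

# Step 2: find the covariance matrix of the centered data.
C = np.cov(X_centered, rowvar=False)

# Step 3: find the eigenvalues, then the eigenvectors, of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: find the principal component by projecting onto the top eigenvector.
pc1 = X_centered @ eigvecs[:, 0]

print(mean)              # [1.81 1.91]
print(C.round(3))
print(eigvals.round(3))  # larger eigenvalue first
print(pc1.round(3))
```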
DATA AND SCATTER PLOT
SUBTRACTING MEAN
CONT.
27 COVARIANCE MATRIX
28 EIGEN VALUES AND EIGEN VECTORS
29 EIGEN VECTOR: UNIT LENGTH VECTOR
30 SORT BY EIGEN VALUES
31 PRINCIPAL COMPONENTS CALCULATION
32 CONT.
33 INTERPRET THE PRINCIPAL COMPONENTS
34 HOW MANY COMPONENTS SHOULD WE
EXTRACT?
• The motivation for PCA was to reduce the number of features.
• The question arises, “How do we determine how many components to extract?”
• For example, should we retain only the first principal component, as it explains nearly
half the variability? Or, should we retain all eight components, as they explain 100% of
the variability?
• Retaining all eight components does not help us reduce the number of dimensions.
• The answer lies somewhere between these two extremes.
35 HOW MANY COMPONENTS SHOULD WE
EXTRACT?
• The criteria used for deciding how many components to extract are the following:
1. The Eigenvalue Criterion
2. The Proportion of Variance Explained Criterion
3. The Scree Plot Criterion
4. The Minimum Communality Criterion
36 1. THE EIGENVALUE CRITERION
• An eigenvalue of 1 means that the component explains about "one variable's worth" of the variability (with standardized variables, each original variable contributes a variance of exactly 1).
• The rationale for the eigenvalue criterion is that each retained component should explain at least one variable's worth of the variability; therefore, the criterion states that only components with eigenvalues greater than 1 should be retained (a sketch of this rule follows below).
• Note that if there are fewer than 20 variables, the eigenvalue criterion tends to recommend extracting too few components, while if there are more than 50 variables, it may recommend extracting too many.
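A sketch of the eigenvalue criterion in NumPy; the data is synthetic, and the eigenvalues are taken from the correlation matrix, i.e., from standardized variables.

```python
import numpy as np

# Synthetic data: 8 variables, a few of them deliberately redundant.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 8))
X[:, 1] = X[:, 0] + 0.3 * rng.normal(size=200)
X[:, 2] = X[:, 0] - 0.3 * rng.normal(size=200)

# Eigenvalues of the correlation matrix (equivalent to standardizing first).
R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

# Eigenvalue (Kaiser) criterion: keep components with eigenvalue > 1.
n_keep = int((eigvals > 1).sum())

print(eigvals.round(2))
print("components to retain:", n_keep)
```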
37 2. THE PROPORTION OF VARIANCE EXPLAINED
CRITERION
• The analyst specifies how much of the total variability the principal components should account for.
• Components are then selected one by one until the desired proportion of variability explained is attained.
• For example, suppose we would like our components to explain 85% of the variability in the variables (a sketch of this rule follows below).
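A sketch of this criterion; the eigenvalues below are made up, chosen only so that the first component explains close to half the variance, as in the eight-variable example mentioned earlier.

```python
import numpy as np

# Hypothetical eigenvalues for 8 standardized variables, sorted high to low.
eigvals = np.array([3.8, 1.9, 0.9, 0.6, 0.4, 0.2, 0.1, 0.1])

explained = eigvals / eigvals.sum()      # proportion of variance per component
cumulative = np.cumsum(explained)        # running total

target = 0.85                            # analyst's desired proportion
n_keep = int(np.searchsorted(cumulative, target) + 1)

print((cumulative * 100).round(1))            # cumulative % of variance explained
print("components needed for 85%:", n_keep)   # 4 with these illustrative values
```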
38 3. THE SCREE PLOT CRITERION
• A scree plot is a graphical plot of the eigenvalues against the component
number.
• Scree plots are useful for finding an upper bound (maximum) for the number of
components that should be retained.
• Most scree plots look broadly similar in shape, starting high on the left, falling
rather quickly, and then flattening out at some point.
• This is because the first component usually explains much of the variability, the
next few components explain a moderate amount, and the latter components
only explain a small amount of the variability.
• The scree plot criterion is this: the maximum number of components that should be extracted is the number just before the point where the plot first begins to flatten out into a horizontal line (the plotting sketch below illustrates this).
• Sometimes, the curve in a scree plot is so gradual that no such elbow point is
evident; in that case, turn to the other criteria.
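A minimal scree plot sketch with matplotlib, reusing the illustrative eigenvalues from the previous criterion; with real data these would come from the covariance or correlation matrix.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative eigenvalues, sorted from highest to lowest.
eigvals = np.array([3.8, 1.9, 0.9, 0.6, 0.4, 0.2, 0.1, 0.1])
components = np.arange(1, len(eigvals) + 1)

# Scree plot: eigenvalue against component number; extract components up to
# the point just before the curve flattens into a horizontal line.
plt.plot(components, eigvals, marker="o")
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```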