Principal Component Analysis Concepts

Principal Component Analysis (PCA) is a technique used to reduce the dimensionality of data while retaining as much information as possible. It works by transforming the data to a new coordinate system in which the greatest variance under any projection of the data lies along the first coordinate (called the first principal component), the second greatest variance along the second coordinate, and so on. PCA is useful for visualizing high-dimensional data or for reducing the number of variables in the data without much loss of information.


Principal Component Analysis

1. Main idea: seek the most accurate representation of the data in a lower-dimensional space.

2. Example in 2-D: project the data to a 1-D subspace (a line) with minimal projection error.

3. In both pictures above, the data points (black dots) are projected onto a line, but the second line is closer to the actual points (smaller projection error) than the first one.

4. Notice that the good line to use for projection lies in the direction of largest variance (see the sketch below).

Ref: http://www.cs.haifa.ac.il/~rita/uml_course/add_mat/PCA.pdf
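A minimal numpy sketch of this idea on hypothetical 2-D data (the data, seed and candidate directions are illustrative assumptions, not from the deck): projecting onto the direction of largest variance leaves a much smaller residual error than projecting onto a perpendicular direction.

import numpy as np

# Toy 2-D data, strongly spread along the 45-degree direction
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, x + 0.3 * rng.normal(size=200)])
X = X - X.mean(axis=0)                      # center the data

def projection_error(X, d):
    """Mean squared distance from the points to the line through the origin along d."""
    d = d / np.linalg.norm(d)
    proj = np.outer(X @ d, d)               # projections of the points onto the line
    return np.mean(np.sum((X - proj) ** 2, axis=1))

d_good = np.array([1.0, 1.0])               # roughly the direction of largest variance
d_bad = np.array([1.0, -1.0])               # a perpendicular candidate line
print(projection_error(X, d_good))          # small residual error
print(projection_error(X, d_bad))           # much larger residual error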

Principal Component Analysis

5. After the data is projected onto the best line, we need to transform the coordinate system to get the 1-D representation for the vector y.

6. Note that the new data y has the same variance as the old data x in the direction of the green line.

7. PCA preserves the largest variances in the data (see the continuation of the sketch below).

Ref: http://www.cs.haifa.ac.il/~rita/uml_course/add_mat/PCA.pdf
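Continuing the hypothetical sketch above: the 1-D representation y keeps exactly the variance X had along the chosen direction, which here is most of the total variance.

import numpy as np

d = d_good / np.linalg.norm(d_good)   # direction of largest variance from the sketch above
y = X @ d                             # 1-D coordinates of the projected points
print(y.var(), X.var(axis=0).sum())   # variance retained vs. total variance in X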
Principal Component Analysis

8. In general, PCA on n dimensions will result in another set of n new dimensions. The one which captures the maximum variance in the underlying data is principal component 1; principal component 2 is orthogonal to it (a quick orthogonality check follows below).

9. Example in 2-D: project the data to a 1-D subspace (a line) with minimal projection error.

Ref: http://www.cs.haifa.ac.il/~rita/uml_course/add_mat/PCA.pdf
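A small self-contained check of the orthogonality claim on random 4-D data (shapes and seed are arbitrary assumptions): the eigenvectors of a covariance matrix form an orthonormal set.

import numpy as np

rng = np.random.default_rng(1)
X4 = rng.normal(size=(100, 4))             # hypothetical 4-D data
eig_vals, eig_vecs = np.linalg.eig(np.cov(X4.T))
print(np.round(eig_vecs.T @ eig_vecs, 6))  # identity matrix: the n components are orthonormal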

Mechanics of Principal Component Analysis

http://setosa.io/ev/principal-component-analysis/

Principal Component Analysis steps

1. Begin by standardizing the data. The mean of each dimension is subtracted from the data on that dimension to shift the data points to the origin, i.e. the data is centered on the origin.

2. Generate the covariance matrix / correlation matrix for all the dimensions.

3. Perform eigen decomposition, that is, compute the eigenvectors, which are the principal components, and the corresponding eigenvalues, which are the magnitudes of the variance captured.

4. Sort the eigen pairs in descending order of eigenvalue and select the one with the largest value. This is the first principal component, which captures the maximum information from the original data. (All four steps are sketched in code below.)

Ref: http://www.cs.haifa.ac.il/~rita/uml_course/add_mat/PCA.pdf
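A minimal sketch of the four steps, assuming X is a (samples x features) numpy array (the variable name follows the snippets later in the deck):

import numpy as np
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(X)         # 1. standardize / center the data
cov_matrix = np.cov(X_std.T)                      # 2. covariance matrix of the dimensions
eig_vals, eig_vecs = np.linalg.eig(cov_matrix)    # 3. eigen decomposition

# 4. sort eigen pairs by descending eigenvalue; the first is principal component 1
order = np.argsort(eig_vals)[::-1]
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]
print('Share of variance captured by PC1: %.2f' % (eig_vals[0] / eig_vals.sum()))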

Principal Component Analysis (Performance issues)

1. PCA's effectiveness depends on the scales of the attributes. If attributes have different scales, PCA will pick the variable with the highest variance rather than picking attributes based on correlation (see the sketch below).

2. Changing the scales of the variables can change the PCA.

3. Interpreting PCA can become challenging in the presence of discrete data.

4. Skew in the data with a long, thick tail can impact the effectiveness of PCA (related to point 1).

5. PCA assumes a linear relationship between attributes. It is ineffective when the relationships are non-linear.
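A small sketch of point 1 on hypothetical correlated data where one attribute has a far larger scale (the synthetic data is an illustrative assumption): without scaling, PC1 simply follows the large-scale attribute; after scaling, it reflects the correlation between the two.

import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
x1 = rng.normal(0, 1, 300)                                  # small-scale attribute
X = np.column_stack([x1, 1000 * x1 + rng.normal(0, 1000, 300)])  # correlated, large-scale attribute

for data, label in [(X - X.mean(axis=0), 'unscaled'),
                    (StandardScaler().fit_transform(X), 'scaled')]:
    vals, vecs = np.linalg.eig(np.cov(data.T))
    print(label, np.round(vecs[:, np.argmax(vals)], 3))     # unscaled PC1 is dominated by the large-scale attribute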

Lab-3: Principal Component Analysis on the iris data set

Description – Explore the iris data set and perform PCA

The data set is winequality-red.csv


Sol: PCA-iris.ipynb
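A minimal sketch of the kind of analysis the lab asks for (the actual PCA-iris.ipynb may differ), using scikit-learn's bundled iris data:

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = load_iris().data                       # 150 samples x 4 features
X_std = StandardScaler().fit_transform(X)  # standardize before PCA
pca = PCA(n_components=2)                  # keep the top 2 principal components
X_pca = pca.fit_transform(X_std)
print(pca.explained_variance_ratio_)       # share of total variance per component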
Principal Component Analysis (Signal to noise ratio)

Signal – all valid values for a variable (shown between the max and min values on the x and y axes). Represents valid data.

Noise – the spread of the data points around the best-fit line. For a given value of x there are multiple values of y (some on the line and some around it); this spread is due to random factors.

Signal to Noise Ratio – variance of the signal / variance of the noise. The greater the SNR, the better the model will be.

[Figure: scatter plot with the signal range spanning X min to X max and Y min to Y max, and the noise spread around the best-fit line.]

import pandas as pd
import matplotlib.pyplot as plt

# Pairwise scatter plots of the standardized features (X_std as computed later in the deck)
X_std_df = pd.DataFrame(X_std)
axes = pd.plotting.scatter_matrix(X_std_df)
plt.tight_layout()

Principal Component Covariance Matrix

1. Variance is measured within a dimension; covariance is measured between dimensions.

2. Express the total variance (the variances and the cross-variances between dimensions) as a matrix.

3. The covariance matrix is a mathematical representation of the total variance of the individual dimensions and across dimensions.

Covariance matrix for three dimensions x, y and z:

    | cov(x,x)  cov(x,y)  cov(x,z) |
    | cov(y,x)  cov(y,y)  cov(y,z) |
    | cov(z,x)  cov(z,y)  cov(z,z) |

import numpy as np

# Eigen decomposition of the covariance matrix: eigenvectors are the principal
# directions, eigenvalues the magnitudes of variance captured
eig_vals, eig_vecs = np.linalg.eig(cov_matrix)
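A short self-contained illustration (random data, arbitrary seed) of what the covariance matrix looks like for three dimensions:

import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(size=(100, 3))   # hypothetical columns x, y and z
C = np.cov(data.T)                 # 3 x 3: variances on the diagonal, covariances off it
print(np.round(C, 3))
print(np.allclose(C, C.T))         # True: the covariance matrix is symmetric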


Improving SNR through PCA (Scaling the dimensions)

1. The mean is subtracted from all the points on both dimensions, i.e. (xi - xbar) and (yi - ybar).

2. The dimensions are transformed algebraically into a new set of dimensions.

3. The transformation is a rotation of the axes in mathematical space.

[Figure: rotated axes showing the 1st and 2nd principal components.]

from sklearn.preprocessing import StandardScaler
import numpy as np

X_std = StandardScaler().fit_transform(X)        # center and scale each feature
cov_matrix = np.cov(X_std.T)                     # covariance of the standardized data
eig_vals, eig_vecs = np.linalg.eig(cov_matrix)

PCA (Calculating total variance (covariance and variance))

4. Multiplying the mean-centered data matrix by its transpose produces the matrix of total variance, also called the covariance matrix (a square and symmetric matrix): C = XᵀX / (n − 1), where X is the centered data matrix and n is the number of samples. (A quick numerical check follows below.)
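A quick numerical check of the formula above on hypothetical data (shapes and seed are arbitrary):

import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))                 # hypothetical data matrix
Xc = X - X.mean(axis=0)                      # mean-centered data
C_manual = Xc.T @ Xc / (len(Xc) - 1)         # XᵀX / (n - 1)
print(np.allclose(C_manual, np.cov(Xc.T)))   # True: matches numpy's covariance matrix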


Improving SNR through PCA (Principal components)

5. The original data points are now represented by the red dots on the new dimensions.

6. This also introduces an error of representation (the vertical red lines from the blue dots to the corresponding red dots on the new dimension).

7. The axis rotation is done such that the new dimension captures the maximum variance in the data points and also reduces the total error of representation.

[Figure: blue original points and red projected points on the rotated axes, with the signal range (X min to X max) and the noise spread marked.]

# Inspect the eigen decomposition computed above
print('Eigen Vectors \n%s' % eig_vecs)
print('\n Eigen Values \n%s' % eig_vals)

Properties of principal components and their covariance matrix

8. Thus, to find the principal components we need to obtain a diagonal matrix from the original covariance matrix (see the sketch below).

9. For this we have to transform the matrix A to a new matrix B such that the covariance matrix of B, C_B, is a diagonal matrix (ref. part 2, bullet 5).
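A self-contained sketch (synthetic correlated 2-D data; the names A and B follow the slide, the data is an illustrative assumption) showing that rotating A by the eigenvectors of its covariance matrix yields a B whose covariance matrix is diagonal:

import numpy as np

rng = np.random.default_rng(5)
A = rng.multivariate_normal([0, 0], [[3, 1], [1, 2]], size=500)   # correlated 2-D data
eig_vals, eig_vecs = np.linalg.eig(np.cov(A.T))
B = A @ eig_vecs                       # rotate the data onto the principal axes
print(np.round(np.cov(B.T), 2))        # approximately diagonal: the new dimensions are uncorrelated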

PCA for dimensionality reduction

1. PCA can also be used to reduce dimensions.

2. Arrange all eigenvectors along with their corresponding eigenvalues in descending order of eigenvalue.

3. Plot a cumulative eigenvalue graph as shown below.

4. Eigenvectors with an insignificant contribution to the total of the eigenvalues can be removed from the analysis (e.g. eigenvectors 6 and 7 below).

[Figure: cumulative eigenvalue (explained variance) plot; the curve flattens after the first few components.]
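A minimal sketch of the cumulative plot, assuming eig_vals from the decomposition sketched earlier; components where the curve has flattened (like eigenvectors 6 and 7 in the deck's example) contribute little and can be dropped.

import numpy as np
import matplotlib.pyplot as plt

eig_vals_sorted = np.sort(eig_vals)[::-1]                 # eigenvalues, largest first
cum_var = np.cumsum(eig_vals_sorted) / eig_vals_sorted.sum()
plt.plot(range(1, len(cum_var) + 1), cum_var, marker='o')
plt.xlabel('Number of eigenvectors kept')
plt.ylabel('Cumulative fraction of total eigenvalue')
plt.show()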

Thanks