Principal Component Analysis
Principal component analysis (PCA) is one of the most commonly used dimensionality reduction techniques in industry. Converting a large data set
into a smaller one with fewer variables helps improve model performance, makes complex data sets easier to visualise, and has many other uses.
This section also addresses the following questions:
4. How to select the first principal component axis?
5. What does a Principal Component Analysis's major component represent?
6. What are the disadvantages of dimension reduction?
7. Why do we standardize before using Principal Component Analysis?
8. What happens when the eigenvalues are nearly equal?
9. What happens if the PCA components are not rotated?

Why PCA?
• Finding latent themes in the data
• Noise reduction
• Feature selection: iteratively removing features takes time and leads to information loss.
• Data visualisation: it is not possible to visualise more than two variables at the same time using any 2-D plot. Therefore, finding relationships between the observations in a data set with several variables through visualisation is tricky.
PCA is fundamentally a dimensionality reduction technique: it converts a data set into one with fewer variables. In simple terms, dimensionality
reduction is the exercise of dropping unnecessary variables, i.e., the ones that add no useful information. This is something you have already done
in the previous modules.
In EDA, you dropped columns with many nulls or duplicate values, and so on. In linear and logistic regression, you dropped columns based on their p-values
and VIF scores in the feature elimination step. Similarly, PCA converts the data by creating new features from the old ones, in a form where it becomes
easier to decide which features to keep and which to drop.
What is PCA
• Definition: PCA is a statistical procedure that converts observations of possibly correlated variables into 'principal components' such that:
• They are uncorrelated with each other.
• They are linear combinations of the original variables.
• They capture the maximum information in the data set.

The definition above introduces some new terms, such as 'linear combinations' and 'capturing maximum information', for which you will need some knowledge of linear algebra concepts and the other building blocks of PCA.
• Basis: The first fundamental building block of PCA is the basis. Essentially, a 'basis' is the unit in which we express the vectors of a matrix. For example, we describe
the weight of an object in kilograms or grams, and we describe length in metres or centimetres.
1. Eigenvalues: In PCA, eigenvalues represent the variance of the data along the principal components. Each eigenvalue corresponds to a principal
component and indicates the amount of variance explained by that component. Higher eigenvalues signify that the corresponding principal
component carries more information from the original data.
2. Eigenvectors: Eigenvectors are the directions or axes along which the data varies the most. They are associated with the eigenvalues and determine the
principal components. Each eigenvector points in a direction that maximizes the variance of the data when projected onto that direction. In PCA, the
eigenvectors are orthogonal to each other.
3. Principal Components: Principal components are the transformed variables that result from PCA. They are linear combinations of the original variables,
where each component is a weighted sum of the original variables. The first principal component corresponds to the eigenvector with the highest
eigenvalue and explains the most variance in the data. Subsequent principal components explain the remaining variance in decreasing order.
4. Scree Plot: A scree plot is a graphical tool used in PCA to visualize the eigenvalues of the principal components. It helps in
determining the number of principal components to retain for further analysis.
In the scree plot, the x-axis represents the principal component number, and the y-axis represents the corresponding eigenvalue, with one point per
component. The plot is sorted in descending order of eigenvalues: the first principal component (PC1) has the highest eigenvalue, followed by PC2 with
the second-highest eigenvalue, and so on. The scree plot helps in visualizing the "elbow" or "knee" point, at which the eigenvalues start to level off.
The components before this point contribute significantly to the variance explained, while those after it contribute little, so the elbow point is
commonly used as a cutoff for selecting the number of principal components to retain.
By examining the scree plot, you can make an informed decision about how many principal components to keep based on the eigenvalues. Retaining a
sufficient number of principal components ensures that you capture a significant amount of the variance in the data while still reducing dimensionality.
The sketch below illustrates these ideas in code.
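To make these ideas concrete, here is a minimal sketch in NumPy and Matplotlib that computes the eigenvalues and eigenvectors of a covariance matrix and draws a scree plot. The synthetic data set and variable names are illustrative assumptions, not part of the original material.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
X[:, 1] += 2 * X[:, 0]                            # introduce correlation so PCA has structure to find
X = X - X.mean(axis=0)                            # centre the data

cov = np.cov(X, rowvar=False)                     # covariance matrix of the features
eigenvalues, eigenvectors = np.linalg.eigh(cov)   # eigh is suited to symmetric matrices
order = np.argsort(eigenvalues)[::-1]             # sort in descending order of variance
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")   # one point per component
plt.xlabel("Principal component")
plt.ylabel("Eigenvalue (variance explained)")
plt.title("Scree plot")
plt.show()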
Introduction:
Using the analogy of basis as a unit of representation, different basis vectors can be used to represent the same observations, just like you can represent the
weight of a patient in kilograms or pounds.
Demonstration:
Suppose you make a list of places for your friend to visit, marked on a 2-D Cartesian road map with two directions, East (the x-direction) and
North (the y-direction). Every point on the map is represented by two coordinates; for example, the Factory can be (10, 8), i.e., 10 units East and 8 units North.
Successive movements work the same way: going from the Hospital to the Housing Society is 3 units North and 4 units East.
[Map figure: the Factory, Police Station, Housing Society, Hospital, Playground and School plotted on the East-North plane.]
Now, your friend notices that all the points lie on a single line, so each one can be represented along this new direction without referring to North or East at all;
for example, Hospital to Housing Society is simply 5 units in the new direction. With this new direction, we have reduced the dimensions from two to one without
losing information. This is the essence of PCA: a better representation existed in a new direction, so by changing the basis we reduced the dimensionality.
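Numerically, the same demonstration can be carried out with scikit-learn. The following is a minimal sketch, assuming illustrative coordinates for places that lie on the line North = 0.75 × East.

import numpy as np
from sklearn.decomposition import PCA

points = np.array([[0, 0], [4, 3], [8, 6], [12, 9], [16, 12]])   # (East, North) pairs on one line

pca = PCA(n_components=1)
coords_1d = pca.fit_transform(points)          # one coordinate per point along the new direction
reconstructed = pca.inverse_transform(coords_1d)

print(coords_1d.ravel())                       # successive points are 5 units apart
print(np.allclose(points, reconstructed))      # True: no information was lost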
Calculation: When each variable changes basis independently, the calculations for the change of basis are pretty straightforward. All you need to do here is
multiply by the matrix M, which transforms a representation from the old basis to the new one.
NEW BASIS REPRESENTATION = M × OLD BASIS REPRESENTATION

For example, to convert a height of 5.4 ft and a weight of 121.3 lbs into centimetres and kilograms:

$$\begin{bmatrix} 30.48 & 0 \\ 0 & 0.45 \end{bmatrix} \begin{bmatrix} 5.4 \\ 121.3 \end{bmatrix} = \begin{bmatrix} 165 \\ 55 \end{bmatrix}$$

since 5.4 × 30.48 ≈ 165 cm and 121.3 × 0.45 ≈ 55 kg. Going back from the new basis to the old one uses the inverse of M:

$$M^{-1} = \begin{bmatrix} 0.0328 & 0 \\ 0 & 2.22 \end{bmatrix}, \qquad \begin{bmatrix} 0.0328 & 0 \\ 0 & 2.22 \end{bmatrix} \begin{bmatrix} 165 \\ 55 \end{bmatrix} = \begin{bmatrix} 5.4 \\ 121.3 \end{bmatrix}$$

Here M is a representation of the old basis in the new basis. In general, for a change of basis, $M = B_2^{-1} B_1$, where $B_1$ is the old basis and $B_2$ is the new basis.
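The same computation in code, as a minimal NumPy sketch of the change of basis above:

import numpy as np

M = np.array([[30.48, 0.0],      # feet to centimetres
              [0.0, 0.45]])      # pounds to kilograms (approximate factor)

old = np.array([5.4, 121.3])     # height in ft, weight in lbs
new = M @ old                    # approximately [165, 55]: 165 cm and 55 kg

M_inv = np.linalg.inv(M)         # approximately [[0.0328, 0], [0, 2.22]]
back = M_inv @ new               # recovers [5.4, 121.3]
print(new, back)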
PCA
Implementation of PCA:
1. After basic data cleaning procedures, standardize your data.
2. Once standardization has been done, you can go ahead and perform PCA on the data set. To do this, import PCA from sklearn.decomposition.
3. Instantiate a PCA object:
pca = PCA(random_state=42)
4. Perform PCA on the data set using the fit method: pca.fit(X). This one call does both steps: finding the covariance matrix and doing
an eigendecomposition of it to obtain the eigenvectors, which are nothing but the principal components of the original data set.
5. The principal components can be accessed using the following code:
pca.components_
6. The variance explained by each principal component can be accessed using the following code:
pca.explained_variance_ratio_
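Putting the steps together, here is a minimal end-to-end sketch on a synthetic data set (the data itself is an illustrative assumption):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))                 # stand-in for a cleaned data set

X_std = StandardScaler().fit_transform(X)     # steps 1-2: standardize

pca = PCA(random_state=42)                    # step 3: instantiate
pca.fit(X_std)                                # step 4: fit (covariance + eigendecomposition)

print(pca.components_)                        # step 5: the principal components
print(pca.explained_variance_ratio_)          # step 6: variance explained by each component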
PCA: Practical Considerations
● Most software packages use SVD to compute the principal components and assume that the data is scaled and centred, so it is important to perform
standardization/normalization first.
● PCA is a linear transformation method and works well in tandem with linear models such as linear regression and logistic regression, though it
can also be used for computational efficiency with non-linear models; a sketch with a linear model follows this list.
● It should not be used forcefully to reduce dimensionality (when the features are not correlated).
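As a sketch of the second point, one common pattern is to chain scaling, PCA and a linear model in a scikit-learn Pipeline, so that standardization and PCA are fitted on the training data only. The data set used here is an illustrative assumption.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = Pipeline([
    ("scale", StandardScaler()),           # PCA assumes centred, scaled data
    ("pca", PCA(n_components=10)),         # keep the 10 strongest components
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
print(model.score(X_test, y_test))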
A list of useful functions available after importing PCA from sklearn:
● pca.fit(X) - Performs PCA on the data set.
● pca.components_ - Contains the principal components of the data.
● pca.explained_variance_ratio_ - Gives the fraction of variance explained by each component.
● PCA(n_components=k) - Instantiates PCA so that only k components are kept.
● pca.fit_transform(X) - Fits PCA and transforms the data from the original basis to the principal-component basis.
● PCA(n_components=var) - Here 'var' is a number between 0 and 1. Performs PCA and chooses the number of components automatically such that the
variance explained is (100*var)%.
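For the last item, here is a minimal sketch of choosing the number of components by a variance threshold; the synthetic data is an illustrative assumption:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))
X[:, 1] += X[:, 0]                            # add some correlation between features

pca = PCA(n_components=0.90)                  # keep just enough components for 90% of the variance
X_reduced = pca.fit_transform(X)
print(pca.n_components_)                      # number of components actually kept
print(pca.explained_variance_ratio_.sum())    # at least 0.90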