Principal Component Analysis

Principal Component Analysis (PCA) is a statistical technique used for dimensionality reduction, transforming correlated variables into uncorrelated variables while retaining maximum variance. It helps simplify data analysis, improve performance, and visualize high-dimensional data effectively. However, PCA has limitations, including potential information loss, sensitivity to data scaling, and challenges in interpreting principal components.

Uploaded by

bca2m2

• Principal component analysis, or PCA, is a
statistical procedure that allows you to
summarize the information content in large
data tables by means of a smaller set of
“summary indices” that can be more easily
visualized and analyzed.
• As the number of features or dimensions in a
dataset increases, the amount of data required to
obtain a statistically significant result increases
exponentially.
• This can lead to issues such as overfitting,
increased computation time, and reduced
accuracy of machine learning models. These
problems, which arise when working with
high-dimensional data, are known as the curse
of dimensionality.
• As the number of dimensions increases, the number
of possible combinations of features increases
exponentially, which makes it computationally
difficult to obtain a representative sample of the
data and it becomes expensive to perform tasks
such as clustering or classification.
• Additionally, some machine learning algorithms are
sensitive to the number of dimensions and require
more data to achieve the same level of accuracy as
they would on lower-dimensional data.
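To make the exponential growth concrete, here is a minimal sketch. It assumes each feature is split into 10 intervals (an illustrative choice, not taken from the slides) and counts how many cells a sample would have to cover; `grid_cells` is a hypothetical helper name.

```python
# Hypothetical illustration: number of grid cells when each of d features
# is divided into `bins` intervals. The count grows as bins ** d.
def grid_cells(bins: int, d: int) -> int:
    return bins ** d

for d in (1, 2, 3, 10):
    # Each extra dimension multiplies the number of cells by `bins`.
    print(d, grid_cells(10, d))
```

With a fixed number of samples, the share of cells that contain any data at all shrinks quickly as d grows, which is one way to picture why a representative sample becomes hard to obtain.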
• To address the curse of dimensionality,
feature engineering techniques are used,
including feature selection and feature
extraction.
• Dimensionality reduction is a type of feature
extraction technique that aims to reduce the
number of input features while retaining as
much of the original information as possible.
What is Principal Component Analysis (PCA)?
• The Principal Component Analysis (PCA) technique
was introduced by the mathematician Karl
Pearson in 1901.
• It works on the condition that, when data in a
higher-dimensional space is mapped to a
lower-dimensional space, the variance of the
data in the lower-dimensional space should be
as large as possible.
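In symbols, this condition can be sketched as follows. The notation is introduced here for illustration and is not from the slides: X is the centered data matrix, Σ its covariance matrix, and w a unit-length direction onto which the data is projected.

```latex
w_1 \;=\; \arg\max_{\lVert w \rVert = 1} \operatorname{Var}(Xw)
    \;=\; \arg\max_{\lVert w \rVert = 1} w^{\top} \Sigma\, w
```

The maximizer w_1 is the eigenvector of Σ with the largest eigenvalue, and that eigenvalue equals the variance of the projected data.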
• Principal Component Analysis (PCA) is a
statistical procedure that uses an orthogonal
transformation that converts a set of
correlated variables to a set of uncorrelated
variables.
• PCA is one of the most widely used tools in
exploratory data analysis and in machine
learning for predictive models.
• Principal Component Analysis (PCA) is an
unsupervised learning technique used to
examine the interrelations among a set of variables.
• It is closely related to factor analysis, and its
first component can be viewed as a line of best fit
that minimizes perpendicular distances to the data.
• The main goal of Principal Component Analysis (PCA)
is to reduce the dimensionality of a dataset while
preserving the most important patterns or
relationships between the variables without any
prior knowledge of the target variables.
• Principal Component Analysis (PCA) is used to
reduce the dimensionality of a data set by
finding a new set of variables, smaller than the
original set of variables, retaining most of the
sample’s information, and useful for the
regression and classification of data.
• Principal Component Analysis (PCA) is a technique
for dimensionality reduction that identifies a set
of orthogonal axes, called principal components,
that capture the maximum variance in the data.
• The principal components are linear combinations
of the original variables in the dataset and are
ordered in decreasing order of importance.
• The total variance captured by all the principal
components is equal to the total variance in the
original dataset.
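The last point can be checked numerically. The sketch below uses NumPy on synthetic data (the data and the seed are assumptions made for illustration) and compares the sum of the covariance eigenvalues with the sum of the per-feature variances.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data (an assumption for illustration): 200 samples, 4 correlated features.
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))

cov = np.cov(X, rowvar=False)            # 4x4 covariance matrix
eigenvalues = np.linalg.eigvalsh(cov)    # variance along each principal component

total_variance = np.trace(cov)           # sum of the per-feature variances
print(np.isclose(eigenvalues.sum(), total_variance))
```

The equality holds because the trace of a matrix equals the sum of its eigenvalues, so rotating to the principal axes redistributes variance without creating or destroying any.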
• The first principal component captures the
most variation in the data, the second
principal component captures the maximum
remaining variance in a direction orthogonal
to the first principal component, and so on.
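A minimal NumPy sketch of this ordering and orthogonality, on synthetic data with deliberately unequal spreads (the data, seed, and variable names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data: three features with unequal spreads.
X = rng.normal(size=(300, 3)) * np.array([5.0, 2.0, 0.5])

cov = np.cov(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigenvalues)[::-1]             # re-sort to descending variance
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Share of the total variance captured by each component, largest first.
explained_ratio = eigenvalues / eigenvalues.sum()
# Columns of `eigenvectors` are the principal axes; their Gram matrix is the identity.
gram = eigenvectors.T @ eigenvectors
print(explained_ratio, np.allclose(gram, np.eye(3)))
```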
• Principal Component Analysis can be used for a
variety of purposes, including data visualization,
feature selection, and data compression.
• In data visualization, PCA can be used to plot high-
dimensional data in two or three dimensions, making
it easier to interpret.
• In feature selection, PCA can be used to identify the
most important variables in a dataset.
• In data compression, PCA can be used to reduce the
size of a dataset without losing important information.
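As a rough sketch of the compression use case, the snippet below projects synthetic five-feature data onto its top two principal axes and then reconstructs it. The data, the choice of two components, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic data: 5 features driven mostly by 2 hidden factors plus small noise.
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(500, 5))

mean = X.mean(axis=0)
Xc = X - mean
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Xc, rowvar=False))
W = eigenvectors[:, np.argsort(eigenvalues)[::-1][:2]]   # top-2 principal axes, shape (5, 2)

scores = Xc @ W                    # compressed representation, shape (500, 2)
X_restored = scores @ W.T + mean   # approximate reconstruction, shape (500, 5)

# Reconstruction error relative to the spread of the centered data.
relative_error = np.linalg.norm(X - X_restored) / np.linalg.norm(Xc)
print(scores.shape, relative_error)
```

The same two-column `scores` array is what would be plotted for a two-dimensional visualization. The reconstruction is approximate: whatever lies along the discarded components is lost.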
• In Principal Component Analysis, it is assumed
that the information is carried in the variance
of the features, that is, the higher the
variation in a feature, the more information
that feature carries.
• Overall, PCA is a powerful tool for data
analysis and can help to simplify complex
datasets, making them easier to understand
and work with.
Step-By-Step Explanation of PCA (Principal Component Analysis)
• Step 1: Standardization
– First, we need to standardize our dataset to
ensure that each variable has a mean of 0 and a
standard deviation of 1:
$Z = \frac{X - \mu}{\sigma}$
• Here, µ is the mean of the independent features.
• σ is the standard deviation of the
independent features.
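As a minimal sketch of this step, the snippet below standardizes a single feature by subtracting its mean and dividing by its standard deviation. The measurements are made up, and the use of the population standard deviation (dividing by n) is an assumption; the sample version (dividing by n - 1) is equally common.

```python
from statistics import mean, pstdev

# Made-up measurements for one feature (illustrative values only).
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]

mu = mean(heights_cm)        # feature mean
sigma = pstdev(heights_cm)   # population standard deviation (divides by n)

# Subtract the mean and divide by the standard deviation, value by value.
z_scores = [(x - mu) / sigma for x in heights_cm]
print(z_scores)
```

After this transformation the feature has mean 0 and standard deviation 1, so features measured in different units become comparable.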
Step 2: Covariance Matrix Computation
• Covariance measures the joint variability
of two variables,
indicating how much they change in relation
to each other. To find the covariance we can
use the formula:
$\mathrm{cov}(x_1, x_2) = \frac{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2)}{n - 1}$
• The value of covariance can be
positive, negative, or zero.
– Positive: As x1 increases, x2 also
increases.
– Negative: As x1 increases, x2
decreases.
– Zero: No linear relation between x1 and x2.
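The three cases can be sketched with a small hand-rolled function. It assumes the sample covariance (dividing by n - 1); `sample_covariance` and the toy lists are hypothetical names and values chosen for illustration.

```python
def sample_covariance(x1, x2):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    return sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n - 1)

x1 = [1.0, 2.0, 3.0, 4.0]
rising = [2.0, 4.0, 6.0, 8.0]     # increases with x1
falling = [8.0, 6.0, 4.0, 2.0]    # decreases as x1 increases
flat = [1.0, -1.0, -1.0, 1.0]     # no linear relation with x1

print(sample_covariance(x1, rising))    # positive
print(sample_covariance(x1, falling))   # negative
print(sample_covariance(x1, flat))      # zero
```

The covariance matrix used by PCA collects this quantity for every pair of features, with each feature's variance on the diagonal.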
Step 3: Compute Eigenvalues and Eigenvectors of
Covariance Matrix to Identify Principal Components

• Let A be a square n×n matrix and X be a non-zero
vector for which
– $AX = \lambda X$ for some scalar value λ.
• Then λ is known as an eigenvalue of matrix A
and X is known as an eigenvector of matrix A
for the corresponding eigenvalue.
• It can also be written as:
$AX - \lambda X = 0$
$(A - \lambda I)X = 0$
• where I is the identity matrix of the same
shape as matrix A.
• These conditions hold for a non-zero X only if
(A − λI) is non-invertible (i.e. a singular matrix).
That means,
$|A - \lambda I| = 0$
• From the above equation, we can find the
eigenvalues λ, and the corresponding
eigenvectors can then be found using the equation
$AX = \lambda X$.
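For a 2×2 matrix the condition |A − λI| = 0 is a quadratic in λ, so the whole step can be worked by hand. The sketch below does this for an example symmetric matrix; the matrix entries and the helper `eigenvector` are chosen for illustration, and the eigenvector formula used assumes the off-diagonal entry b is non-zero.

```python
import math

# Example 2x2 covariance matrix [[a, b], [b, d]] (values chosen for illustration).
a, b, d = 2.0, 1.0, 2.0

# |A - lambda*I| = 0 expands to lambda^2 - (a + d)*lambda + (a*d - b*b) = 0.
trace, det = a + d, a * d - b * b
root = math.sqrt(trace * trace - 4.0 * det)
lambdas = [(trace + root) / 2.0, (trace - root) / 2.0]   # largest first

def eigenvector(lam):
    """Solve (A - lam*I) v = 0 for a symmetric 2x2 matrix with b != 0."""
    v = (b, lam - a)              # satisfies (a - lam)*v0 + b*v1 = 0
    norm = math.hypot(*v)
    return (v[0] / norm, v[1] / norm)

for lam in lambdas:
    v = eigenvector(lam)
    Av = (a * v[0] + b * v[1], b * v[0] + d * v[1])
    print(lam, v, Av)             # Av should equal lam * v
```

The eigenvector paired with the largest eigenvalue is the first principal component. For larger matrices a numerical routine is normally used instead of expanding the determinant.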
Advantages of Principal Component Analysis
• Dimensionality Reduction
– Principal Component Analysis is a popular technique used for
dimensionality reduction, which is the process of reducing the
number of variables in a dataset.
– By reducing the number of variables, PCA simplifies data
analysis, improves performance, and makes it easier to
visualize data.
• Feature Selection
– Principal Component Analysis can be used for feature selection,
which is the process of selecting the most important variables
in a dataset.
– This is useful in machine learning, where the number of
variables can be very large, and it is difficult to identify the
most important variables.
• Data Visualization
– Principal Component Analysis can be used for data
visualization.
– By reducing the number of variables, PCA can plot high-
dimensional data in two or three dimensions, making it easier
to interpret.
• Multicollinearity
– Principal Component Analysis can be used to deal with
multicollinearity, which is a common problem in regression
analysis where two or more independent variables are highly
correlated.
– PCA can help identify the underlying structure in the data and
create new, uncorrelated variables that can be used in the
regression model.
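The decorrelation described under multicollinearity can be sketched as follows. The two predictors are synthetic (an assumption for illustration), with the second built as a noisy copy of the first.

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic predictors: x2 is nearly a copy of x1, so the two are highly correlated.
x1 = rng.normal(size=400)
x2 = x1 + 0.1 * rng.normal(size=400)
X = np.column_stack([x1, x2])

Xc = X - X.mean(axis=0)
_, eigenvectors = np.linalg.eigh(np.cov(Xc, rowvar=False))
scores = Xc @ eigenvectors          # data expressed in principal-component coordinates

corr_before = np.corrcoef(X, rowvar=False)[0, 1]
corr_after = np.corrcoef(scores, rowvar=False)[0, 1]
print(corr_before, corr_after)      # near 1 before, near 0 after
```

The columns of `scores` could then replace the original predictors in a regression, at the cost of coefficients that refer to components and not to the original variables.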
• Noise Reduction
– Principal Component Analysis can be used to reduce the noise in
data.
– By removing the principal components with low variance, which
are assumed to represent noise, Principal Component Analysis
can improve the signal-to-noise ratio and make it easier to
identify the underlying structure in the data.
• Data Compression
– Principal Component Analysis can be used for data compression.
– By representing the data using a smaller number of principal
components, which capture most of the variation in the data,
PCA can reduce the storage requirements and speed up
processing.
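A minimal sketch of the noise-reduction idea, under a strong assumption: the clean signal lies along a single direction in ten dimensions and the noise is independent across features. The data, seed, and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
# Synthetic example: a rank-1 signal in 10 dimensions plus independent noise.
direction = np.ones(10) / np.sqrt(10)
signal = rng.normal(scale=3.0, size=(500, 1)) * direction   # shape (500, 10)
noisy = signal + 0.3 * rng.normal(size=(500, 10))

mean = noisy.mean(axis=0)
centered = noisy - mean
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
top = eigenvectors[:, [np.argmax(eigenvalues)]]             # highest-variance axis, shape (10, 1)

# Keep only the first component and discard the nine low-variance ones.
denoised = centered @ top @ top.T + mean

error_noisy = np.linalg.norm(noisy - signal)
error_denoised = np.linalg.norm(denoised - signal)
print(error_noisy, error_denoised)
```

The improvement depends on the assumption that low-variance components are mostly noise. If a weak but meaningful signal lives in those components, it is discarded along with the noise.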
• Outlier Detection
– Principal Component Analysis can be used for
outlier detection.
– Outliers are data points that are significantly
different from the other data points in the dataset.
– Principal Component Analysis can identify these
outliers by looking for data points that are far from
the other points in the principal component space.
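One way to sketch this, assuming a simple distance-based rule that is not prescribed by the slides: express each point in principal-component coordinates, scale every coordinate by that component's standard deviation, and flag the point farthest from the centre. The planted outlier and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
# Synthetic data: 200 ordinary points plus one planted outlier (illustrative).
X = rng.normal(size=(200, 3))
X = np.vstack([X, [12.0, -12.0, 12.0]])     # planted outlier at index 200

Xc = X - X.mean(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Xc, rowvar=False))
scores = Xc @ eigenvectors                  # coordinates in principal-component space

# Distance from the centre, with each component scaled by its standard deviation.
distance = np.sqrt(((scores ** 2) / eigenvalues).sum(axis=1))
suspect = int(np.argmax(distance))
print(suspect, distance[suspect])
```

In practice a threshold on `distance` would be chosen to suit the data; taking only the single largest value is a simplification for this sketch.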
Disadvantages of Principal Component Analysis
• Interpretation of Principal Components
– The principal components created by Principal
Component Analysis are linear combinations of the
original variables, and it is often difficult to interpret
them in terms of the original variables.
– This can make it difficult to explain the results of PCA to
others.
• Data Scaling
– Principal Component Analysis is sensitive to the scale of
the data. If the data is not properly scaled, then PCA
may not work well.
– Therefore, it is important to scale the data before
applying Principal Component Analysis.
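The effect of scale can be seen in a small sketch. The two features, their units, and the helper `first_axis` are assumptions made for illustration: one feature has values in the tens of thousands, the other in the tens.

```python
import numpy as np

rng = np.random.default_rng(6)
# Synthetic features on very different scales (illustrative units).
income = rng.normal(50_000.0, 10_000.0, size=300)   # large numeric range
age = rng.normal(40.0, 10.0, size=300)              # small numeric range
X = np.column_stack([income, age])

def first_axis(data):
    """Return the direction of the first principal component (unit vector)."""
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(data, rowvar=False))
    return eigenvectors[:, np.argmax(eigenvalues)]

raw_axis = first_axis(X)
scaled_axis = first_axis((X - X.mean(axis=0)) / X.std(axis=0))
print(np.abs(raw_axis))      # dominated by the income column
print(np.abs(scaled_axis))   # both columns carry equal weight
```

Without scaling, the first component simply follows the feature with the largest numeric variance, regardless of whether that feature is the most informative.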
• Information Loss
– Principal Component Analysis can result in information loss.
– While Principal Component Analysis reduces the number of
variables, it can also lead to loss of information.
– The degree of information loss depends on the number of
principal components selected.
– Therefore, it is important to carefully select the number of
principal components to retain.
• Non-linear Relationships
– Principal Component Analysis assumes that the relationships
between variables are linear.
– However, if there are non-linear relationships between
variables, Principal Component Analysis may not work well.
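For the choice of how many components to retain, one common rule of thumb (an assumption here, not stated in the slides) is to keep the smallest number of components whose cumulative share of variance reaches a chosen threshold. The function name, threshold values, and eigenvalues below are hypothetical.

```python
import numpy as np

def components_for(eigenvalues, threshold=0.95):
    """Smallest number of components whose cumulative variance share reaches `threshold`."""
    ordered = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(ordered) / ordered.sum()
    # Small tolerance guards against floating-point round-off at the boundary.
    return int(np.searchsorted(cumulative, threshold - 1e-12) + 1)

# Hypothetical eigenvalues of a covariance matrix (made-up numbers).
example = [6.0, 2.0, 1.0, 0.5, 0.5]
print(components_for(example, 0.80))
print(components_for(example, 0.95))
```

A lower threshold keeps fewer components and loses more information; the right trade-off depends on the downstream task.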
• Computational Complexity
– Computing Principal Component Analysis can be
computationally expensive for large datasets.
– This is especially true if the number of variables in the
dataset is large.
• Overfitting
– A model built on principal components can still
overfit, which is when the model fits the training
data too well and performs poorly on new data.
– This can happen if too many principal components
are retained or if the model is trained on a small dataset.
