Dimensionality Reduction: Principal Component Analysis (PCA)
Contents
• Dimensionality Reduction
• Curse of Dimensionality
• Principal Component Analysis
• Intuition behind PCA
• Uses
• Limitations
• Example
Dimensionality Reduction
• Dimensionality, in statistics, refers to how many attributes a dataset has.
• The need for reduction arises from the ‘curse of dimensionality’.
• The curse of dimensionality refers to an exponential increase in the size of data caused by a large number of dimensions (a small sketch of this growth follows the list).
• As the number of dimensions of a dataset increases, it becomes more and more difficult to process.
• Dimensionality reduction is a solution: reduce the size of the data by extracting the relevant information and discarding the rest as noise.
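A minimal sketch (my own, not from the slides) of how the curse of dimensionality shows up: to cover each attribute with just 10 bins, the number of cells that would each need at least one sample grows exponentially with the number of dimensions.

# Python sketch: exponential growth of the space to be covered (illustrative only)
bins_per_axis = 10
for d in (1, 2, 3, 5, 10, 20):
    cells = bins_per_axis ** d        # cells needed to cover d attributes at this resolution
    print(f"{d:2d} attributes -> {cells:.3e} cells to cover the space")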
Principal Component Analysis
• PCA is one of the most popular linear dimensionality reduction techniques.
• It is a projection-based method.
• It transforms the data by projecting it onto a set of orthogonal (mutually perpendicular) axes.
• PCA creates new variables from the old ones (see the sketch after this list).
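A hedged sketch of the projection idea, assuming NumPy and made-up example data (none of the names below come from the slides): center the data, take the eigenvectors of the covariance matrix as the orthogonal axes, and project onto them to obtain the new variables.

import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[3, 1.5], [1.5, 1]], size=200)

X_centered = X - X.mean(axis=0)            # center on the mean of x & y
cov = np.cov(X_centered, rowvar=False)     # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # columns are orthogonal axes

new_vars = X_centered @ eigvecs            # project the data -> new variables
print(np.round(eigvecs.T @ eigvecs, 6))    # identity matrix: the axes are orthogonal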
Principal Component Analysis
• Understanding PCA through an animation.
• Each blue dot on the plot represents a data point given by its x & y coordinates.
• A line p (red) is drawn through the center of the dataset, i.e. through the mean of x & y.
• Every point on the graph is projected onto this line, shown by two sets of points, red & green.
• The spread or variance of the data along line p is given by the distance between the two big red points.
• As line p rotates, the distance between the two red points changes according to the angle line p makes with the x-axis (a numerical sketch of this rotation follows the list).
• The purple lines, which join each point to its projection, represent the error that arises when we approximate a point by its projection.
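A small numerical sketch of the rotation described above, assuming 2-D data like the blue dots in the animation (the data and names are illustrative, not taken from the slides). For each angle of line p it prints the variance of the projections and the squared sum of the "purple line" errors; the angle with the largest variance is also the one with the smallest error.

import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[3, 1.5], [1.5, 1]], size=200)
Xc = X - X.mean(axis=0)                                    # line p passes through the mean

for angle in np.deg2rad([0, 30, 60, 90, 120, 150]):
    direction = np.array([np.cos(angle), np.sin(angle)])   # unit vector along line p
    proj_lengths = Xc @ direction                          # signed position of each projection on p
    projections = np.outer(proj_lengths, direction)
    errors = Xc - projections                              # the "purple lines"
    spread = proj_lengths.var()                            # spread of the data along p
    sse = (errors ** 2).sum()                              # squared sum of the error lengths
    print(f"angle={np.rad2deg(angle):5.1f}  variance={spread:6.3f}  error={sse:8.2f}")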
Principal Component Analysis
• The approximation error is small when the new variables closely approximate the old variables.
• The squared sum of the lengths of all the purple lines gives the total approximation error.
• The angle that minimizes the squared sum of errors also maximizes the distance between the red points: for each point, the squared projection length and the squared error add up to its fixed squared distance from the center, so reducing one increases the other.
• The direction of maximum spread is called the principal axis.
• We apply the same procedure to find the next principal axis, which must be orthogonal to the previously found principal axes.
• Once we have all the principal axes, the dataset is projected onto them. The columns of the projected (transformed) dataset are called the principal components (see the sketch after this list).
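One way to carry out the whole procedure, sketched with NumPy's SVD on made-up data (my own assumption, not code from the slides): the right singular vectors of the centered data are the principal axes, and the columns of the projected data are the principal components.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                   # 100 samples, 5 attributes

Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
principal_axes = Vt                             # rows: orthogonal principal axes
principal_components = Xc @ Vt.T                # columns: principal components

explained_variance = S ** 2 / (len(X) - 1)      # spread captured along each axis
print(explained_variance)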
When should you use PCA?
• Reducing the dimensionality of the dataset reduces its size.
• If your learning algorithm is too slow because the input dimension is too high, PCA can be used to speed it up (see the sketch below).
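A hedged example of this use case, assuming scikit-learn is available (the dataset and model choices are illustrative, not prescribed by the slides): PCA keeps the components explaining 95% of the variance of the 64 input dimensions before a classifier is fitted, which shrinks the input and typically speeds up training.

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)             # 64 input dimensions

pipeline = make_pipeline(
    PCA(n_components=0.95),                     # keep components explaining 95% of variance
    SVC(),                                      # the learning algorithm to speed up
)
print(cross_val_score(pipeline, X, y, cv=5).mean())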
Limitations of PCA
• If the number of variables is large, it becomes hard to interpret the
principal components.
• PCA is most suitable when variables have a linear relationship
among them.
• PCA is strongly influenced by large outliers (illustrated below).
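A small sketch (my own, on made-up data) of the outlier sensitivity: a single extreme point can noticeably rotate the first principal axis.

import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[3, 1.5], [1.5, 1]], size=200)

def first_axis(data):
    centered = data - data.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[0]                                # direction of maximum spread

X_outlier = np.vstack([X, [[0.0, 40.0]]])       # add one big outlier
print("without outlier:", np.round(first_axis(X), 3))
print("with outlier:   ", np.round(first_axis(X_outlier), 3))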