Principal Component Analysis
In addition to data visualization, PCA is also sometimes used for data compression
(though this is becoming less common as data storage gets cheaper). It has also been
used to speed up the training of a supervised learning model (typically a support
vector machine), though this too has become less effective over time.
What if, instead of combining the features together, a third axis could be created
based on a line fitted through the two features plotted against each other? This new
axis should be placed so that the data projected onto it keeps as much variance as
possible. If the data points are spread out along the axis, the axis fits well.
Conversely, if they are all clumped together, the axis fits poorly. Variance is
important because it measures how much of the data's information the new axis still
captures.
After a new z-axis has been defined, in the form of a unit vector with two components
(z1, z2), coordinates are projected onto the axis by taking the dot product of
the coordinate vector (x1, x2) and the axis vector.
z = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \cdot \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}
This dot product results in a scalar value, which is a distance along the axis. Since
the z-axis is a vector from the origin, the distance value is translated onto the
vector to give the final coordinate points.
\text{coordinates} = z \cdot \begin{bmatrix} z_1 & z_2 \end{bmatrix}
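As a minimal NumPy sketch of this projection step, the axis vector and data points below are made-up values purely for illustration (any unit-length axis would work the same way):

```python
import numpy as np

# Hypothetical unit-length axis (z1, z2) and a few 2-D data points,
# chosen only to illustrate the projection.
axis = np.array([0.8, 0.6])            # unit length: 0.8**2 + 0.6**2 == 1
points = np.array([[2.0, 1.0],
                   [1.0, 3.0],
                   [-1.5, 0.5]])

# Dot product of each point with the axis gives z, the scalar distance
# along the axis from the origin.
z = points @ axis                      # shape (3,)

# Translating that distance back onto the axis vector gives the final
# projected coordinates in the original 2-D space.
coordinates = z[:, None] * axis        # shape (3, 2)

print(z)
print(coordinates)
```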
Additionally, it is important to note that PCA is not linear regression, even though
they may look similar. Both algorithms minimize a distance between the data points
and a line, but the key difference is the direction in which that distance is
measured. Linear regression minimizes the vertical distance (along the y-axis), while
PCA always minimizes the perpendicular distance to the line.
Also, linear regression in this example involves only two variables, while PCA can
take many features, using multiple axes to retain the information (variance). They
are very different algorithms used for different purposes, and this becomes more
apparent as PCA is applied to more features.
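To make the distinction concrete, here is a small NumPy sketch (the data is synthetic and purely illustrative) that fits both an ordinary regression line and the first principal axis to the same point cloud; the two slopes generally differ because each method minimizes a different kind of distance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data with correlated features (illustrative values only).
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.4, size=200)
X = np.column_stack([x, y])

# Linear regression slope: minimizes vertical (y-direction) distances.
slope, intercept = np.polyfit(x, y, deg=1)

# PCA first axis: the direction of maximum variance, which minimizes
# perpendicular distances. Found as the leading eigenvector of the
# covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
pc1 = eigvecs[:, np.argmax(eigvals)]
pca_slope = pc1[1] / pc1[0]

print(f"regression slope: {slope:.3f}")
print(f"PCA axis slope:   {pca_slope:.3f}")   # generally not the same line
```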
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = z \ast \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}
Note that this is a scalar multiplication being performed, not a dot product.
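As a sketch of this step, reusing the hypothetical unit axis from the earlier example, scaling the axis by the scalar z recovers an approximation of the original point (information lost in the projection is not recovered):

```python
import numpy as np

axis = np.array([0.8, 0.6])   # same hypothetical unit axis as before
z = 2.2                        # a projected scalar value (illustrative)

# Scalar multiplication of z with the axis vector, not a dot product.
x_approx = z * axis
print(x_approx)                # -> [1.76 1.32]
```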