Principal Component Analysis: Jifry Issadeen


PRINCIPAL COMPONENT ANALYSIS

Jifry Issadeen
PRINCIPAL COMPONENT ANALYSIS Introduction

Principal Component Analysis (PCA) is a feature-extraction technique. It combines the input variables linearly into new variables, then drops the "least important" of these new variables while still retaining the most valuable parts of all of the original variables.
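The combining step can be sketched in a few lines of NumPy. This is a minimal sketch, assuming the standard covariance-eigendecomposition formulation of PCA; the helper name `pca` and the random data are illustrative and not part of the original slides. Each row of `components` is a weight vector, so every new variable is a weighted sum (linear combination) of all the mean-centred original variables.

```python
import numpy as np

def pca(X: np.ndarray, k: int):
    """Project X (n_samples x n_features) onto its top-k principal components.

    Hypothetical helper for illustration. Returns (scores, components, eigenvalues):
      scores      : n_samples x k, the data expressed in the new variables
      components  : k x n_features, each row holds one linear combination's weights
      eigenvalues : variance captured by every component, largest first
    """
    # 1. Centre each feature so the combinations describe variation around the mean.
    X_centered = X - X.mean(axis=0)
    # 2. Covariance matrix of the features (n_features x n_features).
    cov = np.cov(X_centered, rowvar=False)
    # 3. Eigenvectors of the symmetric covariance matrix are the principal directions.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order; reverse so the most important come first.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    components = eigenvectors[:, order].T
    # 4. Keep the top-k directions and express each sample in those coordinates.
    scores = X_centered @ components[:k].T
    return scores, components[:k], eigenvalues

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
scores, components, eigenvalues = pca(X, k=2)
print(scores.shape, components.shape)  # (100, 2) (2, 4)
```

Dropping the "least important" variables corresponds to discarding the components with the smallest eigenvalues.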

[Figure: nine features (Fea1 to Fea9) are combined into two new features, Fea-A and Fea-B.]

• PCA is a dimensionality-reduction method that is often used to reduce the dimensionality of large data sets.

• This is done by transforming a large set of variables into a smaller one that still contains most of the information in the large data set.
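To illustrate the nine-to-two reduction in the figure, the sketch below builds a synthetic 9-feature data set that is, by construction, driven mostly by two hidden factors, then keeps two principal components. The data, the noise level and the names `fea_a`/`fea_b` are assumptions for illustration; real data sets will not always compress this well. PCA is computed here from the singular value decomposition (SVD) of the centred data, an equivalent route to the covariance eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 500 samples whose 9 features are noisy mixtures of 2 hidden factors.
n_samples, n_features = 500, 9
factors = rng.normal(size=(n_samples, 2))
mixing = rng.normal(size=(2, n_features))
X = factors @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))

# Rows of Vt are the principal directions; squared singular values are
# proportional to the variance each direction captures.
X_centered = X - X.mean(axis=0)
_, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

reduced = X_centered @ Vt[:2].T   # the two columns play the role of Fea-A and Fea-B
fea_a, fea_b = reduced[:, 0], reduced[:, 1]

variance = S**2
retained = variance[:2].sum() / variance.sum()
print(X.shape, "->", reduced.shape)
print(f"variance retained by 2 components: {retained:.1%}")
```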
• PCA is a method to reduce the number of variables in a data set while preserving as much information (variance) as possible.

• PCA emphasizes variation and brings out strong patterns in a dataset.

• It's often used to make data easy to explore and visualize.

• Reducing the number of variables in a data set naturally comes at the expense of accuracy, but the trick in dimensionality reduction is to trade a little accuracy for simplicity.
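The accuracy-for-simplicity trade-off can be measured directly: project the data onto k components, map it back to the original feature space, and compare. This minimal sketch assumes mean squared reconstruction error as the measure of lost accuracy (one common choice, not the only one) and uses synthetic correlated data; the helper name `reconstruction_error` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical correlated data: 300 samples, 6 features.
X = rng.normal(size=(300, 6)) @ rng.normal(size=(6, 6))
X_centered = X - X.mean(axis=0)
_, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

def reconstruction_error(k: int) -> float:
    """Mean squared error after compressing to k components and decompressing."""
    W = Vt[:k].T                          # 6 x k matrix of principal directions
    X_restored = (X_centered @ W) @ W.T   # project down, then map back up
    return float(np.mean((X_centered - X_restored) ** 2))

for k in range(1, 7):
    print(k, round(reconstruction_error(k), 4))
```

The error shrinks as k grows and reaches (numerically) zero when k equals the original number of features. The total squared error equals the sum of the discarded squared singular values, so the components PCA drops first are exactly the ones that cost the least accuracy.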

• Smaller data sets, with only a few features, are easier to explore and visualize.

• Smaller data sets also make analysis much easier and faster for machine learning algorithms, which no longer have extraneous variables to process.


• For example, if the dataset contains only two features, we can easily visualize them and understand the data.

• If the dataset has many features, it is very difficult to visualize it.

[Figure: data plotted on two axes, Feature - 1 and Feature - 2.]

• Even though the subjects are 3D, movies are almost always 2D.

• We accept this because the third dimension doesn’t add much to the story anyway.

• In effect, a movie camera captures 3D information and flattens it to 2D without losing too much information.

• PCA takes a dataset with many features (dimensions) and flattens it to 2 or 3 dimensions.

• This way, we can visualize and understand the data.

[Figure: the data plotted on the first two principal components, PC1 and PC2.]

PRINCIPAL COMPONENT ANALYSIS Advantages
• It is easier to visualize and understand the data in a lower dimension than in higher dimensions.

• PCA is a common way to speed up machine learning algorithms by removing the redundancy among correlated variables, which contribute little extra information to decision making.

• The training time of the algorithms often drops significantly with fewer features.

• Having too many features in a dataset may lead to overfitting. PCA can reduce this risk by reducing the number of features.
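The point about correlated variables can be checked numerically: the principal-component scores are uncorrelated with one another, so the redundancy among the original features is removed. This is a minimal sketch, assuming NumPy and a hypothetical data set in which two columns (`height_cm` and a noisy `height_in`) carry almost the same information.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical data: height in cm, a near-duplicate height in inches (plus a little noise),
# and an independent third feature.
height_cm = rng.normal(170, 10, size=1000)
height_in = height_cm / 2.54 + rng.normal(0, 0.5, size=1000)
other = rng.normal(0, 1, size=1000)
X = np.column_stack([height_cm, height_in, other])

# Large off-diagonal entry for the two height columns.
print(np.round(np.corrcoef(X, rowvar=False), 3))

X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
scores = X_centered @ Vt.T   # scores on all 3 principal components

# Identity matrix (up to rounding): the components are mutually uncorrelated.
print(np.round(np.corrcoef(scores, rowvar=False), 3))
```

Because the height columns are on different scales, features are often standardized before PCA in practice; that step is omitted here to keep the sketch short.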
PRINCIPAL COMPONENT ANALYSIS Disadvantages

• After applying PCA to the dataset, the original features are replaced by principal components.

• Principal components are linear combinations of the original features, so they are not as readable and interpretable as the original features.

• Some accuracy is lost when reducing the dimensions.

• The number of principal components must be selected with care: too few components may miss information contained in the original features.
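One common heuristic for choosing the number of components is to keep the smallest number whose cumulative explained variance reaches a chosen threshold. The 95% threshold, the helper name `choose_n_components` and the synthetic data below are assumptions for illustration; scree plots and cross-validation on a downstream task are other common options.

```python
import numpy as np

def choose_n_components(X: np.ndarray, threshold: float = 0.95) -> int:
    """Return the smallest k whose top-k components explain >= threshold of the variance."""
    X_centered = X - X.mean(axis=0)
    S = np.linalg.svd(X_centered, compute_uv=False)   # singular values, largest first
    explained_ratio = S**2 / np.sum(S**2)             # share of variance per component
    cumulative = np.cumsum(explained_ratio)
    # First index where the cumulative share reaches the threshold, turned into a count.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    # Guard against floating-point round-off leaving the final cumsum just below 1.0.
    return min(k, len(S))

rng = np.random.default_rng(3)
# Hypothetical data: 10 features generated from 3 hidden factors plus small noise.
X = rng.normal(size=(400, 3)) @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(400, 10))
print(choose_n_components(X))   # small compared with the 10 original features
```

A higher threshold keeps more information but also more components; the right value depends on how much accuracy the task can afford to lose.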