
Principal Component Analysis
An introduction to dimensionality reduction.

- Sahil Imani
Some prerequisites before getting into PCA
 Origins of PCA
 The importance of variance in data and information entropy
 What do we mean by dimensions?
 Why do we need to reduce dimensions?
 The logic behind PCA and a visual explanation
PCA: Origins
 Comes from statistics, as part of factor analysis and dimensionality reduction (feature extraction).
 Is NOT a machine learning technique by itself.
 The goal of data analysis is generally to make "sense" of the data.
 This is done in three iterative steps (clean, reduce, transform), repeated until we reach an acceptable level.
(Video taken from the Computerphile YouTube channel.)
Importance of variance in data and information entropy
 Information entropy measures the rate at which information is generated by a stochastic process.
 It gives us a relation between information gain and uncertainty: the greater the uncertainty, the more information is transferred/gained when the outcome is observed.
 In this sense, attributes (or directions) with greater variance are less predictable and therefore carry more information.
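For reference, the Shannon entropy of a discrete random variable $X$ makes this precise:

$$H(X) = -\sum_{x} p(x)\,\log_2 p(x)$$

A nearly deterministic source (low uncertainty) has entropy close to zero, while a uniform source (maximum uncertainty) has the highest entropy, so each observation from it yields the most information.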
Dimensionality
 In data analysis, the number of attributes or features that determine the final output of a data-driven decision is known as its dimensionality.
 The more attributes we use to define something, the more "dimensions" it has.
 Beyond three dimensions, however, the data becomes impossible to visualize directly, which is why we need to reduce/project it to a lower dimension while trying to retain most of the information.
The need to reduce dimensionality
 Helps with data visualization.
 Makes calculations faster, and the subsequent machine learning stage needs less data to work with for the same amount of information.
 Reduces the data set so we can start drawing conclusions.
 Optimizes the data for use in actual machine learning or statistical modelling.
PCA: The logic behind it and a visual explanation
 Dimensionality reduction has common examples in everyday life.
 Some dimensions/factors contain much more information than others.
 If we can find the principal, or "important", dimensions, we can discard the ones that don't contribute much, as well as some highly correlated dimensions.
 This is the logical basis for PCA.
 Visually (for two dimensions), we can see it as fitting a line along the direction of maximum variance.
 That line will be a linear combination of both dimensions, as formalized below.
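Concretely, for two centered attributes $x_1$ and $x_2$, the fitted line corresponds to the first principal component: the unit-length linear combination of the two dimensions with maximal variance,

$$z = w_1 x_1 + w_2 x_2, \qquad w_1^2 + w_2^2 = 1, \qquad \mathbf{w} = \arg\max_{\|\mathbf{w}\|=1} \operatorname{Var}(z).$$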
[Figure: a 2D visualization of a data set with two attributes.]
The math powering it
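For reference, the standard formulation behind the implementation steps that follow: given a centered data matrix $X \in \mathbb{R}^{n \times d}$ (one row per sample), PCA eigen-decomposes the covariance matrix of the attributes,

$$C = \frac{1}{n-1} X^{\top} X, \qquad C\,v_i = \lambda_i v_i,$$

where the eigenvectors $v_i$ are the principal components, sorted by their eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$. The fraction of information (variance) retained by the first $k$ components is $\sum_{i \le k} \lambda_i \big/ \sum_{i=1}^{d} \lambda_i$.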
Programming Implementation
 The basic flow is:
 Find the eigenvalues and eigenvectors of the covariance matrix of the attributes.
 Sort the eigenvectors according to their eigenvalues (from max to min).
 Discard trailing principal components, keeping only as many as needed to stay within the amount of information we want to retain.
 Reproject the data using the reduced dimensions (a sketch of this flow follows below).
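A minimal NumPy sketch of this flow, assuming rows are samples and columns are attributes (the function name pca and the retain threshold are illustrative choices, not from the slides):

import numpy as np

def pca(X, retain=0.95):
    # Center the data (rows are samples, columns are attributes).
    Xc = X - X.mean(axis=0)
    # Covariance matrix of the attributes.
    C = np.cov(Xc, rowvar=False)
    # Eigen-decomposition; eigh is suited to the symmetric covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(C)
    # Sort the eigenvectors by eigenvalue, from max to min.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep just enough components to retain the requested share of variance.
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(explained, retain)) + 1
    # Reproject the data onto the reduced dimensions.
    return Xc @ eigvecs[:, :k]

For example, pca(X, retain=0.95) keeps the smallest number of leading components whose eigenvalues account for at least 95% of the total variance.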
Thank You
