Principal Components Analysis (PCA)
PREPARED BY:
MISRAK BIRHANU
DAWIT TESEMA
SAMRAWIT ABERA
Outline
•Introduction
•Why Principal Component Analysis (PCA)?
•Mathematical Background & Example of PCA
•How Does PCA Work?
•Applications of Principal Component Analysis
•References
What is Principal Component Analysis (PCA)?
PCA is a method of extracting important variables (in the form of components) from a large
set of variables available in a data set.
It extracts a low-dimensional set of features from a high-dimensional data set, with the aim
of capturing as much information as possible.
PCA is used to extract the important information from a multivariate data table and to
express this information as a set of a few new variables called principal components.
◦ These new variables correspond to linear combinations of the originals. The number of
principal components is less than or equal to the number of original variables.
What…
PCA is a technique that can be used to simplify a dataset
PCA seeks a linear combination of variables such that the maximum variance is
extracted from the variables.
◦ It then removes this variance and seeks a second linear combination which explains
the maximum proportion of the remaining variance, and so on. This is called the
principal axis method and results in orthogonal (uncorrelated) factors. PCA analyzes
total (common and unique) variance.
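A minimal NumPy sketch of this idea (the synthetic data and variable names are illustrative, not from the slides): eigendecomposing the covariance matrix of the centred data and sorting by eigenvalue gives orthogonal axes that capture the maximum variance first, then the maximum remaining variance.

```python
import numpy as np

# Synthetic 2-D data (illustrative only): two correlated variables.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
data = np.column_stack([x, 0.8 * x + 0.1 * rng.normal(size=100)])

# Centre the data and eigendecompose its covariance matrix.
centered = data - data.mean(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))

# Sort by decreasing eigenvalue: the first axis captures the maximum variance,
# the second captures the maximum remaining variance, and the axes are orthogonal.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
print("proportion of variance per component:", np.round(eigenvalues / eigenvalues.sum(), 3))
```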
Why Principal Component Analysis (PCA)?
The main goals of PCA are:
1. To identify hidden patterns in the data set.
2. To reduce the dimensionality of the data by removing the noise and redundancy in
the data.
3. To identify correlated variables.
The PCA method is particularly useful when the variables within the data set are highly
correlated.
Correlation indicates that there is redundancy in the data. Because of this redundancy, PCA
can be used to reduce the original variables into a smaller number of new variables
called principal components.
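As a rough illustration (synthetic data and variable names are illustrative only), a near-duplicate variable shows up as a correlation close to 1, and PCA then needs fewer components to carry almost all of the variance:

```python
import numpy as np

# Synthetic data (illustrative only): var3 is almost a copy of var1,
# so the three variables are redundant / highly correlated.
rng = np.random.default_rng(1)
var1 = rng.normal(size=200)
var2 = rng.normal(size=200)
var3 = var1 + 0.05 * rng.normal(size=200)
data = np.column_stack([var1, var2, var3])

# The correlation matrix exposes the redundancy: corr(var1, var3) is close to 1.
print(np.round(np.corrcoef(data, rowvar=False), 2))

# Eigenvalues of the covariance matrix: two principal components already carry
# almost all of the variance, so the three variables reduce to two components.
centered = data - data.mean(axis=0)
eigenvalues = np.sort(np.linalg.eigvalsh(np.cov(centered, rowvar=False)))[::-1]
print(np.round(eigenvalues / eigenvalues.sum(), 3))
```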
Applications of Principal Component Analysis
Computer vision: to apply matrix techniques in computer vision, we must consider how images
are represented. A square, N by N image can be expressed as an N²-dimensional vector.
To find patterns: create an image vector for each image and put all the images together in one
big image matrix for analysis, e.g. a face recognition system.
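A small sketch of this representation, assuming NumPy and randomly generated stand-in images:

```python
import numpy as np

# Stand-in data: 20 random grayscale "images" of size N x N (illustrative only).
N, num_images = 32, 20
rng = np.random.default_rng(0)
images = rng.random((num_images, N, N))

# Each square N x N image becomes one N*N-dimensional row vector; stacking all
# of them gives one big image matrix that PCA (e.g. for face recognition) analyzes.
image_matrix = images.reshape(num_images, N * N)
print(image_matrix.shape)   # (20, 1024): 20 images, each an N^2-dimensional vector
```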
Mathematical Background of PCA
2. Calculate the Mean
The symbol $\bar{X}$ (said "X bar") indicates the mean of the set $X$:
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
All this formula says is "add up all the numbers and then divide by how many there are".
Unfortunately, the mean doesn't tell us a lot about the data except for a sort of middle point. For example,
these two data sets have exactly the same mean (10), but are obviously quite different:
[0 8 12 20] and [8 9 11 12]
Cont..
So what is different about these two sets? It is the spread of the data.
The Standard Deviation (SD) of a data set is a measure of how spread out the data is:
"the average distance from the mean of the data set to a point". The way to calculate it is to
compute the squares of the distance from each data point to the mean of the set, add them all
up, divide by n-1, and take the positive square root. As a formula:
$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
Mathematical Background of PCA
Cont..
Dataset 1: [0 8 12 20] has mean 10 and standard deviation ≈ 8.33. Dataset 2: [8 9 11 12] has mean 10 and standard deviation ≈ 1.83.
And so, as expected, the first set has a much larger standard deviation, due to the fact that the
data is much more spread out from the mean.
A set such as [10 10 10 10] also has a mean of 10, but its standard deviation is 0, because all the numbers are the same.
None of them deviate from the mean.
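A quick check of these values with NumPy, using the example sets above:

```python
import numpy as np

# The two example data sets: same mean, very different spread.
datasets = {"Dataset 1": np.array([0, 8, 12, 20]),
            "Dataset 2": np.array([8, 9, 11, 12])}

for name, data in datasets.items():
    mean = data.sum() / len(data)                               # add up, divide by how many
    sd = np.sqrt(((data - mean) ** 2).sum() / (len(data) - 1))  # divide by n-1, square root
    print(name, "mean =", mean, "standard deviation =", round(sd, 2))

# A constant set: same mean of 10, but zero standard deviation.
print(np.array([10, 10, 10, 10]).std(ddof=1))
```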
Mathematical Background of PCA
Cont..
Variance is another measure of the spread of data in a data set. In fact it is almost
identical to the standard deviation: it is simply the standard deviation squared. The formula is this:
$\mathrm{var}(X) = s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$
Standard deviation and variance only operate on one dimension, so you can only
calculate the standard deviation for each dimension of the data set independently
of the other dimensions.
Mathematical Background …Matrices
Two matrices can be multiplied together only if the number of columns in the first matrix [A]
equals the number of rows in the second matrix [B]. The resulting matrix [C] will have the
same number of rows as [A] and the same number of columns as [B].
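A short NumPy illustration of this shape rule (illustrative matrices only):

```python
import numpy as np

# Columns of A (3) must equal rows of B (3); the product C then has
# A's number of rows (2) and B's number of columns (4).
A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)
C = A @ B
print(A.shape, B.shape, C.shape)  # (2, 3) (3, 4) (2, 4)
```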
Let A be an n×n matrix. A scalar $\lambda$ is called an eigenvalue of A if there is a nonzero vector $v$
such that $Av = \lambda v$. Such a vector $v$ is called an eigenvector of A corresponding to $\lambda$.
Example: show that $v = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$ is an eigenvector of $A = \begin{pmatrix} 3 & 2 \\ 3 & -2 \end{pmatrix}$ corresponding to $\lambda = 4$:
$Av = \begin{pmatrix} 3 & 2 \\ 3 & -2 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 8 \\ 4 \end{pmatrix} = 4 \begin{pmatrix} 2 \\ 1 \end{pmatrix} = \lambda v$
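A quick numerical check of this example with NumPy:

```python
import numpy as np

# Verify the worked example: A v equals 4 v, so v is an eigenvector for lambda = 4.
A = np.array([[3, 2],
              [3, -2]])
v = np.array([2, 1])
print(A @ v)                      # [8 4], which is 4 * [2 1]

# numpy recovers the same eigenvalue (along with the other one, -3).
eigenvalues, _ = np.linalg.eig(A)
print(np.round(eigenvalues, 6))   # contains 4.0 (the other eigenvalue is -3.0)
```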
If we ignore the components of lesser significance we lose some information, but if their
eigenvalues are small, we don't lose much information. If we leave out some components, the
final data set will have fewer dimensions than the original.
Within the example we have two choices. We can either form a feature vector with both of the eigenvectors:
$FeatureVector_1 = \begin{pmatrix} -.677873399 & -.735178956 \\ -.735178956 & .677873399 \end{pmatrix}$
or we can choose to leave out the smaller, less significant component and only have a single column
(the reduced-dimension feature vector):
$FeatureVector_2 = \begin{pmatrix} -.677873399 \\ -.735178956 \end{pmatrix}$
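A small sketch of this choice, assuming NumPy; the helper name form_feature_vector and the eigenvalue magnitudes below are illustrative only:

```python
import numpy as np

# Illustrative helper: sort the eigenvectors (columns) by decreasing eigenvalue
# and keep the top k columns as the feature vector.
def form_feature_vector(eigenvalues, eigenvectors, k):
    order = np.argsort(eigenvalues)[::-1]
    return eigenvectors[:, order[:k]]

# Columns are the example eigenvectors; the eigenvalue magnitudes below are
# placeholders that simply mark the second column as the significant one.
eigenvalues = np.array([0.05, 1.28])
eigenvectors = np.array([[-0.735178956, -0.677873399],
                         [ 0.677873399, -0.735178956]])
print(form_feature_vector(eigenvalues, eigenvectors, 2))  # both components
print(form_feature_vector(eigenvalues, eigenvectors, 1))  # single column only
```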
Step 6: PCA…cont.
6. Derive the new data set: this is the final step in PCA.
Once we have chosen the components (eigenvectors) that we wish to keep in our data and
formed a feature vector, we simply take the transpose of the vector and multiply it on the left
of the original data set, transposed.
Row Feature Vector is the matrix with the eigenvectors in the columns transposed so that the
eigenvectors are now in the rows, with the most significant eigenvector at the top.
Row Data Adjust is the mean-adjusted data transposed, i.e. the data items are in each column, with
each row holding a separate dimension.
Step 6: PCA…cont.
$TransformedData = RowFeatureVector \times RowDataAdjust$
$RowDataAdjust = \begin{pmatrix} .69 & -1.31 & .39 & .09 & 1.29 & .49 & .19 & -.81 & -.31 & -.71 \\ .49 & -1.21 & .99 & .29 & 1.09 & .79 & -.31 & -.81 & -.31 & -1.01 \end{pmatrix}$
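A short NumPy sketch of this multiplication, using the example feature vector and mean-adjusted data shown above (the variable names are illustrative):

```python
import numpy as np

# Row feature vector: eigenvectors in the rows, most significant on top.
row_feature_vector = np.array([[-0.677873399, -0.735178956],
                               [-0.735178956,  0.677873399]])

# Row data adjust: mean-adjusted data, one row per dimension.
row_data_adjust = np.array([
    [0.69, -1.31, 0.39, 0.09, 1.29, 0.49,  0.19, -0.81, -0.31, -0.71],
    [0.49, -1.21, 0.99, 0.29, 1.09, 0.79, -0.31, -0.81, -0.31, -1.01],
])

# TransformedData = RowFeatureVector x RowDataAdjust
transformed_data = row_feature_vector @ row_data_adjust
print(np.round(transformed_data, 3))
```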
Final result: the original data restored using
only a single eigenvector.
That single eigenvector is plotted as a dotted
diagonal line on the plot; the restored points all
lie along this line, which still captures the
important information and shows the pattern in the data.
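A rough sketch of this restoration with NumPy, keeping only the single most significant eigenvector from the example (adding the original means back, which are not shown on this slide, would complete the restoration):

```python
import numpy as np

# Keep only the most significant eigenvector from the example.
feature_vector = np.array([[-0.677873399],
                           [-0.735178956]])
row_data_adjust = np.array([
    [0.69, -1.31, 0.39, 0.09, 1.29, 0.49,  0.19, -0.81, -0.31, -0.71],
    [0.49, -1.21, 0.99, 0.29, 1.09, 0.79, -0.31, -0.81, -0.31, -1.01],
])

# Project onto the single component, then map back to two dimensions:
# every restored (mean-adjusted) point lies on the dotted diagonal line
# spanned by that eigenvector.
transformed_data = feature_vector.T @ row_data_adjust      # 1 x 10
restored_adjusted = feature_vector @ transformed_data      # 2 x 10
print(np.round(restored_adjusted, 2))
```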
References
Lindsay I. Smith, "A Tutorial on Principal Components Analysis," February 26, 2002.
Basic matrix concepts.
Video tutorials on finding eigenvalues and eigenvectors.
M. Mudrová and A. Procházka, "Principal Component Analysis in Image Processing."
Jonathon Shlens, "A Tutorial on Principal Component Analysis."
Thank You !