
Mathematics for Data Analysis

– Principal Component Analysis –

Matteo Gorgone

University of Messina, Department MIFT


email: [email protected]

What is PCA?
Principal Component Analysis, or PCA, is a dimensionality-reduction method often applied to large
datasets: it transforms a large set of variables into a smaller one that still contains most of the
information of the original set. Reducing the number of variables of a dataset naturally comes at the
expense of accuracy, but the trick in dimensionality reduction is to trade a little accuracy for simplicity.
The data are projected onto the Principal Components!

When to apply PCA?


PCA is most commonly used when many of the variables are highly correlated with each other and it is
desirable to reduce their number to an independent set.

PCA: applications
PCA is used in exploratory data analysis, in building predictive models, in measuring components of human
intelligence, in summarising data on variation in human gene frequencies across different regions, in market
research and finance, and in neuroscience.
The idea of PCA is simple: reduce the number of variables of a dataset, while preserving as much of the
data’s variation as possible.

Principal components in Data Analysis
Principal Components are new variables that are constructed as linear combinations of the initial
variables. These combinations are done in such a way that the new variables are uncorrelated and most
of the information within the initial variables is compressed into the first components.
Organizing information in principal components allows you to reduce dimensionality without losing
much information. This is achieved by discarding the components with low information and considering
the remaining components as your new variables!

Remark
Principal Components are less interpretable and don’t have any real meaning since they are constructed
as linear combinations of the initial variables.

Interpretation of Principal Components


The first Principal Component of a set of p variables is the derived variable formed as a linear
combination of the original variables that explains the largest possible variance in the dataset. The
second principal component explains the largest variance in what is left once the effect of the first
component is removed, and we may proceed through p iterations until all the variance is explained.

Example
Suppose we have 10-dimensional data (we obtain 10 principal components).
PCA tries to put the maximum possible information in the first component, then the maximum remaining
information in the second, and so on, until we obtain something like this:

Percentage of variance (information) for each principal component.

Principal Components in Geometry
The Principal Components of a collection of points in a real vector space are a sequence of p unit
vectors, where the i-th vector is the direction of a line that best fits the data while being orthogonal to
the first (i − 1) vectors. In this case, a best-fitting line is defined as one that minimizes the mean
squared Euclidean distance from the points to the line.
The directions constitute an orthonormal basis in which different individual dimensions of the data are
linearly uncorrelated.
Geometrically speaking, Principal Components represent the directions of the data that explain a
maximal amount of variance, that is to say, the lines that capture most information of the data. The
relationship between variance and information here is that the larger the variance carried by a line
(i.e., the larger the dispersion of the data points along it), the more information it carries.

How to compute Principal Components


The first Principal Component can be defined as a direction that maximizes the variance of the projected
data. The i-th Principal Component is the direction, uncorrelated with the first (i − 1) Principal
Components and orthogonal to them, that maximizes the variance of the projected data.
We obtain a number of Principal Components equal to the original number p of variables.
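
A minimal numerical sketch of this variance-maximization property follows (the toy data matrix Z and all
names below are illustrative, not taken from these notes): the variance of the data projected onto the first
Principal Component equals the largest eigenvalue of the covariance matrix, and no other unit direction
does better.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy centered data: 200 observations of p = 3 correlated features
Z = rng.standard_normal((200, 3)) @ np.diag([3.0, 1.0, 0.3])
Z = Z - Z.mean(axis=0)

C = np.cov(Z, rowvar=False)              # p x p covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns eigenvalues in ascending order
u1 = eigvecs[:, -1]                      # unit eigenvector of the largest eigenvalue

v = rng.standard_normal(3)
v /= np.linalg.norm(v)                   # a random competing unit direction

print(np.var(Z @ u1, ddof=1), eigvals[-1])              # numerically equal
print(np.var(Z @ u1, ddof=1) >= np.var(Z @ v, ddof=1))  # True
```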

Aim of PCA
Principal Component Analysis is the process of computing the Principal Components and using them to
perform a change of basis on the data, sometimes using only the first few principal components and
ignoring the rest.
Such dimensionality reduction can be a very useful step for visualizing and processing high-dimensional
datasets, while still retaining as much of the variance in the dataset as possible.
Essentially, PCA is defined as a linear transformation that transforms the data to a new coordinate
system such that the greatest variance of the projected data lies on the first principal component, the
second greatest variance on the second principal component, and so on.

PCA: starting point
Suppose we have a data set with N numerical data points xi measuring p characters of a population, say

xi = {xi1 , xi2 , . . . , xip }, i = 1, . . . , N.

Then, we consider an N × p matrix containing our data set such that:


N is the number of data vectors (each row represents a different repetition of the experiment);
p is the number of features we measured for our phenomenon (thus each measurement is a vector xi
with p components).
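
A small sketch of this convention (the numerical values are illustrative): the data set is stored as a
matrix with one row per observation and one column per measured feature.

```python
import numpy as np

X = np.array([
    [5.1, 3.5, 1.4, 0.2],   # x_1: first repetition of the experiment
    [4.9, 3.0, 1.4, 0.2],   # x_2
    [6.3, 3.3, 6.0, 2.5],   # x_3
])
N, p = X.shape              # N = 3 data vectors, p = 4 measured features
```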

PCA: step by step
1 Standardization.
The aim of this step is to standardize the range of the continuous initial variables so that each one
of them contributes equally to the analysis.
More specifically, the reason why it is critical to perform standardization prior to PCA is that the
latter is quite sensitive to the variances of the initial variables. That is, if there are large
differences between the ranges of initial variables, those variables with larger ranges will dominate
over those with small ranges, which will lead to biased results. So, transforming the data to
comparable scales can prevent this problem.
Mathematically, in order to transform all the variables to the same scale, we have to subtract the
mean and divide by the standard deviation for each value of each variable:

zij = (xij − µj )/σj ,    i = 1, . . . , N,  j = 1, . . . , p,

where xij is the generic element of the i-th row and j-th column, and µj and σj are the mean and the
standard deviation of the j-th column (that is, of the j-th feature of the data set).
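
A minimal sketch of Step 1 (the function name is ours; the sample standard deviation, ddof=1, is assumed
here, consistent with the 1/(N − 1) normalization used for the covariance in the next step):

```python
import numpy as np

def standardize(X):
    """Column-wise standardization of an N x p data matrix X."""
    mu = X.mean(axis=0)              # mean mu_j of each column
    sigma = X.std(axis=0, ddof=1)    # standard deviation sigma_j of each column
    Z = (X - mu) / sigma             # z_ij = (x_ij - mu_j) / sigma_j
    return Z, mu, sigma
```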

PCA: step by step
2 Covariance matrix computation.
The aim of this step is to understand how the variables of the input data set are varying from the
mean with respect to each other, or in other words, to see if there is any relationship between them.
Sometimes, variables are highly correlated in such a way that they contain redundant information. So, in
order to identify these correlations, we compute the covariance matrix.
The covariance matrix is a p × p symmetric matrix (where p is the number of dimensions) that has
as entries the covariances associated with all possible pairs of the initial variables, i.e.,
cov(xi , xj ) = (1/(N − 1)) Σ_{k=1}^{N} (xki − µi )(xkj − µj ),    i, j = 1, . . . , p.

Since the covariance of a variable with itself is its variance (cov(xi , xi ) = σ²(xi )), in the main
diagonal we actually have the variances of each initial variable.
The sign of a covariance is important:
if positive, then the two variables increase or decrease together (positively correlated);
if negative, then one increases when the other one decreases (negatively, or inversely, correlated).
Therefore, the covariance matrix is nothing more than a table that summarizes the correlations between
all the possible pairs of variables.
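
A short sketch of Step 2, computing the covariance matrix of the standardized data with the 1/(N − 1)
normalization given above (names are illustrative):

```python
import numpy as np

def covariance_matrix(Z):
    """p x p covariance matrix of the N x p (standardized) data matrix Z."""
    N = Z.shape[0]
    Zc = Z - Z.mean(axis=0)          # subtract the column means
    return (Zc.T @ Zc) / (N - 1)     # entry (i, j) is cov(x_i, x_j)

# The same matrix is returned by numpy's built-in np.cov(Z, rowvar=False).
```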

PCA: step by step
3 Identify Principal Components.
Eigenvectors and eigenvalues are the linear algebra concepts that we need to compute from the
covariance matrix in order to determine the Principal Components of the data.
In fact:
the eigenvectors of the covariance matrix are the directions of the axes along which there is the
largest variance (most information), and these are what we call Principal Components;
the eigenvalues, corresponding to the eigenvectors, represent the amount of variance carried
by each Principal Component.
By ranking the eigenvectors in order of their associated eigenvalues, highest to lowest, the Principal
Components in order of significance are obtained.
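
A minimal sketch of Step 3 (the function name is ours): the eigenpairs of the covariance matrix, ranked
from the highest to the lowest eigenvalue.

```python
import numpy as np

def principal_components(C):
    """Eigenvalues/eigenvectors of the symmetric covariance matrix C,
    sorted in descending order of the eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(C)      # eigh is suited to symmetric matrices
    order = np.argsort(eigvals)[::-1]         # indices from largest to smallest
    return eigvals[order], eigvecs[:, order]  # columns are the Principal Components
```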

PCA: step by step
4 Feature Vector.
Once we have computed the eigenvectors and ordered them by their eigenvalues in descending order, we
can choose whether to keep all these components or discard those of lesser significance (with low
eigenvalues), and form with the remaining ones a matrix of vectors that we call Feature Vector.
Then, the Feature Vector is simply a matrix that has as columns the eigenvectors of the components
that we decide to keep. This makes it the first step towards dimensionality reduction, because if we
choose to keep only q eigenvectors (the Principal Components) out of p, the final dataset will have
only q dimensions (q measured features).
The Feature Vector has size p × q!

Criterion for Principal Components


Remembering that the eigenvalues are sorted in descending order, a criterion to choose the number q of
Principal Components is that
(λ1 + · · · + λq )/(λ1 + · · · + λp ) ≥ 0.8,

i.e., the Principal Components capture at least 80% of the variance of our dataset.
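
A short sketch of Step 4 with the 80% criterion above (the threshold is exposed as a parameter; names
are illustrative):

```python
import numpy as np

def feature_vector(eigvals, eigvecs, threshold=0.8):
    """Build the p x q Feature Vector: keep the smallest q components whose
    eigenvalues (sorted in descending order) capture at least `threshold`
    of the total variance."""
    explained = np.cumsum(eigvals) / np.sum(eigvals)     # cumulative variance ratio
    q = int(np.searchsorted(explained, threshold)) + 1   # smallest q with ratio >= threshold
    return eigvecs[:, :q]
```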

PCA: step by step
5 Recast the data along the principal component axes.
In the previous steps, apart from standardization, we did not make any changes to the data. We
have just selected the Principal Components and formed the Feature Vector, but the input data set
is still expressed in terms of the original axes (i.e., in terms of the initial variables).
In this step, the aim is to use the Feature Vector formed using the eigenvectors of the covariance
matrix to reorient the data from the original axes to the ones represented by the Principal
Components. This can be done by multiplying the transpose of the original data set by the
transpose of the Feature Vector:

FinalDataSetT = FeatureVectorT · StandardizedOriginalDataSetT ,

i.e., we have
FinalDataSet = StandardizedOriginalDataSet · FeatureVector.
In so doing, we compress the dimension of the data, which along the principal axes are represented by
an N × q matrix (with q < p).

Remark
In this step, we are projecting each vector of the original data set onto the vector space whose basis is
given by the columns of the Feature Vector!
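
Step 5 then reduces to a single matrix product, FinalDataSet = StandardizedOriginalDataSet · FeatureVector;
a minimal sketch putting the previous sketches together (all function names are ours):

```python
def project(Z, Q):
    """Recast the standardized N x p data Z along the principal axes:
    returns the N x q matrix Z . Q."""
    return Z @ Q

# Illustrative pipeline using the sketches from Steps 1-4:
# Z, mu, sigma = standardize(X)
# lam, P = principal_components(covariance_matrix(Z))
# B = project(Z, feature_vector(lam, P))      # N x q reduced dataset
```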
Considerations about recasting the data along the principal component axes
In Step 5, we are performing a change of basis and this can have many interpretations:
FeatureVectorT is the change of basis matrix that transforms StandardizedOriginalDataSetT into
FinalDataSetT ;
geometrically, FeatureVectorT is a rotation and a stretch which transforms
StandardizedOriginalDataSetT into FinalDataSetT ;
the rows of FeatureVectorT are a set of new basis vectors (the chosen eigenvectors of the covariance
matrix) for expressing the columns of StandardizedOriginalDataSetT .

Limitations of PCA
The applicability of PCA is limited to linear correlations between the features, and it fails when this
assumption is violated. Linearity is essential for the change of basis! There is ongoing research on
nonlinearity, extending the PCA algorithm to the so-called kernel PCA.
Both the strength and weakness of PCA is that it is a non-parametric analysis;
The standardization process before constructing the covariance matrix could be a limitation. In fact,
in fields such as astronomy, all the signals are non-negative, and the standardization process will
force the mean of some astrophysical exposures to be zero, which consequently creates unphysical
negative fluxes.
If PCA is not performed properly, there is a high likelihood of information loss.

Application: Iris flower dataset
The Iris flower dataset collects the sizes of some parts of an Iris flower, together with a code identifying
three different Iris species. Each flower is described by a 5-dimensional vector: the
first component is the sepal length, the second one the sepal width, the third one the petal length, the
fourth one the petal width; the last component can be 0 (corresponding to the species Iris setosa), 1
(corresponding to the species Iris versicolor) or 2 (corresponding to the species Iris virginica). We have
150 observations, 50 for each species. Therefore, the dataset is a matrix of order 150 × 5 (see the
external file).
Let us consider only the first 4 columns, and compute their mean values and standard deviations:

µ1 = 5.84333, σ1 = 0.828066,
µ2 = 3.054, σ2 = 0.433594,
µ3 = 3.75867, σ3 = 1.76442,
µ4 = 1.19867, σ4 = 0.763161.

Therefore, the generic element xij of the i-th row and j-th column is standardized into
zij = (xij − µj )/σj ,    i = 1, . . . , 150,  j = 1, . . . , 4.

Application: Iris flower dataset
Now, we can construct the 4 × 4 covariance matrix of the standardized data:

C = [  1          −0.109369    0.871754    0.817954 ]
    [ −0.109369    1          −0.420516   −0.356544 ]
    [  0.871754   −0.420516    1           0.962757 ]
    [  0.817954   −0.356544    0.962757    1        ] .

The eigenvalues are

λ1 = 2.91082, λ2 = 0.921221, λ3 = 0.147353, λ4 = 0.0206077,

sorted in descending order.

Application: Iris flower dataset

The scree plot of the eigenvalues (percentage of variance explained by each component) helps to interpret
the PCA and decide how many components to retain. The start of the bending in the line (point of
inflexion, or elbow) indicates how many components should be retained. In this case, two components
should be retained.
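
A minimal sketch of how such a scree plot can be produced (matplotlib is assumed to be available; the
eigenvalues are those listed on the previous slide):

```python
import numpy as np
import matplotlib.pyplot as plt

eigvals = np.array([2.91082, 0.921221, 0.147353, 0.0206077])
ratio = 100 * eigvals / eigvals.sum()          # percentage of variance per component

plt.plot(range(1, len(eigvals) + 1), ratio, marker="o")
plt.xticks(range(1, len(eigvals) + 1))
plt.xlabel("Principal Component")
plt.ylabel("Percentage of variance explained")
plt.title("Scree plot for the Iris dataset")
plt.show()
```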

Application: Iris flower dataset
The corresponding eigenvectors are

p1 = (−0.522372, 0.263355, −0.581254, −0.565611)T ,
p2 = ( 0.372318, 0.925556, 0.0210948, 0.0654158)T ,
p3 = (−0.721017, 0.242033, 0.140892, 0.633801)T ,
p4 = ( 0.261996, −0.124135, −0.801154, 0.523546)T .
If we define the matrix P whose columns are the four eigenvectors, which diagonalizes the covariance
matrix C , then

P T C P = [ 2.91082   0          0          0         ]
          [ 0         0.921221   0          0         ]
          [ 0         0          0.147353   0         ]
          [ 0         0          0          0.0206077 ] ,
i.e., we obtain the diagonal matrix whose entries are the eigenvalues of C .

Application: Iris flower dataset
Since

(λ1 + λ2 )/(λ1 + λ2 + λ3 + λ4 ) = 0.95801,

we have that the first two principal components (along the directions of vectors p1 and p2 ) capture more
than 95% of the total variance of the Iris data.
Therefore, defining the 4 × 2 matrix Q (the Feature Vector) whose columns are the two eigenvectors p1
and p2 , i.e.,

Q = [ −0.522372   0.372318  ]
    [  0.263355   0.925556  ]
    [ −0.581254   0.0210948 ]
    [ −0.565611   0.0654158 ] ,

Application: Iris flower dataset
and the 150 × 4 matrix A of the standardized data (without the last column identifying the species), we
can obtain the reduced dataset represented by the 150 × 2 matrix B such that:

B T = Q T AT ,

i.e.,
B = AQ.
Denoting by zi1 , zi2 , zi3 and zi4 the four elements of the i-th row of the standardized data matrix A, the
generic elements bij of the i-th row of the reduced matrix B are:

bi1 = −0.522372 zi1 + 0.263355 zi2 − 0.581254 zi3 − 0.565611 zi4 ,


bi2 = 0.372318 zi1 + 0.925556 zi2 + 0.0210948 zi3 + 0.0654158 zi4 .

Therefore, the reduced standardized data set, according to PCA, is represented by a 150 × 2 matrix (see
the external file).
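
For completeness, here is an end-to-end sketch of the Iris reduction described above (scikit-learn is
assumed only for loading the data; the values may differ slightly from those reported here, e.g., in
eigenvector signs, depending on the dataset copy and the numerical routine):

```python
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data                                # 150 x 4: the four measured features
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardization

C = np.cov(Z, rowvar=False)                         # 4 x 4 covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]                   # sort eigenpairs in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals)                                      # compare with the eigenvalues above
print(eigvals[:2].sum() / eigvals.sum())            # ~0.96: two components suffice

Q = eigvecs[:, :2]                                  # 4 x 2 Feature Vector
B = Z @ Q                                           # 150 x 2 reduced dataset
print(B.shape)
```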

