Dimensionality Reduction
(Principal Component Analysis)
Pre-requisites
Basic Statistics
Elementary Linear Algebra
Lagrange Optimization
A Start-up Story
The Bottleneck!!
Too few customers, too many KYC & transaction features, i.e., few observations and many explanatory variables.
That is causing problems when predicting defaults and when segmenting/clustering customers.
But why? Why is it a problem?
Curse of Dimensionality
Supervised Learning (Prediction): increasing the number of features will not always improve prediction accuracy.
With many features and few observations we get low bias but high variance: the model captures the noise/idiosyncrasies of the training data.
The result is overfitting on the training data and poor predictive performance on test data.
Unsupervised Learning: clustering algorithms break down in high dimensions, since distances between points become less and less informative.
Dimensionality Reduction
What is the objective?
Choose an optimal set of features of lower dimensionality to improve classification accuracy or to enable efficient clustering.
Different methods can be used to reduce dimensionality:
Feature extraction
Feature selection
Dimensionality Reduction – Possible Strategies
Feature selection: chooses a subset of the original features.
Feature extraction: finds a set of new features (i.e., through some mapping f(.) from the existing features).
The mapping f(.) could be linear or non-linear.
[Diagram: feature extraction maps the original features x1, ..., xN through y = f(x) to new features y1, ..., yK; feature selection instead keeps a subset xi1, ..., xiK of the original features. In both cases K << N.]
Feature Extraction
Linear combinations (linear maps) are particularly attractive because they are simpler to compute and analytically tractable.
Every $y_j$ is a linear function of all the $x_i$'s (i.e., the original features).
Given $x \in \mathbb{R}^N$, find a $K \times N$ matrix $T$ such that
$y = Tx \in \mathbb{R}^K$, where $K \ll N$.
[Diagram: the matrix T maps x = (x1, ..., xN) to y = (y1, ..., yK) via y = Tx. This is a projection from the N-dimensional space to a K-dimensional space.]
Linear Transformation: Example
In a bank, for any customer $c$, the original features are
$x^{(c)} = \begin{pmatrix} \text{Spending} \\ \text{Adv. Payment} \\ \text{Payment Delay} \\ \text{Current Balance} \\ \text{Credit Limit} \\ \text{Min Pay Amount} \\ \text{Maximum Single Spend} \end{pmatrix}$
$T = \begin{pmatrix} 0.5 & 0 & 0 & 0.21 & 0.34 & 0 & 0.07 \\ 0.2 & 0.1 & 0 & 0.3 & 0.14 & 0.1 & 0 \end{pmatrix}$
$y^{(c)} = T\,x^{(c)} = \begin{pmatrix} 0.5 & 0 & 0 & 0.21 & 0.34 & 0 & 0.07 \\ 0.2 & 0.1 & 0 & 0.3 & 0.14 & 0.1 & 0 \end{pmatrix} \begin{pmatrix} \text{Spending} \\ \text{Adv. Payment} \\ \text{Payment Delay} \\ \text{Current Balance} \\ \text{Credit Limit} \\ \text{Min Pay Amount} \\ \text{Maximum Single Spend} \end{pmatrix}$
• $y^{(c)} = \begin{pmatrix} 0.5 \cdot \text{Spending} + 0 \cdot \text{Adv. Payment} + 0 \cdot \text{Payment Delay} + 0.21 \cdot \text{Current Balance} + 0.34 \cdot \text{Credit Limit} + 0 \cdot \text{Min Pay Amount} + 0.07 \cdot \text{Max Single Spend} \\ 0.2 \cdot \text{Spending} + 0.1 \cdot \text{Adv. Payment} + 0 \cdot \text{Payment Delay} + 0.3 \cdot \text{Current Balance} + 0.14 \cdot \text{Credit Limit} + 0.1 \cdot \text{Min Pay Amount} + 0 \cdot \text{Max Single Spend} \end{pmatrix}$
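A minimal sketch of this transformation in Python. The feature values in x_c are made up for a hypothetical customer, purely for illustration; only the matrix T comes from the slide.

import numpy as np

# Transformation matrix T (2 x 7): each row defines one new feature
T = np.array([
    [0.5, 0.0, 0.0, 0.21, 0.34, 0.0, 0.07],
    [0.2, 0.1, 0.0, 0.30, 0.14, 0.1, 0.00],
])

# Hypothetical feature vector for one customer (illustrative values only):
# spending, adv. payment, payment delay, current balance,
# credit limit, min pay amount, maximum single spend
x_c = np.array([1200.0, 100.0, 3.0, 4500.0, 10000.0, 250.0, 800.0])

# y = T x : two new features, each a linear combination of the original seven
y_c = T @ x_c
print(y_c)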
Feature Extraction: Objective
Find an optimal linear mapping $y = f(x)$ that minimizes information loss.
Criterion of minimizing information loss: represent the data as accurately as possible in the lower-dimensional space.
Principal Component Analysis uses exactly this criterion to optimally reduce the dimensionality of the data.
Let's Think 2-D!!
Linear Feature Extraction Geometry: New axes
[Figure: data plotted against the original variables x1 (horizontal axis) and x2 (vertical axis), with the new axes PC 1 and PC 2 overlaid.]
Projections onto PC 1 explain more of the data's variation than projections onto any other single axis.
New feature creation ≡ rotation of the axes.
New Axes Transformation (2-D)
Rotating the axes by an angle $\theta$ relates the original coordinates to the new ones:
$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}$
Since
$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
we can invert the relation:
$\begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$
$Z = AX$, where $A = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$
$z_1 = a_1^T X$, where $a_1^T = (\cos\theta \;\; \sin\theta)$, and thus $a_1^T a_1 = 1$
$z_2 = a_2^T X$, where $a_2^T = (-\sin\theta \;\; \cos\theta)$, and thus $a_2^T a_2 = 1$
Rotation of the axes yields new features that are linear combinations of the original features, and the combining weights form a unit vector.
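A quick numerical illustration of this point (the angle and the data point below are arbitrary, chosen only for the demo): the rotation matrix A has orthonormal rows, so each new feature is a unit-weight linear combination of the originals.

import numpy as np

theta = np.deg2rad(30.0)          # an arbitrary illustrative rotation angle
A = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

# Rows of A are orthonormal: A @ A.T is the identity matrix
print(np.allclose(A @ A.T, np.eye(2)))   # True

# New features z = A x are linear combinations of the original features,
# and each row of combining weights has unit length
x = np.array([2.0, 1.0])                 # an illustrative 2-D observation
z = A @ x
print(z, np.linalg.norm(A[0]), np.linalg.norm(A[1]))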
Next Obvious Question!
What rotation(s) is (are) optimal?
Which directions should we choose? How do we choose the "Principal Components"?
The Principal Components Rationale
The 1st principal component points in the direction of largest variance.
The 2nd principal component also points in the direction of largest variance, conditional on being orthogonal to the 1st principal component.
Each subsequent principal component
• is orthogonal to the previous ones, and
• points in the direction of largest variance of the residual subspace.
[Figure: a 2-D Gaussian dataset with its 1st and 2nd PCA axes overlaid.]
PCA Algorithm - Formalized
Step 1: Choose $PC_1 \equiv z_1 = a_1^T X$ such that $\mathrm{Var}(z_1)$ is maximum,
i.e., $\max_{a_1} \mathrm{Var}(z_1)$ subject to $a_1^T a_1 = 1$.
Step 2: Choose $PC_2 \equiv z_2 = a_2^T X$ such that $z_2 \perp z_1$ and $\mathrm{Var}(z_2)$ is maximum,
i.e., $\max_{a_2} \mathrm{Var}(z_2)$ subject to $a_2^T a_2 = 1$ and $z_2 \perp z_1$.
Step 3: Choose $PC_3 \equiv z_3 = a_3^T X$ such that $z_3 \perp z_1, z_2$ and $\mathrm{Var}(z_3)$ is maximum,
i.e., $\max_{a_3} \mathrm{Var}(z_3)$ subject to $a_3^T a_3 = 1$ and $z_3 \perp z_1, z_2$.
⋮
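One way to realize this sequential definition numerically is sketched below. This is not the method derived on the following slides (which uses an eigendecomposition); here each direction is found by power iteration and the variance it explains is then deflated away, so the next direction maximizes the remaining variance.

import numpy as np

def leading_pc(S, n_iter=1000):
    # Power iteration: converges to the unit vector a maximizing a^T S a
    # (assuming the leading eigenvalue is well separated)
    a = np.random.default_rng(0).normal(size=S.shape[0])
    a /= np.linalg.norm(a)
    for _ in range(n_iter):
        a = S @ a
        a /= np.linalg.norm(a)
    return a

def sequential_pcs(X, k):
    # Find k principal directions one at a time, each orthogonal to the
    # previous ones because the explained variance is removed from S
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)
    directions = []
    for _ in range(k):
        a = leading_pc(S)
        directions.append(a)
        lam = a @ S @ a
        S = S - lam * np.outer(a, a)   # deflate: drop the variance already explained
    return np.array(directions)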
Optimization Problem
$PC_j \equiv z_j = a_j^T X$
Taking $X$ to be mean-centered (so that $E(XX^T) = \mathrm{Var}(X)$):
$\mathrm{Var}(z_j) = E[(a_j^T X)(a_j^T X)^T] = a_j^T E(XX^T)\, a_j = a_j^T \mathrm{Var}(X)\, a_j = a_j^T \Sigma\, a_j$
Optimization problem:
$\max_{a_j} \; a_j^T \Sigma\, a_j \quad \text{subject to } a_j^T a_j = 1 \;\; \forall j$
Optimization problem redefined, replacing $\Sigma$ by its sample estimate $\hat{\Sigma}$:
$\max_{a_j} \; a_j^T \hat{\Sigma}\, a_j \quad \text{subject to } a_j^T a_j = 1 \;\; \forall j$
Missing Orthogonality Constraints!!
In the optimization problem described above, why haven't we incorporated the requirement that $\mathrm{Cov}(z_i, z_j) = 0 \;\; \forall i \neq j$?
$\mathrm{Cov}(z_i, z_j) = \mathrm{Cov}(a_i^T X, a_j^T X) = E[(a_i^T X)(a_j^T X)^T] = a_i^T \Sigma\, a_j \approx a_i^T \hat{\Sigma}\, a_j$
So shouldn't we impose the additional restrictions $a_i^T \hat{\Sigma}\, a_j = 0 \;\; \forall i \neq j$ while solving the optimization problem?
Optimization Solution
The Lagrangian: $L = a_j^T \hat{\Sigma}\, a_j + \beta_j (1 - a_j^T a_j)$
$\frac{\partial L}{\partial a_j} = 0 \;\Rightarrow\; 2\hat{\Sigma} a_j - 2\beta_j a_j = 0$
$(\hat{\Sigma} - \beta_j I)\, a_j = 0$
Therefore the $\beta_j$ are the eigenvalues of $\hat{\Sigma}$ and the $a_j$ are the corresponding unit eigenvectors.
Aah!! So we have effectively found all the PCs, since the $a_j$'s are the combining weights of the PCs.
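A minimal numerical sketch of this result, using a small random dataset (purely illustrative): the principal directions come straight out of the eigendecomposition of the sample covariance matrix.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))            # illustrative data: 200 observations, 4 features
Xc = X - X.mean(axis=0)                  # mean-center
S = np.cov(Xc, rowvar=False)             # sample covariance matrix (Sigma-hat)

# eigh is the right routine for symmetric matrices; eigenvalues come back ascending
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]        # sort descending: beta_1 >= beta_2 >= ...
betas = eigvals[order]
A = eigvecs[:, order]                    # column j is a_j, the j-th combining weights

# Each a_j is a unit vector, and Var(z_j) = a_j^T S a_j = beta_j
print(np.allclose(np.linalg.norm(A, axis=0), 1.0))
print(np.allclose([a @ S @ a for a in A.T], betas))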
Symmetric Matrix – Our Saviour!!!
Theorem: eigenvectors corresponding to different eigenvalues of a symmetric matrix are orthogonal.
Proof: Let A be a symmetric matrix.
Let $A X_1 = \beta_1 X_1$ and $A X_2 = \beta_2 X_2$ with $\beta_1 \neq \beta_2$.
$X_2^T A X_1 = \beta_1 X_2^T X_1$ .......... (i)
$X_1^T A X_2 = \beta_2 X_1^T X_2$ .......... (ii)
Since $X_2^T A X_1$ is a $1 \times 1$ scalar, $X_2^T A X_1 = (X_2^T A X_1)^T = X_1^T A^T X_2 = X_1^T A X_2$ (because A is symmetric).
$\therefore \beta_1 X_2^T X_1 = \beta_2 X_1^T X_2 \;\Rightarrow\; (\beta_1 - \beta_2)\, X_1^T X_2 = 0$
Since $\beta_1 \neq \beta_2$, it follows that $X_1^T X_2 = 0$.
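A tiny numerical check of the theorem, using an arbitrary symmetric matrix chosen only for illustration (its eigenvalues are distinct):

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])           # an arbitrary symmetric matrix

eigvals, eigvecs = np.linalg.eigh(A)      # columns of eigvecs are eigenvectors

# Dot products between eigenvectors of distinct eigenvalues are (numerically) zero,
# so eigvecs.T @ eigvecs is the identity matrix
print(np.round(eigvecs.T @ eigvecs, 10))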
Glitch Solved!
$\hat{\Sigma}$ is the variance-covariance matrix and hence symmetric.
Thus the eigenvectors of $\hat{\Sigma}$ are orthogonal.
Thus, in the solution of the optimization problem, $a_i^T a_j = 0 \;\; \forall i \neq j$.
$\mathrm{Cov}(z_i, z_j) = \mathrm{Cov}(a_i^T X, a_j^T X) = E[(a_i^T X)(a_j^T X)^T] = a_i^T \Sigma\, a_j \approx a_i^T \hat{\Sigma}\, a_j = a_i^T (\hat{\Sigma} a_j) = a_i^T (\beta_j a_j) = \beta_j\, a_i^T a_j = 0$
$\mathrm{Var}(z_j) = a_j^T \Sigma\, a_j \approx a_j^T \hat{\Sigma}\, a_j = a_j^T (\hat{\Sigma} a_j) = a_j^T (\beta_j a_j) = \beta_j\, a_j^T a_j = \beta_j$
So the orthogonality (zero-covariance) requirement is satisfied automatically, and the variance of the j-th PC equals the j-th eigenvalue.
PCA – Computational Steps
Given the data set, compute the estimated covariance matrix of the original features, $\hat{\Sigma}$.
Compute its eigenvalues $\beta_j$ and the corresponding eigenvectors $a_j$.
The j-th principal component: $z_j = \bar{a}_j^T X$, where $\bar{a}_j = a_j / \lVert a_j \rVert$ is the unit-normalized eigenvector.
$\mathrm{Var}(z_j) = \beta_j$
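Putting these steps together, here is a from-scratch sketch (function and variable names are my own, not from the slides; numpy's eigh handles the eigendecomposition):

import numpy as np

def pca(X, k):
    # Follow the computational steps above: covariance matrix, eigendecomposition,
    # then project onto the k leading unit eigenvectors
    Xc = X - X.mean(axis=0)                 # work with mean-centered features
    S = np.cov(Xc, rowvar=False)            # estimated covariance matrix Sigma-hat
    betas, A = np.linalg.eigh(S)            # eigenvalues / unit eigenvectors (S is symmetric)
    order = np.argsort(betas)[::-1]         # largest variance first
    betas, A = betas[order], A[:, order]
    Z = Xc @ A[:, :k]                       # z_j = a_j^T x for each observation
    return Z, betas, A

# illustrative usage with random data
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
Z, betas, A = pca(X, k=2)
print(Z.shape)            # (100, 2)
print(betas[:2])          # Var(z_1), Var(z_2)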
How do we choose K?
K is typically chosen based on how much information (variance) we want to preserve.
We choose the minimum K such that $\dfrac{\sum_{j=1}^{K} \beta_j}{\sum_{j=1}^{N} \beta_j} > T$, where T is a threshold (e.g., 0.9).
If T = 0.9, for example, we "preserve" 90% of the information (variance) in the data.
If K = N, then we "preserve" 100% of the information in the data (i.e., just a change of basis). But that would be useless.
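A short sketch of this rule, working directly from the eigenvalues (the helper name choose_k is my own):

import numpy as np

def choose_k(betas, threshold=0.9):
    # Smallest K whose leading eigenvalues explain at least `threshold`
    # of the total variance
    betas = np.sort(betas)[::-1]
    explained = np.cumsum(betas) / betas.sum()
    return int(np.searchsorted(explained, threshold) + 1)

# illustrative eigenvalues (same as the worked example below)
print(choose_k(np.array([9.34, 0.41]), threshold=0.9))   # -> 1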
Computation - Example
Data: eight observations of two variables, X and Y.
X: 1 3 3 5 5 6 8 9
Y: 2 3 5 4 6 5 7 8
The Scatter Plot
[Figure: scatter plot of the eight (X, Y) observations, with both axes running from 0 to about 10.]
Covariance Matrix
$\mathrm{Var}(X) = \frac{1}{n}\sum_{j=1}^{n}(X_j - \bar{X})^2 = 6.25$
$\mathrm{Var}(Y) = \frac{1}{n}\sum_{j=1}^{n}(Y_j - \bar{Y})^2 = 3.5$
$\mathrm{Cov}(X, Y) = \frac{1}{n}\sum_{j=1}^{n}(X_j - \bar{X})(Y_j - \bar{Y}) = 4.25$
$\hat{\Sigma} = \begin{pmatrix} 6.25 & 4.25 \\ 4.25 & 3.5 \end{pmatrix}$
Eigenvalues
$\hat{\Sigma} a = \beta a \;\Rightarrow\; (\hat{\Sigma} - \beta I)\, a = 0$
For $a \neq 0$: $\lvert \hat{\Sigma} - \beta I \rvert = 0 \;\Rightarrow\; \begin{vmatrix} 6.25 - \beta & 4.25 \\ 4.25 & 3.5 - \beta \end{vmatrix} = 0$
$\beta_1 = 9.34$ and $\beta_2 = 0.41$
Explained variance of PC1 $= \dfrac{9.34}{9.34 + 0.41} = 96\%$
Explained variance of PC2 $= \dfrac{0.41}{9.34 + 0.41} = 4\%$
Eigenvectors
$a_1 = \begin{pmatrix} 0.81 \\ 0.59 \end{pmatrix}, \quad a_2 = \begin{pmatrix} -0.59 \\ 0.81 \end{pmatrix}, \quad a_1^T a_2 = 0$
$\bar{a}_1 = \dfrac{a_1}{\lVert a_1 \rVert} = \begin{pmatrix} 0.8066 \\ 0.587 \end{pmatrix}$
$\mathrm{PC}_1 = \bar{a}_1^T \begin{pmatrix} X \\ Y \end{pmatrix} = 0.8066\, X + 0.587\, Y$
The Principal Component
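To check the worked example numerically, here is a short sketch using numpy (small differences from the slide's rounded figures are expected):

import numpy as np

X = np.array([1, 3, 3, 5, 5, 6, 8, 9], dtype=float)
Y = np.array([2, 3, 5, 4, 6, 5, 7, 8], dtype=float)

# population covariance matrix (divide by n, as in the slides)
data = np.vstack([X, Y])
S = np.cov(data, bias=True)
print(S)                          # [[6.25, 4.25], [4.25, 3.5]]

betas, A = np.linalg.eigh(S)      # eigenvalues in ascending order
print(betas[::-1])                # approx [9.34, 0.41]
print(betas[::-1] / betas.sum())  # explained variance: approx [0.96, 0.04]

a1 = A[:, -1]                     # unit eigenvector for the largest eigenvalue
a1 = a1 if a1[0] > 0 else -a1     # fix the sign for readability
print(a1)                         # approx [0.81, 0.59]

PC1 = a1 @ data                   # PC1 = 0.8066*X + 0.587*Y, as on the slide
print(PC1)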