Lec 15
[Figure: $\mathbf{x}_i = W\mathbf{z}_i$, where $W$ is the dictionary / factor loading matrix]
PCA vs Regression
The previous setup may seem like regression where “labels” are vectors and the model is a matrix instead of a vector
[Table: side-by-side comparison of Linear Regression and Low-rank Modelling]
If you plot the reconstruction error $\|X - \hat{X}\|_F^2$ as you increase $k$, you will find that after some golden value of $k$, the error drops much more slowly. This “knee” point is a good place to stop to get good bang-for-buck in terms of the accuracy-speed tradeoff.
Now … are orthonormal so … i.e. …
Used linearity of trace and … to show this
Similarly, can show that …
A prominent knee point, if one exists, gives us a good idea of
the true intrinsic dimensionality of the data
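A minimal NumPy sketch (not from the slides) of how such a knee can be spotted, assuming the data matrix X has one point per row and using the fact that the rank-$k$ error equals the sum of squared singular values beyond the $k$-th:

```python
import numpy as np

# Synthetic data: points near a 5-dim subspace of R^50, plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 50)) + 0.1 * rng.normal(size=(500, 50))

# || X - X_k ||_F^2 = sum of squared singular values beyond the k-th
s = np.linalg.svd(X, compute_uv=False)
tail = np.cumsum((s ** 2)[::-1])[::-1]        # tail[k] = sum_{j >= k} s_j^2 (0-indexed)
for k in range(1, 11):
    print(k, float(tail[k]))
# The printed error drops sharply up to k = 5 and only slowly afterwards --
# that "knee" is the intrinsic dimensionality of this synthetic data.
```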
Applications of PCA – Noise Removal
Treat images as matrices and “smoothen” the image by taking a low-rank approximation (a minimal sketch follows)
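A minimal sketch of that idea on a synthetic image (a stand-in for a real one), keeping only the top few singular triplets:

```python
import numpy as np

# A synthetic low-rank "image" corrupted with noise (stand-in for a real image)
rng = np.random.default_rng(0)
clean = np.outer(np.sin(np.linspace(0, 3, 128)), np.cos(np.linspace(0, 3, 128)))
noisy = clean + 0.2 * rng.normal(size=clean.shape)

# Smoothen by keeping only the top-k singular triplets of the image matrix
k = 5
U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
smoothed = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print("noise before:", np.linalg.norm(noisy - clean))
print("noise after: ", np.linalg.norm(smoothed - clean))
```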
If data features are high-dimensional but the data really lies close to a $k$-dim subspace, then it may be noise that is making the data look full-dimensional
PCA can extract the important (hidden) data features – can then learn ML models on those
Given the data, compute PCA and use the $\mathbf{z}_i$'s as a set of new $k$-dimensional features for the data points
Training algos would speed up if used with $k$-dim features instead of the original high-dim features
Testing may not speed up since for a given test point, we will first have to find out its $k$-dim representation – would need to compute it
Notice that we have $\mathbf{z}_i = W^\top\mathbf{x}_i$ (since $W$ is orthonormal), so even for training features we can compute the $k$-dim rep by just hitting $\mathbf{x}_i$ with $W^\top$ (sketched below)
http://personales.upv.es/jmanjon/
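A minimal NumPy sketch of this feature-extraction recipe, assuming rows of X are data points and taking W to be the top-$k$ right singular vectors:

```python
import numpy as np

# Data with low-dim structure: n points, each 100-dimensional
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10)) @ rng.normal(size=(10, 100)) + 0.1 * rng.normal(size=(1000, 100))
k = 10

# W = top-k right singular vectors (columns); acts as the orthonormal "dictionary"
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:k].T                               # shape (100, k), W^T W = I_k

# k-dim training features: just hit each point with W^T
Z = X @ W                                  # shape (1000, k)

# Same for a new test point
x_test = rng.normal(size=100)
z_test = W.T @ x_test                      # shape (k,)
print(Z.shape, z_test.shape)
```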
Foreground-Background Separation
[Figure: a video frame decomposed as a background image plus a foreground image]
Note: here we are treating images as vectors and a group of images is treated as a matrix
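A minimal sketch of this idea on a synthetic “video” whose frames are stacked as rows; the low-rank part plays the role of the static background and the residual the foreground (real systems often use robust variants of PCA, but a plain truncated SVD conveys the point):

```python
import numpy as np

# Synthetic "video": 200 frames of a fixed background plus a small moving patch
rng = np.random.default_rng(0)
h, w, T = 32, 32, 200
background = rng.uniform(size=(h, w))
frames = np.tile(background.ravel(), (T, 1))          # one flattened frame per row
for t in range(T):
    start = t % (h * w - 16)
    frames[t, start:start + 16] += 1.0                # crude moving "foreground"

# Rank-1 approximation ~ background; what is left over ~ foreground
U, s, Vt = np.linalg.svd(frames, full_matrices=False)
if Vt[0].sum() < 0:                                   # fix SVD sign ambiguity
    U[:, 0], Vt[0] = -U[:, 0], -Vt[0]
bg = np.outer(U[:, 0] * s[0], Vt[0])
fg = frames - bg
rel_err = np.linalg.norm(bg[0] - background.ravel()) / np.linalg.norm(background)
print("relative error of recovered background:", rel_err)
```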
Eigenfaces [Sirovich and Kirby; Turk and Pentland]
An iconic application of PCA – given face images, the
prototypes given by the leading few right singular
vectors are called “eigenfaces”
Images are treated as vectors for this application (and many others)
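A minimal NumPy sketch, using random stand-in vectors in place of a real face dataset (the images are commonly mean-centered first, an assumption here):

```python
import numpy as np

# Stand-in for a face dataset: n images, each flattened to a d-dim vector
rng = np.random.default_rng(0)
n, side = 400, 24
faces = rng.uniform(size=(n, side * side))

# Center and take the leading right singular vectors: these are the "eigenfaces"
mean_face = faces.mean(axis=0)
_, _, Vt = np.linalg.svd(faces - mean_face, full_matrices=False)
k = 10
eigenfaces = Vt[:k]                        # each row reshapes to a side x side image

# Any face can then be summarized by its k coefficients on the eigenfaces
coeffs = (faces - mean_face) @ eigenfaces.T
print(eigenfaces.shape, coeffs.shape)
```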
Latent Semantic Analysis (LSA/LSI)
Used to be a very popular course project
Given $n$ documents, each document as a bag-of-words representation with a very large dictionary size, discover “topics” about which the documents are talking
Topics could be sports, education, politics, science, entertainment
Each topic here is represented by a prototypical document for that topic. All documents are linear combinations of the topics. The amount of weight a document places on a certain topic in its representation tells us a lot about what that document is talking about, e.g. if … then the …-th topic is really core to the …-th document.
Word of caution: PCA itself will not tell you whether blah prototype is about sports or bleh prototype is about politics. You have to take a look at the prototype vector, also look at documents that place lots of weight on that prototype, and make these deductions.
[Figure: $X$ ($n$ docs $\times$ words) $\approx$ (docs $\times$ topics) $\times$ (topics $\times$ words)]
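A minimal sketch on a toy corpus (the documents and vocabulary here are made up for illustration):

```python
import numpy as np

# Tiny bag-of-words matrix: rows = documents, columns = dictionary words
# (a real corpus would be n docs x a very large vocabulary)
docs = [
    "the team won the match",
    "the election results were announced",
    "the player scored a goal",
    "the minister gave a speech",
]
vocab = sorted({w for d in docs for w in d.split()})
X = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)

# Truncated SVD: X ~ (docs x topics) @ diag(strengths) @ (topics x words)
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
doc_topic = U[:, :k] * s[:k]          # how much each document loads on each topic
topic_word = Vt[:k]                   # each topic as a weighting over dictionary words

for t in range(k):
    top = np.argsort(-np.abs(topic_word[t]))[:3]
    print("topic", t, "->", [vocab[i] for i in top])
```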
Recommendation Systems
A popular RecSys technique is Collaborative Filtering
Have data for users and their interactions with items
For these users, predict other items that they would also like
Done by discovering “user types” and representing each user as a combination of these user types
Word of caution: a recommendation system where users have features is called content-based filtering. However, in the setting in this slide, users have no features.
A drawback of collaborative filtering is that it gets more difficult to add new users to the system.
[Figure: ($n$ users $\times$ items) $\approx$ (users $\times$ types) $\times$ (types $\times$ items)]
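A minimal sketch of the low-rank factorization idea, assuming (unrealistically) a fully observed interaction matrix; handling missing entries, e.g. via alternating least squares, is what real collaborative filtering adds on top:

```python
import numpy as np

# Synthetic user-item interaction matrix generated from a few "user types"
rng = np.random.default_rng(0)
n_users, n_items, n_types = 300, 50, 4
user_types = rng.uniform(size=(n_users, n_types))
type_items = rng.uniform(size=(n_types, n_items))
ratings = user_types @ type_items + 0.05 * rng.normal(size=(n_users, n_items))

# Factorize: each user becomes a combination of k discovered "user types"
k = 4
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
user_rep = U[:, :k] * s[:k]            # users x types
item_rep = Vt[:k]                      # types x items

# Predicted affinity of user 0 for every item; recommend the top-scoring ones
pred = user_rep[0] @ item_rep
print("top items for user 0:", np.argsort(-pred)[:5])
```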
The Many Faces of PCA
Has been “discovered” several times
[Pearson, 1901; Hotelling, 1930]
Gives us the best possible (in terms of Frobenius norm error) low-rank approximation of our data
This line captures most of the information in the data. Why not get rid of the other information? Save space, reduce noise!
Can be thought of as giving low-dim reps of our feature vectors so that pairwise L2 distances among them are preserved – see multidimensional scaling (MDS)
Also gives us a new basis with smaller dim ($W$ is orthonormal) s.t. in that basis, data can be reconstructed with little error
Can also be thought of as giving us the directions along which the data has maximum variance
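A standard way to connect the reconstruction-error view and the maximum-variance view (the symbols here are assumed, not taken from the slide): for mean-centered points $\mathbf{x}_1,\dots,\mathbf{x}_n$ and an orthonormal $W \in \mathbb{R}^{d\times k}$ (so $W^\top W = I_k$),
$$\sum_{i=1}^{n}\bigl\|\mathbf{x}_i - WW^\top\mathbf{x}_i\bigr\|_2^2 \;=\; \sum_{i=1}^{n}\|\mathbf{x}_i\|_2^2 \;-\; \sum_{i=1}^{n}\bigl\|W^\top\mathbf{x}_i\bigr\|_2^2,$$
so minimizing the reconstruction error over orthonormal $W$ is the same as maximizing $\sum_i \|W^\top\mathbf{x}_i\|_2^2$, i.e. the variance captured along the chosen directions.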
PCA Minimizes Reconstruction Error
Already seen that for any matrix, its best rank-$k$ approximation is obtained by taking the leading $k$ singular triplets
We can show that for $k > 1$ as well, PCA offers the best reconstruction error. In that case, instead of optimizing over a unit vector, we would have to optimize over an orthonormal matrix.
Proof for $k = 1$: want the best rank-1 approximation, i.e. want to fit all data points on a 1D subspace, i.e. find a unit vector $\mathbf{w}$ s.t. the data is better represented along the subspace spanned by $\mathbf{w}$ than along any other vector
Claim: any vector $\mathbf{x}$ is best represented in that subspace as $(\mathbf{w}^\top\mathbf{x})\,\mathbf{w}$
Proof: we want $\min_{\alpha}\|\mathbf{x}-\alpha\mathbf{w}\|_2^2$; apply first-order optimality now
Thus, the reconstruction error for the entire dataset is as worked out below
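The elided steps, written out in standard notation (a sketch, assuming rows of $X$ are the data points):
$$\|\mathbf{x}-\alpha\mathbf{w}\|_2^2 = \|\mathbf{x}\|_2^2 - 2\alpha\,\mathbf{w}^\top\mathbf{x} + \alpha^2 \;\;\Rightarrow\;\; \alpha^\ast = \mathbf{w}^\top\mathbf{x} \quad\text{(first-order optimality, using } \|\mathbf{w}\|_2 = 1),$$
$$\sum_{i=1}^{n}\bigl\|\mathbf{x}_i - (\mathbf{w}^\top\mathbf{x}_i)\mathbf{w}\bigr\|_2^2 \;=\; \sum_{i=1}^{n}\|\mathbf{x}_i\|_2^2 \;-\; \sum_{i=1}^{n}(\mathbf{w}^\top\mathbf{x}_i)^2,$$
so the best unit vector maximizes $\sum_i(\mathbf{w}^\top\mathbf{x}_i)^2 = \mathbf{w}^\top X^\top X\,\mathbf{w}$, which is achieved by the leading right singular vector of $X$.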
[Figure: $\mathbf{x}_i = W\mathbf{z}_i$, where $W$ is the dictionary / factor loading matrix]
Probabilistic PCA [Tipping and Bishop, 1999]
It is very unusual for latent variables to simply integrate out like this, leaving behind a nice Gaussian density. We got very lucky here. Usually latent variables mean AltOpt/EM.
Given samples, we wish to recover …
Note: this is a generative problem, i.e. deals with generation of
feature vectors
As discussed before, the original data are “latent” – not seen
Also clear from the noise model that …
More flexible models possible e.g. Factor Analysis – will see later
Will first see how to recover the parameters and then head into recovering the latent variables
Some mildly painful integrals later we can get …, where … and … (the standard form is given below)
Note: … is always invertible because of … (important since …)
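For reference, the standard PPCA model and the marginal that these integrals produce (notation as in Tipping and Bishop; the slide's own symbols are not visible here):
$$\mathbf{z}_i \sim \mathcal{N}(\mathbf{0},\ I_k), \qquad \mathbf{x}_i \mid \mathbf{z}_i \sim \mathcal{N}(W\mathbf{z}_i,\ \sigma^2 I_d) \;\;\Longrightarrow\;\; \mathbf{x}_i \sim \mathcal{N}(\mathbf{0},\ C), \quad C = WW^\top + \sigma^2 I_d,$$
and $C$ is always invertible thanks to the $\sigma^2 I_d$ term (important since $WW^\top$ has rank at most $k < d$).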
Can apply first-order optimality to get the MLE (painful derivatives though)
Thankfully, the end result is very familiar. Let … be the eigendecomposition of …, where … and … with …
An eigendecomposition always exists here since the matrix is square symmetric
The MLE is … where … (see below)
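The familiar closed form referred to above, in standard notation (again an assumption about the slide's symbols): if the sample covariance $S = \frac{1}{n}\sum_i \mathbf{x}_i\mathbf{x}_i^\top$ has eigendecomposition $S = Q\Lambda Q^\top$ with eigenvalues $\lambda_1 \ge \dots \ge \lambda_d$, then
$$W_{\mathrm{MLE}} = Q_k\,(\Lambda_k - \sigma^2 I_k)^{1/2} R, \qquad \sigma^2_{\mathrm{MLE}} = \frac{1}{d-k}\sum_{j=k+1}^{d}\lambda_j,$$
where $Q_k$ and $\Lambda_k$ keep the top $k$ eigenvectors/eigenvalues and $R$ is an arbitrary $k\times k$ rotation.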
PPCA Variants
We can tweak several moving parts in the PPCA generative
process
Can instead assume that … and estimate …
Can assume non-spherical noise and estimate …
A technique called Factor Analysis actually uses a non-spherical noise model
Since PPCA is a generative model, it can model missing data too
Suppose we have already found out the parameters using clean training data
If test data has missing features, use the fact that marginals of Gaussians are Gaussian. Since we know …, we can see … (… has only the observed rows); this is made concrete below
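Concretely (with assumed notation): split a test point into observed and missing parts, $\mathbf{x} = (\mathbf{x}_o, \mathbf{x}_m)$. Since $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, C)$ with $C = WW^\top + \sigma^2 I$,
$$\mathbf{x}_o \sim \mathcal{N}\bigl(\mathbf{0},\ W_o W_o^\top + \sigma^2 I\bigr),$$
where $W_o$ keeps only the observed rows of $W$, so the likelihood of the observed coordinates can be evaluated directly.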
Missing test data is easier to handle, missing training data more
challenging
Need to apply the above trick when defining the likelihood of training data
points
Each training data point may have different coordinates missing. If we do not …
Dimensionality Reduction using PPCA
With PCA we got low-dim feat. easily
These made sense because they are $k$-dim features which are just a rotation away (using $W$) from features which we know approximate the original data very well
With PPCA too we can recover the original low-dim features by treating them as latent variables and applying AltOpt or EM
Need to be careful since these latent variables are (continuous) vectors
now!
Earlier, we used a shortcut to get the MLE for the parameters. To get hold of the latent variables we need proper AltOpt/EM
AltOpt will approximate the integral using a single term (a single value for the latent vector)
EM will lower bound the integral using another (easier to compute)
integral
Need to replace “sum” over possible values of the latent vector with “integral” since it is continuous
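The key quantity these updates need is the posterior over the latent vector, which is again Gaussian (the standard PPCA result, notation assumed):
$$p(\mathbf{z}\mid\mathbf{x}) = \mathcal{N}\bigl(M^{-1}W^\top\mathbf{x},\ \sigma^2 M^{-1}\bigr), \qquad M = W^\top W + \sigma^2 I_k,$$
so AltOpt can plug in the posterior mean $M^{-1}W^\top\mathbf{x}$ as its single value of $\mathbf{z}$, while EM averages over the whole posterior.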
These derivations are routine but tedious (see [BIS] Chap 12)
The rest of the algorithm (can be shown to) remains the same (see [BIS] Chap 12)
PPCA – Expectation Maximization
Time complexity of PPCA using EM is roughly the same as that of PPCA using AltOpt.
EXPECTATION MAXIMIZATION
1. Initialize the parameters …
2. For t = 1, 2, …
   1. Update the latent-variable posteriors, fixing the parameters
      1. Let …, for …
      2. Let …
   2. Update the parameters, fixing the latent-variable posteriors
      1. Update …
      2. Calculate …
      3. Update …
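A minimal NumPy sketch of these EM updates (following the standard PPCA EM equations as in [BIS] Chap 12; the data is assumed centered, one point per row):

```python
import numpy as np

def ppca_em(X, k, n_iters=100, seed=0):
    """EM for probabilistic PCA on centered data X (n x d). Returns W (d x k) and sigma^2."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(d, k))
    sigma2 = 1.0
    for _ in range(n_iters):
        # E-step: posterior moments of each latent z_i
        M = W.T @ W + sigma2 * np.eye(k)                 # k x k
        Minv = np.linalg.inv(M)
        Ez = X @ W @ Minv                                # n x k, rows are E[z_i]
        sumEzz = n * sigma2 * Minv + Ez.T @ Ez           # sum_i E[z_i z_i^T]
        # M-step: update W and sigma^2 fixing the posterior moments
        W_new = (X.T @ Ez) @ np.linalg.inv(sumEzz)       # d x k
        sigma2 = (np.sum(X ** 2)
                  - 2.0 * np.sum(Ez * (X @ W_new))
                  + np.trace(sumEzz @ W_new.T @ W_new)) / (n * d)
        W = W_new
    return W, sigma2

# Toy check: data generated from a 3-dim latent space embedded in 20 dims
rng = np.random.default_rng(1)
Z = rng.normal(size=(2000, 3))
W_true = rng.normal(size=(20, 3))
X = Z @ W_true.T + 0.1 * rng.normal(size=(2000, 20))
W_hat, s2_hat = ppca_em(X - X.mean(axis=0), k=3)
print("estimated noise variance:", s2_hat)   # should be close to 0.01
```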