AE - Topic 2 - Principal Component Analysis
This technique reduces the dimensionality of a data set containing a large number of (interrelated) variables. It was initially developed by Pearson (1901), although it was not until Hotelling (1933) that it received its algebraic formulation. To this end, the original variables are transformed into a new set of variables (called principal components) that are orthogonal and uncorrelated, and that can be ordered according to the proportion of the variance of the original variables that they explain.
The first principal component is the normalized linear combination $\alpha_1^T X$ of maximum variance. Since
\[
\operatorname{Var}(\alpha_1^T X) = \alpha_1^T D(X)\,\alpha_1 = \alpha_1^T \Sigma\, \alpha_1,
\]
the problem to solve is
\[
\max_{\alpha_1}\; \alpha_1^T \Sigma\, \alpha_1, \quad \text{s.t. } \alpha_1^T \alpha_1 = 1.
\]
Using Lagrange multipliers,
\[
\max_{\alpha_1}\; \alpha_1^T \Sigma\, \alpha_1 - \lambda(\alpha_1^T \alpha_1 - 1).
\]
Differentiating with respect to $\alpha_1$ and setting the derivative to zero,
\[
(\Sigma + \Sigma^T)\alpha_1 - \lambda(I + I^T)\alpha_1 = 2\Sigma\alpha_1 - 2\lambda\alpha_1 = 0 \;\Rightarrow\; \Sigma\alpha_1 = \lambda\alpha_1,
\]
so $\alpha_1$ is an eigenvector of $\Sigma$ and $\lambda$ its associated eigenvalue. Moreover,
\[
\max_{\alpha_1^T\alpha_1 = 1}\; \alpha_1^T \Sigma\, \alpha_1 = \max_{\alpha_1}\; \alpha_1^T \lambda \alpha_1 = \max\; \lambda,
\]
so $\lambda$ is the largest eigenvalue of $\Sigma$.
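The derivation above can be checked numerically. Below is a minimal NumPy sketch (the data and covariance values are purely illustrative): the first principal direction is taken as the eigenvector of the sample covariance matrix with the largest eigenvalue, and the variance of the projected data then matches that eigenvalue.

```python
import numpy as np

# Illustrative synthetic data with correlated variables.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0.0, 0.0, 0.0],
                            cov=[[3.0, 1.0, 0.5],
                                 [1.0, 2.0, 0.3],
                                 [0.5, 0.3, 1.0]],
                            size=500)

# Sample covariance matrix (playing the role of Sigma).
S = np.cov(X, rowvar=False)

# For symmetric matrices, eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(S)
alpha1 = eigvecs[:, -1]            # eigenvector of the largest eigenvalue
lambda1 = eigvals[-1]

# Var(alpha_1^T X), computed directly on the projected observations,
# equals the largest eigenvalue, as the derivation shows.
proj_var = np.var(X @ alpha1, ddof=1)
print(np.isclose(proj_var, lambda1))
```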
Let us now derive the second principal component:
\[
\max_{\alpha_2}\; \operatorname{Var}(\alpha_2^T X), \quad \text{s.t. } \alpha_2^T\alpha_2 = 1, \;\; \operatorname{cov}(\alpha_1^T X, \alpha_2^T X) = 0,
\]
which is equivalent to
\[
\max_{\alpha_2}\; \alpha_2^T \Sigma\, \alpha_2, \quad \text{s.t. } \alpha_2^T\alpha_2 = 1, \;\; \alpha_1^T\alpha_2 = 0.
\]
In fact,
\[
\operatorname{cov}(\alpha_2^T X, \alpha_1^T X) = \alpha_2^T \Sigma\, \alpha_1 = \alpha_2^T \lambda \alpha_1 = \lambda\,\alpha_2^T \alpha_1 = 0 \;\Leftrightarrow\; \alpha_2^T\alpha_1 = 0.
\]
Using Lagrange multipliers,
\[
\max_{\alpha_2}\; \alpha_2^T \Sigma\, \alpha_2 - \lambda(\alpha_2^T\alpha_2 - 1) - \phi(\alpha_1^T\alpha_2 - 0).
\]
Differentiating with respect to $\alpha_2$ and setting the derivative to zero,
\[
2\Sigma\alpha_2 - 2\lambda\alpha_2 - \phi\alpha_1 = 0,
\]
Multiplying on the left by $\alpha_1^T$,
\[
2\underbrace{\alpha_1^T\Sigma\,\alpha_2}_{0} - 2\lambda\underbrace{\alpha_1^T\alpha_2}_{0} - \phi\underbrace{\alpha_1^T\alpha_1}_{1} = 0 \;\Rightarrow\; \phi = 0,
\]
so again $\Sigma\alpha_2 = \lambda\alpha_2$, and $\lambda$ is the second largest eigenvalue of $\Sigma$.
In general, the $m$-th principal component solves
\[
\max_{\alpha_m}\; \alpha_m^T \Sigma\, \alpha_m, \quad \text{s.t. } \alpha_m^T\alpha_m = 1, \;\; \alpha_i^T\alpha_m = 0, \;\forall i = 1,\dots,m-1.
\]
Using Lagrange multipliers,
\[
\max_{\alpha_m}\; \alpha_m^T \Sigma\, \alpha_m - \lambda(\alpha_m^T\alpha_m - 1) - \sum_{i=1}^{m-1}\phi_i(\alpha_i^T\alpha_m - 0).
\]
Differentiating and multiplying on the left by $\alpha_j^T$, $j = 1,\dots,m-1$, then
\[
2\underbrace{\alpha_j^T\Sigma\,\alpha_m}_{0} - 2\lambda\underbrace{\alpha_j^T\alpha_m}_{0} - \underbrace{\sum_{i=1}^{m-1}\phi_i\,\alpha_j^T\alpha_i}_{\phi_j} = 0 \;\Rightarrow\; \phi_j = 0.
\]
As before, $\Sigma\alpha_m = \lambda\alpha_m$: $\lambda$ is an eigenvalue of $\Sigma$ associated with the eigenvector $\alpha_m$.
Some important observations:

Let $\lambda_1 \geq \cdots \geq \lambda_p$ be the eigenvalues of $\Sigma$ and $\alpha_1, \dots, \alpha_p$ the corresponding eigenvectors. Then the principal components are
\[
Y_1 = \alpha_1^T X, \quad \dots, \quad Y_m = \alpha_m^T X, \quad \dots, \quad Y_p = \alpha_p^T X,
\]
or, in matrix form,
\[
Y = [Y_1 \cdots Y_p] = [\alpha_1^T X \cdots \alpha_p^T X] = P^T X,
\]
where $P = [\alpha_1 \cdots \alpha_p]$.
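These observations can be sketched in NumPy (synthetic data; variable names are ours): sorting the eigenpairs of the sample covariance matrix and projecting the centered data yields components that are uncorrelated, with variances equal to the eigenvalues.

```python
import numpy as np

# Illustrative correlated data.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 4)) @ rng.normal(size=(4, 4))

S = np.cov(X, rowvar=False)
eigvals, P = np.linalg.eigh(S)
# Reorder so that lambda_1 >= ... >= lambda_p, as in the text.
order = np.argsort(eigvals)[::-1]
eigvals, P = eigvals[order], P[:, order]

# Scores: row k of Y holds (Y_1, ..., Y_p) = P^T x_k for observation x_k.
Xc = X - X.mean(axis=0)
Y = Xc @ P

# The components are uncorrelated and Var(Y_j) = lambda_j.
print(np.allclose(np.cov(Y, rowvar=False), np.diag(eigvals)))
```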
Theorem. The sum of the variances of the original variables is equal to the sum
of the variances of the principal components.
Proof.
\[
\sum_{i=1}^{p}\operatorname{Var}(X_i) = \operatorname{tr}(\Sigma) = \operatorname{tr}(P\Lambda P^T) = \operatorname{tr}(P^T P\Lambda) = \operatorname{tr}(\Lambda) = \sum_{i=1}^{p}\lambda_i = \sum_{i=1}^{p}\operatorname{Var}(Y_i),
\]
where the third equality holds because the trace is invariant under cyclic permutations, and the fourth because $P^T P = I$.
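The theorem is easy to verify numerically; a minimal sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))

S = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(S)

# Sum of variances of the original variables = tr(Sigma)
# = sum of eigenvalues = sum of variances of the components.
print(np.isclose(np.trace(S), eigvals.sum()))
```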
The above theorem establishes a very useful relationship between the variance of the original variables and the variance of the principal components. However, a more general relationship can be established between the dispersion matrix of the original variables and the principal axes, namely the spectral decomposition
\[
\Sigma = \lambda_1\alpha_1\alpha_1^T + \cdots + \lambda_p\alpha_p\alpha_p^T.
\]
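The spectral decomposition can be verified directly: summing the rank-one terms $\lambda_j\alpha_j\alpha_j^T$ rebuilds $\Sigma$. A sketch with an arbitrary symmetric positive semi-definite matrix standing in for $\Sigma$:

```python
import numpy as np

# An arbitrary symmetric PSD matrix playing the role of Sigma.
rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))
Sigma = A @ A.T

eigvals, eigvecs = np.linalg.eigh(Sigma)
# Sum of rank-one terms lambda_j * alpha_j alpha_j^T.
recon = sum(l * np.outer(v, v) for l, v in zip(eigvals, eigvecs.T))
print(np.allclose(Sigma, recon))
```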
The correlation between an original variable and a principal component is
\[
\operatorname{corr}(X_i, Y_j) = \frac{\operatorname{cov}(X_i, Y_j)}{\sqrt{\operatorname{Var}(X_i)}\sqrt{\operatorname{Var}(Y_j)}} = \frac{\lambda_j\, p_{ij}}{\sqrt{\sigma_{ii}}\sqrt{\lambda_j}} = p_{ij}\,\frac{\lambda_j^{1/2}}{\sqrt{\sigma_{ii}}},
\]
where $p_{ij}$ denotes the $(i,j)$ element of $P$.
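A sketch checking this formula against the empirical correlation (synthetic data; the indices i, j are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 3))

S = np.cov(X, rowvar=False)
eigvals, P = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, P = eigvals[order], P[:, order]

Y = (X - X.mean(axis=0)) @ P       # principal component scores

i, j = 0, 1
# Empirical correlation between X_i and Y_j ...
emp = np.corrcoef(X[:, i], Y[:, j])[0, 1]
# ... against the formula p_ij * sqrt(lambda_j) / sqrt(sigma_ii).
formula = P[i, j] * np.sqrt(eigvals[j]) / np.sqrt(S[i, i])
print(np.isclose(emp, formula))
```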
When the number of observations $n$ is smaller than the number of variables $p$, the eigendecomposition can be obtained from a smaller matrix. Let $G$ denote the centered $n \times p$ data matrix and define
\[
\Sigma_l = \frac{1}{n-1}\, G^T G \quad (l = \text{long}), \qquad \Sigma_s = \frac{1}{n-1}\, G G^T \quad (s = \text{short}),
\]
and let $\Lambda_s$ and $\Lambda_l$ be their non-zero eigenvalues; then $\Lambda_s = \Lambda_l = \Lambda$. We have that
\[
\Sigma_s \phi_s = \phi_s \Lambda \;\Rightarrow\; \frac{1}{n-1}\, G G^T \phi_s = \phi_s \Lambda
\;\Rightarrow\; \frac{1}{n-1}\, G^T G G^T \phi_s = G^T \phi_s \Lambda
\;\Rightarrow\; \Sigma_l (G^T \phi_s) = (G^T \phi_s)\Lambda,
\]
so the columns of $G^T \phi_s$ are eigenvectors of $\Sigma_l$ associated with its non-zero eigenvalues.
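This gives a cheap way to obtain the principal axes when $n < p$: decompose the small $n \times n$ matrix and map its eigenvectors back. A sketch with a synthetic centered data matrix $G$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 10, 50                      # fewer observations than variables
G = rng.normal(size=(n, p))
G = G - G.mean(axis=0)             # G as the centered data matrix

S_long = G.T @ G / (n - 1)         # p x p ("long"), expensive for large p
S_short = G @ G.T / (n - 1)        # n x n ("short"), cheap

# Eigen-decompose the small matrix ...
vals_s, phi_s = np.linalg.eigh(S_short)
# ... and map its eigenvectors back: the columns of G^T phi_s are
# (unnormalized) eigenvectors of S_long with the same eigenvalues.
V = G.T @ phi_s

# Check Sigma_l (G^T phi_s) = (G^T phi_s) Lambda column by column.
print(np.allclose(S_long @ V, V * vals_s))
```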