
Math3806 Lecture Note 5 Appendix

Heng Peng

March 27, 2020


P5. Result 5.1. Proof: Let a* = a/‖a‖. Then

    a^T Σ a / (a^T a) = a*^T Σ a* = a*^T ( Σ_{i=1}^p λ_i e_i e_i^T ) a* = Σ_{i=1}^p λ_i (a*^T e_i)^2
                      ≤ λ_1 Σ_{i=1}^p (a*^T e_i)^2 = λ_1 a*^T ( Σ_{i=1}^p e_i e_i^T ) a*
                      = λ_1 a*^T P P^T a* = λ_1 ‖a*‖^2 = λ_1.

The bound is attained when a = e_1, hence

    max_{a ≠ 0} a^T Σ a / (a^T a) = λ_1.

Similarly, let a* = a/‖a‖ with a^T e_i = 0 for i = 1, …, k. Then

    a^T Σ a / (a^T a) = a*^T Σ a* = Σ_{i=1}^p λ_i (a*^T e_i)^2 = Σ_{i=k+1}^p λ_i (a*^T e_i)^2
                      ≤ λ_{k+1} Σ_{i=k+1}^p (a*^T e_i)^2 = λ_{k+1} a*^T ( Σ_{i=1}^p e_i e_i^T ) a*
                      = λ_{k+1} a*^T P P^T a* = λ_{k+1} ‖a*‖^2 = λ_{k+1},

where the sum may be extended back to i = 1 because a*^T e_i = 0 for i ≤ k. The bound is attained when a = e_{k+1}, so

    max_{a ≠ 0, a ⊥ e_1, …, e_k} a^T Σ a / (a^T a) = λ_{k+1}.
I Result 5.1, continued.

    Var(Y_i) = Var(e_i^T X) = e_i^T Var(X) e_i = e_i^T Σ e_i
             = e_i^T (λ_i e_i) = λ_i e_i^T e_i = λ_i,   i = 1, …, p.

    Cov(Y_i, Y_k) = Cov(e_i^T X, e_k^T X) = e_i^T Cov(X) e_k
                  = e_i^T Σ e_k = e_i^T (λ_k e_k) = λ_k e_i^T e_k = 0,   i ≠ k.

P5. Result 5.2.

    σ_11 + ⋯ + σ_pp = Σ_{i=1}^p Var(X_i) = Trace(Σ) = Trace(PΛP^T)
                    = Trace(ΛP^T P) = Trace(Λ) = λ_1 + ⋯ + λ_p.

The proportion of total population variance due to the kth principal component is

    λ_k / (λ_1 + ⋯ + λ_p) = λ_k / (σ_11 + ⋯ + σ_pp).
P6. Result 5.3. Let a_k^T = [0, …, 0, 1, 0, …, 0], so that X_k = a_k^T X. Then

    Cov(X_k, Y_i) = Cov(a_k^T X, e_i^T X) = a_k^T Σ e_i = a_k^T (λ_i e_i) = λ_i a_k^T e_i = λ_i e_ik.

Hence

    ρ_{Y_i, X_k} = Cov(Y_i, X_k) / (√Var(Y_i) √Var(X_k)) = λ_i e_ik / (√λ_i √σ_kk) = e_ik √λ_i / √σ_kk.

I Example 5.1.
    λ_1 = 5.83,  e_1^T = [.383, −.924, 0]
    λ_2 = 2.00,  e_2^T = [0, 0, 1]
    λ_3 = 0.17,  e_3^T = [.924, .383, 0]
Therefore the principal components are

    Y_1 = e_1^T X = .383 X_1 − .924 X_2
    Y_2 = e_2^T X = X_3
    Y_3 = e_3^T X = .924 X_1 + .383 X_2

    Var(Y_1) = (.383)^2 Var(X_1) + (−.924)^2 Var(X_2) + 2(.383)(−.924) Cov(X_1, X_2)
             = .147(1) + .854(5) − .708(−2) = 5.83 = λ_1
I Example 5.1, continued.

    Cov(Y_1, Y_2) = Cov(.383 X_1 − .924 X_2, X_3)
                  = .383 Cov(X_1, X_3) − .924 Cov(X_2, X_3) = 0

    σ_11 + σ_22 + σ_33 = 1 + 5 + 2 = λ_1 + λ_2 + λ_3 = 5.83 + 2.00 + .17

    λ_1 / (λ_1 + λ_2 + λ_3) = .73,   (λ_1 + λ_2) / (λ_1 + λ_2 + λ_3) = .98

    ρ_{Y_1, X_1} = e_11 √λ_1 / √σ_11 = .383 √5.83 / √1 = .925
    ρ_{Y_1, X_2} = e_12 √λ_1 / √σ_22 = −.924 √5.83 / √5 = −.998
    ρ_{Y_1, X_3} = e_13 √λ_1 / √σ_33 = 0 · √5.83 / √2 = 0
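As a numerical check, the covariance matrix implied by the computations in Example 5.1 (Var(X_1) = 1, Var(X_2) = 5, Var(X_3) = 2, Cov(X_1, X_2) = −2, other covariances zero — an inference from the Var(Y_1) calculation, not a matrix stated explicitly in the notes) reproduces the eigenvalues and correlations; a minimal sketch in NumPy:

```python
import numpy as np

# Covariance matrix inferred from the variances/covariances used in Example 5.1.
Sigma = np.array([[ 1., -2., 0.],
                  [-2.,  5., 0.],
                  [ 0.,  0., 2.]])

# eigh returns eigenvalues in ascending order; flip to descending.
eigvals, eigvecs = np.linalg.eigh(Sigma)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Correlations between Y1 and each X_k: rho = e_1k * sqrt(lambda_1) / sqrt(sigma_kk)
rho = eigvecs[:, 0] * np.sqrt(eigvals[0]) / np.sqrt(np.diag(Sigma))
```

The eigenvector signs returned by `eigh` are arbitrary, so the correlations are determined only up to a common sign.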
P9. Result 5.4. The result follows from the results above with Z_1, …, Z_p in place of X_1, …, X_p and ρ in place of Σ.
I The proportion of (standardized) population variance due to the kth principal component is λ_k / p, k = 1, …, p, where the λ_k's are the eigenvalues of ρ.
P10. Example 5.2. The eigenvalue-eigenvector pairs from Σ are
    λ_1 = 100.16,  e_1^T = [.040, .999]
    λ_2 = .84,     e_2^T = [.999, −.040]

    λ_1 / (λ_1 + λ_2) = 100.16 / 101 = .992

The respective principal components based on Σ are
    Y_1 = .040 X_1 + .999 X_2,   Y_2 = .999 X_1 − .040 X_2.
Similarly, the eigenvalue-eigenvector pairs from ρ are
    λ_1 = 1 + ρ = 1.4,  e_1^T = [.707, .707]
    λ_2 = 1 − ρ = .6,   e_2^T = [.707, −.707]
Then the respective principal components based on ρ are

    Y_1 = .707 Z_1 + .707 Z_2 = .707 (X_1 − μ_1)/1 + .707 (X_2 − μ_2)/10
        = .707(X_1 − μ_1) + .0707(X_2 − μ_2)
    Y_2 = .707 Z_1 − .707 Z_2 = .707 (X_1 − μ_1)/1 − .707 (X_2 − μ_2)/10
        = .707(X_1 − μ_1) − .0707(X_2 − μ_2)

    λ_1 / p = 1.4 / 2 = .7
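Both analyses in Example 5.2 can be reproduced numerically. The covariance matrix below is inferred from the notes (σ_11 = 1, σ_22 = 100 since √σ_22 = 10, and ρ = .4, hence σ_12 = 4); a sketch:

```python
import numpy as np

# Covariance and correlation matrices inferred for Example 5.2.
Sigma = np.array([[1.,   4.],
                  [4., 100.]])
rho = np.array([[1., .4],
                [.4, 1.]])

ls = np.linalg.eigvalsh(Sigma)   # ascending order
lr = np.linalg.eigvalsh(rho)

# Proportion of variance explained by the first component of Sigma.
prop = ls.max() / ls.sum()
```

The example illustrates how a single large variance (σ_22 = 100) dominates the Σ-based components, while the ρ-based components weight both variables equally.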
P10. (1) For a diagonal covariance or correlation matrix, (σ_ii, e_i) is the ith eigenvalue-eigenvector pair, with e_i^T = [0, …, 1, …, 0] and e_i^T X = X_i. Hence the set of principal components is just the original set of uncorrelated random variables.
I (2) It is not difficult to show (as an exercise) that the p eigenvalues of the equal-correlation matrix (all off-diagonal elements equal to ρ) can be divided into two groups. When ρ is positive, the largest is

    λ_1 = 1 + (p − 1)ρ

with associated eigenvector

    e_1^T = [1/√p, 1/√p, …, 1/√p].

The remaining p − 1 eigenvalues are

    λ_2 = λ_3 = ⋯ = λ_p = 1 − ρ
I Continued: and their eigenvectors are

    e_2^T = [1/√(1·2), −1/√(1·2), 0, …, 0]
    e_3^T = [1/√(2·3), 1/√(2·3), −2/√(2·3), 0, …, 0]
    ……
    e_i^T = [1/√((i−1)i), …, 1/√((i−1)i), −(i−1)/√((i−1)i), 0, …, 0]
    ……
    e_p^T = [1/√((p−1)p), …, 1/√((p−1)p), −(p−1)/√((p−1)p)]

The first principal component

    Y_1 = e_1^T Z = (1/√p) Σ_{i=1}^p Z_i

explains a proportion

    λ_1 / p = (1 + (p − 1)ρ) / p = ρ + (1 − ρ)/p

of the total population variation.
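This eigenstructure is easy to verify numerically for illustrative values (p = 4 and ρ = .5 below are chosen arbitrarily):

```python
import numpy as np

# Equal-correlation matrix: 1 on the diagonal, rho off the diagonal.
p, r = 4, 0.5
R = np.full((p, p), r) + (1 - r) * np.eye(p)

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]  # descending
```

The largest eigenvalue should equal 1 + (p − 1)ρ = 2.5, and the remaining three should all equal 1 − ρ = 0.5.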
P16. Example 5.4. The natural logarithms of the dimensions of 24 male turtles have sample mean vector x̄^T = [4.725, 4.478, 3.703] and covariance matrix

    S = 10^{−3} [ 11.072  8.019  8.160
                   8.019  6.417  6.005
                   8.160  6.005  6.773 ]

The first principal component is

    ŷ_1 = .683 ln(length) + .510 ln(width) + .523 ln(height),

which explains 96% of the total variance.
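A quick numerical check of the reported component and its explained proportion, using the S given above:

```python
import numpy as np

# Example 5.4 covariance matrix (scale 10^-3).
S = 1e-3 * np.array([[11.072, 8.019, 8.160],
                     [ 8.019, 6.417, 6.005],
                     [ 8.160, 6.005, 6.773]])

eigvals, eigvecs = np.linalg.eigh(S)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending

prop1 = eigvals[0] / eigvals.sum()                   # proportion explained by PC1
e1 = eigvecs[:, 0] * np.sign(eigvecs[0, 0])          # fix sign so first loading > 0
```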


P23. Example 5.5. Let x_1, x_2, …, x_5 denote the observed weekly rates of return for JP Morgan, Citibank, Wells Fargo, Royal Dutch Shell, and ExxonMobil, respectively. Then

    x̄^T = [.0011, .0007, .0016, .0040, .0040]

and

    R = [ 1.000  .632  .511  .115  .155
           .632 1.000  .574  .322  .213
           .511  .574 1.000  .183  .146
           .115  .322  .183 1.000  .683
           .155  .213  .146  .683 1.000 ]

We note that R is the covariance matrix of the standardized observations

    z_1 = (x_1 − x̄_1)/√s_11,  z_2 = (x_2 − x̄_2)/√s_22,  …,  z_5 = (x_5 − x̄_5)/√s_55.
I Example 5.5, continued. The eigenvalues and corresponding normalized eigenvectors of R, determined by a computer, are

    λ̂_1 = 2.437,  ê_1^T = [.469, .532, .465, .387, .361]
    λ̂_2 = 1.407,  ê_2^T = [−.368, −.236, −.315, .585, .606]
    λ̂_3 = .501,   ê_3^T = [−.604, −.136, .772, .093, −.109]
    λ̂_4 = .400,   ê_4^T = [.363, −.629, .289, −.381, .493]
    λ̂_5 = .255,   ê_5^T = [.384, −.496, .071, .595, −.498]

Using the standardized variables, we obtain the first two sample principal components:

    ŷ_1 = ê_1^T z = .469 z_1 + .532 z_2 + .465 z_3 + .387 z_4 + .361 z_5
    ŷ_2 = ê_2^T z = −.368 z_1 − .236 z_2 − .315 z_3 + .585 z_4 + .606 z_5

I These components account for

    ((λ̂_1 + λ̂_2)/p) 100% = ((2.437 + 1.407)/5) 100% = 77%

of the total (standardized) sample variance.
I The first principal component is a general stock-market component; the second principal component is an industry component.
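The eigen-analysis "determined by a computer" can be reproduced from the R given above; a sketch:

```python
import numpy as np

# Correlation matrix of the five weekly stock returns (Example 5.5).
R = np.array([[1.000, .632, .511, .115, .155],
              [ .632,1.000, .574, .322, .213],
              [ .511, .574,1.000, .183, .146],
              [ .115, .322, .183,1.000, .683],
              [ .155, .213, .146, .683,1.000]])

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # descending
prop12 = (eigvals[0] + eigvals[1]) / 5           # share of the first two components
```

Because the notes report rounded entries of R, the recomputed eigenvalues agree with the listed ones only to about two decimal places.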
P24. Example 5.6. The eigenvalues of the covariance matrix are
    λ̂_1 = 3.085,  λ̂_2 = .383,  λ̂_3 = .342,  λ̂_4 = .217
We note that the first eigenvalue is nearly equal to

    1 + (p − 1)r̄ = 1 + (4 − 1)(.6854) = 3.056,

where r̄ is the arithmetic average of the off-diagonal elements of R.
I The remaining eigenvalues are small and about equal, though λ̂_4 is somewhat smaller than λ̂_2 and λ̂_3. So there is some evidence that the corresponding population correlation matrix ρ may be of the "equal-correlation" form.
I The first component

    ŷ_1 = ê_1^T z = .49 z_1 + .52 z_2 + .49 z_3 + .50 z_4

accounts for

    100(λ̂_1/p)% = 100(3.085/4)% ≈ 77%

of the total variance.
*** Comment: Although "large" eigenvalues and the corresponding eigenvectors are important in a principal component analysis, eigenvalues very close to zero should not be routinely ignored. The eigenvectors associated with these latter eigenvalues may point out linear dependencies in the data set that can cause interpretive and computational problems in a subsequent analysis.
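A small illustration of this point, with hypothetical data constructed so that x_3 = x_1 + x_2 exactly: the near-zero eigenvalue flags the dependency, and its eigenvector recovers the coefficients of the linear relation.

```python
import numpy as np

# Simulated data with an exact linear dependency among the columns.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = x1 + x2                       # x3 is determined by x1 and x2
X = np.column_stack([x1, x2, x3])

S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)   # ascending order

smallest = eigvals[0]
v = eigvecs[:, 0] / eigvecs[0, 0]      # scale so first coefficient is 1
```

The eigenvector of the (numerically) zero eigenvalue is proportional to [1, 1, −1], i.e. it encodes x_1 + x_2 − x_3 = 0.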
P27.

    E(X − μ)(X − μ)^T = E(LF + ε)(LF + ε)^T
                      = E[LF(LF)^T] + E[ε(LF)^T] + E[LFε^T] + E[εε^T]
                      = L E(FF^T) L^T + E(εF^T) L^T + L E(Fε^T) + E(εε^T)
                      = L L^T + Ψ,

since E(FF^T) = I, E(εF^T) = 0, and E(εε^T) = Ψ.

I Communality:

    h_i^2 = ℓ_i1^2 + ℓ_i2^2 + ⋯ + ℓ_im^2

Specific variance: ψ_i.
P27. Example 5.7. With

    Σ = [ 19  30   2  12
          30  57   5  23
           2   5  38  47
          12  23  47  68 ]

    L = [  4  1
           7  2
          −1  6
           1  8 ]

    Ψ = diag(2, 4, 1, 3),

we have Σ = L L^T + Ψ. The communality of X_1 is

    h_1^2 = ℓ_11^2 + ℓ_12^2 = 4^2 + 1^2 = 17.

The specific variance is ψ_1 = 2. Hence

    19 = 4^2 + 1^2 + 2 = 17 + 2.
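These identities are easy to verify numerically:

```python
import numpy as np

# Example 5.7: check Sigma = L L^T + Psi and the communalities.
Sigma = np.array([[19., 30.,  2., 12.],
                  [30., 57.,  5., 23.],
                  [ 2.,  5., 38., 47.],
                  [12., 23., 47., 68.]])
L = np.array([[ 4., 1.],
              [ 7., 2.],
              [-1., 6.],
              [ 1., 8.]])
Psi = np.diag([2., 4., 1., 3.])

h2 = (L ** 2).sum(axis=1)   # communalities h_i^2 = sum_j l_ij^2
```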
P28. Example 5.8. If Σ can be factored by a factor analysis model with m = 1, then
    X_1 − μ_1 = ℓ_11 F_1 + ε_1
    X_2 − μ_2 = ℓ_21 F_1 + ε_2
    X_3 − μ_3 = ℓ_31 F_1 + ε_3
or
    1 = ℓ_11^2 + ψ_1,   .90 = ℓ_11 ℓ_21,   .70 = ℓ_11 ℓ_31,
    1 = ℓ_21^2 + ψ_2,   .40 = ℓ_21 ℓ_31,   1 = ℓ_31^2 + ψ_3.
The pair of equations

    .70 = ℓ_11 ℓ_31,   .40 = ℓ_21 ℓ_31

implies that

    ℓ_21 = (.40/.70) ℓ_11.

Substituting this result for ℓ_21 in the equation .90 = ℓ_11 ℓ_21 yields

    ℓ_11^2 = .90 × (.70/.40) = 1.575,

or ℓ_11 = ±1.255.
I Example 5.8, continued. Since Var(F_1) = 1 by assumption and Var(X_1) = 1, ℓ_11 = Cov(X_1, F_1) = Corr(X_1, F_1), which cannot exceed unity in absolute value. So from this point of view |ℓ_11| = 1.255 is too large. Also, the equation

    1 = ℓ_11^2 + ψ_1

gives

    ψ_1 = 1 − 1.575 = −.575,

which is unsatisfactory, since it gives a negative value for Var(ε_1) = ψ_1. So the solution is not consistent, and is not a proper solution.
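The algebra behind Example 5.8 can be checked in a few lines:

```python
import math

# From .70 = l11*l31 and .40 = l21*l31: l21 = (.40/.70)*l11.
# Substituting into .90 = l11*l21 gives l11^2 = .90*(.70/.40).
l11_sq = .90 * (.70 / .40)
l11 = math.sqrt(l11_sq)
psi1 = 1 - l11_sq            # from 1 = l11^2 + psi1
```

Both improper features appear: |ℓ_11| > 1 and ψ_1 < 0.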
P29. If λ̂_1, …, λ̂_m are relatively large compared to λ̂_{m+1}, …, λ̂_p, then

    L̃ L̃^T = λ̂_1 ê_1 ê_1^T + ⋯ + λ̂_m ê_m ê_m^T ≈ S,

with the specific variances estimated by

    ψ̃_i = s_ii − Σ_{j=1}^m ℓ̃_ij^2.

P30. The sum of squared loadings on the first factor equals the first eigenvalue:

    ℓ̃_11^2 + ℓ̃_21^2 + ⋯ + ℓ̃_p1^2 = (√λ̂_1 ê_1)^T (√λ̂_1 ê_1) = λ̂_1.
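The estimator described here can be sketched as a short function (a minimal implementation of the principal component solution; the test case applies it to the covariance matrix of Example 5.1 for illustration):

```python
import numpy as np

def pc_factor_solution(S, m):
    """Principal component solution of the factor model:
    loadings are sqrt(eigenvalue)-scaled eigenvectors of S;
    specific variances are the leftover diagonal of S."""
    eigvals, eigvecs = np.linalg.eigh(S)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order
    L = eigvecs[:, :m] * np.sqrt(eigvals[:m])
    psi = np.diag(S) - (L ** 2).sum(axis=1)
    return L, psi

# Illustration with m = 1 on the Example 5.1 covariance matrix:
Sigma = np.array([[ 1., -2., 0.],
                  [-2.,  5., 0.],
                  [ 0.,  0., 2.]])
L1, psi1 = pc_factor_solution(Sigma, 1)
```

By construction L̃L̃^T + diag(ψ̃) reproduces the diagonal of S exactly, and each column's sum of squared loadings equals the corresponding eigenvalue (the P30 identity).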
P31. Example 5.9. With

    L̃ = [ .56  .82
          .78 −.53
          .65  .75
          .94 −.10
          .80 −.54 ]

and Ψ̃ = diag(.02, .12, .02, .11, .07),

    L̃ L̃^T + Ψ̃ = [ 1.00  .01  .97  .44  .00
                         1.00  .11  .79  .91
                               1.00  .53  .11
                                     1.00  .81
                                           1.00 ]

(upper triangle shown; the matrix is symmetric).
P33. Example 5.10.

    R − L̃ L̃^T − Ψ̃ = [    0   −.099 −.185 −.025  .056
                        −.099    0  −.134  .014 −.054
                        −.185 −.134    0   .003  .006
                        −.025  .014  .003    0  −.156
                         .056 −.054  .006 −.156    0  ]

P35. Corrected:

    L̂_z = V̂^{−1/2} L̂,   Ψ̂_z = V̂^{−1/2} Ψ̂ V̂^{−1/2}.

Or, given the estimated loadings L̂_z and specific variances Ψ̂_z obtained from R, the resulting maximum likelihood estimates for a factor analysis of the covariance matrix [(n − 1)/n] S are

    L̂ = V̂^{1/2} L̂_z,   Ψ̂ = V̂^{1/2} Ψ̂_z V̂^{1/2},

or, elementwise,

    ℓ̂_ij = ℓ̂_z,ij √σ̂_ii   and   ψ̂_i = ψ̂_z,i σ̂_ii.
P33. Example 5.11.

    R − L̃ L̃^T − Ψ̃ = [   0    .001 −.002  .000  .052
                        .001    0   .002  .000 −.033
                       −.002  .002    0   .000  .001
                        .000  .000  .000    0   .000
                        .052 −.033  .001  .000    0  ]

Factor 1: market factor; Factor 2: banking factor.


P40. Example 5.13. Factor 1: general intelligence; Factor 2: bipolar factor. After rotation, Factor 1: mathematical ability; Factor 2: verbal ability.
P43. Example 5.14. After rotation, Factor 1: nutritional factor; Factor 2: taste factor.
P45. Example 5.15. After rotation, Factor 1: unique economic forces that cause bank stocks to move together; Factor 2: economic conditions affecting oil stocks.
P46. Examples 5.12 and 5.16. Principal component estimates: Factor 1: general athletic ability; Factor 2: running endurance factor. The remaining factors cannot be easily interpreted, to our minds.

Maximum likelihood estimates, before rotation: Factor 1: general athletic ability; Factor 2: strength ability; Factor 3: running endurance ability; Factor 4: jumping ability or leg ability(?).

After rotation: Factor 1: explosive arm strength; Factor 2: explosive leg ability; Factor 3: running speed; Factor 4: running endurance.
P50. The joint distribution of (X − μ) and F has covariance matrix

    [ L L^T + Ψ   L
      L^T         I ].

Then

    E(F|x) = L^T Σ^{−1} (x − μ) = L^T (L L^T + Ψ)^{−1} (x − μ)

and

    Cov(F|x) = I − L^T Σ^{−1} L = I − L^T (L L^T + Ψ)^{−1} L.

I If rotated loadings L̂* = L̂T are used in place of the original loadings, the subsequent factor scores f̂_j* are related to f̂_j by

    f̂_j* = T^T f̂_j,   j = 1, 2, …, n.
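A sketch of the weighted least squares score formula f̂ = (L^T Ψ^{−1} L)^{−1} L^T Ψ^{−1} z used in Example 5.16 below. The loadings, specific variances, and true scores here are hypothetical illustrative values, not those of the example (whose Ψ̂_z contains an exact zero and would need special handling before inverting):

```python
import numpy as np

def wls_factor_scores(L, psi, z):
    """Weighted least squares (Bartlett) factor scores:
    f = (L^T Psi^{-1} L)^{-1} L^T Psi^{-1} z."""
    Psi_inv = np.diag(1.0 / psi)
    A = L.T @ Psi_inv @ L
    return np.linalg.solve(A, L.T @ Psi_inv @ z)

# Hypothetical two-factor loadings and specific variances.
L = np.array([[.8, .1],
              [.7, .2],
              [.1, .9],
              [.2, .8]])
psi = np.array([.35, .47, .18, .32])

# Sanity check: if z = L f0 exactly (no specific error), the scores
# recover f0, since (L^T Psi^-1 L)^-1 L^T Psi^-1 L f0 = f0.
f0 = np.array([1.0, -2.0])
f_exact = wls_factor_scores(L, psi, L @ f0)
```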
P51. Example 5.16. With

    L̂_z* = [ .763  .024
             .821  .227
             .669  .104
             .118  .993
             .113  .675 ]

and Ψ̂_z = diag(.42, .27, .54, .00, .53), the vector of standardized observations

    z^T = [.50, −1.40, −.20, −.70, 1.40]

yields the following scores on factors 1 and 2:

    f̂ = (L̂_z*^T Ψ̂_z^{−1} L̂_z*)^{−1} L̂_z*^T Ψ̂_z^{−1} z = [−.61, −.61]^T
