Jomo Kenyatta University of Agriculture & Technology, P.O. Box 62000, 00200 Nairobi, Kenya. E-mail: elearning@jkuat.ac.ke
© JKUAT is ISO:2008 certified
Table of Contents
3. Factor Analysis
1. Introduction to factor Analysis
1.1. Factor analysis versus principal component analysis
3.4. Common factors are uncorrelated with the residuals
3.5. But the uniquenesses aren’t independent
• A “residual” matrix
3.6. Proportion of variance explained
• Exercise
DBA 6434 Statistics for Business Sciences
4. Maximum likelihood solutions
4.1. Developing the maximum likelihood solution
• The log likelihood
• Maximising the log-likelihood (subject to diagonality constraint)
4.2. Hypothesis testing
5. Rotation
6. Scoring
6.1. The weighted least squares estimates are:
6.2. Or
6.3. Bayesian methods
7. Summary
7.1. Revision questions or guidelines
Solutions to Exercises
LESSON 3
Factor Analysis
Learning outcomes
Upon completing this topic, you should be able to:
1. Introduction to factor Analysis
• Most development took place outside the statistical community, especially in psychometrics
• Aim is to produce a small number of “latent variables”
which have some substantive interpretation
we don’t cover
Exercise 1. Reification: if a number of observed correlated variables are all manifestations of some underlying phenomenon, the claim is that factor analysis recovers this underlying structure.
2.1. Idealised factor analysis model?
[Figure: path diagram linking the common factors f1 and f2 to the observed variables X1, X2 and X3.]
Some key words
• We have p observed or manifest variables
• We wish to represent these by q < p mutually uncorrelated
common factors.
• We have some uncorrelated residuals specific to each of the
2.2. The factor model
The orthogonal model underlying factor analysis can be described as follows:
x = µ + Γφ + ζ
• Σ_φ = I.
• ζ is a 1 × p unobserved random error vector having mean 0 and, by assumption, a diagonal covariance ψ, referred to as the uniqueness or specific variance.
• Consequences
With the above assumptions:
• cov(φ, ζ) = 0,
• if Σ_φ = I then cov(x, φ) = Γ.
x ∼ Normal(µ, ΓΓᵀ + ψ)
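\[
\begin{aligned}
x_1 &= \mu_1 + \sum_{k=1}^{q} \gamma_{1k}\phi_k + \zeta_1 \\
x_2 &= \mu_2 + \sum_{k=1}^{q} \gamma_{2k}\phi_k + \zeta_2 \\
&\;\;\vdots \\
x_p &= \mu_p + \sum_{k=1}^{q} \gamma_{pk}\phi_k + \zeta_p
\end{aligned}
\]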
• A key result
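\[
\operatorname{var}(x_j) = \underbrace{\gamma_{j1}^{2} + \gamma_{j2}^{2} + \cdots + \gamma_{jq}^{2}}_{\text{communalities}} + \underbrace{\operatorname{var}(\zeta_j)}_{\text{uniqueness}} \tag{3.1}
\]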
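The split in (3.1) can be checked numerically. The following is a minimal sketch, assuming a single observed variable with unit variance and two hypothetical loadings; the function names and numbers are illustrative only and do not come from the course material.

```python
def communality(loadings_row):
    """Sum of squared loadings of one observed variable on the q common factors."""
    return sum(g * g for g in loadings_row)


def uniqueness(total_variance, loadings_row):
    """Variance left over once the communality has been removed, as in (3.1)."""
    return total_variance - communality(loadings_row)


# Hypothetical standardised variable (variance 1) loading on q = 2 factors.
example_loadings = [0.8, 0.3]
h2 = communality(example_loadings)      # 0.8^2 + 0.3^2
u = uniqueness(1.0, example_loadings)   # remainder of the unit variance

print(f"communality = {h2:.2f}, uniqueness = {u:.2f}")
```

With standardised variables the two parts always sum to one, so a large communality means the common factors account for most of that variable's variance.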
A lot of parameters for the amount of data!
• The covariance matrix Σ has p(p + 1)/2 parameters,
• The factor model ΓΓᵀ + ψ has qp − q(q − 1)/2 + p parameters.
So, p(p + 1)/2 ≥ qp − q(q − 1)/2 + p, or:
\[
q \le \frac{2p + 1 - \sqrt{8p + 1}}{2} \tag{3.2}
\]
• On a related note . . .
Degrees of freedom after fitting a q factor model:
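The parameter counting above can be turned into a small calculation. This sketch assumes that the degrees of freedom are simply the difference between the two parameter counts quoted earlier, which agrees with the ½((p − q)² − p − q) figure used in the hypothesis-testing section; the function names are hypothetical.

```python
def degrees_of_freedom(p, q):
    """Parameters in Sigma minus parameters in the q-factor model."""
    sigma_params = p * (p + 1) // 2
    model_params = q * p - q * (q - 1) // 2 + p
    return sigma_params - model_params


def max_factors(p):
    """Largest q for which the q-factor model has no more parameters than Sigma."""
    q = 0
    while q + 1 <= p and degrees_of_freedom(p, q + 1) >= 0:
        q += 1
    return q


# With p = 6 observed variables (an illustrative value):
for q in range(0, 4):
    print(q, degrees_of_freedom(6, q))
print("largest admissible q:", max_factors(6))
```

For p = 6 the loop stops at q = 3, where the degrees of freedom reach zero, which matches the closed-form bound (2p + 1 − √(8p + 1))/2 = 3.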
\[
x_j - \mu_j = \sum_{k=1}^{q} \gamma_{jk}\phi_k + \zeta_j; \qquad j = 1, \ldots, p \tag{3.4}
\]
So, regardless of the data matrix used, factor analysis is essentially a model for Σ, the covariance matrix of x,
\[
\Sigma = \Gamma\Gamma^{T} + \psi
\]
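To make the structure of the model covariance concrete, here is a minimal sketch that builds ΓΓᵀ + ψ from a loadings matrix and a list of specific variances. The loadings and uniquenesses are hypothetical values chosen so that each variable has unit variance.

```python
def implied_covariance(loadings, uniquenesses):
    """Return Gamma Gamma^T + psi, with psi given as its diagonal entries."""
    p = len(loadings)
    sigma = [
        [sum(a * b for a, b in zip(loadings[i], loadings[j])) for j in range(p)]
        for i in range(p)
    ]
    for j in range(p):
        sigma[j][j] += uniquenesses[j]  # psi only touches the diagonal
    return sigma


# Hypothetical one-factor model for p = 2 variables.
gamma = [[0.8], [0.6]]
psi = [0.36, 0.64]
sigma = implied_covariance(gamma, psi)

for row in sigma:
    print([round(v, 2) for v in row])
```

The off-diagonal entries come entirely from the common factor (0.8 × 0.6 = 0.48), while the diagonal is communality plus uniqueness.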
• Identifiability
2.5. Strategy for factor analysis
To fit the model, we therefore need to:
• Estimate the number of common factors q.
• Estimate the factor loadings Γ
• Estimate the specific variances ψ 2
\[
\Sigma = E \Lambda E^{T}
\]
√λ_p e_p
Or in matrix notation:
Z = EX (3.6)
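• Written out in full:
\[
Z = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_p \end{pmatrix}, \quad
X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} \quad \text{and} \quad
E = \begin{pmatrix}
e_{11} & e_{12} & \cdots & e_{1p} \\
e_{21} & e_{22} & \cdots & e_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
e_{p1} & e_{p2} & \cdots & e_{pp}
\end{pmatrix}
\]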
So, multiplying both sides of (3.6) by E⁻¹ gives:
\[
E^{-1} Z = X \tag{3.7}
\]
3.1. Finding E⁻¹
For orthogonal matrices generally, E⁻¹ = Eᵀ, so we can invert the transformation by using
\[
X = E^{T} Z \tag{3.8}
\]
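The inversion step relies only on E being orthogonal. As a small check, the sketch below uses a hypothetical 2 × 2 rotation matrix in place of an eigenvector matrix, forms Z = EX, and then recovers X using the transpose rather than an explicit inverse.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]


# Hypothetical orthogonal matrix: a rotation by 30 degrees.
theta = math.pi / 6
E = [[math.cos(theta), math.sin(theta)],
     [-math.sin(theta), math.cos(theta)]]

x = [2.0, -1.0]                          # an arbitrary observation
z = matvec(E, x)                         # forward transformation Z = E X
x_recovered = matvec(transpose(E), z)    # inverse transformation X = E^T Z

print(x_recovered)
```

Up to floating-point rounding, `x_recovered` equals the original `x`, which is the content of (3.8).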
which can be expanded as:
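\[
\begin{aligned}
x_1 &= e_{11} z_1 + e_{21} z_2 + \cdots + e_{p1} z_p \\
x_2 &= e_{12} z_1 + e_{22} z_2 + \cdots + e_{p2} z_p \\
&\;\;\vdots \\
x_p &= e_{1p} z_1 + e_{2p} z_2 + \cdots + e_{pp} z_p
\end{aligned}
\]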
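\[
\begin{aligned}
x_1 &= (e_{11}\sqrt{\lambda_1})\frac{z_1}{\sqrt{\lambda_1}} + (e_{21}\sqrt{\lambda_2})\frac{z_2}{\sqrt{\lambda_2}} + \cdots + (e_{p1}\sqrt{\lambda_p})\frac{z_p}{\sqrt{\lambda_p}} \\
x_2 &= (e_{12}\sqrt{\lambda_1})\frac{z_1}{\sqrt{\lambda_1}} + (e_{22}\sqrt{\lambda_2})\frac{z_2}{\sqrt{\lambda_2}} + \cdots + (e_{p2}\sqrt{\lambda_p})\frac{z_p}{\sqrt{\lambda_p}} \\
&\;\;\vdots \\
x_p &= (e_{1p}\sqrt{\lambda_1})\frac{z_1}{\sqrt{\lambda_1}} + (e_{2p}\sqrt{\lambda_2})\frac{z_2}{\sqrt{\lambda_2}} + \cdots + (e_{pp}\sqrt{\lambda_p})\frac{z_p}{\sqrt{\lambda_p}}
\end{aligned}
\]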
• The model
• Set γ_jk = e_kj √λ_k and φ_k = z_k / √λ_k
A clear link with the factor analysis model. Our loadings matrix Γ is the p × p matrix whose jth column is given by √λ_j e_j, so that:
\[
S = \Gamma\Gamma^{T}
\]
\[
\begin{aligned}
&\;\;\vdots \\
x_p &= e_{1p} z_1 + e_{2p} z_2 + \cdots + e_{qp} z_q + e_{q+1,p} z_{q+1} + \cdots + e_{pp} z_p
\end{aligned}
\]
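\[
\begin{aligned}
x_1 &= (e_{11}\sqrt{\lambda_1})\frac{z_1}{\sqrt{\lambda_1}} + (e_{21}\sqrt{\lambda_2})\frac{z_2}{\sqrt{\lambda_2}} + \cdots + (e_{q1}\sqrt{\lambda_q})\frac{z_q}{\sqrt{\lambda_q}} + \zeta_1 \\
x_2 &= (e_{12}\sqrt{\lambda_1})\frac{z_1}{\sqrt{\lambda_1}} + (e_{22}\sqrt{\lambda_2})\frac{z_2}{\sqrt{\lambda_2}} + \cdots + (e_{q2}\sqrt{\lambda_q})\frac{z_q}{\sqrt{\lambda_q}} + \zeta_2 \\
&\;\;\vdots \\
x_p &= (e_{1p}\sqrt{\lambda_1})\frac{z_1}{\sqrt{\lambda_1}} + (e_{2p}\sqrt{\lambda_2})\frac{z_2}{\sqrt{\lambda_2}} + \cdots + (e_{qp}\sqrt{\lambda_q})\frac{z_q}{\sqrt{\lambda_q}} + \zeta_p
\end{aligned}
\]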
where γ_jk = e_kj √λ_k and φ_k = z_k / √λ_k as before. Notice, as stated at the outset, that var(ζ) = ψ.
If we consider this in terms of the decomposition of the covariance matrix we have:
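\[
\Sigma =
\begin{pmatrix} \sqrt{\lambda_1}\, e_1, & \sqrt{\lambda_2}\, e_2, & \ldots, & \sqrt{\lambda_q}\, e_q \end{pmatrix}
\begin{pmatrix} \sqrt{\lambda_1}\, e_1^{T} \\ \sqrt{\lambda_2}\, e_2^{T} \\ \vdots \\ \sqrt{\lambda_q}\, e_q^{T} \end{pmatrix}
+
\begin{pmatrix}
\psi_1 & 0 & \cdots & 0 \\
0 & \psi_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \psi_p
\end{pmatrix}
\tag{3.9}
\]
where now $\psi_j = \operatorname{var}(\zeta_j) = \sigma_{jj} - \sum_{k=1}^{q} \gamma_{jk}^{2}$ for j = 1, 2, . . . , p.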
3.2. Uniquenesses
Estimates of the specific variances are given by the diagonal elements of the matrix Σ̂ − Γ̂Γ̂ᵀ, i.e.:
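\[
\hat{\psi} =
\begin{pmatrix}
\psi_1 & 0 & \cdots & 0 \\
0 & \psi_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \psi_p
\end{pmatrix}
\quad \text{with} \quad \psi_j = \sigma_{jj} - \sum_{k=1}^{q} \gamma_{jk}^{2} \tag{3.10}
\]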
\[
\hat{\Gamma} = \begin{pmatrix} \sqrt{\lambda_1}\, e_1, & \sqrt{\lambda_2}\, e_2, & \ldots, & \sqrt{\lambda_q}\, e_q \end{pmatrix}
\]
\[
\operatorname{var}(\phi) = \operatorname{var}\left( \Lambda_1^{-1/2}\, \Gamma_1^{T} (x - \mu) \right) = I_q,
\]
3.5. But the uniquenesses aren’t independent
Each ζ_i contains the same z_i, so they are not mutually unrelated. Hence the latent variables obtained using the principal component method do not explain all the correlation structure in our data X. The covariance matrix for the errors is now:
\[
\operatorname{var}(\zeta) = \Gamma_2 \Lambda_2 \Gamma_2^{T}
\]
• A “residual” matrix
\[
{} = S - \left( L L^{T} + \psi \right) \tag{3.11}
\]
ments. Rather conveniently, there is an inequality which gives
us:
\[
{} = \hat{\Sigma} - \left[ L L^{T} + \psi \right] \le \hat{\lambda}_{q+1}^{2} + \cdots + \hat{\lambda}_{p}^{2} \tag{3.12}
\]
portion of total sample variance:
\[
\frac{\lambda_j}{\operatorname{trace}(S)} \tag{3.13}
\]
which reduces to λ_j / p when using standardised variables (the correlation matrix).
Kaiser criterion!
• Exercise
Exercise 4. Consider the R data ability.cov, originally reported by Smith, G. A. and Stanley, G. (1983) “Clocking g: relating intelligence and measures of timed performance”, Intelligence, 7:353-368, which measures p = 6 tests given to n = 112 individuals. The eigenvalues of the correlation matrix are: 3.08, 1.14, 0.82, 0.41, 0.36, 0.20. The corresponding eigenvectors of the correlation matrix are given below:
e1 e2 e3 e4 e5 e6
General -0.47 0.00 0.07 0.86 0.04 -0.16
Picture -0.36 0.41 0.59 -0.27 0.53 0.00
Blocks -0.43 0.40 0.06 -0.20 -0.78 0.05
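The quantities in this section can be computed directly from the figures quoted in the exercise. The sketch below is only an illustration: it uses the six eigenvalues and the three first-eigenvector entries shown in the table, takes the Kaiser criterion to mean "retain components with eigenvalue greater than 1", and forms principal component loadings as √λ₁ times the eigenvector entry. Variable names are hypothetical.

```python
import math
from itertools import accumulate

# Eigenvalues of the correlation matrix, as quoted in the exercise.
eigenvalues = [3.08, 1.14, 0.82, 0.41, 0.36, 0.20]
p = len(eigenvalues)

# Proportion of variance explained by each component (lambda_j / p).
proportions = [lam / p for lam in eigenvalues]
cumulative = list(accumulate(proportions))

# Kaiser criterion: count eigenvalues greater than 1.
n_retained = sum(1 for lam in eigenvalues if lam > 1.0)

# First-eigenvector entries for the three rows shown in the table.
first_eigenvector = {"General": -0.47, "Picture": -0.36, "Blocks": -0.43}
first_loadings = {
    name: math.sqrt(eigenvalues[0]) * entry
    for name, entry in first_eigenvector.items()
}

for j, (prop, cum) in enumerate(zip(proportions, cumulative), start=1):
    print(f"component {j}: proportion {prop:.3f}, cumulative {cum:.3f}")
print("retained by Kaiser criterion:", n_retained)
print({name: round(value, 3) for name, value in first_loadings.items()})
```

Because the quoted eigenvalues are rounded, the cumulative proportion ends very slightly above 1.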
• Invariant to changes in scale, i.e. it doesn’t matter whether the correlation or the covariance matrix is used, or indeed whether any other scale changes are applied.
• There are a number of other advantages associated with
maximum likelihood fitting, but the problem of Heywood
4.1. Developing the maximum likelihood solution
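\[
L(x; \mu, \Sigma) = (2\pi)^{-\frac{np}{2}} \, |\Sigma|^{-\frac{n}{2}} \exp\left\{ -\frac{1}{2} \operatorname{tr}\left( \Sigma^{-1} \left( \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{T} + n(\bar{x} - \mu)(\bar{x} - \mu)^{T} \right) \right) \right\} \tag{3.15}
\]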
we wish to solve this in terms of our factor analysis model
and therefore need to find an expression for the likelihood of
• The log likelihood
Taking logs of 3.15, and collecting constant terms into c1 and c2
we can say that we wish to maximise:
\[
\hat{\Gamma} = \sqrt{\psi}\, E_1 \sqrt{\Lambda_1 - I} \tag{3.17}
\]
where Λ₁ contains the q largest eigenvalues of $\psi^{-1/2} S \psi^{-1/2}$, and E₁ the corresponding eigenvectors. This is used to estimate Γ̂ given a value of ψ̂. Now, the log likelihood is maximised with respect to ψ̂ given an estimate of Γ̂.
H0 : Σ = ΓΓᵀ + ψ
H1 : Σ is any other positive definite matrix
\[
-2 \ln \Lambda = -2 \ln \left( \frac{|\hat{\Sigma}|}{|S|} \right)^{-n/2} + n \left[ \operatorname{tr}(\hat{\Sigma}^{-1} S) - p \right] \tag{3.18}
\]
with ½((p − q)² − p − q) degrees of freedom.
It can be shown that tr(Σ̂⁻¹S) − p = 0 at the maximum likelihood, so this term can be removed and we can consider that
\[
-2 \ln \Lambda = n \ln \left( \frac{|\hat{\Sigma}|}{|S|} \right) \tag{3.19}
\]
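\[
n - 1 - \frac{2p + 5}{6} - \frac{2q}{3}
\]
• Hence we are going to test:
\[
\left( n - 1 - \frac{2p + 5}{6} - \frac{2q}{3} \right) \ln \left( \frac{|\hat{\Gamma}\hat{\Gamma}^{T} + \hat{\psi}|}{|S|} \right) > \chi^{2}_{((p-q)^{2} - p - q)/2,\; \alpha} \tag{3.20}
\]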
• Start with q small (anticipating the rejection of H0), and increase q until H0 is no longer rejected.
• Do note, there are many reasons for rejecting H0; not all of these may concern us.
• Johnson and Wichern suggest that if n is large and q is
small relative to p, it will tend to reject H0 even though
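The pieces of the test in (3.20) can be assembled in a few lines. This is a sketch under stated assumptions: the two determinants are taken as already computed, the function names are hypothetical, and the comparison with a chi-squared critical value is left to a table or a statistics package, since none is supplied here.

```python
import math


def correction_factor(n, p, q):
    """Multiplier n - 1 - (2p + 5)/6 - 2q/3 used in place of n."""
    return n - 1 - (2 * p + 5) / 6 - 2 * q / 3


def lrt_statistic(n, p, q, det_fitted, det_sample):
    """Left-hand side of (3.20): corrected multiplier times ln(|fitted| / |S|)."""
    return correction_factor(n, p, q) * math.log(det_fitted / det_sample)


def lrt_degrees_of_freedom(p, q):
    """Degrees of freedom ((p - q)^2 - p - q) / 2 for the reference chi-squared."""
    return ((p - q) ** 2 - p - q) / 2


# Illustrative sizes matching the ability.cov exercise: n = 112, p = 6, q = 2.
factor = correction_factor(112, 6, 2)
df = lrt_degrees_of_freedom(6, 2)
print(f"multiplier = {factor:.3f}, degrees of freedom = {df:.0f}")

# Hypothetical determinant ratio, purely to show the call.
print(lrt_statistic(112, 6, 2, det_fitted=0.105, det_sample=0.100))
```

When the fitted and sample determinants coincide the statistic is zero, and it grows as the fitted covariance ΓΓᵀ + ψ departs from S.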
5. Rotation
• Orthogonal rotations: two objective criteria are most commonly used to determine the optimal rotation: the Varimax procedure and the Quartimax procedure.
Varimax rotation seeks to maximise:
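\[
V = \frac{1}{p^{2}} \sum_{k=1}^{q} \left[ p \sum_{j=1}^{p} \frac{\gamma_{jk}^{4}}{\xi_j^{4}} - \left( \sum_{j=1}^{p} \frac{\gamma_{jk}^{2}}{\xi_j^{2}} \right)^{2} \right] \tag{3.21}
\]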
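As a rough illustration of what the Varimax criterion rewards, the sketch below evaluates a normalised varimax value for a small loadings matrix. It assumes that ξ_j² denotes the communality of variable j (this definition is not stated in the surviving text), and the function name and example matrices are hypothetical.

```python
import math


def varimax_criterion(loadings):
    """Normalised varimax value for a p x q loadings matrix (list of rows)."""
    p = len(loadings)
    q = len(loadings[0])
    # Assumed meaning of xi_j^2: the communality of variable j.
    communalities = [sum(g * g for g in row) for row in loadings]
    total = 0.0
    for k in range(q):
        scaled_sq = [loadings[j][k] ** 2 / communalities[j] for j in range(p)]
        total += p * sum(s * s for s in scaled_sq) - sum(scaled_sq) ** 2
    return total / p ** 2


# Each variable loads on exactly one factor ("simple structure").
simple = [[1.0, 0.0], [0.0, 1.0]]
# Each variable loads equally on both factors.
half = math.sqrt(0.5)
mixed = [[half, half], [half, half]]

v_simple = varimax_criterion(simple)
v_mixed = varimax_criterion(mixed)
print(v_simple, v_mixed)
```

The simple-structure matrix scores higher than the evenly mixed one, which is why maximising V tends to push each loading towards either a large value or zero.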
and a much decreased loading in terms of the second factor (γ28) is virtually zero. Thus we may feel that we have achieved some simplification of our factor structure.
A promax rotation of the factor solutions has been carried
out, and gives the following loadings:
Factor1 Factor2
picture 0.671
blocks 0.932
maze 0.508
reading 1.023
vocab 0.811
Would you describe the first factor as a measure of:
6. Scoring
6.1. The weighted least squares estimates are:
\[
\hat{\phi}_i = (\Gamma^{T} \Psi^{-1} \Gamma)^{-1} \Gamma^{T} \Psi^{-1} (x_i - \bar{x}) \tag{3.22}
\]
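For a single factor (q = 1) the weighted least squares score needs no matrix inversion, which makes it easy to sketch. The code assumes the usual form (ΓᵀΨ⁻¹Γ)⁻¹ΓᵀΨ⁻¹(x_i − x̄) with a diagonal Ψ; the function name and the numbers are hypothetical.

```python
def wls_score_single_factor(loadings, uniquenesses, centred_x):
    """Weighted least squares factor score when there is one common factor."""
    # Gamma^T Psi^{-1} Gamma reduces to a scalar for q = 1.
    precision = sum(g * g / u for g, u in zip(loadings, uniquenesses))
    # Gamma^T Psi^{-1} (x - xbar): residual-variance-weighted projection.
    weighted = sum(g * d / u for g, u, d in zip(loadings, uniquenesses, centred_x))
    return weighted / precision


gamma = [0.8, 0.6]     # hypothetical loadings
psi = [0.36, 0.64]     # hypothetical specific variances

# A noise-free observation generated by a factor value of 2.
true_factor = 2.0
centred = [g * true_factor for g in gamma]
score = wls_score_single_factor(gamma, psi, centred)
print(score)
```

With no specific error in the observation the score returns the generating factor value; with noisy data, variables with small uniqueness receive more weight.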
6.2. Or
\[
z = \hat{\Gamma}^{T} \left( \hat{\Gamma}\hat{\Gamma}^{T} + \hat{\psi} \right)^{-1} (x_i - \hat{\mu}) \tag{3.23}
\]
7. Summary
• A method for finding “latent variables” which explain the
dependence (correlation) structure in some data
• Unfortunately, solutions based on PCFA are still common
and you ought to know a little about them
appropriate) and all the usual eigenvalue procedures apply
7.1. Revision questions or guidelines
1. Given some loadings a and eigenvalues, calculate the value
of factor loadings (using PCFA)
2. Calculate and interpret the communality and uniqueness
3. Explaining difference between PCFA and MLFA - why do
Learning Activities
1. Worked example of discriminant analysis with the wines
data - look at the effect of removing variables on the
APER, plot the discriminant functions
2. Worked example with the iris data - to look at the way
Solutions to Exercises
Exercise 2. Solution for the exercise Exercise 2