Jomo Kenyatta University OF Agriculture & Technology: P.O. Box 62000, 00200 Nairobi, Kenya E-Mail: Elearning@jkuat - Ac.ke

The document discusses statistics for business sciences and covers topics like factor analysis, principal component analysis, fitting factor analysis models, maximum likelihood solutions, rotation, and scoring. It provides learning outcomes, descriptions of key concepts, examples, and exercises related to factor analysis and distinguishing it from principal component analysis. The goal is to model the covariance structure between observed variables and uncover underlying latent factors.


JOMO KENYATTA UNIVERSITY OF AGRICULTURE & TECHNOLOGY (JKUAT)
SCHOOL OF OPEN, DISTANCE AND eLEARNING (SODeL)

P.O. Box 62000, 00200 Nairobi, Kenya
E-mail: [email protected]

DBA 6434 Statistics for Business Sciences

Last revision on September 14, 2012
This presentation is intended to be covered within one week. The notes, examples and exercises should be supplemented with a good textbook. Most of the exercises have solutions/answers appearing elsewhere and accessible by clicking the green Exercise tag. To move back to the same page, click the same tag appearing at the end of the solution/answer.

JKUAT-SODeL
© JKUAT is ISO:2008 certified
Table of Contents

3. Factor Analysis
1. Introduction to factor Analysis
   1.1. Factor analysis versus principal component analysis
2. Fitting the Factor analysis model
   2.1. Idealised factor analysis model?
   2.2. The factor model
        • Consequences • A key result • On a related note ...
   2.3. Centering or scaling
   2.4. Factor indeterminacy
        • Identifiability
   2.5. Strategy for factor analysis
3. Principal component extraction
   3.1. Finding $E^{-1}$
        • The model • But . . . the problem
   3.2. Uniquenesses
   3.3. Common factors
        • Common factors have identity covariance matrix
   3.4. Common factors are uncorrelated with the residuals
   3.5. But the uniquenesses aren’t independent
        • A “residual” matrix
   3.6. Proportion of variance explained
        • Exercise
4. Maximum likelihood solutions
   4.1. Developing the maximum likelihood solution
        • The log likelihood • Maximising the log-likelihood (subject to diagonality constraint)
   4.2. Hypothesis testing
        • Hence we are going to test:
5. Rotation
6. Scoring
   6.1. The weighted least squares estimates are:
   6.2. Or
   6.3. Bayesian methods
7. Summary
   7.1. Revision questions or guidelines
Solutions to Exercises
LESSON 3
Factor Analysis

Learning outcomes
Upon completing this topic, you should be able to:
• Describe the “principal factoring” and “maximum likelihood factor analysis” processes
• Differentiate between factor analysis and principal component analysis
• Describe various roles of factors
• Apply factor rotation as an aid to interpretation of factors
1. Introduction to factor Analysis
• Most development has taken place outside the statistical community, especially in psychometrics
• The aim is to produce a small number of “latent variables” which have some substantive interpretation
• There are related techniques for discrete responses which we don’t cover

1.1. Factor analysis versus principal component analysis
• The (one) aim of principal components analysis is to produce a projection explaining the greatest variance; factor analysis is about modelling the covariance structure
• Factor analysis is based on a probability model
Exercise 1. Reification: if a number of observed correlated variables are all manifestations of some underlying phenomenon, the claim is that factor analysis recovers this underlying structure.

2. Fitting the Factor analysis model
None of these methods leads to a unique solution!

Example. Principal component factor analysis
Solution: the answer is here

Exercise 2. Principal factoring (iterative modification)

Exercise 3. Maximum likelihood is

Do not confuse PCFA with PCA!
2.1. Idealised factor analysis model?
[Figure: path diagram linking the common factors f1 and f2 to the manifest variables X1, X2 and X3.]
Some key words
• We have p observed or manifest variables
• We wish to represent these by q < p mutually uncorrelated common factors.
• We have some uncorrelated residuals specific to each of the observed variables, which are not correlated with any of the remaining p − 1 variables. That part of the variance of the manifest variable explained by the q < p latent variables is known as the communality.
• It is possible to rotate the q axes of common factors to new orthogonal or oblique axes
2.2. The factor model
The orthogonal model underlying factor analysis can be described as follows:

\[ x = \mu + \Gamma\phi + \zeta \]

Definition 1.
• $x$ is a $p \times 1$ random vector,
• $\mu$ represents a $p \times 1$ vector of unknown constants (mean values),
• $\Gamma$ is an unknown $p \times q$ matrix of constants referred to as the loadings,
• $\phi$ is a $q \times 1$ unobserved random vector referred to as the scores, assumed to have mean 0 and covariance $\Sigma_\phi$,
• $\Sigma_\phi = I$,
• $\zeta$ is a $p \times 1$ unobserved random error vector having mean 0 and, by assumption, a diagonal covariance $\psi$ referred to as the uniqueness or specific variance.

• Consequences
With the above assumptions:
• $\mathrm{cov}(\phi, \zeta) = 0$,
• if $\Sigma_\phi = I$ then $\mathrm{cov}(x, \phi) = \Gamma$.
And our manifest variables have the following distributional form:

\[ x \sim \mathrm{Normal}(\mu, \Gamma\Gamma^T + \psi) \]
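
As a short worked step (a sketch that uses only the assumptions listed above: $\mathrm{var}(\phi) = I$, $\mathrm{var}(\zeta) = \psi$ and $\mathrm{cov}(\phi, \zeta) = 0$), the covariance in this distributional form and the result $\mathrm{cov}(x, \phi) = \Gamma$ follow by direct expansion:

```latex
\begin{aligned}
\mathrm{var}(x) &= \mathrm{var}(\mu + \Gamma\phi + \zeta) \\
  &= \Gamma\,\mathrm{var}(\phi)\,\Gamma^T
     + \Gamma\,\mathrm{cov}(\phi, \zeta)
     + \mathrm{cov}(\zeta, \phi)\,\Gamma^T
     + \mathrm{var}(\zeta) \\
  &= \Gamma I \Gamma^T + 0 + 0 + \psi
   = \Gamma\Gamma^T + \psi, \\[4pt]
\mathrm{cov}(x, \phi) &= \Gamma\,\mathrm{var}(\phi) + \mathrm{cov}(\zeta, \phi)
   = \Gamma .
\end{aligned}
```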

Theorem 1. It may be slightly clearer to consider the way a vector of observations $x = (x_1, \ldots, x_p)^T$ is modelled in factor analysis:

\[
\begin{aligned}
x_1 &= \mu_1 + \sum_{k=1}^{q} \gamma_{1k}\phi_k + \zeta_1 \\
x_2 &= \mu_2 + \sum_{k=1}^{q} \gamma_{2k}\phi_k + \zeta_2 \\
    &\;\;\vdots \\
x_p &= \mu_p + \sum_{k=1}^{q} \gamma_{pk}\phi_k + \zeta_p
\end{aligned}
\]

• A key result
\[ \mathrm{var}(x_j) = \underbrace{\gamma_{j1}^2 + \gamma_{j2}^2 + \ldots + \gamma_{jq}^2}_{\text{communalities}} + \underbrace{\mathrm{var}(\zeta_j)}_{\text{uniqueness}} \qquad (3.1) \]

A lot of parameters for the amount of data!
• The covariance matrix $\Sigma$ has $p(p+1)/2$ parameters,
• The factor model $\Gamma\Gamma^T + \psi$ has $qp - q(q-1)/2 + p$ parameters.
So, $p(p+1)/2 \ge qp - q(q-1)/2 + p$, or:

\[ q \le \frac{2p + 1 - \sqrt{8p + 1}}{2} \qquad (3.2) \]

This gives some maximum values of $q$ for given values of $p$:

p      1  2  3  4  5  6  7  8  9  10
max q  0  0  1  1  2  3  3  4  5  6

(Any problems with the schematic put up earlier?)
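
The table can be checked numerically. The sketch below is illustrative only: the function name `max_factors` is an assumption, not something from these notes. It uses the fact that the parameter-count inequality $p(p+1)/2 \ge qp - q(q-1)/2 + p$ rearranges to $(p - q)^2 \ge p + q$, and searches for the largest $q < p$ that satisfies it.

```python
def max_factors(p):
    """Largest q < p with (p - q)^2 >= p + q (hypothetical helper name)."""
    q = 0
    # Keep increasing q while the next value still satisfies the inequality.
    while q + 1 < p and (p - (q + 1)) ** 2 >= p + (q + 1):
        q += 1
    return q


# Maximum number of factors for p = 1, ..., 10 manifest variables.
max_q_table = {p: max_factors(p) for p in range(1, 11)}
for p, q in max_q_table.items():
    print(f"p = {p:2d}  max q = {q}")
```

Working with the integer inequality rather than the square-root form avoids any floating-point rounding at the boundary cases such as p = 3, 6 and 10, where equality holds.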

• On a related note . . .
Degrees of freedom after fitting a $q$ factor model:

\[ df = \frac{p(p+1)}{2} - qp + \frac{q(q-1)}{2} - p = \frac{(p-q)^2 - (p+q)}{2} \qquad (3.3) \]

2.3. Centering or scaling
Either centre the data:

\[ x_j - \mu_j = \sum_{k=1}^{q} \gamma_{jk}\phi_k + \zeta_j; \quad j = 1, \ldots, p \qquad (3.4) \]

Or even standardise (i.e. model the correlation matrix rather than the covariance matrix):

\[ \frac{x_j - \mu_j}{\sqrt{\sigma_{jj}}} = \sum_{k=1}^{q} \gamma_{jk}\phi_k + \zeta_j; \quad j = 1, \ldots, p \qquad (3.5) \]
So, regardless of the data matrix used, factor analysis is essentially a model for $\Sigma$, the covariance matrix of $x$,

\[ \Sigma = \Gamma\Gamma^T + \psi \]

2.4. Factor indeterminacy
• Identifiability
• A very indeterminate model,
• specifically it is unchanged if we replace $\Gamma$ by $\Gamma K$ for any $q \times q$ orthogonal matrix $K$, since $(\Gamma K)(\Gamma K)^T = \Gamma K K^T \Gamma^T = \Gamma\Gamma^T$.
• This can be turned to our advantage: with a sensible choice of orthogonal matrix $K$ we can achieve a rotation that may yield a more interpretable answer.
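
The indeterminacy can be seen numerically. The following is a minimal sketch with made-up numbers: the 3 × 2 loadings matrix `gamma` and the 30 degree rotation are assumptions chosen purely for illustration. It rotates the loadings by an orthogonal 2 × 2 matrix $K$ (written as $\Gamma K$ so that the dimensions conform) and confirms that the common part $\Gamma\Gamma^T$ of the covariance is unchanged even though the individual loadings differ.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


# Hypothetical loadings for p = 3 variables on q = 2 factors.
gamma = [[0.8, 0.3], [0.7, -0.2], [0.5, 0.6]]

# An orthogonal rotation matrix K (rotation by 30 degrees).
angle = math.radians(30.0)
k_rot = [[math.cos(angle), -math.sin(angle)],
         [math.sin(angle), math.cos(angle)]]

gamma_rot = matmul(gamma, k_rot)
common = matmul(gamma, transpose(gamma))
common_rot = matmul(gamma_rot, transpose(gamma_rot))

# Largest absolute difference between the two versions of Gamma Gamma^T.
max_diff = max(abs(common[i][j] - common_rot[i][j])
               for i in range(3) for j in range(3))
print("rotated loadings:", gamma_rot)
print("max difference in Gamma Gamma^T:", max_diff)
```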
2.5. Strategy for factor analysis
To fit the model, we therefore need to:
• Estimate the number of common factors $q$
• Estimate the factor loadings $\Gamma$
• Estimate the specific variances $\psi$
• On occasion, estimate the factor scores $\phi$

3. Principal component extraction
We have already used the spectral decomposition to obtain one possible factoring of the covariance matrix $\Sigma$.

\[ \Sigma = E \Lambda E^T \]
\[
\begin{aligned}
\Sigma &= \lambda_1 e_1 e_1^T + \lambda_2 e_2 e_2^T + \ldots + \lambda_p e_p e_p^T \\
       &= \left( \sqrt{\lambda_1}\, e_1, \sqrt{\lambda_2}\, e_2, \ldots, \sqrt{\lambda_p}\, e_p \right)
          \begin{pmatrix} \sqrt{\lambda_1}\, e_1^T \\ \sqrt{\lambda_2}\, e_2^T \\ \vdots \\ \sqrt{\lambda_p}\, e_p^T \end{pmatrix}
\end{aligned}
\]

In practice we don’t know $\Sigma$ and we use $S$ (or we standardise the variables and use $R$).
Spectral decomposition in full . . . yields linear principal components as follows:
\[
\begin{aligned}
z_1 &= e_{11} x_1 + e_{12} x_2 + \ldots + e_{1p} x_p; \quad \mathrm{var}(z_1) = \lambda_1 \\
z_2 &= e_{21} x_1 + e_{22} x_2 + \ldots + e_{2p} x_p; \quad \mathrm{var}(z_2) = \lambda_2 \\
    &\;\;\vdots \\
z_p &= e_{p1} x_1 + e_{p2} x_2 + \ldots + e_{pp} x_p; \quad \mathrm{var}(z_p) = \lambda_p
\end{aligned}
\]

Or in matrix notation:

\[ Z = EX \qquad (3.6) \]
\[
Z = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_p \end{pmatrix}, \quad
X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} \quad \text{and} \quad
E = \begin{pmatrix}
e_{11} & e_{12} & \ldots & e_{1p} \\
e_{21} & e_{22} & \ldots & e_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
e_{p1} & e_{p2} & \ldots & e_{pp}
\end{pmatrix}
\]

So, multiplying both sides of 3.6 by $E^{-1}$ gives:

\[ E^{-1} Z = X \qquad (3.7) \]

3.1. Finding $E^{-1}$
We know that for orthogonal matrices generally $E^{-1} = E^T$, so we can invert the transformation by using

\[ X = E^T Z \qquad (3.8) \]
which can be expanded as:

\[
\begin{aligned}
x_1 &= e_{11} z_1 + e_{21} z_2 + \ldots + e_{p1} z_p \\
x_2 &= e_{12} z_1 + e_{22} z_2 + \ldots + e_{p2} z_p \\
    &\;\;\vdots \\
x_p &= e_{1p} z_1 + e_{2p} z_2 + \ldots + e_{pp} z_p
\end{aligned}
\]

which we could express as:
\[
\begin{aligned}
x_1 &= (e_{11}\sqrt{\lambda_1}) \frac{z_1}{\sqrt{\lambda_1}} + (e_{21}\sqrt{\lambda_2}) \frac{z_2}{\sqrt{\lambda_2}} + \ldots + (e_{p1}\sqrt{\lambda_p}) \frac{z_p}{\sqrt{\lambda_p}} \\
x_2 &= (e_{12}\sqrt{\lambda_1}) \frac{z_1}{\sqrt{\lambda_1}} + (e_{22}\sqrt{\lambda_2}) \frac{z_2}{\sqrt{\lambda_2}} + \ldots + (e_{p2}\sqrt{\lambda_p}) \frac{z_p}{\sqrt{\lambda_p}} \\
    &\;\;\vdots \\
x_p &= (e_{1p}\sqrt{\lambda_1}) \frac{z_1}{\sqrt{\lambda_1}} + (e_{2p}\sqrt{\lambda_2}) \frac{z_2}{\sqrt{\lambda_2}} + \ldots + (e_{pp}\sqrt{\lambda_p}) \frac{z_p}{\sqrt{\lambda_p}}
\end{aligned}
\]

• The model
• Set $\gamma_{jk} = e_{kj}\sqrt{\lambda_k}$ and $\phi_k = z_k / \sqrt{\lambda_k}$
A clear link with the factor analysis model. Our loadings matrix $\Gamma$ is the $p \times p$ matrix where the $j$th column is given by $\sqrt{\lambda_j}\, e_j$, so that:

\[ S = \Gamma\Gamma^T \]
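
A quick numerical check of this factoring, as a sketch only: the 2 × 2 covariance matrix `S` and the helper `eigen_sym2` are assumptions made for illustration, not part of the notes. The helper uses the closed-form eigen-decomposition of a symmetric 2 × 2 matrix; the loadings matrix is then built with columns $\sqrt{\lambda_k}\, e_k$ and multiplied back together.

```python
import math


def eigen_sym2(a, b, d):
    """Eigenvalues (largest first) and unit eigenvectors of [[a, b], [b, d]], b != 0."""
    mean = (a + d) / 2.0
    half_gap = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    pairs = []
    for lam in (mean + half_gap, mean - half_gap):
        v = (b, lam - a)  # satisfies (S - lam I) v = 0
        norm = math.hypot(v[0], v[1])
        pairs.append((lam, (v[0] / norm, v[1] / norm)))
    return pairs


# Hypothetical sample covariance matrix.
S = [[4.0, 2.0], [2.0, 3.0]]
pairs = eigen_sym2(S[0][0], S[0][1], S[1][1])

# Loadings: column k is sqrt(lambda_k) * e_k, stored as gamma[j][k].
gamma = [[math.sqrt(lam) * vec[j] for lam, vec in pairs] for j in range(2)]

# Reconstruct S as Gamma Gamma^T using all p = 2 components.
S_hat = [[sum(gamma[i][k] * gamma[j][k] for k in range(2)) for j in range(2)]
         for i in range(2)]
print("loadings:", gamma)
print("Gamma Gamma^T:", S_hat)
```

With all $p$ components retained the reconstruction is exact (up to rounding), which is why no uniqueness term appears yet.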

• But . . . the problem

\[
\begin{aligned}
x_1 &= e_{11} z_1 + e_{21} z_2 + \ldots + e_{q1} z_q + e_{q+1,1} z_{q+1} + \ldots + e_{p1} z_p \\
x_2 &= e_{12} z_1 + e_{22} z_2 + \ldots + e_{q2} z_q + e_{q+1,2} z_{q+1} + \ldots + e_{p2} z_p \\
    &\;\;\vdots \\
x_p &= e_{1p} z_1 + e_{2p} z_2 + \ldots + e_{qp} z_q + e_{q+1,p} z_{q+1} + \ldots + e_{pp} z_p
\end{aligned}
\]

If we set $e_{q+1,j} z_{q+1} + \ldots + e_{pj} z_p = \zeta_j$; $j = 1, \ldots, p$, we can rewrite this as:
\[
\begin{aligned}
x_1 &= e_{11} z_1 + e_{21} z_2 + \ldots + e_{q1} z_q + \zeta_1 \\
x_2 &= e_{12} z_1 + e_{22} z_2 + \ldots + e_{q2} z_q + \zeta_2 \\
    &\;\;\vdots \\
x_p &= e_{1p} z_1 + e_{2p} z_2 + \ldots + e_{qp} z_q + \zeta_p
\end{aligned}
\]

As earlier, we can express this as:
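\[
\begin{aligned}
x_1 &= (e_{11}\sqrt{\lambda_1}) \frac{z_1}{\sqrt{\lambda_1}} + (e_{21}\sqrt{\lambda_2}) \frac{z_2}{\sqrt{\lambda_2}} + \ldots + (e_{q1}\sqrt{\lambda_q}) \frac{z_q}{\sqrt{\lambda_q}} + \zeta_1 \\
x_2 &= (e_{12}\sqrt{\lambda_1}) \frac{z_1}{\sqrt{\lambda_1}} + (e_{22}\sqrt{\lambda_2}) \frac{z_2}{\sqrt{\lambda_2}} + \ldots + (e_{q2}\sqrt{\lambda_q}) \frac{z_q}{\sqrt{\lambda_q}} + \zeta_2 \\
    &\;\;\vdots \\
x_p &= (e_{1p}\sqrt{\lambda_1}) \frac{z_1}{\sqrt{\lambda_1}} + (e_{2p}\sqrt{\lambda_2}) \frac{z_2}{\sqrt{\lambda_2}} + \ldots + (e_{qp}\sqrt{\lambda_q}) \frac{z_q}{\sqrt{\lambda_q}} + \zeta_p
\end{aligned}
\]

where $\gamma_{jk} = e_{kj}\sqrt{\lambda_k}$ and $\phi_k = z_k / \sqrt{\lambda_k}$ as before; notice, as stated at the outset, that $\mathrm{var}(\zeta) = \psi$.
If we consider this in terms of the decomposition of the covariance matrix we have: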
\[
\Sigma = \left( \sqrt{\lambda_1}\, e_1, \sqrt{\lambda_2}\, e_2, \ldots, \sqrt{\lambda_q}\, e_q \right)
\begin{pmatrix} \sqrt{\lambda_1}\, e_1^T \\ \sqrt{\lambda_2}\, e_2^T \\ \vdots \\ \sqrt{\lambda_q}\, e_q^T \end{pmatrix}
+
\begin{pmatrix}
\psi_1 & 0 & \ldots & 0 \\
0 & \psi_2 & \ldots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \ldots & \psi_p
\end{pmatrix}
\qquad (3.9)
\]

where now $\psi_j = \mathrm{var}(\zeta_j) = \sigma_{jj} - \sum_{k=1}^{q} \gamma_{jk}^2$ for $j = 1, 2, \ldots, p$.

3.2. Uniquenesses
Estimates of the specific variances are given by the diagonal elements of the matrix $\hat\Sigma - \hat\Gamma\hat\Gamma^T$, i.e.:
\[
\hat\psi = \begin{pmatrix}
\psi_1 & 0 & \ldots & 0 \\
0 & \psi_2 & \ldots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \ldots & \psi_p
\end{pmatrix}
\quad \text{with} \quad \psi_j = \sigma_{jj} - \sum_{k=1}^{q} \gamma_{jk}^2 \qquad (3.10)
\]

3.3. Common factors
So, when using the principal component solution of $\hat\Sigma$, it is specified in terms of eigenvalue-eigenvector pairs $(\hat\lambda_1, \hat e_1), (\hat\lambda_2, \hat e_2), \ldots, (\hat\lambda_p, \hat e_p)$, where $\hat\lambda_1 \ge \hat\lambda_2 \ge \ldots \ge \hat\lambda_p$. If we wish to find a $q < p$ solution of common factors, then the estimated factor loadings are given by:
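\[ \hat\Gamma = \left( \sqrt{\hat\lambda_1}\, \hat e_1, \sqrt{\hat\lambda_2}\, \hat e_2, \ldots, \sqrt{\hat\lambda_q}\, \hat e_q \right) \]

• Common factors have identity covariance matrix
As with the factor analysis model given earlier, the factors $\phi$ have identity covariance matrix:

\[ \mathrm{var}(\phi) = \mathrm{var}\left( \Lambda_1^{-1/2} \Gamma_1^T (x - \mu) \right) = I_q , \]

where $\Gamma_1$ and $\Lambda_1$ hold the $q$ retained eigenvectors and eigenvalues, and $\Gamma_2$ and $\Lambda_2$ (used below) the $p - q$ discarded ones.

3.4. Common factors are uncorrelated with the residuals

\[ \mathrm{cov}(\phi, \zeta) = \mathrm{cov}\left( \Lambda_1^{-1/2} \Gamma_1^T (x - \mu),\; \Gamma_2 \Gamma_2^T (x - \mu) \right) = \Lambda_1^{-1/2} \Gamma_1^T \Sigma \Gamma_2 \Gamma_2^T = 0 \]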
3.5. But the uniquenesses aren’t independent
Each $\zeta_j$ contains the same discarded components $z_k$ ($k > q$), so they are not mutually unrelated. Hence the latent variables obtained using the principal component method do not explain all the correlation structure in our data $X$. The covariance matrix for the errors is now:

\[ \mathrm{var}(\zeta) = \Gamma_2 \Lambda_2 \Gamma_2^T \]

• A “residual” matrix
\[ \epsilon = S - (LL^T + \psi) \qquad (3.11) \]

By construction, the diagonal elements of this residual matrix will be zero. A decision to retain a particular $q$ factor model could be made depending on the size of the off-diagonal elements. Rather conveniently, there is an inequality which gives us:

\[ \sum_{i,j} \epsilon_{ij}^2 \le \hat\lambda_{q+1}^2 + \cdots + \hat\lambda_p^2, \quad \text{where } \epsilon = \hat\Sigma - \left[ LL^T + \psi \right] \qquad (3.12) \]

So it is possible to check the acceptability of fit in terms of a small sum of squares of neglected eigenvalues.
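
As a small worked illustration of this bound (a sketch with an assumed 2 × 2 correlation matrix, not data from these notes), take $p = 2$ standardised variables with correlation $r > 0$ and retain $q = 1$ component. With so few variables a factor model could not actually be identified, so this only shows the arithmetic:

```latex
\begin{gathered}
R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \qquad
\hat\lambda_1 = 1 + r,\ \hat e_1 = \tfrac{1}{\sqrt{2}} (1, 1)^T, \qquad
\hat\lambda_2 = 1 - r,\ \hat e_2 = \tfrac{1}{\sqrt{2}} (1, -1)^T \\[4pt]
L = \sqrt{\hat\lambda_1}\, \hat e_1 = \sqrt{\tfrac{1 + r}{2}}\, (1, 1)^T, \qquad
\psi_j = 1 - \tfrac{1 + r}{2} = \tfrac{1 - r}{2} \\[4pt]
\epsilon = R - (LL^T + \psi)
  = \begin{pmatrix} 0 & -\tfrac{1 - r}{2} \\ -\tfrac{1 - r}{2} & 0 \end{pmatrix}, \qquad
\sum_{i,j} \epsilon_{ij}^2 = \tfrac{(1 - r)^2}{2} \le (1 - r)^2 = \hat\lambda_2^2
\end{gathered}
```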


3.6. Proportion of variance explained
Instead of examining discarded components we could examine those we intend to retain. Bearing in mind that $\mathrm{trace}(\Sigma) = \sigma_{11} + \sigma_{22} + \ldots + \sigma_{pp}$, we know that the amount of variation explained by the first factor is $\gamma_{11}^2 + \gamma_{21}^2 + \ldots + \gamma_{p1}^2 = (\sqrt{\lambda_1}\, e_1)^T (\sqrt{\lambda_1}\, e_1) = \lambda_1$.
So we know that the $j$-th factor explains the following proportion of total sample variance:

\[ \frac{\lambda_j}{\mathrm{trace}(S)} \qquad (3.13) \]

which reduces to $\lambda_j / p$ when using standardised variables (the correlation matrix).
Kaiser criterion!

• Exercise
Exercise 4. Consider the R data ability.cov, originally reported by Smith, G. A. and Stanley G. (1983) “Clocking g: relating intelligence and measures of timed performance” Intelligence, 7:353-368, which measures p = 6 tests given to n = 112 individuals. The eigenvalues of the correlation matrix are: 3.08, 1.14, 0.82, 0.41, 0.36, 0.20. The corresponding eigenvectors of the correlation matrix are given below:

            e1      e2      e3      e4      e5      e6
General   -0.47    0.00    0.07    0.86    0.04   -0.16
Picture   -0.36    0.41    0.59   -0.27    0.53    0.00
Blocks    -0.43    0.40    0.06   -0.20   -0.78    0.05
Maze      -0.29    0.40   -0.79   -0.10    0.33    0.05
Reading   -0.44   -0.51   -0.01   -0.10    0.06    0.73
Vocab     -0.43   -0.50   -0.09   -0.35    0.02   -0.66
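
One way to work with these numbers is sketched below. This is an illustration under stated assumptions rather than the intended solution: the variable names are hypothetical, the Kaiser criterion is taken in its usual form (retain components of the correlation matrix with eigenvalue greater than 1), and the loadings use the principal component formula $\gamma_{jk} = \sqrt{\lambda_k}\, e_{jk}$ with the rounded values tabulated above.

```python
import math

tests = ["General", "Picture", "Blocks", "Maze", "Reading", "Vocab"]
eigenvalues = [3.08, 1.14, 0.82, 0.41, 0.36, 0.20]
# Rows are the six tests; columns are the eigenvectors e1..e6 from the table.
eigenvectors = [
    [-0.47, 0.00, 0.07, 0.86, 0.04, -0.16],
    [-0.36, 0.41, 0.59, -0.27, 0.53, 0.00],
    [-0.43, 0.40, 0.06, -0.20, -0.78, 0.05],
    [-0.29, 0.40, -0.79, -0.10, 0.33, 0.05],
    [-0.44, -0.51, -0.01, -0.10, 0.06, 0.73],
    [-0.43, -0.50, -0.09, -0.35, 0.02, -0.66],
]

p = len(eigenvalues)
# Proportion of total variance per component (correlation matrix: trace = p).
proportion = [lam / p for lam in eigenvalues]

# Kaiser criterion: keep components whose eigenvalue exceeds 1.
n_kaiser = sum(1 for lam in eigenvalues if lam > 1.0)
q = n_kaiser

# Principal component factor loadings for the q retained factors.
loadings = [[math.sqrt(eigenvalues[k]) * row[k] for k in range(q)]
            for row in eigenvectors]
communality = [sum(g * g for g in row) for row in loadings]
uniqueness = [1.0 - h for h in communality]

print("factors retained by the Kaiser criterion:", n_kaiser)
for name, row, h, u in zip(tests, loadings, communality, uniqueness):
    print(f"{name:8s} loadings={[round(g, 2) for g in row]} "
          f"communality={h:.2f} uniqueness={u:.2f}")
```

Because the eigenvalues and eigenvectors are rounded to two decimals, the resulting loadings and uniquenesses are only approximate.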

4. Maximum likelihood solutions
Obvious conclusions might be drawn by noting that R only offers this method of fitting factor analysis models; see the help file for the relevant function, ?factanal, as well as Venables and Ripley.
• Invariant to changes in scale, i.e. it doesn’t matter whether the correlation or the covariance matrix is used, or indeed whether any other scale changes are applied.
• There are a number of other advantages associated with maximum likelihood fitting, but the problem of Heywood cases still remains, whereby some of the unique variances are estimated with a negative value.
• We also need to impose an additional assumption over and above the factor analysis assumptions set out earlier, namely that the following matrix:

\[ \Gamma^T \Psi^{-1} \Gamma \qquad (3.14) \]

must be diagonal to enable model fitting.
4.1. Developing the maximum likelihood solution

\[ L(x; \mu, \Sigma) = (2\pi)^{-\frac{np}{2}} |\Sigma|^{-\frac{n}{2}} \exp\left\{ -\frac{1}{2} \mathrm{tr}\left( \Sigma^{-1} \left[ \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T + n(\bar{x} - \mu)(\bar{x} - \mu)^T \right] \right) \right\} \qquad (3.15) \]

We wish to solve this in terms of our factor analysis model and therefore need to find an expression for the likelihood $L(x; \mu, \Gamma, \psi)$, where $\mu$ is a nuisance parameter for our purposes here. We can either get rid of it by using the estimate $\hat\mu = \bar{x}$ and hence use the profile likelihood to find $\hat\Gamma$ and $\hat\psi$, or we can factorise the likelihood as $L(S; \bar{x}, \Sigma) L(\bar{x}; \mu, \Sigma)$. In this latter case, $\bar{x}$ and $S$ are the joint sufficient statistics for $\mu$ and $\Sigma$ respectively; for the purposes of factor analysis we only require the first part of the factorised likelihood, which can be estimated by conditional maximum likelihood. Note that as $\bar{x}$ and $S$ are independent this is also the marginal likelihood.
• The log likelihood
Taking logs of 3.15, and collecting constant terms into $c_1$ and $c_2$, we can say that we wish to maximise:

\[ \ln L = c_1 - c_2 \left\{ \ln|\Gamma\Gamma^T + \psi| + \mathrm{trace}\left[ (\Gamma\Gamma^T + \psi)^{-1} S \right] \right\} \qquad (3.16) \]

• Maximising the log-likelihood (subject to diagonality constraint)
An initial estimate of $\psi$ has to be made as before; for fixed $\psi > 0$, the likelihood equations require:

\[ \hat\Gamma = \psi^{1/2} E_1 (\Lambda_1 - I)^{1/2} \qquad (3.17) \]

where $\Lambda_1$ contains the $q$ largest eigenvalues of $\psi^{-1/2} S \psi^{-1/2}$, and $E_1$ the corresponding eigenvectors. This is used to estimate $\hat\Gamma$ given a value of $\hat\psi$. Now, the log likelihood is maximised with respect to $\hat\psi$ given an estimate of $\hat\Gamma$.

4.2. Hypothesis testing

\[ H_0: \Sigma = \Gamma\Gamma^T + \psi \]
\[ H_1: \Sigma \text{ is any other positive definite matrix} \]

This (eventually) yields a likelihood ratio statistic:

\[ -2 \ln \Lambda = n \ln\left( \frac{|\hat\Sigma|}{|S|} \right) + n \left[ \mathrm{tr}(\hat\Sigma^{-1} S) - p \right] \qquad (3.18) \]

with $\frac{1}{2}\left( (p - q)^2 - p - q \right)$ degrees of freedom.
It can be shown that $\mathrm{tr}(\hat\Sigma^{-1} S) - p = 0$ at the maximum likelihood, so this term can be removed and we can consider that

\[ -2 \ln \Lambda = n \ln\left( \frac{|\hat\Sigma|}{|S|} \right) \qquad (3.19) \]

which requires only a Bartlett correction replacing $n$ with:

\[ n - 1 - \frac{2p + 5}{6} - \frac{2q}{3} \]

• Hence we are going to test:

\[ \left( n - 1 - \frac{2p + 5}{6} - \frac{2q}{3} \right) \ln\left( \frac{|\hat\Gamma\hat\Gamma^T + \hat\psi|}{|S|} \right) > \chi^2_{((p-q)^2 - p - q)/2,\ \alpha} \qquad (3.20) \]

• Start with $q$ small (anticipating the rejection of $H_0$), and increase $q$ until $H_0$ is no longer rejected.
• Do note, there are many reasons for rejecting $H_0$; not all of these may concern us.
• Johnson and Wichern suggest that if $n$ is large and $q$ is small relative to $p$, the test will tend to reject $H_0$ even though $\hat\Sigma$ is close to $S$. So the situation can arise whereby we can claim “statistical significance” for the inclusion of additional factors in our model, but they actually add little to the model. This tends to reinforce the exploratory aspects of multivariate analysis (for some sense of exploratory).
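
The bookkeeping for this test can be sketched in a few lines. The function names below are hypothetical, and the two determinants passed to `lr_statistic` are placeholders that would come from a fitted model; the example simply evaluates the degrees of freedom and the Bartlett-corrected multiplier for the ability data dimensions (p = 6, n = 112) quoted in Exercise 4. Comparing the statistic with a chi-squared critical value is left to statistical tables or software.

```python
import math


def degrees_of_freedom(p, q):
    """df = ((p - q)^2 - (p + q)) / 2; the numerator is always even."""
    return ((p - q) ** 2 - (p + q)) // 2


def bartlett_multiplier(n, p, q):
    """Bartlett-corrected replacement for n in the likelihood ratio statistic."""
    return n - 1 - (2 * p + 5) / 6 - 2 * q / 3


def lr_statistic(n, p, q, det_fitted, det_sample):
    """Corrected statistic: multiplier * ln(|fitted covariance| / |S|)."""
    return bartlett_multiplier(n, p, q) * math.log(det_fitted / det_sample)


# Dimensions from Exercise 4: p = 6 tests, n = 112 individuals.
for q in (1, 2):
    print(f"q = {q}: df = {degrees_of_freedom(6, q)}, "
          f"multiplier = {bartlett_multiplier(112, 6, q):.4f}")
```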

5. Rotation
Having added a constraint to limit rotation, we are now going to rotate:
• Orthogonal rotations: two objective criteria are most commonly used to determine the optimal rotation: the Varimax procedure and the Quartimax procedure.
Varimax rotation seeks to maximise:

\[ V = \frac{1}{p^2} \sum_{k=1}^{q} \left\{ p \sum_{j=1}^{p} \left[ \frac{\gamma_{jk}^2}{\xi_j^2} \right]^2 - \left[ \sum_{j=1}^{p} \frac{\gamma_{jk}^2}{\xi_j^2} \right]^2 \right\} \qquad (3.21) \]

where $\xi_j^2 = \sum_{k=1}^{q} \gamma_{jk}^2$ is the communality for each of the $j$ variables as before.
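
To see what this criterion rewards, the sketch below evaluates $V$ for two made-up loadings matrices (both hypothetical, chosen only for illustration): one with "simple structure", where each variable loads on a single factor, and the same loadings rotated by 45 degrees so that every variable loads equally on both factors. The function name `varimax_criterion` is an assumption.

```python
import math


def varimax_criterion(loadings):
    """Evaluate V from (3.21) for a p x q list-of-rows loadings matrix."""
    p = len(loadings)
    q = len(loadings[0])
    # Communality of each variable: sum of its squared loadings.
    communality = [sum(g * g for g in row) for row in loadings]
    total = 0.0
    for k in range(q):
        scaled_sq = [loadings[j][k] ** 2 / communality[j] for j in range(p)]
        total += p * sum(s * s for s in scaled_sq) - sum(scaled_sq) ** 2
    return total / p ** 2


# Hypothetical loadings with simple structure (p = 4, q = 2).
simple = [[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.6]]

# The same loadings after an orthogonal rotation by 45 degrees.
c, s = math.cos(math.pi / 4), math.sin(math.pi / 4)
mixed = [[a * c + b * s, -a * s + b * c] for a, b in simple]

v_simple = varimax_criterion(simple)
v_mixed = varimax_criterion(mixed)
print("V (simple structure):", v_simple)
print("V (rotated 45 degrees):", v_mixed)
```

The simple-structure matrix gives the larger value, which is the sense in which maximising $V$ pushes each scaled squared loading towards 0 or 1.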
Although it is more difficult to see what is going on with q = 4, we can see for example that the eighth variable (Social and Personal Services) has an increased loading in terms of $\gamma_{18}$, and a much decreased loading in terms of the second factor ($\gamma_{28}$), which is virtually zero. Thus we may feel that we have achieved some simplification of our factor structure.
A promax rotation of the factor solutions has been carried out, and gives the following loadings:

          Factor1  Factor2
general    0.364    0.470
picture             0.671
blocks              0.932
maze                0.508
reading    1.023
vocab      0.811

Would you describe the first factor as a measure of:
6. Scoring
6.1. The weighted least squares estimates are:

\[ \hat\phi_i = (\Gamma^T \Psi^{-1} \Gamma)^{-1} \Gamma^T \Psi^{-1} (x_i - \bar{x}) \qquad (3.22) \]

6.2. Or
Based on assuming that both $\phi$ and $\zeta$ are multivariate normal, a concatenation of the manifest ($x$) and latent ($\phi$) variables $y^T = (\phi^T, x^T)$ will also be normal with dispersion matrix:

\[ \mathrm{var}(y) = \begin{pmatrix} I & \Gamma^T \\ \Gamma & \Gamma\Gamma^T + \psi \end{pmatrix} \]

The mean of $\phi$ is zero by definition, therefore:

\[ E(\phi \mid x_0) = \Gamma^T (\Gamma\Gamma^T + \Psi)^{-1} (x_0 - \mu) \]

which gives the estimate for the scores as:

\[ \hat\phi_i = \hat\Gamma^T (\hat\Gamma\hat\Gamma^T + \hat\psi)^{-1} (x_i - \hat\mu) \qquad (3.23) \]
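
The two scoring rules can be compared in the simplest case of a single factor (q = 1), where both reduce to scalar arithmetic. This is a sketch under stated assumptions: the loadings, uniquenesses and centred observation are made-up numbers, and the regression score is evaluated through the Sherman-Morrison identity $(\gamma\gamma^T + \Psi)^{-1}\gamma = \Psi^{-1}\gamma / (1 + \gamma^T\Psi^{-1}\gamma)$, which the code verifies directly rather than taking on trust.

```python
# Hypothetical one-factor model for p = 3 standardised variables.
gamma = [0.9, 0.8, 0.7]             # loadings
psi = [1.0 - g * g for g in gamma]  # uniquenesses
deviation = [1.0, 0.5, -0.2]        # a centred observation x_i - x_bar

# With q = 1 these two quantities are scalars:
# a = Gamma^T Psi^{-1} (x_i - x_bar), c = Gamma^T Psi^{-1} Gamma.
a = sum(g * d / u for g, d, u in zip(gamma, deviation, psi))
c = sum(g * g / u for g, u in zip(gamma, psi))

# Weighted least squares score (3.22) and regression score (3.23).
wls_score = a / c
regression_score = a / (1.0 + c)

# Verify the identity used above: w = Psi^{-1} gamma / (1 + c) should
# satisfy (gamma gamma^T + Psi) w = gamma.
w = [g / u / (1.0 + c) for g, u in zip(gamma, psi)]
gamma_dot_w = sum(g * wi for g, wi in zip(gamma, w))
sigma_w = [g * gamma_dot_w + u * wi for g, u, wi in zip(gamma, psi, w)]
identity_error = max(abs(sw - g) for sw, g in zip(sigma_w, gamma))

print("weighted least squares score:", wls_score)
print("regression score:", regression_score)
print("identity check error:", identity_error)
```

In this one-factor case the regression score equals the weighted least squares score multiplied by c / (1 + c), so it is always shrunk towards zero.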

6.3. Bayesian methods
It might be clear that factor scoring takes no account of uncertainty in the estimates of $\hat\Gamma$ and $\hat\psi$; this is one area where Bayesian methods are coming to the fore.
7. Summary
• A method for finding “latent variables” which explain the dependence (correlation) structure in some data
• Unfortunately, solutions based on PCFA are still common and you ought to know a little about them
• Maximum likelihood based methods have been available since 1972: an explicit probability model, and hence hypothesis testing, is possible
• Interpretation of the loadings is usually the end-goal. If you don’t like the loadings you get you can rotate them (using a suitable criterion to find the “optimal” rotation)
• You can check the quality of fit by examining the communalities/uniquenesses, as well as hypothesis tests (where appropriate), and all the usual eigenvalue procedures apply
7.1. Revision questions or guidelines
1. Given some loadings and eigenvalues, calculate the value of factor loadings (using PCFA)
2. Calculate and interpret the communality and uniqueness
3. Explain the difference between PCFA and MLFA: why do loadings differ (often higher in PCFA), and why are uniquenesses from PCFA flawed?
4. Interpret rotated and unrotated factor solutions. Explain a little about criteria for optimising a rotation. Explain the difference between orthogonal and oblique rotation.
5. Anything else you would like to suggest
Learning Activities
1. Worked example of discriminant analysis with the wines data: look at the effect of removing variables on the APER, plot the discriminant functions
2. Worked example with the iris data: look at the way you can use a test and training set
Solutions to Exercises
Exercise 2. Solution for the exercise Exercise 2