Confirmatory Factor Analysis
The idea is that the number of factors might be lower than the number of
variables, and might correspond to theories about the data. For example,
you might think there are different cognitive abilities, such as spatial
reasoning, analytic reasoning, analogical reasoning, quantitative ability,
linguistic ability, etc. A test might have 100 questions that measure these
underlying abilities, so different linear combinations of the factors might
give the distribution of the answers on the different test questions.
For this matrix, the first two variables are highly correlated with each other
but not with the other variables, and similarly the last three variables are
highly correlated with each other but not with the others. Thus, the data could
plausibly have arisen if there were two factors, with variables 1 and 2
measuring the first factor and variables 3–5 measuring the second factor.
April 24, 2015 2 / 67
Factor analysis
y − µ = Λf + ε
where Ψ = diag(ψ1, . . . , ψp) is the (diagonal) covariance matrix of ε.
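As a quick sketch of this model (in Python/NumPy rather than the R used later in these notes; the loadings and specific variances below are made up), we can simulate data and check that the sample covariance approaches ΛΛ′ + Ψ:

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 5, 2, 100_000          # 5 variables, 2 factors (hypothetical sizes)

# Hypothetical loading matrix: variables 1-2 load on factor 1, 3-5 on factor 2
Lam = np.array([[0.9, 0.0],
                [0.8, 0.0],
                [0.0, 0.7],
                [0.0, 0.8],
                [0.0, 0.9]])
Psi = np.diag([0.19, 0.36, 0.51, 0.36, 0.19])   # specific variances

f = rng.standard_normal((n, m))                  # factors, cov(f) = I
eps = rng.standard_normal((n, p)) * np.sqrt(np.diag(Psi))
y = f @ Lam.T + eps                              # y - mu = Lam f + eps (mu = 0)

S = np.cov(y, rowvar=False)                      # sample covariance of y
Sigma = Lam @ Lam.T + Psi                        # model-implied covariance
print(np.max(np.abs(S - Sigma)))                 # small for large n
```

With a large sample, every entry of S is close to the corresponding entry of ΛΛ′ + Ψ, which is the structure the factor model imposes on the covariance matrix.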
Factor analysis: example with 5 variables, 2 factors
Here hi² is called the communality or common variance and ψi is called the
specific variance or residual variance.
ΛΛ′ + Ψ ≈ Σ,
the covariance matrix for the original data, but often the approximation is
not very good if there are too few factors. If the estimated factor analysis
structure doesn't fit the estimate for Σ, this indicates the inadequacy of
the model and suggests that more factors might be needed.
The book points out that this can be a good thing, in that the inadequacy
of the model may be easier to see than in other statistical settings, which
can require complicated diagnostics.
An issue that bothers some is that the factor loadings are not unique. If T
is any orthogonal matrix, then TT′ = I. Consequently,
y − µ = ΛTT′f + ε
is equivalent to the model with TT′ removed, so factor loadings ΛT with
factors T′f will give equivalent results, and the new model could be written
y − µ = Λ∗f∗ + ε
with Λ∗ = ΛT and f∗ = T′f.
The new factors and factor loadings are different, but the communalities
and residual variances are not affected.
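This invariance is easy to verify numerically. A NumPy sketch with made-up loadings and a random orthogonal T:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical loading matrix for 5 variables and 2 factors
Lam = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.8], [0.0, 0.9]])

# Any orthogonal T works; here one comes from a QR decomposition
T, _ = np.linalg.qr(rng.standard_normal((2, 2)))
Lam_star = Lam @ T                     # rotated loadings, Lam* = Lam T

# Lam Lam' (and hence the implied covariance) is unchanged by the rotation
assert np.allclose(Lam @ Lam.T, Lam_star @ Lam_star.T)
# ... and so are the communalities h_i^2 = sum_j lambda_ij^2
assert np.allclose((Lam**2).sum(axis=1), (Lam_star**2).sum(axis=1))
```

The individual loadings change, but everything the model implies about the data (the fitted covariance, the communalities, the residual variances) does not.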
There are different ways to estimate the factor loadings Λ. The first is
called the principal component method, though despite the name it is not the
same as principal component analysis (!). Four methods listed in the book are
1. “Principal components” (not PCA)
2. Principal factors
3. Iterated principal factors
4. Maximum likelihood
For the first approach, the idea is to initially factor S, the sample
covariance matrix of the data, into
S ≈ Λ̂Λ̂′
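A minimal NumPy sketch of this first step on toy data: take the spectral decomposition of S and set Λ̂ to the first m eigenvector columns scaled by the square roots of their eigenvalues. Adding the diagonal Ψ̂ then reproduces the sample variances exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))  # toy data
S = np.cov(y, rowvar=False)

m = 2
vals, vecs = np.linalg.eigh(S)             # eigh returns ascending order
order = np.argsort(vals)[::-1]             # sort eigenvalues descending
vals, vecs = vals[order], vecs[:, order]

Lam_hat = vecs[:, :m] * np.sqrt(vals[:m])  # "principal component" loadings
Psi_hat = np.diag(np.diag(S - Lam_hat @ Lam_hat.T))  # specific variances

# The diagonal of Lam Lam' + Psi reproduces the sample variances exactly;
# only the off-diagonal covariances are approximated
assert np.allclose(np.diag(Lam_hat @ Lam_hat.T + Psi_hat), np.diag(S))
```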
For the diagonal elements (i.e., j = i), the summand is λ̂ik², and the book
uses j instead of k as the summation index.
Factor analysis
The fact that m < p is what makes Λ̂Λ̂′ only approximate S. Adding Ψ̂
means that the original sample variances are recovered exactly in the
model but that the covariances are still only approximated.
The proportion of the total sample variance (adding the variances of all
variables separately, regardless of covariances) explained by factor j is
therefore

(λ̂1j² + · · · + λ̂pj²)/tr(S) = θj/tr(S)

where θj is the jth eigenvalue of S.
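In the principal component method the sum of squared loadings in column j of Λ̂ equals the jth eigenvalue, which is where this formula comes from. A NumPy check on toy data:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 5))  # toy data
S = np.cov(y, rowvar=False)

vals, vecs = np.linalg.eigh(S)
vals, vecs = vals[::-1], vecs[:, ::-1]    # sort descending
Lam_hat = vecs * np.sqrt(vals)            # loadings for all p factors

# Column sums of squared loadings equal the eigenvalues theta_j ...
assert np.allclose((Lam_hat**2).sum(axis=0), vals)
# ... so the per-factor proportions of variance sum to 1 over all p factors
prop = vals / np.trace(S)
assert np.isclose(prop.sum(), 1.0)
```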
The fit of the model can be measured by comparing the covariance matrix
with its estimate via an error matrix

E = S − (Λ̂Λ̂′ + Ψ̂)
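A sketch of this check with NumPy on toy data, using a two-factor principal component fit: the diagonal of E is exactly zero by construction, so the quality of the fit lives in the off-diagonal entries.

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.standard_normal((300, 5)) @ rng.standard_normal((5, 5))  # toy data
S = np.cov(y, rowvar=False)

# Two-factor "principal component" fit
vals, vecs = np.linalg.eigh(S)
vals, vecs = vals[::-1], vecs[:, ::-1]
Lam_hat = vecs[:, :2] * np.sqrt(vals[:2])
Psi_hat = np.diag(np.diag(S - Lam_hat @ Lam_hat.T))

E = S - (Lam_hat @ Lam_hat.T + Psi_hat)   # error (residual) matrix
# Largest off-diagonal residual: a rough summary of lack of fit
print(np.max(np.abs(E - np.diag(np.diag(E)))))
```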
indicating that there is collinearity in the columns (they are not linearly
independent). The proportion of the variance explained by the first factor
is 3.263/5 = 0.6526 and the proportion explained by the first two together
is (3.263 + 1.538)/5 = 0.9602, meaning that two factors could account for
96% of the variability in the data. Looking at the correlation matrix, we
have strong positive correlations within the sets {Kind, Happy, Likeable}
and {Intelligent, Just}, suggesting that these could correspond to separate
and somewhat independent factors in the characteristics perceived in
people by the subject.
In the example, the loadings in the first column give the relative
importance of the variables for factor 1, while the loadings in the second
column give the relative importance of factor 2. For factor 1, the highest
variables are for Kind and Happy, with Likeable being third. The Just
variable is somewhat similar to the Happy variable. For factor 2, Intelligent
and Just stand out as having much higher correlations than the other
variables.
Since there is no unique way to rotate the factors, there is no unique way
to interpret the factor loadings. So, for example, variables 3 and 5 are
similar on the original factor 1 (but not factor 2), while on the rotated
factors, variables 3 and 5 are quite different, with variable 5 being high on
factor 2 and variable 3 being high on factor 1. It seems that you could
choose whether or not to rotate axes depending on the story you want to
tell about variable 3.
The book suggests that when the factors are rotated, the factors could be
interpreted as representing humanity (rotated factor 1) and rationality
(factor 2). An objection is that this is imposing a theory of preconceived
personalities onto the data.
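The rotation used in the example later in these notes is varimax (the rotate = "varimax" argument in the R call). As a sketch of how such a rotation can be computed, here is a common SVD-based varimax implementation in NumPy with made-up loadings; because the rotation is orthogonal, the communalities are unchanged:

```python
import numpy as np

def varimax(Lam, gamma=1.0, max_iter=100, tol=1e-8):
    """A common SVD-based varimax rotation (sketch, not psych's exact code)."""
    p, k = Lam.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = Lam @ R
        # Working matrix for the SVD step, derived from the varimax criterion
        G = Lam.T @ (L**3 - (gamma / p) * L @ np.diag((L**2).sum(axis=0)))
        u, s, vh = np.linalg.svd(G)
        R = u @ vh                     # nearest orthogonal matrix to G
        d_new = s.sum()
        if d_new < d * (1 + tol):      # stop when the criterion plateaus
            break
        d = d_new
    return Lam @ R, R

# Hypothetical unrotated loadings for 5 variables, 2 factors
Lam = np.array([[0.8, 0.3], [0.7, 0.4], [0.6, 0.5], [0.2, 0.8], [0.1, 0.9]])
Lam_rot, R = varimax(Lam)

assert np.allclose(R @ R.T, np.eye(2))   # R is orthogonal
# Communalities (row sums of squared loadings) are preserved
assert np.allclose((Lam**2).sum(axis=1), (Lam_rot**2).sum(axis=1))
```

Varimax tries to drive each variable's loadings toward either 0 or a large value, which is what makes rotated loadings easier to interpret; the objection above is about interpretation, not about the arithmetic.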
For the principal factor method, we instead factor

S − Ψ̂ ≈ Λ̂Λ̂′

or

R − Ψ̂ ≈ Λ̂Λ̂′
Here
R − Ψ̂
can be approximated using rij, the sample correlations, for the off-diagonal
elements. For the diagonals, these can be estimated using Ri², the squared
multiple correlation between yi and the remaining variables. This is
computed as

ĥi² = Ri² = 1 − 1/r^ii

where r^ii denotes the ith diagonal element of R⁻¹.
The specific variances are then

ψ̂i = rii − ĥi²,

using the updated value for ĥi².
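A NumPy sketch of the iterated principal factor method on toy data (here R is the correlation matrix, not the language): start from the squared multiple correlations, replace the diagonal of R with the communalities, factor, and repeat.

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.standard_normal((300, 5)) @ rng.standard_normal((5, 5))  # toy data
R = np.corrcoef(y, rowvar=False)

# Initial communality estimates: squared multiple correlations,
# h_i^2 = 1 - 1/r^ii with r^ii the ith diagonal element of R^{-1}
h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))

m = 2
for _ in range(20):                       # iterated principal factors
    Rs = R.copy()
    np.fill_diagonal(Rs, h2)              # reduced correlation matrix
    vals, vecs = np.linalg.eigh(Rs)
    vals, vecs = vals[::-1], vecs[:, ::-1]
    Lam_hat = vecs[:, :m] * np.sqrt(np.maximum(vals[:m], 0.0))
    h2 = (Lam_hat**2).sum(axis=1)         # updated communalities

psi_hat = np.diag(R) - h2                 # psi_i = r_ii - h_i^2
```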
H0 : Σ = ΛΛ′ + Ψ,  H1 : Σ ≠ ΛΛ′ + Ψ
The vector of estimated factor scores for observation i is

f̂i = (f̂i1, f̂i2, . . . , f̂im)′

obtained by treating the problem as a regression:

f̂i = B̂1′(yi − ȳ)
If F had been observed, then the estimate for B1 would be (using the
usual matrix representation of regression)

B̂1 = (Yc′Yc)⁻¹Yc′F
We can estimate F by

F̂ = YcS⁻¹Λ̂
(or use R in place of S). Often you would obtain factor scores after doing
rotations.
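A minimal NumPy sketch of these regression-method factor scores, using principal-component loadings from toy data:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, m = 200, 5, 2
y = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))  # toy data
S = np.cov(y, rowvar=False)

vals, vecs = np.linalg.eigh(S)
vals, vecs = vals[::-1], vecs[:, ::-1]
Lam_hat = vecs[:, :m] * np.sqrt(vals[:m])  # two-factor loadings

Yc = y - y.mean(axis=0)                    # centered data matrix
F_hat = Yc @ np.linalg.inv(S) @ Lam_hat    # regression-method factor scores

# One row of scores per observation, one column per factor;
# the scores are centered because Yc is
assert F_hat.shape == (n, m)
assert np.allclose(F_hat.mean(axis=0), 0.0)
```

In practice the loadings would typically be rotated first (e.g. varimax) and the scores computed from the rotated Λ̂, as noted above.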
The eigenvalues are 7.91, 5.85, .31, .26, . . . , .002. Since only the first two
eigenvalues are large, this suggests that a two-factor model is reasonable for
the data. This can be visualized with a scree plot.
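This kind of eigenvalue check is easy to reproduce on simulated data with a built-in two-factor structure (NumPy sketch; not the lecture's data):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
f = rng.standard_normal((n, 2))                     # two true factors
Lam = rng.standard_normal((10, 2)) * 2.0            # made-up loadings
y = f @ Lam.T + rng.standard_normal((n, 10)) * 0.3  # small specific noise

vals = np.sort(np.linalg.eigvalsh(np.cov(y, rowvar=False)))[::-1]
# A scree plot is just these eigenvalues against their index; the sharp
# drop after the second one suggests m = 2 factors
print(np.round(vals, 2))
```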
The author points out that many statisticians dislike factor analysis, partly
because the nonuniqueness of the factor rotations can lead to different
interpretations. One question is whether the factors really exist. The
previous example seems to suggest that people judge others
according to their perceived benevolence or competence. It would have
been interesting if these questions had been asked in addition to the
attributes such as strength, intelligence, kindness, etc.
There are a few ways to do factor analysis in R. Some common ones are
the fa() and factanal() functions. The fa() function is in the psych
package. factanal() is built into R but only does maximum likelihood
factor analysis.
The fa() function is more flexible: for example, it can handle missing data
(which is common in questionnaire data), offers estimation methods beyond
maximum likelihood, and has more options generally.
The input is either a correlation matrix, covariance matrix, or the original data
matrix. The user specifies the number of factors. The number of observations
(number of rows in original data, not the number of variables or number of rows
in the correlation matrix) must be specified to get confidence intervals and
goodness-of-fit statistics.
The data is in a comma delimited file, and looks something like this:
Q1,Q2,Q3,Q4,Q5,Q6,Q7,Q8,Q9,Q10,Q11,Q12,Q13,Q14,Q15,Q16,Q17,Q18,Q19,Q
28,Q29,Q30,Q31,Q32,Q33,Q34,Q35,Q36,Q37,Q38,Q39,Q40,Q41,Q42
5,4,4,5,5,4,3,3,5,1,4,1,3,4,4,4,2,2,1,2,3,3,3,4,3,4,4,4,4,2,1,1,2,3,
5,5,4,4,4,5,5,4,5,2,4,2,2,4,5,4,2,1,1,2,2,4,4,4,3,5,5,5,5,2,2,2,2,2,
4,3,3,2,2,4,4,4,3,2,3,2,,4,3,2,1,1,1,,2,2,3,3,3,5,5,5,5,3,2,3,2,3,2,
5,4,3,4,3,3,2,4,4,2,4,2,3,5,4,4,2,1,1,2,2,4,4,4,4,5,5,5,5,2,1,1,1,2,
3,2,1,2,2,4,3,2,2,2,3,2,2,3,4,3,2,1,1,2,2,4,3,3,4,3,4,3,4,2,3,3,5,4,
5,5,5,5,4,5,5,4,5,2,3,2,2,2,4,4,1,2,1,1,2,5,5,5,5,5,5,5,5,1,1,1,2,3,
5,4,4,5,5,3,4,4,5,1,3,2,2,4,2,2,2,1,1,2,2,3,3,3,4,5,5,5,5,3,2,2,2,2,
5,1,3,5,4,2,2,4,4,2,4,2,4,5,3,1,2,1,1,2,2,4,4,5,4,5,5,5,5,2,1,1,2,3,
5,4,4,5,2,4,4,4,4,1,5,1,3,5,5,3,2,2,1,2,4,5,5,5,5,5,5,5,5,2,1,1,1,3,
5,4,4,5,4,4,4,5,5,2,4,1,3,5,4,4,2,1,1,2,2,4,4,4,4,5,5,5,5,2,1,1,2,3,
Note that there is missing data when two commas appear in a row.
To see some of the other output, the factor analysis gives linear combinations of
the factors for each question.
> a
Factor Analysis using method = pa
Call: fa(r = survey, nfactors = 4, rotate = "varimax", fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
PA1 PA2 PA3 PA4 h2 u2 com
Q1 0.17 -0.12 0.31 0.45 0.342 0.66 2.3
Q2 0.09 -0.05 0.40 0.02 0.169 0.83 1.2
Q3 0.13 -0.23 0.51 -0.11 0.342 0.66 1.7
Q4 0.12 0.00 0.50 0.25 0.328 0.67 1.6
Example
Even though the four-factor model doesn't fit the data well, we can try to
interpret the factors to some extent. The factor loadings can be made
easier to read by printing only those above a certain threshold:
> print(a$loadings,cutoff=.5)