Supplementary Note - Exploratory Factor Analysis

Uploaded by tangm1779
Copyright © All Rights Reserved

Exploratory Factor Analysis

(Supplementary notes)

In this set of notes, I simplify the mathematical concepts using “less than exact” explanations, which
I hope are easier to understand. Mathematicians distinguish between factor analysis (FA) and
principal components (PC) analysis; please read a standard statistics textbook for their
differences. Here I use the terms interchangeably. As LISREL is not specifically designed for
exploratory factor analysis (EFA), I need to run 2 programs to get all the output I want. Other
software (e.g., SPSS) can provide more information with simpler syntax.

1. Suppose we have scores on 9 school subjects for 100 students, and the correlations among
these subjects are as follows:
Correlation Matrix among 9 academic subjects
_____________________________________________________________
              1     2     3     4     5     6     7     8     9
_____________________________________________________________
Subject 1  1.00
Subject 2  0.12  1.00
Subject 3  0.08  0.08  1.00
Subject 4  0.50  0.11  0.08  1.00
Subject 5  0.48  0.03  0.12  0.45  1.00
Subject 6  0.07  0.46  0.15  0.08  0.11  1.00
Subject 7  0.05  0.44  0.15  0.12  0.12  0.44  1.00
Subject 8  0.14  0.17  0.53  0.14  0.08  0.10  0.06  1.00
Subject 9  0.16  0.05  0.43  0.10  0.06  0.08  0.10  0.54  1.00
_____________________________________________________________

By visual inspection, Subjects 1, 4 and 5 seem to form a cluster (their intercorrelations are .45
to .50). But what are the other relations? When the relationships are less distinct than in this
example, or when many variables are involved, it becomes extremely difficult, if not impossible,
to understand the relations among the variables by inspection alone.
Our research questions:
a) How many clusters of subjects are there? How is each of the 9 subjects related to each of
these clusters (factors)?
b) Which of these subjects are more closely related (correlated) than others?
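To make the “visual inspection” idea concrete, the following Python sketch (our own illustration, not part of the LISREL procedure; the .40 cut-off and the function name `clusters` are arbitrary, hypothetical choices) links every pair of subjects whose correlation exceeds the cut-off and reports the connected groups. On this tidy example it finds {1, 4, 5}, {2, 6, 7} and {3, 8, 9}, but it gives no loadings and its answer depends on the chosen cut-off, which is why we turn to factor analysis.

```python
# Crude heuristic for illustration only (not factor analysis): link every pair of
# subjects whose correlation exceeds a cut-off, then report the connected groups.

# Lower triangle of the correlation matrix above, row by row (Subject 1..9).
LOWER = [
    [1.00],
    [0.12, 1.00],
    [0.08, 0.08, 1.00],
    [0.50, 0.11, 0.08, 1.00],
    [0.48, 0.03, 0.12, 0.45, 1.00],
    [0.07, 0.46, 0.15, 0.08, 0.11, 1.00],
    [0.05, 0.44, 0.15, 0.12, 0.12, 0.44, 1.00],
    [0.14, 0.17, 0.53, 0.14, 0.08, 0.10, 0.06, 1.00],
    [0.16, 0.05, 0.43, 0.10, 0.06, 0.08, 0.10, 0.54, 1.00],
]


def clusters(lower, cutoff):
    """Group subjects (numbered from 1) connected by correlations above cutoff."""
    n = len(lower)
    parent = list(range(n))  # union-find forest: each subject starts alone

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i):
            if lower[i][j] > cutoff:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i + 1)
    return sorted(groups.values())


print(clusters(LOWER, cutoff=0.40))
```

Raising the cut-off above the largest correlation (.54) leaves every subject in its own group, a reminder that this shortcut says nothing about how strongly each subject belongs to a cluster.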

2. Using LISREL, run the following program (EFA1.ls8, downloadable from CUFORUM):
DA NI=9 NO=100
KM
1.0000
0.1200 1.0000
0.0800 0.0800 1.0000
0.5000 0.1100 0.0800 1.0000
0.4800 0.0300 0.1200 0.4500 1.0000
0.0700 0.4600 0.1500 0.0800 0.1100 1.0000
0.0500 0.4400 0.1500 0.1200 0.1200 0.4400 1.0000
0.1400 0.1700 0.5300 0.1400 0.0800 0.1000 0.0600 1.0000
0.1600 0.0500 0.4300 0.1000 0.0600 0.0800 0.1000 0.5400 1.0000
PC NC=6
OU
Explanation:
(a) In the first line, “DA” starts the DAta description: NI=9 gives the number of input variables
(indicators), and NO=100 the number of observations (subjects).
(b) KM means we are inputting a correlation matrix; by default, only the lower half (the lower
triangle, including the diagonal) of the full matrix is entered.
(c) PC requests a principal components analysis: we want to reduce the correlation matrix to a
small set of new variables (components). NC=6 requests output for the first 6 components
(PC_1 to PC_6).
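As an illustration of point (b), the Python sketch below rebuilds the full symmetric matrix from the lower triangle, which is what the program has to do with KM input. The function name `full_from_lower` is hypothetical, and this is not how LISREL is implemented internally. The sketch also checks that row i holds exactly i values, a common typing error when a matrix is entered by hand.

```python
# Lower triangle (diagonal included), exactly as typed after KM in EFA1.ls8.
LOWER = [
    [1.00],
    [0.12, 1.00],
    [0.08, 0.08, 1.00],
    [0.50, 0.11, 0.08, 1.00],
    [0.48, 0.03, 0.12, 0.45, 1.00],
    [0.07, 0.46, 0.15, 0.08, 0.11, 1.00],
    [0.05, 0.44, 0.15, 0.12, 0.12, 0.44, 1.00],
    [0.14, 0.17, 0.53, 0.14, 0.08, 0.10, 0.06, 1.00],
    [0.16, 0.05, 0.43, 0.10, 0.06, 0.08, 0.10, 0.54, 1.00],
]


def full_from_lower(lower):
    """Rebuild a full symmetric matrix from its lower triangle (diagonal included)."""
    n = len(lower)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower):
        if len(row) != i + 1:
            raise ValueError(f"row {i + 1} should have {i + 1} entries, got {len(row)}")
        for j, value in enumerate(row):
            full[i][j] = value
            full[j][i] = value  # mirror into the upper triangle
    return full


full = full_from_lower(LOWER)
print(full[0])  # Subject 1 against all 9 subjects
```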

The output:
Principal Components Analysis
Eigenvalues and Eigenvectors
                 PC_1     PC_2     PC_3     PC_4     PC_5     PC_6
             --------  -------  -------  -------  -------  -------
Eigenvalue       2.56     1.66     1.63     0.69     0.59     0.56
% Variance      28.42    18.49    18.15     7.65     6.50     6.18
Cum. % Var      28.42    46.91    65.06    72.71    79.21    85.39
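To see where these numbers come from: the eigenvalues are those of the 9 × 9 correlation matrix itself. Because each variable is standardized, the total variance is 9, so % Variance = eigenvalue / 9 × 100. The pure-Python sketch below needs only the standard library. It uses the classical cyclic Jacobi rotation method, which is our choice for illustration; LISREL's own numerical routine may differ. It should reproduce LISREL's table up to rounding. The function names are our own.

```python
import math

# Lower triangle of the 9 x 9 correlation matrix from the KM input above.
LOWER = [
    [1.00],
    [0.12, 1.00],
    [0.08, 0.08, 1.00],
    [0.50, 0.11, 0.08, 1.00],
    [0.48, 0.03, 0.12, 0.45, 1.00],
    [0.07, 0.46, 0.15, 0.08, 0.11, 1.00],
    [0.05, 0.44, 0.15, 0.12, 0.12, 0.44, 1.00],
    [0.14, 0.17, 0.53, 0.14, 0.08, 0.10, 0.06, 1.00],
    [0.16, 0.05, 0.43, 0.10, 0.06, 0.08, 0.10, 0.54, 1.00],
]


def full_from_lower(lower):
    """Mirror a lower triangle (diagonal included) into a full symmetric matrix."""
    n = len(lower)
    return [[lower[max(i, j)][min(i, j)] for j in range(n)] for i in range(n)]


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:  # off-diagonal part is (numerically) zero: done
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new a[p][q] becomes 0.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


R = full_from_lower(LOWER)
eigenvalues = jacobi_eigenvalues(R)
n_vars = len(R)  # total variance of a correlation matrix = number of variables
cumulative = 0.0
for k, ev in enumerate(eigenvalues[:6], start=1):
    pct = 100.0 * ev / n_vars
    cumulative += pct
    print(f"PC_{k}: eigenvalue {ev:.2f}  % variance {pct:5.2f}  cum. % {cumulative:5.2f}")
```

All 9 eigenvalues always sum to 9 (the trace of R), so the 3 components not printed by LISREL account for the remaining 14.61% of the variance.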

(d) Two rules are generally used to determine the number of factors/components to retain:
(i) Eigenvalue (EV) rule: we generally retain factors with EV ≥ 1; by this rule, we retain 3
factors.
(ii) Scree test: we plot the EVs (y-axis) against the factor number (x-axis), as in the graph
below. The EVs drop as the factor number increases, and we stop at the point where the slope
changes most substantially. In the graph, if we smooth the line, the slope changes most between
N = 2 and N = 4 (i.e., around 3), so we decide that 3 factors should be retained. (Note: this
method can be quite subjective.)

[Figure: scree plot of eigenvalues (y-axis, 0 to 3) against factor number (x-axis, 1 to 6)]

(e) Note that the % of variance explained by each additional factor decreases. For a correlation
matrix the total variance equals the number of variables (9), so % Variance = EV / 9 × 100. The
first factor explains 28.42% of the total variance of the 9 variables. Thus, for example, if we
simplify the model to 3 factors, together they account for 65.06% of the total variance of the 9
variables. Of course, more factors reproduce more of the original variance, but with diminishing
returns.
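Both retention rules, and the variance arithmetic in (e), can be checked with a few lines of Python working from the rounded eigenvalues printed by LISREL. The function names are our own. `largest_drop` is a crude, hypothetical stand-in for the scree test (it keeps the factors before the largest drop between successive eigenvalues), not a standard procedure. Here the drops after factor 1 (.90) and factor 3 (.94) are close, which illustrates why the scree test is called subjective. Because the printed eigenvalues are rounded, the percentages differ slightly from LISREL's (e.g., 2.56 / 9 gives 28.44% rather than 28.42%).

```python
# Eigenvalues as printed by LISREL above (first 6 of 9 components).
EIGENVALUES = [2.56, 1.66, 1.63, 0.69, 0.59, 0.56]
N_VARIABLES = 9  # total variance of a correlation matrix = number of variables


def kaiser_count(eigenvalues, cutoff=1.0):
    """Rule (i): number of components with eigenvalue >= cutoff."""
    return sum(1 for ev in eigenvalues if ev >= cutoff)


def largest_drop(eigenvalues):
    """Crude stand-in for rule (ii): keep the components before the largest drop."""
    drops = [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]
    return drops.index(max(drops)) + 1


def variance_table(eigenvalues, n_variables):
    """(eigenvalue, % of total variance, cumulative %) for each component."""
    rows, cumulative = [], 0.0
    for ev in eigenvalues:
        pct = 100.0 * ev / n_variables
        cumulative += pct
        rows.append((ev, pct, cumulative))
    return rows


print("EV >= 1 rule keeps", kaiser_count(EIGENVALUES), "factors")
print("Largest drop comes after factor", largest_drop(EIGENVALUES))
for k, (ev, pct, cum) in enumerate(variance_table(EIGENVALUES, N_VARIABLES), start=1):
    print(f"PC_{k}: EV={ev:.2f}  %Var={pct:5.2f}  Cum%={cum:5.2f}")
```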

3. Assuming we retain 3 factors, we run the following LISREL program (EFA2.LS8) to obtain further
information on each factor:

DA NI=9 NO=100
KM
1.0000
0.1200 1.0000
0.0800 0.0800 1.0000
0.5000 0.1100 0.0800 1.0000
0.4800 0.0300 0.1200 0.4500 1.0000
0.0700 0.4600 0.1500 0.0800 0.1100 1.0000
0.0500 0.4400 0.1500 0.1200 0.1200 0.4400 1.0000
0.1400 0.1700 0.5300 0.1400 0.0800 0.1000 0.0600 1.0000
0.1600 0.0500 0.4300 0.1000 0.0600 0.0800 0.1000 0.5400 1.0000
FA NF=3
OU
Explanation: we ask for factor analysis results with 3 factors (FA NF=3, where NF is the number of
factors).

The Output:
Varimax-Rotated Factor Loadings

        Factor 1  Factor 2  Factor 3  Unique Var
        --------  --------  --------  ----------
VAR 1       0.10      0.73      0.04        0.46
VAR 2       0.09      0.06      0.66        0.55
VAR 3       0.63      0.05      0.12        0.58
VAR 4       0.08      0.67      0.08        0.53
VAR 5       0.04      0.65      0.07        0.57
VAR 6       0.07      0.05      0.68        0.54
VAR 7       0.05      0.07      0.65        0.57
VAR 8       0.82      0.08      0.07        0.32
VAR 9       0.65      0.09      0.04        0.56

Promax-Rotated Factor Loadings

        Factor 1  Factor 2  Factor 3  Unique Var
        --------  --------  --------  ----------
VAR 1       0.73     -0.03      0.03        0.46
VAR 2       0.00      0.66      0.02        0.55
VAR 3      -0.02      0.06      0.64        0.58
VAR 4       0.68      0.02      0.01        0.53
VAR 5       0.66      0.01     -0.03        0.57
VAR 6      -0.01      0.68      0.00        0.54
VAR 7       0.01      0.66     -0.02        0.57
VAR 8       0.00     -0.01      0.83        0.32
VAR 9       0.03     -0.03      0.66        0.56

Factor Correlations

          Factor 1  Factor 2  Factor 3
          --------  --------  --------
Factor 1      1.00
Factor 2      0.19      1.00
Factor 3      0.21      0.22      1.00
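The “Unique Var” column is not explained in these notes. Under the usual factor model, each standardized variable's variance (1) splits into a communality h² (variance shared with the factors) and a unique variance 1 − h². For orthogonal (varimax) factors, h² is simply the sum of squared loadings. For oblique (promax) factors, the factor correlations φ enter as well: h² = Σ_j Σ_k λ_j λ_k φ_jk. The Python sketch below (function and variable names are our own) recomputes the column from the rounded loadings in the output, so it agrees with LISREL only to about ±.01. It also shows that rotation leaves the unique variances unchanged: both rotations give the same column.

```python
# Loadings copied from the LISREL output above (columns: Factor 1, 2, 3).
VARIMAX = [
    (0.10, 0.73, 0.04), (0.09, 0.06, 0.66), (0.63, 0.05, 0.12),
    (0.08, 0.67, 0.08), (0.04, 0.65, 0.07), (0.07, 0.05, 0.68),
    (0.05, 0.07, 0.65), (0.82, 0.08, 0.07), (0.65, 0.09, 0.04),
]
PROMAX = [
    (0.73, -0.03, 0.03), (0.00, 0.66, 0.02), (-0.02, 0.06, 0.64),
    (0.68, 0.02, 0.01), (0.66, 0.01, -0.03), (-0.01, 0.68, 0.00),
    (0.01, 0.66, -0.02), (0.00, -0.01, 0.83), (0.03, -0.03, 0.66),
]
PHI = [  # promax factor correlations from the output
    [1.00, 0.19, 0.21],
    [0.19, 1.00, 0.22],
    [0.21, 0.22, 1.00],
]
REPORTED_UNIQUE = [0.46, 0.55, 0.58, 0.53, 0.57, 0.54, 0.57, 0.32, 0.56]


def communality(loadings, phi=None):
    """h^2 = sum_j sum_k l_j * l_k * phi[j][k]; phi=None means orthogonal factors."""
    m = len(loadings)
    if phi is None:  # identity matrix: reduces to the sum of squared loadings
        phi = [[1.0 if j == k else 0.0 for k in range(m)] for j in range(m)]
    return sum(loadings[j] * loadings[k] * phi[j][k] for j in range(m) for k in range(m))


for i in range(len(VARIMAX)):
    u_varimax = 1.0 - communality(VARIMAX[i])
    u_promax = 1.0 - communality(PROMAX[i], PHI)
    print(f"VAR {i + 1}: varimax {u_varimax:.2f}  promax {u_promax:.2f}  "
          f"LISREL {REPORTED_UNIQUE[i]:.2f}")
```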

Explanation:
(a) Rotated factor solutions are usually reported because they are easier to interpret. Two
types of rotation are shown here. Varimax assumes the factors are orthogonal, i.e., the
factors are uncorrelated (correlation = 0). Promax assumes the factors are oblique, i.e.,
the factors can be correlated. Whether factors should be orthogonal or oblique depends on
theory (what does the theory predict?) and on the empirical results (if we assume an
oblique solution, do the factors actually turn out to be correlated?).
(b) The factor loadings show the strength of the relation between each variable and each
factor. In the varimax solution, Variable 1 (i.e., School Subject 1) loads most strongly
(.73) on Factor 2 (i.e., it is most related to Factor 2). Similarly, Variable 2 loads most
strongly on Factor 3 (.66), and so on. We can therefore say that Variable 1 belongs to
Factor 2, Variable 2 belongs to Factor 3, etc.
(c) By reading the results column by column, you can see that Factor 1 is defined by
Variables 3, 8, 9, Factor 2 by Variables 1, 4, 5, and Factor 3 by Variables 2, 6, 7. We
usually label each factor by looking for what its items (variables) have in common. For
example, if Variables 3, 8, 9 are Biology, Chemistry and Physics, we may label Factor 1
“Science”. Similarly, depending on the nature of the other variables, we may label the
remaining factors, for example, “Language” and “Humanities”.
(d) In the promax solution, although the order of the factors differs from that in the
varimax solution (e.g., Factor 1 in varimax becomes Factor 3 in promax), the factor
patterns are very similar. With other data sets, the promax and varimax results may be
quite different.
(e) The promax solution also gives the correlations among the factors (.19, .21, .22).
Such low correlations suggest that we should consider an orthogonal rotation for
simplicity.
(f) Loadings of .3 or below are usually considered low and are sometimes omitted from the
tables in journal articles (to keep the tables simple). Good psychological instruments
contain items with loadings of .6 (or .7) or above on their respective factors.
(g) A negative loading means the variable is negatively related to the factor: a high score
on the factor corresponds to a low score on the variable.
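Points (b), (c), (f) and (g) can be combined into a small Python sketch. The function names and the .30 default are our own illustrative choices. Each variable is assigned to the factor with its largest absolute loading (absolute, so that a large negative loading also counts), and loadings of .30 or below are blanked out, as in many journal tables. Applied to the varimax and promax loadings from the output, it gives the same three groups with the factor numbers permuted, as noted in (d).

```python
# Loadings copied from the LISREL output above (columns: Factor 1, 2, 3).
VARIMAX = [
    (0.10, 0.73, 0.04), (0.09, 0.06, 0.66), (0.63, 0.05, 0.12),
    (0.08, 0.67, 0.08), (0.04, 0.65, 0.07), (0.07, 0.05, 0.68),
    (0.05, 0.07, 0.65), (0.82, 0.08, 0.07), (0.65, 0.09, 0.04),
]
PROMAX = [
    (0.73, -0.03, 0.03), (0.00, 0.66, 0.02), (-0.02, 0.06, 0.64),
    (0.68, 0.02, 0.01), (0.66, 0.01, -0.03), (-0.01, 0.68, 0.00),
    (0.01, 0.66, -0.02), (0.00, -0.01, 0.83), (0.03, -0.03, 0.66),
]


def assign(loadings, threshold=0.30):
    """Map factor number -> variables whose largest |loading| falls on that factor.

    Variables whose largest |loading| is at or below the threshold stay unassigned.
    """
    groups = {}
    for var, row in enumerate(loadings, start=1):
        best = max(range(len(row)), key=lambda f: abs(row[f]))
        if abs(row[best]) > threshold:
            groups.setdefault(best + 1, []).append(var)
    return dict(sorted(groups.items()))


def suppressed_table(loadings, threshold=0.30):
    """Table rows with small loadings (|value| <= threshold) left blank."""
    lines = []
    for var, row in enumerate(loadings, start=1):
        cells = [f"{value:5.2f}" if abs(value) > threshold else " " * 5 for value in row]
        lines.append(f"VAR {var}  " + "  ".join(cells))
    return "\n".join(lines)


print(suppressed_table(VARIMAX))
print("varimax:", assign(VARIMAX))
print("promax: ", assign(PROMAX))
```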

Difference between EFA (exploratory factor analysis) and CFA (confirmatory factor analysis)

EFA: We have no specific idea of how each variable is related to each of the factors.
CFA: From our literature review or previous research, we have some guess (hypothesis) about
how the variables are related to each of the factors (e.g., Variables 1, 4, 5 should load on
Factor 3).

EFA: We determine the number of factors using the eigenvalues (EV ≥ 1 or the scree test).
CFA: We know the number of factors beforehand.

EFA: Each item loads on ALL of the factors. Even on the factors to which an item is not strongly
related, there are still small positive or negative loadings.
CFA: Items load only on their target factors (usually 1 target factor, rarely more than 1);
e.g., if Variable 1 loads on Factor 2, its loadings on Factors 1 and 3 are fixed at zero.
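To make the last row of the comparison concrete, here is a toy Python sketch that marks which loadings are estimated. The target assignment `TARGET` is hypothetical, chosen to match the varimax grouping above, and the function names are our own. In EFA, all 27 variable-by-factor loadings are estimated (as in the output tables), whereas in this CFA only the 9 target loadings are estimated and the other 18 are fixed at zero. The sketch only builds the pattern; actually fitting a CFA would need LISREL or similar software.

```python
# Hypothetical CFA hypothesis: variable number -> the single factor it may load on.
TARGET = {1: 2, 4: 2, 5: 2, 2: 3, 6: 3, 7: 3, 3: 1, 8: 1, 9: 1}
N_FACTORS = 3


def efa_pattern(n_vars, n_factors):
    """EFA: every variable has an estimated loading on every factor."""
    return [["free"] * n_factors for _ in range(n_vars)]


def cfa_pattern(target, n_factors):
    """CFA: only the target loading is estimated; all others are fixed at 0."""
    pattern = []
    for var in sorted(target):
        row = ["0"] * n_factors
        row[target[var] - 1] = "free"
        pattern.append(row)
    return pattern


def count_free(pattern):
    """Number of loadings that are estimated rather than fixed."""
    return sum(cell == "free" for row in pattern for cell in row)


print("EFA estimated loadings:", count_free(efa_pattern(9, N_FACTORS)))
print("CFA estimated loadings:", count_free(cfa_pattern(TARGET, N_FACTORS)))
for var, row in zip(sorted(TARGET), cfa_pattern(TARGET, N_FACTORS)):
    print(f"VAR {var}: {row}")
```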
