Lorenzo Seva 2006 FACTOR
Journal, 2006, 38 (1), 88-91
Exploratory factor analysis (EFA) is one of the most widely used statistical procedures in psychological research. It is a classic technique, but statistical research into EFA is still quite active, and various new developments and methods have been presented in recent years. The authors of the most popular statistical packages, however, do not seem very interested in incorporating these new advances. We present the program FACTOR, which was designed as a general, user-friendly program for computing EFA. It implements traditional procedures and indices and incorporates the benefits of some more recent developments. Two of the traditional procedures implemented are polychoric correlations and parallel analysis, the latter of which is considered to be one of the best methods for determining the number of factors or components to be retained. Good examples of the most recent developments implemented in our program are (1) minimum rank factor analysis, which is the only factor method that allows one to compute the proportion of variance explained by each factor, and (2) the simplimax rotation method, which has proved to be the most powerful rotation method available. Of these methods, only polychoric correlations are available in some commercial programs. A copy of the software, a demo, and a short manual can be obtained free of charge from the first author.
This research was supported by Grant SEC2001-3821-C05-C02 from the Spanish Ministry of Science and Technology with the collaboration of the European Fund for the Development of Regions. Correspondence concerning this article should be addressed to U. Lorenzo-Seva, Universitat Rovira i Virgili, Departament de Psicología, Ctra. De Valls, s/n, 43007 Tarragona, Spain (e-mail: urbano.lorenzo@urv.cat).
Exploratory factor analysis (EFA) has had its ups and downs and its share of criticism. However, it is still one of the most widely used statistical procedures in psychological research. Most criticisms of EFA do not challenge the fundamental usefulness of the technique per se, but rather the manner in which it has been applied in some empirical studies. Part of the blame for this may lie in the way in which the technique is implemented in the most commonly available statistical software (Fabrigar, Wegener, MacCallum, & Strahan, 1999).
EFA involves taking a series of steps that can be approached using various procedures. The main steps are (1) to estimate or extract the factors, (2) to decide how many factors to retain, (3) to rotate factors to an interpretable orientation, and (4) to obtain individual factor scores. Several traditional procedures for these steps are available in popular statistical packages. However, in general they are very few and, in some cases, far from ideal. This limitation was pointed out by MacCallum as early as 1983 and is still present in modern statistical packages. The makers of these programs have made them easier to use but have not improved the number of indices and procedures implemented.
A review of the specialized literature shows that statistical research into EFA is still quite active and that various new developments and methods have been presented in recent years. Given that the creators of the most popular statistical packages do not seem very interested in incorporating these new advances, most developers offer programs or macros that allow these recent and specific procedures to be computed. This solves the problem partially, but because different programs must be used for the different procedures, and because these programs are scattered around the literature, it is unlikely that the new methods will be popularized. It may be that if a particular EFA procedure is not offered in a current popular program, or at least in a general program, a researcher is unlikely to use it.
An example of a program that computes procedures and indices not included in any commercial statistical package is the comprehensive exploratory factor analysis (CEFA) pack developed by Browne, Cudeck, Tateneni, and Mels (2004). This allows one, for example, to compute McKeon’s (1968) Infomax rotation or Yates’s (1987) Geomin rotation. Browne (2001) compared these rotation criteria to other well-known criteria and showed their high efficiency. However, none of the commercial programs implement them.
In this article, we present the program FACTOR, which was designed as a general, user-friendly program for computing EFA. FACTOR uses traditional procedures and indices and incorporates the benefits of more recent developments not included in any commercial package or CEFA pack.
FACTOR is developed to fit the EFA model. Below, we describe in detail the procedures used.
Factor analysis can be computed from different kinds of dispersion matrices, including covariance matrices,
Pearson correlation matrices, and tetrachoric/polychoric correlation matrices. The suitability of the dispersion matrix to be factor analyzed is assessed by three tests: the determinant of the matrix, Bartlett’s test, and the Kaiser–Meyer–Olkin index.
FACTOR requires the user to specify the number of factors (or components) to be retained. However, the minimum average partial test (Velicer, 1976) and parallel analysis (Horn, 1965) can be computed to help determine whether or not the number of dimensions retained was suitable. Also, the eigenvalues of the dispersion matrices are printed. This means that Cattell’s scree test can be applied.
Both principal component analysis (PCA) and factor analysis can be computed. For factor analysis, the program uses two common procedures, unweighted least squares (ULS) and exploratory maximum likelihood (EML), plus a relatively new and unusual procedure: minimum rank factor analysis (MRFA; see, e.g., Shapiro & ten Berge, 2002; ten Berge & Kiers, 1991). The MRFA method decomposes observed variables into common parts and unique parts that satisfy the following classical requirements: The covariance matrices for common and unique parts must be positive semidefinite (no negative eigenvalues), and the latter covariance matrix must be diagonal. Subject to these requirements, the method minimizes the common variance that is ignored when only some factors are maintained. The proportion of common variance explained by each of the retained factors can then be interpreted. In ULS, the Heywood correction described in Mulaik (1972, p. 153) is included: When an update has a sum of squares larger than the observed variance of the variable, the corresponding row is updated by constrained regression using the procedure proposed by ten Berge and Nevels (1977).
When ULS or EML is computed, the program returns the following indices to assess the model’s goodness of fit: (1) the goodness-of-fit chi-square statistic; (2) the nonnormed fit index, also known as the Tucker–Lewis reliability coefficient; (3) the comparative fit index; (4) the goodness-of-fit index; (5) the adjusted goodness-of-fit index; (6) the root mean square error of approximation; and (7) the estimated noncentrality parameter. Descriptive statistics of the distribution of residuals are also computed.
After the extraction phase, the solution is rotated to achieve maximum simplicity and interpretability. The program includes a large number of rotation methods. Some of these methods, such as varimax (Kaiser, 1958), direct oblimin (Clarkson & Jennrich, 1988), and promax (Hendrickson & White, 1964), are already very popular in the research literature, whereas others are new. One of the most efficient rotation methods ever proposed is probably simplimax (Kiers, 1994). Unfortunately, this method is a little difficult to use: An interval of possible salient loadings must be specified, and a final one must be selected. The program suggests reasonable values (following the advice of Kiers, 1994) that usually lead to good results.
Alternatively, promin (Lorenzo-Seva, 1999) can be computed. This special case of simplimax usually leads to good results and does not require any parameter to be specified. Another recent method is weighted oblimin (Lorenzo-Seva, 2000), which is intended to detect the simplest observed variables before any rotation is performed, and these variables lead to the final position of the rotated axes. Finally, it must be pointed out that the following advice, offered by Browne (2001), is implemented in every rotation: (1) To avoid convergence to local maxima, each rotation is computed from a number of random starts and from a prerotation method (clever start), and the rotated solution that attains the optimal criterion value is taken as the solution for the analysis; and (2) Kaiser’s (1958) normalized weights and Cureton and Mulaik’s (1975) weights are available. Although these recommendations are implemented by default, the user can modify them.
After the rotation phase, FACTOR returns a series of indices: (1) the reliabilities of orthogonally rotated components (ten Berge & Hofstee, 1999); (2) the reliabilities of rotated factor scores by the formula 1/(1 + SE^2) given by Mislevy and Bock (1990), where SE is the standard error of factor scores; (3) the proportion of common variance explained by each rotated factor when MRFA is computed; (4) Bentler’s (1977) simplicity index; and (5) the loading simplicity index (Lorenzo-Seva, 2003). The last two assess the level of factor simplicity attained in the rotated solution.
Finally, factor scores are computed by the linear prediction method originally proposed by Anderson and Rubin (1956; see Gorsuch, 1983), which was generalized to oblique factor solutions by McDonald (1981) and further developed by ten Berge, Krijnen, Wansbeek, and Shapiro (1999).
Input and Output
The input consists of an ASCII format file containing respondents’ scores, the number of participants and variables, and the number of factors expected. Alternatively, an ASCII format file containing the covariance or correlation matrix can be used. When simplimax is computed, further input is required. This is the number of expected salient values in the rotated loading matrix (i.e., loadings whose values are clearly different from zero). Even if a number of salient loadings are expected in the rotated loading matrix, simplimax encourages the user to check an interval of salient loadings rather than just one value. The program suggests the values of these intervals. Since a final number must be selected, the program follows Kiers’s (1994) advice: it looks for the largest jump in the simplimax function and suggests this value.
The output consists of the indices explained above and is stored in the ASCII format file Output.txt. If the output information is too detailed, the user can choose a simplified output.
An Empirical Example
The accuracy of FACTOR is illustrated by an empirical example. We used two sets of items selected from the Eysenck Personality Questionnaire Revised (Eysenck, Eysenck, & Barrett, 1985). The first set consisted of 8 binary
90 LORENZO-SEVA AND FERRANDO
items taken from the Extraversion scale. The second set consisted of 6 binary items taken from the Neuroticism scale. These 14 items were mixed and presented in a single questionnaire. Respondents were 279 psychology and social sciences undergraduates at a university in Spain. The theory suggested that a two-factor model should be expected.
With this data set, we computed PCA, ULS, and EML using our program FACTOR, but also using SPSS, SAS, BMDP, and CEFA. To obtain comparable outputs from the different programs, we always computed the Pearson correlation matrix, retained two factors (or components), and rotated the data using direct oblimin (γ = 0). We also rotated the data using normalized promax (k = 4) to allow comparison with SAS output. Tucker’s (1951) congruence coefficients between factors (or components) obtained by FACTOR and the other programs are shown in Table 1.
The high congruence of the values shown in Table 1 indicates that the solutions obtained by FACTOR were identical to those obtained by the other programs, so the interpretation of the factor solutions was also identical irrespective of the program used.
Table 1
Congruence Coefficients Between Factors (or Components) Obtained by FACTOR and by Other Programs
               PCA              ULS              EML
Program    CI      CII      FI      FII      FI      FII
SPSS*      1.000   1.000    1.000   1.000    1.000   1.000
CEFA*      –       –        1.000   0.998    1.000   0.998
SAS**      1.000   1.000    1.000   1.000    1.000   1.000
BMDP*      1.000   1.000    –       –        1.000   0.998
Note: CI and FI, extraversion; CII and FII, neuroticism. *Direct oblimin (γ = 0) computed. **Promax (k = 4) computed.
Finally, we attempted an analysis that cannot be obtained using SPSS, SAS, BMDP, CEFA, or other commercial programs. We analyzed polychoric correlations, extracted factors using MRFA, and computed simplimax oblique rotation. Table 2 shows the eigenvalues of the reduced correlation matrix, the proportion of common variance explained by each eigenvalue, the total amount of common variance, and the proportion of common variance explained by the two factors retained. We can see that none of the eigenvalues is negative (as would be the case if any other extraction method were used), so the total amount of common variance (10.911) and the corresponding proportion of common variance (.632) explained by the two factors extracted are available.
Table 2
Eigenvalues and Proportions of Common Variance Obtained by MRFA
Factor   Eigenvalue   Proportion of Common Variance   Accumulated Proportion of Variance
 1       4.499        .412                            .412
 2       2.396        .220                            .632
 3       1.068        .098
 4       0.716        .066
 5       0.630        .058
 6       0.572        .052
 7       0.407        .037
 8       0.250        .023
 9       0.156        .014
10       0.123        .011
11       0.063        .006
12       0.031        .003
13       0.000        .000
14       0.000        .000
Note: Total common variance = 10.911.
Table 3 shows the rotated loading matrix, the communality of each item, the common variance explained by each rotated factor (and its corresponding proportion), and the reliabilities of the rotated factor scores. The level of factor simplicity attained after rotation was assessed by the values of Bentler’s (1977) simplicity index and the loading simplicity index, which were .998 and .707, respectively. Both values were highly significant, since they were in the 99th percentile in the distribution of possible values for this data set. The interfactor correlation value was .283.
Program Limitations
We have developed FACTOR in Visual C++ to be run in Microsoft Windows operating systems. We have tested the program on several computers with different chips (always Pentium) and versions of Windows (95, 98, NT, 2000, and XP) and found that it works correctly.
The number of variables and subjects in the data set is not limited. However, when large data sets are analyzed, the speed of the analysis depends on the amount of memory installed in the computer. One limitation of FACTOR could be the time needed to compute some of the methods used, such as MRFA or simplicity indices.
Although they can be considered the exception rather than the rule, convergence problems and improper solutions could appear in practical applications and have been reported in relation to ill-conditioned, nearly singular matrices. Commercially available programs usually have a set of automatic, internal procedures (modifications, corrections, or restrictions) for dealing with such problems (see, e.g., Clarke, 1970; Jöreskog, 1977), and an acceptable final solution is usually found. In our opinion, however, a convergence problem or an improper solution must be considered an indication that there is some problem with the data or that the hypothesized factor model is inappropriate for the population being sampled. Therefore, no such automatic controls have been programmed. Examples of situations in which FACTOR fails to converge are cases in which (1) the correlation matrix is not positive definite, and therefore MRFA cannot be computed; and (2) the determinant of the dispersion matrix is lower than .00001, and therefore no further analysis will be computed.
Finally, FACTOR does not implement every available extraction method, nor does it compute loading standard errors. Loading standard errors can, however, be computed using the CEFA pack.
Program Availability
A copy of the software, a demo, and a short manual can be obtained free of charge from the first author at urbano.lorenzo@urv.cat.
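As a worked check on Table 2, the following minimal Python sketch recomputes the proportions of common variance from the eigenvalues printed in the table. It uses only the reported values; the variable names are illustrative and are not part of FACTOR.

```python
# Eigenvalues of the reduced correlation matrix as printed in Table 2 (MRFA).
eigenvalues = [4.499, 2.396, 1.068, 0.716, 0.630, 0.572, 0.407,
               0.250, 0.156, 0.123, 0.063, 0.031, 0.000, 0.000]

# With MRFA no eigenvalue is negative, so their sum is the total common variance.
total_common_variance = sum(eigenvalues)

# Proportion of common variance carried by each eigenvalue.
proportions = [value / total_common_variance for value in eigenvalues]

# Accumulated proportion for the two retained factors.
retained = 2
accumulated = sum(proportions[:retained])

print(f"total common variance: {total_common_variance:.3f}")
for index, (value, share) in enumerate(zip(eigenvalues, proportions), start=1):
    print(f"{index:2d}  {value:6.3f}  {share:.3f}")
print(f"accumulated proportion, {retained} factors: {accumulated:.3f}")
```

Rounded to three decimals, the total is 10.911, the first two proportions are .412 and .220, and the accumulated proportion is .632, in agreement with the table.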
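The congruence coefficients in Table 1 compare columns of loadings obtained by different programs. A minimal Python sketch of Tucker's coefficient is given below: the sum of cross-products of two loading columns, divided by the square root of the product of their sums of squares. The function name and the loading values are hypothetical, chosen for illustration only; they are not taken from FACTOR or from the data analyzed here.

```python
import math

def tucker_congruence(x, y):
    """Tucker's congruence coefficient between two loading columns."""
    if len(x) != len(y):
        raise ValueError("loading columns must have the same length")
    cross = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if norm == 0.0:
        raise ValueError("congruence is undefined for an all-zero column")
    return cross / norm

# Hypothetical loadings of four items on one factor, from two programs.
program_a = [0.71, 0.65, 0.58, 0.10]
program_b = [0.70, 0.66, 0.57, 0.12]
print(round(tucker_congruence(program_a, program_b), 3))
```

A coefficient of 1 means that the two columns are proportional, so the pattern of loadings agrees exactly even if the overall scale differs.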