

Comprehensive Psychology
2015, Volume 4, Article 10
ISSN 2165-2228
DOI: 10.2466/03.CP.4.10
© Peter Prudon 2015. Attribution-NonCommercial-NoDerivs (CC-BY-NC-ND)

Confirmatory factor analysis as a tool in research using questionnaires: a critique¹, ²

Peter Prudon
Independent Researcher in Psychology, Amsterdam, The Netherlands

Received March 2, 2015; accepted June 23, 2015; published July 15, 2015

Abstract
Predicting the factor structure of a test and comparing this with the factor structure, empirically derived from the item scores, is a powerful test of the content validity of the test items, the theory justifying the prediction, and the test's construct validity. For the last two decades, the preferred method for such testing has often been confirmatory factor analysis (CFA). CFA expresses the degree of discrepancy between predicted and empirical factor structure in χ² and indices of "goodness of fit" (GOF), while primary factor loadings and modification indices provide some feedback on item level. However, the latter feedback is very limited, while χ² and the GOF indices appear to be problematic. This will be demonstrated by a selective review of the literature on CFA.

Citation: Prudon, P. (2015) Confirmatory factor analysis as a tool in research using questionnaires: a critique. Comprehensive Psychology, 4, 10.

¹ Address correspondence to Peter Prudon, Sajetplein 31, 1091DB Amsterdam, The Netherlands or e-mail (p.[email protected]).
² The dissertation by Stuive (2007) stimulated and molded my interest in this topic. I received helpful comments on my thoughts about this issue from Lesley Hayduk on SEMNET and Stanley Mulaik on and beyond SEMNET, June–July 2011. Herbert Marsh, Feinian Chen, and Ilse Stuive kindly granted me permission to reproduce and summarize some of their quantitative results. Thanks. I am grateful for the professional comments of peer reviewers, editors, and colleagues on forerunners of this paper. Among the latter were Paul Barrett, Steven Blinkhorn, and William Robert Nugent, who made stimulating comments, while Robert Brooks and Patrick Malone took the trouble of making useful critical comments.

For construct validation of psychopathology and personality questionnaires, researchers often make use of confirmatory factor analysis (CFA), especially when the tests are supposed to be multidimensional. For this, a covariance matrix is calculated over the scores of a number of subjects, and CFA is then applied to test whether a presumed factor structure or pattern is not contradicted by this matrix. CFA is executed by means of structural equation modeling (SEM), a very sophisticated statistical procedure for testing complex theoretical models on data. CFA is only used for the measurement part of the models. Since a computer program became available for SEM (LISREL; Jöreskog & Sörbom, 1974), this method has gained much in popularity. LISREL has been updated several times (Jöreskog & Sörbom, 1988), and there are several similar programs available now, e.g., Amos (Arbuckle, 2004; now included in SPSS), EQS (Bentler, 2000-2008), and Mplus (Muthén & Muthén, 1998-2010). All these can run on current personal computers, so the threshold to using CFA has become very low.

However, problems with the method have also increasingly been reported, especially since the turn of the century (e.g., Breivik & Olsson, 2001; Browne, MacCallum, Kim, Andersen, & Glaser, 2002; Tomarken & Waller, 2003). Current users of the method, working in the applied social sciences, may be less aware of these limitations of CFA than statisticians are and be over-optimistic about the reliability of the method when striving to validate questionnaires. The current paper is primarily meant for them. Researchers well initiated in statistics will probably find little new in it, but to see the problems recapitulated in a systematic way may motivate them to take note of the alternative approach to goodness-of-fit which I briefly sketch in the last section. Also, the two cautious explanations of puzzling findings, suggested in later sections, may invite them to dispute.

Test scales are devised to measure certain abilities or skills, whereas questionnaire scales are devised to measure, for instance, certain personality traits, diagnostic categories, or psychological conditions. So there is much difference between test items, which provide an objective correct–incorrect score, and questionnaire items, which provide a subjective rating of oneself or another person, often on a quasi-interval scale of 3 to 10 points.

Nevertheless, questionnaires are often referred to as tests in the literature. For reasons of convenience, the term "test" will be used interchangeably with "questionnaires" in this paper as well, yet it should never be interpreted as meaning a measure of aptitudes and abilities. This paper is about questionnaire validation only.

Item Clustering as Feedback on a Test's Theoretical Basis and Validity

Because they are meant to tap an aspect of a certain construct, the items within a questionnaire are supposed to have at least modest inter-correlations and should cluster. If a questionnaire is supposed to measure several distinct qualities, then the items should show a clustering corresponding to these various subscales.

It follows that an empirically found item clustering that corresponds to the ideas that guided the construction of the questionnaire is strong support for these ideas, as well as for the content validity of the items and the construct validity of the questionnaire. If the predicted clustering differs vastly from the empirically found clustering, the theory behind it could be considered faulty, and/or the test scale a mistaken operationalization, provided a good sample has been drawn. However, if the difference is moderate, the discrepancies could be used to refine the theory and/or for further improvement of the measuring instrument.

The discrepancy could involve a low or opposite correlation of an item with its predicted scale. This may imply that the item is either poorly formulated or a poor operationalization of the phenomenon it represents, or that there is a flaw in the theory. The discrepancy could also involve a high correlation between an item and a scale to which it had not been assigned. In that case, the theory probably needs modification. Other flaws are: a correlation between two scales is much higher than expected, even to the point that the two scales could be considered as forming a single one-dimensional scale; or it is much lower than expected. In both cases, the theory will need to be reformulated to some extent.

Methods For Testing a Predicted Item Clustering

How does one test and evaluate the empirical clustering in relation to an a priori cluster prediction? A very direct and controlled way would be to examine the correlations between the items and the predicted scales (the latter are mostly operationalized as the unweighted sum of the item scores). These correlations should be in line with the prediction, and to the extent that they are not, they offer a basis for revising the scales in the direction of greater homogeneity and independence. This method is known as item analysis (along the lines of classical test theory, not of item response theory!); many test developers make use of it. If the clusters are indeed revised, the revision must be performed iteratively in a very gradual manner (Stouthard, 2006), because after each modification the entire picture of cluster-item correlations will change. Test devisers do not go to extremes with these modifications because it is not wise to mold one's well thought-through test to every whim of imperfect empirical data.

Although this type of item analysis appears to be tailored for testing (and revising) cluster predictions, statisticians are uncomfortable about relying so totally on zero-order correlations between raw item scores and on equating cluster scores with the unweighted sum of the test scale items. Common factor analysis seems a better option because in this approach the variance per item is divided into a common part (common with the factor on which the item loads) and a unique part (item-specific variance plus error). Principal axis factor analysis is the most applied form of common factor analysis. It has partly replaced principal component analysis, which is based on the undivided variance of variables. In factor analysis all variables contribute—with a greater or smaller weight—to each factor.

However, these are examples of exploratory factor analysis (EFA). EFA is applied to data without an a priori model. It traces dimensions within a covariance or correlation matrix to the point at which enough variance has been explained. To further optimize these dimensions, rotation is performed: orthogonal, when independent scales are expected, and oblique, when some of the dimensions are expected to correlate moderately or highly (up to a ceiling). The resulting factors may be compared to the presumed test scales: are the items of Scale A the ones loading highly on Factor X, those of Scale B on Factor Y, etc.? And are they loading much lower on a different factor? Is the correlation between the factors in line with the expected scale correlations?

The disadvantage of this often applied approach may be that the empirical factor structure/pattern³ is too much affected by incidentally extreme item inter-correlations or by over- or under-representation of certain items, yielding factors that differ in number and content from the test scales. When a result is due to unrepresentative properties of the correlation/covariance matrix, such a result need not indicate a faulty prediction. However, adjusting the predicted structure/pattern to the data without unnecessarily giving up the prediction, on the one hand, and not violating the data, on the other, seems a less vulnerable procedure. CFA offers a way to achieve this, based on common factor analysis with its division in common and unique variance of the variables. How does this work?

³ Structure refers to the matrix of factor–item correlations (factor loadings) after both an orthogonal and oblique rotation; pattern refers to the matrix of pattern coefficients (standardized beta weights), which is part of the output after an oblique rotation only.


Testing Predicted Factor Structure by Means of CFA

The prediction of the factor structure/pattern of a test involves the number of factors and the specification of the test items that define each factor (the so-called indicators), i.e., those which are expected to have high to moderately high loadings (or beta coefficients) on the factor. The investigator will most likely also have expectations about the correlation between the factors and perhaps about some cross-loadings. Unique variance of the observed variables is also part of the model. These covariances, as well as the variances of the variables involved, are the parameters of the measurement model.

To test this predicted factor structure/pattern with CFA, the following procedure is applied (a brief software sketch follows below):

1. The measurement model is translated "back" into a crude covariance matrix over all measured variables.
2. This covariance matrix is then adjusted to the empirically found sample covariance matrix in a number of iterations, mostly by means of maximum likelihood estimation (MLE). This is done in such a way that the difference between the two is minimized without violating the data too much. The final result is the implied covariance matrix.
3. For this process to come to an end and produce a result that helps to evaluate the model, the prediction must have been detailed enough, or, in SEM jargon, the model has to be over-identified. A model is over-identified when the number of known elements (non-redundant variances plus covariances of the sample matrix⁴) exceeds the number of unknown parameters that have to be estimated by the iterative process.
4. Parameters that have to be estimated are: primary factor loadings of the indicators (mostly except one), factor variances and covariances (when not explicitly predicted to be very high or low), and error variances. The error covariances that are expected to correlate non-negligibly (an indicator inter-correlation beyond the result due to the common factor), if any, have to be added to this number; the others will not be estimated.
5. Parameters that are fixed at a certain value will not be estimated. Usually, one indicator per factor is fixed at 1 (non-standardized value) for the sake of adjusting the scaling of the factor to that of the indicators. Secondary factor loadings (non-indicators) are fixed at 0, except when a cross-loading is probable; then it has to be estimated. If the model prescribes an orthogonal factor structure, the factor covariances are fixed at 0; if the model prescribes a clearly oblique structure, they are fixed at 1; otherwise, they are to be estimated. (Fixing at 1 does not imply a correlation of 1.) Error covariances that are thought to be negligible have to be fixed at 0.
6. The fixed parameters strongly affect the adjustment procedure mentioned in point 2 by constraining it. The estimation of the non-fixed parameters is free, albeit steered by the model.
7. This procedure of iteratively adjusting the model matrix to the sample matrix replaces the orthogonal or oblique rotation of factors (maximizing high primary factor loadings and minimizing low cross-loadings), which is typical of EFA.
8. The resulting implied covariance matrix is compared with the sample covariance matrix. The difference is the residual covariance matrix. It may be standardized to facilitate interpretation.
9. This provides a basis for calculating the degree to which the predicted factor structure/pattern fits the data. If the residual covariances are, on average, small enough, then the model fits the data well.
10. These differences are expressed in χ², with degrees of freedom (df) equaling the number of known parameters minus the number of unknown, non-fixed parameters, i.e., parameters that will be estimated. See Points 4 and 5 for more details.
11. This χ² should be small enough—in relation to df—to be merely the result of chance deviations in the sample with respect to the population (when larger, it is ascribed to prediction errors).

The main source for the account above is Brown and Moore (2012; also see Brown, 2015).

⁴ The lower half of the matrix, variances included. It equals p(p + 1)/2, where p is the number of variables.
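To make this procedure concrete, the sketch below shows how such a measurement model might be specified and tested in software. It is a minimal illustration, not code from the article: it assumes the third-party Python package semopy (any SEM program such as LISREL, Amos, EQS, or Mplus would serve equally well), a hypothetical nine-item questionnaire with three predicted scales, and a pandas data frame of item scores; output columns may differ between package versions.

```python
import pandas as pd
import semopy  # assumed third-party SEM package (pip install semopy)

# Predicted measurement model in lavaan-style syntax: three factors with three
# indicators each. Secondary loadings are implicitly fixed at 0 and the factor
# covariances are left free, as described in points 4 and 5 above.
MODEL = """
F1 =~ x1 + x2 + x3
F2 =~ x4 + x5 + x6
F3 =~ x7 + x8 + x9
"""

def test_predicted_structure(item_scores: pd.DataFrame) -> pd.DataFrame:
    """Fit the predicted structure to the sample covariance matrix (points 2 and 6)
    and return chi-square, df, and common goodness-of-fit indices (points 10 and 11)."""
    model = semopy.Model(MODEL)
    model.fit(item_scores)             # iterative estimation (maximum likelihood by default)
    return semopy.calc_stats(model)    # DoF, chi2, RMSEA, CFI, TLI, etc.

# Usage, assuming a data frame `scores` with columns x1..x9:
# print(test_predicted_structure(scores).T)
```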
Complications With χ²

The residuals, and thereby χ², will often reflect an imperfection of the model, especially in the primary and intermediate phases of a research project. When this is the case, this feedback should be used for a revision of the theory and the SEM model. However, in the final stage of the research project, when the model has become well thought-out, the residuals will probably, on average, still deviate from zero. Now, the difference may merely or mainly be due to sample imperfections and fluctuations with respect to the population. Cudeck and Henly (1991) called the model imperfection error the approximation discrepancy, and the sample imperfection error the estimation discrepancy. When applying the χ² test, the assumption is that, on the population level, the approximation discrepancy is zero (the prediction is perfect), so the empirically found difference on the sample level has to be only due to the estimation discrepancy


(sample fluctuations). If the latter is small, then the null hypothesis, "the model does not hold for the population," can be rejected.

How suited is the χ² test for demonstrating statistical significance of a predicted factor structure/pattern by means of CFA? Conventionally, χ² is used to express a difference between an empirically found distribution and the distribution to be expected based on a null hypothesis. This difference should be robust enough to hold—within the boundaries of the confidence interval—for the population as well. However, in CFA the difference should be the opposite of robust; i.e., the difference should be small enough to accept both the predicted factor structure/pattern and its generalization to the population. Therefore, paradoxically, χ² should be statistically non-significant to indicate a statistically significant fit.

Demanding this may be asking for trouble. Indeed, with large samples (and SEM demands large samples), even very small differences may be deemed significant by current χ² tables, suggesting a poor fit, in spite of the greater representativeness of a large sample. Many experts in multivariate analysis have thought this to be a problem. An exception to them is Hayduk. He argues that we should profit from χ²'s sensitivity to model error and take the rejection of a model as an invitation for further investigation and improvement of the model (see Hayduk, Cummings, Boadu, Pazderka-Robinson, & Boulianne, 2007). On SEMNET, 3 June 2005, Hayduk notes that "χ² locates more problems when N is larger, so that some people blame chi-square (the messenger) rather than the culprit (probably the model)."
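A small numerical illustration of this sample size sensitivity may help; it is my own sketch, not an example from the article, and it uses the standard result that the maximum likelihood test statistic equals (N − 1) times the minimized discrepancy F between the sample and implied matrices. Holding that discrepancy fixed (here at a value corresponding to a population RMSEA of roughly .05, i.e., "approximate fit") and varying only N shows the same degree of misfit moving from acceptable to highly significant.

```python
from scipy.stats import chi2

F_MIN = 0.25   # hypothetical minimized ML discrepancy, held constant across samples
DF = 87        # degrees of freedom (the value of the "simple model" discussed later)

for n in (150, 400, 1000, 5000):
    t = (n - 1) * F_MIN            # ML chi-square statistic
    p = chi2.sf(t, DF)             # right-tail p value
    print(f"N = {n:5d}: chi2 = {t:7.1f}, p = {p:.4f}")

# With these made-up numbers the model is not rejected at N = 150 or N = 400,
# but it is firmly rejected at N = 1000 and N = 5000, although the discrepancy
# per observation is identical in all four cases.
```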
The other side of the coin is that models can never be perfect, as MacCallum (2003) contended, because they are simplifications of reality (see also Rasch, 1980, p. 92). Therefore, models unavoidably contain minor error. In line with this argument, there are always a few factors in exploratory factor analysis that still contribute to the total explained variance but so little that it makes no sense to take them into account. Thus, these factors should be ignored in CFA as well. Nevertheless, if the measurements are reliable and the sample is very large, such minor model error may yield a significant χ² value, urging the rejection of a model that cannot further be improved.

What about expressing the approximation discrepancy? Is χ² suited for that job? No, its absolute value is not interpretable: it must always be evaluated with respect to df and N. χ² can only be used to determine the statistical significance of an empirically found value, the estimation discrepancy. Therefore, some statisticians also report χ²/df, because both χ² and df increase as a function of the number of variables. This quotient is somewhat easier to interpret, but there is no consensus among SEM experts about which value represents what degree of fit.

Indices of Goodness of Fit

To cope with these complications and this problem, SEM experts have tried to devise other indices of "goodness of fit" or "approximate fit." These should express the degree of approximation plus estimation discrepancy, and provide an additional basis for the acceptance or rejection of a model. All but one of these goodness-of-fit indices are based on χ² and df, and some also include N in the formula. The remaining one (SRMR) is based directly on the residuals. Several suggestions have been made regarding their critical cutoff values (determining acceptance or rejection of a model), among which those of Hu and Bentler (1998, 1999) have been very influential.

Over the years, these indices have been investigated in numerous studies using empirical data and, more often, simulated data. Time and again they have been shown to be unsatisfactory in some respect; thus, adapted and new ones have been devised. Now, many of them are available. Only four of them will be mentioned below because they are often reported in CFA studies and they suffice to make my point. The formulas are derived from Kenny (2012), who, it should be noted, briefly makes several critical remarks in his discussion of the indices.

Standardized Root Mean Square Residual (SRMR; Jöreskog & Sörbom, 1988)

The most direct way of measuring discrepancy between model and data is averaging the residuals of the residual correlation matrix. This is what is done in SRMR: the residuals (S_ij − I_ij) are squared and then summed. This sum is divided by the number of residuals, q. (The residuals include the diagonal with communalities, so q = p(p + 1)/2, where p is the number of variables.) Then, the square root of this mean is drawn. (In the formula below, S denotes the sample correlation matrix, and I stands for the implied correlation matrix.)

SRMR = \sqrt{ \frac{ \sum_{i \le j} (S_{ij} - I_{ij})^2 }{ q } }.

A value of 0 indicates perfect fit. Hu and Bentler (1998, 1999) suggest a cutoff value of ≤ .08 for a good fit. Notice that χ² is not used to calculate SRMR.
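A direct transcription of this definition into code may be useful for readers who want to compute SRMR by hand from a residual matrix. The sketch is mine, not the article's, and assumes that both inputs are correlation matrices of the same order.

```python
import numpy as np

def srmr(sample_corr: np.ndarray, implied_corr: np.ndarray) -> float:
    """Standardized root mean square residual: square root of the mean squared
    residual over the lower triangle of the correlation matrix, diagonal
    included, so that q = p(p + 1)/2 residuals enter the mean."""
    p = sample_corr.shape[0]
    rows, cols = np.tril_indices(p)                 # lower triangle incl. diagonal
    residuals = sample_corr[rows, cols] - implied_corr[rows, cols]
    q = p * (p + 1) / 2
    return float(np.sqrt(np.sum(residuals ** 2) / q))
```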
Root Mean Square Error of Approximation (RMSEA; Steiger, 1990)

Root mean square error of approximation (RMSEA) has a much more indirect relation with the residuals because it is based on χ², df, and N. Its formula is

RMSEA = \sqrt{ \frac{ \chi^2/df - 1 }{ N - 1 } },


which could also be expressed as

RMSEA = \sqrt{ \frac{ \chi^2 - df }{ df (N - 1) } }.

By dividing by df, RMSEA penalizes free parameters. It also rewards a large sample size because N is in the denominator. A value of 0 indicates perfect fit. Hu and Bentler (1998, 1999) suggested ≤ .06 as a cutoff value for a good fit.

Tucker-Lewis Index (TLI; Tucker & Lewis, 1973)

The Tucker-Lewis Index, also known as non-normed fit index (NNFI), belongs to the class of comparative fit indices, which are all based on a comparison of the χ² of the implied matrix with that of a null model (the most typical being that all observed variables are uncorrelated). Those indices that do not belong to this class, such as RMSEA and SRMR, are called absolute fit indices. Comparative fit indices have an even more indirect relation with the residuals than RMSEA. The formula of TLI is

TLI = \frac{ \chi^2_{null}/df_{null} - \chi^2_{implied}/df_{implied} }{ \chi^2_{null}/df_{null} - 1 }.

Dividing by df penalizes free parameters to some degree. A value of 1 indicates perfect fit. TLI is called non-normed because it may assume values < 0 and > 1. Hu and Bentler (1998, 1999) proposed ≥ .95 as a cutoff value for a good fit. It is similar to the next index:

Comparative Fit Index (CFI; Bentler, 1990)

Here, subtracting df from χ² provides some penalty for free parameters. The formula for CFI is

CFI = 1 - \frac{ \chi^2_{implied} - df_{implied} }{ \chi^2_{null} - df_{null} }.

Values > 1 are truncated to 1, and values < 0 are raised to 0. Without this "normalization," this fit index is the one devised by McDonald and Marsh (1990), the Relative Non-centrality Index (RNI). Hu and Bentler (1998, 1999) suggested CFI ≥ .95 as a cutoff value for a good fit. Marsh, Hau, and Grayson (2005, p. 295) warned that CFI has a slight downward bias, due to the truncation of values greater than 1.0.

Kenny (2012) warned that CFI and TLI are artificially increased (suggesting better fit) when the correlations between the variables are generally high. The reason is that the customary null model (all variables are uncorrelated) has a large discrepancy with the empirical correlation matrix in the case of high correlations between the variables within the clusters, which will give rise to a much larger χ² than the implied correlation matrix will. This affects the fractions in CFI and TLI, moving the quotient in the direction of 1. Rigdon (1996) was the first to raise this argument; later he advised using a different null model in which all variables have an equal correlation above zero (Rigdon, 1998).
Determination of Cutoff Values by Simulation Studies

How do multivariate experts such as Hu and Bentler (1998, 1999) determine what values of goodness-of-fit indices represent the boundary between the acceptance and rejection of a model? They do so mainly on the basis of simulation studies. In such studies, the investigator generates data in agreement with a predefined factor structure/pattern, formulates correct and incorrect factor models, draws a great many samples of different sizes, and observes what the values of a number of fit indices of interest will do.

One of the conveniences of simulation studies is that the correct and incorrect models are known beforehand, which provides a basis, independently of the fit index values, for determining whether a predicted model should be rejected or accepted. To express the suitability of the selected cutoff value of a goodness-of-fit index, the percentage of rejected samples is reproduced, i.e., the samples for which the fit index value is on the "rejection side" of the cutoff value. The rejection rate should be very small for correct models and very large for incorrect models. What percentage is to be demanded as a basis for recommending a certain cutoff value is often not stated explicitly by many researchers. It seems reasonable to demand a rate of ≤ 10% or even ≤ 5% for correct models and at least ≥ 90% or even ≥ 95% for incorrect models, considering that mere guessing would lead to a rate of 50% and p ≤ .05 is conventionally applied in significance testing. A limitation of most simulation studies is that there are very few indicators (e.g., 3 to 6) per factor; usually, the number of items per scale of typical tests or questionnaires is larger, especially in the initial stage of their development.
1999) suggested CFI ≥ .95 as a cutoff value for a good
fit. Marsh, Hau, and Grayson (2005, p. 295) warned that Hu and Bentler had set up a population that
CFI has a slight downward bias, due to the truncation corresponded to the following models: three
of values greater than 1.0. correlated factors with five indicators each, with
Kenny (2012) warned that CFI and TLI are artificial- either (1) no cross-loadings, the simple model,
ly increased (suggesting better fit) when the correlations or (2) three cross-loadings, the complex mod-
between the variables are generally high. The reason is el. So the simple model had 33 parameters to
that the customary null model (all variables are uncor- estimate (3 × 4 factor loadings + 3 × 5 indicator
related) has a large discrepancy with the empirical cor- variances + 3 factor covariances + 3 factor vari-
relation matrix in the case of high correlations between ances), whereas the complex model had 36 pa-
the variables within the clusters, which will give rise rameters to estimate (these 33, plus 3 additional
to a much larger χ2 than the implied correlation matrix factor loadings). With 15 variables, there were
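This parameter bookkeeping is easy to verify; the few lines below (my own check, not part of the article) recompute the degrees of freedom for the correct and misspecified versions of both models.

```python
def df_cfa(p, n_free_parameters):
    """df = known elements of the covariance matrix minus free parameters."""
    return p * (p + 1) // 2 - n_free_parameters

P = 15                                  # 3 factors x 5 indicators
SIMPLE = 3 * 4 + 3 * 5 + 3 + 3          # loadings + indicator (error) variances
                                        #   + factor covariances + factor variances = 33
COMPLEX = SIMPLE + 3                    # three extra cross-loadings = 36

print(df_cfa(P, SIMPLE), df_cfa(P, COMPLEX))   # 87 84  (true models)
# Fixing one or two factor correlations at zero removes one or two free
# parameters, so the false simple models have df = 88 and 89; overlooking one
# or two cross-loadings gives the false complex models df = 85 and 86.
```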


The population of Marsh, et al. (2004) involved 500,000 cases. Samples of 150, 250, 500, 1,000, and 5,000 cases were drawn. (The number of samples was not mentioned.) MLE was applied. The dependent variable was the rejection rate per goodness-of-fit index with the cutoff value advised by Hu and Bentler (1999). Unlike Hu and Bentler (1999), Marsh, et al. (2004) calculated the population values of χ² and the indices, which allows for a direct comparison of sample and population.

To provide the less initiated reader an idea of the results of such studies, a small portion of Tables 1a and 1b of Marsh, et al. (2004) is reproduced⁵ in Table 1.

TABLE 1
Cutoff Values and Related Percentage Rejection for RMSEA and SRMR (Marsh, et al., 2004)

Simple model (factor correlations wrong):

Index (cutoff)    Misspecification   Popul. value   M (N = 150)   M (N = 1000)   Reject % (N = 150)   Reject % (N = 1000)
RMSEA (≤ .06)     None               0.001          0.01          0.001          1                    0
                  Smallest           0.05           0.04          0.05           11                   0
                  Severest           0.05           0.05          0.05           19                   0
SRMR (≤ .08)      None               0.001          0.05          0.02           0                    0
                  Smallest           0.14           0.14          0.14           100                  100
                  Severest           0.17           0.17          0.17           100                  100
RNI (≥ .95)*      None               1.00           0.99          1.00           1                    0
                  Smallest           0.97           0.97          0.97           18                   0
                  Severest           0.96           0.96          0.96           36                   2
χ²/df (α = .05)   None               –              1.05          1.03           14.5                 11.5
                  Smallest           –              1.40          3.39           73.5                 100
                  Severest           –              1.52          4.21           92.0                 100

Complex model (factor loadings wrong):

Index (cutoff)    Misspecification   Popul. value   M (N = 150)   M (N = 1000)   Reject % (N = 150)   Reject % (N = 1000)
RMSEA (≤ .06)     None               0.00           0.01          0.001          0                    0
                  Smallest           0.07           0.07          0.07           67                   99
                  Severest           0.09           0.09          0.09           100                  100
SRMR (≤ .08)      None               0.001          0.04          0.02           0                    0
                  Smallest           0.06           0.07          0.06           19                   0
                  Severest           0.07           0.08          0.07           62                   0
RNI (≥ .95)*      None               1.00           0.99          1.00           0                    0
                  Smallest           0.95           0.95          0.95           53                   39
                  Severest           0.91           0.91          0.91           98                   100
χ²/df (α = .05)   None               –              1.05          1.03           11.0                 8.0
                  Smallest           –              1.76          5.87           98.5                 100
                  Severest           –              2.34          9.90           100                  100

Note. Popul. value: population value (N = 500,000); Reject %: percentage of rejected models for the given sample size. A dash indicates that no population value was reported.
*RNI replaces CFI here; see the section "Determination of Cutoff Values by Simulation Studies."

⁵ With the kind permission of Herbert Marsh (May, 2013).

(a) Table 1 shows that the mean sample values of the first two fit indices are closer to the population values for N = 1,000 than those for N = 150. This result is observed because smaller samples have greater sample fluctuations.
(b) It is further demonstrated that the population values (and mean sample values) of RMSEA for the simple model are below the advised cutoff values of Hu and Bentler (1999). In line with this observation, the rejection rate for N = 1,000 is 0%, whereas it should be ≥ 90%.
(c) For the complex model, the population value (and mean sample values if N = 1,000) of SRMR is below the advised cutoff value; thus, the rejection rate is 0% whereas it should be ≥ 90%.
(d) For RNI (comparable with CFI), the population values are on the acceptance side of the cutoff values in the two misspecified simple models and in the least misspecified complex model, and the rejection rates are accordingly.
(e) Table 1 further shows that, ironically, χ² performs better with respect to both models simultaneously than RMSEA and RNI, although it leads to more than 10% rejection of the correct models in three out of four cases.

This replication by Marsh, et al. (2004), therefore, did not confirm the advised cutoff values of Hu and Bentler (1999). Several of the misspecified models scored on the acceptance side of the cutoff values of the goodness-of-fit indices. Thus, what should be a cutoff value depends, first, on the type of misspecification one is interested in, and, secondly, on the degree of misspecification one is willing to tolerate for each type (Marsh, et al., 2004).


It can further be concluded from Table 1 that smaller samples may be problematic when assessing the correctness of a model: for N = 150, the mean RMSEA values for incorrect models are somewhat lower than the population values (decreasing the rejection rates), whereas the mean SRMR values for both correct and incorrect models are higher than the population values, increasing rejection rates for incorrect models (albeit insufficiently in this replication study). χ², too, leads to an insufficient rejection rate for incorrect simple models in the case of the smallest misspecification.

SRMR seems to do very well in the case of the simple model (one or two factor correlations are zero instead of moderately positive), but Fan and Sivo (2005) showed that this was due to the fact that these zero factor correlations produce many zero variable correlations in the implied matrix, leading to large residuals and, consequently, a high SRMR because of its very direct relation to the residuals, more so than the other fit indices. If the factor correlations were misspecified to be 1 instead of merely high (meaning that the factors had to be fused), then SRMR no longer showed a special sensitivity to misspecified factor loadings.

Finally, note that only two kinds of misspecifications were investigated. Incorrectly assigned indicators and correlated error terms, for instance, were not modeled. This limits the degree to which the cutoff values may be generalized even more. Note further that even when the fit index values are beyond their cutoff criterion, they still hold some utility as measures of the approximation discrepancy.

Why Should One Draw Samples at All?

As indicated, Marsh, et al. (2004) calculated the population values of the fit indices, and when the values appeared to be rather far removed from an advised cutoff value, the rejection rates were close to 0% or 100%, depending on which side of the cutoff value each population value was. Chen, Curran, Bollen, Kirby, and Paxton (2008) reported similar experiences in their simulation study, which was designed to test the cutoff values for RMSEA (see the next section for more details regarding that study). In Table 2, the population values for the three correct and misspecified models are reproduced. Six out of nine population values for the misspecified models were below a cutoff value of 0.06 or even 0.05.

TABLE 2
Population Values of RMSEA in the Study of Chen, et al. (2008, Derived From Their Table 1)

Misspecification   Model 1   Model 2   Model 3
None               0.00      0.00      0.00
Smallest           0.03      0.02      0.05
Moderate           0.04      0.03      0.08
Severest           0.06      0.04      0.10

Note. Misspecification: overlooking cross-loadings and/or correlations with exogenous variables.

The information provided by the population values places one in a position to determine whether a selected cutoff value is too liberal, even before having drawn any sample. Thus, what is the use of drawing all those samples? Miles and Shevlin (2007) and Saris, Satorra, and van der Veld (2009) refrained from drawing any real sample and contented themselves with an imaginary sample of N = 500 and N = 400, respectively, perfectly "mirroring" the population values, saving themselves much calculation time [see Sections "Saris, Satorra, and van der Veld (2009)" and "Large Unique Variance Promotes an Illusory Good Fit" in the present article]. Sample drawing is only useful when the cutoff value is rather close to the population value because then the rejection rates, especially in the case of the smaller samples, are not obvious.

Reconciling Avoidance of Both Type I Error and Type II Error

In addition, cutoff values should be such that two types of error are avoided at the same time. They should be strict enough to avoid accepting a model when in fact it is incorrect, which represents a Type I error (a false positive). Alternatively, the cutoff value should be lenient enough to avoid rejecting a model that is correct, which represents a Type II error (a false negative). Can these two opposing interests always be reconciled, for any sample size, in CFA? Table 1 shows that the answer depends on the combination of model, sample size, and severity of misspecification for RMSEA, RNI, and SRMR. χ² does not appear to be suited to reach 95% acceptance of correct models. In this case, the goodness-of-fit indices do better.

The study by Chen, et al. (2008), referred to in the previous section, offered an excellent opportunity for determining the degree to which both errors can be avoided at the same time, at any rate in the case of RMSEA. Chen, et al. (2008) investigated three models: (1) in Model 1, three factors with three indicators each plus one cross-loading indicator each; (2) in Model 2, three factors with five indicators each plus one cross-loading indicator each; and (3) in Model 3, like Model 1 but with four inter-correlating exogenous variables correlated with one factor and of which two variables also correlate with the other two factors. In Models 1 and 2, the misspecification involved omitting one, two, or three of the cross-loading indicators from the prediction. In Model 3, the smallest misspecification was omitting all three cross-loadings from the prediction, the moderate misspecification was omitting the four correlations of exogenous variables with the factors, and the largest misspecification combined the small and moderate misspecifications. The sample sizes were 50, 75, 100, 200, 400, 800, and 1000. The investigators generated 800 samples for each of the 84 experimental conditions.


TABLE 3
Cutoff Values of RMSEA Required to Reach Good Rejection Rates (Chen, et al., 2008, Inferred From Figs. 4–15)

                    N = 200                                      N = 1000
Misspecification    Model 1        Model 2        Model 3        Model 1        Model 2          Model 3

To effect ≥ 90% acceptance of correct models, the RMSEA cutoff must be:
None                ≥ 0.05         ≥ 0.04         ≥ 0.04         ≥ 0.02         ≥ 0.02           ≥ 0.02

To effect ≥ 90% rejection of incorrect models, the RMSEA cutoff is restricted to be:
Smallest            fails, ±64%    fails, ±77%    ≤ 0.03, no     ≤ 0.01, no     ≤ 0.02, dubious  ≤ 0.04, yes
Moderate            fails, ±80%    ≤ 0.01, no     ≤ 0.07, yes    ≤ 0.03, yes    ≤ 0.03, yes      ≤ 0.08, yes
Severest            ≤ 0.04, no     ≤ 0.03, no     ≤ 0.08, yes    ≤ 0.06, yes    ≤ 0.04, yes      ≤ 0.09, yes

Note. "Fails" in a cell indicates that even a cutoff value of 0 is not suited to effect 90% rejection of incorrect models; the actual rejection rate observed for RMSEA = 0 is printed in these cases. "No" in a cell indicates that the cutoff value for 90% rejection of incorrect models is irreconcilable with that for 90% acceptance of correct models; "yes" indicates that the two are reconcilable.

The authors investigated the rejection rates for RMSEA cutoff points ranging from 0 to 0.15 with increments of 0.005. From their results it was clear that avoiding Type I and Type II error at the same time was often not possible for N ≤ 200. As an illustration of their findings, Table 3 shows what is inferred from their Figs. 4–15 for N = 200 and N = 1000 only (this table is not a literal reproduction of one of their tables or a part of these, but is inferred from their Figs. 4–15).⁶

⁶ Findings reproduced with the kind permission of Feinian Chen (May, 2013).

If N = 200, only the moderate and severest misspecification in Model 3 allow for cutoff values that are suited for both accepting 90% or more of correct models and rejecting 90% or more of incorrect models. In the seven other cases, no such cutoff values can be found. In three cases, even RMSEA = 0 is not strict enough to arrive at 90% rejection! For N = 1000, the results are much better: seven out of nine cases survived. However, the cutoff value that seems advisable based on this study would be 0.025 for N = 1000, which is most likely too strict for many other studies, especially for those with real data in which minor model error is unavoidable (MacCallum, 2003; see also Section "Three Degrees of Unique Variance and Their Effect on Goodness of Fit"). Moreover, a sample size of N = 1,000 is often out of reach in empirical studies with real data, especially in the field of psychopathology.

High Reliability (Small Unique Variance) Spoils the Fit

The variance in values for a variable within a set of variables is divided into common variance and unique variance. Common variance is a function of one or more features of the variables they have in common, and unique variance is a result of the influence of specific factors and measuring error. High common variance implies high correlations between the variables. If there is much error, i.e., low reliability, then the correlations between the linked variables are attenuated. In psychological research making use of questionnaires or tests, there is often moderate reliability at best. However, in biological or physical research very reliable and homogeneous measurements are within reach. One would assume that this would be ideal for model testing.
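The link between reliability, unique variance, and the size of the inter-item correlations can be made concrete with a few lines of arithmetic (my own illustration, not the article's): for standardized items that load equally on one common factor, the implied inter-item correlation equals the squared loading, and the unique variance of an item is one minus that squared loading. The three unique-variance levels used by Stuive (2007), discussed in a later section, correspond to loadings of roughly .87, .71, and .44.

```python
# Implied inter-item correlation and loading for a given unique variance,
# assuming standardized items that load equally on a single common factor.
for unique_var in (0.25, 0.49, 0.81):
    loading = (1.0 - unique_var) ** 0.5
    implied_r = loading ** 2           # correlation between two such items
    print(f"unique variance {unique_var:.2f}: loading = {loading:.2f}, "
          f"implied inter-item r = {implied_r:.2f}")

# unique variance 0.25: loading = 0.87, implied inter-item r = 0.75
# unique variance 0.49: loading = 0.71, implied inter-item r = 0.51
# unique variance 0.81: loading = 0.44, implied inter-item r = 0.19
```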
Paradoxical Effect of a High Reliability

However, an article by Browne, et al. (2002) seemed to demonstrate the opposite. The researchers discussed data obtained from a clinical trial of the efficacy of a psychological intervention in reducing stress and improving health behaviors for women with breast cancer (Andersen, Farrar, Golden-Kreutz, Katz, MacCallum, Courtney, et al., 1998). One focus was on two types of biological responses of the immune system to the intervention. For each response there were four replicates. These 2 × 4 replicates were treated as indicators of two corresponding and related but distinct factors. Because the measures were biological and replicates, one may expect a high reliability and homogeneity among them, which implies a small unique variance. Indeed, on the face of it, the correlation matrix showed two distinct clusters of highly inter-correlating variables: 1–4 (M inter-correlation = .85) and 5–8 (M inter-correlation = .96). An MLE CFA was carried out hypothesizing these two factors. In spite of the clear picture (the residual matrix only contained values very close to zero) and the small sample (N = 72), χ² was significant, urging rejection of the model, and so did RMSEA and three other absolute GOF indices. The comparative fit indices RNI and NFI showed better performance but were nevertheless below the advised cutoff values. Only SRMR, with a value of 0.02, indicated acceptance of the model unambiguously, in line with the fact that this index is based on the residuals alone.

The researchers went on to frame a correlation matrix, consisting again of two clusters, but with much lower inter-correlations between the variables.


Within the first "cluster" in particular, the mean variable inter-correlation (its cohesion) was only .14, and the cohesion of the second cluster was .61, which is quite high but much lower than that of the original cluster. MLE CFA with a model for two related factors should yield the same residual matrix as above, and therefore the same SRMR, as indeed it did. The result: χ² was no longer significant, thus indicating a good fit, in spite of the first cluster's unconvincing cohesion. The same held for the goodness-of-fit indices.

Browne, et al. (2002) concluded that χ² and the fit indices based upon it measure detectability of misfit rather than misfit directly. In other words, the high statistical power of a test may easily lead to the rejection of a model, whereas a mediocre statistical power leads to the acceptance of the model (see also Steiger, 2000). In social science research, one is rarely confronted with this undesirable phenomenon because most measures are of moderate quality. Nevertheless, they may vary from rather reliable and homogeneous to mediocre. It would be ironic if only the latter would show good fit index values. Browne, et al. (2002) reasoned that this discrepancy is caused by the fact that, due to the procedure in CFA, χ² is affected not only by the residual matrix but also by the sample matrix. Thus, it can be influenced by the degree of unique variance in the observed variables. Because all GOF indices but SRMR are based on χ², they will indicate a poor fit as well.

Hayduk, Pazderka-Robinson, Cummings, Levers, and Beres (2005) Against Browne, et al. (2002)

The data gathered in the study by Browne, et al. (2002) enjoyed a remarkable reanalysis by Hayduk, et al. (2005). These investigators had reason to believe that the two-factor model of Browne, et al. had not been correct and that two "progressively interfering" factors should have been included, which were thought to affect the respective two clusters of measurements. This alternative model appeared to fit the data very well, such that χ² was far from significant in spite of the minute unique variance. Eliminating one of the progressively interfering factors from the model spoiled the results. The same held for a few other variations of the model.

From the section, "A Spurious Influence of the Number of Factors and Variables," we can learn that a greater number of factors in proportion to the number of observed variables promotes the acceptance of incorrect models. So improving fit by introducing two additional factors into the model does, in itself, not prove the superiority of the χ² test. The added factors should first make sense theoretically. I am not the one to judge that in the present case.

Mulaik (2010) proposed a further refinement of the model, which involved a correlation of .40 between each of the progressively interfering factors and the clusters of measures they were supposed to affect. This refinement did not improve the already good fit but was theoretically more plausible.

Saris, Satorra, and van der Veld (2009)

Saris, et al. (2009) reported a similar study: they devised population data for a one-factor model. Strictly speaking, the data were more precisely described by a two-factor model in which the correlation between the two factors was .95. This difference they considered trivial, so the one-factor model should have been deemed acceptable. The authors drew an imaginary, perfectly representative sample (N = 400) from this population and calculated χ², RMSEA, CFI, and SRMR. If the factor loadings were .85 or .90—which presupposes high reliability and low specific variance—then the values of χ² and RMSEA led to the rejection of the one-factor model. Only CFI (in line with the expectations of Miles & Shevlin, 2007; see Section "Large Unique Variance Promotes an Illusory Good Fit") and SRMR (not being χ² based) had values in favor of acceptance of the model. If the loadings were decreased to ≤ .80, then χ² and RMSEA also indicated acceptability of the two-factor model.

Conclusion

The lesson to be learned from the foregoing is that one should not be too eager to resist indications of ill fit simply because the unique variance is particularly small. A serious search for an alternative model should always be undertaken (Hayduk, Cummings, Pazderka-Robinson, & Boulianne, 2007; McIntosh, 2007). That does not mean, however, that indications of ill fit should never be considered trivial. However, the latter cannot automatically be concluded from a very small unique variance; it must be supported by substantive arguments.

Large Unique Variance Promotes an Illusory Good Fit

A much more important lesson to be learned from Browne, et al. (2002), however, is that non-significant χ² and favorable goodness-of-fit index values cannot be trusted uncritically because these may merely be promoted by low inter-variable correlations as a result of a large unique variance. Their findings do not stand alone. Some simulation studies have detected similar problems.

Miles and Shevlin (2007) reasoned that comparative fit indices might be more robust against the paradoxical influence of reliability because the former depend on the comparison of two χ² values that should be approximately equally affected by the degree of unique variance in the observed variables. To test this, they devised a study in which they compared χ², RMSEA, SRMR, and a number of comparative fit indices.

The authors set up a population, corresponding to two related factors (correlation 0.3) with four high-loading indicators each and two minor factors, the one loading very low on two indicators of the one major factor and the other loading very low on two indicators of the other major factor. Miles and Shevlin (2007) "drew" an imaginary sample of N = 500, perfectly mirroring the population values.


The tested model was a two-factor structure omitting the two minor factors. With a perfect reliability of 1.0, the disturbance by the minor factors was enough to have the model rejected by χ². RMSEA indicated a doubtful fit, but the other fit indices indicated a good fit, as they were predicted to do. However, decreasing the reliability to a modest 0.8 resulted in χ² and RMSEA both indicating a good fit, whereas the comparative indices and SRMR continued to indicate a good fit.

Miles and Shevlin (2007) performed a second study using this model, but left out the two minor factors while the correlation between the two factors was increased to 0.5. This time, the model tested was severely misspecified, a one-factor model. If the reliability was 0.8, χ² indicated misfit, as it should, and so did all of the fit indices. However, when the reliability was decreased to a meager 0.5, χ² suddenly indicated a good fit, and so did RMSEA and—in spite of not being χ² based—SRMR. It was hoped that the comparative fit indices would be robust against the spurious influence of reliability on χ², but only two of them were—the normed fit index (NFI; Bentler & Bonett, 1980) and the relative fit index (RFI; Bollen, 1986). Three of them did not indicate misfit: CFI, TLI, and the incremental fit index (IFI; Bollen, 1989).

Three Degrees of Unique Variance and Their Effect on Goodness of Fit

Stuive (2007; see also Stuive, Kiers, Timmermans, & ten Berge, 2008) investigated unique variance systematically as an independent variable.

Stuive's study was performed with continuous data, simulating a 12-item questionnaire with three subtests of four items each. The misspecification here was an incorrect assignment of one or more items to subtests (three levels; note that this is a much more serious prediction error than misspecified cross-loadings). In addition, 10% "minor model error" was introduced in one-third of the data and 20% in another one-third of the data, following the argument of MacCallum (2003) that 0% model error can never be realized in studies with real data. This "minor model error" consisted of the effect of nine unmodeled factors.

The independent variables were: (1) unique variance (25, 49, or 81%), (2) amount of minor model error, and (3) correlations between the three factors (.0, .3, or .7). Variables 1 and 3 were parameters to be estimated, not fixed; Variable 2 was not part of the model. The sample size was another independent variable: 50, 100, 200, 400, and 1,000 cases. For each combination of conditions, 50 samples were drawn.

The dependent variables were: the percentage of (a) accepted correct assignments and (b) rejected incorrect assignments. Acceptance of the assignments depended on the p value (> .05) of MLE χ², as well as on different cutoff values for three fit indices and combinations thereof.

Figure 5.2 in Stuive (2007) shows that introducing 10% model error had a very strong effect on χ² (p > .05), resulting in a decrease in the acceptance rate of correct models from approximately 92% to approximately 58%, over all unique variance conditions. Introducing 20% model error resulted in an acceptance rate of only 40%. If one finds 10% and 20% model error too large to be considered "minor," then these results are in favor of using χ² (p > .05), not against it. In spite of these strong effects, Stuive and her team joined together the cases with 0, 10, and 20% model error in order to study what the three degrees of unique variance did to the statistical power of CFA.

Rejection of Correct Models (Type II Error)

The samples with N = 50 and N = 100 will be ignored in this section and Section "Acceptation of Incorrect Models (Type I Error)" because they caused too many erroneous judgments in the cases of moderate and high unique variance. What was the effect of the degree of unique variance on the rejection of correct models (Fig. 5.1 in Stuive, et al., 2008)?

(a) If the unique variance was 25%, approximately 70% of the correct models were rejected⁷ based on χ², p > .05. Of the fit indices, RMSEA and CFI needed their most lenient cutoff values (0.10 and 0.94, respectively) to reduce the rejection of correct models to approximately 10%. SRMR did not seem affected by a small unique variance: 0% of the correct models was rejected for N ≥ 200, even with the strict SRMR ≤ .06.
(b) In the case of 49% unique variance, the rejection rate of correct models based on χ² dropped only slightly below 70% for N = 1,000. For N = 400 it decreased to approximately 50%, a score not above chance level. RMSEA and CFI, on the contrary, demanded little or no rejection of correct models, provided N ≥ 400. SRMR did well for N ≥ 200.
(c) In the case of 81% unique variance, the rejection of correct models, based on χ², dropped dramatically to approximately 15% for N = 1000 and 10% for N = 400 (Stuive, 2007, Fig. 5.1c). In other words, the minor model error in the correct models no longer mattered much. The three goodness-of-fit indices scored well.

⁷ This value seems a poor score, but if one regards Stuive's (2007) correct models for the most part as incorrect because 10 and 20% minor model error is considered to be too large, then this rate is in line with 33% fully correct models. The introduction of minor (?) model error makes Stuive's results somewhat difficult to interpret.


In all, a higher unique variance promoted the acceptance of correct models (decreasing Type II error). But did it also promote the acceptance of incorrect models (increasing Type I error)?

Acceptance of Incorrect Models (Type I Error)
The results are inferred from Stuive (2007, Fig. 5.4) and Stuive, et al. (2008, Fig. 2).

(a) With 25% unique variance, the rejection rate was approximately 100%—justly so—for χ2, RMSEA, SRMR, and CFI, even with the most lenient cutoff values.

(b) With 49% unique variance, the cutoff value for RMSEA had to be somewhat stricter (reject the model if RMSEA > .08) to reach approximately 90% rejection with N = 400 and N = 1,000. The same held for SRMR (reject the model if SRMR > .08). CFI and χ2 were still performing perfectly.

(c) However, with 81% unique variance only the strictest cutoff value of RMSEA (reject the model if RMSEA > .03) led to an acceptable rejection rate (95% for N = 1,000 and 85% for N = 400). Remember, using this strict value led to the rejection of “correct” models as well, in the case of unique variances of 25 and 49%. SRMR performed even worse: even the strictest cutoff value (reject the model if SRMR > .06) led to acceptance of (almost) all incorrect models, except for the small sample sizes (sic). CFI, on the contrary, continued to show good rejection rates, at least for N = 400 and especially N = 1,000; this is in line with its greater robustness against a high unique variance, as assumed by Miles and Shevlin (2007). As to χ2, for N = 1,000 it accomplished 100% rejection, as it should have; for N = 400 it yielded 85% rejection, while for N ≤ 200 it rejected only 50% or less of the incorrect models.

To summarize: in line with the findings and reasoning reviewed in the sections “High Reliability (Small Unique Variance) Spoils the Fit” and “Large Unique Variance Promotes an Illusory Good Fit,” MLE χ2 (p > .05), in case of small unique variance, demanded too much rejection of correct models (only if the data contained minor model error), whereas in case of large unique variance it allowed too much acceptance of incorrect models. Being χ2-based in a very direct manner, the same held for RMSEA, albeit to a much more attenuated degree. CFI was rather robust against the undesirable influence of unique variance. SRMR, however, appeared to lose its power to detect incorrect models as a consequence of high unique variance (81%, not 49%), in spite of not being χ2-based, even more so than RMSEA. This also put into perspective the favorable acceptance rates of SRMR regarding correct models, mentioned in the study of Browne, et al. (2002).

Large Unique Variance Combined With High Factor Correlation Promotes Type I Error
The results of Stuive (2007) with respect to incorrect models were reasonable when the unique variance was 49%, especially for CFI. However, this only held when the factors correlated 0.3 or 0.0, but not when the factors correlated 0.7. Then, a large unique variance caused the fit indices, including CFI, to lose their power to reject incorrect models altogether (see Fig. 5.5 in Stuive, 2007). Let us have a more detailed look.

(a) With 25% unique variance, χ2 and RMSEA continued to reject incorrect models perfectly. SRMR rejected incorrect models perfectly only with the strictest cutoff value (reject the model if SRMR > .06), but was below the chance level with more lenient cutoff values. CFI, too, required the strictest cutoff value (reject the model if CFI < .96) to reject enough cases (approximately 90% with N = 400 and N = 1,000).

(b) With 49% unique variance, only χ2 for N ≥ 200 led to (almost) perfect rejection of incorrect models. RMSEA then required the stricter cutoff value of 0.06, in combination with N ≥ 400, to reach 95–100% rejection (when the factor correlations were 0.0 or 0.3, the cutoff value 0.08 sufficed). The rejection rate for SRMR and CFI, however, was below chance.

(c) With 81% unique variance, χ2 with N = 1,000 sank to 85%, but was below chance level for smaller sample sizes. RMSEA, SRMR, and CFI were no longer suited to reject incorrect models (almost 0% rejection for the larger sample sizes). Remember, on top of the incorrect item assignments there was the minor model error in 67% of the models, which affected χ2 and RMSEA considerably if the factor correlation was still ≤ .3.

In sum, from Stuive's (2007) simulation study (Sections “Three Degrees of Unique Variance and Their Effect on Goodness of Fit” and “Large Unique Variance Combined With High Factor Correlation Promotes Type I Error”) and from the studies discussed in Section “Large Unique Variance Promotes an Illusory Good Fit,” it can be concluded that the more “messy” the empirical factor structure/pattern is, the better the fit is suggested by SRMR and CFI and to a lesser degree by χ2 and RMSEA, even for the rather severely misspecified models that were typical for her study. Goodness of fit is increasingly over-estimated with higher unique variance, especially in combination with an unmodeled high factor correlation, thereby promoting Type I error (false positives).
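How directly the indices depend on χ2 can be made concrete with their textbook definitions. The sketch below uses one common variant of each formula (programs differ in details such as dividing by N or N − 1), and the numbers are purely illustrative.

```python
import math

def rmsea(chisq, df, n):
    # Root mean square error of approximation: excess chi-square per df, per subject.
    # Some programs divide by n rather than n - 1; this is one common variant.
    return math.sqrt(max(chisq - df, 0.0) / (df * (n - 1)))

def cfi(chisq, df, chisq_null, df_null):
    # Comparative fit index: noncentrality of the tested model relative to that of
    # the null (independence) model, so it also depends on how much covariation
    # the null model leaves unexplained.
    d_model = max(chisq - df, 0.0)
    d_null = max(chisq_null - df_null, d_model)
    return 1.0 - d_model / d_null if d_null > 0 else 1.0

# The same chi-square looks much worse in a small sample than in a large one,
# while CFI can still look acceptable as long as the null model is far away.
print(round(rmsea(chisq=120, df=50, n=200), 3))                      # 0.084
print(round(rmsea(chisq=120, df=50, n=1000), 3))                     # 0.037
print(round(cfi(chisq=120, df=50, chisq_null=900, df_null=66), 3))   # 0.916
```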

The utmost consequence of this undesirable relation between unique variance and goodness of fit was drawn by Marsh, Hau, and Wen (2004), repeated in Marsh, et al. (2005, p. 318), where the authors stated, “… assume a population-generating model in which all measured variables were nearly uncorrelated. Almost any hypothesized model would be able to fit these data because most of the variance is in the measured variable uniqueness terms, and there is almost no co-variation to explain. In a nonsensical sense, a priori models positing one, two, three, or more factors would all be able to ‘explain the data’ (as, indeed, would a ‘null’ model with no factors). The problem with… this apparently good fit would be obvious in an inspection of… all factor loadings [which, then, all turned out to be] close to zero.”

Explanatory Suggestion
An explanation of the model-accepting effect of a “messy” factor structure/pattern is beyond the scope of this paper. However, I wonder whether the problems might be due to the element of comparing two complete correlation matrices. (a) If all correlations are relatively small (large unique variance), then the residuals will also tend to be small. This will artificially decrease SRMR and, to a lesser degree, χ2 and RMSEA, obscuring real misfit. (b) If the correlations between the indicators per factor are high, then their residuals will also tend to be high, obscuring real fit. For CFI and TLI, however, the opposite may hold because of a larger distance of the tested model to the null model (see the argument of Kenny, 2012, and Rigdon, 1996, mentioned in Section 5, last paragraph). (c) For item clusters to hold: if the clusters are almost independent, the items within each cluster correlate much more highly with each other than with the items from the other clusters. If, on the other hand, the clusters correlate highly, then the correlations of the items within a cluster will not be much higher than those of the items between clusters. Consequently, the residuals of incorrectly assigned items would be smaller in the case of highly correlating factors than in the case of independent factors. That would explain why a high factor correlation spoils the detection of misspecified models, as reported in this section (and promotes the acceptance of correct models: the other side of the coin!). For the reproduced correlation matrices in factor analysis of oblique structures/patterns this may hold to an attenuated degree, but the effect will not completely be flattened out.
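Point (a) can be illustrated with a toy calculation of my own, not taken from the cited studies: the same relative misspecification yields a much smaller SRMR once all correlations are scaled down, that is, once unique variance goes up.

```python
import numpy as np

def srmr(observed, implied):
    """One common variant of SRMR: root mean square of the residual correlations in the
    lower triangle (programs differ slightly in whether the diagonal is included)."""
    idx = np.tril_indices_from(observed, k=-1)
    return float(np.sqrt(np.mean((observed[idx] - implied[idx]) ** 2)))

def two_cluster_corr(r_within, k=3):
    """'True' structure: two clusters of k items, r_within inside a cluster, 0 between."""
    m = np.zeros((2 * k, 2 * k))
    m[:k, :k] = r_within
    m[k:, k:] = r_within
    np.fill_diagonal(m, 1.0)
    return m

for r in (0.6, 0.3, 0.1):   # shrinking r mimics growing unique variance
    obs = two_cluster_corr(r)
    # Crude stand-in for a badly misspecified model: one common correlation for all
    # pairs (the average observed off-diagonal), not an actual ML solution.
    implied = np.full_like(obs, obs[np.tril_indices_from(obs, k=-1)].mean())
    np.fill_diagonal(implied, 1.0)
    print(f"r_within = {r:.1f}  ->  SRMR = {srmr(obs, implied):.3f}")
```

With r_within = .1, this equally wrong model already slips under the conventional .08 rule of thumb for SRMR.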
A Spurious Influence of the Number of Factors and Variables
Breivik and Olsson (2001) reasoned and observed that RMSEA tends to favor models that include more variables and constructs over models that are simpler, due to the parsimony adjustment built in (dividing by df). These findings are in line with those of Meade (2008). Meade investigated the suitability of various fit indices to detect an unmodeled factor present in the data. This author observed that RMSEA ≤ .06 was strict enough to detect the unmodeled factor, but only if the factors consisted of four items. However, if the factors consisted of eight items each, ≤ .06 appeared too liberal; in other words, the calculated RMSEA had become smaller than 0.06 (Fig. 9 in his article).

The same problem seems to hold for SRMR (Anderson & Gerbing, 1984; Breivik & Olsson, 2001; Kenny & McCoach, 2003). Fan and Sivo (2007) observed that SRMR performed much better for smaller models; in the larger models, the value of SRMR (not reproduced by the authors) was too small to reject misspecified models (their Table 3 vs their Table 2).

Explanatory Suggestion
Why do more factors relative to the number of variables result in a smaller SRMR? A full explanation is beyond the scope of this paper, but again a cautious suggestion can be made. For my argument, have a look at the Appendix. Four correlation matrices of 12 variables each are printed. Each matrix is ordered in such a way that the clusters of correlations, corresponding to the factors, can easily be detected. If there are two clusters with six indicators each, there are 42 within-cluster inter-item correlations (communalities included) against 36 between-cluster inter-item correlations. If there are three clusters of four items each, as in Stuive's (2007) study, then there are 30 within-cluster inter-item correlations against 48 between-cluster inter-item correlations. If there are four clusters of three variables each, then there are 24 within-cluster inter-item correlations against 54 between-cluster inter-item correlations. Finally, if there are six clusters of two items each, then there are 18 within-cluster inter-item correlations against 60 between-cluster inter-item correlations.

So, the more item clusters in relation to the total number of variables (except in the case of two clusters), the more the between-cluster inter-item correlations outnumber the within-cluster inter-item correlations. In the case of an equal number of variables per cluster, this follows the formula:

w / b = (v + f) / (v(f − 1)),

in which w = the number of within-cluster residuals; b = the number of between-cluster residuals; v = the number of variables; and f = the number of variable clusters (factors).
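The counts given above, and the formula, can be verified with a few lines of code. This is only a verification sketch; as in the Appendix, the communalities on the diagonal are counted as within-cluster entries.

```python
def within_between(v, f):
    """Within- and between-cluster entry counts for f equal clusters of v/f variables,
    counting each correlation once and the diagonal (communalities) as within-cluster."""
    k = v // f
    within = f * (k * (k - 1) // 2 + k)     # per cluster: item pairs plus diagonal entries
    between = (f * (f - 1) // 2) * k * k    # one k-by-k block per pair of clusters
    return within, between

for f in (2, 3, 4, 6):
    w, b = within_between(12, f)
    print(f"{f} clusters of {12 // f}: w = {w}, b = {b}, "
          f"w/b = {w / b:.3f}, (v + f)/(v(f - 1)) = {(12 + f) / (12 * (f - 1)):.3f}")
```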
Unless the factors are severely misspecified, the absolute values of the correlations that are not part of the factors will generally be much lower than those of the within-factors correlations, even if the factors intercorrelate moderately highly; this is so by definition.

This will hold for both the empirical and the implied matrix.

For variable pairs with low correlations, the difference between the empirical and the implied correlation will mostly be smaller than for variable pairs with high correlations. So the residuals of the between-factors inter-item correlations will generally be smaller than the residuals of the within-factors inter-item correlations (unless the factor correlations have been fixed at 0; see Fan & Sivo, 2005, also discussed in the section “Determination of Cutoff Values by Simulation Studies”).

In the case of many factors with few indicators each, these small between-factor inter-item residuals will outnumber the somewhat higher within-factor residuals. The average of all residuals together will then be smaller too. And this will result in a smaller value for SRMR, and—with some attenuation—a smaller RMSEA and χ2 as well. This suggests a better fit, even if the model is more or less misspecified.

CFA and Its Limited Feedback on Test Item Level
A limitation of CFA of an altogether different kind has to do with the feedback on the level of individual items. CFA is meant for a test of the measuring instrument as a whole: if χ2 and/or the goodness-of-fit indices assume unfavorable values, then the prediction could be considered refuted. However, such an all-or-nothing decision is not the only or main thing in which the researcher is interested, especially not in the early and intermediate phases of his investigation. Then the preference is for more detailed feedback, which enables him to improve the theory or test, or both (Anderson & Gerbing, 1988). Such feedback would be provided by a factor structure/pattern with factors corresponding in number and nature to the predicted factor structure, yet completely in accordance with the empirical correlation matrix. The output of CFA, however, contains a factor structure/pattern that is strongly affected by the theoretical model behind the test, quite different from the structure/pattern resulting from exploratory factor analysis with oblique rotation: secondary factor loadings (beta-weights) are lacking because they were fixed at zero in the prediction, except where cross-loadings had been predicted (which, necessarily, concerns only a few indicators). So the researcher can see which items have a suspiciously low primary loading (standardized beta-coefficient), but not whether they should have been reallocated to another factor (and, if so, which one), or whether they should be considered cross-loading.

The output also includes modification indices per parameter. These indices show to what extent χ2—and/or one or more of the goodness-of-fit indices—could adopt a better value (because of smaller residuals) by reallocating these items to another factor, or by assuming cross-loadings or different factor correlations. The researcher could “free” the fixed parameters with the poorest values and have them estimated by the iterative procedure used before, or fix the free parameters with the poorest values. However, after such a modification, the whole picture of modification indices (and primary factor loadings of the items) will change, so the picture of modification indices in the initial stages cannot be relied upon to give accurate information on all items (parameters) simultaneously.

A further disadvantage is that CFA requires rather large samples to be reliable, especially in the case of a large number of variables; 5–10 observations per variable is often the advice. For the latter reason, scores of subtests or of item “parcels” are often preferred over single items to obtain reliable results. The items individually, however, are the very thing the cluster predictor or test deviser is interested in, and whether these can be combined into parcels has yet to be proven.8 In addition, especially in the field of psychopathology, large samples are rarely realizable. Most of the time one has to content oneself with “convenience samples” of moderate size, i.e., samples selected non-randomly from a “poorly-defined super-population” (Berk & Freedman, 2003).

[Footnote 8] According to Marsh, Lüdtke, Nagengast, Morin, & Von Davier (2013), this practice is undesirable in any case.

Factor analysis was invented to deal with a great number of correlating variables, producing massive covariance matrices. SEM was invented to test structural models with a limited number of theoretical constructs and a few observed variables, which, together, produce modest empirical and implied covariance matrices and whose residual matrices can be overviewed easily. For structural models, it makes sense to have a few measures that summarize the fit of both matrices. For tests with a great number of items, this makes much less sense. Perhaps it was not a good idea in the first place to use SEM for confirmatory factor analysis as well.

Discussion
Even if it would also make sense to have indices of goodness of fit in cases of a great number of observed variables, those that are produced by SEM are plagued by various problems:

1. χ2 may indicate misfit, especially in tests of high statistical power, simply because the sample is large. On the other hand, SEM requires large samples to do its job properly.

2. The goodness-of-fit indices do require large samples to discern good from poor predictions, in line with what SEM needs, but large samples are often out of reach in many psychological studies, especially in the field of clinical psychology.

3. When the reliability of a test is high and the specific variance of the variables is small (betraying itself in high correlations between the variables within a cluster), χ2 and RMSEA lead to the rejection of well-predicted models if there is minor model error. Minor model error, however, is unavoidable under real-world conditions. CFI is somewhat more robust against this effect.

4. On the other hand, when the reliability of a test is low, χ2 and RMSEA lead to the acceptance of poorly predicted models (comparative fit indices somewhat less so).

5. When the unique variance of test variables is high (high specific variance and/or low reliability), then χ2 does not detect incorrect models when N ≤ 200. RMSEA does not detect misfit under any N. SRMR can no longer discern good fit from misfit. CFI requires a strict cutoff value and N ≥ 400.

6. When the factors have a high correlation (.70), while the unique variance is 49%, SRMR and CFI are no longer suited to reject incorrect models. If the unique variance rises to 81%, none of the fit indices is suited to reject incorrect models, except χ2 with an N = 1,000.

7. There are also indications that increasing the number of factors relative to the number of variables spoils the power of RMSEA and SRMR to detect incorrect models.

So, on the basis of this selective review, reinforced by the brief theoretical suggestions in the sections “Large Unique Variance Combined With High Factor Correlation Promotes Type I Error” and “A Spurious Influence of the Number of Factors and Variables,” it must be concluded that, for a significance test of a model, χ2 and the GOF indices are too unreliable, and for an estimation of the approximation discrepancy, the GOF indices are too inaccurate. As early as 2007, such problems made Barrett call for their abandonment: “I would recommend banning all such indices from ever appearing in any paper as indicative of ‘model acceptability’ or ‘degree of misfit.’” (Barrett, 2007, p. 821). In addition, for feedback on the level of individual items, CFA is not ideal.

Limitations
Part of the findings by Stuive and associates has to be put into perspective. It was interesting to see what happened to χ2 and the fit indices in the case of 81% unique variance, but such a high unique variance in a 3 × 4 item test implies a Cronbach's α of .50, as Stuive, Kiers, and Timmermans (2009) admit. Such tests will probably not be used in practice. The other extreme, a questionnaire with a unique variance of only 25%, will probably also be rare in psychology research; around 50% is more likely. But it is then that CFA yields its most practical results. So the skeptical conclusions above should not be taken to imply that all studies that have applied CFA are of questionable value. However, a critical attitude to such studies is certainly warranted.

Another limitation is that some of the criticisms may pertain more to the way CFA is applied by many researchers than to CFA as such: in the first place, leaving too many parameters free to estimate (for then a good fit will be readily attained but does not say very much), or fixing too many (e.g., not including cross-correlations or correlated error terms in the model). A more complete and precise measurement model prediction will probably yield more reliable and meaningful χ2 and goodness-of-fit index values.

The criticisms raised in this article hold especially for CFA of tests with many items, not for SEM of structural models with only a few—both valid and reliable—indicators per factor in the measurement part. Besides, the objection of limited feedback at the item level becomes less relevant in an advanced stage of a research project, when the test has gained good psychometric properties. Then, a global evaluation of approximate fit (GOF) and estimation fit (statistical significance) might be all the researcher needs.

Future Directions
What could be an alternative to, or an improvement of, CFA in the case of testing instruments with many variables when one is still in the early and intermediate stages of a research project?

Exploratory Structural Equation Modeling
In 2009, Asparouhov and Muthén introduced a variant of CFA that was called exploratory structural equation modeling (ESEM). This approach was intended to overcome the earlier mentioned limitations of CFA. In this method, in addition to a CFA measurement model, an EFA measurement model with rotations can be used within a structural equation model. There is no longer a need to fix the non-indicators per factor at 0. They can be freely estimated, and they are reproduced in the output. So there are secondary factor loadings.

The superiority of ESEM to CFA in cases of more complex measurement models has already been demonstrated in studies like Marsh, Lüdtke, Muthén, Asparouhov, Morin, Trautwein, et al. (2010), Furnham, Guenole, Levine, & Chamorro-Premuzic (2013), Booth and Hughes (2014), and Guay, Morin, Litalien, and Vallerand (2014). Also, ESEM could further be used for a data-driven model-modification with more realistic and informative results than CFA.

Data-driven Optimization of Predicted Clusters: A Fruitful Approach to Goodness of Fit
What would be the advantage of such “a data-driven model-modification”? That would result in factors that are congruent with the predicted ones on the one hand, but that are in good agreement with the empirical correlation or covariance matrix on the other: predicted indicators that appeared to have an insufficient factor loading would no longer be considered an indicator, while variables that have an unexpectedly high loading on a factor will now count as indicators. These modifications often imply that the clusters formed by the redistributed factor indicators have a better cohesion (average inter-variable correlation) than the corresponding predicted ones at the start.

Such a result would be comparable to that of item analysis, as explained in the section “Methods for Testing a Predicted Item Clustering,” but this time continued to the point at which the clustering neatly fits the empirical correlation matrix. If performed with ESEM, the result will be based on common factor analysis, which may be preferred by statisticians. Model re-specification with CFA based on the modification indices is also a possibility (Stuive, et al., 2009), but ESEM will probably lead to better results.

Whatever the method applied, such a final factor structure/pattern (or cluster structure, as the case may be) will be extremely capitalized on chance and other sources of error, as MacCallum, Roznowski, and Necowitz (1992) warned. However, that would only be a problem if the revised factor structure were adopted unconditionally as the better one, but that is not what it should be used for. Having a predicted factor structure on the one hand and an optimized factor structure on the other, the latter being continuous with the predicted one but in good agreement with the empirical correlation matrix at the same time, puts the researcher in a position for a detailed comparison of the two structures. And this generates detailed feedback on item level as well as a basis for a new kind of goodness-of-fit index. This will be explained briefly.

Indicators that are shared by both the predicted and its corresponding optimized factor can be considered correct positives; in other words, hits (H items). Indicators that had to be removed from the predicted factor to arrive at its corresponding optimized version can be considered false positives (F items). Indicators that had been missed in the prediction of the factor corresponding with its optimized version can be considered false negatives (M items). The number of H items, F items, and M items can now be combined in a simple formula:

AP(it) = H / ((H + F)·(H + M)) = H / (Cp·Cf).

Here, AP(it) stands for accuracy of prediction in terms of number of items, H for the number of hits, F for the number of F items, M for the number of M items, Cp for the size of the predicted cluster, and Cf for the size of the final (optimized) cluster. A correction for “correct prediction by chance” needs to be added to this formula. Then, it will yield values between 1 (perfect fit) and 0 (no fit at all), or even below 0 (fit is worse than chance level).
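For concreteness, a small sketch of my own of this bookkeeping: it classifies the items of one predicted factor against its optimized counterpart and evaluates the uncorrected index exactly as the formula is printed above. The chance correction and the loading-based weights discussed below are left out.

```python
def compare_clusters(predicted, optimized):
    """H/F/M bookkeeping for one factor: predicted vs. data-driven (optimized) item sets."""
    predicted, optimized = set(predicted), set(optimized)
    hits = predicted & optimized          # H items: retained in the optimized factor
    f_items = predicted - optimized       # F items: predicted, but dropped after optimization
    m_items = optimized - predicted       # M items: missed by the prediction
    H, Cp, Cf = len(hits), len(predicted), len(optimized)
    ap = H / (Cp * Cf)                    # AP(it) as printed above, without chance correction
    return hits, f_items, m_items, ap

hits, f_items, m_items, ap = compare_clusters(predicted=[1, 2, 3, 4], optimized=[1, 2, 3, 7])
print(hits, f_items, m_items, round(ap, 3))   # {1, 2, 3} {4} {7} 0.188
```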
The procedure above implies that a weight of 1 is attached to each hit and to each F and M item. In the case of substantial cross-correlations, especially if these had been expected, such a procedure may generate goodness-of-fit values that are more mediocre than warranted. This is the case because a reassignment of an item in spite of its relatively high loading on its predicted factor (so an F item) should be considered a relatively small error. In the same vein, reassignment of an item that had a mediocre loading in its new factor (so an M item) is a smaller error than a high-loading M item. This imperfection could be partly corrected by attaching a weight < 1 to the errors, a weight depending on the factor loading of the item concerned.9

[Footnote 9] In November 2014, I submitted an article explaining this approach and the two formulas in detail (Prudon, submitted); it is still under review.

The proposed alternative approach to goodness of fit produces a fit value per factor, which can be combined into a fit value over all factors. Predicted factor correlations, if there were any, will not affect this goodness-of-fit index. To test such predictions would require a separate test and—if deemed useful—a separate index. Next, the content of the F and M items should be examined in relation to the content of the H items, to see whether these errors warrant a minor or major revision of the theory or whether they could be considered to be poor representations of the phenomena they were supposed to cover. In the initial and intermediate stages of a research project, this will probably result in a revised factor prediction which is in better but not complete accordance with the optimized factor structure. Remember, the latter has been capitalized on various sources of error. The criterion for modifying one's prediction is whether the changes make better sense theoretically than the original prediction does on second thought, not what the optimized factor structure tells in all detail. However, Steiger (1990, p. 175) has shown himself a believer in the researcher's ability to find whatever theoretical justification is needed, which is merely a flattering way of saying that he is skeptical of the value of such theoretical justifications; but then, there is always the scientific community to criticize sloppy arguments.

If not in complete accordance, then the optimizing program may run again and will yield goodness-of-fit values better than those of the unrevised factors but still below 1. Of course, it would need a new sample (preferably more than one) for a more conclusive test.

References
Andersen, B. L., Farrar, W. B., Golden-Kreutz, D., Katz, L. A., MacCallum, R. C., Courtney, M. E., & Glaser, R. (1998) Stress and immune response after surgical treatment for regional breast cancer. Journal of the National Cancer Institute, 90, 30-36.

Anderson, J. C., & Gerbing, D. W. (1984) The effect of sampling error on convergence, improper solutions, and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika, 49, 155-173.
Anderson, J. C., & Gerbing, D. W. (1988) Structural equation modeling in practice: a review and recommended two-step approach. Psychological Bulletin, 103, 411-423.
Arbuckle, J. L. (2004) Amos 5.0. Chicago: SPSS. [Computer software]
Barrett, P. (2007) Structural equation modeling: adjudging model fit. Personality and Individual Differences, 42(5), 815-824.
Bentler, P. M. (1990) Comparative fit indexes in structural models. Psychological Bulletin, 107, 238-246.
Bentler, P. M. (2000–2008) EQS 6 structural equations program manual. Encino, CA: Multivariate Software, Inc.
Bentler, P. M., & Bonett, D. G. (1980) Significance tests and goodness-of-fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-600.
Berk, R. A., & Freedman, D. A. (2003) Statistical assumptions as empirical commitments. In T. G. Blomberg & S. Cohen (Eds.), Law, punishment, and social control: essays in honor of Sheldon Messinger. (2nd ed.) Aldine de Gruyter. Pp. 235-254 (Chap. 10).
Bollen, K. A. (1986) Sample size and Bentler & Bonett's nonnormed fit index. Psychometrika, 51, 375-377.
Bollen, K. A. (1989) A new incremental fit index for general structural equation models. Sociological Methods & Research, 17, 303-316.
Booth, T., & Hughes, D. J. (2014) Exploratory structural equation modeling of personality data. Assessment, 21(3), 260-271.
Breivik, E., & Olsson, U. H. (2001) Adding variables to improve fit: the effect of model size on fit assessment in LISREL. In R. Cudeck, K. G. Jöreskog, S. H. C. du Toit, & D. Sörbom (Eds.), Structural equation modeling: present and future: a festschrift in honor of Karl Jöreskog. Chicago, IL: Scientific Software. Pp. 169-194.
Brown, T. A. (2015) Confirmatory factor analysis for applied research. (2nd rev. ed.) New York: Guilford Press.
Brown, T. A., & Moore, M. T. (2012) Confirmatory factor analysis. In R. H. Hoyle (Ed.), Handbook of structural equation modeling. New York: Guilford Press. Pp. 361-379.
Browne, M. W., MacCallum, R. C., Kim, C-T., Andersen, B. L., & Glaser, R. (2002) When fit indices and residuals are incompatible. Psychological Methods, 7, 403-421.
Chen, F., Curran, P. J., Bollen, K. A., Kirby, J., & Paxton, P. (2008) An empirical evaluation of the use of fixed cutoff points in RMSEA test statistic in structural equation models. Sociological Methods & Research, 36(4), 462-494.
Cudeck, R., & Henly, S. J. (1991) Model selection in covariance structure analysis and the “problem” of sample size: a clarification. Psychological Bulletin, 109, 512-519.
Fan, X., & Sivo, S. A. (2005) Sensitivity of fit indices to misspecified structural or measurement model components: rationale of two-index strategy revisited. Structural Equation Modeling, 12(3), 343-367.
Fan, X., & Sivo, S. A. (2007) Sensitivity of fit indices to model misspecification and model types. Multivariate Behavioral Research, 42(3), 509-529.
Furnham, A., Guenole, N., Levine, S. Z., & Chamorro-Premuzic, T. (2013) The NEO Personality Inventory–Revised: factor structure and gender invariance from exploratory structural equation modeling analyses in a high-stakes setting. Assessment, 20(1), 14-23.
Guay, F., Morin, A. J. S., Litalien, D., Valois, P., & Vallerand, R. J. (2014) Application of exploratory structural equation modeling to evaluate the Academic Motivation Scale. The Journal of Experimental Education, 83(1), 51-82. DOI: 10.1080/00220973.2013.876231
Hayduk, L. A., Cummings, G. C., Boadu, K., Pazderka-Robinson, H., & Boulianne, S. (2007) Testing! Testing! One, two, three: testing the theory in structural equation models! Personality and Individual Differences, 42(5), 841-850.
Hayduk, L. A., Pazderka-Robinson, H., Cummings, G. C., Levers, M.-J. D., & Beres, M. A. (2005) Structural equation model testing and the quality of natural killer cell activity measurements. BMC Medical Research Methodology, 5, 1.
Hu, L. T., & Bentler, P. M. (1998) Fit indices in covariance structure modeling: sensitivity to under-parameterized model misspecification. Psychological Methods, 3, 424-453.
Hu, L. T., & Bentler, P. M. (1999) Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.
Jöreskog, K. G., & Sörbom, D. (1974) LISREL III. Chicago, IL: Scientific Software International. [Computer software]
Jöreskog, K. G., & Sörbom, D. (1988) LISREL 7: guide to the program and applications. (2nd ed.) Chicago, IL: SPSS.
Kenny, D. A. (2012) Measuring model fit. Retrieved from http://www.davidakenny.net/cm/fit.htm.
Kenny, D. A., & McCoach, D. (2003) Effect of the number of variables on measures of fit in structural equation modeling. Structural Equation Modeling, 10, 333-351.
MacCallum, R. C. (2003) Working with imperfect models. Multivariate Behavioral Research, 38, 113-139.
MacCallum, R. C., Roznowski, M., & Necowitz, L. B. (1992) Model modifications in covariance structure analysis: the problem of capitalization on chance. Psychological Bulletin, 111(3), 490-504.
Marsh, H. W., Hau, K-T., & Grayson, D. (2005) Goodness of fit in structural equation models. In A. Maydeu-Olivares & J. J. McArdle (Eds.), Contemporary psychometrics: a festschrift to Roderick P. McDonald. Mahwah, NJ: Erlbaum. Pp. 275-340 (Chap. 10).
Marsh, H. W., Hau, K-T., & Wen, Z. (2004) In search of golden rules: comment on hypothesis-testing approaches to setting cutoff values for fit indices and dangers in overgeneralizing Hu & Bentler's (1999) findings. Structural Equation Modeling, 11, 320-341.
Marsh, H., Lüdtke, O., Muthén, B., Asparouhov, T., Morin, A. J. S., Trautwein, U., & Nagengast, B. (2010) A new look at the Big Five factor structure through exploratory structural equation modeling. Psychological Assessment, 22(3), 471-491.
Marsh, H. W., Lüdtke, O., Nagengast, B., Morin, A. J. S., & Von Davier, M. (2013) Why item parcels are (almost) never appropriate: two wrongs do not make a right—camouflaging misspecification with item parcels in CFA models. Psychological Methods, 18(3), 257-284.
McDonald, R. P., & Marsh, H. W. (1990) Choosing a multivariate model: noncentrality and goodness-of-fit. Psychological Bulletin, 107, 247-255.
McIntosh, C. N. (2007) Rethinking fit assessment in structural equation modelling: a commentary and elaboration on Barrett (2007). Personality and Individual Differences, 42(5), 859-867.
Meade, A. W. (2008) Power of AFI's to detect CFA model misfit. Paper presented at the 23rd Annual Conference of the Society for Industrial and Organizational Psychology, San Francisco, CA, April 10-12. Retrieved from http://www4.ncsu.edu/~awmeade/Links/Papers/AFI(SIOP08).pdf.
Miles, J., & Shevlin, M. (2007) A time and a place for incremental fit indices. Personality and Individual Differences, 42(5), 869-874.
Mulaik, S. (2010) Another look at a cancer research model: theory and indeterminacy in the BMC model by Hayduk, et al. Paper presented at the Annual Meeting of the Society for Multivariate Experimental Psychology, Atlanta, GA, October 2010.
Muthén, L. K., & Muthén, B. O. (1998-2010) Mplus user's guide. (6th ed.) Los Angeles, CA: Muthén & Muthén.

Rasch, G. (1980) Probabilistic models for some intelligence and attainment tests. Chicago, IL: Univer. of Chicago Press.
Rigdon, E. E. (1996) CFI versus RMSEA: a comparison of two fit indexes for structural equation modeling. Structural Equation Modeling, 3(4), 369-379.
Rigdon, E. E. (1998) The equal correlation baseline model for comparative fit assessment in structural equation modeling. Structural Equation Modeling, 5(1), 63-77.
Saris, W. E., Satorra, A., & van der Veld, W. (2009) Testing structural equation models or detection of misspecifications? Structural Equation Modeling, 16, 561-582.
Steiger, J. H. (1990) Structural model evaluation and modification: an interval estimation approach. Multivariate Behavioral Research, 25, 173-180.
Steiger, J. H. (2000) Point estimation, hypothesis testing, and interval estimation using the RMSEA: some comments and a reply to Hayduk and Glaser. Structural Equation Modeling, 7, 149-162.
Stouthard, M. E. A. (2006) Analyse van tests. In W. P. van den Brink & G. J. Mellenbergh (Eds.), Testleer en testconstructie. Amsterdam, The Netherlands: Boom. Pp. 341-376.
Stuive, I. (2007) A comparison of confirmatory factor analysis methods: oblique multiple group method versus confirmatory common factor method. Unpublished dissertation, Univer. of Groningen, Groningen, The Netherlands. Retrieved from http://irs.ub.rug.nl/ppn/305281992.
Stuive, I., Kiers, H. A. L., & Timmermans, M. (2009) Comparison of methods for adjusting incorrect assignments of items to subtests: oblique multiple group method versus confirmatory common factor method. Educational and Psychological Measurement, 69(6), 948-965.
Stuive, I., Kiers, H. A. L., Timmermans, M., & ten Berge, J. M. F. (2008) The empirical verification of an assignment of items to subtests: the Oblique Multiple Group method versus the confirmatory common factor method. Educational and Psychological Measurement, 68(6), 923-939.
Tomarken, A. J., & Waller, N. G. (2003) Potential problems with “well fitting” models. Journal of Abnormal Psychology, 112(4), 578-598.
Tucker, L. R., & Lewis, C. (1973) The reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 1-10.

APPENDIX
Number of Factors vs Variables (Number of Within-clusters vs Number of Between-clusters Correlations), Visualized

Correlation Matrices of 2, 3, 4, and 6 Clusters Within 12 Variables (Diagonal Consists of Communalities)

2 Clusters of 6 Variables
(i) 1 2 3 4 5 6 7 8 9 10 11 12
1 c
2 0.5 c
3 0.5 0.5 c
4 0.5 0.5 0.5 c
5 0.5 0.5 0.5 0.5 c
6 0.5 0.5 0.5 0.5 0.5 c
7 0 0 0 0 0 0 c
8 0 0 0 0 0 0 0.5 c
9 0 0 0 0 0 0 0.5 0.5 c
10 0 0 0 0 0 0 0.5 0.5 0.5 c
11 0 0 0 0 0 0 0.5 0.5 0.5 0.5 c
12 0 0 0 0 0 0 0.5 0.5 0.5 0.5 0.5 c
Note. 2 clusters of 6 variables: 1 to 6, 7 to 12. 2 × 21 = 42 within-factors correlations (including communalities); 1 × 6² = 36 between-factors correlations. 42/36 = 1.17.

3 Clusters of 4 Variables
(ii) 1 2 3 4 5 6 7 8 9 10 11 12
1 c
2 0.5 c
3 0.5 0.5 c
4 0.5 0.5 0.5 c
5 0 0 0 0 c
6 0 0 0 0 0.5 c
7 0 0 0 0 0.5 0.5 c
8 0 0 0 0 0.5 0.5 0.5 c
9 0 0 0 0 0 0 0 0 c
10 0 0 0 0 0 0 0 0 0.5 c
11 0 0 0 0 0 0 0 0 0.5 0.5 c
12 0 0 0 0 0 0 0 0 0.5 0.5 0.5 c
Note. 3 clusters of 4 variables: 1 to 4, 5 to 8, 9 to 12. 3 × 10 = 30 within-factors correlations (including communalities). 3 × 4² = 48 between-factors correlations. 30/48 = 0.63.

4 Clusters of 3 Variables
(iii) 1 2 3 4 5 6 7 8 9 10 11 12
1 c
2 0.5 c
3 0.5 0.5 c
4 0 0 0 c
5 0 0 0 0.5 c
6 0 0 0 0.5 0.5 c
7 0 0 0 0 0 0 c
8 0 0 0 0 0 0 0.5 c
9 0 0 0 0 0 0 0.5 0.5 c
10 0 0 0 0 0 0 0 0 0 c
11 0 0 0 0 0 0 0 0 0 0.5 c
12 0 0 0 0 0 0 0 0 0 0.5 0.5 c
Note. 4 clusters of 3 variables: 1-2-3, 4-5-6, 7-8-9, 10-11-12. 4 × 6 = 24 within-factors correlations (including communalities). 6 × 3² = 54 between-factors correlations. 24/54 = 0.444.

6 Clusters of 2 Variables
(iv) 1 2 3 4 5 6 7 8 9 10 11 12
1 c
2 0.5 c
3 0 0 c
4 0 0 0.5 c
5 0 0 0 0 c
6 0 0 0 0 0.5 c
7 0 0 0 0 0 0 c
8 0 0 0 0 0 0 0.5 c
9 0 0 0 0 0 0 0 0 c
10 0 0 0 0 0 0 0 0 0.5 c
11 0 0 0 0 0 0 0 0 0 0 c
12 0 0 0 0 0 0 0 0 0 0 0.5 c
Note. 6 clusters of 2 variables: 1-2, 3-4, 5-6, 7-8, 9-10, 11-12. 6 × 3 = 18 within-factors correlations (including communalities). 15 × 2² = 60 between-factors correlations. 18/60 = 0.30.
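The four matrices, and the counts in the notes, can also be reproduced programmatically. The sketch below is mine; 0.5 is used as a placeholder for both the within-cluster correlations and the unspecified communalities c, since only the count of non-zero entries matters here.

```python
import numpy as np

def block_corr(v, f, r_within=0.5, communality=0.5):
    """f equal clusters of v/f variables: r_within inside a cluster, 0 between clusters,
    and a placeholder communality on the diagonal (any non-zero value works for counting)."""
    k = v // f
    m = np.zeros((v, v))
    for i in range(f):
        m[i * k:(i + 1) * k, i * k:(i + 1) * k] = r_within
    np.fill_diagonal(m, communality)
    return m

total = 12 * 13 // 2                                  # entries on or above the diagonal
for f in (2, 3, 4, 6):
    within = int(np.count_nonzero(np.triu(block_corr(12, f))))
    print(f"{f} clusters of {12 // f}: {within} within-cluster vs {total - within} between-cluster entries")
```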
