Testing Testing One Two Three by Les Hayduk
Received 11 January 2006; received in revised form 1 September 2006; accepted 27 September 2006
Available online 21 December 2006
Abstract
Barrett (2007) presents minor revisions to statements previously posted on Barrett’s website, and discussed on SEMNET (a web discussion group about structural equation modeling). Unfortunately, Barrett’s ‘‘recommendations’’ remain seriously statistically and methodologically flawed. Unlike Barrett, we see scientific value in reporting models that challenge or discredit theories. We critique Barrett’s way of proceeding in the context of both especially small and large samples, and we urge greater attention to the χ² significance test.
© 2006 Elsevier Ltd. All rights reserved.
Keywords: SEM; Structural equation model; Chi-square (or χ²); Fit; Testing; Sample size
1. Introduction
In ‘‘Structural equation modeling: adjudging model fit’’ Barrett (2007) provides minor revisions
to advice previously presented on Barrett’s web site, and critiqued by one of us (Hayduk). Despite
some revisions, Barrett’s recommendations remain seriously problematic. We begin with a brief
context for Barrett’s article, before considering specific problematic statements. We argue that journal reviewers should pay greater attention to χ² significance tests, and to the theory being tested.
Structural equation modeling (SEM) grew out of the confluence of a path analytic tradition and
a factor analytic tradition. The path analytic tradition had a history of seeking to test theories
(Duncan, 1975). The exploratory factor analytic tradition had almost no theory (it is overly charitable to describe as theory the assertion that ‘‘an initially unspecified number of unknown latent variables having unspecified interconnections’’ underlie some set of items), and non-testing
was the factor analytic norm. The differences between investigating theory-laden models and
reducing data factorially became widely apparent when Les Hayduk (a supporter of the theoret-
ical-model tradition) joined SEMNET, an archived web discussion group, in 1997 and confronted
supporters of the factor analytic tradition. The extensive differences were summarised in a target
article, several commentaries, and a rejoinder published in Structural Equation Modeling (Hayduk
& Glaser, 2000a, 2000b). The Hayduk and Glaser target article demanded careful and attentive
model testing (2000a: p. 20–31), and none of the published commentaries challenged the need
for careful testing. However, testing and the deficiencies of disregarding significant ill fit became
points of discussion and debate on SEMNET and in the published research literature (Cummings,
Hayduk, & Estabrooks, 2006).
Paul Barrett participated in the SEMNET discussions and put forward recommendations on
how journal reviewers should deal with structural equation model testing (Sept 26, 2005). Les
Hayduk (Oct 1, 2005) challenged Barrett’s reviewer-statement, which led to progressively revised
statements (Oct 26, and Dec 2, 2005). In March 2006, Barrett switched venues and provided his
target article to Personality and Individual Differences (Barrett, 2007). Unfortunately, Barrett’s lat-
est attempt remains seriously deficient with respect to model testing. Some of Barrett’s statements
may strike the uninitiated as ‘‘surprisingly strong’’, indicating some responsiveness to Hayduk’s
earlier critiques, but numerous important deficiencies and contradictions remain.
Serious problems begin right in Barrett’s title. Do we want to ‘‘adjudge’’ fit, or do we want to
test our theorizing? Barrett’s title says judge fit. Our view is that researchers ought to do their best
to test their theorizing. Our scientific bias is to carefully test our structural equation models/the-
ories. Barrett’s bias displaces or avoids model/theory testing by replacing testing with adjudica-
tion, and by focusing on fit rather than on the model providing the fit. A statistician who
focuses on fit is merely ‘‘sticking to what they are supposed to know about’’ (statistics and data)
and ‘‘avoiding what they are not required to know’’ (substantive theoretical matters). Researchers
ought to be extremely interested in the relevant substantive theory. Abandoning substantive the-
ory is abandoning the life-blood of science. As researchers, we gain access to currently unknown
features of the world by proofing, checking, and testing our current theoretical understandings.
We place our current understanding of ‘‘how the world works’’ in our models as best we can,
and use diagnostic evidence accompanying any model’s ‘‘failure to fit’’ to improve our under-
standing. Barrett’s focus on ‘‘judging fit’’ rather than ‘‘testing models’’ hampers research by cir-
cumnavigating theory.
Structural equation models represent specific theory-based causal connections between latent
variables and between those latents and relevant indicator variables. Estimates of the model’s
parameters are those values which, when placed in the model’s equations, imply an indicator var-
iance/covariance matrix that is as similar as possible to the data variance/covariance matrix. The
similarity, or dissimilarity, of these matrices is usually expressed as the likelihood of observing the
data covariance matrix had the model, with its causal estimates, constituted the population from
which the data were obtained. The model-implied covariance matrix would be the population covariance matrix if the model were the proper model.
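To make the estimation and testing machinery explicit, here is a minimal sketch in standard SEM notation (our summary of textbook results, not anything quoted from Barrett): S is the data covariance matrix, Σ(θ) the model-implied covariance matrix, p the number of observed variables, t the number of free parameters, and N the sample size.

```latex
% Maximum likelihood discrepancy between the data and model-implied covariances:
F_{ML}(\theta) \;=\; \ln\lvert\Sigma(\theta)\rvert
  \;+\; \operatorname{tr}\!\bigl(S\,\Sigma(\theta)^{-1}\bigr)
  \;-\; \ln\lvert S\rvert \;-\; p

% Minimizing F_{ML} yields the estimates \hat{\theta}; the model test statistic is
T \;=\; (N-1)\,F_{ML}(\hat{\theta}) \;\approx\; \chi^{2}_{df},
\qquad df \;=\; \tfrac{p(p+1)}{2} - t
\quad \text{(when the model is properly specified).}
```

Note that the parameter estimates and the χ² test statistic both come from the same discrepancy function F_ML; we return to this point when discussing non-normality below.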
Even if the model is properly causally specified and the estimates provide proper population
parameter values, random sampling fluctuations can keep the data matrix from corresponding ex-
actly to the model-implied (population) covariance matrix. That is, some degree of ill fit between
the model’s (theory’s) implications and the observed covariance matrix is expected to appear
merely by chance alone. But the differences between a model’s implications and the data might
not result from mere chance sampling fluctuations. Sometimes the model is simply causally
mis-specified, so the differences between the model-implied and data covariance matrices originate
in real model/theory deficiencies, not mere sampling fluctuations. Scientific tradition has set an
alpha probability of .05 as a limit on the degree of covariance divergence that can be tentatively
attributed to sampling fluctuations. A properly specified model should lead to non-significant dif-
ferences between the model-implied and data covariance matrices (p > .05) 19 times out of 20.
There are reasons to increase alpha in the context of SEM testing, but for our current purposes
the consequences of equivalent and nearly-equivalent models are more important.
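The test decision just described amounts to a few lines of arithmetic. The Python sketch below shows the mechanics; the χ² and degrees-of-freedom values are hypothetical, chosen only for illustration.

```python
from scipy.stats import chi2

# Hypothetical model-test results, as reported by any SEM program:
T = 31.4   # model chi-square: (N - 1) times the minimized fit-function value
df = 19    # degrees of freedom: p(p+1)/2 minus the number of free parameters

# Probability of covariance ill fit at least this large arising from sampling
# fluctuations alone, if the model were properly specified.
p_value = chi2.sf(T, df)

print(f"chi-square = {T}, df = {df}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Significant ill fit: the data challenge the model/theory.")
else:
    print("No significant ill fit detected by this test.")
```

A p-value below .05 says the observed discrepancy is larger than sampling fluctuations comfortably explain, which is evidence against the model, not merely a comment on ‘‘fit’’.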
Properly specified models should lead to non-significant differences between the model-implied
and data covariance matrices, but two or more different causal models may imply covariance
matrices that match any one data matrix (i.e., there are equivalent models (Hayduk, 1996)). Prop-
er model/theoretical specifications should imply covariance matrices that are within sampling fluc-
tuations of the data, but some (and possibly many) causally mis-specified models can also imply
covariance matrices corresponding to the observed data. Finding a model that fits the covariance
data does not say the model is the correct model, but merely that the model is one of the several
potentially very causally different models that are consistent with the data. The possibility of mul-
tiple models that fit but are seriously causally mis-specified (be they covariance-equivalent or
nearly-covariance-equivalent models) makes it unreasonable to assume that a small degree of
covariance ill fit reports the existence of only minimal causal mis-specification. A saturated model
is guaranteed to fit, but is not guaranteed to be even close to properly causally specified. Hence,
even a slight sign of ill fit should be attended to, because it might be the first detectable sign of
serious model causal mis-specification.
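The equivalent-models point can be made concrete with a deliberately simple example of our own (not a model discussed by Barrett): a standardized causal chain x → y → z and the causally reversed chain z → y → x imply exactly the same covariance matrix, so covariance fit cannot distinguish between them.

```python
import numpy as np

def implied_covariance(B, Psi):
    """Reduced-form covariance of a recursive path model: the variables v
    satisfy v = B v + e with cov(e) = Psi, so cov(v) = (I-B)^-1 Psi (I-B)^-T."""
    I = np.eye(B.shape[0])
    A = np.linalg.inv(I - B)
    return A @ Psi @ A.T

a, b = 0.6, 0.5  # arbitrary illustrative standardized path coefficients

# Model 1: x -> y -> z   (variables ordered x, y, z; all standardized)
B1   = np.array([[0, 0, 0],
                 [a, 0, 0],
                 [0, b, 0]], dtype=float)
Psi1 = np.diag([1.0, 1 - a**2, 1 - b**2])

# Model 2: z -> y -> x   (the causal flow completely reversed)
B2   = np.array([[0, a, 0],
                 [0, 0, b],
                 [0, 0, 0]], dtype=float)
Psi2 = np.diag([1 - a**2, 1 - b**2, 1.0])

print(np.allclose(implied_covariance(B1, Psi1),
                  implied_covariance(B2, Psi2)))  # True: identical implied covariances
```

Both models fit any data generated by either of them equally well, so a fitting model is not thereby shown to be the correct causal model.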
With these comments in mind, we turn to addressing specific statements made by Barrett,
roughly in the order they appear in his article.
5. Barrett on testing
Barrett (2007, p. 816) claims the model χ² test is a ‘‘conventional null hypothesis significance test (NHST)’’. The χ² test has a null hypothesis and is a significance test, but it is rendered importantly unconventional by the possibility of mis-specified covariance-equivalent (and nearly covariance-equivalent) models. Both the proper causal model and seriously causally-misspecified
covariance-equivalent models can provide zero, or near zero, residual covariances. The existence
of covariance-equivalent models means that unlike tests applied to effects or correlations, the ‘‘en-
tity of interest’’ does not disappear or vanish when a ‘‘zero’’ hypothesized value is reached. Attain-
ing zero residual difference between the model-implied and data covariances does not mean the
entity of interest (causal mis-specification) has disappeared. This makes the consequences of
observing a hypothesis of zero effect (or zero correlation) and a hypothesis of zero-covariance-
residuals radically different. It renders some cogent critiques of ordinary null hypothesis signifi-
cance testing inapplicable to testing structural equation models, but a detailed discussion of this
would push us beyond the length limitation on this commentary.
Barrett says that ‘‘In general, the larger the sample size, the more likely a model will fail to fit
via using the chi-square goodness-of-fit test’’ (2007, p. 816). Barrett’s statement is true of only
some misspecified models, not models in general. For properly specified models, Barrett’s state-
ment is simply false because as N increases, the fit function that connects N to χ² decreases correspondingly (Bollen, 1990), and hence χ² does not increase, and does not lead to model rejection.
Barrett’s statement is also false for covariance-equivalent yet causally mis-specified models (where
again, as N increases, the fit function decreases correspondingly). We cannot prevent Barrett from
claiming that all his models are detectably wrong in general, but we can encourage everyone to
strive for models that are properly specified, not wrong. Observing that at least some wrong mod-
els are more assuredly detected by larger samples (because of decreased sampling variability) is
good methodological news to those seeking proper models!
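Bollen's (1990) point, that for properly specified models the fit function shrinks as N grows and χ² therefore does not drift toward rejection, can be checked with a small Monte Carlo sketch of our own (a deliberately trivial two-variable model, not any model of Barrett's):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

def ml_discrepancy(S, Sigma):
    """F_ML = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p."""
    p = S.shape[0]
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    _, logdet_S = np.linalg.slogdet(S)
    return logdet_Sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_S - p

# Properly specified toy model: two variables that really are uncorrelated in the
# population; the model fixes their covariance to zero and frees both variances (df = 1).
reps = 1000
for N in (100, 1000, 10000):
    rejections = 0
    for _ in range(reps):
        data = rng.standard_normal((N, 2))           # true population correlation is zero
        S = np.cov(data, rowvar=False)
        Sigma_hat = np.diag(np.diag(S))              # ML estimates under the (correct) model
        T = (N - 1) * ml_discrepancy(S, Sigma_hat)   # model chi-square with df = 1
        rejections += chi2.sf(T, df=1) < 0.05
    print(f"N = {N:5d}: rejection rate = {rejections / reps:.3f}  (stays near .05)")
```

Rerunning the sketch with a truly nonzero population correlation (a misspecified model) shows the opposite pattern: the rejection rate climbs toward 1.0 as N grows, which is exactly the increased power that makes larger samples good news for those seeking proper models.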
Barrett (2007, p. 817) connects covariance fit to the worth of a model. In ‘‘other areas of sci-
ence, model fit is adjudged according to how well a model predicts or explains that which it is de-
signed to predict or explain’’. This good-idea from ‘‘other areas of science’’ must be nuanced when
brought into the SEM context. Is a structural equation model designed to ‘‘predict or explain’’ a
theory? Is a model designed to encapsulate and test a theory? Structural equation models imply,
result in, and provide for variances and covariances as consequences of causal actions, but models
can fit with low or high R², and models can fail with low or high R². The R² type of predictive information is available in structural equation models, but this information does not statistically test the model. Barrett (2007, p. 818) seems to acknowledge this when he says ‘‘the [χ²] test is blind to whether the model actually predicts or explains anything to some substantive degree’’. But researchers ought to recognize that the model χ² test outcome is not foretold by the magnitude of the ‘‘proportion of explained variance’’. An incorrect causal model can lead to biasedly-large effect estimates and consequently to a biasedly-large R², where the incorrectness of the model nullifies and invalidates the high R². Explaining variance is secondary to having a proper model that
provides the variance explanation.
We do not object to multi-faceted model assessments – examining predictive accuracy in terms
of R², parsimony in terms of degrees of freedom, and theoretical substance in terms of the direct
and indirect effect routings permitted or contradicted by the estimates – but none of these replace
or displace model testing. If the model χ² test detects a causally mis-specified model, the biased
estimates and variance improperly-explained by those biased estimates, become impotent and
unconvincing. Detecting model causal mis-specification is not the only facet of model evaluation,
but it is the most fundamental facet because model mis-specification attacks the estimates that
underpin the other modes of model assessment.
Barrett does not see SEM as a tool for investigating theory, or differentiating potential mech-
anisms of action. He says ‘‘SEM is a modeling tool, and not a tool for ‘descriptive’ analysis. It fits
models to data.’’ (2007, p. 823). This misses the opportunity to see that theoretical intent can be
encapsulated in, and solid description provided by, structural equation models that investigate the
mechanisms by which a set of variables become interconnected. Barrett (2007, p. 823) says ‘‘These models require testing in order to determine the fit of a model to the data’’, but he does not
see the opportunity, potential, and demand for testing the underlying theory. He sees the fit as
being tested, not the substantive theory providing the fit!
Barrett (2007, p. 818) says that ‘‘When the model is theory-based, cross-validated predictive accu-
racy is even more compelling as the sole arbiter of model ‘acceptability’.’’ What he does not recog-
nize is that failing the model fit test can question the theory and render the predictions biased due to
biased estimates; and that cross-validation becomes double-crossing invalidation if the model is
repeatedly significantly inconsistent with the data, and no one takes notice of this. He ‘‘presumes’’
the theory is OK by repeatedly overlooking evidence potentially pointing to model specification
problems! By repeatedly overlooking or disregarding model fit test evidence, Barrett prevents the
data from speaking against the model’s theory, and somehow transforms repeated model failures
into cross-validation! This sounds like magic, but it is just methodological deficiency in disguise.
6. Barrett’s ‘‘recommendations’’
We agree that structural equation model testing via χ², degrees of freedom, and the associated
probability must be reported for all manuscripts reporting SEM results. We disagree that N < 200
or N > 10,000 provide reasonable boundaries for altering this requirement. The researcher simply
MUST report this information.
Barrett says that ‘‘if the model fits’’, the researcher might ‘‘proceed to report and discuss fea-
tures of the model’’ (2007, p. 820). Should researchers report, discuss, and publish failing models?
We argue that attentively constructed and theoretically meaningful models that fail ought to be
carefully discussed and published. Contrary to Barrett, we see discussion of well-conducted failing
models as contributing to scientific progress. Any area that is unable to openly acknowledge and
examine the deficiencies in its current theories is hampered from proceeding toward better theo-
ries. So we unequivocally disagree with Barrett’s recommendation to limit discussion and publi-
cation to only fitting models. If a model fails, the authors should not proceed to discuss the model
as if it were ‘‘OK anyway’’. They should publish a discussion of ‘‘how the world looks from this
theory/model perspective’’, and their diagnostic investigations of ‘‘how and why this theory/mod-
el perspective on the world fails’’. We need to understand what is problematic if we are to do bet-
ter next time around.
Next, consider the quotation Barrett selected from Burnham and Anderson (2002) regarding
large sample size. This quote is devious because it is not describing structural equation models,
but survival models. Burnham and Anderson were speaking in a context where ‘‘truth’’ could
be equated with not having to confront the complications of sampling fluctuations (in their sur-
vival data). Here is how Burnham and Anderson lead up to the statement quoted by Barrett:
‘‘The results in Table 5.7 provide a motivation for us to mention some philosophical issues about
model selection when truth is essentially known, or equivalently in statistical terms, when we have
a huge sample size.’’ (Burnham & Anderson, 2002, p. 219). In the context in which Burnham and
Anderson were writing, ‘‘truth’’ was merely a statement that the data are stable and not subject to
sampling fluctuations. Barrett is mistaken if he thinks the truth of structural equation models be-
comes known merely because the data covariance matrix has become stable due to a large sample.
It is simply wrong SEM statistics to think that having a stable data covariance matrix would in-
form us about the truth or falsity of the several DIFFERENT models that could be fit to that
stable data! Barrett’s Burnham and Anderson quote is so far out of context that it can not be rea-
sonably connected to structural equation modeling!
This segment is titled ‘‘Sample size’’ even though problematic discussions of N appear in Sections 1, 2, 4, and 5. In this section Barrett begins with the unreasonable claim that ‘‘SEM analyses based upon samples of less than 200 should simply be rejected outright. . .’’ unless the population is
small. This claim is based on two statistical mistakes. First, the population relevant to model test-
ing is not the list of cases from which the sample was drawn, but is the model-implied covariance
matrix (often called sigma, or Σ). In model testing, the issue is not generalizability, but sampling
fluctuations – namely whether one can reasonably, or whether one ought not, attribute the differ-
ences between the model-implied (putative population) and data covariance matrices to sampling
fluctuations. Barrett implicitly talks generalizability, when the v2 model test concerns whether the
discrepancies with the data are stable enough to challenge the model. Generalizability has an
important place in modeling, but that place is not the same statistical place as structural equation
model testing.
Barrett’s second reason for ‘‘rejecting outright’’ models based on N’s less than 200 is that tests
based on small N do not have sufficient power. Barrett’s reasoning fails because high model-test-
ing power can be attained in other ways, even if N is less than 200. For example, Browne, Mac-
Callum, Kim, Andersen, and Glaser (2002) had an N of only 72 with sufficient power to detect
severe model specification problems (Hayduk, Pazderka-Robinson, Cummings, Levers, & Beres,
2005). The statistical basis of this high power (small measurement errors or unique variances) is
well discussed in Browne et al. (2002), but Barrett fails to recognize that there are ‘‘other ways to
attain high power’’ even when N is small.
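The statistical reason power need not hinge on N alone can be stated compactly (a summary of standard asymptotic results, in our words rather than Barrett's): under misspecification the test statistic is approximately noncentral χ², and its noncentrality reflects the population discrepancy as well as the sample size.

```latex
% Approximate behaviour of the model test statistic under misspecification:
T \;\approx\; \chi^{2}_{df}(\lambda), \qquad
\lambda \;\approx\; (N-1)\, F_{ML}\!\bigl(\Sigma_{0},\, \Sigma(\theta^{*})\bigr)
% where \Sigma_0 is the population covariance matrix and \theta^* gives the best
% (pseudo-true) parameter values the misspecified model can attain.
```

Power increases with the noncentrality λ, so a modest N combined with a large population discrepancy F_ML (for example, the small unique variances in Browne et al., 2002) can still yield high power to detect misspecification.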
Barrett hesitates ‘‘to advise authors to test the power of their model’’ despite the fact that highly
significant failure of a small-N model actually is a prima facie demonstration of sufficient power.
Non-borderline failure to fit is direct evidence of sufficient power, and the multitudes of failing
small-N models constitute direct demonstrations of sufficient testing power with small N. Barrett
should have noticed that N’s considerably smaller than 200 often display sufficient power to
clearly reject and ‘‘advise revision of’’ models. For example, Hayduk (1985) discusses instances
where samples of 22, 38, and 40 in the context of experimental studies provided sufficient power to
consistently reject some models while ‘‘accepting’’ others. Barrett’s blanket rejection of structural
equation models with N’s less than 200 is seriously inconsistent with both the statistical literature
and practical experience.
Section 3 is Barrett’s most reasonable section, but there remains room for improvement. When
considering multivariate non-normality in Section 3.1, Barrett should have reported that a re-
searcher would be entirely unjustified in disregarding severe ill-fit on the basis of a trivial degree
of non-normality. It would be faulty statistics to point to some significant yet minimal degree of
non-normality as explaining or excusing severe ill-fit. Researchers pointing to non-normality as
‘‘the culprit producing ill fit’’ should be required to document that their observed degree of ill
fit is comparable in magnitude to what could reasonably result from the degree and style of
non-normality in their data. Only some kinds of non-normality lead to excessive model failures
– other kinds of non-normality can actually lead to an increased likelihood of the model fitting
(Yuan, Bentler, & Zhang, 2005). There are several investigations of the impact of the degree of
non-normality on χ² (see the references in Yuan et al., 2005), but the degree or extent of the im-
pact is much less than urban myth suggests. And researchers claiming non-normality as their ex-
cuse for ill-fit should also be required to demonstrate that the greatest ill fit actually connects to
the most non-normal variables. If the model ill-fit is driven by a huge covariance residual between
two nearly normal variables, pointing to non-normality among other variables in the model is
untenable as an excuse for the model’s ill-fit.
Barrett also fails to recognize that if the non-normality is sufficiently severe to render testing
questionable, it is also likely to be sufficiently severe to render the estimates themselves question-
able. The χ² test originates from the same fit function that provides the estimates, and hence it is
‘‘awkward’’ for a researcher to claim that the fit test is questionable while the estimates remain
sound. The dependence of both the χ² test and estimates on the same fit function urges caution
against discounting the model fit test while claiming trustworthy estimates.
We will not belabor the fact that Barrett’s Section 3 provides no instructions or recommenda-
tions to those who use the χ² test and find a fitting model.
This section returns to the statistical mistake of using increased sample size as a rationale for
ignoring the χ² test. Barrett should have been sensitive to this because it has been discussed repeat-
edly on SEMNET. He says the rationale ‘‘is always likely to revolve around the argument that as
a sample increases in size, so will increasingly trivial ‘magnitude’ discrepancies between the actual
and model implied covariance matrix assume statistical significance’’ (Barrett, 2007, p. 821). It is
reasonable to claim that increasing N increases the power to detect any given size of covariance
difference. Barrett’s mistake is to presume that the size of the covariance differences themselves
(the residuals) can be trusted to correspond to the size of the problem in the model’s specification.
Small residuals do not mean that the model mis-specifications are correspondingly small. SEM-
NET discussions between Roger Millsap and Les Hayduk (between Nov 2003 and Apr 2004)
clearly demonstrated that the size of the covariance residuals (the degree of ill-fit) and the degree
of model causal mis-specification may or may not correspond. That is, in some instances the de-
gree of causal mis-specification does correspond to the size of the covariance residuals, but in
other instances, it does not. And it is even possible for the degree of causal mis-specification in
a structural equation model to be entirely uncorrelated with the degree of covariance ill-fit dis-
played by that model.
The preceding statement is an extreme version of the well-known fact that adding a modifi-
cation-index-recommended coefficient can improve model fit even if the coefficient itself consti-
tutes a mis-specification, and even if the model the coefficient is being added to is seriously
causally-misspecified. Barrett assumes incorrectly that the size of model mis-specification can
be trusted to shrink as the size of covariance ill-fit shrinks. The statistical mistake is to think
that the seriousness of causal mis-specifications can be trusted to decline as residual covariances
decline. That is simply an unreliable statistical claim. For an example of where small covariance
residuals arose from substantial model mis-specifications, see Hayduk et al. (2005). Another way
to see this point is to consider that a seriously wrong model can be bludgeoned into fitting by
blindly following modification indices. This has received SEMNET discussion (June 1, 2006) in
the context of why ‘‘replication’’ is deficient as a way to verify the properness of structural equa-
tion models.
Barrett’s next major misstep appears in Section 5a, where a model is supposed to attain ‘‘empir-
ical adequacy’’ or to be ‘‘‘good enough’ for practical purposes’’ (Barrett, 2007, p. 822) despite
being significantly inconsistent with the data – presumably highly statistically significantly incon-
sistent with the data ‘‘given the huge sample size’’. The contradiction between ‘‘empirical ade-
quacy’’ and ‘‘conflicting with the empirical data’’ is obvious – adequacy despite the data!
Barrett’s resorting to predictive accuracy (be it R² or covariance fit) directly conflicts with
SEM as a means of investigating the mechanisms of causal action. SEM is typically not oriented
toward predicting some variance or covariance outcome – it seeks to represent and investigate the
theorized mechanisms via which various outcomes (plural) arise. Focusing on predictive accuracy
pretends that the goal of SEM is to predict something, as opposed to understanding the causal
mechanisms connecting some things.
Treating close as ‘‘‘good enough’ for practical purposes’’ (Barrett, 2007, p. 822) is statistical
risk-taking behavior, particularly when models have real-life consequences. Since even fitting
models can be seriously wrongly specified and hence potentially harmful if implemented in prac-
tice, the warning should be clear: the first indication of significant ill-fit might constitute the first
indication of huge problems. Overlooking indications of potentially huge problems is the kind of thing lawyers will gladly describe as malfeasance, dereliction of responsibility, or absence of
due diligence, if harm results. Since a small signal (minor covariance ill fit) can originate from ma-
jor model mis-specification, discounting the signal without paying careful diagnostic attention and issuing a warning to all concerned is needless risk-taking. If harm results from implementing
a wrong model in ‘‘practice’’ this may result in the SEM researcher becoming the defendant. Close
enough for practical implementation purposes is not a phrase to be employed lightly in the con-
text of SEM. Close but significant ill-fit in SEM-speak translates as ‘‘close to being sued’’ in legal-
speak. By ‘‘choosing to ignore its [χ²’s] result’’ (Barrett, 2007, p. 822), the researcher becomes
culpable.
Barrett suggests (2007, p. 822) that there is some way to examine the residual matrix and to be
able to claim that examination ‘‘of the residual matrix . . . might lead them to conclude that the
[χ²] test result is misleading’’. We notice that Barrett failed to indicate what a researcher can look
at in the covariance residuals as indicating the test is misleading. We know of no reference that
supports this, and we hear this as inviting blatant author-bias toward disregarding the χ² test.
And here is another invitation to bias. Barrett (2007, p. 822) says in the context of a researcher
‘‘choosing to ignore its [χ²’s] result’’ that the test is misleading because ‘‘it appears to have little
consequence in terms of the distribution, location, and/or size of residual discrepancy’’. Even
SEM-novices will appreciate that a single statistical value like a model χ² cannot report on var-
ious distributions or locations of residuals! How could a distribution of residuals, the placement
of multiple potential residuals, or patterns in the magnitude of residuals be reported by a single
numerical value? They cannot. Rather than biasedly pretending this constitutes a rationale for dis-
regarding χ², Barrett should have required that a significant model χ² demands diagnostic exam-
ination of the distribution, location, and size of the residual discrepancies, and possibly even
specific model features as the sources of the distributed ill-fit.
Barrett claims (2007, p. 823) to have ‘‘provided the logic of how to approach assessing model
acceptability if a sample size is huge, or where the assumptions of the chi-square test do not hold
in the sample.’’ He simply has not done this. He did not mention the normality-adjusted versions of
χ² (Satorra & Bentler, 1994) as a way to address concern for non-normality. What ‘‘logic’’ did he
provide beyond blatantly, and wrongly, asserting that you might claim the model is good anyway –
despite the evidence? What SEM logic is involved in turning away from SEM to seek external cri-
terion variables (‘‘An alternative strategy might be to include some criterion variables external to
the SEM analysis’’ (Barrett, 2007, p. 822)) if your question is about mechanisms of causal action?
This does not provide any SEM logic. It implicitly dismisses SEM and displaces what SEM testing
is capable of, by distractingly pointing the SEM researcher in non-SEM directions.
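For readers unfamiliar with the adjustment mentioned above, the Satorra-Bentler (1994) approach retains the χ² testing logic under non-normality by rescaling the ordinary ML statistic; schematically (our paraphrase of that correction, not anything proposed by Barrett):

```latex
% Satorra-Bentler scaled (mean-adjusted) test statistic:
T_{SB} \;=\; T_{ML} \,/\, \hat{c}
% where \hat{c} is a scaling correction estimated from the data's multivariate
% kurtosis, and T_{SB} is referred to the usual \chi^2_{df} reference distribution.
```

The point is that non-normality can be addressed while keeping, rather than abandoning, the model test.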
Barrett mentions parsimony at several points – often in ways that imply parsimony is an excuse
for permitting or disregarding significant model ill-fit. Parsimony is attained by adding model/the-
oretical constraints. These constraints are appropriately acknowledged in χ²’s degrees of freedom,
but if the constraints are problematic the model will (we hope) tend to fail. Once we have evidence
of failure to fit, we have evidence that the parsimony may have been ill-gotten. Model parsimony
does not provide an excuse for overlooking ill fit. The indicated degree of parsimony is confronted
and questioned by failure of the model to fit.
We will end on a point of near-agreement with Barrett. It seems obvious to us that when
researchers pay careful attention to model testing, there is less need for fit indices. Barrett says:
‘‘In fact, I would now recommend banning ALL such indices from ever appearing in any paper
as indicative of model ‘acceptability’ or ‘degree of misfit’. Model fit can no longer be claimed
via recourse to some published ‘threshold-level recommendation’. There are no recommendations
anymore which stand serious scrutiny.’’ (Barrett, 2007, p. 821, emphasis in original). The problem
is not merely one of locating an arbitrary cut-point for degree of close fit; the fundamental prob-
lem is that even tiny covariance residuals, and miniscule ill fit, can be the ONLY detectable sign of
severe structural equation model specification problems. Hence the emphasis turns from ‘‘index-
ing’’ to ‘‘testing,’’ and diagnostically investigating whatever significant ill fit the test locates.
7. Conclusion
We conclude that Barrett’s article is too statistically ill-founded to constitute reasonable advice
on structural equation model testing. Barrett seems to think he has adopted a ‘‘hard line’’. We
think he is still circumnavigating, and not addressing, the hard-point of structural equation model
testing. Barrett has provided sporadically strong assertions, but these do not nullify his culpability
for making other deficient statistical recommendations. The occasional strong assertion in favor
of testing ought not deflect the attentive reader from the remaining serious flaws in Barrett’s ap-
proach. Researchers unwilling to acknowledge the failings of their models are unlikely to do the
detailed diagnostic investigations, and novel thinking, capable of providing advancement via
structural equation modeling.
We recommend that all journal reviewers insist that authors of research articles involving struc-
tural equation models report χ², its degrees of freedom, and p-value, and that the authors also
report the implications of the diagnostics undertaken to investigate any significant model ill-fit.
Acknowledgement
The authors thank Dionne Pohler for participating in discussions of this paper.
References
Barrett, P. (2007). Structural equation modelling: adjudging model fit. Personality and Individual Differences, 42(5),
815–824. doi:10.1016/j.paid.2006.09.018.
Bollen, K. A. (1990). Overall fit in covariance structure models: two types of sample size effects. Psychological Bulletin,
107, 256–259.
Browne, M. W., MacCallum, R. C., Kim, C. T., Andersen, B. L., & Glaser, R. (2002). When fit indices and residuals are
incompatible. Psychological Methods, 7(4), 403–421.
Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic
approach (2nd ed.). New York: Springer.
Cummings, G. G., Hayduk, L., & Estabrooks, C. (2006). Is the nursing work index measuring up? Moving beyond
estimating reliability to testing validity. Nursing Research, 55, 82–93.
Duncan, O. D. (1975). Introduction to structural equation models. New York: Academic Press.
Hayduk, L. A. (1985). Personal space: the conceptual and measurement implications of structural equation models.
Canadian Journal of Behavioural Science, 17(2), 140–149.
Hayduk, L. A. (1996). LISREL issues, debates and strategies. Baltimore: Johns Hopkins University Press.
Hayduk, L. A., & Glaser, D. N. (2000a). Jiving the four-step, waltzing around factor analysis, and other serious fun.
Structural Equation Modeling, 7(1), 1–35.
Hayduk, L. A., & Glaser, D. N. (2000b). Doing the four-step, right-2–3, wrong-2–3: a brief reply to Mulaik and
Millsap; Bollen; Bentler; and Herting and Costner. Structural Equation Modeling, 7(1), 111–123.
Hayduk, L. A., Pazderka-Robinson, H., Cummings, G. G., Levers, M. J., & Beres, M. A. (2005). Structural equation
model testing and the quality of natural killer cell activity measurements. BMC Medical Research Methodology, 5(1),
1–9.
Satorra, A., & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis.
In A. Von Eye & C. C. Clogg (Eds.), Latent variables analysis: Applications for developmental research (pp. 399–419).
Newbury Park, CA: Sage.
Yuan, K.-H., Bentler, P. M., & Zhang, W. (2005). The effect of skewness and kurtosis on mean and covariance structure
analysis: the univariate case and its multivariate implication. Sociological Methods and Research, 34(2), 240–258.