Structural Equation Modeling: Back to Basics
Ralph O. Mueller
Structural Equation Modeling: A Multidisciplinary Journal, 4(4), 353-369. DOI: 10.1080/10705519709540081
TEACHER'S CORNER
During the past 25 years, structural equation modeling (SEM) has become a powerful, mainly nonexploratory research tool for many social and behavioral scientists. The initial development during the 1970s and continuous improvement […]

Requests for reprints should be sent to Ralph O. Mueller, Department of Educational Leadership, Graduate School of Education and Human Development, The George Washington University, 2134 G Street, NW, Washington, DC 20052. E-mail: [email protected]
"The study of structural equation models can be divided into two parts: the easy part
and the hard part" (Duncan, 1975, p. 149). "The easy part is mathematical. The hard
part is constructing causal models that are consistent with sound theory. In short,
causal models are no better than the ideas that go into them" (Wolfle, 1985, p. 385).
Here, Duncan and Wolfle pointed to the fundamental truth in SEM: No matter how
technically sophisticated the employed statistical techniques, SEM analyses can
only be beneficial to the researcher if a strong substantive theory underlies the
initially hypothesized model(s). Based on correlational data, the statistical methods
cannot, for example, establish or prove causal relations between variables. At most,
they can help in identifying some empirical evidence to either reject or retain
hypothesized causal theories and/or assess the strengths and directions of certain a
priori hypothesized causal or structural relations within the context of a specific
model. More specifically, SEM users should realize and remember that:
• Models that were not carefully constructed from knowledge of the underlying
substantive theories will only lead to empty interpretations, adding very little to our
understanding of these theories. For example, I (and maybe you) have come across
studies that employed both exploratory and confirmatory factor analyses using the
exact same data. The authors justified "their" choice of the confirmatory model
with the results from the exploratory analysis and were excited about the excellent
model fit. (Obviously, here a good fit between data and model is not surprising or
even noteworthy because the data—not a theory—were used to generate the model
from the exploratory analysis.)
• Results from a SEM analysis can be interpreted validly only within the context
of the analyzed model. That is, the analysis of the causal/structural relations in one
model says nothing of the character of these relations in a structurally different,
modified, or competing model.
• There are many (actually, infinitely many) alternative structures that can yield
identical data-model fit results. Thus, the hunt for the model is, indeed, a fruitless
one. Instead, a carefully constructed model—preferably a set of equally theoreti-
cally plausible alternative models—based on the researcher's in-depth under-
standing of the substantive area and the constructs being modeled is needed before
SEM can assist in understanding the phenomenon being investigated.
1. Parts of this section are based on comments made elsewhere (Mueller, 1996, pp. xii-xiii).
[…] during a regression analysis? In my view, too few attempts are being made to clarify for the practicing structural equation modeler the possible role(s) causality can and should play in conceptualizing and interpreting SEM analyses (notable exceptions are Bullock, Harlow, & Mulaik, 1994; Mulaik, 1987, 1993; Mulaik & James, 1995; and the various authors in Shaffer, 1992). Especially if we believe that causal explanation is the "ultimate" aim of science (Shaffer, 1992, p. x; Mulaik, 1993), "the [SEM] framework should help, rather than hinder, clear thinking about causal mechanisms that we think lie behind the correlations in the data" (Rothenberg, 1992, p. 99). Thus, it seems imperative that we come to an awareness of the ramifications of the various positions that have been taken on the role of causality in SEM. Such fundamental considerations as the few listed below should serve as a reminder that the various approaches to causality can greatly impact the conceptualization and interpretation of structural equation models (for more detail, you might start with Shaffer, 1992; and the various cited contributions by Mulaik).
A serious and sometimes ignored issue for correct parameter estimation in structural equation models is model identification. The question here is whether or not sufficient variance and covariance information from the observed variables is available to uniquely estimate the unknown coefficients. Note that this issue is not one of sample size but mostly one of the ratio of the number of variables in the model to the number of unknown parameters. Also, even if the model is theoretically identified, anomalies in the observed data can still lead to empirical underidentification. […]
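To make this counting argument concrete, the following minimal sketch implements the so-called t-rule, a necessary but not sufficient condition for identification; the one-factor example and its parameter count are illustrative assumptions, not part of the original discussion.

```python
# Minimal sketch of the t-rule, a necessary (but not sufficient)
# condition for identification: the number of free parameters q must
# not exceed the number of nonredundant variances and covariances
# among the p observed variables, p(p + 1)/2.

def t_rule(p: int, q: int) -> bool:
    moments = p * (p + 1) // 2  # nonredundant elements of the covariance matrix
    print(f"p = {p} variables: {moments} moments; q = {q} parameters; df = {moments - q}")
    return q <= moments

# Hypothetical example: a one-factor model with p = 4 indicators and the
# factor variance fixed to 1 estimates 4 loadings and 4 unique variances.
t_rule(p=4, q=8)  # 10 moments >= 8 parameters: necessary condition met (df = 2)
```

Passing this check establishes only that enough covariance information could be available; as noted previously, particular data patterns can still produce empirical underidentification.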
Once the researcher has conceptualized a model and checked its theoretical identi-
fication (and hopefully no data anomalies lead to empirical underidentification),
SEM software seems to make the estimation step in the modeling process the easiest
and least worrisome of all: Even though a variety of estimation procedures are
available—their appropriateness depending on the viability of distributional and
structural assumptions about the data under investigation—most researchers em-
ploy the maximum likelihood (ML) method, probably because it is the default in
most, if not all, currently available SEM software packages. Alternatives such as
the generalized least squares (GLS) or asymptotically distribution free (ADF)
methods are rarely considered by the nonexpert because much confusion and as-yet insufficient evidence exist regarding the advantages of one such method
over the others. For example, not many new SEM users might be aware of (a) the
asymptotic equivalence and dependence on the multivariate normality assumption
of ML and GLS, (b) the large sample requirement and inconclusive evidence of the
benefits of using the ADF method, or (c) the multitude of still unanswered questions
regarding the behavior of the various estimation methods when analyzing data from
nominal or ordinal variables.3 In my opinion, an appropriate method should be
chosen deliberately, not by default. At a minimum, users should remember that:
• If the structural and distributional assumptions are met (but are they ever?), ML provides asymptotically (large sample) unbiased,4 consistent,5 and efficient6 parameter estimates and standard errors. Furthermore, the ML-based large sample chi-square statistic7 is appropriate for testing an overidentifying restriction and assessing data-model consistency (a computational sketch of this statistic follows this list). As the degree of violation of the normality assumption increases, however, confidence in the validity of obtained results decreases (e.g., ML estimates become less efficient and the chi-square statistic more inflated, leading to an increase in the Type I error rate for model rejection).
• If the structural assumption is met and sample size is sufficiently large, ADF estimates and standard errors are asymptotically consistent, efficient, and largely independent of the observed data distributions. For the purpose of data-model fit assessment, an appropriate large sample chi-square statistic again is available. For small to moderate samples, however, conclusive evidence of the behavior of ADF estimates is still unavailable (for current reviews, see Bentler & Dudgeon, 1996; and Curran, West, & Finch, 1996).
2. Usually, this assumption is evaluated with the "badness-of-fit" chi-square test (df = p - c); see, for example, Bollen (1989, pp. 263-269) or Mueller (1996, pp. 82-84).
3. For an introduction to parameter estimation with nonnormal data, see West, Finch, and Curran (1995).
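As referenced in the first bullet, the following is a minimal sketch of the ML discrepancy function and the associated badness-of-fit chi-square statistic of Footnote 7. The simulated data, the sample size, and the fitted independence model (whose ML-implied covariance matrix is simply the diagonal of the sample covariance matrix) are illustrative assumptions only:

```python
import numpy as np
from scipy.stats import chi2

def f_ml(S: np.ndarray, Sigma: np.ndarray) -> float:
    """ML discrepancy: F_ML = ln|Sigma| + tr(S Sigma^(-1)) - ln|S| - p."""
    p = S.shape[0]
    return (np.linalg.slogdet(Sigma)[1] + np.trace(S @ np.linalg.inv(Sigma))
            - np.linalg.slogdet(S)[1] - p)

def ml_chi_square(S: np.ndarray, Sigma_hat: np.ndarray, n: int, df: int):
    """Badness-of-fit statistic T = (n - 1) * F_ML (see Footnote 7) and its p value."""
    T = (n - 1) * f_ml(S, Sigma_hat)
    return T, chi2.sf(T, df)

# Illustrative data and a deliberately simple fitted model:
rng = np.random.default_rng(1)
true_cov = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]
X = rng.multivariate_normal(np.zeros(3), true_cov, size=400)
S = np.cov(X, rowvar=False)
Sigma_hat = np.diag(np.diag(S))  # independence model: p = 3, df = p(p - 1)/2 = 3
T, p_value = ml_chi_square(S, Sigma_hat, n=400, df=3)
print(f"T = {T:.1f}, p = {p_value:.4f}")  # large T: the independence model is rejected
```

Because the appropriateness of ML and GLS hinges on multivariate normality, a rough empirical check of that assumption can also inform the choice of estimator. One common diagnostic is Mardia's multivariate kurtosis; the sketch below, including its simple z approximation and simulated data, is again illustrative rather than prescriptive:

```python
import numpy as np

def mardia_kurtosis(X: np.ndarray) -> tuple[float, float]:
    """Mardia's multivariate kurtosis b_2p and an approximate z statistic.

    Under multivariate normality, E(b_2p) = p(p + 2), and
    z = (b_2p - p(p + 2)) / sqrt(8p(p + 2)/n) is roughly standard normal.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)  # ML covariance (divisor n)
    d2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(S), Xc)  # squared Mahalanobis distances
    b_2p = float(np.mean(d2 ** 2))
    z = (b_2p - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)
    return b_2p, z

# Illustrative use with simulated multivariate normal data:
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), np.eye(3), size=500)
b_2p, z = mardia_kurtosis(X)
print(f"b_2p = {b_2p:.2f} (15 expected under normality), z = {z:.2f}")
```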
Once parameter estimation is complete, SEM users often brace themselves with great anxiety when examining the various available fit indices.8 Sometimes, this inspection translates—at least initially—into either the biggest thrill ("Yes! My model fits! The hypothesized theory is confirmed!") or disappointment ("Oh no! The model does not fit! I better change it until it does!"). These reactions are fueled, respectively, by the still somewhat popular beliefs that (a) finding a well-fitting model is equivalent to discovering and/or confirming the underlying theory that is consistent with reality and (b) modifying and reestimating an initially ill-fitting model eventually will lead to the right structure. Following, I try to dispel both myths regarding fit assessment and model modification during SEM analyses.
4. An estimate is asymptotically unbiased if its expected value (mean value after repeated large-n sampling) is equal to the parameter it is estimating.
5. An estimate is asymptotically consistent if it converges in probability, as n increases, to the parameter it is estimating.
6. An estimate is asymptotically efficient if it has a smaller asymptotic variance than other consistent estimators.
7. Under the distributional and structural assumptions, the product of (n - 1) and the minimum value of the ML fitting function is asymptotically distributed as chi-square with df = p - c.
8. For example, choices include an appropriate chi-square statistic; Jöreskog and Sörbom's (1981) unadjusted or adjusted goodness-of-fit indices; incremental fit indices such as Bentler and Bonett's (1980) nonnormed and normed fit indices or Bentler's (1990) nonnormed and normed comparative fit indices; the parsimony fit indices suggested by James, Mulaik, and Brett (1982) and Mulaik et al. (1989); the Akaike (1987) information criterion (AIC); or Steiger's (1990) root mean square error of approximation, to name just a few. For comprehensive reviews consult Marsh, Balla, and McDonald (1988) or Tanaka (1993).
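To make two of the indices just listed concrete, the following sketch computes Steiger's (1990) root mean square error of approximation and Bentler's (1990) comparative fit index from target-model and baseline-model chi-square statistics. The formulas follow common conventions (e.g., the n - 1 scaling, which varies across programs), and the numerical inputs are purely illustrative:

```python
import math

def rmsea(chi2_m: float, df_m: int, n: int) -> float:
    """Root mean square error of approximation (Steiger, 1990)."""
    return math.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))

def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index (Bentler, 1990), relative to a baseline (null) model."""
    d_m = max(chi2_m - df_m, 0.0)   # target-model noncentrality estimate
    d_b = max(chi2_b - df_b, d_m)   # baseline noncentrality, at least d_m
    return 1.0 if d_b == 0.0 else 1.0 - d_m / d_b

# Purely illustrative values:
print(rmsea(chi2_m=28.4, df_m=14, n=400))                 # about .05
print(cfi(chi2_m=28.4, df_m=14, chi2_b=512.0, df_b=21))   # about .97
```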
The interpretive weight that is placed on the many alternative fit indices of recent
years (see footnote 8) seems to depend to a large extent on the purpose of the SEM
analysis: Assuming that a model was conceptualized and hypothesized to capture
as accurately as possible some slice of reality—by carefully balancing the principle
of parsimony with the complexity of the social science phenomenon being stud-
ied—the analysis' purpose can be clarified by reflecting on the relative importance
of two key questions:
• How and to what degree do certain variables or factors affect each other?
• Why should certain theories be retained as plausible reflections of reality?
The researcher's focus on the first query suggests a predictive purpose, whereas an emphasis on the second question points toward an explanatory aim. Now, if the purpose of a particular analysis is mainly prediction, the interpretation of overall fit indices might be secondary to the interpretation of the estimated strengths and directions of the structural paths. Here, assessment of the "fit" or match between data and model may be focused more on questions dealing with individual parameter estimates: Do coefficients and their standard errors have theory-contradicting signs or magnitudes? Are any of the variance estimates negative? Are coefficients of determination (R²) negative, close to zero, or greater than unity? And so forth.
If, on the other hand, the investigator seeks information on the tenability of a
theory, that is, the primary purpose of the analysis is explanation, then the evaluation
of overall fit indices might become the primary—but clearly not the only—concern
(note the parallel between the previous arguments and the relative emphasis on the
interpretation of regression weights and coefficients of determination in traditional
regression analyses). Here, the implicit question often is whether or not the
hypothesized model or theory is consistent with reality ("Is this the model?"). An
unfortunate truth, however, is that empirical fit indices cannot confirm the model-
reality link, only address the data-model consistency question. Figure 1 (depicting Set B, the models consistent with the data, and Set C, the models consistent with reality) illustrates this major nonstatistical limitation9 of the available overall measures of fit. The relation between Sets A, B, and C shown in Figure 1 is not the only one possible; it is used here to clarify the following points:
• Evidence of data-model consistency does not necessarily imply that the chosen
model represents a valid, albeit simplified, reflection of reality.
• Data-model inconsistency reflects a mismatch between the proposed model
and reality (unless, of course, a Type 1 error was comitted).
• Fit assessment mainly is a disconfirmatory activity: A model or theory could be
disconfirmed after concluding that observed data do not fit the hypothesized structure.
But a model cannot be confirmed as being the best (or even, a good) approximation9 to reality after discovering that the model happens to fit the observed data, possibly by chance alone.
• All that can be expected from "good" fit values is some indication that the
model, as specified, could be a viable representation of the true relations between
the included variables.
• A semantic change from interpreting measures of "model fit" to interpreting
indices of "data fit" might be helpful to those researchers who wonder whether
"various models fit the data" or whether "the collected data fit a particular model."
Although the former query can lead the investigator toward an exploratory search
for any model that (by chance?) fits a particular data set, the latter question might
direct the researcher more toward a disconfirmatory assessment of whether there
is evidence to conclude that the collected data do not fit the a priori specified
model(s). That is, to some investigators, assessing the "data fit" might convey the
largely disconfirmatory idea of judging the consistency between collected data and
an a priori theoretically conceptualized model better than attempting to improve
the "model fit," which might be interpreted as a more exploratory approach to
identifying models that happen to fit a particular data set.
9. For a review of statistical issues concerning various fit indices, see, for example, Tanaka (1993).
If any data-model inconsistencies are identified, the researcher has several options
on how to proceed, depending on where on the deduction-induction continuum the
main goal of the just conducted analysis is placed: (a) model rejection, (b) model
comparison, or (c) model generation (adapted from Jöreskog, 1993). In the first
instance, the aim is a mainly deductive and disconfirmatory one: The model or
theory might be rejected based on poor data-model fit and deemed an invalid […]
10. For introductions to these and other modification statistics and strategies, consult any structural equation modeling text, for example, Bollen and Long (1993), Hoyle (1995), or Mueller (1996).
[…] also increases the power of the chi-square-based tests used to evaluate data-model fit and, hence, increases the chance of committing a Type I error, that is, flat-out rejecting a model even though it could serve as a "good" approximation to reality.
• Model Comparison: An interpretation of results from a selected model that
favorably compares with other competing structures cannot be made outside the
context of such alternative models. That is, favorable results are relative, not
absolute. Statistical model comparisons must, of course, be interpreted in light of
the usual concerns of Type I and Type II errors. Finally, note that models that are not nested cannot be directly compared by statistical means.
• Model Generation: Here, an attempted search for the structure that best fits a
particular data set leads to a mostly exploratory view of SEM that, in my view, has the
potential of doing more harm than good. Substantially changing a null hypothesis (i.e.,
the initially hypothesized model) based on the observed data (i.e., the values of various
fit or modification indices)—and then retesting the modified hypothesis with the same
data—is an unacceptable practice in most applications of traditional statistical thinking.
Of course, a lack of resources and other circumstances might provide sufficient
justification for responsible model modification and a reanalysis of the data. However,
a researcher should be aware of the possibility of capitalizing on chance when various
not carefully conceptualized models are being fitted to the same data. When a modified
model is presented as a possible—but certainly not the—representation of the true
structure, a Type II error might have been committed.
• Irrespective of which of the various post hoc model modification strategies is
chosen, cross-validating obtained results with a new and independent sample is
very desirable because it gives some protection against the capitalization on chance
and specification errors that are internal11 to the model. If the original sample is deemed large enough, it can be split into calibration and validation subsamples, and the model's predictive validity can be judged by interpreting Cudeck and Browne's (1983) cross-validation index (also see Browne & Cudeck, 1989, 1993, for extensions of their work and a discussion of benefits and shortcomings of the various approaches to cross-validation in SEM); a minimal computational sketch of this strategy follows this list.
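As mentioned in the last bullet, here is a minimal sketch of the split-sample logic behind Cudeck and Browne's (1983) cross-validation index: the ML discrepancy between the validation subsample's covariance matrix and the covariance matrix implied by the model as estimated from the calibration subsample. The simulated data and the independence model standing in for a substantively fitted model are illustrative assumptions:

```python
import numpy as np

def f_ml(S: np.ndarray, Sigma: np.ndarray) -> float:
    """ML discrepancy function (as in the earlier estimation sketch)."""
    p = S.shape[0]
    return (np.linalg.slogdet(Sigma)[1] + np.trace(S @ np.linalg.inv(Sigma))
            - np.linalg.slogdet(S)[1] - p)

rng = np.random.default_rng(2)
true_cov = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]
X = rng.multivariate_normal(np.zeros(3), true_cov, size=600)
rng.shuffle(X)                      # random calibration/validation split
X_cal, X_val = X[:300], X[300:]

S_cal = np.cov(X_cal, rowvar=False)
S_val = np.cov(X_val, rowvar=False)

# Stand-in for a model fitted to the calibration sample: the (deliberately
# too simple) independence model, whose ML-implied covariance matrix is
# just the diagonal of S_cal.
Sigma_cal = np.diag(np.diag(S_cal))

# Cross-validation index: discrepancy between the validation covariance
# matrix and the calibration-based model-implied covariance matrix.
print(f"CVI = {f_ml(S_val, Sigma_cal):.3f}")
```

The index is interpreted comparatively: among competing models fitted to the same calibration sample, the one with the smallest discrepancy in the validation sample cross-validates best.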
SUMMARY

[…]
11. Kaplan (1990) distinguished between external and internal model specification errors: The former result from having omitted important variables from inclusion in the model; the latter occur when important relations between variables within the model have been omitted.
12. For current statistics from psychology, see Tremblay and Gardner (1996).
13. For examples, just scan the many entries in the annotated bibliography by Austin and Calderón (1996).
ACKNOWLEDGMENT
A previous version of this article was delivered as an invited address to the Special
Interest Group on Structural Equation Modeling during the meeting of the Ameri-
can Educational Research Association, New York, April 1996.
REFERENCES
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107,
238-246.
Bentler, P. M. (1993). EQS: Structural equations program manual. Los Angeles: BMDP Statistical
Software.
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance
structures. Psychological Bulletin, 88, 588-606.
Bentler, P. M., & Dudgeon, P. (1996). Covariance structure analysis: Statistical practice, theory,
directions. Annual Review of Psychology, 47, 541-570.
Bentler, P. M., & Wu, E. J. C. (1995). EQS for Windows 5.0 [Computer software]. Encino, CA: Multivariate Software.
Blalock, H. M. (1964). Causal inferences in nonexperimental research. Chapel Hill: University of North
Carolina Press.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Bollen, K. A., & Long, J. S. (Eds.). (1993). Testing structural equation models. Newbury Park, CA:
Sage.
Browne, M. W. (1984). Asymptotically distribution-free methods for the analysis of covariance
structures. British Journal of Mathematical and Statistical Psychology, 37, 62-83.
Browne, M. W., & Cudeck, R. (1989). Single sample cross-validation indices for covariance structures.
Multivariate Behavioral Research, 24, 445-455.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S.
Long (Eds.), Testing structural equation models (pp. 136-162). Newbury Park, CA: Sage.
Bullock, H. E., Harlow, L. L., & Mulaik, S. A. (1994). Causation issues in structural equation modeling
research. Structural Equation Modeling, 1, 253-267.
Byrne, B. M. (1994). Structural equation modeling with EQS and EQS/Windows: Basic concepts,
applications, and programming. Thousand Oaks, CA: Sage.
Cliff, N. (1983). Some cautions concerning the application of causal modeling methods. Multivariate
Behavioral Research, 18, 115-126.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.
Cudeck, R., & Browne, M. W. (1983). Cross-validation of covariance structures. Multivariate Behav-
ioral Research, 18, 147-167.
Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1, 16-29.
Duncan, O. D. (1975). Introduction to structural equation models. New York: Academic.
Duncan, O. D., Haller, A. O., & Portes, A. (1968). Peer influence on aspiration: A reinterpretation.
American Journal of Sociology, 74, 119-134.
Fan, X. (1996). An SAS Program for assessing multivariate normality. Educational and Psychological
Measurement, 56, 668-674.
Freedman, D. A. (1987). As others see us: A case study in path analysis. Journal of Educational
Statistics, 12(2), 101-128.
Glymour, C., Scheines, R., Spirtes, P., & Kelly, K. (1987). Discovering causal structure: Artificial intelligence, philosophy of science, and statistical modeling. Orlando, FL: Academic.
Hayduk, L. A. (1987). Structural equation modeling with LISREL: Essentials and advances. Baltimore: Johns Hopkins University Press.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association,
81, 945-960.
Hoyle, R. H. (Ed.). (1995). Structural equation modeling: Concepts, issues, and applications. Thousand
Oaks, CA: Sage.
James, L. R., Mulaik, S. A., & Brett, J. (1982). Causal analysis: Models, assumptions, and data. Beverly
Hills: Sage.
Jöreskog, K. G. (1993). Testing structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing
structural equation models (pp. 294-316). Newbury Park, CA: Sage.
Jöreskog, K. G., & Sörbom, D. (1981). Analysis of linear structural relationships by maximum likelihood
and least squares methods (Research Report 81-8). Uppsala, Sweden: University of Uppsala.
Jöreskog, K. G., & Sörbom, D. (1995). LISREL 8 with PRELIS 2 for Windows [Computer software]. Chicago: Scientific Software International.
Kaplan, D. (1990). Evaluating and modifying covariance structure models: A review and recommen-
dation. Multivariate Behavioral Research, 25, 137-155.
Kenny, D. A. (1979). Correlation and causality. New York: Wiley.
Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746-759.
Ling, R. (1983). Review of "Correlation and Causality" by David Kenny. Journal of the American
Statistical Association, 77, 489-491.
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of
sample size for covariance structure modeling. Psychological Methods, 1, 130-149.
Marcoulides, G. A., & Schumacker, R. E. (Eds.). (1996). Advanced structural equation modeling: Issues
and techniques. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Marini, M. M., & Singer, B. (1988). Causality in the social sciences. In C. C. Clogg (Ed.), Sociological
methodology: Vol. 18 (pp. 347-409). Washington, DC: American Sociological Association.
Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness-of-fit indexes in confirmatory factor analysis: The effect of sample size. Psychological Bulletin, 103, 391-410.
Mueller, R. O. (1996). Basic principles of structural equation modeling: An introduction to LISREL
and EQS. New York: Springer-Verlag.
Mulaik, S. A. (1987). Toward a conception of causality applicable to experimentation and causal
modeling. Child Development, 58, 18-32.
Mulaik, S. A. (1993). Objectivity and multivariate statistics. Multivariate Behavioral Research, 28,
171-203.
Mulaik, S. A., & James, L. R. (1995). Objectivity and reasoning in science and structural equation
modeling. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications
(pp. 118-137). Thousand Oaks, CA: Sage.
Mulaik, S. A., James, L. R., Van Alstine, J., Bennett, N., Lind, S., & Stilwell, C. D. (1989). Evaluation
of goodness-of-fit indices for structural equation models. Psychological Bulletin, 105, 430-445.
Muthén, B. O. (1992). Response to Freedman's critique of path analysis: Improve credibility by better
methodological training. In J. P. Shaffer (Ed.), The role of models in nonexperimental social science:
Two debates (pp. 80-86). Washington, DC: American Educational Research Association.
Rothenberg, T. J. (1992). Comments on Freedman's paper. In J. P. Shaffer (Ed.), The role of models in
nonexperimental social science: Two debates (pp. 98-99). Washington, DC: American Educational
Research Association.
Schmidt, F. (1996). Statistical significance testing and cumulative knowledge in psychology: Implica-
tions for the training of researchers. Psychological Methods, 1, 115-129.
Schumacker, R. E., & Lomax, R. G. (1996). A beginner's guide to structural equation modeling.
Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Shaffer, J. P. (Ed.). (1992). The role of models in nonexperimental social science: Two debates.
Washington, DC: American Educational Research Association.
Steiger, J. H. (1990). Structural model evaluation and modification: An interval estimation approach.
Multivariate Behavioral Research, 25, 173-180.
Tanaka, J. S. (1993). Multifaceted conceptions of fit in structural equation models. In K. A. Bollen &
J. S. Long (Eds.), Testing structural equation models (pp. 10-39). Newbury Park, CA: Sage.
Thompson, B. (1990). MULTINOR: A FORTRAN program that assists in evaluating multivariate normality. Educational and Psychological Measurement, 50, 845-848.
Thompson, B. (Ed.). (1993). Statistical significance testing in contemporary practice [Special issue]. The Journal of Experimental Education, 61(4).
Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms. Educational Researcher, 25(2), 26-30.
Tremblay, P. F., & Gardner, R. C. (1996). On the growth of structural equation modeling in psycho-
logical journals. Structural Equation Modeling, 3, 93-104.
West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal variables.
In R. H. Hoyle (Ed.), Structural equation modeling (pp. 56-75). Thousand Oaks, CA: Sage.
Wolfle, L. M. (1985). Applications of causal models in higher education. In J. C. Smart (Ed.), Higher
education: Handbook of theory and research (Vol. 1, pp. 381-413). New York: Agathon.