Structural Equation Modeling: Back to Basics
Ralph O. Mueller
Structural Equation Modeling: A Multidisciplinary Journal, 4(4), 353-369. DOI: 10.1080/10705519709540081
TEACHER'S CORNER
During the past 25 years, structural equation modeling (SEM) has become a powerful, mainly nonexploratory research tool for many social and behavioral scientists. The initial development during the 1970s and continuous improvement […]

Requests for reprints should be sent to Ralph O. Mueller, Department of Educational Leadership, Graduate School of Education and Human Development, The George Washington University, 2134 G Street, NW, Washington, DC 20052. E-mail: [email protected]
"The study of structural equation models can be divided into two parts: the easy part
and the hard part" (Duncan, 1975, p. 149). "The easy part is mathematical. The hard
part is constructing causal models that are consistent with sound theory. In short,
causal models are no better than the ideas that go into them" (Wolfle, 1985, p. 385).
Here, Duncan and Wolfle pointed to the fundamental truth in SEM: No matter how
technically sophisticated the employed statistical techniques, SEM analyses can
only be beneficial to the researcher if a strong substantive theory underlies the
initially hypothesized model(s). Based on correlational data, the statistical methods
cannot, for example, establish or prove causal relations between variables. At most,
they can help in identifying some empirical evidence to either reject or retain
hypothesized causal theories and/or assess the strengths and directions of certain a
priori hypothesized causal or structural relations within the context of a specific
model. More specifically, SEM users should realize and remember that:
• Models that were not carefully constructed from knowledge of the underlying
substantive theories will only lead to empty interpretations, adding very little to our
understanding of these theories. For example, I (and maybe you) have come across
studies that employed both exploratory and confirmatory factor analyses using the
exact same data. The authors justified "their" choice of the confirmatory model
with the results from the exploratory analysis and were excited about the excellent
model fit. (Obviously, here a good fit between data and model is not surprising or
even noteworthy because the data—not a theory—were used to generate the model
from the exploratory analysis.)
• Results from a SEM analysis can be interpreted validly only within the context
of the analyzed model. That is, the analysis of the causal/structural relations in one
model says nothing of the character of these relations in a structurally different,
modified, or competing model.
• There are many (actually, infinitely many) alternative structures that can yield
identical data-model fit results. Thus, the hunt for the model is, indeed, a fruitless
one. Instead, a carefully constructed model—preferably a set of equally theoreti-
cally plausible alternative models—based on the researcher's in-depth under-
standing of the substantive area and the constructs being modeled is needed before
SEM can assist in understanding the phenomenon being investigated.
1. Parts of this section are based on comments made elsewhere (Mueller, 1996, pp. xii-xiii).
[…] during a regression analysis? In my view, too few attempts are being made to clarify for the practicing structural equation modeler the possible role(s) causality can and should play in conceptualizing and interpreting SEM analyses (notable exceptions are Bullock, Harlow, & Mulaik, 1994; Mulaik, 1987, 1993; Mulaik & James, 1995; and the various authors in Shaffer, 1992). Especially if we believe that causal explanation is the "ultimate" aim of science (Shaffer, 1992, p. x; Mulaik, 1993), "the [SEM] framework should help, rather than hinder, clear thinking about causal mechanisms that we think lie behind the correlations in the data" (Rothenberg, 1992, p. 99). Thus, it seems imperative that we come to an awareness of the ramifications of the various positions that have been taken on the role of causality in SEM. Such fundamental considerations as the few listed below should serve as a reminder that the various approaches to causality can greatly impact the conceptualization and interpretation of structural equation models (for more detail, you might start with Shaffer, 1992; and the various cited contributions by Mulaik).
A serious and sometimes ignored issue for correct parameter estimation in structural equation models is model identification. The question here is whether or not sufficient variance and covariance information from the observed variables is available to uniquely estimate the unknown coefficients. Note that this issue is not one of sample size but mostly one of the ratio of the number of variables in the model to the number of unknown parameters. Also, even if the model is theoretically identified, anomalies in the observed data can still lead to empirical underidentification. […]
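To make this counting argument concrete, the following minimal sketch implements the so-called t-rule, a necessary but not sufficient condition for identification; the one-factor example and its parameter count are illustrative assumptions, not part of the original discussion.

```python
# Minimal sketch of the t-rule, a necessary (but not sufficient)
# condition for identification: the number of free parameters q must
# not exceed the number of nonredundant variances and covariances
# among the p observed variables, p(p + 1)/2.

def t_rule(p: int, q: int) -> bool:
    moments = p * (p + 1) // 2  # nonredundant elements of the covariance matrix
    print(f"p = {p} variables: {moments} moments; q = {q} parameters; df = {moments - q}")
    return q <= moments

# Hypothetical example: a one-factor model with p = 4 indicators and the
# factor variance fixed to 1 estimates 4 loadings and 4 unique variances.
t_rule(p=4, q=8)  # 10 moments >= 8 parameters: necessary condition met (df = 2)
```

Passing this check establishes only that enough covariance information could be available; as noted previously, particular data patterns can still produce empirical underidentification.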
Once the researcher has conceptualized a model and checked its theoretical identi-
fication (and hopefully no data anomalies lead to empirical underidentification),
SEM software seems to make the estimation step in the modeling process the easiest
and least worrisome of all: Even though a variety of estimation procedures are
available—their appropriateness depending on the viability of distributional and
structural assumptions about the data under investigation—most researchers em-
ploy the maximum likelihood (ML) method, probably because it is the default in
most, if not all, currently available SEM software packages. Alternatives such as
the generalized least squares (GLS) or asymptotically distribution free (ADF)
methods are rarely considered by the nonexpert because much confusion and as-yet insufficient evidence exist regarding the advantages of one such method
over the others. For example, not many new SEM users might be aware of (a) the
asymptotic equivalence and dependence on the multivariate normality assumption
of ML and GLS, (b) the large sample requirement and inconclusive evidence of the
benefits of using the ADF method, or (c) the multitude of still unanswered questions
regarding the behavior of the various estimation methods when analyzing data from
nominal or ordinal variables.3 In my opinion, an appropriate method should be
chosen deliberately, not by default. At a minimum, users should remember that:
• If the structural and distributional assumptions are met (but are they ever?), ML provides asymptotically (large sample) unbiased,4 consistent,5 and efficient6 parameter estimates and standard errors. Furthermore, the ML-based large sample chi-square statistic7 is appropriate for testing an overidentifying restriction and assessing data-model consistency (a computational sketch of this statistic follows this list). As the degree of violation of the normality assumption increases, however, confidence in the validity of obtained results decreases (e.g., ML estimates become less efficient and the chi-square statistic more inflated, leading to an increase in the Type I error rate for model rejection).
• If the structural assumption is met and sample size is sufficiently large, ADF estimates and standard errors are asymptotically consistent, efficient, and largely independent of the observed data distributions. For the purpose of data-model fit assessment, an appropriate large sample chi-square statistic again is available. For small to moderate samples, however, conclusive evidence of the behavior of ADF estimates is still unavailable (for current reviews, see Bentler & Dudgeon, 1996; and Curran, West, & Finch, 1996).
2. Usually, this assumption is evaluated with the "badness-of-fit" chi-square test (df = p - c); see, for example, Bollen (1989, pp. 263-269) or Mueller (1996, pp. 82-84).
3. For an introduction to parameter estimation with nonnormal data, see West, Finch, and Curran (1995).
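As referenced in the first bullet, the following is a minimal sketch of the ML discrepancy function and the associated badness-of-fit chi-square statistic of Footnote 7. The simulated data, the sample size, and the fitted independence model (whose ML-implied covariance matrix is simply the diagonal of the sample covariance matrix) are illustrative assumptions only:

```python
import numpy as np
from scipy.stats import chi2

def f_ml(S: np.ndarray, Sigma: np.ndarray) -> float:
    """ML discrepancy: F_ML = ln|Sigma| + tr(S Sigma^(-1)) - ln|S| - p."""
    p = S.shape[0]
    return (np.linalg.slogdet(Sigma)[1] + np.trace(S @ np.linalg.inv(Sigma))
            - np.linalg.slogdet(S)[1] - p)

def ml_chi_square(S: np.ndarray, Sigma_hat: np.ndarray, n: int, df: int):
    """Badness-of-fit statistic T = (n - 1) * F_ML (see Footnote 7) and its p value."""
    T = (n - 1) * f_ml(S, Sigma_hat)
    return T, chi2.sf(T, df)

# Illustrative data and a deliberately simple fitted model:
rng = np.random.default_rng(1)
true_cov = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]
X = rng.multivariate_normal(np.zeros(3), true_cov, size=400)
S = np.cov(X, rowvar=False)
Sigma_hat = np.diag(np.diag(S))  # independence model: p = 3, df = p(p - 1)/2 = 3
T, p_value = ml_chi_square(S, Sigma_hat, n=400, df=3)
print(f"T = {T:.1f}, p = {p_value:.4f}")  # large T: the independence model is rejected
```

Because the appropriateness of ML and GLS hinges on multivariate normality, a rough empirical check of that assumption can also inform the choice of estimator. One common diagnostic is Mardia's multivariate kurtosis; the sketch below, including its simple z approximation and simulated data, is again illustrative rather than prescriptive:

```python
import numpy as np

def mardia_kurtosis(X: np.ndarray) -> tuple[float, float]:
    """Mardia's multivariate kurtosis b_2p and an approximate z statistic.

    Under multivariate normality, E(b_2p) = p(p + 2), and
    z = (b_2p - p(p + 2)) / sqrt(8p(p + 2)/n) is roughly standard normal.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)  # ML covariance (divisor n)
    d2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(S), Xc)  # squared Mahalanobis distances
    b_2p = float(np.mean(d2 ** 2))
    z = (b_2p - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)
    return b_2p, z

# Illustrative use with simulated multivariate normal data:
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), np.eye(3), size=500)
b_2p, z = mardia_kurtosis(X)
print(f"b_2p = {b_2p:.2f} (15 expected under normality), z = {z:.2f}")
```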
Once parameter estimation is complete, SEM users often brace themselves with great anxiety when examining the various available fit indices.8 Sometimes, this inspection translates—at least initially—into either the biggest thrill ("Yes! My model fits! The hypothesized theory is confirmed!") or disappointment ("Oh no! The model does not fit! I better change it until it does!"). These reactions are fueled, respectively, by the still somewhat popular beliefs that (a) finding a well-fitting model is equivalent to discovering and/or confirming the underlying theory that is consistent with reality and (b) modifying and reestimating an initially ill-fitting model eventually will lead to the right structure. Following, I try to dispel both myths regarding fit assessment and model modification during SEM analyses.
4. An estimate is asymptotically unbiased if its expected value (mean value after repeated large-n sampling) is equal to the parameter it is estimating.
5. An estimate is asymptotically consistent if it converges in probability, as n increases, to the parameter it is estimating.
6. An estimate is asymptotically efficient if it has a smaller asymptotic variance than other consistent estimators.
7. Under the distributional and structural assumptions, the product of (n - 1) and the minimum value of the ML fitting function is asymptotically distributed as chi-square with df = p - c.
8. For example, choices include an appropriate chi-square statistic; Jöreskog and Sörbom's (1981) unadjusted or adjusted goodness-of-fit indices; incremental fit indices such as Bentler and Bonett's (1980) nonnormed and normed fit indices or Bentler's (1990) nonnormed and normed comparative fit indices; the parsimony fit indices suggested by James, Mulaik, and Brett (1982) and Mulaik et al. (1989); the Akaike (1987) information criterion (AIC); or Steiger's (1990) root mean square error of approximation, to name just a few. For comprehensive reviews consult Marsh, Balla, and McDonald (1988) or Tanaka (1993).
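To make two of the indices just listed concrete, the following sketch computes Steiger's (1990) root mean square error of approximation and Bentler's (1990) comparative fit index from target-model and baseline-model chi-square statistics. The formulas follow common conventions (e.g., the n - 1 scaling, which varies across programs), and the numerical inputs are purely illustrative:

```python
import math

def rmsea(chi2_m: float, df_m: int, n: int) -> float:
    """Root mean square error of approximation (Steiger, 1990)."""
    return math.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))

def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index (Bentler, 1990), relative to a baseline (null) model."""
    d_m = max(chi2_m - df_m, 0.0)   # target-model noncentrality estimate
    d_b = max(chi2_b - df_b, d_m)   # baseline noncentrality, at least d_m
    return 1.0 if d_b == 0.0 else 1.0 - d_m / d_b

# Purely illustrative values:
print(rmsea(chi2_m=28.4, df_m=14, n=400))                 # about .05
print(cfi(chi2_m=28.4, df_m=14, chi2_b=512.0, df_b=21))   # about .97
```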
The interpretive weight that is placed on the many alternative fit indices of recent
years (see footnote 8) seems to depend to a large extent on the purpose of the SEM
analysis: Assuming that a model was conceptualized and hypothesized to capture
as accurately as possible some slice of reality—by carefully balancing the principle
of parsimony with the complexity of the social science phenomenon being stud-
ied—the analysis' purpose can be clarified by reflecting on the relative importance
of two key questions:
• How and to what degree do certain variables or factors affect each other?
• Why should certain theories be retained as plausible reflections of reality?
The researcher's focus on the first query suggests a predictive purpose, whereas an emphasis on the second question points toward an explanatory aim. Now, if the purpose of a particular analysis is mainly prediction, the interpretation of overall fit indices might be secondary to the interpretation of the estimated strengths and directions of the structural paths. Here, assessment of the "fit" or match between data and model may be focused more on questions dealing with individual parameter estimates: Do coefficients and their standard errors have theory-contradicting signs or magnitudes? Are any of the variance estimates negative? Are coefficients of determination (R²) negative, close to zero, or greater than unity? And so forth.
If, on the other hand, the investigator seeks information on the tenability of a
theory, that is, the primary purpose of the analysis is explanation, then the evaluation
of overall fit indices might become the primary—but clearly not the only—concern
(note the parallel between the previous arguments and the relative emphasis on the
interpretation of regression weights and coefficients of determination in traditional
regression analyses). Here, the implicit question often is whether or not the
hypothesized model or theory is consistent with reality ("Is this the model?"). An
unfortunate truth, however, is that empirical fit indices cannot confirm the model-
reality link, only address the data-model consistency question. Figure 1 (depicting Set B, the models consistent with the data, and Set C, the models consistent with reality) illustrates this major nonstatistical limitation9 of the available overall measures of fit. The relation between Sets A, B, and C shown in Figure 1 is not the only one possible; it is used here to clarify the following points:
• Evidence of data-model consistency does not necessarily imply that the chosen
model represents a valid, albeit simplified, reflection of reality.
• Data-model inconsistency reflects a mismatch between the proposed model
and reality (unless, of course, a Type 1 error was comitted).
• Fit assessment mainly is a disconfirmatory activity: A model or theory could be
disconfirmed after concluding that observed data do not fit the hypothesized structure.
But a model cannot be confirmed as being the best (or even, a good) approximation9 to reality after discovering that the model happens to fit the observed data, possibly by chance alone.
• All that can be expected from "good" fit values is some indication that the
model, as specified, could be a viable representation of the true relations between
the included variables.
• A semantic change from interpreting measures of "model fit" to interpreting
indices of "data fit" might be helpful to those researchers who wonder whether
"various models fit the data" or whether "the collected data fit a particular model."
Although the former query can lead the investigator toward an exploratory search
for any model that (by chance?) fits a particular data set, the latter question might
direct the researcher more toward a disconfirmatory assessment of whether there
is evidence to conclude that the collected data do not fit the a priori specified
model(s). That is, to some investigators, assessing the "data fit" might convey the
largely disconfirmatory idea of judging the consistency between collected data and
an a priori theoretically conceptualized model better than attempting to improve
the "model fit," which might be interpreted as a more exploratory approach to
identifying models that happen to fit a particular data set.
9. For a review of statistical issues concerning various fit indices, see, for example, Tanaka (1993).
If any data-model inconsistencies are identified, the researcher has several options
on how to proceed, depending on where on the deduction-induction continuum the
main goal of the just conducted analysis is placed: (a) model rejection, (b) model
comparison, or (c) model generation (adapted from Jöreskog, 1993). In the first
instance, the aim is a mainly deductive and disconfirmatory one: The model or
theory might be rejected based on poor data-model fit and deemed an invalid […]
10. For introductions to these and other modification statistics and strategies, consult any structural equation modeling text, for example, Bollen and Long (1993), Hoyle (1995), or Mueller (1996).
[…] also increases the power of the chi-square-based tests used to evaluate data-model fit and, hence, increases the chance of committing a Type I error, that is, flat-out rejecting a model even though it could serve as a "good" approximation to reality.
• Model Comparison: An interpretation of results from a selected model that
favorably compares with other competing structures cannot be made outside the
context of such alternative models. That is, favorable results are relative, not
absolute. Statistical model comparisons must, of course, be interpreted in light of
the usual concerns of Type I and Type II errors. Finally, note that models that are not nested cannot be directly compared by statistical means.
• Model Generation: Here, an attempted search for the structure that best fits a
particular data set leads to a mostly exploratory view of SEM that, in my view, has the
potential of doing more harm than good. Substantially changing a null hypothesis (i.e.,
the initially hypothesized model) based on the observed data (i.e., the values of various
fit or modification indices)—and then retesting the modified hypothesis with the same
data—is an unacceptable practice in most applications of traditional statistical thinking.
Of course, a lack of resources and other circumstances might provide sufficient
justification for responsible model modification and a reanalysis of the data. However,
a researcher should be aware of the possibility of capitalizing on chance when various
not carefully conceptualized models are being fitted to the same data. When a modified
model is presented as a possible—but certainly not the—representation of the true
structure, a Type II error might have been committed.
• Irrespective of which of the various post hoc model modification strategies is
chosen, cross-validating obtained results with a new and independent sample is
very desirable because it gives some protection against the capitalization on chance
and specification errors that are internal11 to the model. If the original sample is deemed large enough, it can be split into calibration and validation subsamples, and the model's predictive validity can be judged by interpreting Cudeck and Browne's (1983) cross-validation index (also see Browne & Cudeck, 1989, 1993, for extensions of their work and a discussion of benefits and shortcomings of the various approaches to cross-validation in SEM); a minimal computational sketch of this strategy follows this list.
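As mentioned in the last bullet, here is a minimal sketch of the split-sample logic behind Cudeck and Browne's (1983) cross-validation index: the ML discrepancy between the validation subsample's covariance matrix and the covariance matrix implied by the model as estimated from the calibration subsample. The simulated data and the independence model standing in for a substantively fitted model are illustrative assumptions:

```python
import numpy as np

def f_ml(S: np.ndarray, Sigma: np.ndarray) -> float:
    """ML discrepancy function (as in the earlier estimation sketch)."""
    p = S.shape[0]
    return (np.linalg.slogdet(Sigma)[1] + np.trace(S @ np.linalg.inv(Sigma))
            - np.linalg.slogdet(S)[1] - p)

rng = np.random.default_rng(2)
true_cov = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]
X = rng.multivariate_normal(np.zeros(3), true_cov, size=600)
rng.shuffle(X)                      # random calibration/validation split
X_cal, X_val = X[:300], X[300:]

S_cal = np.cov(X_cal, rowvar=False)
S_val = np.cov(X_val, rowvar=False)

# Stand-in for a model fitted to the calibration sample: the (deliberately
# too simple) independence model, whose ML-implied covariance matrix is
# just the diagonal of S_cal.
Sigma_cal = np.diag(np.diag(S_cal))

# Cross-validation index: discrepancy between the validation covariance
# matrix and the calibration-based model-implied covariance matrix.
print(f"CVI = {f_ml(S_val, Sigma_cal):.3f}")
```

The index is interpreted comparatively: among competing models fitted to the same calibration sample, the one with the smallest discrepancy in the validation sample cross-validates best.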
SUMMARY

[…]
11. Kaplan (1990) distinguished between external and internal model specification errors: The former result from having omitted important variables from inclusion in the model; the latter occur when important relations between variables within the model have been omitted.
12. For current statistics from psychology, see Tremblay and Gardner (1996).
13. For examples, just scan the many entries in the annotated bibliography by Austin and Calderón (1996).
ACKNOWLEDGMENT
A previous version of this article was delivered as an invited address to the Special
Interest Group on Structural Equation Modeling during the meeting of the Ameri-
can Educational Research Association, New York, April 1996.
REFERENCES
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107,
238-246.
Bentler, P. M. (1993). EQS: Structural equations program manual. Los Angeles: BMDP Statistical
Software.
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance
structures. Psychological Bulletin, 88, 588-606.
Bentler, P. M., & Dudgeon, P. (1996). Covariance structure analysis: Statistical practice, theory,
directions. Annual Review of Psychology, 47, 541-570.
Bentler, P. M., & Wu, E. J. C. (1995). EQS for Windows 5.0 [Computer software]. Encino, CA: Multivariate Software.
Blalock, H. M. (1964). Causal inferences in nonexperimental research. Chapel Hill: University of North
Carolina Press.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Bollen, K. A., & Long, J. S. (Eds.). (1993). Testing structural equation models. Newbury Park, CA:
Sage.
Browne, M. W. (1984). Asymptotically distribution-free methods for the analysis of covariance
structures. British Journal of Mathematical and Statistical Psychology, 37, 62-83.
Browne, M. W., & Cudeck, R. (1989). Single sample cross-validation indices for covariance structures.
Multivariate Behavioral Research, 24, 445-455.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S.
Long (Eds.), Testing structural equation models (pp. 136-162). Newbury Park, CA: Sage.
Bullock, H. E., Harlow, L. L., & Mulaik, S. A. (1994). Causation issues in structural equation modeling
research. Structural Equation Modeling, 1, 253-267.
Byrne, B. M. (1994). Structural equation modeling with EQS and EQS/Windows: Basic concepts,
applications, and programming. Thousand Oaks, CA: Sage.
Cliff, N. (1983). Some cautions concerning the application of causal modeling methods. Multivariate
Behavioral Research, 18, 115-126.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.
Cudeck, R., & Browne, M. W. (1983). Cross-validation of covariance structures. Multivariate Behav-
ioral Research, 18, 147-167.
Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1, 16-29.
Duncan, O. D. (1975). Introduction to structural equation models. New York: Academic.
Duncan, O. D., Haller, A. O., & Portes, A. (1968). Peer influence on aspiration: A reinterpretation.
American Journal of Sociology, 74, 119-134.
Fan, X. (1996). An SAS Program for assessing multivariate normality. Educational and Psychological
Measurement, 56, 668-674.
Freedman, D. A. (1987). As others see us: A case study in path analysis. Journal of Educational
Statistics, 12(2), 101-128.
Glymour, C., Scheines, R., Spirtes, P., & Kelly, K. (1987). Discovering causal structure: Artificial intelligence, philosophy of science, and statistical modeling. Orlando, FL: Academic.
Hayduk, L. A. (1987). Structural equation modeling with LISREL: Essentials and advances. Baltimore: Johns Hopkins University Press.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association,
81, 945-960.
Hoyle, R. H. (Ed.). (1995). Structural equation modeling: Concepts, issues, and applications. Thousand
Oaks, CA: Sage.
James, L. R., Mulaik, S. A., & Brett, J. (1982). Causal analysis: Models, assumptions, and data. Beverly
Hills: Sage.
Jöreskog, K. G. (1993). Testing structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing
structural equation models (pp. 294-316). Newbury Park, CA: Sage.
Jöreskog, K. G., & Sörbom, D. (1981). Analysis of linear structural relationships by maximum likelihood
and least squares methods (Research Report 81-8). Uppsala, Sweden: University of Uppsala.
Jöreskog, K. G., & Sörbom, D. (1995). LISREL 8 with PRELIS 2 for Windows [Computer software]. Chicago: Scientific Software International.
Kaplan, D. (1990). Evaluating and modifying covariance structure models: A review and recommen-
dation. Multivariate Behavioral Research, 25, 137-155.
Kenny, D. A. (1979). Correlation and causality. New York: Wiley.
Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746-759.
Ling, R. (1983). Review of "Correlation and Causality" by David Kenny. Journal of the American
Statistical Association, 77, 489-491.
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of
sample size for covariance structure modeling. Psychological Methods, 1, 130-149.
Marcoulides, G. A., & Schumacker, R. E. (Eds.). (1996). Advanced structural equation modeling: Issues
and techniques. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Marini, M. M., & Singer, B. (1988). Causality in the social sciences. In C. C. Clogg (Ed.), Sociological
methodology: Vol. 18 (pp. 347-409). Washington, DC: American Sociological Association.
Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness-of-fit indexes in confirmatory factor analysis: The effect of sample size. Psychological Bulletin, 103, 391-410.
Mueller, R. O. (1996). Basic principles of structural equation modeling: An introduction to LISREL
and EQS. New York: Springer-Verlag.
Mulaik, S. A. (1987). Toward a conception of causality applicable to experimentation and causal
modeling. Child Development, 58, 18-32.
Mulaik, S. A. (1993). Objectivity and multivariate statistics. Multivariate Behavioral Research, 28,
171-203.
Mulaik, S. A., & James, L. R. (1995). Objectivity and reasoning in science and structural equation
modeling. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications
(pp. 118-137). Thousand Oaks, CA: Sage.
Mulaik, S. A., James, L. R., Van Alstine, J., Bennett, N., Lind, S., & Stilwell, C. D. (1989). Evaluation
of goodness-of-fit indices for structural equation models. Psychological Bulletin, 105, 430-445.
Muthén, B. O. (1992). Response to Freedman's critique of path analysis: Improve credibility by better
methodological training. In J. P. Shaffer (Ed.), The role of models in nonexperimental social science:
Two debates (pp. 80-86). Washington, DC: American Educational Research Association.
Rothenberg, T. J. (1992). Comments on Freedman's paper. In J. P. Shaffer (Ed.), The role of models in
nonexperimental social science: Two debates (pp. 98-99). Washington, DC: American Educational
Research Association.
Schmidt, F. (1996). Statistical significance testing and cumulative knowledge in psychology: Implica-
tions for the training of researchers. Psychological Methods, 1, 115-129.
Schumacker, R. E., & Lomax, R. G. (1996). A beginner's guide to structural equation modeling.
Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Shaffer, J. P. (Ed.). (1992). The role of models in nonexperimental social science: Two debates.
Washington, DC: American Educational Research Association.
Steiger, J. H. (1990). Structural model evaluation and modification: An interval estimation approach.
Multivariate Behavioral Research, 25, 173-180.
Tanaka, J. S. (1993). Multifaceted conceptions of fit in structural equation models. In K. A. Bollen &
J. S. Long (Eds.), Testing structural equation models (pp. 10-39). Newbury Park, CA: Sage.
Thompson, B. (1990). MULTINOR: A FORTRAN program that assists in evaluating multivariate normality. Educational and Psychological Measurement, 50, 845-848.
Thompson, B. (Ed.). (1993). Statistical significance testing in contemporary practice [Special issue]. The Journal of Experimental Education, 61(4).
Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms. Educational Researcher, 25(2), 26-30.
Tremblay, P. F., & Gardner, R. C. (1996). On the growth of structural equation modeling in psycho-
logical journals. Structural Equation Modeling, 3, 93-104.
West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal variables.
In R. H. Hoyle (Ed.), Structural equation modeling (pp. 56-75). Thousand Oaks, CA: Sage.
Wolfle, L. M. (1985). Applications of causal models in higher education. In J. C. Smart (Ed.), Higher
education: Handbook of theory and research (Vol. 1, pp. 381-413). New York: Agathon.