
Troubleshooting problems with SEM models that have “Heywood” cases such as negative variance parameters and non-positive definite covariance matrices

Jeremy Yorgason
Brigham Young University
Introduction
• In SEM, it is fairly common to encounter improper solutions
• Non-positive definite covariance matrices
• Models with negative variance terms
• Negative PSI matrix
• Correlations or other standardized values > 1
• The model is not identified; you need “x” additional constraints for it to be identified

• Why is this important?
• Results from models with these problems cannot be trusted and should not be reported in journal articles
• Standard errors of estimates may be affected (Chen et al., 2001)
• Error messages are diagnostic tools
• It is a good idea to confirm the diagnosis the software is giving you
• The point is to understand what may be going on with your model/data
• This often requires that you look at all of the output for your model
Goals for this Segment of the Workshop
1. How do I recognize the problem?
2. How do I fix the problem?
3. Examples
Causes of Improper Solutions in SEM
1. Specification error in the model
A. Missing a “1” on one of the factor loadings of a latent variable, or on an error term
B. Correlations of variables or errors from the IV to the DV side of a model
C. Excessive error correlations on indicators of a single latent variable
D. Very low factor loadings on a latent variable
E. Omitted paths that should be in a model
2. Model under-identified (negative degrees of freedom)
A. V(V+1)/2 minus the number of free parameters, where V is the number of observed variables (if estimating means/intercepts, use V(V+3)/2); see the sketch after this list
3. Non-convergence
4. Outliers in the data
5. Sample too small for the model being estimated
Kline, 2011; Kolenikov & Bollen, 2012; Chen et al., 2001; Newsom, 2012
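
A minimal sketch of the counting rule in item 2, in Python; the variable counts and the function name are purely illustrative, not from any particular model.

```python
# Degrees of freedom = distinct sample moments minus free parameters.
# A negative result means the model is under-identified.

def model_df(n_observed, n_free_params, with_means=False):
    v = n_observed
    moments = v * (v + 3) // 2 if with_means else v * (v + 1) // 2
    return moments - n_free_params

# Hypothetical example: 10 observed variables, no mean structure.
print(model_df(10, 25))   # 55 - 25 = 30 (identified, in terms of counting)
print(model_df(10, 60))   # 55 - 60 = -5 (under-identified)
```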
Causes of Improper Solutions in SEM
6. Missing data
7. “Sampling fluctuations”
8. Two-indicator latent variables
A. This includes 2nd-order latent variables
9. Non-normally distributed outcome or indicator variables in your model
A. Categorical
B. Count, zero-inflated, etc.
10. Empirical under-identification
A. “Positive degrees of freedom, but there is insufficient covariance information in a portion of the model for the computer to generate valid estimates” (Newsom, 2012)
B. May be caused by some of the above issues

Kline, 2011; Kolenikov & Bollen, 2012; Chen et al., 2001; Newsom, 2012
Signs that there is a problem
Amos:
“XX: Default Model”

“The following variances are negative.”

“This solution is not admissible.”

“The model is probably unidentified. In order to achieve identifiability, it will probably be necessary to impose 1 additional constraint.”

In place of estimates in the Amos output you see “unidentified”
Signs that there is a problem
Mplus:
THE MODEL ESTIMATION TERMINATED NORMALLY

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS -0.762D-17. PROBLEM INVOLVING PARAMETER 59.

MODIFICATION INDICES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED.
More signs that there is a problem
Mplus or other programs:
1. Negative variance estimate (remember, variance = standard deviation squared, so it cannot be negative)
A. Find it in your output
2. Correlations above 1 (remember, a correlation can't be larger than 1 in absolute value)
A. Find it in your output
3. Error variance that is really BIG (999 usually indicates a problem in Mplus, although this is OK if something is “constrained” to be that number)
The sketch below shows how to run these same checks directly on a sample covariance matrix.
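
A minimal sketch of those checks in Python, using simulated data with a deliberately duplicated column to trigger the warning signs; the data and variable count are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(200, 4))
data[:, 3] = data[:, 2]                 # force a linear dependency among two "variables"

cov = np.cov(data, rowvar=False)
corr = np.corrcoef(data, rowvar=False)

print(np.diag(cov) < 0)                 # any negative variances?
off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
print(np.max(np.abs(off_diag)))         # 1.0 here: a correlation at (or effectively above) 1
print(np.min(np.linalg.eigvalsh(cov)))  # an eigenvalue at or below ~0 means the matrix
                                        # is not positive definite
```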
How can I fix these problems?
1. Look at a diagram of your model and see if you have misspecified your model.
A. Check your syntax (e.g., look for missing semicolons in Mplus)
B. Missing “1” for a factor loading on latent variables
C. Missing “1” on the regression path of an error term
D. Sometimes Amos creates “GHOST” variables. You can't see them, but they are there! Sometimes off the screen, sometimes really, really, really small
E. Sometimes Amos will “double correlate” variables
F. Any correlations across IV/DV lines?
1. Careful: this is something your “modification indices” will suggest to improve model fit. However, don't ever add parameters that go against theory
G. Make sure you have appropriate regression paths in the model (not too few, in this case)
H. Make sure your measurement model is appropriate
1. Factor loadings > .40
2. Error correlations: start with none; correlation between items is captured in the latent variables. Typically you'll use modification indices here
How can I fix these problems?
2. Be attentive to model problems when there are latent variables with only 2 indicators (these can be unstable)
A. Newsom (2012) suggests constraining the two factor loadings to be equal

3. Caution is also warranted when estimating “higher order” latent variables with only two first-order factors, and with certain complex models (e.g., common fate models) that require specific constraints in order for the model to be identified
How can I fix these problems?
4. Either use a large sample, OR check the sample size against the number of parameters being estimated.
• N/q rule (N = sample size, q = parameters in the model; Kline, 2011)
• Count variances, covariances, and means, OR
• Most programs tell you how many parameters are in your model. Amos:
Number of distinct sample moments: 77
Number of distinct parameters to be estimated: 44
Degrees of freedom (77 - 44): 33
• Quick check: 10 people in the sample for every observed (rectangle) variable in the model (see the sketch below)
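
A minimal sketch of these sample-size checks in Python; the sample size and number of observed variables are hypothetical, while the moment and parameter counts are the ones quoted from Amos above.

```python
def n_to_q_ratio(n_cases, n_free_params):
    """Kline's (2011) N/q heuristic: cases per estimated parameter."""
    return n_cases / n_free_params

def meets_quick_check(n_cases, n_observed_vars, cases_per_var=10):
    """Rough rule of thumb: about 10 cases per observed (rectangle) variable."""
    return n_cases >= cases_per_var * n_observed_vars

print(77 - 44)                      # degrees of freedom from the Amos counts above: 33
print(n_to_q_ratio(300, 44))        # ~6.8 cases per parameter for a hypothetical N = 300
print(meets_quick_check(300, 12))   # True for a hypothetical model with 12 observed variables
```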
How can I fix these problems?
5. If your model looks to be specified correctly but you still have a problem, it's time to start looking at your data
A. Run frequencies on all variables in the model to see if there is a data entry error or outliers that could be inflating the variance of one or more variables (see the sketch below)
1. Side note: sometimes SEM models have trouble with variables whose variances are very different (larger or smaller) from the rest of the variables in your model (e.g., income in dollars)
2. If this is the case, you will want to rescale or transform these variables so the variances are of similar magnitude
3. Also, in the transfer of data from one program to another, sometimes columns of data are shifted or otherwise corrupted
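
A minimal sketch of that data screening with pandas; the variables and values are simulated placeholders, not from any real dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "depress": rng.normal(2.5, 0.6, 200),
    "support": rng.normal(3.0, 0.5, 200),
    "income":  rng.normal(55000, 18000, 200),   # dollars: variance dwarfs the other columns
})

print(df.describe())    # summary/frequency check: spot entry errors and extreme outliers
print(df.var())         # income's variance is orders of magnitude larger than the others

# Rescale so the variables are on a roughly similar metric before fitting the SEM.
df["income_10k"] = df["income"] / 10_000
```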
How can I fix these problems?
6. Do you have any categorical or non-normally distributed dependent variables that are specified as continuous?
A. Amos doesn't handle dichotomous, count, or zero-inflated outcomes
B. Mplus does handle them well, but you have to specify in the syntax that you are working with such distributions
C. You may have specified non-normal variable distributions, but you have small cell sizes (e.g., an ordered categorical variable with only 1 or 2 cases on one end of the distribution); see the sketch below
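
A minimal sketch of the cell-size check in item 6C; the variable name and category counts are hypothetical.

```python
import pandas as pd

# Hypothetical ordered categorical indicator with a sparse lowest category.
satisfaction = pd.Series([1] * 2 + [2] * 40 + [3] * 80 + [4] * 60 + [5] * 18)

counts = satisfaction.value_counts().sort_index()
print(counts)               # only 2 cases in category 1
print(counts[counts < 5])   # flag sparse cells that can destabilize categorical estimation
```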
How can I fix these problems?
• 7. If your model does not “converge” it means that the
program went through X number of iterations, but could
not find a suitable solution. You can increase iterations
from the default number to try to estimate your model. If
this doesn’t work, you probably need to change your
model or you have a data problem.
Atypical Solutions: Start Values and Iterations
8. A start value is a number assigned to each estimated parameter when “iterations” begin for a model. Amos and Mplus automatically create start values for each parameter to be estimated, yet it is possible to assign start values yourself if the program-assigned ones don't work. Researchers can provide start values for a model, which are essentially any known parameter estimates (e.g., a regression weight or coefficient). You can get these by running a simple linear regression with the variables in your model and then plugging in the coefficients from the simpler model (see the sketch below).
A. How in the world would I know if I have bad start values???
B. How would I know which variable to look at that might be non-normally distributed, or be categorical with small cell sizes?
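
A minimal sketch of getting candidate start values from an ordinary regression, using simulated placeholder data; the Mplus line in the comment is just one way such values could be supplied.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=1.0, size=200)

# Ordinary least squares with an intercept, via numpy.
X = np.column_stack([np.ones_like(x), x])
intercept, slope = np.linalg.lstsq(X, y, rcond=None)[0]
print(intercept, slope)   # plug these in as user-supplied start values,
                          # e.g. in Mplus:  y ON x*0.5;
```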
Greek Alphabet and Mplus Output
• Nu (Ν/ν) = intercepts or means of observed variables
• Lambda (Λ/λ) = factor loadings
• Theta (Θ/θ) = error variances and covariances
• Alpha (Α/α) = means and intercepts of latent variables
• Beta (Β/β) and Gamma (Γ/γ) = regression coefficients
• Psi (Ψ/ψ) = residual variances and covariances of continuous latent variables
• Tau (Τ/τ) = thresholds of categorical observed variables
• Delta (Δ/δ) = scaling information for observed dependent variables
• Etc. – see the Mplus manual

• Ask for TECH1 in the output. Then, when Mplus says there is a problem with, for example, parameter #16, go find that parameter, see which matrix it is in, identify the variable involved, and then look at the model/data to see where the problem is. If no variable is identified, you need to go back to model specification.
• CAUTION: Specific parameter warnings are usually a DECOY! They generally just let you know the model is not correctly specified, and no matter what you do to the identified variable, it will not make your model work.
Examples: “Message of Death!”
• From a class assignment with a model involving 56 cases.
• Mplus error:
THE MODEL ESTIMATION TERMINATED NORMALLY

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.383D-13. PROBLEM INVOLVING PARAMETER 31.

THIS IS MOST LIKELY DUE TO HAVING MORE PARAMETERS THAN THE SAMPLE SIZE IN ONE OF THE GROUPS.

WARNING: THE RESIDUAL COVARIANCE MATRIX (THETA) IN GROUP GRAD IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR AN OBSERVED VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO OBSERVED VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO OBSERVED VARIABLES. CHECK THE RESULTS SECTION FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE DE4.
Atypical Solutions: Sampling Fluctuations
• The model is specified correctly
• You don't have outliers in your data, and you have a large enough sample to estimate the model at hand
• Yet the model is flagged as not identified, even though you have positive degrees of freedom

9. Possible tests to confirm you have sampling fluctuations and not some other problem (the first three are sketched below):
• Confidence interval from the standard error includes zero
• Calculate a “z” by taking the ratio estimate/standard error, and then compare it to a z distribution
• Wald test: take the ratio (estimate/standard error)², then compare it to a chi-square distribution with 1 df
• Likelihood ratio test statistic
• Lagrange multiplier test (modification indices when the variance is constrained to 0)
• Bootstrap resampling method (esp. with non-normal data)
• Scaled chi-square difference test
• Signed root tests
• Empirical sandwich estimators
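
A minimal sketch of the first three checks (confidence interval, z ratio, Wald test); the estimate and standard error are hypothetical numbers read off the output for a suspect variance.

```python
from scipy import stats

estimate, se = -0.04, 0.03          # hypothetical negative variance and its standard error

ci = (estimate - 1.96 * se, estimate + 1.96 * se)
print(ci)                                        # does the 95% CI include zero?

z = estimate / se
print(z, 2 * (1 - stats.norm.cdf(abs(z))))       # compare the ratio to a z distribution

wald = (estimate / se) ** 2
print(wald, 1 - stats.chi2.cdf(wald, df=1))      # compare to a chi-square with 1 df
```

If the interval spans zero and the tests are nonsignificant, the negative estimate is plausibly a sampling fluctuation rather than misspecification.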
Atypical Solutions: Sampling Fluctuations
• If the model is specified correctly, the data are free of outliers, the sample is adequate, and the tests above point to sampling fluctuation:
• Fix the negative variance to 0 or to a small positive number
• Be aware that this constraint can affect other model parameters
Handout
• Decision tree suggested by Chen et al. (2001):
• 1. Is your model identified?
• 2. If so, do you have any negative error variances?
• 3. If so, do you have any outliers that are a problem?
• 4. If not, is the model empirically under-identified?
• 5. If not, do you have sampling fluctuations?
• 6. If so, constrain the negative variance to be 0, a small positive number, or the population variance
• Newsom (2012) prevention tips:
• Careful specification
• Use larger samples
• Model factors with 3 or more indicators
• Use reliable measures (high loadings)
• Well-conditioned data
Working Example
• See Amos program

• Depending on time, manipulate an example to show what errors commonly occur, what the program tells you, and how to fix the problems
Conclusion
• Either…
• Work with perfect data and perfect models
• OR
• Learn to interpret SEM error messages and how to fix common problems
References
• Chen, F., Bollen, K. A., Paxton, P., Curran, P., & Kirby, J. (2001). Improper solutions in structural equation models: Causes, consequences, and strategies. Sociological Methods and Research, 29, 468-508.
• Kline, R. B. (2011). Principles and practice of structural equation modeling (3rd ed.). New York, NY: Guilford Press.
• Kolenikov, S., & Bollen, K. A. (2012). Testing negative error variances: Is a Heywood case a symptom of misspecification? Sociological Methods and Research, 41, 124-167.
