Cfabase
Preface
The data set and the models evaluated are those used by James Boswell in his APSY613 Multivariate
Analysis class in the Psychology Department at the University at Albany. The data set is the WISC-R data
set that the multivariate statistics textbook by Tabachnick and colleagues (Tabachnick et al., 2019) employs to
illustrate confirmatory factor analysis. The goal of this document is to outline the rudiments of Confirmatory
Factor Analysis strategies implemented with three different packages in R. The illustrations here attempt to
match the approach taken by Boswell with SAS. The document is targeted to UAlbany graduate students
who have already had instruction in R in their introductory statistics courses.
This book/monograph uses the bookdown package (Xie, 2018a) for R (R Core Team, 2018), which was
built on top of rmarkdown (Allaire et al., 2018) and knitr (Xie, 2015). RStudio (RStudio Team, 2015)
was used for all writing and programming.
Chapter 1
This short monograph outlines three approaches to implementing Confirmatory Factor Analysis with R,
by using three separate packages. The illustration is simple, employing a 175-case data set of scores on
subtests of the WISC-R. The idea is to fit a two-factor model where the two latent factors are the verbal and
performance constructs. In this primary two-factor model, each observed variable is associated with only one
latent factor. Then a second model is fit that includes paths from both latent factors to one of the variables.
Comparisons of the models are then performed.
Several R packages are required for the implementations outlined in the succeeding chapters. Since CFA
is implemented as a structural equation model, commercial software (e.g., LISREL, EQS, SAS) as well as
open-source approaches to CFA all use SEM routines. The three primary R packages used here to illustrate
CFA are lavaan, sem, and OpenMx, along with the drawing package semPlot. One major advantage of using
R for implementation of these methods is that semPlot provides a user-friendly method for producing
path diagrams of many styles by simply taking a model object from the CFA fitting functions of the other
packages.
Other “housekeeping” packages are loaded here, but the three analytical packages for CFA are loaded at
the point in the sequence where they are used, since some function names are shared among them; thus load
order is important.
library(car)
library(semPlot)
library(psych)
library(knitr)
library(kableExtra)
library(MVN)
library(dplyr)
library(magrittr)
library(tidyr)
library(corrplot)
library(ggraph)
Package citations for packages loaded here (in the above order): car (Fox et al., 2018), semPlot (Epskamp,
with contributions from Simon Stuber, 2017), psych (Revelle, 2019), knitr (Xie, 2018b), kableExtra
(Zhu, 2019), MVN (Korkmaz et al., 2018), dplyr (Wickham et al., 2018), magrittr (Bache and Wickham,
2014), tidyr (Wickham and Henry, 2018), corrplot (Wei and Simko, 2017).
Package citations for packages loaded elsewhere in this document: bookdown (Xie, 2018a), rmarkdown
(Allaire et al., 2018), sem (Fox et al., 2017), lavaan (Rosseel, 2018), OpenMx (Boker et al., 2019).
1.1. CAVEAT ON THIS DOCUMENT 5
1.2 Resources
The following list will provide a good start for those needing a broader background in CFA modeling and more
detailed sources for the primary packages employed in this document.
• A comprehensive textbook treatment of SEM and CFA: (Tabachnick et al., 2019)
• Tim Brown’s well-regarded book on CFA: (Brown, 2015)
• Rosseel’s extensive original article on lavaan: (Rosseel, 2012)
• El-Sheikh et al. on a comparison of software for SEM: (El-Sheikh et al., 2017)
• Narayanan’s review of eight SEM software approaches: (Narayanan, 2012)
• Epskamp’s helpful original article on the semPlot package: (Narayanan, 2012)
In addition, the following internet resources can be helpful.
• lavaan package home: [http://lavaan.ugent.be/]
• Google Group for lavaan: [https://groups.google.com/forum/#!forum/lavaan]
• OpenMx package home: [https://openmx.ssri.psu.edu/wiki/projects]
• OpenMx package online forums: [https://openmx.ssri.psu.edu/forums]
• sem package page on CRAN: [https://cran.r-project.org/web/packages/sem/index.html]
• lavaan package page on CRAN: [https://cran.r-project.org/web/packages/lavaan/index.html]
• OpenMx package page on CRAN: [https://cran.r-project.org/web/packages/OpenMx/index.html]
• MVN package page on CRAN: [https://cran.r-project.org/web/packages/MVN/index.html]
• semPlot package page on CRAN: [https://cran.r-project.org/web/packages/semPlot/index.html]
Chapter 2
This chapter prepares the data set and does some univariate and multivariate description of its characteristics
prior to the CFA implementation in later chapters. Both numeric and graphical description and inference
about distribution shape are quickly available with R functions from the psych and MVN packages.
ID info comp arith simil vocab digit pictcomp parang block object coding
3 8 7 13 9 12 9 6 11 12 7 9
4 9 6 8 7 11 12 6 8 7 12 14
5 13 18 11 16 15 6 18 8 11 12 9
6 8 11 6 12 9 7 13 4 7 12 11
7 10 3 8 9 12 9 7 7 11 4 10
8 11 7 15 12 10 12 6 12 10 5 10
2.2. NUMERIC AND GRAPHICAL DESCRIPTION OF THE DATA 7
A note about tables in this document: Many of the tables generated by the various R functions in this
document are reformatted so that they do not appear as the plain text that is typically output into the R
console. The kable function in the knitr package permits formatting that is well-rendered with rmarkdown
and bookdown document production. kable is used frequently.
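As a rough illustration of the reformatting described here, the sketch below passes a small data frame to kable. The data frame name (wisc_head) is hypothetical, it simply retypes three cases and three variables from the listing above, and the caption text is an assumption made for illustration.

```r
library(knitr)

# Hypothetical stand-in: three cases retyped from the data listing above
wisc_head <- data.frame(
  ID    = c(3, 4, 5),
  info  = c(8, 9, 13),
  comp  = c(7, 6, 18),
  arith = c(13, 8, 11)
)

# kable() returns a formatted table object rather than plain console text
tab <- kable(wisc_head, caption = "First three cases (illustrative subset)")
tab
```

In an rmarkdown or bookdown chunk, the table object is rendered in the output format of the document (pdf or html) rather than as console text.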
[Figure: density plots of the eleven subtest variables, one panel per variable.]
The MVN package permits a good array of diagnostic tests/plots for univariate/multivariate shape and
outliers.
[Figure: chi-square Q-Q plot for multivariate outlier detection. Outliers (n=16), Non-outliers (n=159). Quantile: 21.92]
[Figure: chi-square Q-Q plot for multivariate outlier detection. Outliers (n=9), Non-outliers (n=166). Quantile: 26.051]
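The cutoff labeled “Quantile: 21.92” in the first plot appears to correspond to the 97.5th percentile of a chi-square distribution with 11 degrees of freedom, one per variable. The base R sketch below reproduces that cutoff and flags cases by squared Mahalanobis distance. It uses simulated data (sim, a hypothetical stand-in, since the WISC-R data frame is not reproduced here) and is not the MVN package's own implementation.

```r
# Chi-square cutoff for p = 11 variables at the 97.5th percentile
p <- 11
cutoff <- qchisq(0.975, df = p)
round(cutoff, 2)

# Simulated stand-in data: 175 cases, 11 independent normal variables
set.seed(1)
sim <- matrix(rnorm(175 * p), nrow = 175, ncol = p)

# Squared Mahalanobis distance of each case from the centroid
d2 <- mahalanobis(sim, center = colMeans(sim), cov = cov(sim))

# Cases whose distance exceeds the cutoff are candidate multivariate outliers
flagged <- which(d2 > cutoff)
length(flagged)
```

The second plot uses a different (larger) cutoff, so it flags fewer cases; how that cutoff is derived is not reconstructed here.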
2.3.1 SPLOM
Among the many scatterplot matrix capabilities in R, John Fox’s scatterplotMatrix function in his car
package has probably been seen by most students.
# small points, thin loess smooth (red) and thin regression line (black)
scatterplotMatrix(wisc2,cex=.2,
smooth=list(col.smooth="red", spread=FALSE, lwd.smooth=.3),
col="skyblue1",
regLine=list(lwd=.3,col="black"))
[Figure: scatterplot matrix (SPLOM) of the eleven variables: info, comp, arith, simil, vocab, digit, pictcomp, parang, block, object, coding.]
Even with some control over colors and sizes of points/lines, this SPLOM probably has too many variables
to be effective - each plot is very small. Nonetheless, the sense of fairly linear relationships among all pairs
is somewhat apparent, as is the relative univariate normality of each of the eleven.
Note that the image can be enlarged if the reader is using a pdf version of this document simply by using
the increase/decrease size capability of pdf readers. If the user is reading an html version of this document,
then try to do a right mouse click on the image and “view image” (in Windows). Then the image can be
increased in size in a browser.
2.4. COVARIANCES AND ZERO ORDER CORRELATIONS 13
info comp arith simil vocab digit pictcomp parang block object coding
info 8.481 4.034 3.322 4.758 5.338 2.720 1.965 1.561 1.808 1.531 0.059
comp 4.034 8.793 2.684 4.816 4.621 1.891 3.540 1.471 2.966 2.718 0.517
arith 3.322 2.684 5.322 2.713 2.621 1.678 1.052 1.391 1.701 0.282 0.598
simil 4.758 4.816 2.713 10.136 5.022 2.234 3.450 2.524 2.255 2.433 -0.372
vocab 5.338 4.621 2.621 5.022 8.601 2.334 2.456 1.031 2.364 1.546 0.842
digit 2.720 1.891 1.678 2.234 2.334 7.313 0.597 1.066 0.533 0.267 1.344
pictcomp 1.965 3.540 1.052 3.450 2.456 0.597 8.610 1.941 3.038 3.032 -0.605
parang 1.561 1.471 1.391 2.524 1.031 1.066 1.941 7.074 2.532 1.916 0.289
block 1.808 2.966 1.701 2.255 2.364 0.533 3.038 2.532 7.343 3.077 0.832
object 1.531 2.718 0.282 2.433 1.546 0.267 3.032 1.916 3.077 8.088 0.433
coding 0.059 0.517 0.598 -0.372 0.842 1.344 -0.605 0.289 0.832 0.433 8.249
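The zero-order correlations are a rescaling of the covariances shown above. As a small worked check, the sketch below retypes the 3 x 3 corner of the covariance matrix for info, comp, and arith and converts it with the base R function cov2cor. The object names S and R are chosen here for illustration only.

```r
# 3 x 3 corner of the covariance matrix above (info, comp, arith)
vars <- c("info", "comp", "arith")
S <- matrix(c(8.481, 4.034, 3.322,
              4.034, 8.793, 2.684,
              3.322, 2.684, 5.322),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# cov2cor() divides each covariance by the product of the two standard deviations
R <- cov2cor(S)
round(R, 2)
```

With the full data frame available, cov(wisc2) and cor(wisc2) would give the complete matrices directly.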
14 CHAPTER 2. PREPARE AND DESCRIBE THE DATA
We can use the corrplot package to produce a useful combination of a schematic and correlation matrix.
mat1 <- cor(wisc2)
corrplot(mat1,type="upper",tl.pos="tp")
corrplot(mat1,add=TRUE,type="lower", method="number",
col="black", diag=FALSE,tl.pos="n", cl.pos="n")
[Figure: corrplot of the correlation matrix, with a color schematic in the upper triangle and numeric correlations in the lower triangle (e.g., info-comp 0.47, info-arith 0.49, comp-arith 0.39).]
One of the primary tools for SEM in R is the lavaan package. It permits path specification with a simple
syntax.
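To make the syntax concrete, the sketch below writes the two-factor model used in this chapter as a lavaan model string (the same specification that is printed later in this chapter) and expands it into a parameter table with lavaanify, without fitting anything. The =~ operator can be read as “is measured by”. The arguments passed to lavaanify are an assumption about what the cfa defaults amount to when std.lv=TRUE; they are not taken from the original analysis.

```r
library(lavaan)

# Two-factor model: each latent factor is "measured by" (=~) its indicators
wisc.model1 <- "verbal      =~ info + comp + arith + simil + vocab + digit
                performance =~ pictcomp + parang + block + object + coding"

# Expand the string into a parameter table without fitting a model.
# Assumed settings: factor variances fixed at 1 (std.lv), residual variances
# added (auto.var), and the latent covariance added (auto.cov.lv.x).
pt <- lavaanify(wisc.model1, std.lv = TRUE,
                auto.var = TRUE, auto.cov.lv.x = TRUE)

pt[, c("lhs", "op", "rhs", "free")]

# Number of freely estimated parameters implied by the specification
sum(pt$free > 0)
```

The count of free parameters can be compared with the “Number of free parameters” line in the summary output of the fitted model.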
Fit the model and obtain the basic summary. Note that in this default approach, the latent factors are
permitted to covary and the model estimates this covariance.
One R syntax note: the call to the cfa function is written as lavaan::cfa(.....) to ensure that there is
no ambiguity about which cfa function is used; it is the one from the lavaan package. This precludes confusion
when multiple packages contain functions with the same name, as is the case with both lavaan and sem,
which is also used in this document. Even though sem is loaded later in this document, if there is a chance
that it may exist in the R environment simultaneously with lavaan, then the approach here is safer.
fit1 <- lavaan::cfa(wisc.model1, data=wisc2,std.lv=TRUE)
summary(fit1, fit.measures=T,standardized=T)
16 CHAPTER 3. USING THE LAVAAN PACKAGE FOR CFA
##
## Estimator ML
## Model Fit Test Statistic 70.640
## Degrees of freedom 43
## P-value (Chi-square) 0.005
##
## Model test baseline model:
##
## Minimum Function Test Statistic 519.204
## Degrees of freedom 55
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 0.940
## Tucker-Lewis Index (TLI) 0.924
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -4491.822
## Loglikelihood unrestricted model (H1) -4456.502
##
## Number of free parameters 23
## Akaike (AIC) 9029.643
## Bayesian (BIC) 9102.433
## Sample-size adjusted Bayesian (BIC) 9029.600
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.061
## 90 Percent Confidence Interval 0.033 0.085
## P-value RMSEA <= 0.05 0.233
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.059
##
## Parameter Estimates:
##
## Information Expected
## Information saturated (h1) model Structured
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## verbal =~
## info 2.206 0.200 11.029 0.000 2.206 0.760
## comp 2.042 0.210 9.709 0.000 2.042 0.691
## arith 1.300 0.172 7.555 0.000 1.300 0.565
## simil 2.232 0.225 9.940 0.000 2.232 0.703
## vocab 2.250 0.200 11.225 0.000 2.250 0.770
## digit 1.053 0.212 4.967 0.000 1.053 0.390
## performance =~
## pictcomp 1.742 0.242 7.187 0.000 1.742 0.595
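The incremental and absolute fit indices in this output can be recovered, at least approximately, from the two chi-square statistics. The sketch below applies the usual textbook formulas for CFI, TLI, and RMSEA to the values printed above. It is a hand check under those formulas, not lavaan's internal code, and conventions differ across programs on whether N or N - 1 enters the RMSEA.

```r
# Values copied from the lavaan summary above
chisq_m <- 70.640;  df_m <- 43   # user (hypothesized) model
chisq_b <- 519.204; df_b <- 55   # baseline (independence) model
n <- 175                         # number of cases

# Comparative Fit Index: 1 minus the ratio of noncentrality estimates
cfi <- 1 - max(chisq_m - df_m, 0) / max(chisq_b - df_b, chisq_m - df_m, 0)

# Tucker-Lewis Index: based on the chi-square / df ratios of the two models
tli <- ((chisq_b / df_b) - (chisq_m / df_m)) / ((chisq_b / df_b) - 1)

# RMSEA: excess chi-square per degree of freedom, scaled by sample size
rmsea <- sqrt(max((chisq_m - df_m) / (df_m * n), 0))

round(c(CFI = cfi, TLI = tli, RMSEA = rmsea), 3)
```

The rounded results can be compared with the CFI, TLI, and RMSEA lines of the summary.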
3.1. IMPLEMENT THE CFA, FIRST MODEL 17
info comp arith simil vocab digit pictcomp parang block object coding
info 0.000 -0.058 0.065 -0.021 0.040 0.049 -0.036 -0.010 -0.076 -0.068 -0.025
comp -0.058 0.000 0.002 0.025 0.000 -0.034 0.165 -0.006 0.091 0.092 0.031
arith 0.065 0.002 0.000 -0.028 -0.047 0.048 -0.043 0.069 0.045 -0.145 0.066
simil -0.021 0.025 -0.028 0.000 -0.003 -0.015 0.123 0.102 -0.021 0.034 -0.071
vocab 0.040 0.000 -0.047 -0.003 0.000 -0.006 0.016 -0.082 -0.012 -0.071 0.067
digit 0.049 -0.034 0.048 -0.015 -0.006 0.000 -0.062 0.040 -0.084 -0.095 0.156
pictcomp -0.036 0.165 -0.043 0.123 0.016 -0.062 0.000 -0.033 -0.025 0.026 -0.115
parang -0.010 -0.006 0.069 0.102 -0.082 0.040 -0.033 0.000 0.028 -0.014 0.004
block -0.076 0.091 0.045 -0.021 -0.012 -0.084 -0.025 0.028 0.000 0.013 0.058
object -0.068 0.092 -0.145 0.034 -0.071 -0.095 0.026 -0.014 0.013 0.000 0.012
coding -0.025 0.031 0.066 -0.071 0.067 0.156 -0.115 0.004 0.058 0.012 0.000
norm plot
# extract the residual correlations from the fit1 model
# get rid of the duplicates and diagonal values
# create a vector for a normal Q-Q plot
res1 <- residuals(fit1, type = "cor")$cov
res1[upper.tri(res1,diag=TRUE)] <- NA
v1 <- as.vector(res1)
v2 <- v1[!is.na(v1)]
qqPlot(v2,id=FALSE)
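The masking step above keeps one copy of each off-diagonal residual. An equivalent and slightly shorter route, sketched below on the 3 x 3 corner of the residual table (info, comp, arith), is to index the matrix directly with lower.tri. The object names here are illustrative and are not part of the original code.

```r
# 3 x 3 corner of the residual correlation matrix above (info, comp, arith)
vars <- c("info", "comp", "arith")
res_small <- matrix(c( 0.000, -0.058, 0.065,
                      -0.058,  0.000, 0.002,
                       0.065,  0.002, 0.000),
                    nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# Keep only the elements below the diagonal: one copy of each residual
v_small <- res_small[lower.tri(res_small)]
v_small

# For the full 11-variable matrix, the number of unique residuals is
choose(11, 2)
```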
[Figure: normal Q-Q plot of the residual correlations (v2) against norm quantiles.]
The path diagram produced here takes control over font/label sizes, display of residuals, and color of
paths/coefficients. These and many more control options are available. There is a challenge in producing these
path diagrams to have font sizes large enough for most humans to read. I’ve taken control of the font sizes for
the “edges” with a cex argument (edge.label.cex). But this causes overlap in the values if the default layout is
used. I found that “circle2” worked best here.
# Note that the base plot, including standardized path coefficients plots positive coefficients green
# and negative coefficients red. Red-green colorblindness issues anyone?
# I redrew it here to choose a blue and red. But all the coefficients in this example are
# positive,so they are shown with the skyblue.
# more challenging to use colors other than red and green. not in this doc
semPaths(fit1, residuals=F,sizeMan=7,"std",
posCol=c("skyblue4", "red"),
#edge.color="skyblue4",
edge.label.cex=1.2,layout="circle2")
[Figure: semPaths path diagrams of the first model with standardized estimates. Loadings on verbal: info 0.76, comp 0.69, arith 0.57, simil 0.70, vocab 0.77, digit 0.39. Loadings on performance: pictcomp 0.60, parang 0.47, block 0.68, object 0.57, coding 0.07. Factor correlation: 0.59.]
3.2.1 Add a path (Perf to comp) and Fit the second CFA model
Define the additional path in the model text string.
wisc.model2 <- 'verbal =~ info + comp + arith + simil + vocab + digit
performance =~ pictcomp + parang + block + object + coding + comp'
## verbal =~
## info 2.256 0.199 11.318 0.000 2.256 0.777
## comp 1.491 0.254 5.877 0.000 1.491 0.504
## arith 1.307 0.172 7.584 0.000 1.307 0.568
## simil 2.205 0.226 9.748 0.000 2.205 0.695
## vocab 2.273 0.201 11.329 0.000 2.273 0.777
## digit 1.075 0.212 5.068 0.000 1.075 0.399
## performance =~
## pictcomp 1.790 0.239 7.495 0.000 1.790 0.612
## parang 1.189 0.224 5.317 0.000 1.189 0.448
## block 1.823 0.219 8.334 0.000 1.823 0.675
## object 1.633 0.233 7.010 0.000 1.633 0.576
## coding 0.200 0.253 0.793 0.428 0.200 0.070
## comp 0.884 0.266 3.324 0.001 0.884 0.299
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## verbal ~~
## performance 0.533 0.081 6.594 0.000 0.533 0.533
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .info 3.343 0.502 6.665 0.000 3.343 0.396
## .comp 4.331 0.554 7.819 0.000 4.331 0.495
## .arith 3.584 0.420 8.533 0.000 3.584 0.677
## .simil 5.217 0.675 7.726 0.000 5.217 0.518
## .vocab 3.383 0.508 6.655 0.000 3.383 0.396
## .digit 6.116 0.677 9.032 0.000 6.116 0.841
## .pictcomp 5.358 0.741 7.231 0.000 5.358 0.626
## .parang 5.620 0.663 8.480 0.000 5.620 0.799
## .block 3.979 0.623 6.385 0.000 3.979 0.545
## .object 5.376 0.707 7.603 0.000 5.376 0.668
## .coding 8.162 0.874 9.337 0.000 8.162 0.995
## verbal 1.000 1.000 1.000
## performance 1.000 1.000 1.000
It is simple to obtain the full parameter list, but I prefer to use kable for tables when I can.
parameterEstimates(fit2,standardized=TRUE)
info comp arith simil vocab digit pictcomp parang block object coding
info 0.000 -0.049 0.053 -0.026 0.021 0.036 -0.023 0.016 -0.050 -0.054 -0.022
comp -0.049 0.000 0.015 0.049 0.015 -0.029 0.059 -0.068 -0.014 -0.005 0.021
arith 0.053 0.015 0.000 -0.025 -0.054 0.043 -0.030 0.091 0.068 -0.131 0.069
simil -0.026 0.049 -0.025 0.000 -0.002 -0.017 0.143 0.132 0.012 0.056 -0.067
vocab 0.021 0.015 -0.054 -0.002 0.000 -0.016 0.032 -0.054 0.018 -0.053 0.071
digit 0.036 -0.029 0.043 -0.017 -0.016 0.000 -0.055 0.053 -0.071 -0.088 0.158
pictcomp -0.023 0.059 -0.030 0.143 0.032 -0.055 0.000 -0.025 -0.031 0.011 -0.115
parang 0.016 -0.068 0.091 0.132 -0.054 0.053 -0.025 0.000 0.049 -0.005 0.006
block -0.050 -0.014 0.068 0.012 0.018 -0.071 -0.031 0.049 0.000 0.011 0.060
object -0.054 -0.005 -0.131 0.056 -0.053 -0.088 0.011 -0.005 0.011 0.000 0.013
coding -0.022 0.021 0.069 -0.067 0.071 0.158 -0.115 0.006 0.060 0.013 0.000
[Figure: semPaths path diagrams of the second model with standardized estimates. Loadings on verbal: info 0.78, arith 0.57, simil 0.69, vocab 0.78, digit 0.40, comp 0.50. Loadings on performance: comp 0.30, pictcomp 0.61, parang 0.45, block 0.67, object 0.58, coding 0.07. Factor correlation: 0.53.]
assumption. Different “dialects” exist for the various software products. The commercial products may use
proprietary algorithms whose details are not available for inspection. The open-source products
(e.g., lavaan) have made source code available for inspection.
The perceptive reader will notice that the default solutions given by the three R packages employed in this
document (lavaan, sem, OpenMx) do not all give the same values for parameter estimates and goodness of fit
statistics for the same two models run with each, and that there are slight differences in these quantities
compared to the published LISREL analysis of the same data in the Tabachnick et al. (2019) textbook.
Other readers will have noticed that the LISREL output in this textbook matches SAS output that they
may have seen with class coverage of the CFA topic. The default lavaan values, in particular, vary
slightly from the LISREL and SAS values. The degree of difference is slight, but its existence may puzzle
some students. The answer to understanding these differences goes well beyond the scope of this document
and involves an advanced understanding of the computational algorithms employed. Even though all of the
approaches are Maximum Likelihood methods, some optimization and estimation strategies can differ.
The lavaan package permits some insight into this with one of the arguments available for the cfa function
used in this chapter. The reader might want to examine the help docs for this function: ?cfa. That help
page directs readers to another help document on lavaan options (lavOptions). There, one can find an
argument that can be passed to cfa, called “mimic”. Here is the text from that section, describing the
various “mimic” possibilities:
“If "Mplus", an attempt is made to mimic the Mplus program. If "EQS", an attempt is made to mimic the
EQS program. If "default", the value is (currently) set to "lavaan", which is very close to "Mplus".”
The reader may have been exposed to a SAS PROC CALIS approach to this problem that employed the default
method, called LINEQS. The following rerun of the first model from earlier in this chapter sets the mimic
argument to “EQS”. The product of this model is a set of parameter values and fit statistics
(e.g., chi-square) that match the SAS output (and the Tabachnick et al. LISREL output) exactly. Demonstrating
the equivalence with the addition of the mimic argument does not fully explain why such differences originally
existed, but that, again, is well beyond the scope of this document. The reference section of this document
includes some articles that address these differences (El-Sheikh et al., 2017; Narayanan, 2012). Rosseel
(2012) has discussed use of the ‘mimic’ argument, which guides lavaan to emulate the approach of some of the
commercial products. His website is also a valuable resource on this [http://lavaan.ugent.be/].
wisc.model1 <- "verbal =~ info + comp + arith + simil + vocab + digit
performance =~ pictcomp + parang + block + object + coding"
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 0.941
## Tucker-Lewis Index (TLI) 0.924
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -4497.337
## Loglikelihood unrestricted model (H1) -4462.018
##
## Number of free parameters 23
## Akaike (AIC) 9040.675
## Bayesian (BIC) 9113.465
## Sample-size adjusted Bayesian (BIC) 9040.631
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.060
## 90 Percent Confidence Interval 0.033 0.085
## P-value RMSEA <= 0.05 0.239
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.059
##
## Parameter Estimates:
##
## Information Expected
## Information saturated (h1) model Structured
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## verbal =~
## info 2.212 0.201 10.997 0.000 2.212 0.760
## comp 2.048 0.212 9.682 0.000 2.048 0.691
## arith 1.304 0.173 7.534 0.000 1.304 0.565
## simil 2.238 0.226 9.911 0.000 2.238 0.703
## vocab 2.257 0.202 11.193 0.000 2.257 0.770
## digit 1.056 0.213 4.952 0.000 1.056 0.390
## performance =~
## pictcomp 1.747 0.244 7.166 0.000 1.747 0.595
## parang 1.257 0.226 5.566 0.000 1.257 0.473
## block 1.851 0.223 8.287 0.000 1.851 0.683
## object 1.609 0.237 6.780 0.000 1.609 0.566
## coding 0.208 0.256 0.811 0.417 0.208 0.072
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## verbal ~~
## performance 0.589 0.076 7.792 0.000 0.589 0.589
##
## Variances:
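One pattern in the printed numbers is worth a quick arithmetic check, offered as an observation rather than a full explanation. The default lavaan chi-square (70.640) and the verbal-factor loadings from the default fit appear to differ from their mimic = "EQS" counterparts by factors involving N and N - 1 only, which would be consistent with the two approaches scaling the sample covariance matrix by N versus N - 1. The sketch below uses only values copied from the outputs in this chapter; the 70.236 it is compared against is the model chi-square reported by the sem package later in this document.

```r
n <- 175

# Chi-square from the default lavaan fit, rescaled by (N - 1) / N
chisq_default  <- 70.640
chisq_rescaled <- chisq_default * (n - 1) / n
round(chisq_rescaled, 3)

# Verbal-factor loadings: default fit vs. mimic = "EQS" fit (copied from the outputs)
load_default <- c(info = 2.206, comp = 2.042, arith = 1.300,
                  simil = 2.232, vocab = 2.250, digit = 1.053)
load_eqs     <- c(info = 2.212, comp = 2.048, arith = 1.304,
                  simil = 2.238, vocab = 2.257, digit = 1.056)

# Rescale the default loadings by sqrt(N / (N - 1)) and compare
load_rescaled <- load_default * sqrt(n / (n - 1))
round(cbind(load_rescaled, load_eqs), 3)
```

Because the inputs are already rounded to three decimals, agreement to within about 0.001 is the most that can be expected from this check.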
3.4. AN ADDITIONAL PERSPECTIVE ON ESTIMATION AND OPTIMIZATION 29
In this chapter, we use the sem package to implement the same two CFA analyses that we produced with
lavaan in chapter 3. sem provides an equally simple way to obtain the models, and only the basics are
shown here. The code in this chapter is modeled after a document by James Steiger.
4.1. EXAMPLE ONE 31
• A unique name given as the parameter symbol marks a free parameter.
• If the starting value is NA, then the starting value is determined by the system.
• A numerical value following an NA parameter name fixes the parameter at that value. E.g., F1 <-> F1, NA, 1
would fix the factor variance at 1.
• Unique variances can be specified for manifest variables, e.g., manifest1 <-> manifest1, e1, NA. If
they are not specified, they default to “free to vary”.
• Factors can be set to a fixed relationship to each other, e.g., 0 or 1. Or they can be left free (estimable),
as in the example here.
# this text could have been saved in a file and read in with the file argument to be efficient
# commented here to show the argument options
# m1.model <- specifyModel(file="sem1.txt")
m1.model <- specifyModel(text="
## Factor 1 is Verbal
Verbal -> info, t01, NA
Verbal -> comp, t02, NA
Verbal -> arith, t03, NA
Verbal -> simil, t04, NA
Verbal -> vocab, t05, NA
Verbal -> digit, t06, NA
## Factor 2 is performance
Performance -> pictcomp, t07, NA
Performance -> parang, t08, NA
Performance -> block, t09, NA
Performance -> object, t10, NA
Performance -> coding, t11, NA
## Set factor variances
Verbal <-> Verbal, NA, 1
Performance <-> Performance, NA, 1
## Set factor covariance to be estimable
Verbal <-> Performance, p1, NA"
)
This m1.model is now available to be used in the model-fitting function. The first “note” refers to the fact
that the model can be specified interactively (I think). I found it easier to do it this way, and more reproducible.
32 CHAPTER 4. USING THE SEM PACKAGE FOR CFA
##
## Model Chisquare = 70.236 Df = 43 Pr(>Chisq) = 0.0054454
## AIC = 116.24
## BIC = -151.85
##
## Normalized Residuals
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.88413 -0.41375 -0.00001 0.03511 0.45976 2.11208
##
## R-square for Endogenous Variables
## info comp arith simil vocab digit pictcomp parang
## 0.5772 0.4770 0.3193 0.4944 0.5922 0.1524 0.3545 0.2233
## block object coding
## 0.4667 0.3202 0.0052
##
## Parameter Estimates
## Estimate Std Error z value Pr(>|z|)
## t01 2.21249 0.201190 10.99700 3.9507e-28
## t02 2.04806 0.211541 9.68163 3.6090e-22
## t03 1.30357 0.173035 7.53358 4.9367e-14
## t04 2.23845 0.225846 9.91142 3.7135e-23
## t05 2.25691 0.201642 11.19269 4.4281e-29
## t06 1.05579 0.213185 4.95244 7.3287e-07
## t07 1.74699 0.243776 7.16638 7.7008e-13
## t08 1.25683 0.225785 5.56647 2.5995e-08
## t09 1.85123 0.223392 8.28693 1.1621e-16
## t10 1.60918 0.237324 6.78052 1.1974e-11
## t11 0.20761 0.255890 0.81132 4.1718e-01
## p1 0.58883 0.075569 7.79198 6.5965e-15
## V[info] 3.58620 0.511281 7.01414 2.3137e-12
## V[comp] 4.59856 0.590106 7.79277 6.5556e-15
## V[arith] 3.62254 0.423850 8.54675 1.2660e-17
## V[simil] 5.12484 0.667296 7.68001 1.5908e-14
## V[vocab] 3.50720 0.510787 6.86626 6.5906e-12
## V[digit] 6.19783 0.686345 9.03020 1.7136e-19
## V[pictcomp] 5.55770 0.763882 7.27560 3.4488e-13
## V[parang] 5.49428 0.663998 8.27454 1.2895e-16
## V[block] 3.91612 0.645638 6.06550 1.3154e-09
## V[object] 5.49872 0.725619 7.57798 3.5099e-14
## V[coding] 8.20596 0.881552 9.30853 1.2961e-20
##
## t01 info <--- Verbal
## t02 comp <--- Verbal
## t03 arith <--- Verbal
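The “R-square for Endogenous Variables” block can be reproduced from the parameter estimates. Because the factor variances are fixed at 1 and each variable loads on a single factor in this model, the explained variance of a variable is its squared loading, and the total is that plus its unique variance. The sketch below retypes the estimates printed above as a hand check; it is not the sem package's own computation.

```r
vars <- c("info", "comp", "arith", "simil", "vocab", "digit",
          "pictcomp", "parang", "block", "object", "coding")

# Loadings (t01-t11) and unique variances (V[...]) copied from the output above
loading <- c(2.21249, 2.04806, 1.30357, 2.23845, 2.25691, 1.05579,
             1.74699, 1.25683, 1.85123, 1.60918, 0.20761)
unique_var <- c(3.58620, 4.59856, 3.62254, 5.12484, 3.50720, 6.19783,
                5.55770, 5.49428, 3.91612, 5.49872, 8.20596)

# R-square = explained variance / (explained + unique variance)
r2 <- loading^2 / (loading^2 + unique_var)
names(r2) <- vars
round(r2, 4)
```

The denominators, loading^2 + unique_var, are the model-implied variances and can be compared with the diagonal of the covariance matrix in chapter 2.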
The table listing of parameter estimates uses the labeling strategy that was defined in the specifyModel
function. The names, such as t01, are arbitrary.
The semPaths function from semPlot is capable of recognizing a model object from sem. This code is
identical to what was used in chapter 3 for the lavaan object. The fit object name is just changed to m1
here.
# Note that the base plot, including standardized path coefficients plots positive coefficients green
# and negative coefficients red. Red-green colorblindness issues anyone?
# I redrew it here to choose a blue and red. But all the coefficients in this example are
# positive,so they are shown with the skyblue.
# more challenging to use colors other than red and green. not in this doc
semPaths(m1, residuals=F,sizeMan=7,"std",
posCol=c("skyblue4", "red"),
#edge.color="skyblue4",
edge.label.cex=1.2,layout="circle2")
[Figure: semPaths path diagrams of the first model from the sem fit, with standardized estimates. Factor correlation: 0.59.]
##
## Model Chisquare = 60.295 Df = 42 Pr(>Chisq) = 0.033354
## AIC = 108.3
## BIC = -156.63
##
## Normalized Residuals
## Min. 1st Qu. Median Mean 3rd Qu. Max.
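Since the first model is nested within the second (the second adds one path, from Performance to comp), the two can be compared with a chi-square difference test. The sketch below does this by hand in base R from the model chi-square values and degrees of freedom that sem reports for the two examples; it is an illustration of the arithmetic, not output from the original analysis.

```r
# Chi-square statistics and degrees of freedom reported by sem
chisq_1 <- 70.236; df_1 <- 43   # example one: each variable on one factor
chisq_2 <- 60.295; df_2 <- 42   # example two: adds Performance -> comp

# Chi-square difference (likelihood-ratio) test for the nested models
chisq_diff <- chisq_1 - chisq_2
df_diff    <- df_1 - df_2
p_value    <- pchisq(chisq_diff, df = df_diff, lower.tail = FALSE)

c(chisq_diff = chisq_diff, df_diff = df_diff, p_value = p_value)
```

A small p-value here would indicate that the added path improves fit beyond what one extra parameter would be expected to produce by chance.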
4.2. EXAMPLE TWO 37
The table listing of parameter estimates uses the labeling strategy that was defined in the specifyModel
function. The names, such as t01, are arbitrary.
The semPaths function from semPlot is capable of recognizing a model object from sem. This code is
identical to what was used in chapter 3 for the lavaan object. The fit object name is just changed to m2
here.
# Note that the base plot, including standardized path coefficients plots positive coefficients green
# and negative coefficients red. Red-green colorblindness issues anyone?
# I redrew it here to choose a blue and red. But all the coefficients in this example are
# positive,so they are shown with the skyblue.
# more challenging to use colors other than red and green. not in this doc
semPaths(m2, residuals=F,sizeMan=7,"std",
posCol=c("skyblue4", "red"),
#edge.color="skyblue4",
edge.label.cex=1.2,layout="circle2")
[Figure: semPaths path diagrams of the second model from the sem fit, with standardized estimates; comp loads on both factors (0.50 verbal, 0.30 performance). Factor correlation: 0.53.]
The OpenMx package in R is a port of the well-respected Mx analytical software. It handles SEM and
can easily be used for CFA. Here, the same two models that were run in lavaan will be run again, but an
additional model will be run first.
library(OpenMx)
42 CHAPTER 5. USING THE OPENMX PACKAGE FOR CFA
## NULL
names(cfa2a@output)
Now fit a model that is the same as the initial model fit with lavaan in chapter 3. Two factors, verbal and
performance are established with each manifest variable uniquely specified by only one of the two latent
factors.
## NULL
names(cfa2b@output)
The semPlot package is very powerful and can recognize many lm and SEM model objects. We can use the
identical code that we used in chapter 3 for the lavaan model.
# Note that the base plot, including standardized path coefficients plots positive coefficients green
# and negative coefficients red. Red-green colorblindness issues anyone?
# I redrew it here to choose a blue and red. But all the coefficients in this example are
# positive,so they are shown with the skyblue.
# more challenging to use colors other than red and green. not in this doc
semPaths(cfa2b, residuals=F,sizeMan=7,"std",
posCol=c("skyblue4", "red"),
#edge.color="skyblue4",
edge.label.cex=1.2,layout="circle2")
[Figure: semPaths path diagrams of the two-factor model from the OpenMx fit (cfa2b), with standardized estimates. Factor correlation: 0.59.]
5.3. THIRD OPENMX MODEL 49
##
## free parameters:
## name matrix row col Estimate
## 1 TwoFac, Comp Common.A[1,12] A info verbal 2.25606
## 2 TwoFac, Comp Common.A[2,12] A comp verbal 1.49130
## 3 TwoFac, Comp Common.A[3,12] A arith verbal 1.30678
## 4 TwoFac, Comp Common.A[4,12] A simil verbal 2.20475
## 5 TwoFac, Comp Common.A[5,12] A vocab verbal 2.27348
## 6 TwoFac, Comp Common.A[6,12] A digit verbal 1.07476
## 7 TwoFac, Comp Common.A[2,13] A comp perf 0.88414
## 8 TwoFac, Comp Common.A[7,13] A pictcomp perf 1.78962
## 9 TwoFac, Comp Common.A[8,13] A parang perf 1.18881
## 10 TwoFac, Comp Common.A[9,13] A block perf 1.82276
## 11 TwoFac, Comp Common.A[10,13] A object perf 1.63283
## 12 TwoFac, Comp Common.A[11,13] A coding perf 0.20047
## 13 e1 S info info 3.34302
## 14 e2 S comp comp 4.33109
## 15 e3 S arith arith 3.58375
## 16 e4 S simil simil 5.21667
## 17 e5 S vocab vocab 3.38297
## 18 e6 S digit digit 6.11561
## 19 e7 S pictcomp pictcomp 5.35770
## 20 e8 S parang parang 5.62019
## 21 e9 S block block 3.97877
## 22 e10 S object object 5.37584
## 23 e11 S coding coding 8.16174
## 24 latcov1 S verbal perf 0.53322
## Std.Error A
## 1 0.200315
## 2 0.256354
## 3 0.173048
## 4 0.227320
## 5 0.200572
## 6 0.212373
## 7 0.270140
## 8 0.242338
## 9 0.225100
## 10 0.221685
## 11 0.233235
## 12 0.255009
## 13 0.509394
## 14 0.557177
## 15 0.422007
## 16 0.682585
## 17 0.507400
## 18 0.677590
## 19 0.755634
## 20 0.665637
## 21 0.636905
## 22 0.708170
## 23 0.874112
## 24 0.082033
##
## Model Statistics:
## NULL
names(cfa2c@output)
The semPlot package is very powerful and can recognize many lm and SEM model objects. We can use
the identical code that we used in chapter 3 for the lavaan model.
# Note that the base plot, including standardized path coefficients plots positive coefficients green
# and negative coefficients red. Red-green colorblindness issues anyone?
# I redrew it here to choose a blue and red. But all the coefficients in this example are
# positive,so they are shown with the skyblue.
# more challenging to use colors other than red and green. not in this doc
semPaths(cfa2c, residuals=F,sizeMan=7,"std",
posCol=c("skyblue4", "red"),
#edge.color="skyblue4",
edge.label.cex=1.2,layout="circle2")
[Figure: semPaths path diagrams of the third OpenMx model (cfa2c), with standardized estimates; comp loads on both factors (0.50 verbal, 0.30 performance). Factor correlation: 0.53.]
Allaire, J., Xie, Y., McPherson, J., Luraschi, J., Ushey, K., Atkins, A., Wickham, H., Cheng, J., Chang, W.,
and Iannone, R. (2018). rmarkdown: Dynamic Documents for R. R package version 1.11.
Bache, S. M. and Wickham, H. (2014). magrittr: A Forward-Pipe Operator for R. R package version 1.5.
Boker, S. M., Neale, M. C., Maes, H. H., Spiegel, M., Brick, T. R., Estabrook, R., Bates, T. C., Gore, R. J.,
Hunter, M. D., Pritikin, J. N., Zahery, M., and Kirkpatrick, R. M. (2019). OpenMx: Extended Structural
Equation Modelling. R package version 2.12.1.
Brown, T. A. (2015). Confirmatory factor analysis for applied research. Methodology in the social sciences.
The Guilford Press, New York; London, second edition.
El-Sheikh, A. A., Abonazel, M. R., and Gamil, N. (2017). A review of software packages for structural
equation modeling: A comparative study. Applied Mathematics and Physics, 5(3):85–94.
Epskamp, S., with contributions from Simon Stuber (2017). semPlot: Path Diagrams and Visual Analysis
of Various SEM Packages’ Output. R package version 1.1.
Fox, J., Nie, Z., and Byrnes, J. (2017). sem: Structural Equation Models. R package version 3.1-9.
Fox, J., Weisberg, S., and Price, B. (2018). car: Companion to Applied Regression. R package version 3.0-2.
Korkmaz, S., Goksuluk, D., and Zararsiz, G. (2018). MVN: Multivariate Normality Tests. R package version
5.5.
Narayanan, A. (2012). A review of eight software packages for structural equation modeling. The American
Statistician, 66(2):129–138.
R Core Team (2018). R: A Language and Environment for Statistical Computing. R Foundation for Statistical
Computing, Vienna, Austria.
Revelle, W. (2019). psych: Procedures for Psychological, Psychometric, and Personality Research. R package
version 1.8.12.
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling.
Journal of Statistical Software, 48(2):1–36.
RStudio Team (2015). RStudio: Integrated Development Environment for R. RStudio, Inc., Boston, MA.
Tabachnick, B. G., Fidell, L. S., and Ullman, J. B. (2019). Using multivariate statistics. Pearson, Boston,
seventh edition.
Wei, T. and Simko, V. (2017). corrplot: Visualization of a Correlation Matrix. R package version 0.84.
Wickham, H., François, R., Henry, L., and Müller, K. (2018). dplyr: A Grammar of Data Manipulation. R
package version 0.7.8.
Wickham, H. and Henry, L. (2018). tidyr: Easily Tidy Data with ’spread()’ and ’gather()’ Functions. R
package version 0.8.2.
Xie, Y. (2015). Dynamic Documents with R and knitr. Chapman and Hall/CRC, Boca Raton, Florida, 2nd
edition. ISBN 978-1498716963.
Xie, Y. (2018a). bookdown: Authoring Books and Technical Documents with R Markdown. R package version
0.9.
Xie, Y. (2018b). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version
1.21.
Zhu, H. (2019). kableExtra: Construct Complex Table with ’kable’ and Pipe Syntax. R package version 1.0.1.