Coco Data Set
William Revelle
Department of Psychology
Northwestern University
Contents
1 Overview of this and related documents
1.1 ωh as an estimate of the general factor saturation of a test
1.1.1 But what about α
1.2 Install R for the first time
1.2.1 Install R
1.2.2 Install relevant packages
1 Overview of this and related documents
Doing basic and advanced personality and psychological research using R is not as
complicated as some think. This is one of a set of “How To” guides for doing various things
using R (R Core Team, 2017), particularly using the psych (Revelle, 2017) package.
The current list of How To’s includes:
1. An introduction (vignette) of the psych package
2. An overview (vignette) of the psych package
3. Installing R and some useful packages
4. Using R and the psych package to find ωh and ωt (this document).
5. Using R and the psych package for factor analysis and principal components analysis.
6. Using the scoreItems function to find scale scores and scale statistics.
7. Using mediate and setCor to do mediation, moderation and regression analysis
Cronbach’s coefficient alpha (Cronbach, 1951) is perhaps the most used (and most misused)
estimate of the internal consistency of a test. α may be found in the psych package using
the alpha function. However, two alternative estimates of reliability that take into account
the hierarchical structure of the inventory are McDonald’s ωh and ωt (McDonald, 1999).
These may be found in R in one step using one of two functions in the psych package: the
omega function for an exploratory analysis (see Figure 1) or omegaSem for a confirmatory
analysis using the sem package, based upon the exploratory solution from omega.
This guide explains how to do it for the non or novice R user. This set of instructions
is adapted from three different sets of notes that the interested reader might find helpful:
a set of slides developed for a two hour short course in R given for several years to the
Association for Psychological Science, a short guide to R for psychologists, and the
vignette for the psych package.
McDonald has proposed coefficient omega (hierarchical) (ωh) as an estimate of the general
factor saturation of a test. Zinbarg et al. (2005) and Revelle and Zinbarg (2009)
compare McDonald’s ωh to Cronbach’s α and Revelle’s β. They conclude that ωh is the
best estimate. (See also Zinbarg et al. (2006) and Revelle and Zinbarg (2009).)
By following these simple guides, you soon will be able to do such things as find ωh by
issuing just three lines of code:
R code
library(psych)
my.data <- read.clipboard()
omega(my.data)
To just find Guttman’s λ3 (Guttman, 1945) which is also known as coefficient α (Cronbach,
1951), you can use the alpha function or the scoreItems function. See the tutorial on
how to use the scoreItems function to find scale scores and scale statistics.
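To make concrete what this coefficient summarizes, here is a minimal base-R sketch of Guttman’s λ3 (coefficient α) computed directly from a covariance or correlation matrix. The helper name lambda3 and the toy matrix are assumptions made for illustration and are not part of psych; for real data the alpha function reports the same quantity along with much more.

```r
# Guttman's lambda 3 (= Cronbach's alpha) from a covariance or correlation matrix.
# lambda3() is a hypothetical helper for illustration, not a psych function.
lambda3 <- function(V) {
  V <- as.matrix(V)
  k <- ncol(V)                                  # number of items
  (k / (k - 1)) * (1 - sum(diag(V)) / sum(V))   # 1 - (item variance / total variance), rescaled
}

# Toy example: three items that all correlate 0.5 with each other.
R <- matrix(0.5, nrow = 3, ncol = 3)
diag(R) <- 1
a <- lambda3(R)
print(round(a, 2))   # 0.75
```

With equal inter-item correlations of 0.5, three items already reach 0.75, which shows how strongly the coefficient depends on test length as well as on item homogeneity.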
[Figure 1: Omega. Path diagram of the nine Thurstone variables (Sentences, Vocabulary,
Sent.Completion, First.Letters, 4.Letter.Words, Suffixes, Letter.Series, Letter.Group,
Pedigrees) loading on a general factor g and on three group factors F1*, F2* and F3*.]
To use R obviously requires installing R on your computer. This is very easy to do (see
section 1.2.1) and needs to be done once. (The following sections are elaborated
in the “getting started” How To. If you need more help in installing R see the longer
version.)
The power of R is in the supplemental packages. There are at least 8,300 packages that
have been contributed to the R project. To do any of the analyses discussed in these “How
To’s”, you will need to install the package psych (Revelle, 2017). To do factor analyses or
principal component analyses you will also need the GPArotation (Bernaards and Jennrich,
2005) package. With these two packages, you will be able to find ωh using Exploratory
Factor Analysis. If you want to estimate ωh using Confirmatory Factor Analysis,
you will also need to add the sem (Fox et al., 2013) package. To use psych to create
simulated data sets, you also need the mnormt (Azzalini and Genz, 2016) package. For
a more complete installation of a number of psychometric packages, you can install and
activate a package (ctv) that installs a large set of psychometrically relevant packages. As
is true for R, you will need to install packages just once.
install.packages(c("psych","lavaan"), dependencies=TRUE)
5. Take a 5 minute break while the packages are loaded.
6. Activate the package(s) you want to use (e.g., psych)
R code
psych will automatically activate the other packages it needs, as long as they are
installed. Note that psych is updated roughly quarterly; the current version is 1.7.12.
7. Use R
1.2.1 Install R
Once R is installed on your machine, you still need to install a few relevant “packages”.
Packages are what make R so powerful, for they are special sets of functions that are de-
signed for one particular application. In the case of the psych package, this is an application
for doing the kind of basic data analysis and psychometric analysis that psychologists and
many others find particularly useful.
You may either install the minimum set of packages necessary to do the analysis using an
Exploratory Factor Analysis (EFA) approach (recommended) or a few more packages to do
both an EFA and a CFA approach. It is also possible to add many psychometrically relevant
packages all at once by using the “task views” approach. A particularly powerful package is
the lavaan (Rosseel, 2012) package for doing structural equation modeling. Another useful
one is the sem package (Fox et al., 2013).
Install the minimum set This may be done by typing into the console or using menu
options (e.g., the Package Installer underneath the Packages and Data menu).
R code
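The code for this step is not shown in this copy. A minimal sketch, assuming the minimum set is psych plus GPArotation (the two packages named earlier as sufficient for an exploratory analysis): it installs only those packages that are not already present. The variable names are illustrative.

```r
# Assumed minimum set for the exploratory (EFA) approach.
needed <- c("psych", "GPArotation")

# Which of these are not yet installed on this machine?
to.install <- setdiff(needed, rownames(installed.packages()))

# Install only what is missing (requires an internet connection).
# Guarded so that nothing is downloaded in a non-interactive session.
if (length(to.install) > 0 && interactive()) {
  install.packages(to.install, dependencies = TRUE)
}
```

Checking first avoids reinstalling packages that are already up to date; drop the check if you want to force a fresh install.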
Install a few more packages If you want some more functionality for some of the
more advanced statistical procedures (e.g., omegaSem) you will need to install a few more
packages (e.g., sem).
R code
install.packages(c("psych","GPArotation","sem"),dependencies=TRUE)
Install a “task view” to get lots of packages If you know that there are a number of
packages that you want to use, it is possible they are listed as a “task view”. For instance,
about 50 packages will be installed at once if you install the “psychometrics” task view.
You can install all the psychometric packages from the “psychometrics” task view by first
installing a package (“ctv”) that in turn installs many different task views. To see the list
of possible task views, go to https://cran.r-project.org/web/views/.
R code
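The code for this step is not shown in this copy. A sketch of what the paragraph above describes, assuming the ctv package and its install.views function; the wrapper name install_psychometrics is hypothetical, and the calls are placed inside a function so that nothing is downloaded until you run it.

```r
# Wrapped in a function so that nothing is downloaded until it is called.
# install_psychometrics() is a hypothetical helper name.
install_psychometrics <- function() {
  install.packages("ctv")               # the package that manages task views
  ctv::install.views("Psychometrics")   # install every package listed in the view
}

# install_psychometrics()   # uncomment to run; this can take a while
```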
Make the psych package active. You are almost ready. But first, to use most of the
following examples you need to make the psych package active. You only need to do this
once per session.
R code
library(psych)
There are of course many ways to enter data into R. Reading from a local file using
read.table is perhaps the most preferred. You first need to find the file and then read it.
This can be done with the file.choose and read.table functions:
R code
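The code for this step is not shown in this copy. A sketch of the two-step pattern the text describes; the object names file.name, my.data and demo.data are assumptions. Because file.choose needs an interactive session, the second half demonstrates the same read.table call on a small temporary file.

```r
# Interactive use: pick the file in a dialog, then read it.
if (interactive()) {
  file.name <- file.choose()                        # only locates the file
  my.data <- read.table(file.name, header = TRUE)   # actually reads it
}

# Non-interactive demonstration with a small temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id score", "1 10", "2 12", "3 9"), tmp)
demo.data <- read.table(tmp, header = TRUE)   # first row supplies column names
print(demo.data)
```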
file.choose opens a search window on your system just like any open file command does.
It doesn’t actually read the file; it just finds the file. The read.table command is also
necessary. With header=TRUE, it treats the first row of your table as labels for each
column. If your file has no such row, specify header=FALSE, e.g.,
R code
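The code for this step is not shown in this copy. A minimal sketch, using the text argument of read.table to stand in for a file; the object names are illustrative.

```r
# Data with no header row: tell read.table not to treat row 1 as labels.
raw <- c("1 10", "2 12", "3 9")
no.header <- read.table(text = raw, header = FALSE)
print(names(no.header))                 # default names are supplied by R

# Optionally add your own labels afterwards.
names(no.header) <- c("id", "score")
print(no.header)
```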
2.2 Copy the data from another program using the copy and paste com-
mands of your operating system
However, many users will enter their data in a text editor or spreadsheet program and
then want to copy and paste into R. This may be done by using read.table and spec-
ifying the input file as “clipboard” (PCs) or “pipe(pbpaste)” (Macs). Alternatively, the
read.clipboard set of functions are perhaps more user friendly:
read.clipboard is the base function for reading data from the clipboard.
read.clipboard.csv for reading text that is comma delimited.
read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an
Excel file).
read.clipboard.lower for reading input of a lower triangular matrix with or without a
diagonal. The resulting object is a square matrix.
read.clipboard.upper for reading input of an upper triangular matrix.
read.clipboard.fwf for reading in fixed width fields (some very old data sets)
For example, given a data set copied to the clipboard from a spreadsheet, just enter the
command
R code
my.data <- read.clipboard()
This will work if every data field has a value and even missing data are given some values
(e.g., NA or -999). If the data were entered in a spreadsheet and the missing values
were just empty cells, then the data should be read in as tab delimited or by using the
read.clipboard.tab function.
R code
my.data <- read.clipboard(sep="\t") #define the tab option, or
my.tab.data <- read.clipboard.tab() #just use the alternative function
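When missing values were entered as a numeric code such as -999 (as mentioned above), they need to be recoded to NA before any statistics are computed. A minimal base-R sketch on a made-up data frame; the object name toy and its values are hypothetical.

```r
# Toy data in which -999 was used as the missing-value code.
toy <- data.frame(item1 = c(3, -999, 4),
                  item2 = c(2, 5, -999))

toy[toy == -999] <- NA          # recode the missing-value code to NA
print(colSums(is.na(toy)))      # number of missing values per variable
```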
For the case of data in fixed width fields (some old data sets tend to have this format),
copy to the clipboard and then specify the width of each field (in the example below, the
first variable is 5 columns, the second is 2 columns, the next 5 are 1 column each, and the
last 4 are 3 columns each).
R code
my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))
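To see what such a widths vector does without involving the clipboard, the following sketch applies the same widths to two made-up lines using read.fwf from base R (utils). The data lines and object names are hypothetical.

```r
# Same field layout as in the text: 5, 2, five 1-column fields, four 3-column fields.
widths <- c(5, 2, rep(1, 5), rep(3, 4))   # 11 fields, 24 characters per line

# Two made-up records, each exactly 24 characters wide.
lines <- c("123456712345111222333444",
           "543217654321444333222111")

con <- textConnection(lines)
fwf.data <- read.fwf(con, widths = widths)
close(con)

print(dim(fwf.data))   # 2 rows, 11 variables
print(fwf.data)
```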
To read data from an SPSS, SAS, or Systat file, you can probably just use the read.file
function. However, if that does not work, use the foreign package. This should come with
base R but needs to be loaded using the library command.
read.spss reads a file stored by the SPSS save or export commands.
read.spss(file, use.value.labels = TRUE, to.data.frame = FALSE,
max.value.labels = Inf, trim.factor.names = FALSE,
trim_values = TRUE, reencode = NA, use.missings = to.data.frame)
The read.spss function has many parameters that need to be set. In the example, I have
used the parameters that I think are most useful.
file Character string: the name of the file or URL to read.
use.value.labels Convert variables with value labels into R factors with those levels?
to.data.frame return a data frame? Defaults to FALSE, probably should be TRUE in
most cases.
max.value.labels Only variables with value labels and at most this many unique values
will be converted to factors if use.value.labels = TRUE.
trim.factor.names Logical: trim trailing spaces from factor levels?
trim_values Logical: should values and value labels have trailing spaces ignored when
matching for use.value.labels = TRUE?
use.missings logical: should information on user-defined missing values be used to set
the corresponding values to NA?
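In practice most of these parameters can be left at their defaults. A sketch of a typical call, assuming the foreign package is installed (it ships with standard R distributions); the wrapper name read_spss_df and the file name are hypothetical.

```r
# read_spss_df() is a hypothetical convenience wrapper, not part of foreign.
read_spss_df <- function(path) {
  foreign::read.spss(path,
                     use.value.labels = TRUE,   # labelled values become factors
                     to.data.frame = TRUE)      # return a data frame, not a list
}

# my.spss <- read_spss_df("mydata.sav")   # "mydata.sav" is a placeholder file name
```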
The following is an example of reading from a remote SPSS file and then describing the
data set to make sure that it looks ok (with thanks to Eli Finkel).
R code
datafilename <- "http://personality-project.org/r/datasets/finkel.sav"
eli <- read.file(datafilename)
describe(eli,skew=FALSE)
Although you probably want to jump right in and find ω, you should first make sure
that your data are reasonable. Use the describe function to get some basic descriptive
statistics. This next example takes advantage of a built in data set.
my.data <- sat.act #built in example -- replace with your data
describe(my.data)
          var   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender      1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education   2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age         3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT         4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV        5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ        6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41
There are, of course, all kinds of things you could do with your data at this point, but read
about them in the vignette for the psych package.
Two alternative estimates of reliability that take into account the hierarchical structure of
the inventory are McDonald’s ωh and ωt. These may be found using the omega function
for an exploratory analysis (see Figure 1) or omegaSem for a confirmatory analysis using
the sem package, based upon the exploratory solution from omega.
4.1 Background on the ω statistics
that the model is not really well defined. This solution is discussed in Zinbarg et al., 2007.
To do this in omega, add option="first" or option="second" to the call.
Although obviously not meaningful for a 1 factor solution, it is of course possible to find
the sum of the loadings on the first (and only) factor, square them, and compare them to
the overall matrix variance. This is done, with appropriate complaints.
In addition to ωh , another of McDonald’s coefficients is ωt . This is an estimate of the total
reliability of a test.
McDonald’s ωt, which is similar to Guttman’s λ6 (see guttman), uses the estimates of
uniqueness $u^2$ from factor analysis to find $e_j^2$. This is based on a decomposition of the
variance of a test score, $V_x$, into four parts: that due to a general factor, g; that due to
a set of group factors, f (factors common to some but not all of the items); specific
factors, s, unique to each item; and e, random error. (Because specific variance cannot be
distinguished from random error unless the test is given at least twice, some combine these
both into error.)
Letting $x = cg + Af + Ds + e$, the communality of item j, based upon general as well as
group factors, is $h_j^2 = c_j^2 + \sum_i f_{ij}^2$, and the unique variance for the item,
$u_j^2 = \sigma_j^2 (1 - h_j^2)$, may be used to estimate the test reliability. That is, if $h_j^2$
is the communality of item j, based upon general as well as group factors, then for
standardized items, $e_j^2 = 1 - h_j^2$ and
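The displayed equation that should close this sentence appears to have been lost in this copy. From the definitions just given (general factor loadings c, group factor loadings A, communalities h_j^2 and total test variance V_x), it is presumably the usual expression for ωt for standardized items:

```latex
\omega_t = \frac{\mathbf{1}cc'\mathbf{1} + \mathbf{1}AA'\mathbf{1}'}{V_x}
         = 1 - \frac{\sum_j (1 - h_j^2)}{V_x}
         = 1 - \frac{\sum_j u_j^2}{V_x}
```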
It is important to distinguish here between the two ω coefficients of McDonald, 1978 and
Equation 6.20a of McDonald, 1999, ωt and ωh . While the former is based upon the sum of
squared loadings on all the factors, the latter is based upon the sum of the squared loadings
on the general factor.
$$\omega_h = \frac{\mathbf{1}cc'\mathbf{1}}{V_x}$$
Another estimate reported is the omega for an infinite length test with a structure similar
to the observed test. This is found by
$$\omega_{inf} = \frac{\mathbf{1}cc'\mathbf{1}}{\mathbf{1}cc'\mathbf{1} + \mathbf{1}AA'\mathbf{1}'}$$
It can be shown in the case of simulated variables that the amount of variance attributable
to a general factor (ωh) is quite large, and the reliability of the set of items is somewhat
greater than that estimated by α or λ6.
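As a numerical check on these definitions, the following base-R sketch computes ωh, ωt and the asymptotic ω from a made-up set of loadings. The loadings (six standardized items, one general factor and two group factors) are hypothetical values chosen for illustration, and the code is not how the omega function itself is implemented.

```r
# Hypothetical loadings for six standardized items.
c_load <- rep(0.6, 6)                          # general factor loadings (c)
A <- cbind(F1 = c(0.5, 0.5, 0.5, 0, 0, 0),     # group factor loadings (A)
           F2 = c(0, 0, 0, 0.5, 0.5, 0.5))

h2 <- c_load^2 + rowSums(A^2)                  # communalities
u2 <- 1 - h2                                   # uniquenesses (standardized items)

# Model-implied correlation matrix and total test variance V_x.
R  <- outer(c_load, c_load) + A %*% t(A) + diag(u2)
Vx <- sum(R)

general <- sum(c_load)^2                       # 1cc'1: variance due to g
group   <- sum(colSums(A)^2)                   # 1AA'1': variance due to group factors

omega_h   <- general / Vx
omega_t   <- (general + group) / Vx            # equals 1 - sum(u2) / Vx
omega_inf <- general / (general + group)

print(round(c(omega_h = omega_h, omega_t = omega_t, omega_inf = omega_inf), 2))
# omega_h 0.65, omega_t 0.88, omega_inf 0.74
```

Because ωt counts the group factor variance as reliable while ωh does not, ωt is never smaller than ωh; the gap between them shows how much of the reliable variance is not due to the general factor.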
4.2 Yet another alternative: Coefficient β
This is R. Just call it. For the next example, we find ω for a data set from Thurstone. To
find it for your data, replace Thurstone with my.data.
R code
omega(Thurstone)
Omega
Call: omega(m = Thurstone)
Alpha: 0.89
G.6: 0.91
Omega Hierarchical: 0.74
Omega H asymptotic: 0.79
Omega Total 0.93
Compare this with the adequacy of just a general factor and no group factors
The degrees of freedom for just the general factor are 27 and the fit is 1.48
Measures of factor score adequacy
g F1* F2* F3*
Correlation of scores with factors 0.86 0.73 0.72 0.75
Multiple R square of scores with factors 0.74 0.54 0.52 0.56
Minimum correlation of factor score estimates 0.49 0.08 0.03 0.11
The omegaSem function will do an exploratory analysis and then take the highest loading
items on each factor and do a confirmatory factor analysis using the sem package. These
results can produce slightly different estimates of ωh , primarily because cross loadings are
modeled as part of the general factor.
R code
omegaSem(r9,n.obs=500)
Compare this with the adequacy of just a general factor and no group factors
The degrees of freedom for just the general factor are 27 and the fit is 0.21
The number of observations was 500 with Chi Square = 103.64 with prob < 6.4e-11
The root mean square of the residuals is 0.05
The df corrected root mean square of the residuals is 0.08
RMSEA index = 0.076 and the 90 % confidence intervals are 0.06 0.091
BIC = -64.15
References
Azzalini, A. and Genz, A. (2016). The R package mnormt: The multivariate normal and t
distributions (version 1.5-5).
Bernaards, C. and Jennrich, R. (2005). Gradient projection algorithms and software for
arbitrary rotation criteria in factor analysis. Educational and Psychological Measurement,
65(5):676–696.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an
analytical procedure for establishing and displaying the dimensionality and homogeneity
of summated scales. Organizational Research Methods, 9:78–98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika,
16:297–334.
Fox, J., Nie, Z., and Byrnes, J. (2013). sem: Structural Equation Models. R package version
3.1-3.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–
282.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah,
N.J.
R Core Team (2017). R: A Language and Environment for Statistical Computing. R
Foundation for Statistical Computing, Vienna, Austria.
Revelle, W. (2017). psych: Procedures for Personality and Psychological Research.
Northwestern University, Evanston, https://cran.r-project.org/web/packages=psych.
R package version 1.7.12.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb:
comments on Sijtsma. Psychometrika, 74(1):145–154.
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of
Statistical Software, 48(2):1–36.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach’s α, Revelle’s β , and
McDonald’s ωH : Their relations with each other and two alternative conceptualizations
of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating gener-
alizability to a latent variable common to all of a scale’s indicators: A comparison of
estimators for ωh . Applied Psychological Measurement, 30(2):121–144.