Your Coefficient Alpha Is Probably Wrong, but Which Coefficient Omega Is Right? A Tutorial on Using R to Obtain Better Reliability Estimates

David B. Flora
Department of Psychology, York University

Tutorial, Advances in Methods and Practices in Psychological Science (Association for Psychological Science), 2020. DOI: 10.1177/2515245920951747

Abstract

Measurement quality has recently been highlighted as an important concern for advancing a cumulative psychological science. An implication is that researchers should move beyond mechanistically reporting coefficient alpha toward more carefully assessing the internal structure and reliability of multi-item scales. Yet a researcher may be discouraged upon discovering that a prominent alternative to alpha, namely, coefficient omega, can be calculated in a variety of ways. In this Tutorial, I alleviate this potential confusion by describing alternative forms of omega and providing guidelines for choosing an appropriate omega estimate pertaining to the measurement of a target construct represented with a confirmatory factor analysis model. Several applied examples demonstrate how to compute different forms of omega in R.

Keywords
alpha, psychometrics, reliability, R, confirmatory factor analysis, assessment, omega, measurement, open data, open materials
Measurement is an important aspect of the replication crisis facing psychology and related fields (Fried & Flake, 2018; Loken & Gelman, 2017), and it is well known that measurement error produces biased estimates of the associations among constructs that observed variables represent (e.g., Cole & Preacher, 2014). Yet researchers often present very little reliability and validity evidence for their variables, frequently reporting only coefficient alpha to convey the psychometric quality of tests (Flake, Pek, & Hehman, 2017). Furthermore, psychometricians have established that alpha is based on a highly restricted (and thus unrealistic) psychometric model and consequently can provide misleading reliability estimates (e.g., Sijtsma, 2009). The persistent popularity of alpha suggests that applied researchers are not aware of its limitations or alternative reliability estimates.

Although many reliability estimates have been presented in the literature, distinguishing among them and their software implementations can be confusing. In this Tutorial, I describe the calculation of different forms of coefficient omega (McDonald, 1999), which are reliability estimates calculated from parameter estimates of factor-analytic models specified to represent associations between a test's items and the test's target construct. Thus, being informed of a test's internal factor structure is inherent in choosing an appropriate omega estimate. The main purposes of this Tutorial are to clarify distinctions among different omega estimates and to demonstrate how they can be calculated using routines readily available in R (R Core Team, 2018). Throughout, I use example data to illustrate these reliability estimates.

Corresponding Author: David B. Flora, Department of Psychology, 101 Behavioural Sciences Building, York University, 4700 Keele St., Toronto, ON M3J 1P3, Canada. E-mail: [email protected]

Disclosures

The complete R code and output (as a single .rmd file and resulting .pdf file) for the examples presented, data
files, and a supplementary document describing additional analyses are available on OSF, at https://osf.io/m94rp/.

What Is Reliability?

Observed scores on any given psychological test or scale are determined by a combination of systematic (signal) and random (noise) influences. Reliability is defined as a population-based quantification of measurement precision (e.g., Mellenbergh, 1996) as a function of the signal-to-noise ratio. Measurement error, or unreliability, produces biased estimates of effects meant to represent true associations among constructs (Bollen, 1989; Cole & Preacher, 2014), and measurement error is a culprit in the replication crisis (Loken & Gelman, 2017). Thus, using tests with maximally reliable scores and using statistical methods to account for measurement error (e.g., Savalei, 2019) can help psychology progress as a replicable science; calculating and reporting accurate reliability estimates is integral to this goal.

Although the reliability concept pertains to any empirical measurement, this Tutorial focuses on composite reliability, that is, the reliability of observed scores calculated as composites (i.e., the sum or mean) of individual test components. These individual components are most commonly items within a test or scale. A formal definition of composite reliability based on classical test theory (CTT; e.g., Lord & Novick, 1968) first posits that an observed score x for individual test taker i on item j equals the individual's true score t for that item plus an error score e:

xij = tij + eij.

Next, if Xi denotes an individual's observed total score, calculated by summing¹ the observed item scores (i.e., Xi = Σj xij, with the sum taken over the J items), and if Ti denotes the individual's total true score, which is the sum of the unobserved true scores (Ti = Σj tij), then the reliability ρX of the total score is the proportion of total-score variance that is due to true-score variance, ρX = σT²/σX².

It is important to recognize that the CTT true score does not necessarily equate to a construct score (Borsboom, 2005). Thus, a true score may be determined by a construct that a test is designed to measure (the target construct) as well as by other systematic influences. Most often, researchers want to know how reliably a test measures the target construct itself, and for this reason it is important to establish the dimensionality, or internal structure, of a test before estimating reliability (Savalei & Reise, 2019). Factor analysis is commonly used to investigate and confirm the internal structure of a multi-item test, and as shown throughout this Tutorial, parameter estimates of factor-analytic models lead to reliability estimates representing how precisely test scores measure target constructs represented by the models' factors. In this framework, a one-factor model is adequate to explain the item-response data of a unidimensional test measuring a single target construct; conversely, poor fit of a one-factor model is evidence of multidimensionality. The formal definition of reliability from CTT can be adapted to this context so that reliability is the proportion of a scale score's variance explained by a target construct (Savalei & Reise, 2019). Therefore, it is crucial to determine how a test represents that construct with respect to its internal factor structure. If reliability is estimated using the parameters of an incorrect (i.e., misspecified) factor model, then the reliability estimate is likely to be biased with respect to the measurement of the target construct. The key idea to this Tutorial is that a reliability coefficient should estimate how well an observed test score measures a target construct, which does not necessarily correspond to how well the score captures replicable variation because some replicable variation may be irrelevant to the target construct; thus, it is critical for the target construct to be accurately represented in a factor model for the test.

Because the reliability estimates presented herein are calculated from factor-analytic models, familiarity
[Figure 1 here. The diagram shows the factor f (Target Construct) in an oval, with factor loadings λ1 through λ10 pointing to continuous observed item responses x1 through x10 in rectangles, each with its own error term e1 through e10.]

Fig. 1. One-factor model for a unidimensional test consisting of 10 continuously scored items. See the text for further explanation.
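The CTT definition of reliability (the proportion of total-score variance due to true-score variance, σT²/σX²) can be checked with simple arithmetic. The variance components below are hypothetical values for a 4-item scale, invented purely for illustration (a Python sketch rather than the Tutorial's R code):

```python
# Hypothetical variance components for a 4-item scale (illustrative numbers only).
var_true_total = 9.0                     # sigma_T^2: variance of the total true score
item_error_vars = [0.9, 0.7, 0.8, 0.6]  # independent error variances, one per item
var_error_total = sum(item_error_vars)  # independent errors -> variances add

var_observed_total = var_true_total + var_error_total  # sigma_X^2 = 9 + 3 = 12
reliability = var_true_total / var_observed_total      # rho_X = sigma_T^2 / sigma_X^2
print(reliability)  # 0.75
```

With three quarters of the total-score variance coming from true scores, ρX = .75; adding error variance (noise) lowers this ratio, and removing it raises the ratio toward 1.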
2006). Establishing these item-construct associations is critical for reliability to be meaningfully estimated because choice of an appropriate reliability estimate depends on the interpretation of the final measurement model chosen for a test (Savalei & Reise, 2019). For readers unfamiliar with CFA, I present basic principles that should enable use of the procedures presented here, but I strongly encourage such readers to acquire further background knowledge (e.g., see Brown, 2015).

This Tutorial focuses on forms of coefficient omega that estimate how reliably a total score for a test measures a single construct that is common to all items in the test, even if the test is multidimensional (e.g., a test designed to produce a total score as well as subscale scores). First, I demonstrate reliability estimation for unidimensional tests represented by one-factor models. Often, a test is designed to measure a single construct, but a one-factor model does not adequately represent the test's internal structure. In this situation, reliability estimates based on a one-factor model are likely to be inaccurate, and instead reliability estimates should be based on a multidimensional (i.e., multifactor) model. In other situations, a test may be explicitly designed to measure multiple constructs (i.e., a test with subscales), but a meaningful total score is still of interest. Thus, after addressing the unidimensional case, I present omega estimates of total-score reliability for multidimensional tests; the online supplement addresses reliability assessment for subscale scores.

Reliability of Unidimensional Tests: Omega to Alpha

Figure 1 shows a path diagram of a one-factor model for a hypothetical 10-item test. Path diagrams represent latent variables, or factors, with ovals and represent observed variables (i.e., item-response variables in the present context) with rectangles. Linear associations are represented by straight, unidirectional arrows. For example, because the factor f and an error term e1 are the two entities with arrows pointing at item x1 in Figure 1, f and e1 are the two linear influences on x1. The arrow from f to x1 is labeled as λ1, which is a factor-loading parameter giving the strength of the association between f and x1. The effects implied by these two arrows in the path diagram combine to form the linear regression equation

x1 = λ1 f + e1,
which indicates that item x1 is a dependent variable regressed on the factor f as a single independent variable with slope coefficient λ1 and error e1. This one-factor model consists of an analogous equation for each observed item such that

xj = λj f + ej,

where xj is the jth item regressed on factor f with factor loading λj (the intercept term is omitted from this and other equations without loss of generality). This equation, in which item scores are influenced by a single factor but to varying degrees (i.e., λj varies across items), is known as the congeneric model in the psychometric literature. As shown later, when a model consists of more than one factor, this equation expands to a multiple regression equation with each xj simultaneously regressed on multiple factors.

Because scores on the factor f are unobserved, the λj factor-loading parameters cannot be estimated with the usual ordinary least squares method for linear regression. Instead, the factor loadings (and other model parameters) are estimated as a function of a covariance (or correlation) matrix among the observed item scores, typically using a maximum likelihood (ML) function. In addition to factor loadings, other model parameters include the factor variance and the variances of the individual error terms. Because the factor is unobserved, its scale is arbitrary, and the model cannot be estimated (i.e., the model is not identified) unless a parameter is constrained to define the factor's scale. In all the examples in this Tutorial, I have set the scale of each factor by constraining its variance to be equal to 1 (which also serves to simplify equations for omega reliability estimates).² A variety of statistics is available to evaluate how well a CFA model fits the item-level data (and thereby evaluate the tenability of unidimensionality). In the examples presented, I report the root mean square error of approximation (RMSEA), the comparative-fit index (CFI), and the Tucker-Lewis index (TLI); smaller values (e.g., .08 or lower) of RMSEA are indicative of better model fit, whereas larger values (e.g., .90 or greater) of CFI and TLI indicate better model fit.

Coefficient omega

Numerous authors have shown that if the equation for the CTT true score is reexpressed as the one-factor model such that an individual's true score tij for item j is presumed to equal the product of the item's factor loading λj and the individual's factor score fi (i.e., tij = λj fi) and the factor variance is fixed to 1, then reliability is a function of the factor-loading parameters (e.g., Jöreskog, 1971; McDonald, 1999). Conceptually, because the factor loading quantifies the strength of the association between an item and a factor, the extent to which a set of items (as represented by their total score) reliably measures the factor is a function of the items' factor loadings. Therefore, the reliability of the total score on a unidimensional test can be estimated from parameter estimates of a one-factor model fitted to the item scores. I refer to this reliability estimate as ωu, and its formula is presented in Table 1: The numerator of ωu represents the amount of total-score variance explained by the common factor f as a function of the estimated factor loadings (i.e., the numerator estimates the σT² term in the population reliability formula given earlier), and the denominator, σX², is the estimated variance of the observed total score. This formula gives a form of coefficient omega that is appropriate under the strong assumption that the one-factor model is correct, that is, that the set of items is unidimensional, so that ωu represents the proportion of total-score variance that is due to the single factor.³ The subscript u indicates that this variant of omega is based on a unidimensional model; different forms of coefficient omega introduced later are distinguished by different subscripts.

The σX² term in the denominator of ωu can be calculated either as the sample variance of the total score X or as the model-implied variance of X. Generally, this choice is unlikely to have a large effect on the magnitude of omega estimates if the estimated model is not badly misspecified (as evidenced by good model fit).⁴ Specifically, σX² can be calculated as the sum of all elements in the variance-covariance matrix of item scores (which equals the sample variance of X) or as the sum of all elements in the model-implied covariance matrix among the observed items (i.e., the model-implied variance of X). Representing the total-score variance as a function of the entire model-implied covariance matrix incorporates any free covariances among the individual item-level error terms, ej. Thus, when free error covariances are explicitly specified, the model-implied total-score variance used to calculate ωu is a function of both error variances and the free error covariances. Although error covariances may represent replicable variation in observed test scores, these parameters are separate from variance due to the target construct represented by the common factor f, and thus error covariances contribute to the total observed variance (i.e., the denominator of ωu) but not to variance due to the factor (i.e., the numerator of ωu). The demonstration later in this discussion of ωu shows how to obtain an ωu estimate in R that accounts for the contribution of error
Table 1. Formulas for Coefficient Omega Estimates for Three Underlying Factor Models

One-factor model (unidimensional, congeneric model), continuous items
  Model equation: xj = λj f + ej
  Reliability estimate: ωu = (Σj λj)² / σ̂X²

One-factor model (unidimensional, congeneric model), categorical items
  Model equations: x*j = λj f + ej; xj = c if τjc < x*j ≤ τj,c+1
  Reliability estimate: ωu-cat = Σj Σj′ [ Σc Σc′ Φ₂(τjc, τj′c′, λj λj′) − Σc Φ₁(τjc) Σc′ Φ₁(τj′c′) ] / σ̂X²,
  where the sums over j and j′ run from 1 to J and the sums over c and c′ run from 1 to C − 1

Bifactor model (hierarchical factor model), continuous items
  Model equation: xj = λjg g + Σk λjk sk + ej (k = 1, …, K)
  Reliability estimate: ωh = (Σj λjg)² / σ̂X²

Note: For the one-factor model, λj is the factor loading for generic item xj, f is an unobserved factor (or latent variable), and ej is the error term for item j. For categorical items, τjc refers to a threshold parameter used to link continuous latent-response variable x*j to observed ordered, categorical item-response variable xj; for all items, the minimum threshold = −∞ and the maximum threshold = ∞ (see Wirth & Edwards, 2007). For the bifactor model, λjg is the factor loading for item xj on general factor g, and λjk is the factor loading for item xj on the kth specific factor sk. Total score X = Σj xj, and total-score variance σ̂X² may be estimated from either the model-implied variance of X or the observed sample variance of X. Φ₁ is the univariate normal cumulative distribution function; Φ₂ is the bivariate normal cumulative distribution function. Reliability equations assume that all factor variances are fixed to 1 for model identification.
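The ωu formula in Table 1 can be checked with hand-picked numbers. The standardized loadings below are hypothetical, and the sketch assumes the congeneric one-factor model holds with uncorrelated errors, so the model-implied total-score variance is the squared sum of loadings plus the sum of error variances (Python used here purely as a calculator, not the Tutorial's R routines):

```python
# Hypothetical standardized loadings for a 5-item unidimensional test.
loadings = [0.8, 0.7, 0.6, 0.75, 0.65]
error_vars = [1 - l**2 for l in loadings]  # standardized items: theta_j = 1 - lambda_j^2

true_var = sum(loadings) ** 2         # numerator of omega_u: (sum of lambda_j)^2
total_var = true_var + sum(error_vars)  # model-implied sigma_X^2 (uncorrelated errors)
omega_u = true_var / total_var
print(round(omega_u, 3))  # 0.829
```

Freeing an error covariance would add its value (twice) to the denominator only, so ωu would drop, which mirrors the point that error covariances contribute to total variance but not to variance due to the factor.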
covariances (when they are specified as free parameters rather than fixed to 0 by software defaults) to total variance by calculating the denominator of ωu as a function of the entire model-implied covariance matrix among items.

After the one-factor model is estimated, one can obtain a residual covariance (or correlation) matrix to diagnose the presence of large error covariance between any pair of items; a residual covariance between two items is the difference between their observed covariance and the corresponding model-implied covariance. Green and Yang (2009a) described scenarios in which a unidimensional test might produce correlated errors, although large residual correlations may also be evidence of multidimensionality. Large residual correlations diminish the fit of the one-factor model to data, which can prompt researchers to modify their hypothesized CFA model to explicitly specify free error-covariance parameters.⁵

Coefficient alpha

If the population factor loadings are equal across all items, then the j subscript can be dropped from λ in the equation for the one-factor model, which leads to what is known as the essential tau-equivalence model. If this model is correct, it can be shown that ωu is equivalent to alpha as long as the errors ej remain uncorrelated (see McDonald, 1999, or Green & Yang, 2009a, for details). Taken together, alpha is an estimate of total-score reliability for the measurement of a single construct common to all items in a test under the conditions that (a) a one-factor model is correct (i.e., the test is unidimensional), (b) the factor loadings are equal across all items (i.e., essential tau equivalence), and (c) the errors ej are independent across items. Because it is unlikely for all of these conditions to hold in any real situation, some researchers have called for abandoning alpha in favor of alternative reliability estimates (e.g., McNeish, 2018). However, Savalei and Reise (2019) contended that only severe violations of essential tau equivalence cause alpha to produce a notably biased reliability estimate, whereas multidimensionality and error correlation are more likely to be problematic for the interpretation of alpha as a reliability estimate for the measurement of a single target construct (Green & Yang, 2009a; Zinbarg et al., 2006), largely because alpha does not disentangle replicable variation due to a target construct from other sources of replicable variation.

Overall, estimates of ωu are unbiased when factor loadings vary across items (i.e., when tau equivalence is violated), a condition under which alpha underestimates population reliability (e.g., Zinbarg, Revelle, Yovel, & Li, 2005). Furthermore, Yang and Green (2010) showed that ωu estimates are largely robust when the estimated model contains more factors than the true model, even with samples as small as 50.
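The relation between alpha and ωu can be verified numerically: under essential tau equivalence (equal loadings, uncorrelated errors) the two coincide, whereas with varying loadings alpha falls below ωu. The loadings below are hypothetical; this is an illustrative check in Python, not part of the Tutorial's R workflow:

```python
import numpy as np

def alpha_omega(loadings):
    """Alpha and omega_u implied by a congeneric model with standardized items."""
    lam = np.asarray(loadings, dtype=float)
    theta = 1 - lam**2                            # error variances (standardized items)
    sigma = np.outer(lam, lam) + np.diag(theta)   # model-implied item covariance matrix
    j = len(lam)
    total = sigma.sum()                           # model-implied total-score variance
    alpha = (j / (j - 1)) * (1 - np.trace(sigma) / total)
    omega_u = lam.sum() ** 2 / total
    return alpha, omega_u

a_eq, w_eq = alpha_omega([0.7] * 5)               # essential tau equivalence
a_var, w_var = alpha_omega([0.3, 0.5, 0.7, 0.8, 0.9])  # varying loadings
print(round(a_eq, 4), round(w_eq, 4))   # identical under tau equivalence
print(round(a_var, 4), round(w_var, 4)) # alpha < omega_u when loadings vary
```

Note that both quantities here are computed from the same model-implied covariance matrix; the ordering alpha ≤ ωu holds whenever the congeneric model with uncorrelated errors is correct.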
Trizano-Hermosilla and Alvarado (2016) found that increasing levels of skewness in the univariate item distributions produced increasingly negatively biased ωu estimates, especially for short tests (i.e., six items in their study), but that item skewness caused greater bias for alpha than for ωu.

… direct ML estimation to incorporate cases with incomplete data (see Brown, 2015). The one-factor model (here named mod1f) is specified using a plain text string:

> mod1f <- 'openness =~ O1 + O2 + O3 + O4 + O5'
[Figure 2 here. The diagram shows the factor f (Target Construct) with loadings λ1 through λ10 pointing to continuous latent-response variables x*1 through x*10, each with its own error term e1 through e10; threshold parameters link each latent-response variable to its categorical observed item response x1 through x10. Model equations: x*j = λj f + ej; xj = c if τjc < x*j ≤ τj,c+1.]

Fig. 2. One-factor model for a unidimensional test consisting of 10 ordinally scaled items. See the text for further explanation.
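The threshold equations shown in Figure 2 can be illustrated directly: an observed ordinal response is simply the category whose pair of thresholds brackets the continuous latent response x*. The threshold and latent values below are made-up numbers for a 4-category item (a Python sketch, not the Tutorial's R code):

```python
import numpy as np

# Hypothetical thresholds for one 4-category item: tau_1, tau_2, tau_3
# (implicitly tau_0 = -inf and tau_4 = +inf, as in Table 1's note).
taus = np.array([-1.1, 0.2, 1.3])

# Example latent-response values x*; right=True matches tau_c < x* <= tau_{c+1}.
latent = np.array([-2.0, -0.5, 0.0, 0.8, 1.5, 2.4])
observed = np.digitize(latent, taus, right=True) + 1  # categories coded 1..4
print(observed.tolist())  # [1, 2, 2, 3, 4, 4]
```

This is why the ωu-cat formula works in the metric of the observed categories: the normal cumulative probabilities Φ₁ and Φ₂ evaluated at the thresholds recover the category proportions and cross-item category probabilities implied by the latent-response model.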
…four-item psychoticism scale by Jonason and Webster (2010) online via an open-access personality-testing website (https://openpsychometrics.org/about). With these data, alpha for the psychoticism scale was .77, but it should not be assumed that the scale is unidimensional and conforms to the essential tau-equivalence model; instead, its factor structure should be tested to determine an appropriate reliability estimate. Because the psychoticism items have a 4-point response scale, I fitted the one-factor model to the interitem polychoric correlations using WLSMV, a robust weighted least squares estimator that is recommended over the ML estimator for CFA with polychoric correlations (Finney & DiStefano, 2013).

Specifically, the one-factor model is specified in the same way as in the previous example with items treated as continuous variables:

> mod1f <- 'psyctcsm =~ DDP1 + DDP2 + DDP3 + DDP4'

This code indicates that the factor psyctcsm is measured by observed variables DDP1 through DDP4, which are the names of the psychoticism items in the potic data frame. This mod1f model is also estimated using the cfa function:

> fit1f <- cfa(mod1f, data=potic, std.lv=T, ordered=T, estimator='WLSMV')

But now, the ordered option is set to TRUE so that all items are treated as ordered, categorical variables; consequently, lavaan is told to fit the model to polychoric correlations using WLSMV. As before, the std.lv option is set to TRUE so that the variance of the psyctcsm factor is fixed equal to 1.

Again, the results can be viewed using the summary function. The model-fit statistics under the Robust column indicate that this one-factor model fits the data adequately, CFI = .99, TLI = .97; although the RMSEA has a high value of .11, the residual correlations are all small (< .07). Thus, it seems reasonable to estimate reliability of the psychoticism scale from the parameter estimates of this model (i.e., the scale can be considered a unidimensional test). As for ωu, the ωu-cat estimate can be obtained by executing the semTools::reliability function on the fit1f one-factor model object; because this model was fitted to polychoric correlations, semTools::reliability automatically calculates ωu-cat instead of ωu:

> reliability(fit1f)

This call to reliability produces the following output:

         psyctcsm
alpha   0.8007496
omega   0.7902953
omega2  0.7902953
omega3  0.7932682
avevar  0.5289638

Results listed in the omega and omega2 rows give the ωu-cat estimate based on Green and Yang's (2009b) formula, in which the denominator equals the model-implied variance of the total score X, and results listed in the omega3 row are based on a variation of Green and Yang's formula in which the denominator equals the observed, sample variance of X. Thus, the ωu-cat estimate indicates that .79 of the scale's total-score variance is due to the single psychoticism factor.

These results also give an estimate of alpha (.80) that differs from the estimate reported earlier for the psychoticism scale (i.e., .77); this alpha is an example of ordinal alpha (Zumbo, Gadermann, & Zeisser, 2007) because it is based on the model estimated using polychoric correlations, whereas the first alpha estimate used the traditional calculation from interitem product-moment covariances. Note that ordinal alpha is a reliability estimate for the sum of the continuous, latent-response variables (i.e., the x* variables described earlier) rather than for X, the sum of the observed, categorical item-response variables (Yang & Green, 2015). Additionally, ordinal alpha still carries the assumption of equal factor loadings (i.e., tau equivalence). For these reasons, I advocate ignoring the alpha results reported by semTools::reliability when the factor model has been fitted using polychoric correlations (see Chalmers, 2018, and Zumbo & Kroc, 2019, for further discussion).

As with ωu, the ci.reliability function can also be used to obtain a confidence interval for ωu-cat. Specifically, the command

ci.reliability(data=potic, type="categorical", interval.type="perc")

includes the option type="categorical" to invoke estimation of ωu-cat based on fitting a one-factor model to the polychoric correlations among items in the potic data frame. The results return a point estimate of .79 with a percentile bootstrap 95% CI of [.75, .83].

In sum, because the psychoticism items have only four response categories, any reliability estimate based on a factor-analytic model should account for the items' categorical nature. Because the one-factor model adequately explains the polychoric correlations among the items, it is reasonable to consider the psychoticism scale a unidimensional test. Therefore, ωu-cat = .79 (95% CI = [.75, .83]) is an appropriate estimate of the proportion of the psychoticism scale's total-score variance that is due to a single psychoticism factor.

Reliability Estimates for Multidimensional Scales

Often, tests are designed to measure a single construct but end up having a multidimensional structure, especially as the content of the test broadens. Occasionally, multidimensionality is intentional, as when a test is designed to produce subscale scores in addition to a total score. In other situations, the breadth of the construct's definition or the format of items produces unintended multidimensionality, even if a general target construct that influences all items is still present. In either case, the one-factor model presented earlier is incorrect, and thus it is generally inappropriate to use alpha or ωu to represent the reliability of a total score from a multidimensional test. Instead, reliability estimates for observed scores derived from multidimensional tests should be interpretable with respect to the target constructs.

For example, Flake, Barron, Hulleman, McCoach, and Welsh (2015) developed a 19-item test to measure a broad construct, termed psychological cost, from Eccles's (2005) expectancy-value theory of motivation. Although this psychological-cost scale (PCS) was designed to produce a meaningful total score representing a general cost construct, the item content was derived from several more specific content domains (termed task-effort cost, outside-effort cost, loss of valued alternatives, and emotional cost). Consequently, although a general cost factor is expected to influence responses to all 19 items, it may be best to consider the PCS multidimensional because of excess covariance among items from the same content domain beyond the covariance explained by a general construct.

Bifactor models

One way to represent such a multidimensional structure is with a bifactor model, in which a general factor influences all items and specific factors (also known as group factors) capture covariation among subsets of items that remains over and above the covariance due
to the general factor. Specific factors do not represent subscales per se but instead represent the shared aspects of a subset of items that are independent from the general factor (in fact, in some situations, specific factors may be used to capture method artifacts, such as item-wording effects). A bifactor model for the PCS includes a general cost factor influencing all items along with four specific factors capturing excess covariance among items from the same content domain. A path diagram of this model is in Figure 3, which shows that each item has a nonzero general-factor loading (i.e., the general factor, g, influences all items) along with a nonzero loading on a specific factor pertaining to the item's content domain (i.e., each specific factor, sk, influences only a subset of items). Because each item is directly influenced by two factors, the equation for this model is a multiple regression equation with each item simultaneously regressed on the general factor and one of the specific factors. The general factor must be uncorrelated with the specific factors to guarantee model identification (Yung, Thissen, & McLeod, 1999), whereas in other CFA models, all factors freely correlate with each other (allowing the general factor in a bifactor model to correlate with one of the specific factors causes errors such as nonconvergence or improper solutions).

Notably, Zinbarg et al. (2006) showed that ωh estimates calculated using the CFA method described here are largely unbiased and are more accurate reliability estimates than alpha, showing trivial effects of design factors including the magnitude and heterogeneity of factor loadings, variation in sample size ranging from 50 to 200, the number of items, the number of specific factors, and the presence of cross-loadings. Yet no research to date has directly addressed whether ωh estimates are robust to model misspecification (e.g., incorrectly using a bifactor model for a test with a different multidimensional structure).

As with ωu, when the bifactor model is fitted to polychoric correlations among ordered, categorical items, applying the formula for ωh leads to a reliability estimate in the x* latent-response-variable metric rather than the metric of the observed total score X. Instead, the approach of Green and Yang (2009b) can be applied to produce a version of ωh that gives a reliability estimate of the proportion of total observed score variance due to the general factor; this estimate is referred to as ωh-cat. Although ωh-cat is not presented in Table 1, its equation is simply an adaptation of the equation for ωu-cat in which the loadings from the one-factor model are replaced with general-factor loadings from a bifactor model.
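As with ωu, the ωh formula can be checked with toy numbers. The sketch below assumes a small hypothetical bifactor structure (6 standardized items, equal general-factor loadings, and two orthogonal specific factors); only the general-factor loadings enter the numerator, whereas specific-factor variance and error variance contribute to the denominator only. Again, Python is used purely as a calculator, not as the Tutorial's R routines:

```python
import numpy as np

# Hypothetical bifactor loadings for 6 standardized items.
gen = np.full(6, 0.6)                        # general-factor loadings (all items)
spec1 = np.array([0.4, 0.4, 0.4, 0, 0, 0])   # specific factor s1 (items 1-3)
spec2 = np.array([0, 0, 0, 0.4, 0.4, 0.4])   # specific factor s2 (items 4-6)
theta = 1 - gen**2 - spec1**2 - spec2**2     # error variances (standardized items)

# Model-implied covariance matrix with orthogonal factors (variances fixed to 1):
sigma = (np.outer(gen, gen) + np.outer(spec1, spec1)
         + np.outer(spec2, spec2) + np.diag(theta))

omega_h = gen.sum() ** 2 / sigma.sum()  # proportion of total-score variance due to g
print(round(omega_h, 3))  # 0.692
```

Here the specific factors add replicable variance to the total score, so ωh is noticeably lower than the proportion of variance explained by all factors jointly; this is exactly the sense in which ωh isolates the general target construct.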
[Figure 3 here. The diagram shows the general factor g (Psychological Cost) with loadings λ1g through λ19g pointing to all 19 items x1 through x19, each with its own error term, along with four specific factors: s1 (specific task effort, items x1 through x5), s2 (specific outside effort, items x6 through x9), s3 (specific loss of valued alternatives, items x10 through x13), and s4 (specific emotional cost, items x14 through x19). Generic equation for all items: xj = λjg g + Σk λjk sk + ej, with k = 1, …, K = 4. Example equation (for item x1): x1 = λ1g g + λ1s1 s1 + 0·s2 + 0·s3 + 0·s4 + e1 = λ1g g + λ1s1 s1 + e1.]

Fig. 3. Bifactor model for the psychological-cost scale. See the text for further explanation.
where gen is the general cost factor measured by all 19 items (i.e., TE1 through EM6), s1 is the specific factor for items pertaining to task-effort content (TE1 through TE5), s2 is the specific factor for items pertaining to outside-effort content (OE1 through OE4), s3 is the specific factor for loss-of-valued-alternatives items (LVA1 through LVA4), and s4 is the specific factor for emotional-cost items (EM1 through EM6). The model is estimated with

> fitBf <- cfa(modBf, data=pcs,
      std.lv=TRUE, estimator='MLR',
      orthogonal=TRUE)

where the orthogonal=TRUE option forces all interfactor correlations to equal 0, which, as discussed earlier, is important for identification of bifactor models. The results from the summary function indicate that the bifactor model fits the PCS data well, with robust model-fit statistics, CFI = .98, TLI = .97, RMSEA = .05. Thus, it is reasonable to calculate ωh to estimate how reliably the PCS total score measures the general psychological-cost factor. Applying semTools::reliability to this fitted model,

> reliability(fitBf)

produces the following output:

             gen         s1        s2        s3        s4
alpha  0.9638781 0.92504205 0.8992820 0.9052459 0.9405882
omega  0.9741033 0.56377307 0.7884791 0.6766430 0.7816839
omega2 0.9094893 0.09237594 0.3666293 0.1880759 0.2054075
omega3 0.9077636 0.09240479 0.3666634 0.1878380 0.2053012
avevar        NA         NA        NA        NA        NA

Estimates listed under the gen column pertain to the general cost factor. The omega estimate, .97, ignores the contribution of the specific factors to calculation of the implied variance of the total score in its denominator (Jorgensen et al., 2020); thus, this value is not a reliability estimate for the PCS total score. Instead, the omega2 and omega3 values under gen are ωh estimates; omega2 is calculated using the model-implied variance of the total score in its denominator, and omega3 is calculated using the observed sample variance of X (this distinction is analogous to the one between omega2 and omega3 described earlier for ωu). Thus, the proportion of PCS total-score variance that is due to a general psychological-cost factor over and above the influence of effects that are specific to the different content domains is .91.11 Interpretation of the estimates listed under the s1 to s4 columns is described in the online supplement.

Higher-order models

In this bifactor-model example, I considered multidimensionality among the PCS items a nuisance for the measurement of a broad, general psychological-cost target construct. In other situations, researchers may hypothesize an alternative multidimensional structure such that there is a broad, overarching construct indirectly influencing all items in a test through more conceptually narrow constructs that directly influence different subsets of items. Such hypotheses imply that the item-level data arise from a higher-order model, in which a higher-order factor (also known as a second-order factor) causes individual differences in several more conceptually narrow lower-order factors (or first-order factors), which in turn directly influence the observed item responses. In this context, researchers may evaluate the extent to which the test produces reliable total scores (as measures of the construct represented by the higher-order factor) as well as subscale scores (as measures of the constructs represented by the lower-order factors).

When item scores arise from a higher-order model, a reliability measure termed omega-higher-order, or ωho, represents the proportion of total-score variance that is due to the higher-order factor; parameter estimates from a higher-order model are used to calculate ωho. As does ωh, ωho represents the reliability of a total score for measuring a single construct that influences all items, despite the multidimensional nature of the test. Thus, the conceptual distinction between ωh and ωho owes to the subtle difference between the interpretation of the general factor in the bifactor model and the higher-order factor in the higher-order model: In short, whereas the bifactor model's general factor influences all items directly (while the specific factors are held constant), a higher-order factor influences all items indirectly via the lower-order factors (see Yung et al., 1999, for further detail). The online supplement to this article presents a formula for ωho and demonstrates the estimation of a higher-order model for the PCS and subsequent calculation of ωho as a function of the model's parameter estimates. When the semTools::reliability function is applied to a fitted higher-order model, it does not return ωho; instead, the reliabilityL2 function of the semTools package calculates ωho, as shown in the online supplement, which also describes reliability estimation for subscales.12
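Although the exact formula appears in the online supplement, its structure can be sketched here. With standardized factors, the higher-order factor's effect on item j is the product of that item's lower-order loading λ_{jk} and the loading γ_k of its lower-order factor k on the higher-order factor; the sum of these products plays the role that the sum of general-factor loadings plays in ωh. A schematic rendering, consistent with Yung et al. (1999) and Zinbarg et al. (2005), in my notation (which may differ from the supplement's):

```latex
\omega_{ho} \;=\;
\frac{\Bigl(\sum_{j=1}^{19} \gamma_{k(j)}\,\lambda_{j\,k(j)}\Bigr)^{2}}
     {\hat{\sigma}^{2}_{X}}
```

Here k(j) indexes the lower-order factor on which item j loads, and σ̂²_X is the total-score variance (model-implied or observed, paralleling the omega2 versus omega3 distinction described above).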
Fig. 4. Flowchart for determining the appropriate omega estimate for measurement of a hypothetical construct influencing all items in a scale. CFA = confirmatory factor analysis; EFA =
exploratory factor analysis.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Eccles, J. S. (2005). Subjective task value and the Eccles et al. model of achievement-related choices. In A. J. Elliot & C. S. Dweck (Eds.), Handbook of competence and motivation (pp. 105–121). New York, NY: Guilford Press.
Finney, S. J., & DiStefano, C. (2013). Nonnormal and categorical data in structural equation modeling. In G. R. Hancock & R. O. Mueller (Eds.), A second course in structural equation modeling (2nd ed., pp. 439–492). Charlotte, NC: Information Age.
Flake, J. K., Barron, K. E., Hulleman, C., McCoach, B. D., & Welsh, M. E. (2015). Measuring cost: The forgotten component of expectancy-value theory. Contemporary Educational Psychology, 41, 232–244.
Flake, J. K., Ferland, M., & Flora, D. B. (2017, April). Trajectories of psychological cost in gatekeeper classes: Relationships with expectancy, value, and performance. Paper presented at the annual meeting of the American Educational Research Association, San Antonio, TX.
Flake, J. K., Pek, J., & Hehman, E. (2017). Construct validation in social and personality research: Current practice and recommendations. Social Psychological and Personality Science, 8, 370–378.
Flora, D. B., & Flake, J. K. (2017). The purpose and practice of exploratory and confirmatory factor analysis in psychological research: Decisions for scale development and validation. Canadian Journal of Behavioural Science, 49, 78–88.
Flora, D. B., LaBrish, C., & Chalmers, R. P. (2012). Old and new ideas for data screening and assumption testing for exploratory and confirmatory factor analysis. Frontiers in Psychology, 3, Article 55. doi:10.3389/fpsyg.2012.00055
Fried, E. I., & Flake, J. K. (2018). Measurement matters. Observer, 31(3), 29–30.
Green, S. B., & Yang, Y. (2009a). Commentary on coefficient alpha: A cautionary tale. Psychometrika, 74, 121–135.
Green, S. B., & Yang, Y. (2009b). Reliability of summed item scores using structural equation modeling: An alternative to coefficient alpha. Psychometrika, 74, 155–167.
Jonason, P., & Webster, G. (2010). The dirty dozen: A concise measure of the dark triad. Psychological Assessment, 22, 420–432.
Jöreskog, K. G. (1971). Statistical analysis of sets of congeneric tests. Psychometrika, 36, 109–133.
Jorgensen, T. D., Pornprasertmanit, S., Schoemann, A. M., & Rosseel, Y. (2020). semTools: Useful tools for structural equation modeling (R package Version 0.5-3) [Computer software]. Retrieved from https://CRAN.R-project.org/package=semTools
Kelley, K. (2019). MBESS: The MBESS R package (R package Version 4.6.0) [Computer software]. Retrieved from https://CRAN.R-project.org/package=MBESS
Kelley, K., & Pornprasertmanit, S. (2016). Confidence intervals for population reliability coefficients: Evaluation of methods, recommendations, and software for composite measures. Psychological Methods, 21, 69–92.
Loken, E., & Gelman, A. (2017). Measurement error and the replication crisis: The assumption that measurement error always reduces effect sizes is false. Science, 355, 584–585.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
McDonald, R. P. (1999). Test theory: A unified approach. Mahwah, NJ: Erlbaum.
McNeish, D. (2018). Thanks coefficient alpha, we'll take it from here. Psychological Methods, 23, 412–433.
Mellenbergh, G. J. (1996). Measurement precision in test score and item response models. Psychological Methods, 1, 293–299.
R Core Team. (2018). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Revelle, W. (2020). psych: Procedures for psychological, psychometric, and personality research (R package Version 2.0.9) [Computer software]. Retrieved from https://cran.r-project.org/web/packages/psych/index.html
Revelle, W., & Condon, D. M. (2019). Reliability from α to ω: A tutorial. Psychological Assessment, 31, 1395–1411.
Revelle, W., Wilt, J., & Rosenthal, A. (2010). Individual differences in cognition: New methods for examining the personality-cognition link. In A. Gruszka, G. Matthews, & B. Szymura (Eds.), Handbook of individual differences in cognition: Attention, memory and executive control (pp. 27–49). New York, NY: Springer.
Revelle, W., & Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: Comments on Sijtsma. Psychometrika, 74, 145–154.
Rhemtulla, M., Brosseau-Liard, P. E., & Savalei, V. (2012). When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions. Psychological Methods, 17, 354–373.
Rodriguez, A., Reise, S. P., & Haviland, M. G. (2016). Evaluating bifactor models: Calculating and interpreting statistical indices. Psychological Methods, 21, 137–150.
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2).
Savalei, V. (2018). On the computation of the RMSEA and CFI from the mean-and-variance corrected test statistic with nonnormal data in SEM. Multivariate Behavioral Research, 53, 419–429.
Savalei, V. (2019). A comparison of several approaches for controlling measurement error in small samples. Psychological Methods, 24, 352–370.
Savalei, V., & Reise, S. P. (2019). Don't forget the model in your model-based reliability coefficients: A reply to McNeish (2018). Collabra: Psychology, 5, Article 36. doi:10.1525/collabra.247
Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach's alpha. Psychometrika, 74, 107–120.
Trizano-Hermosilla, I., & Alvarado, J. M. (2016). Best alternatives to Cronbach's alpha reliability in realistic conditions: Congeneric and asymmetrical measurements. Frontiers in Psychology, 7, Article 769. doi:10.3389/fpsyg.2016.00769
Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12, 58–79.
Yang, Y., & Green, S. B. (2010). A note on structural equation modeling estimates of reliability. Structural Equation Modeling, 17, 66–81.
Yang, Y., & Green, S. B. (2015). Evaluation of structural equation modeling estimates of reliability for scales with ordered categorical items. Methodology, 11, 23–34.
Yung, Y.-F., Thissen, D., & McLeod, L. D. (1999). On the relationship between the higher-order factor model and the hierarchical factor model. Psychometrika, 64, 113–128.
Zinbarg, R. E., Revelle, W., Yovel, I., & Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωh: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70, 123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., & McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30, 121–144.
Zumbo, B. D., Gadermann, A. M., & Zeisser, C. (2007). Ordinal versions of coefficients alpha and theta for Likert rating scales. Journal of Modern Applied Statistical Methods, 6, 21–29.
Zumbo, B. D., & Kroc, E. (2019). A measurement is a choice and Stevens' scales of measurement do not help make it: A response to Chalmers. Educational and Psychological Measurement, 79, 1184–1197.