Calculating, Interpreting, and Reporting Cronbach's Alpha Reliability Coefficient for Likert-Type Scales
Joseph A. Gliem
Rosemary R. Gliem
Abstract: The purpose of this paper is to show why single-item questions pertaining to a construct are not reliable and should not be used in drawing conclusions. By comparing the reliability of a summated, multi-item scale with that of a single-item question, the authors show how unreliable a single item is and, therefore, why it is not appropriate to make inferences based upon the analysis of single-item questions used to measure a construct.
Introduction
Spector (1992) identified four characteristics that make a scale a summated rating scale
as follows:
First, a scale must contain multiple items. The use of summated in the name implies that
multiple items will be combined or summed. Second, each individual item must measure
something that has an underlying, quantitative measurement continuum. In other words,
it measures a property of something that can vary quantitatively rather than qualitatively.
An attitude, for example, can vary from being very favorable to being very unfavorable.
Third, each item has no “right” answer, which makes the summated rating scale different
from a multiple-choice test. Thus summated rating scales cannot be used to test for
knowledge or ability. Finally, each item in a scale is a statement, and respondents are
asked to give a rating about each statement. This involves asking subjects to indicate
which of several response choices best reflects their response to the item. (pp. 1-2)
Nunnally and Bernstein (1994), McIver and Carmines (1981), and Spector (1992) discuss
the reasons for using multi-item measures instead of a single item for measuring psychological
attributes. They identify the following: First, individual items have considerable random
measurement error, i.e., they are unreliable. Nunnally and Bernstein (1994) state, “Measurement error averages out when individual scores are summed to obtain a total score” (p. 67). Second, an individual item can only categorize people into a relatively small number of groups; it cannot discriminate among fine degrees of an attribute. For example, a dichotomously scored item can distinguish between only two levels of the attribute, i.e., such items lack precision. Third, individual items lack scope. McIver and Carmines (1981) say, “It is very
unlikely that a single item can fully represent a complex theoretical concept or any specific
attribute for that matter” (p. 15). They go on to say,
The most fundamental problem with single item measures is not merely that they tend to be less valid, less accurate, and less reliable than their multi-item equivalents. It is, rather, that the social scientist rarely has sufficient information to estimate their measurement properties. Thus their degree of validity, accuracy, and reliability is often unknowable. (p. 15)
Blalock (1970) has observed, “With a single measure of each variable, one can remain blissfully
unaware of the possibility of measurement [error], but in no sense will this make his inferences
more valid” (p. 111).
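The error-averaging argument above can be made concrete with a small simulation. The following is a minimal sketch, not drawn from this paper's data; the sample size, noise level, and scale lengths are all illustrative assumptions. Each of k parallel items is modeled as a latent true score plus independent noise, and the summated score is correlated with the true score as k grows.

# Illustrative simulation (assumed parameters): summing k noisy items
# makes the total score track the latent attribute more closely.
import numpy as np

rng = np.random.default_rng(0)
n_people = 5000
true_score = rng.normal(0.0, 1.0, n_people)  # latent attribute

for k in (1, 2, 4, 8, 16):
    # each item = true score + independent, equally noisy error
    noise = rng.normal(0.0, 1.0, (n_people, k))
    items = true_score[:, None] + noise
    total = items.sum(axis=1)
    r = np.corrcoef(total, true_score)[0, 1]
    print(f"k={k:2d} items: corr(total, true) = {r:.3f}")

Under these assumptions the printed correlation rises from about .71 for a single item toward about .97 for 16 items, matching the Spearman-Brown prediction of sqrt(k / (k + 1)).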
Despite this well-documented background on the benefits of Likert-type scales, with their multi-item construction and summated rating scores, many researchers still invalidate their findings through improper data analysis. This paper will show how such data analysis errors can adversely affect the inferences one wishes to make.
Researchers frequently calculate Cronbach’s alpha to establish a scale’s reliability, but then opt to conduct data analysis using individual items. This is particularly troubling because single-item reliabilities are generally very low, and without reliable items the validity of the items is at best poor and at worst unknown. This can be illustrated with a simple set of actual data collected from a class of graduate students enrolled in a Winter Quarter 2003 research design course. Cronbach’s alpha is a test reliability technique that requires only a single test administration to provide a unique estimate of the reliability for a given test. Cronbach’s alpha is the average value of the reliability coefficients one would obtain for all possible ways of splitting the test items into two half-tests.
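For readers who wish to verify such reliability values outside SPSS, the sketch below computes the coefficient from the standard variance form, alpha = k/(k - 1) x [1 - (sum of item variances)/(variance of summated score)]. The response matrix is invented for illustration; it is not the class data set described above.

# Minimal Cronbach's alpha sketch using the variance form of the
# coefficient; the Likert-type responses below are hypothetical.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents-by-items matrix of scale scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of summated score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

# Hypothetical 5-point Likert responses: 6 respondents x 4 items.
data = np.array([[4, 5, 4, 4],
                 [2, 2, 3, 2],
                 [5, 4, 5, 5],
                 [3, 3, 2, 3],
                 [4, 4, 4, 5],
                 [1, 2, 1, 2]], dtype=float)
print(f"alpha = {cronbach_alpha(data):.3f}")  # about .96 for this toy matrix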
[Figure: only the horizontal-axis label “First Administration” and its 0-6 tick marks survived extraction.]
Table 2: Multi-item statements to measure students’ pleasure with their graduate program at The Ohio State University
Table 3 shows the item-analysis output from SPSS for the multi-item scale of student attitude toward their graduate program. A description of the sections and related terms is as follows (a computational sketch appears after the list):
1. Statistics for Scale—These are summary statistics for the 8 items comprising the scale.
The summated scores can range from a low of 8 to a high of 40.
2. Item Means—These are summary statistics for the eight individual item means.
3. Item Variances—These are summary statistics for the eight individual item variances.
4. Inter-Item Correlations—This is descriptive information about the pairwise correlations among the scale items. With the 8 items in the example, there are 8(8 - 1)/2 = 28 such correlations: item 1 with item 2, item 1 with item 3, and so forth. The first number listed is the mean of these correlations (in our example .3824), the second number is the lowest (.0415), and so forth. The mean inter-item correlation (.3824) is the r̄ in the formula α = k r̄ / [1 + (k - 1) r̄], where k is the number of items considered.
5. Item-total Statistics—This is the section where one needs to direct primary attention. The
items in this section are as follows:
a. Scale Mean if Item Deleted—Excluding the individual item listed, all other scale
items are summed for all individuals (48 in our example) and the mean of the
summated items is given. In Table 3, the mean of the summated scores excluding item 2 is 25.1.
b. Scale Variance if Item Deleted—Excluding the individual item listed, all other scale
items are summed for all individuals (48 in our example) and the variance of the
summated items is given. In Table 3, the variance of the summated scores excluding item 2 is 25.04.
c. Corrected Item-Total Correlation—This is the correlation of the item designated with
the summated score for all other items. In Table 3, the correlation between item 2
and the summated score is .60. A rule-of-thumb is that these values should be at least
.40.
d. Squared Multiple Correlation—This is the predicted Multiple Correlation Coefficient
squared obtained by regressing the identified individual item on all the remaining
items. In Table 3, the predicted Squared Multiple Correlation is .49, obtained by regressing item 2 on items 4, 5, 6, 7, 8, 9, and 10.
e. Alpha if Item Deleted—This is probably the most important column in the table.
This represents the scale’s Cronbach’s alpha reliability coefficient for internal
consistency if the individual item is removed from the scale. In Table 3, the scale’s Cronbach’s alpha would be .7988 if item 2 were removed from the scale. This value is
then compared to the Alpha coefficient value at the bottom of the table to see if one
wants to delete the item. As one might have noted, the present scale has only 8 items, whereas the original scale had 10 items. Using the above information, removing items 1 and 3 resulted in an increase in Cronbach’s alpha from .7708 to .8240.
f. Alpha—The Cronbach’s alpha coefficient of internal consistency for the scale as a whole. This is the form of the coefficient most frequently used and reported.
g. Standardized Item Alpha—The Cronbach’s alpha coefficient of internal consistency
when all scale items have been standardized. This coefficient is used only when the
individual scale items are not scaled the same.
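As a companion to the list above, the following sketch reproduces the item-total statistics of section 5 for a small invented response matrix. It is not the SPSS output summarized in Table 3; the data and variable names are assumptions for illustration.

# Item-total statistics (sections 5a-5e and 5g) for a toy matrix.
import numpy as np

def cronbach_alpha(items):
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

items = np.array([[4, 5, 4, 4], [2, 2, 3, 2], [5, 4, 5, 5],
                  [3, 3, 2, 3], [4, 4, 4, 5], [1, 2, 1, 2]], dtype=float)
k = items.shape[1]

for j in range(k):
    rest = np.delete(items, j, axis=1)  # all items except item j
    rest_sum = rest.sum(axis=1)         # summated score without item j
    print(f"item {j + 1}: "
          f"scale mean if deleted = {rest_sum.mean():.2f}, "
          f"scale variance if deleted = {rest_sum.var(ddof=1):.2f}, "
          f"corrected item-total r = {np.corrcoef(items[:, j], rest_sum)[0, 1]:.3f}, "
          f"alpha if deleted = {cronbach_alpha(rest):.4f}")

# Standardized item alpha (5g): alpha = k*rbar / (1 + (k - 1)*rbar),
# where rbar is the mean of the k(k-1)/2 pairwise inter-item correlations.
corr = np.corrcoef(items.T)
rbar = corr[np.triu_indices(k, 1)].mean()
print(f"standardized item alpha = {k * rbar / (1 + (k - 1) * rbar):.4f}")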
[Table: summary statistics (N, Mean, Variance, SD); the table body was lost in extraction.]
Conclusions
When using Likert-type scales it is imperative to calculate and report Cronbach’s alpha
coefficient for internal consistency reliability for any scales or subscales one may be using. The
analysis of the data then must use these summated scales or subscales and not individual items.
If one does otherwise, the reliability of the items is at best probably low and at worst unknown.
Cronbach’s alpha does not provide reliability estimates for single items.
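As a closing illustration of this workflow, the sketch below reports alpha for each of two hypothetical subscales and then forms the summated subscale scores on which any substantive analysis should be run. The item groupings and data are assumptions, and cronbach_alpha repeats the helper defined earlier.

# Report (sub)scale reliabilities, then analyze summated scores only.
import numpy as np

def cronbach_alpha(items):
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

# Hypothetical responses: 6 respondents x 8 items; columns 0-3 form
# subscale A and columns 4-7 form subscale B (an assumed grouping).
responses = np.array([[4, 5, 4, 4, 5, 4, 4, 5],
                      [2, 2, 3, 2, 2, 3, 2, 2],
                      [5, 4, 5, 5, 4, 5, 5, 4],
                      [3, 3, 2, 3, 3, 2, 3, 3],
                      [4, 4, 4, 5, 4, 4, 5, 4],
                      [1, 2, 1, 2, 2, 1, 2, 1]], dtype=float)

for name, block in {"A": responses[:, :4], "B": responses[:, 4:]}.items():
    print(f"subscale {name}: alpha = {cronbach_alpha(block):.3f}, "
          f"summated scores = {block.sum(axis=1)}")
# The summated scores, not the individual item columns, are the
# variables to carry into the substantive analysis.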
References
Blalock, H. M., Jr. (1970). Estimating measurement error using multiple indicators and several
points in time. American Sociological Review, 35(1), 101-111.
Carmines, E. G., & Zeller, R. A. (1979). Reliability and validity assessment. Thousand Oaks,
CA: Sage.
George, D., & Mallery, P. (2003). SPSS for Windows step by step: A simple guide and
reference. 11.0 update (4th ed.). Boston: Allyn & Bacon.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140. New York: Columbia University Press.
McIver, J. P., & Carmines, E. G. (1981). Unidimensional scaling. Thousand Oaks, CA: Sage.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-
Hill.
Spector, P. (1992). Summated rating scale construction. Thousand Oaks, CA: Sage.
Warmbrod, J. R. (2001). Conducting, interpreting, and reporting quantitative research.
Research Pre-Session, New Orleans, Louisiana.
________________________
Joseph A. Gliem, Associate Professor, Dept. of Human and Community Resource Development,
The Ohio State University, 208 Ag. Admin Bldg., 2120 Fyffe Rd., Columbus, OH
43210; [email protected]
Rosemary R. Gliem, The Ohio State University; [email protected]