A Reliability Generalization Study of the Teacher Efficacy Scale and Related Instruments
Robin K. Henson, Lori R. Kogan and Tammi Vacha-Haase
Educational and Psychological Measurement 2001 61: 404
DOI: 10.1177/00131640121971284
A previous draft of this article was presented at the annual meeting of the American Educational Research Association, April 26, 2000, New Orleans. Correspondence concerning this article should be directed to the first author at Department of Technology and Cognition, P.O. Box 311337, Denton, TX 76203-1337; e-mail: [email protected].
Educational and Psychological Measurement, Vol. 61, No. 3, June 2001, 404-420
© 2001 Sage Publications, Inc.
Teachers with a stronger sense of efficacy, for example, persist longer with students who struggle (Gibson & Dembo, 1984) and are less critical of student errors (Ashton & Webb, 1986).
Although the study of teacher efficacy has borne much fruit, the meaning and appropriate methods of measuring the construct have become the subject of recent debate (Tschannen-Moran et al., 1998). This dialogue has centered on two issues. First, based on the theoretical nature of the self-efficacy construct (Bandura, 1977, 1997), researchers have argued that self-efficacy is best measured within context regarding specific behaviors (see, e.g., Pajares, 1996). Second, the construct validity of scores from a variety of instruments purporting to measure teacher efficacy and related constructs has been questioned (Coladarci & Fink, 1995; Guskey & Passaro, 1994).
Other tests have also been developed to assess teacher efficacy and related constructs. For example, because self-efficacy is most appropriately measured in specific contexts, Riggs and Enochs (1990) developed a subject matter instrument to measure efficacy for teaching science, the Science Teaching Efficacy Belief Instrument (STEBI). This instrument was based on the Teacher Efficacy Scale (TES; Gibson & Dembo, 1984) and also consists of two largely uncorrelated subscales: Personal Science Teaching Efficacy (PSTE) and Science Teaching Outcome Expectancy (STOE). In most applications, the STEBI consists of 25 items with a 5-point Likert-type scale.
Furthermore, several tests have evolved from a theoretical orientation slightly different from, but related to, Bandura's (1997) social cognitive theory. Specifically, Rotter's (1966) locus of control theory has played an important historical role in the conceptualization of teacher efficacy as a construct (cf. Tschannen-Moran et al., 1998). Intuitively, one's locus of control orientation may affect one's beliefs about his or her ability to execute actions that lead to success in a given attainment. Instruments in this locus of control tradition have informed the study of teacher efficacy from a construct validity standpoint (Coladarci & Fink, 1995) and are often used in teacher efficacy studies.
Two of the more frequently used instruments in the Rotter (1966) tradition are the Teacher Locus of Control (TLC; Rose & Medway, 1981) and the Responsibility for Student Achievement (RSA; Guskey, 1981b). The TLC consists of 28 forced-choice items that present situations of student success (14 items) and student failure (14 items). The two forced-choice options allow for either an internal (teacher) or external (student) explanation for the student outcome. The TLC yields two subscale scores, one reflecting internal locus of control for student success (I+) and the other internal locus for student failure (I–). Similarly, the RSA consists of 30 items also presenting two possible explanations (internal vs. external) for student success and failure. However, the RSA asks respondents to weight each explanation by dividing 100 percentage points between the options. Scoring results in two subscales, one assessing responsibility for student success (RSA+) and the other responsibility for student failure (RSA–).
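To make the scoring concrete, the following is a minimal sketch of TLC-style subscale scoring. It is our illustration, not the published scoring key: the item index sets and the function name are hypothetical stand-ins.

```python
# Hypothetical TLC-style scoring: 28 forced-choice responses coded
# 1 = internal (teacher) explanation, 0 = external (student) explanation.
# The assignment of items to success/failure situations is illustrative only.
SUCCESS_ITEMS = range(0, 14)   # assumed indices of student-success items
FAILURE_ITEMS = range(14, 28)  # assumed indices of student-failure items

def score_tlc(responses: list[int]) -> tuple[int, int]:
    """Count internal attributions separately for success (I+) and failure (I-) items."""
    i_plus = sum(responses[i] for i in SUCCESS_ITEMS)
    i_minus = sum(responses[i] for i in FAILURE_ITEMS)
    return i_plus, i_minus

# Example: a respondent who credits themselves for most successes
# but attributes most failures to students.
responses = [1] * 10 + [0] * 4 + [0] * 11 + [1] * 3
print(score_tlc(responses))  # (10, 3)
```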
In an important article, Tschannen-Moran et al. (1998) reviewed the history and measurement methods of teacher efficacy. They challenged both the current conceptualization of teacher efficacy as a construct and the psychometric properties of predominant instruments in the field. In particular, Tschannen-Moran et al. presented a thoughtful critique of the construct validity of scores from the TES (Gibson & Dembo, 1984). They disagreed with Gibson and Dembo's claim that the Personal Teaching Efficacy (PTE) and General Teaching Efficacy (GTE) subscales of the TES reflect Bandura's (1977) self-efficacy and outcome expectancy dimensions of social cognitive theory. Other researchers have made similar claims regarding construct validity (cf. Coladarci & Fink, 1995; Guskey & Passaro, 1994).
Purpose
Variability among people across time on the trait of interest is critical in obtaining high reliability estimates.
Unfortunately, researchers often fail to cite reliability estimates for their data and often assume that estimates from prior studies or test manuals suffice for their current study (Vacha-Haase, Kogan, & Thompson, 2000). However, as Pedhazur and Schmelkin (1991) noted, "Such information may be useful for comparative purposes, but it is imperative to recognize that the relevant reliability estimate is the one obtained for the sample used in the study under consideration" (p. 86). Empirical studies confirm that very few researchers actually report reliability estimates for their data (cf. Caruso, 2000; Vacha-Haase, 1998; Yin & Fan, 2000). For example, Yin and Fan (2000) observed that only 7.5% of articles employing the Beck Depression Inventory reported precise reliability estimates for the data in hand.
Because sample characteristics can affect score reliability, researchers who report only reliability estimates from prior studies or test manuals should at least explicitly compare their sample's composition and variability with those of the sample referenced in the prior study. As Dawis (1987) explained, "Because reliability is a function of sample as well as of instrument, it should be evaluated on a sample from the intended target population—an obvious but sometimes overlooked point" (p. 486). To the extent that the current sample differs from the referenced one, the current reliability estimates may also differ. Regarding this comparison between samples, Thompson and Vacha-Haase (2000) suggested that
the crudest and barely acceptable minimal evidence of score quality in a substantive study would involve an explicit and direct comparison (Thompson, 1992) of (a) relevant sample characteristics (e.g., age, gender), whatever these may be in the context of a particular inquiry, with the same features reported in the manual for the normative sample or in earlier research and (b) the sample score SD with the SD reported in the manual or in other earlier research. (p. 190, emphasis in original)
Vacha-Haase et al. (2000) termed the process of using a prior study's reliability estimates for one's own data "reliability induction," suggesting that researchers inductively generalize from specific instances to a broader conclusion. That is, researchers assume that because reliable scores were obtained in prior instances, reliable scores will be obtained in entirely new data (which, of course, is not necessarily the case). Vacha-Haase et al. argued that reliability induction is reasonable only when the composition and variability of the current and referenced samples are comparable. Furthermore, they presented data illustrating the frequent incongruence between current and prior samples when prior reliability coefficients are inducted in new samples.
Because reliability may, and does, vary across different administrations of a test, Vacha-Haase (1998) employed a meta-analytic method called "reliability generalization" to characterize this variability in score reliability across studies.
Method
Subscales on the other tests had far fewer reported estimates from data in hand (13 PSTE, 11 STOE, 3 I+, 3 I–, 5 RSA+, 5 RSA–).
Each of the 52 selected articles was read, and 15 study characteristics were coded. Of the 52 articles, 43 were dually coded by two independent raters. Interrater reliability was examined by calculating the percentage of perfect agreement between raters out of all possible ratings. This percentage was computed for each of the 15 coded variables and ranged from 76.09% to 100% agreement (M = 91.35%, SD = 6.92%). In addition, accuracy of coding was checked by a third rater, who examined and corrected observed discrepancies between the independent raters. The third rater also audited the 9 articles that were not dually coded and made minor corrections.
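As an illustration of the agreement index described above (our sketch, not code from the original study), the percentage of perfect agreement can be computed per coded variable as follows; the arrays of codes are hypothetical.

```python
import numpy as np

def percent_agreement(rater_a: np.ndarray, rater_b: np.ndarray) -> np.ndarray:
    """Percentage of articles on which the two raters assigned identical codes,
    computed separately for each coded variable (column)."""
    return 100.0 * (rater_a == rater_b).mean(axis=0)

# Hypothetical codes for 4 dually coded articles x 3 study characteristics.
codes_a = np.array([[1, 0, 2], [1, 1, 2], [0, 1, 2], [1, 1, 0]])
codes_b = np.array([[1, 0, 2], [1, 0, 2], [0, 1, 2], [1, 1, 0]])

per_variable = percent_agreement(codes_a, codes_b)
print(per_variable)                                  # [100.  75. 100.]
print(per_variable.mean(), per_variable.std(ddof=1)) # summary M and SD
```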
Although multiple study characteristics were coded, the small percentage of studies actually reporting reliability coefficients (all internal consistency estimates) limited the number of variables that could be used for analysis. As such, selected bivariate correlational analyses were conducted in lieu of multiple regression. Variables were selected for use in the present study based on their potential for capturing differences in sample homogeneity as regards the variable of interest. These variables were the following:
Estimating Reliability
Reliability was estimated with KR-21 (Kuder & Richardson, 1937) for the dichotomously scored TLC subscales (I+ and I–). KR-21 requires knowledge of only the mean, standard deviation, and number of items on the test. The formula assumes that all item difficulties are equal and, to the degree that this assumption is not met, the coefficient may be expected to underestimate reliability. Because only two cases using the TLC reported both reliability from data in hand and means and standard deviations, a comparison of the accuracy of the KR-21 estimates was not possible. Because KR-21 is likely to underestimate reliability, the KR-21 estimates were used as the reliability estimates for all analyses concerning the TLC subscales; using KR-21 uniformly ensured that the estimates maintained their relative positions in the distribution, despite potentially underestimating score reliability.
To obtain the uncorrected total score variance estimates necessary for KR-21, we converted the reported standard deviation with the following formula:

$$\sigma^2 = \frac{SD^2\,(n - 1)}{n},$$

where SD is the standard deviation of total scores reported for the subscale and n is the sample size for which the SD was reported. This estimate was then used in the KR-21 formula. It should be noted that KR-21 was not applied to the other subscales because their response formats were nondichotomous; in its traditional form (Kuder & Richardson, 1937), KR-21 does not generalize to such data (e.g., Likert-type scales).
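A minimal Python sketch of this estimation procedure (our reconstruction from the formulas above, not the authors' code): the reported SD is first converted to the uncorrected variance and then entered into the standard KR-21 formula. The summary statistics in the example are hypothetical.

```python
def uncorrected_variance(sd: float, n: int) -> float:
    """Convert a reported (n - 1 denominator) standard deviation to the
    uncorrected total-score variance required by KR-21."""
    return sd ** 2 * (n - 1) / n

def kr21(k: int, mean: float, sd: float, n: int) -> float:
    """KR-21 reliability (Kuder & Richardson, 1937) from summary statistics.

    k: number of dichotomous items; mean and sd: reported total-score mean
    and standard deviation; n: sample size on which the SD was computed.
    Assumes equal item difficulties, so it tends to underestimate
    reliability when that assumption fails.
    """
    variance = uncorrected_variance(sd, n)
    return (k / (k - 1)) * (1.0 - mean * (k - mean) / (k * variance))

# Hypothetical 14-item TLC subscale reporting M = 9.2, SD = 2.6, n = 120.
print(round(kr21(k=14, mean=9.2, sd=2.6, n=120), 3))  # ~0.57
```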
Results
Figure 1 characterizes the distributions of reliability estimates with box plots, and Table 1 presents descriptive statistics for the subscales. Examination of Figure 1 indicates considerable variation of score reliability between subscales and within some subscales, particularly the two subscales of the TES (PTE and GTE) and the Internal Failure (I–) subscale of the TLC. Reliabilities had ranges of .26 or higher on each of these subscales, representing at least a 26-percentage-point fluctuation in true score variance from minimum to maximum estimates. Figure 1 also suggests that several subscales were relatively consistent in their ability to yield reliable scores, particularly the PSTE subscale of the STEBI and the Internal Success (I+) subscale of the TLC.
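This interpretation follows from the classical test theory definition of reliability as the proportion of observed score variance that is true score variance:

```latex
r_{xx} \;=\; \frac{\sigma^2_T}{\sigma^2_X},
\qquad\text{so}\qquad
r_{xx}^{\max} - r_{xx}^{\min} = .26
\;\Longrightarrow\;
\text{a 26-percentage-point difference in } \sigma^2_T / \sigma^2_X .
```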
Figure 1. Box plot of reliability estimates for each subscale from the four instruments.
Note. RSA+ = Responsibility for Student Success (RSA); RSA– = Responsibility for Student Failure (RSA); I+ = Internal Success (TLC); I– = Internal Failure (TLC); PSTE = Personal Science Teaching Efficacy (STEBI); STOE = Science Teaching Outcome Expectancy (STEBI); PTE = Personal Teaching Efficacy (TES); GTE = General Teaching Efficacy (TES).
Table 1. Reliability Estimates and Correlations Between Study Characteristics and Score Reliability Estimates
Note. RSA = Responsibility for Student Achievement; RSA+ = Responsibility for Success; RSA– = Responsibility for Failure; TLC = Teacher Locus of Control; I+ = Internal Success; I– = Internal Failure; STEBI = Science Teaching Efficacy Belief Instrument; PSTE = Personal Science Teaching Efficacy; STOE = Science Teaching Outcome Expectancy; TES = Teacher Efficacy Scale; PTE = Personal Teaching Efficacy; GTE = General Teaching Efficacy.
a. Correlation between the continuous or coded predictor variable and reliability estimates for the given subscale.
b. n for the correlation after pairwise deletion of missing data.
c. Mean of the continuous or coded predictor variable for the given subscale.
d. Standard deviation of the continuous or coded predictor variable for the given subscale.
Discussion
Considerable variability was observed between instruments in their ability to yield reliable scores. Mean reliability coefficients tended to be acceptable for the instruments, although what is acceptable is a somewhat arbitrary decision, ultimately determined by the context of a study. Potential fluctuation of reliability coefficients was also evident within all instruments, particularly for the TES's PTE and GTE subscales and the TLC's Internal Failure subscale. Because reliability may fluctuate, researchers should always examine the reliability of their data in hand and report it. Thus, the APA Task Force on Statistical Inference emphasized that "a test is not reliable or unreliable" and that "authors should provide reliability coefficients of the scores for the data being analyzed even when the focus of their research is not psychometric" (Wilkinson & APA Task Force on Statistical Inference, 1999, p. 596).
It is insufficient to assume that a test will yield reliable scores solely because reliable scores have been obtained in the past. An even more egregious error is to assume that a test will yield reliable scores when reliability has been marginal in the past, such as for the GTE subscale of the TES (see Figure 1). Furthermore, even in substantive studies, reporting reliability coefficients is critical because effect sizes are attenuated by the observed reliabilities (Reinhardt, 1996).
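The attenuation mechanism can be made explicit with the classical correction-for-attenuation relationship (Spearman's formula, stated here as general background rather than as Reinhardt's specific analysis): an observed correlation is deflated by the square root of the product of the two sets of score reliabilities.

```latex
r_{xy}^{\text{obs}} \;=\; r_{xy}^{\text{true}} \sqrt{r_{xx}\, r_{yy}},
\qquad\text{e.g.,}\quad
r^{\text{true}}_{xy} = .50,\; r_{xx} = r_{yy} = .70
\;\Longrightarrow\;
r^{\text{obs}}_{xy} = .50 \times .70 = .35 .
```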
Regarding the TES, the PTE subscale tended to maintain stronger score integrity than the GTE subscale. This finding suggests that the GTE subscale may be susceptible to measurement error problems in addition to its questioned construct validity (Coladarci & Fink, 1995; Guskey & Passaro, 1994; Tschannen-Moran et al., 1998). Accordingly, use of the GTE subscale as a measure of teacher efficacy is questionable at best. Correlational analyses revealed no clear patterns in the relationship between reliability coefficients and study characteristics for the TES. However, the failure of many authors to report reliability information limited the number of characteristics examined and the sensitivity of the analyses used. Therefore, the present results are inconclusive regarding the relationship between study characteristics and score reliability on the TES. What is clear, however, is that total score variance was consistently related to reliability coefficients. Range restriction in homogeneous samples is likely to lower reliability estimates and appeared to do so in the present study. The negative relationship between reliability and gender homogeneity also provided limited evidence of this possibility.
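The range-restriction effect is easy to demonstrate with a small simulation (our own sketch under a simple true-score model, not an analysis from this study): coefficient alpha drops sharply when the sample is restricted to a narrow band of the trait.

```python
import numpy as np

rng = np.random.default_rng(0)

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_people, k_items) matrix of item scores."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_variances / total_variance)

# Simple true-score model: each observed item = true score + unit-variance noise.
n_people, k_items = 1000, 10
true_scores = rng.normal(0.0, 1.0, size=n_people)
items = true_scores[:, None] + rng.normal(0.0, 1.0, size=(n_people, k_items))

# Heterogeneous (full) sample versus a homogeneous subsample near the trait mean.
full_alpha = cronbach_alpha(items)
restricted_alpha = cronbach_alpha(items[np.abs(true_scores) < 0.5])

print(f"alpha, full sample:       {full_alpha:.2f}")        # about .90
print(f"alpha, restricted sample: {restricted_alpha:.2f}")  # substantially lower
```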
Because the STEBI was developed from the TES, its performance was similar to that of the TES. Looking at the results for both the TES and the STEBI in Figure 1, it is clear that the personal teaching efficacy subscales tend to yield less measurement error in their scores. The tests consistently yielded lower score reliabilities for the GTE or Outcome Expectancy subscales. These findings are consistent with the current debate surrounding the TES and the PTE and GTE constructs. Although prior debate has focused on the construct validity of scores from these tests (Tschannen-Moran et al., 1998), the present study suggests that the general teaching efficacy subscales are also problematic as regards measurement error. Furthermore, with one subscale exception, the TES yielded the most variable reliability coefficients of all the instruments.
In sum, although the PTE subscale tended to include less measurement error in its scores, the reported reliability estimates were quite variable across studies, with low estimates in the marginal range. Coefficients from the GTE subscale were consistently lower and also highly variable. The TES, if it is to see continued use in the study of teacher efficacy, likely should undergo revision with an eye toward measurement integrity. Given the debate over the construct validity of the GTE subscale and the current evidence of poor reliability of its scores, the subscale should potentially be abandoned and replaced with efforts to more reliably measure the outcome expectancy dimension of Bandura's (1997) social cognitive theory. Tschannen-Moran et al. (1998) have presented a new model of teacher efficacy that may serve to inform the development of new measures in the field. Henson, Bennett, Sienty, and Chambers (2000) reported some support for this model and its application of the relevant constructs. Researchers of teacher efficacy would do well to pursue measurement strategies in this direction, and if tests are developed to aid the process, researchers should be certain to examine score reliability for the data in hand, even in substantive studies. After developing their tests, researchers would also do well not to erroneously claim that their "test is reliable."
References
*Articles used in the meta-analysis are marked with an asterisk.
*Allinder, R. M. (1994). The relationship between efficacy and the instructional practices of special education teachers and consultants. Teacher Education and Special Education, 17, 86-95.
*Anderson, R., Greene, M., & Loewen, P. (1988). Relationships among teachers' and students' thinking skills, sense of efficacy, and student achievement. Alberta Journal of Educational Research, 34, 148-165.
Armor, D., Conroy-Oseguera, P., Cox, M., King, N., McDonnell, L., Pascal, A., Pauly, E., & Zellman, G. (1976). Analysis of the school preferred reading programs in selected Los Angeles minority schools (Report No. R-2007-LAUSD). Santa Monica, CA: RAND. (ERIC Document Reproduction Service No. ED 130 243)
Ashton, P., & Webb, R. B. (1982, March). Teachers' sense of efficacy: Toward an ecological model. Paper presented at the annual meeting of the American Educational Research Association, New York.
Ashton, P., & Webb, R. B. (1986). Making a difference: Teachers' sense of efficacy and student achievement. New York: Longman.
Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84, 191-215.
Bandura, A. (1997). Self-efficacy: The exercise of control. New York: Freeman.
*Benninga, J. S., Guskey, T. R., & Thornburg, K. R. (1981). The relationship between teacher attitudes and student perceptions of classroom climate. Elementary School Journal, 82, 66-75.
Berman, P., McLaughlin, M., Bass, G., Pauly, E., & Zellman, G. (1977). Federal programs supporting educational change: Vol. VII. Factors affecting implementation and continuation (Report No. R-1589/7-HEW). Santa Monica, CA: RAND. (ERIC Document Reproduction Service No. ED 140 432)
Caruso, J. C. (2000). Reliability generalization of the NEO personality scales. Educational and Psychological Measurement, 60, 236-254.
*Coladarci, T. (1992). Teachers' sense of efficacy and commitment to teaching. Journal of Experimental Education, 60, 323-337.
*Coladarci, T., & Breton, W. (1997). Teacher efficacy, supervision, and the special education resource-room teacher. Journal of Educational Research, 90, 230-239.
Coladarci, T., & Fink, D. R. (1995, April). Correlations among measures of teacher efficacy: Are they measuring the same thing? Paper presented at the annual meeting of the American Educational Research Association, San Francisco.
Dawis, R. V. (1987). Scale construction. Journal of Counseling Psychology, 34, 481-489.
*Enochs, L. G., & Riggs, I. M. (1990). Further development of an elementary science teaching efficacy belief instrument: A preservice elementary scale. School Science and Mathematics, 90, 694-706.
*Enochs, L. G., Scharmann, L. C., & Riggs, I. M. (1995). The relationship of pupil control to preservice elementary science teacher self-efficacy and outcome expectancy. Science Education, 79, 63-75.
*Gibson, S., & Dembo, M. (1984). Teacher efficacy: A construct validation. Journal of Educational Psychology, 76, 569-582.
*Grafton, L. G. (1987, November). The mediating influence of efficacy on conflict strategies used in educational settings. Paper presented at the annual meeting of the Speech Communication Association, Boston. (ERIC Document Reproduction Service No. ED 288 215)
*Greenwood, G. E., Olejnik, S. F., & Parkay, F. W. (1990). Relationships between four teacher efficacy belief patterns and selected teacher characteristics. Journal of Research and Development in Education, 23, 102-106.
Gronlund, N. E., & Linn, R. L. (1990). Measurement and evaluation in teaching (6th ed.). New York: Macmillan.
*Guskey, T. R. (1981a). Differences in teachers' perceptions of the causes of positive versus negative student achievement outcomes. Paper presented at the annual meeting of the American Educational Research Association, Los Angeles. (ERIC Document Reproduction Service No. ED 200 624)
*Minke, K. M., Bear, G. G., Deemer, S. A., & Griffin, S. M. (1996). Teachers' experiences with inclusive classrooms: Implications for special education reform. Journal of Special Education, 30, 152-186.
Moore, W., & Esselman, M. (1992, April). Teacher efficacy, power, school climate and achievement: A desegregating district's experience. Paper presented at the annual meeting of the American Educational Research Association, San Francisco.
*Mumaw, C. R., Sugawara, A. I., & Pestle, R. (1995). Teacher efficacy and past experiences as contributors to the global attitudes and practices among vocational home economics teachers. Family and Consumer Sciences Research Journal, 24, 92-109.
*Paese, P. C., & Zinkgraf, S. (1991). The effect of student teaching on teacher efficacy and teacher stress. Journal of Teaching in Physical Education, 10, 307-315.
Pajares, F. (1996). Self-efficacy beliefs in academic settings. Review of Educational Research, 66, 543-578.
*Parkay, F. W., Greenwood, G., Olejnik, S., & Proller, N. (1988). A study of the relationships among teacher efficacy, locus of control, and stress. Journal of Research and Development in Education, 21, 13-22.
*Payne, B. D., & Manning, B. H. (1991). Self-talk of student teachers and resulting relationships. Journal of Educational Research, 85, 47-51.
Pedhazur, E. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ: Lawrence Erlbaum.
*Podell, D. M., & Soodak, L. C. (1993). Teacher efficacy and bias in special education referrals. Journal of Educational Research, 86, 247-253.
*Poole, M. G., & Okeafor, K. R. (1989). The effects of teacher efficacy and interactions among educators on curriculum implementation. Journal of Curriculum and Supervision, 4, 146-161.
*Pratt, D. L. (1985). Responsibility for student achievement and observed verbal behavior among secondary science and mathematics teachers. Journal of Research in Science Teaching, 22, 807-816.
Reinhardt, B. (1996). Factors affecting coefficient alpha: A mini Monte Carlo study. In B. Thompson (Ed.), Advances in social science methodology (Vol. 4, pp. 3-20). Greenwich, CT: JAI.
*Rich, Y., Smadar, L., & Fischer, S. (1996). Extending the concept and assessment of teacher efficacy. Educational and Psychological Measurement, 56, 1015-1025.
*Riggs, I., & Enochs, L. (1989, March). Toward the development of an elementary teacher's science teaching efficacy belief instrument. Paper presented at the annual meeting of the National Association for Research in Science Teaching, San Francisco. (ERIC Document Reproduction Service No. ED 308 068)
*Riggs, I., & Enochs, L. (1990). Toward the development of an elementary teacher's science teaching efficacy belief instrument. Science Education, 74, 625-638.
*Rose, J. S., & Medway, F. J. (1981). Measurement of teachers' beliefs in their control over student outcome. Journal of Educational Research, 74, 185-190.
*Ross, J. A. (1992). Teacher efficacy and the effect of coaching on student achievement. Canadian Journal of Education, 17, 51-65.
Ross, J. A. (1994). The impact of an inservice to promote cooperative learning on the stability of teacher efficacy. Teaching and Teacher Education, 10, 381-394.
Rotter, J. B. (1966). Generalized expectancies for internal versus external control of reinforcement. Psychological Monographs, 80, 1-28.
*Saklofske, D. H., Michayluk, J. O., & Randhawa, B. S. (1988). Teachers' efficacy and teaching behaviors. Psychological Reports, 63, 407-414.
*Scharmann, L. C., & Orth Hampton, C. M. (1995). Cooperative learning and preservice elementary teacher science self-efficacy. Journal of Science Teacher Education, 6, 125-133.
*Schoon, K. J., & Boone, W. J. (1998). Self-efficacy and alternative conceptions of science of preservice elementary teachers. Science Education, 82, 553-568.
*Soodak, L. C., & Podell, D. M. (1993). Teacher efficacy and student problem as factors in special education referral. Journal of Special Education, 27, 66-81.
*Soodak, L. C., & Podell, D. M. (1994). Teachers' thinking about difficult-to-teach students. Journal of Educational Research, 88, 44-51.
*Soodak, L. C., & Podell, D. M. (1996). Teacher efficacy: Toward the understanding of a multi-faceted construct. Teaching and Teacher Education, 12, 401-411.
Stein, M. K., & Wang, M. C. (1988). Teacher development and school improvement: The process of teacher change. Teaching and Teacher Education, 4, 171-187.
Thompson, B. (1990). ALPHAMAX: A program that maximizes coefficient alpha by selective item deletion. Educational and Psychological Measurement, 50, 585-589.
Thompson, B. (1992). Two and one-half decades of leadership in measurement and evaluation. Journal of Counseling and Development, 70, 434-438.
Thompson, B. (1994). Guidelines for authors. Educational and Psychological Measurement, 54, 837-847.
Thompson, B., & Vacha-Haase, T. (2000). Psychometrics is datametrics: The test is not reliable. Educational and Psychological Measurement, 60, 174-195.
*Tracz, S. M., & Gibson, S. (1986, November). Effects of efficacy on academic achievement. Paper presented at the annual meeting of the California Educational Research Association, Marina Del Rey, CA. (ERIC Document Reproduction Service No. ED 281 853)
Tschannen-Moran, M., Woolfolk Hoy, A., & Hoy, W. K. (1998). Teacher efficacy: Its meaning and measure. Review of Educational Research, 68, 202-248.
Vacha-Haase, T. (1998). Reliability generalization: Exploring variance in measurement error affecting score reliability across studies. Educational and Psychological Measurement, 58, 6-20.
Vacha-Haase, T., Kogan, L. R., & Thompson, B. (2000). Sample compositions and variabilities in published studies versus those in test manuals: Validity of score reliability inductions. Educational and Psychological Measurement, 60, 509-522.
Viswesvaran, C., & Ones, D. S. (2000). Measurement error in "Big Five Factors" personality assessment: Reliability generalization across studies and measures. Educational and Psychological Measurement, 60, 224-235.
*Warren, L. L., & Payne, B. D. (1997). Impact of middle grades' organization on teacher efficacy and environmental perceptions. Journal of Educational Research, 90, 301-308.
*Wenner, G. (1995). Science knowledge and efficacy beliefs among preservice elementary teachers: A follow-up study. Journal of Science Education and Technology, 4, 307-315.
Wilkinson, L., & APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604. (Reprint available through the APA home page: http://www.apa.org/journals/amp/amp548594.html)
*Woolfolk, A. E., & Hoy, W. K. (1990). Prospective teachers' sense of efficacy and beliefs about control. Journal of Educational Psychology, 82, 81-91.
*Woolfolk, A. E., Rosoff, B., & Hoy, W. K. (1990). Teachers' sense of efficacy and their beliefs about managing students. Teaching and Teacher Education, 6, 137-148.
Yin, P., & Fan, X. (2000). Assessing the reliability of Beck Depression Inventory scores: Reliability generalization across studies. Educational and Psychological Measurement, 60, 201-223.