Chapter 5 Reliability
RELIABILITY
• Reliability is a synonym for dependability or consistency.
• For example, a reliable friend is one who is always there for us in a time of need.
RELIABILITY
• Reliability refers to consistency in measurement.
• It is the ability of a test to give consistent results.
• Reliability is the proportion of the total variance attributed to true variance: the greater the proportion of the total variance attributed to true variance, the more reliable the test.
What causes
Inconsistency?
CONCEPT OF RELIABILITY
• A score on an ability test is presumed to reflect not
only the testtaker’s true score on the ability being
measured but also error.
• Error refers to the component of the observed test
score that does not have to do with the testtaker’s
ability.
• Observed Score = True Score + Error
CONCEPT OF RELIABILITY
• A statistic useful in describing sources of test score variability is the variance (the standard deviation squared).
• Variance from true differences is true variance, and
variance from irrelevant, random sources is error
variance.
• Total variance = True Variance + Error Variance
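To make the decomposition concrete, here is a minimal Python sketch (all scores and the size of the error are invented for illustration). It simulates observed scores as true scores plus independent random error, shows that total variance is approximately the sum of true and error variances, and estimates reliability as the proportion of total variance that is true variance.

```python
import random
import statistics

random.seed(0)

# Hypothetical true scores for 1,000 testtakers (illustrative values only)
true_scores = [random.gauss(100, 15) for _ in range(1000)]

# Random measurement error, independent of the true scores
errors = [random.gauss(0, 5) for _ in range(1000)]

# Observed Score = True Score + Error
observed = [t + e for t, e in zip(true_scores, errors)]

true_var = statistics.pvariance(true_scores)
error_var = statistics.pvariance(errors)
total_var = statistics.pvariance(observed)

# Total variance ≈ True variance + Error variance (exact only in expectation)
print(f"true variance:  {true_var:.1f}")
print(f"error variance: {error_var:.1f}")
print(f"total variance: {total_var:.1f}  (true + error = {true_var + error_var:.1f})")

# Reliability = proportion of the total variance that is true variance
print(f"reliability: {true_var / total_var:.2f}")
```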
CONCEPT OF RELIABILITY
• Error variance represents any condition that is
irrelevant to the purpose of the test.
• It is reduced by controlling the test environment,
instructions, time limit, rapport, etc.
• A systematic error source does not change the
variability of the distribution or affect reliability.
• For example, if a weighing scale consistently underweighed everyone who stepped on it by 5 pounds, the relative standings of the people would remain unchanged.
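A minimal sketch of the weighing-scale example (the weights are invented for illustration): subtracting a constant 5 pounds from every reading shifts each score but leaves the variance, and therefore the relative standings of the people, unchanged.

```python
import statistics

# Hypothetical true weights of five people, in pounds (illustrative values only)
true_weights = [150, 120, 190, 135, 168]

# A systematic error: the scale under-reads everyone by exactly 5 pounds
biased_readings = [w - 5 for w in true_weights]

# The constant bias shifts every score but does not change the spread...
print(statistics.pvariance(true_weights))     # 603.04
print(statistics.pvariance(biased_readings))  # 603.04 as well

# ...nor the rank order (relative standing) of the people
print(sorted(range(5), key=lambda i: true_weights[i]))     # [1, 3, 0, 4, 2]
print(sorted(range(5), key=lambda i: biased_readings[i]))  # [1, 3, 0, 4, 2]
```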
SOURCES OF ERROR VARIANCE
Test Construction
• Item sampling or content sampling refers to the variation among items within a test as well as to the variation among items between tests.
• The extent to which a testtaker’s score is affected
by the content sampled on the test and by the way
the content is sampled (that is, the way in which
the item is constructed) is a source of error
variance.
SOURCES OF ERROR VARIANCE
Test Administration (Testing procedures)
• Test environment
• room temperature, level of lighting, ventilation,
changes in weather, broken pencil point, and noise
• Test-Taker variables
• emotional problems, physical discomfort, lack of
sleep, illness, fatigue, drugs or medications taken,
and worry
• Examiner-related variables
• physical appearance and demeanor, manner of speaking, emphasis on certain words (unknowingly providing clues), head nodding, eye movements, and other nonverbal gestures
SOURCES OF ERROR VARIANCE
Test Scoring and Interpretation (Scoring System)
• Hand-scoring versus machine scoring
• Objective versus subjective scoring (e.g., projective techniques)
• In some tests of personality, examinees are
asked to supply open-ended responses to
stimuli such as pictures, words, sentences, and
inkblots, and it is the examiner who must then
quantify or qualitatively evaluate responses.
TEST-RETEST RELIABILITY ESTIMATES
• One way of estimating the reliability of a measuring
instrument is by using the same instrument to
measure the same thing at two points in time.
TEST-RETEST RELIABILITY ESTIMATES
• Test-retest reliability is an estimate of reliability
obtained by correlating pairs of scores from the
same people on two different administrations of
the same test.
• The test-retest measure is appropriate when
evaluating the reliability of a test that purports to
measure something that is relatively stable over
time, such as a personality trait.
• Possible intervening factors between the two test administrations must be considered in order to draw proper conclusions about the reliability of the measuring instrument.
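As a computational sketch (the scores are invented for illustration), a test-retest reliability coefficient is simply the correlation between the two sets of scores obtained from the same people on the two administrations:

```python
import statistics

# Hypothetical scores for six testtakers on two administrations of the same test
time_1 = [12, 18, 25, 30, 34, 41]
time_2 = [14, 17, 27, 29, 36, 40]

# Test-retest reliability = Pearson r between the two administrations
# (statistics.correlation requires Python 3.10+)
r_test_retest = statistics.correlation(time_1, time_2)
print(f"test-retest reliability: {r_test_retest:.2f}")
```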
TEST-RETEST RELIABILITY ESTIMATES
• An estimate of test-retest reliability may be most
appropriate in gauging the reliability of tests that
employ outcome measures such as reaction time
or perceptual judgments (including discriminations
of brightness, loudness, or taste).
• However, factors such as experience, practice, memory, fatigue, and motivation may still intervene and confound the obtained measure of reliability.
PARALLEL-FORMS AND ALTERNATE-FORMS
RELIABILITY ESTIMATES
*Make-up exams
• Parallel forms of a test exist when, for each form of
the test, the means and the variances of observed
test scores are equal.
• More practically, scores obtained on parallel tests
correlate equally with other measures.
PARALLEL-FORMS AND ALTERNATE-FORMS
RELIABILITY ESTIMATES
• Alternate forms are simply different versions of a
test that have been constructed so as to be
parallel.
• Although they do not meet the requirements for
the legitimate designation “parallel,” alternate
forms of a test are typically designed to be
equivalent with respect to variables such as
content and level of difficulty.
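A brief sketch with invented scores: an alternate-forms reliability coefficient is the correlation between scores obtained by the same group on the two forms; for the stricter designation "parallel," the means and variances of observed scores on the two forms would also have to be equal.

```python
import statistics

# Hypothetical scores of the same six testtakers on Form A and Form B
form_a = [22, 28, 31, 35, 40, 44]
form_b = [24, 27, 33, 34, 41, 45]

# Alternate-forms reliability = correlation between the two forms
# (statistics.correlation requires Python 3.10+)
print(f"alternate-forms reliability: {statistics.correlation(form_a, form_b):.2f}")

# For the stricter "parallel forms" designation, the means and variances
# of observed scores on the two forms should also be equal
print(statistics.mean(form_a), statistics.mean(form_b))
print(statistics.pvariance(form_a), statistics.pvariance(form_b))
```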
PARALLEL-FORMS AND ALTERNATE-FORMS
RELIABILITY ESTIMATES
1. Two test administrations with the same group are
required
2. Test scores may be affected by factors such as
motivation, fatigue, or intervening events such as
practice, learning, or therapy (although not as
much as when the same test is administered
twice).
PARALLEL-FORMS AND ALTERNATE-FORMS
RELIABILITY ESTIMATES
• An additional source of error variance, item
sampling, is inherent in the computation of an
alternate- or parallel-forms reliability coefficient.
• Testtakers may do better or worse on a specific
form of the test not as a function of their true
ability but simply because of the particular items
that were selected for inclusion in the test.
• Developing alternate forms of tests can be time-
consuming and expensive.
PARALLEL-FORMS AND ALTERNATE-FORMS
RELIABILITY ESTIMATES
• An estimate of the reliability of a test can be
obtained without developing an alternate form of
the test and without having to administer the test
twice to the same people.
• Such an estimate is referred to as an internal consistency estimate of reliability or as an estimate of inter-item consistency.
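As a preview of the split-half approach named in the heading that follows, here is a minimal sketch (the item responses are invented for illustration): the test is split into two halves, the half scores are correlated, and, because each half is only half the length of the full test, the half-test correlation is typically stepped up with the Spearman-Brown formula.

```python
import statistics

# Hypothetical 0/1 item responses: 6 testtakers x 8 items (invented for illustration)
responses = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
]

# Split the test into two halves (here, odd- vs. even-numbered items)
odd_half = [sum(row[0::2]) for row in responses]
even_half = [sum(row[1::2]) for row in responses]

# Correlate the two half-test scores (statistics.correlation requires Python 3.10+)
r_half = statistics.correlation(odd_half, even_half)

# Spearman-Brown correction: estimate full-length reliability from the half-test r
r_full = (2 * r_half) / (1 + r_half)

print(f"half-test correlation: {r_half:.2f}")
print(f"split-half (Spearman-Brown) reliability: {r_full:.2f}")
```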
SPLIT-HALF RELIABILITY ESTIMATES