UNIT 05: Reliability: Module Overview


MODULE OVERVIEW

Reliability refers to the consistency of the test scores obtained by the same persons when they are reexamined with the same test on different occasions, or with different sets of equivalent items, or under other variable examining conditions (Anastasi & Urbina, 1997). This unit will explore the different kinds of reliability coefficients, including those for measuring test-retest reliability, alternate-forms reliability, split-half reliability, and inter-scorer reliability.
DR. EVA MARIE P. GACASAN
DR. GWENDELINA A. VILLARANTE
DEPARTMENT OF PSYCHOLOGY| CEBU NORMAL UNIVERSITY
LEARNING OUTCOMES OF THE MODULE
• Explain the concept of reliability.
• Identify the different reliability estimates.
• Describe the purpose of using and interpreting a coefficient of reliability.
• Discuss reliability and individual scores with respect to the types of standard errors.

LECTURE CONTENT
UNIT 5: RELIABILITY
Reliability refers to consistency in measurement; in everyday language, it is a synonym for dependability.
It is important for us, as users of tests and consumers of information about tests, to know how reliable tests and
other measurement procedures are.
A reliability coefficient is an index of reliability, a proportion that indicates the ratio between the true score
variance on a test and the total variance.
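This ratio can be made concrete with a small simulation (a sketch with invented numbers, not from the module): if each observed score is a stable true score plus random error, the reliability coefficient is the share of total variance that is true score variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 1,000 testtakers: observed score = true score + random error.
true_scores = rng.normal(loc=50, scale=10, size=1000)  # true variance ~ 100
errors = rng.normal(loc=0, scale=5, size=1000)         # error variance ~ 25
observed = true_scores + errors

# Reliability coefficient: true score variance / total observed variance.
# In expectation this is 100 / (100 + 25) = 0.80.
reliability = np.var(true_scores) / np.var(observed)
print(round(reliability, 2))
```

With less error variance the ratio approaches 1; with more, it falls toward 0.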

THE CONCEPT OF RELIABILITY


Because true differences are assumed to be stable, they are presumed to yield consistent scores on repeated
administrations of the same tests as well as on equivalent forms of tests.

SOURCES OF ERROR VARIANCE

Test construction


• Item sampling (or content sampling) refers to variation among items within a test as well as to variation among items between tests.
• Differences are sure to be found in the way the items are worded and in the exact content sampled.
• A testtaker's higher score on one form of a test, for example, could be due to the specific content sampled and the way the items were worded, rather than to any true difference in the ability being measured.

Test administration
• Examples of untoward influences during administration of a test include factors related to the test environment: the
room temperature, the level of lighting, and the amount of ventilation and noise, for instance.
• Other environment-related variables include the instrument used to enter responses and even the writing surface on which responses are entered.
• Test administration also takes into consideration testtaker variables: pressing emotional problems, physical discomfort, lack of sleep, and the effect of drugs or medication can all be sources of error variance.
• Examiner-related variables are potential sources of error variance.

TEST SCORING AND INTERPRETATION


The advent of computer scoring and a growing reliance on objective, computer-scorable items have virtually eliminated error variance caused by scorer differences in many tests.
• In some tests of personality, examinees are asked to supply open-ended responses to stimuli such as pictures,
words, sentences, and inkblots, and it is the examiner who must then quantify or qualitatively evaluate responses.
• Scorers and scoring systems are potential sources of error variance.
• Examiner/scorers occasionally still are confronted by situations where an examinee’s response is in a gray area.

RELIABILITY ESTIMATES
Test-retest Reliability estimates
• One way of estimating the reliability of a measuring instrument is by using the same instrument to measure the
same thing at two points in time.
• Test-retest reliability is an estimate of reliability obtained by correlating pairs of scores from the same people on
two different administrations of the same test. The test-retest measure is appropriate when evaluating the reliability
of a test that purports to measure something that is relatively stable over time, such as a personality trait.
• When the interval between testing is greater than six months, the estimate of test-retest reliability is often referred
to as the coefficient of stability.
• An estimate of test-retest reliability may be most appropriate in gauging the reliability of tests that employ outcome
measures such as reaction time or perceptual judgements.
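As a sketch with hypothetical scores (the numbers are invented for illustration), a test-retest estimate is simply the Pearson correlation between the two administrations:

```python
import numpy as np

# Hypothetical scores for the same five testtakers, tested twice.
time1 = np.array([12, 15, 11, 18, 14])
time2 = np.array([13, 16, 10, 18, 15])

# Test-retest reliability: correlate the pairs of scores.
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(round(r_test_retest, 2))  # 0.96
```

The closer the coefficient is to 1, the more stable the scores are over the retest interval.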

Parallel-Forms and Alternate-Forms Reliability Estimates
• The degree of the relationship between various forms of a test can be evaluated by means of an alternate-forms or parallel-forms coefficient of reliability, which is often termed the coefficient of equivalence.
• Parallel forms of a test exist when, for each form of the test, the means and the variances of observed test scores are equal.
• Alternate forms are simply different versions of a test that have been constructed so as to be parallel. Alternate forms of a test are typically designed to be equivalent with respect to variables such as content and level of difficulty.
• Developing alternate forms of a test can be time-consuming and expensive.
• On the other hand, using an alternate form minimizes the effect of memory for the content of a previously administered form of the test.
An estimate of reliability can also be obtained from a single administration by examining the relationships among parts of the test; logically enough, this is referred to as an internal consistency estimate of reliability, or an estimate of inter-item consistency.

Split-half estimate
• An estimate of split-half reliability is obtained by correlating two pairs of scores obtained from equivalent halves of a
single test administered once.
• One acceptable way to split a test is to randomly assign items to one or the other half of the test. Another acceptable
way to split a test is to assign odd-numbered items to one half of the test and even-numbered items to the other half.
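The odd-even split described above can be sketched as follows (the 0/1 item responses are invented for illustration). Note that the resulting correlation is the reliability of a half-length test:

```python
import numpy as np

# Hypothetical 0/1 item responses: 6 testtakers (rows) x 6 items (columns).
responses = np.array([
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
])

# Odd-numbered items (columns 0, 2, 4) vs. even-numbered items (1, 3, 5).
odd_half = responses[:, 0::2].sum(axis=1)
even_half = responses[:, 1::2].sum(axis=1)

# Correlation between the two half scores: the split-half estimate
# (of a half-length test, before any length correction).
r_halves = np.corrcoef(odd_half, even_half)[0, 1]
print(round(r_halves, 2))  # 0.39
```

Because halving a test shortens it, this coefficient understates the full test's reliability, which is why a length correction is then applied.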

The Spearman-Brown Formula



• The Spearman-Brown formula allows a test developer or user to estimate internal consistency reliability from a correlation
of two halves of a test.
• Usually, but not always, reliability increases as test length increases. Ideally, the additional test items are equivalent with
respect to the content and the range of difficulty of the original items.
• The Spearman-Brown formula can also be used to determine the number of items needed to attain a desired level of reliability.
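The formula itself is r' = n*r / (1 + (n - 1)*r), where r is the obtained reliability and n is the factor by which the test length changes. A minimal sketch (the function names are my own):

```python
def spearman_brown(r, n):
    """Estimated reliability when test length is multiplied by factor n."""
    return n * r / (1 + (n - 1) * r)

def length_factor_needed(r, r_desired):
    """Factor by which a test must be lengthened to reach r_desired."""
    return r_desired * (1 - r) / (r * (1 - r_desired))

# Correcting a split-half correlation of .70 up to full length (n = 2):
print(round(spearman_brown(0.70, 2), 2))           # 0.82
# A test with r = .60 must be about 2.67 times longer to reach r = .80:
print(round(length_factor_needed(0.60, 0.80), 2))  # 2.67
```

Note that lengthening a test with the same level of item quality raises reliability, but with diminishing returns as r approaches 1.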

Other Methods of Estimating Internal Consistency


• Other methods of estimating internal consistency were developed by Kuder and Richardson (1937) and by Cronbach (1951). Inter-item consistency refers to the degree of correlation among all the items on a scale. A measure of inter-item consistency is calculated from a single administration of a single form of a test.
• An index of inter-item consistency, in turn, is useful in assessing the homogeneity of the test.
• Homogeneity is the extent to which items in a scale are unifactorial; heterogeneity, by contrast, describes the degree to which a test measures different factors. A heterogeneous test is composed of items that measure more than one trait.
• Testtakers with the same score on a homogeneous test probably have similar abilities in the area tested.
The Kuder-Richardson formulas
• Developed by G. Frederic Kuder and M. W. Richardson.
• Where test items are highly homogeneous, KR-20 and split-half reliability estimates will be similar.
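KR-20 applies to dichotomously scored (right/wrong) items: KR-20 = (k/(k - 1)) * (1 - sum(p*q) / total variance), where k is the number of items, p the proportion passing each item, and q = 1 - p. A sketch of the computation using invented 0/1 data (the function name is mine):

```python
import numpy as np

def kr20(responses):
    """KR-20 for 0/1 item data; rows = testtakers, columns = items."""
    k = responses.shape[1]
    p = responses.mean(axis=0)               # proportion passing each item
    q = 1 - p
    total_var = responses.sum(axis=1).var()  # variance of total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

# Invented 0/1 responses: 6 testtakers x 6 items.
data = np.array([
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
])
print(round(kr20(data), 2))  # 0.64
```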

Coefficient alpha



Developed by Cronbach (1951).
• Coefficient alpha may be thought of as the mean of all possible split-half correlations, corrected by the Spearman-Brown formula.
• Coefficient alpha is the preferred statistic for obtaining an estimate of internal consistency reliability.
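Coefficient alpha generalizes KR-20 to items scored on any scale: alpha = (k/(k - 1)) * (1 - sum of item variances / total variance). A sketch with invented Likert-type ratings (the function name is mine):

```python
import numpy as np

def cronbach_alpha(scores):
    """Coefficient alpha; rows = testtakers, columns = items."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0)        # variance of each item
    total_var = scores.sum(axis=1).var()  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented 1-5 ratings: 5 testtakers x 4 items.
ratings = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
])
print(round(cronbach_alpha(ratings), 2))  # 0.94
```

When the items are scored 0/1, this formula reduces to KR-20.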

❖ Essentially, this formula yields an estimate of the mean of all possible split-half coefficients. Coefficient alpha is widely used as a measure of reliability, in part because it requires only one administration of the test.
Inter-scorer reliability
❖ Also referred to as scorer reliability, judge reliability, or observer reliability, inter-scorer reliability is the degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure.
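The simplest index of inter-scorer reliability is the proportion of cases two scorers score identically; in practice, a correlation coefficient or a chance-corrected statistic such as Cohen's kappa is often reported instead. A sketch with invented scoring decisions:

```python
# Hypothetical 0/1 credit decisions by two scorers on the same 8 responses.
scorer_a = [1, 0, 1, 1, 0, 1, 0, 1]
scorer_b = [1, 0, 1, 0, 0, 1, 1, 1]

# Proportion of responses on which the two scorers agree.
agreement = sum(a == b for a, b in zip(scorer_a, scorer_b)) / len(scorer_a)
print(agreement)  # 0.75
```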

Homogeneity versus heterogeneity of test items


• Recall that a test is said to be homogeneous in items if it is functionally uniform throughout.
• By contrast, if the test is heterogeneous in items, an estimate of internal consistency might be low relative to a
more appropriate estimate of test-retest reliability.

Criterion-referenced test
• A criterion-referenced test is designed to provide an indication of where a testtaker stands with respect to some variable or criterion, such as an educational or vocational objective.
• Scores on criterion-referenced tests tend to be interpreted in pass-fail terms, and any scrutiny of performance on individual items tends to be for diagnostic and remedial purposes.
REFERENCES AND MATERIALS
Main Text:
• Cohen, R. J. & Swerdlik, M. E. (2018). Psychological testing and assessment. New York, NY: McGraw-Hill.

Supplementary Text:
• Anastasi, A. & Urbina, S. (2001). Psychological testing. Singapore: Pearson
Education Asia PTE. LTD.
Other books and materials used in this course (e.g., the Psychological Assessment Report Template) can be found in our Google Classroom.
