Evaluasi Pembelajaran
data (e.g., test scores, ratings, ranked data) because stanines are computed
like percentile ranks but are expressed in standard score form. Thus, the
conversion to stanines is simple, and the standard score feature makes it
possible to add together stanine scores from various measures to obtain a
composite score.7 When stanines are assigned to test scores, it is seldom possible to obtain a
perfect match with the theoretical distributions shown in Table 14.4. Because all
pupils with the same raw score must be assigned the same stanine score, all we
can reasonably expect to do is approximate the theoretical distribution in Table
14.4 as closely as possible. The step-by-step procedure for assigning stanines to
test scores is shown in Table 14.5.
7. In some cases, it may be desirable to form composite scores from weighted raw scores and then convert
to stanines. Less error is introduced with this procedure.
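To make the theoretical distribution concrete, the Python sketch below turns the standard stanine percentages (4, 7, 12, 17, 20, 17, 12, 7, and 4 percent for stanines 1 through 9) into whole numbers of pupils for a group of a given size. The function name `theoretical_counts` and its rounding rule (largest remainder, ties given to the stanine nearest the middle) are assumptions made for illustration. This is not Durost's procedure, and for some group sizes its counts may differ from the published table.

```python
# Theoretical percentage of pupils in stanines 1 through 9 (sums to 100).
STANINE_PERCENTS = [4, 7, 12, 17, 20, 17, 12, 7, 4]


def theoretical_counts(n: int) -> list[int]:
    """Whole numbers of pupils per stanine for a group of n pupils.

    Hypothetical rounding rule: floor each exact count, then give the
    leftover pupils to the stanines with the largest remainders, breaking
    ties in favor of the stanine closest to the middle (stanine 5).
    """
    exact_times_100 = [p * n for p in STANINE_PERCENTS]  # exact count x 100, kept in integers
    counts = [x // 100 for x in exact_times_100]
    remainders = [x % 100 for x in exact_times_100]
    leftover = n - sum(counts)
    # Index 4 is stanine 5, so abs(i - 4) measures distance from the middle.
    order = sorted(range(9), key=lambda i: (-remainders[i], abs(i - 4)))
    for i in order[:leftover]:
        counts[i] += 1
    return counts


print(theoretical_counts(50))   # [2, 3, 6, 9, 10, 9, 6, 3, 2]
print(theoretical_counts(100))  # [4, 7, 12, 17, 20, 17, 12, 7, 4]
```

Integer arithmetic avoids floating-point surprises in the remainders, and the counts always add up to the group size.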
2. A test score should be interpreted in light of all of the pupil's characteristics. Test
performance is influenced by the pupil's aptitudes, educational experiences,
cultural background, emotional adjustment, health, and the like. Consequently,
when a pupil performs poorly on a test, first consider the possibility of a cultural
disadvantage, a language handicap, improper motivation, or similar factors that
might have interfered with the response to the test.
3. A test score should be interpreted according to the type of decision to be made. The
meaningfulness of a test score is determined to a considerable extent by the use
to be made of it. For example, an IQ score of 100 would have different meanings
if we were selecting pupils for a mentally retarded class, attempting to
predict achievement in high school, or trying to decide whether a pupil should
be encouraged to go to college. We will find test scores much more useful when we stop
considering them as high or low in general and begin evaluating their
significance in relation to the decision to be made.
[Table: Stanine Table Showing the Number of Pupils to Be Assigned Each of the Stanine Scores (columns: number of pupils in the group; stanine scores 1 through 9)]
Adapted from W. N. Durost, The Characteristics, Use, and Computation … Jovanovich, Inc. Used by permission.
Directions (with Illustration)
1. Arrange test papers or answer sheets in rank order from high to low. On a
separate piece of paper, list every score in a column from the highest obtained
score to the lowest (column A). Opposite each score, write the number of
individuals who obtained that score. This may be done by counting the papers
or answer sheets having the same score, or it may be done by tallying the scores in
the manner shown in column …
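The directions begin by listing every obtained score with its frequency. The Python sketch below approximates the rest of the assignment with a shortcut that is an assumption for illustration, not the step-by-step method of the table. It places each distinct score at its midpoint cumulative percentage (pupils below the score plus half of those tied at it) and then applies the cumulative stanine boundaries of 4, 11, 23, 40, 60, 77, 89, and 96 percent. Because it works score by score, pupils with the same raw score always receive the same stanine. The name `assign_stanines` is hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Cumulative percentage at the upper edge of stanines 1-8
# (running totals of the 4-7-12-17-20-17-12-7-4 distribution).
UPPER_EDGES = [4, 11, 23, 40, 60, 77, 89, 96]


def assign_stanines(scores: list[int]) -> dict[int, int]:
    """Map each distinct raw score to a stanine from 1 to 9."""
    n = len(scores)
    freq = Counter(scores)        # each obtained score and how many pupils got it
    stanine_for = {}
    below = 0                     # pupils scoring lower than the current score
    for score in sorted(freq):    # walk from the lowest score to the highest
        midpoint_pct = 100 * (below + freq[score] / 2) / n
        stanine_for[score] = bisect_right(UPPER_EDGES, midpoint_pct) + 1
        below += freq[score]
    return stanine_for


print(assign_stanines([10, 10, 10, 20]))  # {10: 4, 20: 7}
```

With 100 distinct scores this shortcut reproduces the 4-7-12-17-20-17-12-7-4 split exactly; with many ties it can only approximate it, which is the limitation the text describes.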
5. A test score should be verified by supplementary evidence. When interpreting test scores, it is
impossible to determine fully the extent to which the basic assumptions of testing have
been met (e.g., maximum motivation, equal educational opportunity, and so on)
or to which the conditions of testing have been
precisely controlled (e.g., administration, scoring, and so on). Consequently, in addition to the
predictable error of measurement, which can be taken into account with standard error bands, a test score
may contain an indeterminate amount of error caused by unmet assumptions or uncontrolled conditions.
Our only protection against such errors is not to rely completely on a single test score. As Cronbach8
pointed out:
The most helpful single principle in all testing is that test scores are data on which to base further study. They
must be coordinated with background facts, and they must be verified by constant comparison with other available
data.
The misinterpretation and misuse of test scores would be substantially reduced if this simple
principle were more widely recognized. But this caution should not be restricted to test scores; it is
merely a specific application of the more general rule that no important educational decision should
ever be based on one limited sample of performance.
Summary
Test interpretation is complicated because the raw scores obtained for a test
lack a true zero point (point where there is no achievement at all) and equal units
(such as feet, pounds, and minutes). In an attempt to compensate for these missing
properties and to make test scores more readily interpretable, various methods of
expressing test scores have been devised. In general, we can give meaning to a raw
score either by converting it into a description of the specific tasks that the pupil
can perform (criterion-referenced interpretation) or by converting it into some type
of derived score that indicates the pupil's relative position in a clearly defined
reference group (norm-referenced interpretation). In some cases both types of
interpretation can be made.
Criterion-referenced test interpretation permits us to describe an individual's
test performance without referring to the performance of others. This is typically
done in terms of some universally understood measure of proficiency (e.g.,
speed, precision) or the percentage of items correct in some clearly defined
domain of learning tasks. The percentage-correct score is widely used in
criterion-referenced test interpretation, but it is primarily useful in mastery
testing, where a clearly defined and delimited domain of learning tasks can be
most readily obtained.
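As a small illustration of a percentage-correct score, the sketch below scores one pupil on a hypothetical ten-item domain. The item responses and the 80 percent mastery cutoff are invented for the example. The text does not specify a cutoff, and in practice it would be set for the particular domain.

```python
def percentage_correct(responses: list[bool]) -> float:
    """Percentage of items answered correctly in one clearly defined domain."""
    if not responses:
        raise ValueError("the domain has no items")
    return 100 * sum(responses) / len(responses)  # True counts as 1, False as 0


# Hypothetical 10-item domain (for example, two-digit addition); True = correct.
pupil = [True, True, False, True, True, True, False, True, True, True]
MASTERY_CUTOFF = 80  # assumed cutoff for illustration only

score = percentage_correct(pupil)  # 80.0
print(f"{score:.0f}% correct; mastered: {score >= MASTERY_CUTOFF}")
```

The score describes what the pupil can do in the domain without reference to how other pupils performed.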
Although criterion-referenced interpretation is frequently possible with standardized
tests, such interpretations must be made with caution because these tests were
typically designed to discriminate among individuals rather than to describe the
specific tasks they can perform. Test publishers are now attempting to produce
tests that are more amenable to criterion-referenced interpretation.
8. L. J. Cronbach, Essentials of Psychological Testing, 3rd ed. (New York: Harper & Row, 1970), p. 381.
… interpretation, which involves converting the raw scores to derived scores by means of …
Test norms merely represent the typical performance of pupils in the reference
groups on which the test was standardized and consequently should not be
viewed as desired goals or standards. The most common types of norms are
grade norms, age norms, percentile norms, and standard score norms. Each type
has its own characteristics, advantages, and limitations, which must be taken
into account during test interpretation.
Grade norms and age norms describe test performance in terms of the
particular grade or age group in which a pupil's raw score is just average. These
norms are widely used at the elementary school level, largely because of the
apparent ease with which they can be interpreted. Depicting test performance in
terms of grade and age equivalents can often lead to unsound decisions,
however, because of the inequality of the units and the invalid assumptions on which
they are based.
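To see how a grade equivalent is obtained and why its units can be unequal, the sketch below interpolates linearly in a hypothetical norm table of median raw scores at successive grade placements. The table values, the name `grade_equivalent`, and the use of linear interpolation are assumptions for illustration, and publishers' actual methods vary. In this invented table, a gain of 6 raw-score points (27 to 33) and a gain of only 4 points (33 to 37) each count as one full grade.

```python
# Hypothetical norm table (invented for illustration): grade placement
# paired with the median raw score of the norm group at that placement.
NORMS = [(3.0, 18), (4.0, 27), (5.0, 33), (6.0, 37)]


def grade_equivalent(raw: float) -> float:
    """Grade placement at which `raw` is just average, by linear interpolation."""
    if not NORMS[0][1] <= raw <= NORMS[-1][1]:
        raise ValueError("raw score outside the normed range")
    for (g0, m0), (g1, m1) in zip(NORMS, NORMS[1:]):
        if m0 <= raw <= m1:
            return round(g0 + (g1 - g0) * (raw - m0) / (m1 - m0), 1)
    raise AssertionError("unreachable if the medians increase with grade")


# Equal-looking steps of one grade hide unequal raw-score steps:
print(grade_equivalent(27), grade_equivalent(33), grade_equivalent(37))  # 4.0 5.0 6.0
print(grade_equivalent(30))  # 4.5
```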
Percentile norms and standard score norms describe test performance in terms of
the pupil's relative standing in some meaningful group (e.g., own grade or age group).
A percentile rank indicates the percentage of pupils falling below a particular raw
score. Percentile units are unequal, but the scores are readily understood by persons
without special training. A standard score indicates the number of standard deviation
units a raw score falls above or below the group mean. It has the advantage of
providing equal units that can be treated arithmetically, but persons untrained in
statistics find it difficult to interpret such scores. Some of the more common types of
standard scores are z-scores, T-scores, deviation IQs, and stanines.
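The definitions above can be written out directly. In the sketch below, `percentile_rank` follows the text's definition (percentage of pupils below a raw score), with an optional, commonly used variant that also counts half of the pupils tied at that score. `standard_scores` converts a raw score to z, T (mean 50, SD 10), a deviation IQ, and a stanine (mean 5, SD 2), given the norm group's mean and standard deviation. Both function names are hypothetical, and the deviation-IQ standard deviation of 15 is an assumption, since some tests have used 16.

```python
import math


def percentile_rank(scores: list[float], raw: float, half_ties: bool = False) -> float:
    """Percentage of pupils scoring below `raw`.

    With half_ties=True, half of the pupils tied at `raw` are also counted,
    a common textbook convention (an assumption, not the text's definition).
    """
    below = sum(1 for s in scores if s < raw)
    tied = sum(1 for s in scores if s == raw)
    count = below + (tied / 2 if half_ties else 0)
    return 100 * count / len(scores)


def standard_scores(raw: float, mean: float, sd: float) -> dict[str, float]:
    """Convert a raw score using the norm group's mean and standard deviation."""
    z = (raw - mean) / sd  # SD units above (+) or below (-) the mean
    return {
        "z": z,
        "T": 50 + 10 * z,              # mean 50, SD 10
        "deviation_IQ": 100 + 15 * z,  # assumes SD 15 (some tests use 16)
        # Stanines are half-SD bands centered on the mean (stanine 5 spans
        # z = -0.25 to +0.25), clipped to the range 1-9.
        "stanine": min(9, max(1, math.floor(2 * z + 5.5))),
    }


scores = [10, 20, 20, 30, 40]
print(percentile_rank(scores, 20))                  # 20.0
print(percentile_rank(scores, 20, half_ties=True))  # 40.0
print(standard_scores(65, mean=50, sd=10))
```

For a raw score of 65 in a norm group with mean 50 and SD 10, this gives z = 1.5, T = 65, deviation IQ = 122.5, and stanine 8.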
With a normal distribution of scores, we can readily convert back and forth
between standard scores and percentiles, making it possible to utilize the special
advantages of each. Standard scores can be used to draw on the benefits of equal
units, and we can convert to percentile equivalents when interpreting test
performance to pupils, parents, and those who lack statistical training.
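Under the normal-distribution assumption, the back-and-forth conversion can be done with the standard normal curve. The sketch below uses Python's `statistics.NormalDist` (available from Python 3.8). The helper names are hypothetical, and the results are only as accurate as the assumption that the scores are close to normally distributed.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_to_percentile(z: float) -> float:
    """Percentage of a normal distribution falling below z."""
    return 100 * STANDARD_NORMAL.cdf(z)


def percentile_to_z(percentile: float) -> float:
    """z-score below which the given percentage of a normal distribution falls."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must be strictly between 0 and 100")
    return STANDARD_NORMAL.inv_cdf(percentile / 100)


def t_score_to_percentile(t: float) -> float:
    """T-scores have mean 50 and SD 10, so z = (T - 50) / 10."""
    return z_to_percentile((t - 50) / 10)


print(round(z_to_percentile(1.0), 1))       # 84.1
print(round(t_score_to_percentile(40), 1))  # 15.9
print(round(percentile_to_z(50), 2))        # 0.0
```

A teacher could work with T-scores for arithmetic and report "about the 84th percentile" to parents for a pupil one standard deviation above the mean.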
A pupil's performance on several tests that have comparable norms may be
presented in the form of a profile, making it possible to identify readily areas of
strength and weakness. Profile interpretation is more apt to be accurate when
standard error bands are plotted on the profile. Some test profiles also include
narrative reports and/or detailed analysis of the results by content or skill clusters.
This criterion-referenced analysis is especially useful for the instructional use of
test results.
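Standard error bands rest on the standard error of measurement, SEM = SD * sqrt(1 - r), where r is the test's reliability coefficient. The sketch below builds a band of one SEM on each side of a pupil's score. It treats two scores on a profile as differing only when their bands do not overlap, which is a common rule of thumb for reading profiles. The class name `ProfileScore` and the test names, scores, standard deviations, and reliability coefficients are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ProfileScore:
    """One test on a pupil's profile, reported on a common standard-score scale."""
    test: str
    score: float        # e.g. a T-score
    sd: float           # standard deviation of the score scale
    reliability: float  # reliability coefficient of the test

    def sem(self) -> float:
        """Standard error of measurement: SD * sqrt(1 - reliability)."""
        return self.sd * math.sqrt(1 - self.reliability)

    def band(self) -> tuple[float, float]:
        """Score plus or minus one SEM (roughly a 68 percent band)."""
        e = self.sem()
        return (self.score - e, self.score + e)


def bands_overlap(a: ProfileScore, b: ProfileScore) -> bool:
    """True if the two error bands overlap, so the difference may be due to error."""
    a_low, a_high = a.band()
    b_low, b_high = b.band()
    return a_low <= b_high and b_low <= a_high


# Hypothetical profile on a T-score scale (SD 10).
reading = ProfileScore("Reading", 60, 10, 0.91)       # SEM about 3
arithmetic = ProfileScore("Arithmetic", 50, 10, 0.91)
language = ProfileScore("Language", 55, 10, 0.84)     # SEM about 4

for s in (reading, arithmetic, language):
    low, high = s.band()
    print(f"{s.test:<11} {s.score:>3}  band {low:.1f} to {high:.1f}")

print("Reading vs Arithmetic overlap:", bands_overlap(reading, arithmetic))  # False
print("Reading vs Language overlap:", bands_overlap(reading, language))      # True
```

Here the reading and arithmetic bands are separate, suggesting a real difference, while reading and language overlap and should not be interpreted as differing.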
The adequacy of published norms can be judged by determining the extent to which
they are (1) relevant, (2) representative, (3) up to date, (4) comparable, and (5)
adequately described. In some instances, it is more appropriate to use local norms
than published norms. When local norms are desired, percentile and …
In addition to a knowledge of derived scores and norms, the proper interpretation of test scores
requires an awareness of (1) what the test measures, (2) the pupil's characteristics and background, (3) the
type of decision to be made, (4) the amount of error in the score, and (5) the extent to which the score
agrees with other available data. No important educational decision should ever be based on test scores
alone.
Learning Exercises