Evaluasi Pembelajaran

Thus, a pupil with a stanine of 7 in arithmetic and 5 in spelling is
probably demonstrating superior performance in arithmetic.

3. The stanine system makes it possible to readily combine diverse types of
data (e.g., test scores, ratings, ranked data) because stanines are computed
like percentile ranks but are expressed in standard score form. Thus, the
conversion to stanines is simple, and the standard score feature makes it
possible to add together stanine scores from various measures to obtain a
composite score. A simple summing of stanines will give equal weight to
each measure in the composite.7
4. Because the stanine system uses a single-digit score, it is easily recorded and takes
up less space than other scores. (It was originally developed to fit into a single
column on an IBM card.)
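The point in item 3 about composite scores can be illustrated with a short sketch. The pupil names, measure names, and stanine values below are invented for illustration only; the one idea taken from the text is that stanines from different measures can be summed, which weights each measure equally.

```python
# Hypothetical stanines for three pupils on three measures (illustrative data only).
stanines = {
    "pupil_a": {"test_score": 7, "teacher_rating": 5, "class_rank": 6},
    "pupil_b": {"test_score": 4, "teacher_rating": 6, "class_rank": 5},
    "pupil_c": {"test_score": 9, "teacher_rating": 8, "class_rank": 7},
}


def composite(pupil_stanines):
    """Sum a pupil's stanines; each measure contributes with equal weight."""
    return sum(pupil_stanines.values())


composites = {name: composite(values) for name, values in stanines.items()}
for name, total in composites.items():
    print(name, total)
```

Because every measure is already on the same nine-point scale, no rescaling is needed before adding.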
The main limitation of stanine scores is that growth cannot be shown from
one year to the next. If a pupil's progress matches that of the norm group, the same
position in the group will be retained and thus the same stanine assigned. This
shortcoming is, of course, also characteristic of percentile ranks and of other standard
scores used to indicate relative position in a particular group. To determine growth, we
need to examine the increase in the raw scores.
Stanines are sometimes criticized on the grounds that they are rather crude
units, because they divide a distribution of scores into only nine parts. On the plus side,
however, these crude units prevent the overinterpretation of test scores. Although greater
refinement might be desirable for some special purposes (e.g., identifying gifted pupils),
stanines provide satisfactory discriminations in test performance for most educational
uses of test results. With a more refined scoring system, there is always the danger that
minor chance differences in scores will be interpreted as significant differences.
Assigning Stanines to Raw Scores. Transforming raw scores into stanines is
relatively simple if the scores are ranked from high to low and there are no ties in rank.
The top 4 percent of the raw scores are assigned a stanine score of 9; the next 7 percent
of the raw scores are assigned a stanine score of 8; the next 12 percent, a stanine score
of 7; and so on. The percentage of cases falling at each stanine level and the number of
pupils to be assigned each stanine score for any size group from 20 to 100 are shown in
Table 14.4. These figures, showing the number of pupils who should receive each
stanine score, are determined by multiplying the number of cases in the group by the
percentage of cases at each stanine level and rounding off the results.
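The multiply-and-round computation described above can be sketched in a few lines. The function name is hypothetical, and the rounding rule (round half up) is an assumption; for some group sizes plain rounding gives counts that do not add up to the group size, so a published table may contain adjusted values that differ from this output.

```python
# Percentage of cases at stanines 1 through 9, as given in the text.
STANINE_PERCENTAGES = (4, 7, 12, 17, 20, 17, 12, 7, 4)


def theoretical_counts(n_cases):
    """Multiply the group size by each percentage and round half up.

    Integer arithmetic avoids floating-point surprises at the .5 boundary.
    """
    return [(n_cases * pct + 50) // 100 for pct in STANINE_PERCENTAGES]


counts_90 = theoretical_counts(90)
print(counts_90)       # [4, 6, 11, 15, 18, 15, 11, 6, 4]
print(sum(counts_90))  # 90
```

For a group of 90 the middle value is 18, the number of cases that should receive a stanine of 5.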
Distributions of test scores usually contain a number of pupils with the same
raw score. Consequently, we have ties in rank that prevent us from obtaining a
perfect match with the theoretical distributions shown in Table 14.4. Because all
pupils with the same raw score must be assigned the same stanine score, all we
can reasonably expect to do is approximate the theoretical distribution in Table
14.4 as closely as possible. The step-by-step procedure for assigning stanines to
test scores is shown in Table 14.5.
7. In some cases, it may be desirable to form composite scores from weighted raw scores and then convert
to stanines. Less error is introduced with this procedure.
Cautions in Interpreting Test Scores

Interpreting test scores with the use of derived scores requires a willingness
to study the characteristics of the norm groups. In addition, however, keep in
mind the following general cautions that apply to the interpretation of any test
score.
1. A test score should be interpreted in terms of the specific test from which it was
derived. No two scholastic aptitude tests or achievement tests measure exactly the same
things. Achievement tests are especially prone to wide variation, and the
differences are seldom reflected in the test title. For example, one arithmetic test
may be limited to simple computational skills, whereas another may contain a
number of reasoning problems. Similarly, one science test may be confined largely
to items measuring knowledge of terminology, whereas another with the same title
stresses the application of scientific principles. With such variation it is
misleading to interpret a test score as representing achievement in any particular
area. We need to look beyond test titles and to evaluate the pupil's performance in
terms of what the test actually does measure.
2. A test score should be interpreted in light of all of the pupil's characteristics. Test
performance is affected by the pupil's aptitudes, educational experiences, cultural
background, emotional adjustment, health, and the like. Consequently, when a pupil
performs poorly on a test, first consider the possibility of a cultural disadvantage,
a language handicap, improper motivation, or similar factors that might have
interfered with the response to the test.
3. A test score should be interpreted according to the type of decision to be made. The
meaningfulness of a test score is determined to a considerable extent by the use
to be made of it. For example, an IQ score of 100 would have different meanings
if we were selecting pupils for a mentally retarded class, attempting to
predict achievement in high school, or trying to decide whether a pupil should be
encouraged to go to college. We will find test scores much more useful when we stop
considering them as high or low in general and begin evaluating their
significance in relation to the decision to be made.
4. A test score should be interpreted as a band of scores rather than as a
specific value. Every test score is subject to error, which must be allowed for
during test interpretation. One of the best means of doing this is to consider a pupil's test
performance as a band of scores one standard error of measurement above and below the
obtained score. For example, if a pupil earns a score of 56 and the standard error is 3,
the test performance should be interpreted as a band ranging from score 53 to score 59.
Such bands were illustrated in the profiles presented earlier. Even when they are not
plotted, however, we should make allowances for these error bands surrounding each
score. This will prevent us from making interpretations that are more precise than the
test results warrant.
TABLE 14.4
Stanine Table Showing the Number of Pupils to Be Assigned Each Stanine Score
for Groups Containing 20 to 100 Cases*

Percentage
of Cases     4   7  12  17  20  17  12   7   4

No. of              Stanine Scores
Cases        1   2   3   4   5   6   7   8   9
   20        ?   ?   ?   ?   ?   ?   ?   ?   ?
   21        ?   ?   ?   ?   ?   ?   ?   ?   ?
   22        ?   ?   ?   ?   ?   ?   ?   ?   ?
   23        ?   ?   ?   ?   ?   ?   ?   ?   ?
   24        ?   ?   ?   ?   ?   ?   ?   ?   ?
   25        ?   ?   ?   ?   ?   ?   ?   ?   ?
   26        ?   ?   ?   ?   ?   ?   ?   ?   ?
   27        ?   ?   ?   ?   ?   ?   ?   ?   ?
   28        ?   2   3   5   6   5   3   2   ?
   29        1   2   4   5   5   5   4   2   ?
   30        ?   2   4   5   6   5   4   2   ?
   31        ?   2   4   5   7   5   4   2   ?
   32        1   2   4   6   6   6   4   2   1
   33        1   2   4   6   7   6   4   2   1
   34        ?   3   4   6   6   6   4   3   ?
   35        ?   3   4   6   7   6   4   3   ?
   36        ?   ?   ?   ?   ?   ?   ?   ?   ?
   37        2   3   4   6   7   6   4   3   2
   38        1   3   5   6   8   6   5   3   ?
   39        ?   3   5   7   7   7   5   3   ?
   40        ?   ?   ?   ?   ?   ?   ?   ?   ?
   41        ?   ?   ?   ?   ?   ?   ?   ?   ?
   42        2   3   5   7   8   7   5   3   2
   43        2   3   ?   7   9   7   5   3   2
   44        2   3   5   8   8   8   5   3   2
   45        2   3   5   8   9   8   5   3   2
   46        2   3   5   8  10   8   5   3   2
   47        2   3   6   8   9   8   6   3   2
   48        2   3   6   8  10   8   6   3   2
   49        ?   4   6   8   9   8   6   4   2
   50        2   3   6   9  10   9   6   3   2
   51        2   3   6   9   ?   9   6   3   2
   52        2   4   6   9  10   9   6   4   2
   53        2   4   6   9   ?   9   6   4   2
   54        2   4   7   9  10   9   7   4   2
   55        2   4   7   9   ?   9   7   4   2
   56        2   4   7   9  12   9   7   4   2
   57        2   4   7  10   ?   ?   7   4   2
   58        2   4   7  10  12   ?   7   4   2
   59        3   4   7  10  11   ?   7   4   3
   60        3   4   7  10  12   ?   7   4   3
   61        ?   ?   ?   ?   ?   ?   ?   ?   ?
   62        ?   ?   ?   ?   ?   ?   ?   ?   ?
   63        ?   ?   ?   ?   ?   ?   ?   ?   ?
   64        ?   ?   ?   ?   ?   ?   ?   ?   ?
   65        ?   ?   ?   ?   ?   ?   ?   ?   ?
   66        ?   ?   ?   ?   ?   ?   ?   ?   ?
   67        ?   ?   ?   ?   ?   ?   ?   ?   ?
   68        ?   ?   ?   ?   ?   ?   ?   ?   ?
   69        3   5   ?   ?   ?   ?   ?   ?   ?
   70        3   5   8   ?  14  12   8   5   ?
   71        ?   ?   ?   ?   ?   ?   ?   ?   ?
   72        3   5   9  12  14  12   9   5   ?
   73        3   5   9  12  15  12   9   5   ?
   74        ?   ?   ?   ?   ?   ?   ?   ?   ?
   75        ?   ?   ?   ?   ?   ?   ?   ?   ?
   76        3   5   9  13  16  13   9   5   ?
   77        3   6   9  13  15  13   9   6   ?
   78        3   6   9  13  16  13   9   6   ?
   79        3   6  10  13  15  13   ?   6   ?
   80        3   6   9  14  16  14   9   6   ?
   81        3   6   9  14  17  14   9   6   ?
   82        3   6  10  14  16  14  10   6   ?
   83        3   6   ?  14  17  14  10   6   ?
   84        4   6  10  14   ?  14  10   6   ?
   85        3   6  10   ?  17   ?  10   6   ?
   86        3   6  10  15  18  15  10   6   ?
   87        4   6  10  15  17  15   ?   6   ?
   88        3   6  11  15  18  15  11   6   ?
   89        4   6   ?  15  17  15  11   6   ?
   90        4   6  11  15  18  15  11   6   ?
   91        4   6  11  15  19  15  11   6   ?
   92        4   6  11  16  18  16  11   6   ?
   93        4   6   ?  16  19  16  11   6   ?
   94        4   7  11  16  18   ?   ?   7   ?
   95        ?   7  11  16  19  16   ?   ?   ?
   96        4   7   ?  16  20  16   ?   7   ?
   97        4   7  12  16  19  16   ?   7   ?
   98        4   7  12  16  20  16   ?   7   ?
   99        4   7  12  17  19  17   ?   7   ?
  100        4   7  12  17  20  17   ?   7   ?

* Adapted from W.N. Durost, The Characteristics, Use, and Computation of Stanines. Copyright
1961 by Harcourt Brace Jovanovich, Inc. Used by permission.
Illustration 1. Tally sheet for distribution of scores.
Adapted from W.N. Durost, The Characteristics, Use, and Computation of Stanines. Copyright
1961 by Harcourt Brace Jovanovich, Inc. Used by permission.
Directions for Illustration 1
1. Arrange test papers or answer sheets in rank order from high to low. On a
separate piece of paper list every score in a column from the highest obtained
score to the lowest, column (A). Opposite each score write the number of
individuals who obtained that score. This may be done by counting the papers
or answer sheets having the same score, or it may be done by tallying the scores
in the manner shown on the tally sheet.
2. Add the frequencies and write the total at the bottom of the column (D).
This is shown to be 90.
3. Beginning at the bottom, count up (cumulate) to one-half the total number
of scores, in this case 45 (one-half of 90). This falls opposite the score of 34
(E), which is the median to the nearest whole number.
4. In the column at the extreme left of the Stanine Table (Table 14.4), look up
the total number of cases (90). In this row are the theoretical frequencies of
cases at each stanine level for 90 cases. In the middle of this row you will find
the number of cases (18) to which a stanine of 5 should be assigned. Starting
with the median (in Illustration 1), lay off as nearly this number (18) of scores
as you can. Here it is 20.
5. Working upward and downward from scores falling in stanine 5, assign
scores to stanine levels so as to give the closest approximation possible to the
theoretical values. It is helpful to bracket these scores in the manner shown in
column (A).
6. After having made a tentative assignment, make any adjustments
necessary to bring the actual frequencies at each level into the closest possible
agreement with the theoretical values. Remember, however, that all equal
scores must be assigned the same stanines.
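The constraint in these directions (equal scores must share a stanine, while the actual frequencies stay as close as possible to the theoretical ones) can be approximated in code. The sketch below is not the hand procedure itself, which works outward from the median. It substitutes a simpler cut-point rule: for each of the eight boundaries between stanines, it picks the break between distinct scores whose cumulative count is nearest the theoretical cumulative count. The function name and the tie-breaking behavior are assumptions, so results may differ slightly from a hand assignment.

```python
from collections import Counter

# Cumulative percentage of cases at or below stanines 1 through 8
# (running totals of 4, 7, 12, 17, 20, 17, 12, 7).
CUMULATIVE_PERCENT = (4, 11, 23, 40, 60, 77, 89, 96)


def assign_stanines(scores):
    """Map each distinct raw score to a stanine; tied scores share a stanine."""
    freq = Counter(scores)
    distinct = sorted(freq)  # lowest score first
    n = len(scores)

    # cums[j] = number of pupils whose score is among the j lowest distinct scores
    cums = [0]
    for score in distinct:
        cums.append(cums[-1] + freq[score])

    # For each stanine boundary, choose the break nearest the theoretical count.
    cuts = []
    for pct in CUMULATIVE_PERCENT:
        target = n * pct / 100
        cuts.append(min(range(len(cums)), key=lambda j: abs(cums[j] - target)))

    # A score's stanine is 1 plus the number of boundaries lying at or below it.
    return {
        score: 1 + sum(1 for cut in cuts if cut <= i)
        for i, score in enumerate(distinct)
    }


# Usage with 100 untied scores: the counts reproduce the theoretical percentages.
mapping = assign_stanines(list(range(1, 101)))
print(sorted(Counter(mapping.values()).items()))
```

With ties, some stanine levels can end up empty or overfull, which matches the text's point that a tied distribution can only approximate the theoretical one.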
Treating small chance differences between test scores as though they were significant
can only lead to erroneous decisions.
5. A test score should be verified by supplementary evidence. When interpreting test scores, it is
impossible to determine fully the extent to which the basic assumptions of testing have
been met (i.e., maximum motivation, equal educational opportunity, and so on),
or to which the conditions of testing have been
precisely controlled (i.e., administration, scoring, and so on). Consequently, in addition to the
predictable error of measurement, which can be taken into account with standard error bands, a test score
may contain an indeterminate amount of error caused by unmet assumptions or uncontrolled conditions.
Our only protection against such errors is not to rely completely on a single test score. As Cronbach8
pointed out:
The most helpful single principle in all testing is that test scores are data on which to base further study. They
must be coordinated with background facts, and they must be verified by constant comparison with other available
data.
The misinterpretation and misuse of test scores would be substantially reduced if this simple
principle were more widely recognized. But this caution should not be restricted to test scores; it is
merely a specific application of the more general rule that no important educational decision should
ever be based on one limited sample of performance.
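The error band described in caution 4 is a one-line computation. The sketch below reproduces the example from the text (a score of 56 with a standard error of 3). The function names are hypothetical, and the overlap check is an added convenience for comparing two scores, not a rule stated in the text.

```python
def score_band(obtained_score, standard_error):
    """Return the band one standard error below and above the obtained score."""
    return obtained_score - standard_error, obtained_score + standard_error


def bands_overlap(band_a, band_b):
    """True when two (low, high) bands share at least one value."""
    return band_a[0] <= band_b[1] and band_b[0] <= band_a[1]


low, high = score_band(56, 3)
print(low, high)  # 53 59

# Two scores a few points apart may not represent a dependable difference.
print(bands_overlap(score_band(56, 3), score_band(61, 3)))  # True
```

When two bands overlap, the difference between the scores may reflect chance alone, which is the situation the caution warns against overinterpreting.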
Summary
Test interpretation is complicated because the raw scores obtained for a test
lack a true zero point (point where there is no achievement at all) and equal units
(such as feet, pounds, and minutes). In an attempt to compensate for these missing
properties and to make test scores more readily interpretable, various methods of
expressing test scores have been devised. In general, we can give meaning to a raw
score either by converting it into a description of the specific tasks that the pupil
can perform (criterion-referenced interpretation) or by converting it into some type
of derived score that indicates the pupil's relative position in a clearly defined
reference group (norm-referenced interpretation). In some cases both types of
interpretation can be made.
Criterion-referenced test interpretation permits us to describe an individual's
test performance without referring to the performance of others. This is typically
done in terms of some universally understood measure of proficiency (e.g.,
speed, precision) or the percentage of items correct in some clearly defined
domain of learning tasks. The percentage-correct score is widely used in
criterion-referenced test interpretation, but it is primarily useful in mastery
testing, where a clearly defined and delimited domain of learning tasks can be
most readily obtained.
Although criterion-referenced interpretation is frequently possible with
standardized tests, such interpretations must be made with caution because these
tests were typically designed to discriminate among individuals rather than to
describe the specific tasks they can perform. Test publishers are now attempting
to produce tests that are more amenable to criterion-referenced interpretation.
8. L. J. Cronbach, Essentials of Psychological Testing, 3rd ed. (New York: Harper & Row, 1970),
p. 381.
Expectancy tables also provide a type of criterion-referenced interpretation.
Instead of describing an individual's performance on the test tasks, it indicates
expected performance in some situation beyond the test (e.g., success in college).
Expectancy tables provide a simple and direct means of interpreting test
results without the aid of test norms.
Standardized tests have typically been designed for norm-referenced
interpretation, which involves converting the raw scores to derived scores by means of
tables of norms. These derived scores indicate a pupil's relative position in a
particular reference group. They have the advantage over raw scores of
providing more uniform meaning from one test to another and from one situation to
another.
Test norms merely represent the typical performance of pupils in the reference
groups on which the test was standardized and consequently should not be
viewed as desired goals or standards. The most common types of norms are
grade norms, age norms, percentile norms, and standard score norms. Each type
has its own characteristics, advantages, and limitations, which must be taken
into account during test interpretation.
Grade norms and age norms describe test performance in terms of the
particular grade or age group in which a pupil's raw score is just average. These
norms are widely used at the elementary school level, largely because of the
apparent ease with which they can be interpreted. Depicting test performance in
terms of grade and age equivalents can often lead to unsound decisions, however,
because of the inequality of the units and the invalid assumptions on which
they are based.
Percentile norms and standard score norms describe test performance in terms of
the pupil's relative standing in some meaningful group (e.g., own grade or age group).
A percentile rank indicates the percentage of pupils falling below a particular raw
score. Percentile units are unequal, but the scores are readily understood by persons
without special training. A standard score indicates the number of standard deviation
units a raw score falls above or below the group mean. It has the advantage of
providing equal units that can be treated arithmetically, but persons untrained in
statistics find it difficult to interpret such scores. Some of the more common types of
standard scores are z-scores, T-scores, deviation IQs, and stanines.
With a normal distribution of scores, we can readily convert back and forth
between standard scores and percentiles, making it possible to utilize the special
advantages of each. Standard scores can be used to draw on the benefits of equal
units, and we can convert to percentile equivalents when interpreting test
performance to pupils, parents, and those who lack statistical training.
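The conversion between standard scores and percentiles can be sketched with Python's standard library, assuming the scores are normally distributed as the paragraph above requires. The function names are hypothetical, and the T-score scale used here (mean 50, standard deviation 10) is the conventional definition, which is assumed rather than taken from this section.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_to_t(z):
    """Convert a z-score to a T-score (assumed scale: mean 50, SD 10)."""
    return 50 + 10 * z


def t_to_z(t):
    """Convert a T-score back to a z-score."""
    return (t - 50) / 10


def z_to_percentile(z):
    """Percentage of a normal group falling below the given z-score."""
    return 100 * STANDARD_NORMAL.cdf(z)


def percentile_to_z(percentile_rank):
    """z-score below which the given percentage of a normal group falls."""
    return STANDARD_NORMAL.inv_cdf(percentile_rank / 100)


print(z_to_t(1.0))                      # 60.0
print(round(z_to_percentile(1.0), 1))   # 84.1
print(round(percentile_to_z(84.1), 2))  # 1.0
```

The same functions show why percentile units are unequal: one standard deviation covers about 34 percentile points next to the mean but far fewer in the tails.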
A pupil's performance on several tests that have comparable norms may be
presented in the form of a profile, making it possible to identify readily areas of
strength and weakness. Profile interpretation is more apt to be accurate when
standard error bands are plotted on the profile. Some test profiles also include
narrative reports and/or detailed analysis of the results by content or skill clusters.
This criterion-referenced analysis is especially useful for the instructional use of
test results.
The adequacy of test norms can be judged by determining the extent to which
they are (1) relevant, (2) representative, (3) up to date, (4) comparable, and
(5) adequately described. In some instances, it is more appropriate to use local
norms than published norms.
When local norms are desired, percentile and
In addition to a knowledge of derived scores and norms, the proper interpretation of test scores
requires an awareness of (1) what the test measures, (2) the pupil's characteristics and background, (3) the
type of decision to be made, (4) the amount of error in the score, and (5) the extent to which the score
agrees with other available data. No important educational decision should ever be based on test scores
alone.
Learning Exercises
1. Describe the cautions needed in making criterion-referenced interpretations of
standardized achievement tests.
2. Describe the meaning of raw scores and derived scores.
3. A pupil received an average grade score of 6.8 on a standardized achievement battery administered in the
fall of the year. What arguments might be presented for and against moving the pupil ahead to the sixth grade?
4. What advantages do stanines have over T-scores? What disadvantages?
5. Explain each of the following statements:
a. Standard scores provide equal units.
b. Percentile scores provide systematically unequal units.
c. Grade equivalent scores provide unequal units that vary unpredictably.
6. Assuming that all of the following test scores were obtained from the same normally distributed group, which
score would indicate the highest performance? Which the lowest?
a. z-score = .65.
b. T-score = 65.
c. Percentile score = 65.
7. Consult the section on "norming" in the new edition of Standards for Educational and
Psychological Testing, and review the types of information that test manuals should
contain. Compare a recent test manual with the Standards. (See the reference in
"Suggestions for Further Reading.")
8. What is the difference between a norm and a standard? Why shouldn't test norms be
used as standards of good performance?
9. What is the value of using national norms? Under what conditions is it
desirable to use local norms?
10. What are the relative advantages and disadvantages of using local norms for
disadvantaged pupils? For what purposes are more general norms (e.g., national) useful with
these pupils?
Suggestions for Further Reading
AMERICAN PSYCHOLOGICAL ASSOCIATION. Standards for Educational
and Psychological Testing. Washington, D.C.: A.P.A., 1985. See the test
standards in Part I for what to look for in test
manuals and Parts II and III for material on
the effective interpretation
