Sadler
Springer is collaborating with JSTOR to digitize, preserve and extend access to Instructional Science.
D. ROYCE SADLER
Assessment and Evaluation Research Unit, Department of Education, University of Queensland,
St Lucia, Queensland 4067, Australia
Abstract. The theory of formative assessment outlined in this article is relevant to a broad spectrum of
learning outcomes in a wide variety of subjects. Specifically, it applies wherever multiple criteria are
used in making judgments about the quality of student responses. The theory has less relevance for
outcomes in which student responses may be assessed simply as correct or incorrect. Feedback is
defined in a particular way to highlight its function in formative assessment. This definition differs in
several significant respects from that traditionally found in educational research. Three conditions for
effective feedback are then identified and their implications discussed. A key premise is that for
students to be able to improve, they must develop the capacity to monitor the quality of their own work
during actual production. This in turn requires that students possess an appreciation of what high
quality work is, that they have the evaluative skill necessary for them to compare with some objectivity the
quality of what they are producing in relation to the higher standard, and that they develop a store of
tactics or moves which can be drawn upon to modify their own work. It is argued that these skills can
be developed by providing direct authentic evaluative experience for students. Instructional systems
which do not make explicit provision for the acquisition of evaluative expertise are deficient, because
they set up artificial but potentially removable performance ceilings for students.
Introduction
This article is about the nature and function of formative assessment in the
development of expertise. It is relevant to a wide variety of instructional systems in
which student outcomes are appraised qualitatively using multiple criteria. The
focus is on judgments about the quality of student work: who makes the
judgments, how they are made, how they may be refined, and how they may be put to
use in bringing about improvement. The article is prompted by two overlapping
concerns. The first is with the lack of a general theory of feedback and formative
assessment in complex learning settings. The second concern follows from the
common but puzzling observation that even when teachers provide students with
valid and reliable judgments about the quality of their work, improvement does
not necessarily follow. Students often show little or no growth or development
despite regular, accurate feedback. The concern itself is with whether some
learners fail to acquire expertise because of specific deficiencies in the instructional
system associated with formative assessment.
The discussion begins with definitions of feedback, formative assessment and
qualitative judgments. This is followed by an analysis of certain patterns in
teacher-student assessment interactions. A number of causal and conditional
linkages are then identified. These in turn are shown to have implications for the
design of instructional systems which are intended to develop the ability of
students to exercise executive control over their own productive activities, and
eventually to become independent and fully self-monitoring.
Etymology and common usage associate the adjective formative with forming or
moulding something, usually to achieve a desired end. In this article, assessment
denotes any appraisal (or judgment, or evaluation) of a student's work or
performance. (In some contexts, assessment is given a narrower and more specialized
meaning; some North American readers in particular may prefer to substitute the
term evaluation for assessment.)
formative and summative assessment relates to purpose and effect, not to timing.
associated with success or high quality can be recognized and reinforced, and
unsatisfactory aspects modified or improved.
An important feature of Ramaprasad's definition is that information about the
gap between actual and reference levels is considered as feedback only when it is
used to alter the gap. If the information is simply recorded, passed to a third party
who lacks either the knowledge or the power to change the outcome, or is too
deeply coded (for example, as a summary grade given by the teacher) to lead to
appropriate action, the control loop cannot be closed and "dangling data"
substitute for effective feedback. In any area of the curriculum where a grade or score
assigned by a teacher constitutes a one-way cipher for students, attention is
diverted away from fundamental judgments and the criteria for making them. A
grade therefore may actually be counterproductive for formative purposes.
In assessing the quality of a student's work or performance, the teacher must
possess a concept of quality appropriate to the task, and be able to judge the
student's work in relation to that concept. But although the students may accept a
teacher's judgment without demur, they need more than summary grades if they
are to develop expertise intelligently. The indispensable conditions for
improvement are that the student comes to hold a concept of quality roughly similar to that
held by the teacher, is able to monitor continuously the quality of what is being
produced during the act of production itself, and has a repertoire of alternative
moves or strategies from which to draw at any given point. In other words,
students have to be able to judge the quality of what they are producing and be able
to regulate what they are doing during the doing of it. As Shenstone (correctly)
put it over two centuries ago, "Every good poet includes a critick; the reverse will
not hold" (Shenstone, 1768, p. 172).
Stated explicitly, therefore, the learner has to (a) possess a concept of the
standard (or goal, or reference level) being aimed for, (b) compare the actual (or
current) level of performance with the standard, and (c) engage in appropriate action
which leads to some closure of the gap. These three conditions form the
organizing framework for this article. It will be argued that they are necessary conditions,
which must be satisfied simultaneously rather than as sequential steps. It is
nevertheless useful to make a conceptual distinction between the conditions. The
(macro) process of grading involves the first two in that it is essentially
comparing a particular case either with a standard or with one or more other cases.
Control during production involves all three conditions and is, by contrast, a
(micro) process carried out in real time. Judging from assessment practices
common in many subjects, information generated without the participation of the
learner but made available to the learner from time to time (as intelligence) is
evidently assumed to satisfy these conditions. A detailed examination of the three
conditions shows why this assumption falls short of what is actually necessary.
make the judgments without reference to each other, and in that sense work
independently.
5. If numbers (or marks, or scores) are used, they are assigned after the judgment
has been made, not the reverse. In making qualitative judgments, the final
decision is never arrived at by counting things, making physical measurements, or
compounding numbers and looking at the sheer magnitude of the result.
Complex learning outcomes of the type that are assessed by making direct
qualitative judgments are common in a wide variety of subjects in secondary, vocational,
further and higher education. These subjects include English, foreign languages,
humanities, manual and practical arts, social sciences, and the visual and
performing arts. They are also important in industrial training and in many areas of
science and mathematics, particularly where students are required to devise
experiments, formulate hypotheses or explanations, carry out open-ended field or labor
tence separate from the learner. That is, it is an artefact which is open to leisurely
inspection. Examples include essays, musical compositions, welding jobs, and
articles of pottery. If the scaffolding used in the construction of the work is
carefully dismantled, the final product may retain no evidence of false starts,
unfruitful paths followed in its production, or (if it has not been produced under
time-constrained test conditions), the time taken to produce it. The product is, in fact,
infinitely malleable prior to its release, and the author can modify it by any
desired amount. A contrasting type of end "product" is when the learner's work is
transient, such as a live production performed by the learner in real time.
less identical. What is assessed in these situations is essentially the learner's
productive skill. Assessing such outcomes may or may not involve making
qualitative judgments, depending on the number and nature of the criteria. In other fields
(such as writing), design itself is an integral component of the learning task,
although it may be so closely linked with production that it does not appear as a
distinct phase. In yet other fields (such as fashion and architecture), design itself
may be the primary consideration. Wherever the design aspect is present,
qualitative judgments are necessary and quite divergent student responses could, in
principle and without compromise, be judged to be of equivalent quality.
Earlier in this article, it was argued that the transition from feedback to
self-monitoring can occur only when three conditions are satisfied. The first of these is
that the student comes to know what constitutes quality. In a teaching setting, this
presupposes that the teacher already possesses this knowledge, and that it must
somehow be shared with the student. In a particular context, however, it is often
difficult for teachers to describe exactly what they are looking (or hoping) for,
although they may have little difficulty in recognizing a fine performance when it
occurs among student responses. Teachers' conceptions of quality are typically
held, largely in unarticulated form, inside their heads as tacit knowledge. By
definition, experienced teachers carry with them a history of previous qualitative
judgments, and where teachers exchange student work among themselves or
collaborate in making assessments, the ability to make sound qualitative judgments
constitutes a form of guild knowledge.
While such in-the-head standards exhibit a degree of stability, they are not
immutable but can be shown to adapt to the circumstances. In particular, teachers
are often strongly influenced by the range of quality which exists among a set of
things to be appraised, and typically find it difficult to make an isolated judgment
of quality (that is, without reference to other students' work). Teachers tacitly
acknowledge the difficulty of relying on memory alone when they make a survey
of pieces of student work before assigning grades to them. This survey generates a
loosely quantitative baseline or frame of reference for what is to be regarded as
barely satisfactory and what is to count as excellent in the context. Even after a
survey has been made, however, smaller scale order effects (especially severity,
leniency, and carryover) almost invariably occur. This is a subject of continuing
research (see, for example, the work of Hales and Tokar, 1975, and Daly and
Dickson-Markman, 1982) and can be interpreted in terms of Helson's (1959)
adaptation level theory. It therefore appears that teachers' conceptions of quality
and standards exist in some quiescent and pliable form until they are reconstituted
by fresh evaluative activity.
In an instructional system, an exclusive reliance on teachers' guild knowledge
works against the interests of the learner in two important ways. In the first place,
is often a wide variety of objects within the same genre which are regarded as
excellent. Unless students come to this understanding, and learn how to abstract
the qualities which run across cases with different surface features but which are
judged equivalent, they can hardly be said to appreciate the concept of quality at
all.
The second consideration is that originality and creativity are not usually,
contrary to some opinion, best developed in a completely freewheeling environment.
Bailin (1987) pointed out that there is no essential conflict between creative
processes and the production of something which is generally accepted as of high
quality. Creative productions are mostly highly disciplined, and are almost
invariably produced not by accident or through random risk taking but when the
producer, by being thoroughly conversant with the characteristics of the discipline
or genre, understands when and how to transcend the normal boundaries.
Knowing the metacriteria, that is, knowing when the suspension of some criterion,
even on occasion a principal one, can be justified in favour of another, is an
important element in creativity. But to return to the issue of exemplars, it is the
experience of many teachers that even if some students do in fact copy, they may
learn something valuable in the process. Emulation is an ancient and still almost
universal learning method. When students have gained whatever they can from, in
the worst case, slavish copying, there is time for the teacher to wean them away
from it.
Students develop a concept of a reference level more readily in some learning
contexts than in others. In the manual, visual and performing arts, for example,
students are usually able to observe, as a matter of course, the results of other
students' efforts together with the teachers' appraisals of those efforts, simply
because the work is produced in workshops, studios, theatres and other open
environments. The best examples, or perhaps exemplary material developed outside
the classroom, serve naturally and unobtrusively as reference points. In the liberal
arts and humanities, however, students often work privately, and do not get to see
or read what other students have produced. What constitutes work of high quality
then remains to some extent unknown. Exceptional cases aside, it is ironic that the
prototypes of competency levels which Myers (1980) recommended as necessary
for assessors using holistic methods for the evaluation of writing are not similarly
considered a general requirement for students learning to write or to master other
complex skills.
at which expectations are raised is consistently greater than the rate of
improvement, the inability of the student to keep pace results in little or no sense of
accomplishment even though improvement may actually be occurring. This in
turn may lead to a situation where successive attempts are taken less and less seri
(rhetoric, style, persuasiveness); some apply to particular genres of writing but not
to others (referencing); and some logically subsume others (mechanics subsumes
spelling). Many are operationally correlated together, so that whenever an attempt
is made to change a piece of writing according to one dimension, other properties
are inevitably affected at the same time. For example, it may be impossible to
change the vocabulary of a piece of writing without simultaneously affecting the
tone. In short, this set of criteria is large and includes subsets which overlap and
interlock. It is therefore obvious that behind the customary published lists (usually
consisting of from seven to ten criteria) there lies a much larger set of potential
criteria that could be brought into play if and when the need arises. Given this
fact, and the complex interrelations which exist among the criteria, it is clear that
to use the whole set for a particular assessment would be unmanageable. How
the other for purposes of formative assessment. Both can be drawn upon because
evaluative input can take any appropriate form, and in any case is always open to
discussion, clarification, and revision if necessary.
The first general line of attack is to devise and implement a procedure which
begins with identifying a number of relevant criteria, then measures the amount
present on each criterion and combines the various levels or estimates into an
overall measure of merit by means of a formula. The criteria are treated
separately, so that the order in which characteristics are considered is arbitrary and has
no effect on the final result. The combining formula may be simple, and require
only the addition of weighted or unweighted component scores or ratings. On the
other hand, the formula may be complicated (taking, for example, conjunctive or
disjunctive form). This so-called analytic approach is common in evaluating
consumer products. The global judgment is made by breaking down the
multicriterion judgment using separate criteria and then following explicit rules. If
necessary, the judgment can be justified by retracing and checking for integrity all
the steps that led to it. In assessing student work, the analytic approach typically
settles on the set of criteria considered to be most relevant to the work of most
students at a particular stage of development. The criteria may be simply selected
by a teacher on the basis of their logical relevance to the task, or may result from
empirical studies (using factor or regression analyses) of the judgmental
behaviours of competent assessors. Diederich (1974) followed the latter approach. This
component-wise attack on the problem of making multicriterion judgments is
often advocated as the ideal towards which impressionistic, holistic or informal
systems should be made to move. It assumes, however, that the set of criteria
nominated is sufficient for all cases, that the criteria do not overlap, and that use
of the combining formula leads to judgments which would not conflict (except
perhaps rarely) with more holistic approaches. A substantial argument has been
mounted elsewhere (Sadler, 1985) that for complex phenomena, use of a fixed set
of criteria (and therefore the analytic approach) is potentially limiting.
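The combining formulas mentioned in the description of the analytic approach can be illustrated with a minimal sketch. It is offered only as an illustration under stated assumptions: the criterion names, the rating scale, the weights and the function names are hypothetical and are not drawn from Diederich (1974) or from any other study cited here. The sketch shows an additive formula (weighted or unweighted), a conjunctive rule and a disjunctive rule, and checks that the order in which criteria are considered has no effect on the additive result.

```python
from typing import Dict, Optional

# Hypothetical type: a rating (or weight, or threshold) for each named criterion.
Ratings = Dict[str, float]


def weighted_sum(ratings: Ratings, weights: Optional[Ratings] = None) -> float:
    """Additive combining formula; unweighted when no weights are supplied."""
    if weights is None:
        weights = {name: 1.0 for name in ratings}
    return sum(weights[name] * rating for name, rating in ratings.items())


def conjunctive(ratings: Ratings, minimums: Ratings) -> bool:
    """Conjunctive form: every listed criterion must reach its minimum."""
    return all(ratings[name] >= floor for name, floor in minimums.items())


def disjunctive(ratings: Ratings, thresholds: Ratings) -> bool:
    """Disjunctive form: reaching the threshold on any one criterion suffices."""
    return any(ratings[name] >= bar for name, bar in thresholds.items())


# Illustrative ratings on an assumed 1-5 scale (not data from the article).
essay = {"ideas": 4.0, "organization": 3.0, "mechanics": 2.0}
weights = {"ideas": 2.0, "organization": 1.0, "mechanics": 1.0}

overall = weighted_sum(essay, weights)

# Considering the same criteria in reverse order gives the same overall measure.
reordered = weighted_sum(dict(reversed(list(essay.items()))), weights)
```

The sketch also makes the assumptions of the analytic approach visible: the dictionary of criteria is fixed in advance, each criterion contributes independently, and nothing outside the nominated set can influence the result.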
The second approach to making complex judgments is for the evaluator to react
to the work as a whole, making an entire, or what Kaplan called a configurational
(1964, p. 211), assessment first and then to substantiate it (to whatever extent is
necessary) by referring to separate criteria, which may or may not be drawn from
a prespecified set. In this approach, imperfectly differentiated criteria are
compounded as a kind of gestalt and projected onto a single scale of quality, not by
means of a formal rule but through the integrative powers of the assessor's brain.
To produce a rationale for such a holistic or global judgment, the assessor
unpacks some of the conceptual unidimensionality. Configurational assessments
do not require the specification of all criteria in advance, neither do they assume
operational independence among the criteria.
work, particularly those which are made progressively as the teacher (more or less
instantaneously) senses positive and negative points worthy of note. Some
comments (such as "Yes", or "I agree!") are non-specific, or are not related directly to
the quality of the written piece. Other comments are evaluative, and clearly imply
criteria. It can be demonstrated that when a teacher, on two or more separate
find that while such sheets are helpful, they may lead to frustration because of
their inflexibility. The qualities of a piece of work cannot necessarily be dealt
with adequately using a fixed criterion set, and teachers often feel the need to call
upon nonstandard criteria.
A more satisfactory (and less mechanistic) solution to the problem is to
consider the universe of criteria as notionally partitioned into two subsets called for
convenience manifest and latent criteria (Sadler, 1983). Manifest criteria are those
which are consciously attended to either while a work is being produced or while
it is being assessed. Latent criteria are those in the background, triggered or
activated as occasion demands by some (existential) property of the work that
deviates from expectation. Whenever there is a serious violation of a latent criterion,
the teacher invokes it, and it is added (at least temporarily) to the working set of
manifest criteria. This is possible because competent teachers have a thorough
knowledge of the full set of criteria, and the (unwritten) rules for using them. But
it is precisely this type of knowledge which must be developed within the students
if they are to be able to monitor their own performances with a reasonable degree
of sophistication. The translation of a criterion from latent to manifest should
therefore not be interpreted by either the student or the teacher as unfair or as
some sort of aberration. Because of the practical impossibility of employing all
criteria at once, it is inevitable and perfectly normal. Marshall (1958, 1968)
referred to this as the flotation principle, and advocated its use in evaluation. In an
interesting shift of metaphor, it also formed the basis for Elbow's (1973) so-called
center of gravity approach to appraising student writing for formative purposes.
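The movement of a criterion from latent to manifest can be pictured with a small sketch. It is a loose illustration only. The numeric severity scores and the cut-off named `serious` are modelling conveniences assumed for the example; in the account given above, the trigger is the teacher's qualitative recognition of a serious violation, not a measurement. The criterion names and the function name are likewise hypothetical.

```python
from typing import Dict, List


def working_set(
    manifest: List[str],
    latent_violations: Dict[str, float],
    serious: float = 0.5,
) -> List[str]:
    """Return the manifest criteria plus any latent criterion seriously violated.

    `latent_violations` maps each latent criterion to an assumed severity in
    the range 0-1; `serious` is an assumed cut-off, not a value from the article.
    """
    promoted = sorted(
        name
        for name, severity in latent_violations.items()
        if severity >= serious and name not in manifest
    )
    # Promotion is temporary: the caller's manifest list is left unchanged.
    return list(manifest) + promoted


manifest = ["argument", "use of evidence"]
latent_violations = {"spelling": 0.1, "referencing": 0.8, "paragraphing": 0.0}

current = working_set(manifest, latent_violations)
```

Because the function returns a new list and leaves `manifest` untouched, a promoted criterion drops back to latent status for the next piece of work unless it is violated again, which mirrors the "at least temporarily" qualification in the text.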
The art of formative assessment is to generate an efficient and partly reversible
progression in which criteria are translated for the student's benefit from latent to
manifest and back to latent again. The aim is to work towards ultimate
submergence of many of the routine criteria once they are so obviously taken for granted
that they need no longer be stated explicitly. The necessity to recycle work
through the teacher (for appraisal) can be reduced or eliminated only to the extent
that students develop a concept of quality, and the facility for making
multicriterion judgments. This in turn requires that they be given adequate
evaluative experience themselves.
When students have to rely solely on, say, teachers' written comments, not only is
the feedback conveyed in propositional form, but the number of comments and
their content depends upon the willingness of the teacher (and the time available)
to actually make the comments, the ability of the teacher to express the feedback
in words, and the ability of the student to interpret the comments. The student
may not, for instance, know what is implied by references to particular evaluative
criteria. For example, suppose a teacher points out to a student that something
produced is not as coherent as it should be. As a criterion, coherence implies that
how something hangs together is important in appraising it. Coherence is clearly
relevant to evaluating a variety of things: a painting, an essay, a dramatic
segment, and so on. The nature of the elements that have to cohere (visual elements,
concepts and ideas, physical movements), the serial and lateral connections
between these elements, and the relation of each part to the whole, may not
necessarily be clear to the student unless the contextual meaning of coherence is
explained. Exactly what coherence implies in one context does not transfer
directly to another context, although the basic idea is the same. Because much of
the evaluative knowledge underlying teachers' comments is tacit, the learner also
has a need to develop an appropriate body of tacit knowledge to be able to
interpret formal statements.
Criteria often seem elusive partly because what a criterion means and what it
implies for appraisal cannot necessarily be defined in isolation from concrete
examples of things which possess the property in question, which in any case is
usually only one of many properties. Coming to an understanding of the property
is therefore as much an epistemological as it is a technical matter. To clarify the
meaning and implications of a particular criterion, it would be useful to have a set
of graded examples exhibiting more or less of that property. But for works of art
or pieces of literature, the various properties are inevitably compounded together,
so that one cannot create or collect examples for which all properties other than
the one in question are held constant. This is in contrast with a dichotomous
criterion such as correctness, for which positive and negative instances may
usually be produced on demand.
A novice is, by definition, unable to invoke the implicit criteria for making
refined judgments about quality. Knowledge of the criteria is "caught" through
experience, not defined. It is developed through an inductive process which
involves prolonged engagement in evaluative activity shared with and under the
tutelage of a person who is already something of a connoisseur. By so doing "the
apprentice unconsciously picks up the rules of the art, including those which are
not explicitly known to the master... Connoisseurship... can be communicated
only by example, not by precept" (Polanyi, 1962, pp. 53-54). In other words,
providing guided but direct and authentic evaluative experience for students
enables them to develop their evaluative knowledge, thereby bringing them within
the guild of people who are able to determine quality using multiple criteria. It
also enables transfer of some of the responsibility for making evaluative decisions
from teacher to learner. In this way, students are gradually exposed to the full set
of criteria and the rules for using them, and so build up a body of evaluative
knowledge. It also makes them aware of the difficulties which even teachers face
of making such assessments; they become insiders rather than consumers.
shortfalls. The disqualification is then due less to a single identifiable cause than
to the combined effects of marginal deficits.
The concept of guild knowledge can be extended beyond the confines of evaluat
ing a piece of work in isolation, to evaluating a piece of work in relation to the
task specifications. In situations where students construct assignments or term
papers according to specifications laid down by the teacher, it is common (and
frustrating for the teacher) for a proportion of students not to address themselves
to the task set. The student, for example, may do a creditable job of recounting the
story of a novel instead of identifying the theme. Some teachers adopt a policy of
accepting and giving partial credit (deliberately or by default) for a response
which is well put together but is off-target. On the surface, this practice appears to
make a reasonable concession to the hardworking student for the time and effort
put in. In the long run, however, it undermines the learning which is supposed to
take place, and reduces the student's incentive to tackle tasks of the type actually
set. If learning how to address a set task or how to produce something within an
established genre is an important instructional outcome, sticking to the task has to
be a pre-emptive criterion. Meeting the generic requirement is a logical
precondition for an appraisal to be made within a particular genre, but the significance of
this fact may be brought home to the students only when they themselves are
faced with deciding whether or not several pieces of work meet the original task
specifications. In addition, it may demonstrate to them how common it is for
students not to respond to the task that is actually set.
Some of the variation in quality of different students' responses to a set task
In many contexts, students traditionally have more or less relied on their teachers
to tell them how to effect improvement. This aspect is not dealt with in detail
here, except to observe that if the teacher is to be in a position to suggest remedial
moves, the teacher should ideally possess current productive expertise of the kind
to be developed by the student. Apart from the issue of credibility with students, a
teacher should not be purely a connoisseur who never engages in any disciplined
way in productive activity. Many teachers of writing, for example, do not
voluntarily write prose or poetry for either pleasure or profit apart from personal letters
and other necessities. Their writing experience is vicarious and limited to the
classroom setting. It consists of launching students into writing tasks of various
kinds, and later helping them to improve their work. This anomalous situation
parallels the experience of many students, whose only exposure to evaluative and
editorial activity is as it is received from the teacher. It therefore is also vicarious.
The third condition for self-monitoring to occur is that students themselves be
able to select from a pool of appropriate moves or strategies to bring their own
performances closer to the goal. This requirement warrants separate consideration
because the ability to evaluate others' or one's own work is not necessarily
matched by the ability to produce. It is also consistent with the thesis that the pos
session of evaluative expertise is a necessary (but not sufficient) condition for
improvement. A student in English, for example, may be able to recognize the
theme in a novel once it has been identified by another person, or be able to dis
tinguish between the theme and other nominated characteristics of a novel but be
unable to engage in the abstract thought which is necessary to identify from
scratch the theme or themes in an unseen novel, or to structure a written response
appropriately. This ability to recognize and evaluate but not construct is not an
The most readily available material for students to work on for evaluative and
improve theirs (Beaven, 1977; Pianko and Radzik, 1980; Thompson, 1981;
Chater, 1984). "Students who become conscious of what they're doing by
explaining their decisions to other students also learn new strategies for solving
writing problems. And because students should become progressively more
independent and self-confident as writers, they need to evaluate each other's work and
their own frequently, a practice which teaches constructive criticism, close
reading, and rewriting" (Lindemann, 1982, p. 234). Boud (1986) reported similar
findings in higher education when self-assessment and peer-assessment were built
into instructional procedures for law, engineering and architecture students. It is
clear that to build explicit provision for evaluative experience into an instructional
system enables learners to develop self-assessment skills and gap-closing
strategies simultaneously, and therefore to move towards self-monitoring. Some
resistance to this proposition can, however, be expected.
skill and expertise to evaluate student work, and that this skill is not transferable
to the students. Bloom's (1956) influential taxonomy places evaluation at the top
of the hierarchy of cognitive skills, and some learning theorists hold that learners
typically do not (and perhaps cannot) engage in high-level abstract thought while
young. Although the exact position of evaluation in the Bloom hierarchy is
debatable, it almost certainly requires abstract thinking and is situated above
knowledge, comprehension, and application. This may give the impression that
evaluation is some kind of esoteric activity engaged in only by adults or experts.
If so, it ignores the fact that even children (certainly in their hours out of school)
continually engage in evaluative activity and, if asked, can often produce
rudimentary but reasonably sound rationales for their judgments.
Some teachers feel threatened by the idea that students should engage openly
and cooperatively in making evaluative judgments. An assessment which results
in a grade is used by many teachers as a tool for the control or modification of
behaviour, for rewards and punishments. To remove some of the responsibility for
assessment from teachers and place it in the hands of students may be considered
to have the potential for undermining the teacher's authority. A less pathological
concern is that many teachers perceive evaluation as the responsibility primarily
of teachers because it constitutes part of the specialized knowledge and expertise
that they have acquired as professionals. Assessment is regarded as strictly the
teachers' prerogative: it sets them apart from their students and to some extent
from parents and the rest of society. Part of the teacher's responsibility is surely,
however, to download that evaluative knowledge so that students eventually
become independent of the teacher and intelligently engage in and monitor their
own development. If anything, the guild knowledge of teachers should consist
less in knowing how to evaluate student work and more in knowing ways to
download evaluative knowledge to students.
Apart from personal factors, formative assessment can be inhibited by certain
circumstances outside the control of the teacher. School-based or internal
examination systems often make use of so-called continuous (or progressive, or
periodic) assessment. One of the arguments in favour of continuous assessment is that
a series of assessments made over an extended period of time tends to reduce the
high levels of anxiety experienced by some students under formal make-or-break
examinations at the end of a course. (It may, of course, create a different form of
stress.) Another argument is that continuous assessment permits wider and more
varied sampling of a student's knowledge and skills. A third argument is that
continuous assessment provides frequent feedback on progress. Continuous
assessment cannot, however, function formatively when it is cumulative, that is,
when each attempt or piece of work submitted by a student is scored and the
scores are added together at the end of the course. This practice tends to produce
in students the mindset that if a piece of work does not contribute towards the
total, it is not worth doing. The longer-term goal of excellence may therefore be
forfeited because of the drive to accumulate credit. Optional recycling of work for
purposes of improvement becomes an unattractive proposition, and also raises the
question of fairness to other students if a teacher works with some of the students
(but perhaps not others) in helping to raise the standard of performance. Any work
which is to form the basis for a course grade is normally expected, of course, to be
produced by the student without aid from the teacher.
A further factor follows from the widespread policy of allocating course grades
according to some predetermined statistical distribution. This is often considered
to be the best or only practical method of maintaining standards. Such
grading-on-the-curve, however, does not allow for the recognition of improvement in
A final factor is associated with curriculum structure. There has been a trend
over recent decades towards breaking up long courses into units or modules in the
time to do it.
Conclusion
To improve their performance, students need to know how they are progressing.
Feedback is commonly defined in terms of information given to the student about
the quality of performance (knowledge of results). But in many educational and
training contexts, students produce work which cannot be assessed simply as
correct or incorrect. The quality of the work is determined by direct qualitative
human judgment. The traditional definition of feedback is then too narrow to be
of much use, and in this article a more appropriate conception is presented. It
requires knowledge of the standard or goal, skills in making multicriterion
comparisons, and the development of ways and means for reducing the
discrepancy between what is produced and what is aimed for.
Improvement can, of course, occur if the teacher provides detailed remedial
advice and the student follows it through. This, however, maintains the learner's
dependence on the teacher. The alternative approach which is described and
advocated in this article is for students to develop skills in evaluating the quality of
their own work, especially during the process of production. The transition from
teacher-supplied feedback to learner self-monitoring is not something that comes
about automatically. For an important class of learning outcomes, the instructional
system must make explicit provision for students themselves to acquire evaluative
ner outcomes which are judged qualitatively using multiple criteria. The corollary
is that not to design authentic evaluative experience into the instructional system
either places an artificial performance ceiling on many students or limits their rate
of learning.
References
Bailin, S. (1987). Creativity or quality: a deceptive choice. Journal of Educational Thought, 21, 33-39.
Beaven, M. H. (1977). Individualized goal-setting, self-evaluation, and peer evaluation. In C. R.
Cooper and L. Odell (Eds.), Evaluating writing: describing, measuring, judging. Urbana, IL:
National Council of Teachers of English.
Black, H. D. (1986). Assessment for learning. In D. L. Nuttall (Ed.), Assessing educational
achievement. London: Falmer.
Black, H. D. and Dockrell, W. B. (1984). Criterion-referenced assessment in the classroom.
Kaplan, A. (1964). The conduct of inquiry: methodology for behavioral science. San Francisco:
Chandler.
Kulhavy, R. W. (1977). Feedback in written instruction. Review of Educational Research, 47,
211-232.
Kulik, J. A. and Kulik, C-L. C. (1988). Timing of feedback and verbal learning. Review of Educational
Research, 58, 79-97.
Lindemann, E. (1982). A rhetoric for writing teachers. New York: Oxford University Press.
Locke, E. A., Shaw, K. N., Saari, L. M. and Latham, G. P. (1981). Goal setting and task performance:
1969-1980. Psychological Bulletin, 90, 125-152.
Marshall, M. S. (1958). This thing called evaluation. Educational Forum, 23, 41-53.
Marshall, M. S. (1968). Teaching without grades. Corvallis, Oregon: Oregon State University Press.
Myers, M. (1980). A procedure for writing assessment and holistic scoring. Urbana, IL: ERIC
Clearinghouse on Reading and Communication Skills, National Institute of Education, and
National Council of Teachers of English.
Nitko, A. J. (1983). Educational tests and measurement: an introduction. New York: Harcourt Brace
Jovanovich.
Odell, L. and Cooper, C. R. (1980). Procedures for evaluating writing: assumptions and needed
research. College English, 42(1), 35-43.
Pianko, S. and Radzik, A. (1980). The student editing method. Theory into Practice, 19, 220-224.
Polanyi, M. (1962). Personal knowledge: towards a post-critical philosophy. London: Routledge and
Kegan Paul.
Ramaprasad, A. (1983). On the definition of feedback. Behavioral Science, 28, 4-13.
Rowntree, D. (1977). Assessing students: how shall we know them? London: Harper and Row.
Sadler, D. R. (1981). Intuitive data processing as a potential source of bias in naturalistic evaluations.
Educational Evaluation and Policy Analysis, 3(4), 25-31.
Sadler, D. R. (1982). Evaluation criteria as control variables in the design of instructional systems.
Instructional Science, 11, 265-271.
Sadler, D. R. (1983). Evaluation and the improvement of academic learning. Journal of Higher
Education, 54, 60-79.
Sadler, D. R. (1985). The origins and functions of evaluative criteria. Educational Theory, 35,
285-297.
Sadler, D. R. (1987). Specifying and promulgating achievement standards. Oxford Review of
Education, 13, 191-209.
Shenstone, W. (1768). On writing and books, LXXIX. In Works: In verse and prose Vol. 2, (3rd ed.).
London: Dodsley.
Thompson, R. F. (1981). Peer grading: some promising advantages for composition research and the
classroom. Research in the Teaching of English, 15, 172-174.
Thorndike, E. L. (1913). Educational Psychology, Vol. 1: The original nature of man. New York:
Teachers College, Columbia University.
Tversky, A. (1969). Intransitivity of preferences. Psychological Review, 76, 31-48.
Wittgenstein, L. (1974). Philosophical investigations. (G.E.M. Anscombe, Trans.). Oxford: Basil
Blackwell. (Original work: 3rd ed. published 1967).