ASSESSMENT IS ESSENTIAL
First Edition
Susan K. Green
Winthrop University
Robert L. Johnson
University of South Carolina
Published by McGraw-Hill, an imprint of The McGraw-Hill Companies, Inc., 1221 Avenue of the
Americas, New York, NY 10020. Copyright © 2010. All rights reserved. No part of this publication may
be reproduced or distributed in any form or by any means, or stored in a database or retrieval system,
without the prior written consent of The McGraw-Hill Companies, Inc., including, but not limited to, in
any network or other electronic storage or transmission, or broadcast for distance learning.
1 2 3 4 5 6 7 8 9 0 QPD/QPD 0 9
ISBN: 978-0-07-337872-5
MHID: 0-07-337872-0
The Internet addresses listed in the text were accurate at the time of publication. The inclusion of a
Web site does not indicate an endorsement by the authors or McGraw-Hill, and McGraw-Hill does not
guarantee the accuracy of the information presented at these sites.
www.mhhe.com
For our students and our mentors, past,
present, and future.
TABLE OF CONTENTS
Structure Record Keeping to Encourage Student Self-Monitoring 368
Develop an “Assessment Bank” 369
Enlist Students in Assessment Design 370
Assessment in the Context of a Democratic Society: Classroom Examples 370
Center for Inquiry 371
Knowledge Is Power Program (KIPP) 375
Key to Assessment in the Context of Democratic Participation 380
Formative Assessment and Equal Access 381
Formative Assessment and Self-Governing Skills 381
Formative Assessment and Critical Thinking 381
Now It’s Your Turn: Setting Personal Goals for Classroom Assessment 383
Personal Goal-Setting Steps 383
Key Chapter Points 385
Chapter Review Questions 385
Helpful Websites 386
References 386
Glossary 388
Index 392
PREFACE
self-governing skills such as setting goals and working toward them, taking the initia-
tive to gather and analyze information, and critically applying appropriate standards
to one’s own work.
Many teacher candidates do not realize the power their assessment can have to
transform students into active learners and critical thinkers. We discuss assessment
methods that can be employed to help students to internalize criteria for optimal per-
formance and to set their own goals. For example, students who regularly help design
rubrics or other evaluation criteria for their work are more likely to become active,
independent learners.
Similarly, other assessment practices we describe can also motivate students to
internalize the desire to learn and take charge of their own learning, particularly when
they can reflect on and keep track of the skills they have mastered in a systematic way
by using graphs, journals, or portfolios. These kinds of activities build student
confidence and judgment, which can be showcased and reinforced by practices such as
setting aside time for student authors to share strategies they have found helpful so
that other students can learn from them.
A second major way to use assessment for promoting participation in a democ-
racy is teaching students to think critically. A key element for promoting critical
thinking is replacing assessments that merely measure rote knowledge with assess-
ments that engage students in critical thinking. Each chapter addressing assessment
design includes practical ways to get beyond measurement of facts. Methods for
developing thoughtful questions and classroom discourse, application of concepts to
new situations, and synthesis of information from many sources are stressed because
they are essential aspects of equipping students with critical thinking skills for active
engagement in democracy.
Third, the text emphasizes ways that assessment can be used to reduce barriers
to providing equal access to knowledge for all students. All people, regardless of
family background, have a right to equal access to education as the source of equal
opportunity. Efforts to provide equal education to all students regardless of their
demographic origins are currently inadequate, particularly for poor and minority
children. Students who lag behind in reading in their early elementary years, for
example, predictably get further and further behind as they progress through
school.
Specific assessment techniques are described in the text to help teachers assist
lower-achieving students. As a first step, methods for disaggregating and analyzing
assessment results are presented to provide the means for exposing achievement
gaps, because a key to teaching for equal access to knowledge is awareness of dif-
ferential performance of subgroups of students. Formative assessment is described
in terms of concrete teacher actions, such as strategically administered “quick
writes,” to let a teacher know who has and who has not mastered a concept, allow-
ing for additional instruction if needed. Setting goals and tracking progress toward
them across time with graphs is another featured technique that has been shown to
help struggling learners catch up. For equal opportunity to materialize, these strat-
egies to ameliorate the achievement gap must be incorporated into classroom
assessment.
The marginalization or negative portrayal of underrepresented groups in assess-
ment can engender other barriers to equal access to knowledge. For example, to estab-
lish a context for an assessment, multiple-choice items and performance tasks often use
narrative and art to portray people in their physical appearance, dress, environment,
and activities. These portrayals may bring a multicultural perspective to instruction and
assessment, or they may promote biases. Helping teachers represent the diversity of the
classroom and avoid stereotypical representations and other forms of bias in their
assessments is an important component of fair assessment. This text emphasizes ways
teachers can discover and address inequities and provide equal access to knowledge for
all students.
Finally, teachers must be critical thinkers able to evaluate alternative points
of view effectively while keeping in mind the goals of education in a democracy
and ongoing educational renewal for themselves and their schools. They must also
serve as models of critical thinking for their students. The importance of critical
thinking and exercising good judgment when designing and choosing assessment
is stressed, particularly in a section on using action research with assessment data
as a tool for the reflective practitioner. Additionally, discussions of interpreting and
using assessment, issues of ensuring fairness in assessment, and an ongoing focus
on ethics and assessment all highlight the importance of critical thinking for
teachers in their role as models of democratic values and agents of educational
renewal.
Several other innovative features of this text are notable:
• Discussions of specific accommodations for students with several characteristics
(e.g., short attention span, learning English, lower reading skills) are provided
for each type of assessment design (e.g., multiple choice, essay, performance
assessment). This placement allows for more explicit recommendations tailored
to type of assessment and type of student need rather than a chapter with
abstract suggestions spanning all assessment types. It also promotes the “equal
access” theme.
• Each chapter describing assessment design includes a section detailing the five
most common errors made by novices when first attempting to devise that type
of assessment. This feature assists students in separating major issues from minor
details and helps prevent information overload.
• As an additional aid to understanding, case studies describing three teachers’
instructional units, one for elementary science, one for middle level math, and
one for high school English, make concrete the ideas about assessment presented
in each chapter. They are carried throughout the text, providing continuity across
chapters and tangible applications of key themes. At the end of each relevant
chapter, examples from one of these case studies are described to provide context
and concrete illustrations of major assessment concepts.
• A practical section on ethics and assessment in the first chapter focuses on basic
principles connected to classroom practices, with “ethics alerts” throughout the
text to tie ethics to common occurrences in relevant contexts.
• A final chapter pulls together themes from the text and creates a vision of
assessment excellence for teachers. The chapter proposes six key guidelines for
assessment that foster equal access to knowledge, promotion of self-governing
skills, and development of critical thinking. These are illustrated by descriptions
of assessment practices at two schools with contrasting teaching philosophies.
This chapter also provides practical suggestions for efficient use of assessment
and for setting goals related to assessment practices so that teacher candidates
have specific, concrete ideas to apply immediately in their classrooms.
We hope this book will inspire future teachers to discover all the ways their
classroom assessment can help them achieve their goals in preparing knowledgeable,
discerning, active students ready to take their place in a democratic society.
Acknowledgments
This book has been a collaborative effort, not only between the two of us, but also with
numerous colleagues across university and public school settings. We gratefully
acknowledge the willingness of the following people to have dialogues with us about
classroom assessment and to share their hard-earned expertise based on countless
years of working with students of all ages: Kelley Adams, A. J. Angulo, Angela Black,
Barbara Blackburn, Keith Burnam, Bob Cattoche, Susan Chapman, Stevie Chepko,
Tracie Clinton, Beth Costner, Sharon Craddock, Susan Creighton, Olive Crews, Denise
Derrick, Minta Elsman, Caroline Everington, Rebecca Evers, Chris Ferguson, Tiffany
Flowers, Rodney Grantham, Meghan Gray, Stephen Gundersheim, Connie Hale, Gloria
Ham, Lisa Harris, Frank Harrison, Jo Ellen Hertel, Lisa Johnson, Marshall Jones, Shannon
Knowles, Carol Marchel, Stephanie Milling-Robbins, Heidi Mills, Debi Mink, Mark
Mitchell, Carmen Nazario, Marg Neary, Tonya Moon, Tim O’Keefe, Linda Pickett,
Aaron Pomis, Nakia Pope, Frank Pullano, Jim Ross, Kevina Satterwhite, Jesse Schlicter,
Elke Schneider, Seymour Simmons, Julian Smith III, Tracy Snow, Dana Stachowiak,
Tenisha Tolbert, Carol Tomlinson, Jonatha Vare, Enola West, Jane White, Brad Witzel,
and Karen Young.
We want to express our appreciation to the South Carolina Department of Edu-
cation (SCDE) for sharing examples of assessment items. We also thank Scott Hockman
at the SCDE and Ching Ching Yap at the University of South Carolina for the use of
items from the South Carolina Arts Assessment Program.
Our students have also contributed important insights and examples that have
helped us convey some of the important principles and concepts in classroom assess-
ment. They have taught us much about teaching and learning. We want, in particular,
to thank Angie Alexander, Laura Clark, Leslie Drews, Graham Hayes, Lynn McCarter,
Laura McFadden, Krystle McHoney, Maria Mensick, Diana Mîndrilă, Grant Morgan,
Meredith Reid, and Elizabeth Schaefer. The influence of countless other unnamed stu-
dents is also present throughout this book.
As we developed Assessment Is Essential, we received valuable feedback from
faculty who prepare teacher candidates to implement classroom strategies in their
classrooms. The reviews were constructive and guided our fine-tuning of the chapters.
We want to thank the following reviewers:
We also want to thank the public school educators who diligently read drafts
of most or all of our chapters with an eye toward helping us keep them sensible and
useful for practicing teachers: Bob Cattoche, Jo Ellen Hertel, and Dana Stachowiak.
Nick Mills also read several chapters, enhanced our logic and organization, and
provided important suggestions from a college student perspective. Finally, we want
to single out William Mills for special appreciation. He edited every draft of every
chapter, helped us maintain our vision, provided moral support, and made countless
suggestions that improved our efforts to communicate our conviction that assess-
ment is essential.
ABOUT THE AUTHORS
CHAPTER 1
WHY IS ASSESSMENT ESSENTIAL?
An informed citizenry is the bulwark of democracy.
–Thomas Jefferson
A BROAD VIEW: ASSESSMENT AND DEMOCRATIC VALUES
What’s the first thing that comes to your mind when you think about the word
assessment? The last test you took in history class? That nasty pop quiz in psychol-
ogy? Most of our students think about a recent test or quiz. And that is one of the
main reasons we wanted to write this book. We think assessment has negative con-
notations and is therefore undervalued and pushed to the background in a lot of
people’s thinking about teaching. People equate assessment with tests, and not many
people enjoy being tested.
Teachers know testing is important, but they often feel it takes away from
valuable instructional time. But we have come to see assessment as an amazingly
flexible and comprehensive tool that has measurably improved our own teaching.
Even more important, we believe that learning to design good assessments also
helps teachers prepare their students for participation as citizens in a democracy.
That sounds like a lofty claim, but as we show you all the ways you will use assess-
ment in your classroom, you will gradually come to see its wisdom.
TABLE 1.1
Democratic Themes Related to Teachers’ Assessment Practices

Theme | Assessment Practices | Examples
our ideal that “all [people] are created equal.” Public education is the single most
important element to ensure a level playing field for children who don’t come from
families of privilege. As a teacher you will make hundreds of decisions every day, and
you will find it easier if you weigh all you do against one important question: “Will
this help my students learn?” If you are constantly thinking about the best ways to
help your students learn more, you are moving them toward their highest potential for
participating as citizens in our democracy. In the following sections we discuss three
key themes that appear throughout the text that link preparation for democratic par-
ticipation to teachers’ assessment practices. These themes are shown in Table 1.1.
door, and we must do what is in our power to maximize learning for all our students
and provide equal access to educational opportunity.
We contend that good assessment practices provide the opportunity for teachers,
working in the realm where they have primary impact—their own classroom—to
maximize learning for their students. Teacher actions are crucial for providing equal
access to educational opportunity (Stiggins & Chappuis, 2005). That’s because equal
access does not mean that every student should receive exactly the same instruction.
Instead, equal access means that some students may need extra accommodations or
differentiated resources and instructional opportunities to be able to reach mastery on
the learning goals for the class. For example, many middle-class children learn the
mechanics of spoken and written language implicitly from their parents, whereas some
poor children may not have these opportunities. Teachers must address these differ-
ences in the classroom so that all students acquire the foundational skills and common
knowledge as well as the more complex understanding and skills necessary for demo-
cratic participation (Poplin & Rivera, 2005). So, we can’t provide equal access without
maximizing learning opportunities, and we can’t maximize learning opportunities
without assessment that lets us know where students are now in relation to where we
want them to go.
Here’s an example of assessment practices that moved some students toward
more equal access. A student teacher we know was interested in encouraging her kin-
dergarteners to read more books. On different days, the student teacher decided to try
different strategies and then see if those strategies caused more students to visit the
reading center. To see if she was making any difference, she started keeping records of
attendance at the reading center. After a week, she was looking at the data and discov-
ered three of the four children who had not been to the reading center at all that week
were also the students with literacy skills lagging behind those of the other children.
This discovery allowed her to see an achievement gap in the classroom and to focus
on those students who weren’t exposed often enough to books at school. She and the
teacher worked hard to draw those students to the reading center with books and
activities tied to their interests. By the end of the semester, those students were visiting
the reading center as much as or more than other children, and their reading skills
were accelerating. If the student teacher hadn’t done a simple assessment—keeping
track of voluntary center attendance—she might not have noticed one of several strat-
egies needed to help close the learning gap for these children.
We strongly believe teachers’ assessment practices play a large role in their
efforts to do their part to help all students have access to the education that is a basic
right in this country. Knowing where students are in the curriculum and what they
need next is a key focus for assessment. Such knowledge is critical for differentiating
your instruction to meet the range of needs your students bring to your classroom.
We will return to this theme in future chapters as we explore further how teacher
assessment practices in the classroom contribute to the fundamental right of equal
access to education.
SELF-GOVERNING SKILLS FOR PARTICIPATION IN A DEMOCRACY
The founders of our form of government had to assume that all citizens could be
and would be educated enough to make good decisions and to be capable of gov-
erning their own lives. All citizens need to develop responsibility for independence
in their thinking, and they must take responsibility to do their part to participate
in society. They must, then, be interested in learning and critically analyzing new
information for the rest of their lives. Our government wouldn’t function well if
people couldn’t think for themselves and didn’t expect to have some control over
decisions that affect them. Self-governing skills don’t necessarily come naturally.
Instead, we as teachers must help students develop these capabilities. We believe
your assessment practices can help your students become more active participants
in their own learning and, by extension, more active in making other decisions for
themselves that will prepare them to participate as full citizens in a democracy. To
explain how, we must first look at some of our changing assumptions about the
purpose of schools and learning.
TABLE 1.2
Student Characteristics Associated with Performance Goals and Mastery Goals

Performance Goals | Mastery Goals
Not willing to attempt challenging tasks and give up more easily | Willing to attempt challenging tasks and persist longer
Assume ability is more important than effort | Assume effort is more important than ability
Believe that intelligence is a fixed trait | Believe that intelligence can be increased
Use shallow cognitive processing strategies | Use deeper level cognitive processing strategies
confirmation of their learning. You can compare the characteristics of students with a
mastery-goal orientation to those with a performance-goal orientation in Table 1.2.
Naturally, every teacher would want a class full of students with mastery goals.
They sound like dream students! But in our competitive society, promoting mastery goals
takes extra effort. The good news is that teachers can play a large role in helping their
students to become more mastery oriented (Ames, 1992; Butler, 2006; O’Keefe et al.,
2008). Becoming more mastery oriented does not mean students give up performance
[Table: Ames’s classroom elements, related assessment strategies, and outcomes]
assignments. They value the activity because it is useful or relevant to their lives. They
are more willing to give it their full attention and effort because they find it worthwhile.
For example, demonstrating calculation skills through designing a monthly budget
based on the salary made by people in an interesting career can be more compelling
than completing abstract calculation problems. Similarly, a Spanish teacher who
involves students in projects such as writing and performing scenes from a popular
Spanish novel rather than simply drilling on the questions at the end of the chapter is
also using a meaningful and challenging assessment task.
is how the teacher treats mistakes. If children are embarrassed or made fun of for their
mistakes, they will be afraid to try new things and will participate only if they are sure
they know the answer. As a teacher, you need to help students see that learning takes
hard work and practice, and that mistakes are a valuable part of learning. For example,
think back on important times you have learned new things. Didn’t a lot of your
breakthroughs come from making mistakes first?
In one of our recent classes, the need for teachers to convey the value of mistakes
in learning came up. One student raised her hand and told the class about the fourth-
grade teacher she would never forget. This teacher had a slogan, “We don’t make mistakes,
we make discoveries!” She used this slogan several times a day to promote effort optimism
in her students. It certainly had worked for this student. She would always answer a ques-
tion in class when no one else would give it a try because she knew she would make a
discovery (and not a mistake) about whatever concept the class was trying to master.
All three of these recommendations—varied, meaningful, and challenging tasks,
encouraging student participation in decision making, and evaluation that focuses
on individual progress and values effort and improvement—are strongly linked to the
classroom assessment strategies that we describe in this book. All the research that we
have seen suggests these assessment strategies can have direct impact on the motiva-
tion of students, and they may especially benefit students who are the most likely to
be at risk for school failure (Shepard, 2008). If you want students to love learning, to
persist in the face of obstacles, and to care about their own progress rather than how
they do compared to others, you will want to use assessment practices that have the
characteristics we have just described. You will also be fostering self-governing skills
such as independent thinking and a personal sense of responsibility crucial for citizens
engaged in democratic participation.
Self-assessment
The ability to assess oneself is one of the primary goals of education. We as teachers
must teach our students to function autonomously. Ultimately, they must deal on their
own with making decisions about life in all of its complexity. Only relatively recently,
however, has the strategy of explicit self-assessment been introduced, probably as a
result of the weakening rationale for the traditional sorting function of schools. You
can easily see why self-assessment is important. It teaches objectivity—being able to
get beyond your own point of view and look at yourself in relation to a standard. It
also teaches empowerment—if you eventually understand the standard yourself, you
are not as dependent on an authority to make judgments about your own work. You
can do it yourself. The process of self-assessment also helps you to become open to
feedback from a variety of sources. Gradually, you will be able to decide which sources
of information are valuable and which are not.
Metacognition: The process of analyzing and thinking about one’s own thinking and enabling skills such as monitoring progress, staying on task, and self-correcting errors.

Student self-assessment is a key to enhancing learning because it requires students to
become active in connecting their work to the criteria used for evaluating it (Earl, 2003;
Pelligrino et al., 2001). The requirement that students self-assess activates what psychologists
term metacognition. Metacognition is the ability to step back from merely listening
to a lecture or doing an assigned task, to thinking about what is happening in a more
critical way. It involves directing your awareness to a bird’s-eye view of what you are doing.
This heightened level of self-awareness allows students to monitor whether they are under-
standing a lecture point and then to take action by asking a question if they aren’t. Or, it
pushes them to notice when they calculate an answer that doesn’t make sense, and to
self-correct it. Metacognitive skills such as monitoring progress, staying on task, and self-
correcting errors can and should be taught by teachers (Pelligrino et al., 2001).
We can see the development of these self-assessment skills occurring in the class-
room of Tracie Clinton, a third-grade teacher. To encourage self-assessment, she holds
writing conferences with individual students every nine weeks. First, students respond to
a writing prompt (e.g., “If I could visit anywhere in the United States I would visit . . .”).
They use a writer’s checklist to monitor themselves as they are writing. During the indi-
vidual conferences, Tracie encourages students to critique their own writing. She has them
focus on only one or two areas each time (e.g., staying on topic, using vivid vocabulary).
She finds that students want to improve, and they can identify problem areas on their
own with a little guidance. Next, they discuss how to get the paper back on track, and
Tracie writes down pointers on “sticky” notes for the student to refer to later. She has
found that students are able to transfer what they discuss during writing conferences to
other work. As she says, “I have found that my students get extremely excited when they
catch a mistake in their writing on their own. Writing conferences are a long process, but
it is worth every moment when I see my students blossoming into exceptional writers.”
As shown in this example, when students self-assess, the teacher doesn’t give up
responsibility for assessment but rather shares it with the students. Students learn and
eventually internalize the standards and goals they are working toward. Researchers
have found that students who participate in self-assessment improve the quality of their
work, have a better understanding of their strengths and weaknesses, and have higher
motivation (Andrade, 2008).
BOX 1.1
Letter from a Student Observer: Are These Teachers Encouraging Mastery or Performance Goals?

Recently, I have had an opportunity to observe how two different teachers have used assessment in their classroom. For the past month, I have been volunteering at a high school. Two mornings a week I help in Ms. Q’s freshman English class, and one morning a week I help in Ms. R’s senior English class. Aside from the age difference, the classrooms seem to be fairly similar; both have twenty to twenty-five students, both have a comparable ethnic make-up, and both are “regular” English classes. I really enjoy helping the students, but right now I may be getting more from my volunteer hours, because I have the chance to observe two different teachers and the instructional decisions they make, particularly in regard to how they assess their students.

Ms. Q’s freshman class has been working on a research/writing unit in which the students pick products, research two different brands of that product, write short comparison/contrast papers, create PowerPoint presentations of their papers, and then, finally, make oral presentations. First of all, I am impressed with Ms. Q’s ability to pack so much into a single, comprehensive unit. Secondly, I like the way that her different activities gave all her students various opportunities for success. I know that several students had trouble with some of the writing aspects of their papers and that their paper grades probably reflected some of those issues, but many of those same students put together very eye-catching PowerPoints. Ms. Q assessed the same information several times, which should give her a much more reliable picture of each student (it also gave students the opportunity to revise at each stage). Further, since the final activity in the unit was a performance, Ms. Q used an analytic rubric to assess the students’ oral presentations. Her rubric had about six criteria (content, number of slides, grammar, speaking, etc.) and was less than a page in length. Even though it was an oral presentation, Ms. Q focused more on the content than the actual speaking since most of the students were unfamiliar with making presentations. I am sure that she gave them an assignment with some kind of criteria, but I am not sure that she gave them the actual rubric that she used for grading, which would be my only criticism.

In contrast, Ms. R gave a summative test in her senior English class today that seemed to be the antithesis of everything we have learned about assessment. To begin with, the test was poorly designed. It was a matching test on characters from the Canterbury Tales that not only had several typos, but also had over thirty premises and more than fifty responses, far more than could be handled at one time by most students. The students were allowed to use their notes. To have more space, seven students went with me to a spare room where I proctored their test. Ms. R had collected their notes for a grade, and right before the test, she returned the notes to the students. Unfortunately, there did not seem to be any feedback on the notes, and even if there had been comments, the students would not have had an opportunity for addition or revision. I do not know Ms. R’s learning goals, but if they were knowledge based, then I suppose a matching test was appropriate; however, since the students were allowed to use their notes, the test did not really seem to be an assessment of their knowledge, but rather their ability to take notes. The seven students whose test I proctored did not do well on the test (one of them may have passed), and it was obvious while they were taking the test that they were frustrated. I saw that their notes were skimpy, and the points they had pulled out about the characters were frequently irrelevant. Since Ms. R had seen the notes, she had to know that their notes were insufficient, so it seems that these students had no real chance of doing well.

I understand that teachers make mistakes, and so I hope I just happened to catch Ms. R on a bad day. Still, though, it reinforces the importance of appropriate assessment. The seven students from Ms. R’s class were frustrated. They knew within five minutes that they had no hope of doing well on that test, and in another five minutes they were ready to give up, saying the test was too hard and was stupid. All I could tell them was that the test looked tough, but that they needed to keep working and fill in every blank. I wanted to tell them that, regardless of their notes, the test was just bad. In contrast, the students in Ms. Q’s class who were not doing as well did not complain or exhibit the same level of frustration. They realized, if only at a subconscious level, that they had had several chances to exhibit their abilities. We have read and discussed the fact that assessment needs to be thoughtful and planned along with instruction, and recently I have seen examples that prove those important points.
as one foundational element for helping students acquire the skills needed to function
well as citizens in a democracy. You can begin to see the difference in student reactions
when mastery goals are encouraged by comparing the two classrooms described in Box 1.1
by one of our own students who volunteered in these classrooms. We believe good assess-
ment practices are essential for promoting mastery goals, and so the connection between
assessment and mastery goals is another recurring theme in this text.
AN OVERVIEW OF ASSESSMENT

Assessment: The variety of methods used to determine what students know and are able to do before, during, and after instruction.

First, let’s lay out a definition of assessment that is broader than the tests we usually
think of: Assessment is the variety of methods used to determine what students know
and are able to do before, during, and after instruction. In fleshing out this definition,
we will provide an overview of assessment covering three central purposes.
Purposes of Assessment
From our definition, you can see that you will be interested in what your students know
and are able to do at many different points during the school year and, more specifically,
during each instructional unit you teach. Now that you also know that the purpose of
assessment is broader than tests or quizzes for grades, we want to describe the other major
purposes teachers have for doing assessments. Keeping your purpose for any assessment
in mind is very important because your purpose dictates the specific kinds of assessments
you will do. For example, as an art teacher, if your purpose is to do a quick check that
your students understand how the color wheel works, you will do a different assessment
than if your purpose is to determine whether they have mastered the concepts in the unit
on perspective drawing. If your purpose is to compare your social studies class to other
ninth graders across the country, you will use a different assessment than if your purpose
is to see how much they learned from the Civil War unit you just completed. Table 1.3
illustrates the three general purposes of assessment with some examples.
TABLE 1.3
The Three Major Purposes of Assessment

Purpose 1: Diagnostic assessment (getting a sense of strengths and needs for planning instruction)
When Do You Do It? Before instruction.
How Do You Do It? School records. Teacher observations. Teacher-made questionnaires and pre-tests.
Examples: Math: Strengths in math skills from previous statewide testing. Language arts: Observation of student level of analysis during first look at a poem in class. Music: Last year’s performance rankings of students at all-state band auditions.

Purpose 2: Formative assessment (monitoring growth as you teach; assessment for learning)
When Do You Do It? During instruction.
How Do You Do It? Teacher observations. Quizzes. Skill checklists. Homework. Student self-assessments. Systematic teacher questioning.
Examples: Math: Checklist of new algebra skills demonstrated during class or homework. Language arts: Quick write summarizing key issues in differentiating similes from metaphors. Music: Self- and teacher-ratings of individual performance of a selection for an upcoming concert.

Purpose 3: Summative assessment (determining what students have learned after instruction or for accountability purposes; assessment of learning)
When Do You Do It? After instruction.
How Do You Do It? End-of-unit test for assigning grades. Statewide tests at the end of the year.
Examples: Math: Final algebra exam that uses novel problems based on skills taught. Language arts: Booklet containing a student’s poems that incorporate figures of speech learned in a poetry unit; statewide achievement test scores in math and language arts. Music: Final graded performance for the semester in band class.
teaching a math unit on measurement. Rather than checking first, she assumed her
students had all used rulers. Partway through the lesson, when they had difficulty with
one of her planned activities and started using the rulers as weapons, she discovered
that they had not worked with rulers before. She quickly had to interrupt her plans
and teach them what rulers were for and how they worked. You can see how assess-
ment for the purpose of diagnosis is crucial.
TABLE 1.4
Differences Between Summative Assessment at the Classroom and Large-Scale Levels

Differences | Classroom Level | Large-Scale Level
Frequency | Often, at least once per week | Usually once per year
Conditions of Testing | Heavily dependent on context | Standardized across contexts
Metaphor | Close-up snapshot | Panoramic view
assess students in math and reading in grades 3 through 8 each year and once in high
school. They are also required to assess students annually in science at least once in
grades 3–5, 6–9, and 10–12.
One of the key implications of the two different kinds of summative assessment
is that they serve very different functions and must be designed to match their func-
tion. The needs of large-scale assessment users require assessments covering a wide
range of content. Most of these tests are designed to cover material across one or more
years of instruction. To assess this broadly requires that you can ask only a few items
on each concept. This approach is akin to the panoramic view—the details of the
scenery are necessarily fuzzy. And because large-scale tests cover so much ground, they
primarily give a general picture rather than any specific information classroom teach-
ers could use for designing their instruction. We will discuss large-scale assessments
and their strengths and weaknesses at greater length in Chapter 11.
In contrast, for the end-of-unit summative test, classroom teachers want a more
close-up snapshot of a narrower landscape. They want to know what students have
learned from the unit. They need information for a grade. And based on possibly poor
scores on some of the essay questions chosen for this test, they might also decide to
change their instruction the next time they teach the unit. These teacher needs are
quite different from those of the users of large-scale tests. Because the requirements of
the users are so very different, the SAT would not work as an end-of-unit test, and an
end-of-unit test would not be helpful for making college admissions decisions.
Another important difference between the two kinds of summative assessments
is the way they are administered. Teachers can vary the conditions of summative
assessments in their classrooms to make sure all students can show in different ways
what they know and can do. For example, some children may need extra time, other
children may need to have all items read to them, and others may need to talk their
way through a problem. In several later chapters, we discuss more about the accom-
modations that teachers can use for all their assessments.
Standardized: Administered, scored, and interpreted exactly the same for all test takers.

Because large-scale tests often are used to compare students across schools, districts,
and even countries, the way they are administered must be standardized. That is, they
must be administered, scored, and interpreted the same way for everyone taking the
exam. The proctor who is administering the tests must read the directions carefully;
the time given for students to complete the assessment must be exact. If you are taking
the SAT in New Mexico, it wouldn’t be fair if a college admissions officer compared you
to a student in Iowa who got 10 more minutes to complete the exam than you did.
Review Table 1.4 to compare these two types of summative assessment.
Inquiry Stance
Inquiry Stance: An approach to dealing with challenges in the classroom that involves identifying problems, collecting relevant data, making judgments, and then modifying practices to improve teaching and learning.

Long after that meeting, we went to a conference where B. J. Badiali (2002) discussed
an “inquiry stance” as something important for teachers’ work in schools. Badiali
pointed out that colleges of education cannot prepare teacher candidates for every
possible classroom situation, so they must give them the tools for problem solving in
the classroom. In contrast to a “caring stance” or a “best practices stance,” an inquiry
stance prepares teacher candidates to identify problems, collect relevant data, make
judgments, and modify practices to bring about improvement in teaching and learning.
In 1910, John Dewey, one of the founders of modern educational practices, described
a similar process for dealing with problematic situations. He suggested designing a
solution, observing and experimenting to test the solution, and then accepting or
rejecting it (Dewey, 1910).
One key use of the inquiry stance and problem-solving process is analyzing and
then improving student learning outcomes in each classroom. As discussed in the early
paragraphs of this chapter, we believe all you do should be aimed toward helping your
students learn. You can see that assessment in all of its manifestations is the founda-
tional tool for an inquiry stance. This is because assessment gives you information
every step of the way on what your students know and are able to do and whether
your instructional strategies are working.
[FIGURE 1.4 One Student’s Work Samples: (A) without hand-strengthening exercises and (B) with hand-strengthening exercises on the same day. Reprinted with permission.]
skills. She set up two new daily writing activities. Students did the hand-strengthening
exercises before one of them and did nothing before the second. She used students’
tracing quality as her new measure of student learning (see Figure 1.4). As you can
see, when students used the exercises, their writing was much more controlled than
when they did not. She also saw general improvement over several weeks, so she
decided to continue to incorporate the exercises.
In Table 1.5, you can see a sample of action research projects recently completed
by teachers in their classrooms. Problems range from enhancing literacy skills to
reducing transition times so more instruction could take place. These teachers chose
projects based on their own interests and passions connected to specific problems they
saw in their own classes. For example, Janelle Smith recently read that 85 percent of
students who appear in court are illiterate. The article also mentioned that officials
should plan jail cells 15 years from now equal to the number of current third graders
who cannot read. She decided to involve parents as partners in her classroom reading
instruction. She sent a questionnaire home and found that many parents wanted to
help but lacked strategies and were afraid they would do the wrong thing. Each week
she designed and sent home to these parents several strategies to help their children
in the coming week. She subsequently saw much improvement in her kindergarten
students’ reading levels and knowledge of the alphabet.
TABLE 1.5
Recent Action Research Projects Conducted by Teachers in Their Classrooms

Teacher: Elizabeth Mills
Problem: Fine-motor deficits in children with autism.
Intervention: Hand-strengthening exercises before some fine-motor tasks and not before others.
Data Collected: Work samples of activities with and without hand-strengthening exercises.
Outcome: Exercises increased legibility and accuracy.

Teacher: Dana Stachowiak
Problem: Students’ literacy skills below grade level and not showing growth.
Intervention: Read alouds 5–7 times per day.
Data Collected: STARS levels, end-of-grade scores, students’ attitudes on read alouds before and after.
Outcome: Most students enjoyed read alouds. Most showed more growth than in prior 18 weeks.

Teacher: Janelle Smith
Problem: Parents want to help with reading but lacked strategies to assist.
Intervention: Family homework packet with activities; leveled books sent home each week.
Data Collected: Alphabet knowledge, word recognition, parent questionnaire, weekly reading logs, completion of activities.
Outcome: Increased parent participation, increased alphabet and word recognition.

Teacher: Jocelyn Beaty-Gordon
Problem: Students unable to read biology text.
Intervention: Guided reading and students designing concept posters to illustrate comprehension.
Data Collected: Compared this year’s test scores to last year’s.
Outcome: Significant increase in test scores, improvement of student confidence, student and teacher attitudes.
A good example of student empowerment came out of the action research project
of Dana Stachowiak (see Table 1.5). Dana had recently attended a staff development
session about the value of reading aloud to students seven times per day. She did not
believe such a simple act could impact her students’ literacy skills, so she set out to do
an action research project to prove the presenter was wrong. She started reading brief
items from the newspaper, poems, and even “The Way I See It” quotes from Starbucks
paper coffee cups, as long as the excerpts were related to a lesson she was about to teach.
One student noticed that the Starbucks quotes were all from famous people. Her class
had a discussion about the quotes and the fact that they had no quotes from children.
Because Dana was doing a unit on activism and persuasive writing at the time, she and
the class decided to write letters to Starbucks supported with good arguments suggest-
ing that Starbucks use quotes from children. She wanted to instill in her students the
knowledge that they could act to change things they didn’t agree with. She was inter-
ested in promoting civic engagement as one skill important to citizens in a democracy.
An example of the letters students sent and quotations they offered appear in Box 1.2.
Incidentally, Dana also found that the frequent readings had a positive impact
on her students’ literacy skills. These findings bring up another advantage of data col-
lection. Data gathered fairly and objectively can help you make decisions in your class-
room that might allow you to get beyond your own perspective and biases and use
some effective practices you never dreamed could work.
From the projects we have discussed, you can also see that action research uses
data addressing all three of the purposes of assessment described in the previous section
of this chapter. For example, you can use diagnostic assessment to get a clear sense of
what the problem might be. Janelle Smith needed a concrete picture of what literacy
skills her children needed to work on so that she could send assignments home with
effective suggestions for parents. Similarly, Jocelyn Beaty-Gordon needed to understand
her students’ literacy levels to design appropriate activities for her biology classroom.
Formative assessment is useful during an intervention to see whether it has the
hoped for impact during implementation. If it does not, the formative assessment gives
you information to tweak the intervention. For example, Janelle found that when scis-
sors or crayons were needed for one of the activities she sent home, the book bags
came back with those activities unfinished. When asked, the children told her they did
not have those supplies at home. When she started sending home supplies as well as
activities, the book bags came back with completed projects.
Finally, summative assessment is needed to determine the overall value of the
project. Janelle used a knowledge of the alphabet test and the final literacy assessments
at the end of the year to see how much progress her students made. Similarly, Jocelyn
found that her techniques were working with her biology students when she compared
their scores on biology tests to those of students in previous years.
So action research and an inquiry stance are part of a mindset that many teach-
ers employ to empower themselves to problem solve about their practice and to help
their children learn more. At the heart of all action research projects is the enhance-
ment of student learning, and assessment for all three purposes—diagnostic, formative,
and summative—provides the data to make the decisions.
and enhance student learning and avoid potential harmful effects. They promote well-
being, and they are based on basic beliefs about right and wrong.
Do No Harm
As a starting point for generating principles widely applicable to the various types of
assessment that occur in classrooms, we have found that two general guidelines seem
to address many of the ethical questions raised by teachers (Green et al., 2007). The
first one, dating back to the ancient Greeks, is to do no harm. Avoiding suffering is a
basic, broad ethical principle that people use to govern their lives. It summarizes eth-
ical principles handed down through the centuries, such as the Golden Rule (or “Assess
as Ye Would Be Assessed” [Payne, 2003]). In the realm of the classroom, this principle
manifests itself as the basic duty that teachers have to their students. Teachers must
fulfill the needs of each student as a learner, preserving the dignity of both the student
and the teacher. Teachers must remember that they serve in loco parentis (in the place
of parents), another factor that lends support to the principle that assessment should
protect students and do no harm to them.
For educators to fulfill this obligation, they must be well-versed in the potential
impact of the practices they use, because their assessments and evaluations may have
a variety of unintended consequences for their students. Protecting students from
harm is a general principle that no one contests in the abstract. However, thinking
about causing harm focuses the discussion on implications for everyday practice. So
when we talk about harm, we must emphasize that teacher judgment must be involved.
The first judgment comes in defining what harm is. For example, a teacher who, on a
test, uses surprise items that did not appear on the study guide may do harm by break-
ing the implicit bond of trust between teacher and student. A teacher who passes out
tests from highest grade to lowest may do harm by breaching confidentiality. Such
actions erode the dignity of students.
reflect only the extent to which students have mastered the goals of instruction. When
teachers modify grades or scores because of student effort, late work, or behavior
problems, for example, the scores do not accurately communicate the level of mastery
and can eventually harm students. Similarly, many teachers do not use a blind grading
system and may unconsciously prefer certain students. Such educators may uninten-
tionally engage in score pollution by giving less-favored students lower grades than
they deserve. Everyone has stories about a teacher with biases—the one who won’t ever
give males an A or the one who is always lenient with the kids who play sports. Class-
room grades sometimes seem to be “polluted” by these other factors. Such actions can
result in harms such as mistrust between student and teacher, discouragement, and
lack of student effort.
We had a memorable experience during graduate training that illustrates the
harm that can be done with score pollution. A group of us were asked to listen to
individual students read aloud who were in a special education class for students with
learning disabilities. We were checking to see if any of the students might be getting
closer in literacy skills to those in general education in the same grade. We were sur-
prised when one boy, poorly dressed, unkempt, and a tad defiant, read aloud fluently
and expressively at a level well above typical students in his grade. Our only conclusion
could be that he had been labeled as a student with a learning disability and placed in
that class based on very “polluted” scores of his achievement level. After the testing
information came to light, we were relieved to learn that this student was chosen to
be reintegrated into a general education classroom. Students, their families, and other
stakeholders in the education system need unbiased and factual information regarding
academic achievement to make good decisions about each step in their education.
Judgment Calls
We believe that do no harm and avoid score pollution are two basic guidelines you
can apply when facing real-life decisions about assessment in your classroom. In our
own research, we have asked teachers and teacher candidates for their judgments
about whether a variety of classroom assessment practices are ethical (Green et al.,
2007). We found strong agreement in several areas of assessment (see Table 1.6). For
example, protecting confidentiality of students’ assessment results was judged impor-
tant. Most agreed with our student that a teacher who passed out scored tests to
students in order of points earned was unethical. Communication about grading was
another area with high levels of agreement. Explaining how a task will be graded and
letting students know what material will be on a test are examples of items judged
ethical. Finally, this group agreed on some grading practices, for example, indicating
that lowering report card grades for disruptive behavior was unethical. Although not
all situations you face will be easily resolved, there is general agreement about the
most common ones.
TABLE 1.6
Judgments About the Ethicality of Classroom Assessment Practices with Strong Agreement by Teachers and Teacher Candidates

Scenario | Ethical | Unethical
A teacher states how she will grade a task when she assigns it. | 98% | 2%
A teacher spends a class period to train his students in test-taking skills (e.g., not spending too much time on one problem, eliminating impossible answers, guessing). | 90% | 10%
A teacher lowers report card grades for disruptive behavior. | 15% | 85%

Adapted from Green, S., Johnson, R., Kim, D., & Pope, N. (2007). Ethics in classroom assessment practices: Issues and attitudes. Teaching and Teacher Education, 23, 999–1011. Reprinted with permission from Elsevier.
Your Turn
When we questioned teachers and teacher candidates about their views about whether
a variety of classroom assessment practices were ethical, we found several areas of
disagreement. Table 1.7 shows a sample of some of the scenarios we used. Cover the
“Ethical” and “Unethical” columns and take a few minutes to write down your own
rating of each practice as ethical or unethical. You can then compare your answer
choices with those of our group of teachers and teacher candidates.
One area that our group disagreed about was grading practices—the area that
created the firestorm in a Midwestern community. You can see we found mixed results
in the grading items shown in Table 1.7. We believe one reason we saw disagreement
about these practices is the complexity and nuance facing teachers involved in making
ethical judgments. For example, teachers often include dimensions not directly related
to the mastery of the learning goals in their grading schemes, such as neatness or class
participation. If the percentage devoted to such factors has minimal impact on the
overall grade, such practices do not pose an ethical dilemma. However, if such factors
are weighted heavily enough to change a grade, they may result in score pollution. As
you can see, score pollution is one issue where theory meets reality in the classroom.
TABLE 1.7
Judgments About the Ethicality of Classroom Assessment Practices with Disagreement by Teachers and Teacher Candidates

A teacher always knows the identity of the student whose essay test she is grading. (Ethical: 49%; Unethical: 51%)
To enhance self-esteem, an elementary teacher addresses only students' strengths when writing narrative report cards. (Ethical: 41%; Unethical: 59%)
As a teacher finalizes grades, she changes one student's course grade from a B+ to an A because tests and papers showed that the student had mastered the course objectives even though he had not completed some of his homework assignments. (Ethical: 37%; Unethical: 63%)

Adapted from Green, S., Johnson, R., Kim, D., & Pope, N. (2007). Ethics in classroom assessment practices: Issues and attitudes. Teaching and Teacher Education 23: 999–1011. Reprinted with permission from Elsevier.
TABLE 1.8
Recommendations for Making Ethical Assessment Decisions
1. Familiarize yourself with and follow your district’s and your school’s practices regarding
classroom assessment and standardized testing.
2. Discuss potential ethical issues or conflicts with teachers and administrators with whom
you work.
3. Use do no harm and avoid score pollution as guidelines for your actions.
4. Think things through from all perspectives before you act.
We also believe that do no harm and avoid score pollution are two useful basic principles
you can apply when faced with making ethical decisions about assessment in your
classroom. Unfortunately, not all situations you face will be easily resolved. You will
develop your professional judgment about ethics and assessment through experiences
in your classroom, discussions with peers and administrators, and analysis of more
news stories that are sure to arise. Our advice is to make sure you first familiarize
yourself with and follow your district’s and school’s practices. Also, as you become
aware of potential ethical issues or conflicts, explicitly discuss them with other teachers.
Finally, make your own practices and decisions around assessment with the benefit of
the students in your care in mind, and make sure you clearly think things through
before you act. Seeing things from students’ or administrators’ perspectives, as well as
your own, can help you clarify your decisions. These recommendations are summarized
in Table 1.8.
Next, we acquainted you with the inquiry stance, which orients teachers to iden-
tify and solve problems in their classrooms. This problem-solving process is often
called action research. We offered several examples of the kinds of questions teachers
ask and the kind of assessment data they can collect to answer them. You saw how
these assessments served the diagnostic, formative, and summative functions. These
examples should help you see a number of the ways that assessment was essential in
these teachers’ classrooms and will become essential in yours.
Finally, we discussed the importance of ethics and assessment, providing two key
guidelines for making ethical decisions regarding assessment. The first was do no harm,
and the second was avoid score pollution. We also emphasized the importance of
teacher judgment when making ethical decisions grounded in the intention to help
students learn.
HELPFUL WEBSITES
http://www.accessexcellence.org/LC/TL/AR/
This site on action research provides background, resources, examples, and a step-by-step
approach for getting started.
http://www.edweek.org/rc/issues/achievement-gap/
This site on the achievement gap provides background information as well as links to databases,
web resources, and relevant articles from Education Week.
11. Describe a time you or someone you know was graded unfairly. Was a violation
of either of the two ethical guidelines, do no harm or avoid score pollution,
involved? Explain.
12. Examine the practices described in Table 1.7. Which do you believe are ethical?
Which do you believe are unethical? Explain your reasoning.
13. As a teacher, what do you think will be your most difficult ethical dilemma
related to assessment practices? Why do you think so?
REFERENCES
Ames, C. 1992. Classrooms: Goals, structures, and student motivation. Journal of Educational
Psychology 84 (3): 261–271.
Andrade, H. 2008. Self-assessment through rubrics. Educational Leadership 65: 60–63.
Badiali, B. J., and D. J. Hammond. 2002 (March). The power of and necessity for using inquiry in
a PDS. Paper presented at the Professional Development Schools National Conference,
Orlando, FL.
Banks, J., M. Cochran-Smith, L. Moll, A. Richert, K. Zeichener, P. LePage, L. Darling-Hammond,
H. Duffy, and M. McDonald. 2005. Teaching diverse learners. In L. Darling-Hammond and
J. Bransford (Eds.), Preparing teachers for a changing world. San Francisco: Jossey-Bass.
Barrell, J., and C. Weitman. 2007. Action research fosters empowerment and learning
communities. Delta Kappa Gamma Bulletin 73 (3): 36–45. Retrieved April 4, 2008, from
Academic Search Premier database.
Belfield, C. R., and H. Levin. 2007. The price we pay: Economic and social consequences of
inadequate education. Washington, DC: Brookings Institution Press.
Black, P., and D. Wiliam. 1998. Assessment and classroom learning. Assessment in Education:
Principles, Policy, and Practice 5 (1): 7–74.
Butler, R. 2006. Are mastery and ability goals both adaptive? Evaluation, initial goal construction
and the quality of task engagement. British Journal of Educational Psychology 76: 595–611.
Cochran-Smith, M., and S. L. Lytle. 1999. Relationships of knowledge and practice: Teacher
learning in communities. Review of Research in Education 24: 249–305.
Dewey, J. 1910. How we think. Lexington, MA: D. C. Heath.
Earl, L. 2003. Assessment as learning. Thousand Oaks, CA: Corwin Press.
Gordon, E. W. 2008. The transformation of key beliefs that have guided a century of assessment.
In C. A. Dwyer (ed.), The future of assessment: Shaping teaching and learning. New York:
Erlbaum, pp. 3–6.
Green, S., and M. Brown. 2006. Promoting action research and problem solving among teacher
candidates: One elementary school’s journey. Action in Teacher Education 27 (4): 45–54.
Green, S., R. Johnson, D. Kim, and N. Pope. 2007. Ethics in classroom assessment practices:
Issues and attitudes. Teaching and Teacher Education 23: 999–1011.
Haladyna, T. M., S. B. Nolen, and N. S. Haas. 1991. Raising standardized achievement test scores
and the origins of test score pollution. Educational Researcher 20: 2–7.
Lee, J. 2008 (March). War on achievement gaps: Redrawing racial and social maps of school learning. Raymond B. Cattell Early Career Award Lecture presented at the meeting of the American Educational Research Association, New York.
Moon, T. 2005. The role of assessment in differentiation. Theory into Practice 44 (3): 226–233.
O'Keefe, P. A., A. Ben-Eliyahu, and L. Linnenbrink-Garcia. 2008 (March). Effects of a mastery learning environment on achievement goals, interest, and the self: A multiphase study. Paper presented at the annual meeting of the American Educational Research Association, New York.
Pellegrino, J. W., N. Chudowsky, and R. Glaser. 2001. Knowing what students know: The science
and design of educational assessment. Washington, DC: National Academy Press.
Poplin, M., and J. Rivera. 2005. Merging social justice and accountability: Educating qualified
and effective teachers. Theory into Practice 44 (1): 27–37.
Salmon, A. K. 2008. Promoting a culture of thinking in the young child. Early Childhood
Education Journal 35: 457–461.
Shepard, L. A. 2008. Formative assessment: Caveat emptor. In C. A. Dwyer (ed.), The future of assessment: Shaping teaching and learning. New York: Lawrence Erlbaum Associates, pp. 279–303.
Shute, V. J. 2008. Focus on formative feedback. Review of Educational Research 78: 153–189.
Stiggins, R. 2007. Assessment through the student’s eyes. Educational Leadership 64 (8): 22–26.
Stiggins, R., and J. Chappuis. 2005. Using student-involved classroom assessment to close
achievement gaps. Theory into Practice 44 (1): 11–18.
Tomlinson, C. 2008. Learning to love assessment. Educational Leadership 65 (4): 8–13.
Urdan, T., and E. Schoenfelder. 2006. Classroom effects on student motivation: Goal structures, social relationships, and competence beliefs. Journal of School Psychology 44: 331–349.
Witkow, M. R., and A. J. Fuligni. 2007. Achievement goals and daily school experiences among
adolescents with Asian, Latino, and European American backgrounds. Journal of
Educational Psychology 99: 584–596.
CHAPTER 2
Learning Goals:
The First Step
Begin with the end in mind.
–Stephen Covey
INTRODUCTION
As a teacher you have the awesome responsibility to chart a course of learning that
will prepare the students in your class to be productive citizens in our society. As
you prepare for the school year, you will establish broad goals your students should
achieve. For example, suppose you want your students to be able to comprehend,
interpret, analyze, and evaluate what they read. You set these goals mentally and
perhaps commit them to paper. In planning a unit or a lesson, you realize that for
the goals to be helpful they must be stated more specifically. For example, to com-
prehend means what? Your reading goal now becomes more tangible, possibly includ-
ing the ability of students to summarize and paraphrase texts, draw conclusions,
make predictions, analyze cause and effect, or interpret figurative expressions.
Clear statements of the aims you have in mind will assist you in systematically
addressing the key understandings and skills your students should know and be able
to do. Without such learning goals, instruction may be a hit-or-miss approach that
will address some key points, but neglect others.
DEFINING AND USING LEARNING GOALS

Learning Goals: Learning objectives, targets, and outcomes.

Learning goals are referred to with many terms, such as learning outcomes, objectives, aims, and targets. Some educators make a distinction between broad learning goals for a unit and narrower learning objectives for specific lessons. What these
terms all have in common is that they specify what students should learn as the result
of classroom experiences such as instructional lessons and units. We refer to these
statements of instructional intent as learning goals.
Learning goals consist of a verb and a noun phrase. The verb specifies the type
of response students will exhibit to show they have achieved a goal. For example, a
learning goal in language arts might require that a student interpret the meaning of figurative expressions using the context of a narrative.
Frequently, the verb reflects the cognitive process (e.g., recalling, classifying, cri-
tiquing) that is the intended learning outcome. The noun phrase refers to the subject-
area content that students should learn. In our example, then, as teachers plan
instruction and assessment, they focus on students recognizing figurative expressions
and using the context of the narrative to interpret the meaning of such expressions.
Examples of learning goals from mathematics, social studies, and science are shown
in Table 2.1. Notice that the goals use the verb/noun structure described.
You may be wondering why we include a discussion of learning goals in a book
on assessment. Learning goals allow you to start a unit and each lesson with a clear
understanding of what you ultimately want to achieve in your instruction. As you plan
instruction, you need a fairly concrete mental image of the learning you want students
to accomplish. As Figure 2.1 shows, the learning goals provide a guide for teachers as
they plan learning experiences and prepare the student assessment. The learning goals
clarify what students will do to demonstrate their learning. To continue our example,
a learning goal that states that students will explain the figurative use of words in context requires a teacher to develop a lesson or unit that involves the class in the interpretation of figurative expressions the students find in their readings. Then the assessment would similarly require students to analyze figurative language within the context of new readings, such as stories or a poem.

TABLE 2.1
Examples of Learning Goals in Various Subject Areas
Structure of Learning Goals: Subject Area, Grade, Verb, Noun Phrase
Alignment: The congruence of one element or object in relation to others.

Using learning goals as a starting point is the key to alignment in the instructional process. Alignment occurs when elements that are interrelated (i.e., learning goals, instruction, and assessment) are positioned so that the elements perform properly. Aligning your instruction and your assessment with your learning goals ensures
that instruction addresses the content and strategies that you want students to learn.
It also ensures that students are properly assessed on that specific content.
A planning chart offers a teacher further support for alignment of goals, instruc-
tion, and assessment (Gronlund & Brookhart, 2009). As illustrated in Table 2.2, a
planning chart sketches out the learning experiences a teacher might use to develop a
lesson or unit on a topic, such as figurative language. The focus of this lesson is the
learning goal related to interpretation of metaphors. Notice that the activities in the
Teaching Methods column both follow from the learning goal and prepare students
for assessment tasks showing they have accomplished the learning goal.
Misalignment: The lack of congruence of learning goals, instruction, and/or assessment.

Misalignment occurs when instruction or assessment tasks are not congruent with learning goals. In our example, misalignment would occur if we haven't prepared students with learning experiences to meet the goal. It would also occur if we required students to write poems in which they use figurative expressions as the assessment.
Instructionally we have only prepared students to interpret figurative expressions.
Similarly, we have misalignment if an assessment requires students only to list and
define the types of figurative expressions. In this instance, we could not determine
whether students can use context to understand the meaning of figurative expressions
encountered in their readings.
TABLE 2.2
Planning Chart to Align Teaching and Assessment Tasks with the Learning Goals

Learning Goal: Interpret devices of figurative language such as similes, metaphors, and hyperbole.
Teaching Methods/Learning Experiences: Read Carl Sandburg's "Fog" and other poems that use metaphor. Discuss the meaning of the metaphors. Break into small groups and read examples of metaphors and discuss meaning. Report back to the full class the examples the group found and the meaning of the examples.
Assessment Tasks: Determine whether students can highlight and interpret new instances of metaphors in poems.
We have noticed misalignment problems in the lessons and units our teacher
candidates design. They develop interesting ideas for lessons and units (e.g., a zoo
unit or a lesson incorporating cooking), but these lessons or units do not necessarily
relate clearly to the learning goals in the subject area they are teaching. If a teacher
spends the bulk of instructional time making cookies for a lesson with a learning goal
related to understanding fractions, the students will learn more about baking cookies
than about fractions. The teacher’s instructional activities were not aligned with the
learning goals. Students learn what they spend their time attending to or doing. For
alignment of learning goals, instruction, and assessment, you must make sure your
instruction and your assessment guide students to attend to the content of the learn-
ing goals.
A lesson about learning goals comes to us from an article about integrating video
games into instruction (Sandford et al., 2007). As might be expected, the authors
learned that video games were not a panacea for generating student interest and boost-
ing achievement. They discovered that the teachers who had the most success integrat-
ing the games into instruction were those who had clear learning goals in mind and
thought of specific ways they could use these games to reach those goals. Teachers
without clear goals were not able to capitalize on the games or plan ways to use them
to further specific kinds of learning needed. These findings illustrate that learning goals
are the touchstone for your instruction and assessment.
BACKWARD DESIGN
Backward Design: A process of planning in which a teacher identifies the learning goals and the forms of evidence (assessment) that would indicate a student has attained the learning goals. Instruction is then planned to address the knowledge and skills addressed in the goals and assessment.

Wiggins and McTighe (1998) also zero in on the importance of alignment by offering a slightly different approach for planning unit lessons. In their backward design process, the learning goals are identified first (see Figure 2.2). These authors, however, remind us that those learning goals are the desired end results. The learning goal expresses what student outcome is desired. Thus, as Stephen Covey (1989) states, you "begin with the end in mind."

In the backward design process, the middle stage differs from our typical thoughts about planning a lesson or unit. Instead of designing instruction in this stage, the teacher identifies the forms of evidence (performances or other assessment results) that
would indicate a student has attained the desired knowledge, skills, or strategies. At
this stage the teacher asks, “How will we know if students have achieved the desired
results and met the standards?” (Wiggins & McTighe, 1998). Here the authors are
advising us that in the planning stages we should ask ourselves how we will determine
whether students have accomplished the learning goals. Thus, if we consider interpret-
ing figurative language a critical understanding for students, how will we know stu-
dents have mastered this goal? Will students read poems and speeches that employ
figurative language and write brief interpretations of the meaning of the materials?
After specifying these learning outcomes, the final stage involves planning instruc-
tion to prepare students to meet the learning goal, such as interpreting figurative language.
FIGURE 2.2 The Backward Design Process: Identify desired results. Determine acceptable evidence. Plan learning experiences and instruction.
TABLE 2.3
Questions That Assist in Planning Instruction to Address Learning Goals

Key question: What enabling knowledge (facts, concepts, and principles) and skills (procedures) will students need to achieve desired results?
Answer: Understanding and skills required for learning goal mastery: Describe how racism and intolerance contributed to the Holocaust. Locate information using both primary and secondary sources. Determine the credibility and bias of primary and secondary sources. (Arizona Department of Education, 2006)

Key question: What activities will equip students with the needed knowledge and skills?
Answer: Small group work inspecting primary sources to examine the prejudice associated with the Holocaust.

Key question: What will need to be taught and coached, and how should it best be taught, in light of performance goals?
Answer: Concepts of racism and intolerance. Use of primary sources to infer social conditions. Types of bias in primary sources.

Key question: What materials and resources are best suited to accomplish these goals?
Answer: Photocopies of primary sources. Website of the United States Holocaust Memorial Museum.

Key question: Is the overall design coherent and effective?
Answer: A varied range of instructional and assessment activities provides opportunity to master the learning goals.
Wiggins and McTighe suggest that planning for instruction should focus on the questions
in Table 2.3.
The table provides an example based on a learning outcome related to the Holo-
caust in which the end goal is Students will use primary sources to reconstruct the
experiences of persecuted groups. In the right-hand column are the answers to the key
questions as they apply to the learning goal related to the Holocaust. In answering each
of the key questions, you can plan instruction to help students achieve the desired
outcomes.
allocated? The learning goals are the best source for criteria for scoring students’ per-
formances. As shown in Table 2.4, a set of communications learning goals can lead
directly to criteria that would be included in a checklist (see Table 2.5) to evaluate
students’ communication skills.
TABLE 2.4
Content Standards for Communication Skills

Communication Goal (C): The student will recognize, demonstrate, and analyze the qualities of effective communication.
7-C1: The student will use speaking skills to participate in large and small groups in both formal and informal situations.
7-C1.1: Demonstrate the ability to face an audience, make eye contact, use the appropriate voice level, use appropriate gestures, facial expressions, and posture when making oral presentations.

From South Carolina Department of Education. 2003a. South Carolina English language arts curriculum standards 2002. http://ed.sc.gov/agency/offices/cso/standards/ela/.
TABLE 2.5
Example of a Checklist Based on Learning Goals for Communication Standards

Student: Hassan Brown    Date: June 10, 2011
✓  Criteria  Comments
In addition, if a teacher notes which students mastered the skills and which have
not, that information can point to strengths and weaknesses of instruction and guide
plans for further instruction. Figure 2.3 shows the results of a teacher using tally marks
to record the number of students who demonstrated achievement of each learning
goal. The right-hand column shows the percentage of students who mastered each
topic. In the case of the goal faces the audience, 100% of the students achieved this
skill. In contrast, only 10 of the 20 students appropriately use gestures. Thus, when
students are preparing for a subsequent speech, the teacher may want to offer more
examples and provide students time to practice using gestures. We expand on the use
of class summary data in Chapter 5.
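The mastery percentage in the right-hand column is simply the tally for a goal divided by the number of students assessed. For the gestures goal, for example, 10 of the 20 students demonstrated the skill, so the corresponding entry works out to 10 ÷ 20 = 50 percent; for faces the audience, 20 of 20 students, or 100 percent, demonstrated it.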
TABLE 2.6
State Content Standards for Various Grade Levels and Subject Areas

Grade Level  Subject Area  Standard  State
K English Language Arts: Identify the front cover, back cover, and title page California (2006)
Reading of a book.
1 Mathematics Use patterns to skip count by twos, fives, and tens. Texas (2006)
2 Social Studies: Economics Give examples of people in the school and community Massachusetts (2003)
who are both producers and consumers.
5 Physical Education Analyze fitness data to describe and improve Virginia (2001)
personal fitness levels (e.g., apply data to own
plan for improvement in at least two components
of health-related fitness).
6 Technology Use a computer system to connect to and access New York (1996)
needed information from various Internet sites.
7 Visual Arts Apply the principles of design to interpret various New Jersey (2004)
masterworks of art.
8 Foreign Language Comprehend and interpret the main ideas and details Florida (2005)
from television, movies, videos, radio, or live
presentations produced in the target language.
9 English Language Arts: Establish a clear, distinctive, and coherent thesis or Georgia (2006)
Writing perspective and maintain a consistent tone and
focus throughout.
10 English Language Arts: Evaluate relationships between and among character, Illinois (1997)
Literature plot, setting, theme, conflict and resolution and their
influence on the effectiveness of a literary piece.1
11 Social Studies: History Compare and evaluate competing historical Washington (n.d.)2
narratives, analyze multiple perspectives, and
challenge arguments of historical inevitability.1
12 Music Explain how music reflects the social events of Ohio (2003)
history.
1 High school standards
2 Office of the Superintendent of Public Instruction (n.d.)
development of state standards; subsequently, the national standards inform the work
on state-level content standards.
Content standards do not always translate directly into learning goals. Instead,
they sometimes describe competencies at a general level that requires further definition
of the instructional goal. In Table 2.4, for example, the communications goal is speci-
fied at a very general level: The student will recognize, demonstrate, and analyze the
qualities of effective communication. Often standards are accompanied by indicators or
benchmarks that clarify the outcome stated in the standard. One learning outcome
associated with the communication goal is Demonstrate the ability to face an audience,
make eye contact, use the appropriate voice level, use appropriate gestures, facial expres-
sions, and posture when making oral presentations. Such a statement does have clear
implications for instruction and assessment.
Teacher Editions
Learning goals are also found in the teacher editions of textbooks used in the class-
room. These learning goals may serve as examples of the strategies and content knowl-
edge expressed in the state standards and district curriculum guides.
As a teacher, you should not rely solely on the learning goals in teachers’ manu-
als. They may not address the key content identified in your state content standards
and district curriculum materials. The state and district content standards describe the
skills and concepts that your students will experience in the state test. As a teacher,
your responsibility to the student is to develop the knowledge, skills, and strategies
outlined in state and district content standards. Only then will your students have
equal access to educational experiences that prepare them to meet the challenges they
will face on the state test and in life beyond the classroom.
Some teachers take this responsibility to mean they cannot go beyond the con-
tent standards in the state and district documents and expose their students to other
important ideas, but this is not true. The concepts, skills, and strategies in the content
standards should be central to the learning experiences you plan. But teachers often
go beyond these to introduce students to other important ideas. For example, teaching
students to integrate figurative expressions into their writing would be an appropriate
extension of the goal of interpreting figurative language.
goals. For example, review the following brief list of some possible learning goals for
figurative expressions:
1. Demonstrates knowledge of the meaning of terms used to describe types of
figurative language.
2. Identifies figurative expressions in a text.
3. Comprehends the meaning of figurative expressions using context.
4. Incorporates figurative language into writing.
An English language arts teacher would quickly be overwhelmed given a similar
list of learning goals for every new concept—central idea, cause and effect, fact and
opinion, drawing conclusions, making predictions. So, in thinking about the learning
goals that are appropriate for your class, you need to select those of most importance
to your students.
Given the need to sample learning goals, how does one select which should be
addressed? Wiggins and McTighe (1998) offer us a series of filters to guide the selec-
tion of learning goals (see Table 2.7).
The first filter notes that in selecting learning goals, we should focus on the big
ideas. For example, in reading, a critical strategy is for students to identify the main
idea of a text and the details that support the central idea. Such a skill is important in
reading across narrative and expository texts and across subject areas. As shown in Table 2.7, in mathematics and social studies a critical skill is for students to be able to interpret graphic representations of data and to summarize data graphically.

TABLE 2.7
Filters to Aid in the Identification of Key Learning Goals

Filter: To what extent does the idea, topic, or process represent a "big idea" having enduring value beyond the classroom?
Description: Enduring understanding focuses on the larger concepts, principles, and processes within a field.
Example: Importance of understanding graphic representation of data.

Filter: To what extent does the idea, topic, or process reside at the heart of the discipline?
Description: Enduring understanding involves students in applying the knowledge, skills, and procedures used by professionals in their fields.
Example: Summarizing data reveals patterns and trends across a variety of content areas.

Filter: To what extent does the idea, topic, or process require uncoverage?
Description: Enduring understanding requires assisting students in uncovering abstract ideas, counterintuitive ideas, and misconceptions.
Example: Students do not realize that graphs can be manipulated to deceive the reader.

Filter: To what extent does the idea, topic, or process offer potential for engaging students?
Description: Enduring understanding should provoke and connect to students' interests.
Example: Use bar graphs to document the number of biographies in the school media center that portray each ethnic group.

Adapted from Wiggins, G., and J. McTighe. 1998. Understanding by design. Alexandria, VA: Association for Supervision and Curriculum Development. Used with permission.
The second filter addresses the degree to which the new idea or strategy is cen-
tral to the discipline. For students of history, reviewing and interpreting primary
sources is a process central to historical inquiry (VanSledright, 2004). Thus, this second
filter would suggest that students’ time would be well spent working in small groups
to review journal entries, political cartoons, and advertisements for a historical period.
Making meaning from these documents models for students the process that historians
engage in as they research historical periods.
The third filter focuses on the need for instruction to uncover key ideas because
of the abstract quality of the new concepts, common misperceptions of students, or
counterintuitive quality of the new principles. For example, students are familiar with
mixing red and blue paint to develop the color purple. This knowledge, however, may
contribute to student misperceptions about the formation of white light. Thus, in phys-
ics, a demonstration may be a useful method to assist students in the counterintuitive
understanding that white light is formed by the overlap of red, green, and blue light.
Similarly, in foreign languages the demands of learning irregular verbs that do not
follow standard rules of conjugation may require special attention, with well-articulated
learning goals to address this challenging area.
The fourth filter relates to the potential for the concepts to engage students.
Wiggins and McTighe (1998) note that seemingly dry ideas can be more appealing if
the learning activities engage students and relate to their interests. They note that big
ideas may connect for students if presented as questions, issues, simulations, or debates.
For example, A Wrinkle in Time by Madeleine L'Engle (1962) is a science fiction book
about a group of children who travel through a wrinkle in time to a planet where the
father of two of the children is held prisoner. On arriving at the planet, the children
find a society in which everyone does everything at the same time, in the same way.
All children bounce balls in rhythm and skip rope in rhythm. The book was quite
challenging for the fifth graders we taught; however, we personalized the book by
exploring the theme of conformity. To engage students, we promoted discussion with
questions such as, “Do we in America prize people who are different, or do we try to
make everyone conform?” Students became more interested as they related the book
to current issues about conformity that they, themselves, were encountering.
Integrating learning goals across subject areas may also promote student engage-
ment. Consider, for example, the integration of social studies and dramatization in
theater. Students may not be interested in learning the characteristics of specific his-
torical periods during social studies; however, incorporating such information into a
dramatization might promote student engagement.
and how each implies the way students will demonstrate achievement. Building on the
idea of alignment of goals, instruction, and assessment, a verb should be selected consis-
tent with instructional intent. For example, if during a lesson on figurative expressions,
instruction focuses on matching expressions with their actual meaning, the relevant verb
is matching. However, if you intend for students to learn to use the context of a story to
explain the figurative use of words, the relevant verb is explaining or translating.
To continue our example of data representation in graphs (Table 2.7), the instruc-
tional intent might be for students to interpret tables and graphs or to summarize data
in tables and graphs. The choice of verbs has implications for student learning experi-
ences. In the case of interpreting tables and graphs, students will require practice in
reviewing graphics and making statements based on the data. In the instance of sum-
marizing data, students will need instruction and practice in compiling data and rep-
resenting the information in tables and graphs.
TABLE 2.8
Three Levels of Learning Goals in Reading

Reading Goal (R): The student will draw upon a variety of strategies to comprehend, interpret, analyze, and evaluate what he or she reads.
4-R2: The student will use knowledge of the purposes, structures, and elements of writing to analyze and interpret various types of texts.
4-R2.6: Demonstrate the ability to identify devices of figurative language such as similes, metaphors, and hyperbole and sound devices such as alliteration and onomatopoeia. (SCDE, 2003a, p. 35)

South Carolina Department of Education. 2003a. South Carolina English language arts curriculum standards 2002. http://ed.sc.gov/agency/offices/cso/standards/ela/
that students will use various strategies to comprehend, interpret, analyze, and evalu-
ate text. Such a statement is useful in thinking about outcomes of student learning over
a school year. However, for the teacher, this broad statement is not helpful in planning
a unit or a lesson plan. The next level (4-R2) focuses on “knowledge of the purposes,
structures, and elements of writing,” but does not guide us about the meaning of “pur-
poses, structures, and elements of writing.” At the third level (4-R2.6), we find that one
area in which we should develop student understanding is figurative language such as
similes, metaphors, and hyperbole. At this level of specificity, the implications for
instruction are clear. So as you review the learning goals in state standards or district
curriculum guides, select learning goals that provide enough specificity for you to
understand the implications for planning instruction.
LEARNING GOALS, CRITICAL THINKING SKILLS, AND TAXONOMIES
Taxonomy: A classification framework.

Learning goals direct everything, so that is where you need to begin in the task of promoting critical thinking skills. As we have explained, these higher-level skills are important for optimal participation in a democratic society. Several authors have devised taxonomies, or classification frameworks, that detail thinking strategies important to develop in our students. These taxonomies help teachers consider from several angles what should be the focus of a lesson or unit.
Cognitive Taxonomies
Frameworks that can help you in thinking about the skills and strategies that are impor-
tant for students to learn include Bloom’s cognitive taxonomy (1956); Anderson and
Krathwohl’s revised Bloom’s taxonomy (2001); Gagné’s framework (1985); Marzano’s new
taxonomy (2001); or Quellmalz’s reading taxonomy (1985). These frameworks share in
common a focus on skills in the cognitive domain that you should incorporate into learning goals. In this section we have chosen two commonly used cognitive taxonomies—Bloom's taxonomy and the revised Bloom's taxonomy—to show you how such taxonomies can assist you.

Cognitive Domain: Processes related to thinking and learning.

FIGURE 2.4 Learning Goals Written in Student-Friendly Language with Space for Related Personal Goals
January
I will . . . Recognize how and why databases are used to collect, organize, and analyze information
I will . . . Understand how to manipulate a database using edit/find, sorting, and filter/search
Knowledge. Verbs: select, state, tell, underline. Example learning goal: Define metaphor, simile, and hyperbole.
Comprehension. Verbs: infer, interpret, outline, paraphrase, predict, rephrase, restate, retell, summarize, translate, transform. Example learning goal: Explain the meaning of the following metaphor.
Application. Verbs: operate, organize, solve, use. Example learning goal: Complete the following statement as a simile.
Analysis. Verbs: differentiate, discriminate, dissect, distinguish, group, investigate, outline, relate, separate, subdivide. Example learning goal: Indicate which of the following statements are metaphors, similes, and hyperboles.
Synthesis. Verbs: formulate, hypothesize, invent, plan, produce, propose, write. Example learning goal: Use metaphor to write a poem modeled after Sandburg's poem "Fog."
Evaluation. Verbs: justify, prioritize, rate, recommend, revise, support, validate. Example learning goal: Read the following poem and identify the various forms of figurative language and the effectiveness of the author's usage.
Based on Bloom et al., 1956; Gronlund, 2003; Metfessel, Michael, and Kirsner, 1969.
language. “Evaluation” occurs when a student determines whether the materials meet
relevant criteria, as in critiquing the use of figurative language in a poem.
Because we present each cognitive strategy in its own category, you might assume
that the different cognitive skills do not overlap. This is a common misconception. If
students develop the research design for an experiment (synthesis), they must remember
the meaning of subject-specific terms (knowledge), apply their understanding of research
designs in general to the specifics of the study (application), and evaluate drafts of the
design as they finalize the product (evaluation). Higher-level cognitive processes auto-
matically require students to use several other thinking skills in achieving the learning
goal. Thus, your learning goals need not state every cognitive process. Instead, the thinking
skills associated with the focus of the particular lesson should be the only ones stated.
Although we learned about Bloom’s taxonomy in college, it took years of experi-
ence in schools to appreciate it as a valuable tool in planning for instruction and assess-
ment. A taxonomy like Bloom’s allows you to focus on the possible kinds of learning
you want for your students. An array of choices helps you move beyond the basic facts
that so often get the attention in classrooms. For example, in history classes, when we
studied wars, we always seemed to focus on memorizing battles and dates—definitely
a knowledge-level focus. Using a thinking skills taxonomy to get past that, you can build
learning goals requiring other kinds of cognitive skills. How about comparing and con-
trasting different elements of the cultures of the two sides that might have led to war
(analysis)? How about applying the lessons learned from this war to examine the later
foreign policy of that nation (application)? Knowing the array of possible thinking skills
helps you think outside the box of knowledge-level learning goals.
When your learning goals become more complex and interesting, your instruction
and assessment must follow. As we have already mentioned, alignment among learning
goals, instruction, and assessment is the key to making sure students learn what they need
to learn. So, the other important function of a taxonomy is assisting with alignment. For
example, if you have learning goals that require students to compare elements of the
cultures on two sides of a war, you must be sure that you include classroom activities
involving learning about the cultures as well as practicing the skills of comparison and
contrast. You can’t just give lectures every day—you must involve the students in practicing
the skills they need to be able to demonstrate. In addition, you must design your assessment
to include these higher skills. As you will see in Chapter 4, you can design really good
questions that tap key elements of your learning goals using verbs associated with Bloom’s
taxonomy. Your assessment and your instruction must mirror the skills and knowledge
in the learning goals, and a taxonomy of thinking skills helps you maintain that focus.
Here’s another example. If you have a learning goal stating that students must be
able to apply their knowledge of multiplication to finding the area of different spaces,
you must go beyond instructional activities such as having them memorize their mul-
tiplication tables. Memorizing multiplication tables is at the knowledge level of Bloom’s
taxonomy. Finding area includes knowledge of multiplication tables, but it goes beyond
that to also require students to apply their multiplication knowledge to a different task.
For learning goals, instruction, and assessment to be aligned, each of them must address
the same levels of the taxonomy. In the case of finding area, the learning goal, the
instruction, and the assessment must all address the application level. Having the skills
readily available to consult in the taxonomy allows you to check alignment.
D. Meta-Cognitive Knowledge—of how to use cognition and self-awareness of one's own cognition. Example learning goal: Discuss in an authors' conference the strengths and areas for improvement of poems she/he wrote. (10%)
Adapted from A Taxonomy for Learning, Teaching, and Assessing, by Anderson, L., and D. Krathwohl (eds.). Published by Allyn and Bacon, Boston, MA. Copyright 2001. Used with permission.
the revised taxonomy covers the same thinking skills as were in the original taxonomy.
However, the thinking skills are now all expressed as verbs rather than nouns (e.g.,
“analyze” instead of “analysis”). The use of verbs emphasizes what the student is
expected to be able to do. Also, “create” (formerly “synthesis”) is now at the highest
level of the taxonomy, and “evaluate” is at the second-highest level. This change reflects
the greater cognitive complexity associated with creating.
The newly added dimension down the left side of Table 2.10 describes the differ-
ent kinds of knowledge or content to which these cognitive strategies can be applied
(Anderson & Krathwohl, 2001). The authors identify four types of knowledge: factual,
conceptual, procedural, and metacognitive. Factual knowledge is composed of the basic
elements of a content area or discipline. Conceptual knowledge addresses the interre-
lationships of the elements within a more complex structure. Thus, knowing the forms
of poetry associated with the poetry unit and the definitions of terms (e.g., sonnet,
rhyme, meter) is factual knowledge. Students use this body of factual knowledge as they
interpret the meaning of poems, a learning goal placed at the intersection of conceptual
knowledge and the cognitive skill of understanding (see Table 2.10).
Procedural knowledge is concerned with students’ ability to do something,
and metacognitive knowledge centers on students’ self-awareness of their thinking and
problem-solving strategies. So learning goals related to procedural knowledge address
writing poetry. Metacognitive knowledge is developed when students self-assess the
strengths and areas for improvement in the poetry they have written. Recall that meta-
cognition is key to developing students’ self-governing skills that are critical in par-
ticipating in a democracy.
We believe that for most classroom purposes, these four types of knowledge
can be distilled into two major types. The content (or what is learned) is covered in
knowledge/understanding, which encompasses factual and conceptual knowledge.
The techniques and skills you teach students (the how to use) are skills/strategies/
procedures, addressing procedural and metacognitive knowledge. See Table 2.11 for
examples from several fields of study.
Knowledge/understanding covers factual knowledge that consists of the basic infor-
mation associated with a field. Examples include basic facts such as technical vocabulary
in economics (supply, demand), musical notation (quarter notes, treble clef ), and types of
rocks (igneous, sedimentary). Knowledge/understanding also covers conceptual knowledge
or the interrelationships among the basic information within more complex structures such
as theories or models. Examples include economic principles such as comparative advan-
tage, musical forms (symphonies, folk songs), and periods of geological time.
In contrast, skills/strategies/procedures address the techniques and skills acquired
in different fields of study. Within this category, procedural knowledge deals with methods
and techniques employed in a field, such as learning how to analyze the geographic impact
on the economics of a region, learning to sing a song, or learning how to classify rocks.
It also includes metacognitive knowledge such as analytical skills related to cognition in
general as well as one’s own cognition. This subset of skills/strategies/procedures includes
skills such as knowing when to use a particular strategy for analysis, knowing how to
monitor yourself as you perform, knowing how to catch yourself when you make a mis-
take, and knowing how to work from your strengths and improve your weaknesses.
Most of your units will have both knowledge/understanding learning goals and
skills/strategies/procedures learning goals. You will be teaching your students specific
content, and you will also be teaching your students specific skills to use when manip-
ulating that content. In Chapter 3 we begin to link these two types of knowledge with
specific kinds of assessments, one of the key early steps in aligning assessments with
learning goals.
TABLE 2.11
Examples of the Two Basic Types of Knowledge

Knowledge/Understanding (What)
Factual knowledge. Economics: Technical vocabulary such as supply and demand. Music: Musical notation such as quarter notes, treble clef. Geology: Types of rocks such as igneous, metamorphic, sedimentary. Literacy: Letter names and sounds; sight words.
Conceptual knowledge. Economics: Principle of comparative advantage. Music: Musical forms such as sonatas, symphonies, folk songs. Geology: Periods of geological time and land forms. Literacy: Phonemic awareness.

Skills/Strategies/Procedures (How)
Procedural knowledge. Economics: How to calculate growth curves. Music: How to sing a song; how to determine in which period a symphony was written. Geology: How to classify rocks. Literacy: How to read a passage.
Metacognitive knowledge. Economics: Choosing the correct mathematical formula for a given problem. Music: When practicing, knowing how to build on strengths and adjust for weaknesses. Geology: How to determine where and when to get additional information to draw a conclusion about from which geological period an artifact comes. Literacy: Self-correction when a word does not make sense in context.
TABLE 2.12
A Continuum with Krathwohl's Affective Categories and Examples of Learning Goals

Receiving: attends to stimuli. Verbs: follows, listens, locates, points to, receives. Example learning goals: Follows text of poems as others read. Tracks with eyes the movement of the ball.

Responding: participates and reacts to stimuli. Verbs: answers, articulates, contributes, discusses, finds, greets, interacts, marks, recites, states, tells. Example learning goals: Participates in discussions of the meaning of various metaphors. Uses Spanish to answer questions in foreign languages class.

Valuing: attaches worth or value to a stimulus (e.g., object, phenomenon, situation, or behavior). Verbs: accepts, appreciates, commits, cooperates, demonstrates, devotes, praises, selects, shares. Example learning goals: Selects various genres for reading during sustained silent reading. Shares collection of rocks and minerals with class.

Organization: brings together a complex of values, resolves differences between them, and develops an internally consistent value system. Verbs: alters, argues, defends, integrates, modifies. Example learning goals: Formulates and records learning goals in journal. Argues the basis for evolution.

Characterization: behaves in a manner that is pervasive, consistent, and predictable. Verbs: acts, demonstrates, displays, maintains, practices, seeks. Example learning goals: Demonstrates intellectual curiosity in writing. Maintains interest in music styles.

Based on Krathwohl, D., B. Bloom, and B. Masia. 1984. Taxonomy of educational objectives: Book 2, Affective domain. Boston, MA: Allyn & Bacon; and Gronlund, N., and S. Brookhart. 2009. Writing instructional objectives for teaching and assessment, 8th ed. Upper Saddle River, NJ: Pearson Education.
Notice the potential overlap of cognitive and affective domains in this last example. In
stating their preferences in a particular artist’s style, the students attach worth (i.e.,
value) to the artist’s work and justify (i.e., evaluation) their choices based on their
understanding of artistic elements.
Psychomotor Domain
The psychomotor domain has primarily been used in physical education. But teachers
in arts-related fields, special education, and technical education also find it useful
because it can directly address perceptions and physical movements and skills that are
a part of mastering these disciplines. As you can see in Table 2.13, the categories of
behavior move from “perception” to “origination.” Each category involves increasing
the complexity of decision making related to physical actions. This continuum also
requires increasingly higher-level thinking. For example, a student would move from
“perception,” or understanding the demonstration of a single kick, which requires only
one decision, all the way to “adaptation” and “origination” of kicks during a soccer
game, which involve multiple decisions and problem-solving strategies depending on
positions of teammates and opponents, placement on the field, and other consider-
ations. In music, students in band learn to follow a melody line (i.e., mechanism)
established by lead players, who maintain a steady beat (i.e., complex overt response)
for others to follow.
We show you these various taxonomies to acquaint you with the range of frame-
works available for use in developing learning goals for your students. Cognitive, psy-
chomotor, and affective outcomes are important to consider in promoting the full
educational development of students.
TABLES OF SPECIFICATIONS
In your planning, you are likely to identify numerous learning goals that might be the
focus for learning activities and assessment in a unit. Your next challenge is to find a
way to organize these learning goals to make sure that you have appropriate coverage
of the types of knowledge and cognitive strategies. One way to organize the learning
goals that you've identified is to place them into a table of specifications. Thomas Guskey (1996) describes a table of specifications as "a relatively brief, compact, and yet concise description of what we want students to learn and be able to do as a result of their involvement in a particular learning unit." If kept concise, the table can serve as an organizer for our development of a unit or our lesson plans.

Table of Specifications: A chart that lists the test content and specifies the number or percentage of test items that cover each content area.
In developing a table of specifications for instruction, Guskey reminds us to ask,
“What should students learn from this unit?” and “What should students be able to
do with what they learn?” Table 2.10 provides an illustration of the use of these dimen-
sions in a table of specifications for instruction. In the cells of the table, we placed
learning goals that might be relevant in a poetry unit. Placement of the learning goals
in the cells reveals that instruction in the poetry unit requires students to apply the
full range of cognitive skills—from remembering information about poetry forms to
creating poems based on the styles they learned. In addition, the learning goals are
fairly evenly distributed across the forms of knowledge.
The percentages listed with each goal indicate approximately the amount of time
to be allocated to instruction and learning activities. These percentages also guide teach-
ers as they plan the amount of coverage the learning goal should have on a test. The
percentages should be based on the importance of the learning goal as well as the time
required for students to learn the information or skill. Also, the percentages should not
be considered rigid time allotments for teaching the learning goal. They are guideposts that remind you to spend the majority of instructional time on preparing students to write poems and relatively less time on teaching the origins of each poetry form.

TABLE 2.13
Examples of Psychomotor Learning Goals
Perception: Relates music to a dance movement. Moves head toward sound (infant).
Set: Enters and exits stage on cue. Enters into a debate. Imitates parents' vocalizations.
Guided Response: Copies hand position for chords on a guitar. Gestures to emphasize points in speech.
Mechanism: Prints legibly. Speaks fluently in Spanish. Styles hair.
Complex Overt Response: Carves sculpture from wood. Conducts an orchestra.
Adaptation: Modifies running speed for terrain in a marathon. Varies strokes to achieve textures in painting.
Origination: Improvises simple rhythmic accompaniments using body percussion.
Based on Simpson, E. 1972. The classification of educational objectives in the psychomotor domain. Washington, D.C.: Gryphon House; and Gronlund, N., and S. Brookhart. 2009. Writing instructional objectives for teaching and assessment, 8th ed. Upper Saddle River, NJ: Pearson Education.
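To see how such percentages translate into test coverage, consider a hypothetical case (the numbers here are illustrative rather than taken from the poetry unit): if a unit test will contain 40 items, a learning goal weighted at 20 percent should be represented by roughly 0.20 × 40 = 8 items, whereas a goal weighted at 5 percent warrants only about 0.05 × 40 = 2 items. Like the percentages themselves, these item counts are guideposts rather than exact quotas.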
After completion of the table of specifications for instruction, the planning chart
that we introduced earlier can now be used with each learning goal to align the learn-
ing goals, instructional experiences, and assessments. Notice the alignment here:
As shown in Table 2.14, the learning goals have implications for teaching. In this
instance, the learning goals require the teacher to plan instruction that involves students
in learning about the characteristics of types of poetry and using those characteristics
to determine the types of poetry when new examples are offered. Notice that these two
learning goals are taught together. A teacher might consider it essential that students
are making meaning of the poetry as they determine the forms of the poems. Also
notice that the integration of these two learning goals provides a teacher with some
flexibility because the goals combined account for 40 percent of the time for the unit.
TABLE 2.14 Planning Chart
Column headings: Learning Goal; Teaching Methods/Learning Experiences; Assessment Tasks
must differ when students (a) read a text to learn the names of the poems and their
origins or (b) share the draft of a poem in a writer’s conference. Instructional materials
change from examples of poems in reading texts to the draft of a poem. Also, the roles
of the teacher and students are likely to change when a student is learning some factual
information about the stylistic elements versus when the student leads the discussion
in an author’s conference.
TABLE 2.15 Streamlined Table of Specifications for Planning a Test
List each learning goal and the percent of the unit it covers; then list the test items in the appropriate column (Remember, Understand, . . . , Create, with Understand through Create grouped as Higher Order Cognitive Strategies).
3. Interpret the meaning of poems: Items 7, 8, 11, 12, 15–18
Unit percentages for the goals: 10%, 5%, 20%, 20%, 10%, 25%, 10%
in textbooks and in tables on various websites. For example, we have found that
generating examples of a concept is sometimes placed at the “comprehension/under-
stand” level and sometimes at the “application/apply” level. These nuances and con-
tradictions may frustrate you if you dwell on them too much. The key requirements
are (1) you are addressing skills above the knowledge level in both your instruction
and your assessment, and (2) your learning goals, instruction, and assessment are
all aligned. We advise our students not to obsess about classifying goals or items in
the table. Instead, we advise them to be able to justify their method of classification
to themselves and to ensure that their lessons and assessments cover the range of
cognitive and knowledge levels.
TABLE 2.16 Descriptions of Three Case Studies

Maria Mensik's English Class (High School). Focus/Theme: Utopia and Dystopia. Learning Goals: The students will compare and contrast characteristics of a utopia and dystopia. The students will compare and contrast multiple works concerning utopia/dystopia, paying particular attention to the role of the government and the role of the individual. The students will use persuasive language in writing and/or speaking. Forms of Assessment: Multiple-Choice, Modified True/False, Performance Tasks.

Lynn McCarter's Mathematics Class (Middle Level). Focus/Theme: Solving Problems Related to Proportions. Learning Goals: The students will use geometric concepts and modeling to interpret and solve problems.¹ The students will solve problems related to similar and congruent figures. The students will select appropriate units and tools for measurement tasks within problem-solving situations; determine precision and check for reasonableness of results. Forms of Assessment: Journal Prompts, Essays, Multiple-Choice, Performance Task.

Ebony Sanders's Science Class (Early Childhood). Focus/Theme: Distinguishing Between Solids and Liquids. Learning Goals: The students will recognize matter in their daily surroundings. The students will describe the characteristics of solids and liquids based on their explorations. The students will demonstrate the appropriate steps to test an unknown solid or liquid. The students will use the writing process to describe solids and liquids. Forms of Assessment: Fill-in-the-Blank, Performance Tasks, Journal Prompts.

¹ North Carolina Department of Public Instruction. 1998. K–8 Mathematics Competency Goals and Objectives, p. 41.
TABLE 2.17 English Learning Goals and Content Standards
State Standards: Reading (R), Writing (W), Communication (C), and Research Goals (RS)

Learning Goal: The student will compare and contrast characteristics of a utopia and dystopia.
• R3.1 Demonstrate the ability to analyze the origin and meaning of new words by using a knowledge of culture and mythology.
• C3.8 Demonstrate the ability to make connections between nonprint sources and his or her prior knowledge, other sources, and the world.

Learning Goal: The student will compare and contrast multiple works concerning utopia/dystopia, paying particular attention to the role of the government and the role of the individual.
• R1.2 Demonstrate the ability to make connections between a text read independently and his or her prior knowledge, other texts, and the world.
• R1.9 Demonstrate the ability to read several works on a particular topic, paraphrase the ideas, and synthesize them with ideas from other authors addressing the same topic.
• R2.9 Demonstrate the ability to present interpretations of texts by using methods such as Socratic questioning, literature circles, class discussion, PowerPoint presentations, and graphic organizers.
• W1.3 Demonstrate the ability to develop an extended response around a central idea, using relevant supporting details.
• W1.6.3 Demonstrate the ability to write essays, reports, articles, and proposals.
• C3.6 Demonstrate the ability to compare and contrast the treatment of a given situation or event in nonprint sources.

Learning Goal: The student will use persuasive language in writing and/or speaking.
• W2.3 Demonstrate the ability to use writing to persuade, analyze, and transact business.
• W3.1 Demonstrate the ability to respond to texts both orally and in writing.
• C1.3 Demonstrate the ability to use oral language to inform, to explain, to persuade, and to compare and contrast different viewpoints.
discussion lacks relevance to their lives. However, the learning goal to compare and
contrast multiple works dovetails with instruction that uses currently popular works as
well as classic literature. Using these broad resources offers potential for students to
make connections with their lives.
Lynn McCarter’s math class shows how the teacher uses problems based on real-
life situations to promote engagement. Students are more likely to be interested when
the learning goals are applied to situations they encounter. Also, Lynn indicates that
student performance will be documented by using a checklist providing her with a
breakdown of student learning levels to be used later for cooperative groups. Students
will also use the checklist to find areas of weakness and explore those areas further in
individual projects. Here, Lynn reminds us of our earlier statement that students begin
to learn to self-assess when provided with the learning goals for a lesson and when
given a scoring guide (e.g., a checklist) based on those learning goals. As we mentioned
in Chapter 1, the development of self-assessment, or self-governing skills, is critical to
prepare students for participation in a democracy.
A review of Ebony Sanders’s learning goals and state standards in Table 2.18
shows how to integrate subject matter, such as science concepts, and writing skills.
Notice that in addition to students learning the characteristics of solids and liquids,
Ebony also has learning goals related to students using the writing process to com-
municate their understanding of solids and liquids.
TABLE 2.18 Learning Goals and Content Standards for Ebony Sanders's Unit

Physical Science
Learning Goal: The students will recognize matter in their daily surroundings.
• I.A.2. Compare, sort and group concrete objects according to observable properties.
Learning Goal: The students will describe the characteristics of solids and liquids based on their explorations.
• I.A.1. Use the senses to gather information about objects or events such as size, shape, color, texture, sound, position, and change (qualitative observations).
• IV.A.3. Objects can be described by the properties of the materials from which they are made, and those properties can be used to separate or sort a group of objects or materials.
• IV.A.4. Materials can exist in different states. a. Explore and describe characteristics in solids. b. Explore and describe characteristics in liquids.
Learning Goal: The students will demonstrate the appropriate steps to test an unknown solid or liquid.
• I.B.1.b. Employ simple equipment, such as hand lenses, thermometers, and balances, to gather data and extend the senses.
• IV.A.2. Properties of matter can be measured using tools, such as rulers, balances, and thermometers.

Writing
Learning Goal: The students will use the writing process to describe solids and liquids.
• 1-W1.2. Begin using prewriting strategies.
• 1-W1.3. Demonstrate the ability to generate drafts using words and pictures that focus on a topic and that include relevant details.
• 1-W1.6. Demonstrate the ability to write in a variety of formats.
• 1-W1.6.1. Demonstrate the ability to write simple compositions, friendly letters, and expressive and informational pieces with peer or teacher support.
Our review of the case studies reinforces several points. First, revising content
standards may be necessary when using them as learning goals. Also, because the
number of standards is greater than a teacher’s time to cover them, we should consider
the importance of the learning outcome and the instructional time we allocate to it.
In addition, student engagement may be improved by the use of resources based on
relevant real-life experiences. Finally, the integration of learning goals (e.g., writing and
science observations) can “create” time for coverage of multiple subject areas.
HELPFUL WEBSITES
http://www.ccsso.org/Projects/State_Education_Indicators/Key_State_Education_Policies/3160.
cfm#S. Provides links to content standards at state department of education websites. Use
this site to locate your state content standards—the best place for you to begin learning
about content standards and learning goals.
http://www.ncte.org/about/over/standards/110846.htm (English language arts)
http://standards.nctm.org/document/appendix/numb.htm (Mathematics)
http://books.nap.edu/html/nses/html/6a.html (Science)
http://www.socialstudies.org/standards/strands/ (Social Studies)
These sites provide content standards developed at the national level.
REFERENCES
Anderson, L., and D. Krathwohl (eds.). 2001. A taxonomy for learning, teaching, and assessing: A
revision of Bloom’s taxonomy of educational objectives (Abr. Ed.). New York: Longman.
Arizona Department of Education. 2006. The social studies standard articulated by grade level.
Retrieved June 2, 2007 from http://www.ade.state.az.us/standards/sstudies/articulated/.
Bloom, B. (ed.), M. Engelhart, E. Furst, W. Hill, and D. Krathwohl. 1956. Taxonomy of
educational objectives: Handbook I: Cognitive domain. New York: David McKay Company.
California State Board of Education. 2006. Kindergarten English-language arts content standards.
Retrieved June 2, 2007 from http://www.cde.ca.gov/be/st/ss/engkindergarten.asp.
Covey, S. 1989. The seven habits of highly effective people. New York: Simon & Schuster, p. 95.
Florida Department of Education. 2005. Foreign languages grades 6–8. Retrieved June 2, 2007
from http://www.firn.edu/doe/curric/prek12/pdf/forlang6.pdf.
Gagné, R. 1985. The conditions of learning and theory of instruction. 4th ed. New York: Holt,
Rinehart, and Winston.
Georgia Department of Education. 2006. Grade 9 ELA standards. Retrieved June 2, 2007 from
http://www.georgiastandards.org/english.aspx.
Gronlund, N. 2003. Assessment of student achievement. 7th ed. Boston: Allyn & Bacon.
Gronlund, N., and S. Brookhart. 2009. Gronlund’s writing instructional objectives. 8th ed. Upper
Saddle River, NJ: Pearson Education.
Guskey, T. 1997. Implementing mastery learning. 2nd ed. Belmont, CA: Wadsworth Publishing
Company.
Illinois State Board of Education. 1997. Illinois learning standards for English language arts.
Retrieved June 11, 2007 from http://www.isbe.state.il.us/ils/ela/standards.htm.
International Reading Association & National Council of Teachers of English. 1996. Standards
for the English language arts. Newark, DE: International Reading Association.
Krathwohl, D., B. Bloom, and B. Masia. 1984. Taxonomy of educational objectives: Book 2,
Affective domain. Boston, MA: Allyn & Bacon.
L’Engle, M. 1962. A wrinkle in time. New York: Farrar, Straus, and Giroux.
Marzano, R. J. 2001. Designing a new taxonomy of educational objectives. Thousand Oaks, CA:
Corwin Press.
Massachusetts Department of Education. 2003. Massachusetts history and social science
curriculum framework. Retrieved June 2, 2007 from http://www.doe.mass.edu/
frameworks/current.html.
McMillan, J. 2007. Classroom assessment: Principles and practice for effective standards-based
instruction. 4th ed. Boston, MA: Allyn and Bacon.
Metfessel, N., W. Michael, and D. Kirsner. 1969. Instrumentation of Bloom’s and Krathwohl’s
taxonomies for the writing of behavioral objectives. Psychology in the Schools 6: 227–231.
National Council of Teachers of Mathematics. 2000. Principles and standards for school
mathematics. Reston, VA: Author.
New Jersey. 2004. New Jersey core curriculum content standards for visual and performing arts.
Retrieved June 2, 2007 from http://www.nj.gov/education/cccs/s1_vpa.pdf.
New York State Education Department. 1996. Learning standards for mathematics, science, and
technology. Retrieved June 2, 2007 from http://www.emsc.nysed.gov/ciai/mst/pub/
mststa5.pdf.
North Carolina Department of Public Instruction. 1998. K–8 mathematics competency goals and
objectives. Retrieved June 9, 2007 from http://community.learnnc.org/dpi/math/archives/
K8-98.pdf.
North Carolina Department of Public Instruction. 2004. Science standard course of study and
grade level competencies: K–12. Retrieved June 2, 2007 from http://www.ncpublicschools.
org/docs/curriculum/science/scos/2004/science.pdf.
Office of the Superintendent of Public Instruction. n.d. Essential academic learning requirements:
History. Olympia, Washington. Retrieved June 11, 2007, from http://www.k12.wa.us/
CurriculumInstruct/SocStudies/historyEALRs.aspx.
Ohio Department of Education. 2003. Academic content standards: K–12 fine arts. Retrieved
June 11, 2007, from http://www.ode.state.oh.us/GD/Templates/Pages/ODE/ODEDetail.
aspx?page=3&TopicRelationID=336&ContentID=1388&Content=12661.
Popham, W. J. 2003. Trouble with testing: Why standards-based assessment doesn’t measure up.
American School Board Journal 190(2): 14–17.
Popham, J. 2006. Content standards: The unindicted co-conspirator. Educational Leadership,
64(1): 87–88.
Quellmalz, E. 1985. Developing reasoning skills. In J. Baron & R. Sternberg, (eds.), Teaching
thinking skills: Theory and practice, pp. 86–105. New York: Freeman.
Sandford, R., M. Ulicsak, K. Facer, and T. Rudd. 2007. Teaching with games: Using commercial
off-the-shelf computer games in formal education. Retrieved June 2, 2008, from http://www.
futurelab.org.uk/projects/teaching_with_games/research/final_report/.
Simpson, E. 1972. The classification of educational objectives in the psychomotor domain.
Washington, D.C.: Gryphon House.
South Carolina Department of Education. 2003a. South Carolina English language arts
curriculum standards 2002. Retrieved June 2, 2007, from http://ed.sc.gov/agency/offices/
cso/standards/ela/.
South Carolina Department of Education. 2003b. South Carolina visual and performing arts
curriculum standards 2003. Retrieved June 2, 2007, from http://ed.sc.gov/agency/offices/
cso/standards/vpa/.
South Carolina Department of Education. 2005a. South Carolina science academic standards.
Retrieved June 2, 2008, from http://ed.sc.gov/agency/offices/cso/standards/science/.
South Carolina Department of Education. 2005b. South Carolina social studies academic
standards. Retrieved June 2, 2008, from http://ed.sc.gov/agency/offices/cso/standards/ss/.
South Carolina Department of Education. 2007. South Carolina academic standards for
mathematics. Retrieved June 2, 2008, from http://ed.sc.gov/agency/offices/cso/standards/
math/.
Texas Education Agency. 2006. Chapter 111. Texas essential knowledge and skills for mathematics:
Subchapter A. Elementary. Retrieved June 2, 2007, from http://www.tea.state.tx.us/rules/
tac/chapter111/ch111a.html.
VanSledright, B. A. 2004. What does it mean to think historically . . . and how do you teach it?
Social Education 68(3): 230–233.
Virginia Department of Education. 2001. Physical education standards of learning for Virginia
Public Schools. Retrieved June 2, 2007, from http://www.pen.k12.va.us/VDOE/
Superintendent/Sols/physedk-12.pdf.
Wiggins, G., and J. McTighe. 1998. Understanding by design. Alexandria, VA: Association for
Supervision and Curriculum Development.
CHAPTER 3
DIAGNOSTIC ASSESSMENT:
ENSURING STUDENT SUCCESS
FROM THE BEGINNING
Evaluation is creation.
–Friedrich Nietzsche
INTRODUCTION
The public and the press focus on, and therefore emphasize, the formal standardized
large-scale state testing given at the end of each year. But we believe that assessments at
the beginning of and during each year are actually more important when it comes to
fostering student learning. In this chapter we address diagnostic assessments. Intro-
duced in Chapter 1, diagnostic assessments are the assessments used at the begin-
ning of the year and the beginning of units to help you understand and respond to
the wide range of needs, backgrounds, and attitudes that your students bring to
your classroom. They also show you how students think and what misconceptions
they have. You must understand the varying needs of your students so you can
provide differentiated instruction that helps to close achievement gaps. Different
students will need different strategies to profit from your classes. We chose
Nietzsche’s quotation to begin our chapter because diagnostic assessments are the
foundational evaluation methods for creating learning in your classroom.
Determining what your students know at the beginning of the year and at the
beginning of a new unit allows you to design lessons to meet their needs. We don’t
want students getting discouraged from the first day of class. As you recall, Jocelyn,
the biology teacher described in Chapter 1, discovered that none of her students’
reading skills were adequate for tackling her biology textbook. This knowledge
helped her design instruction that addressed biology. But she also addressed her
students’ need for intensive vocabulary work and encouraged them to demonstrate
their learning through means other than traditional writing tasks and tests, rather
than assuming they simply weren’t equipped to meet the class goals. Diagnostic
assessment was crucial for Jocelyn to figure out how best to teach her students and
tailor activities to their needs.
BEFORE YOU BEGIN: HIGH EXPECTATIONS AND BEHAVIORS THAT CONVEY THEM
In developing diagnostic assessment, you must start by holding high expectations for
your students. Even though Jocelyn’s students’ basic reading skills were not at grade level,
she designed her instruction to hold them fully accountable for the complex biology
content she would teach them. To design appropriate instruction, she had to base her
diagnostic assessment on the assumption that they could meet the learning goals.
All of us support and give lip service to high expectations, but they are often
quite difficult to internalize and implement in a classroom. Researchers have found
that some teachers unconsciously communicate differently with students, depending
on whether they believe those students are high or low achievers (Schunk et al., 2008).
These teachers convey high—or low—expectations to their students through their
behaviors and, in particular, through their actions during ongoing, informal assess-
ment activities that begin the very first day of the school year. Often teachers are not
even aware that they convey such notions to students.
Table 3.1 illustrates some of the behaviors that teachers engage in during ongoing
classroom assessment and interaction with students for whom they have low expecta-
tions. For example, if a student gives an incorrect response or no response to a teacher’s
oral question, a teacher can quickly move on to someone else, or the teacher can provide
a few seconds for the student to think, clarify the question, repeat the question, or give
the student additional information that could help make a connection more prominent.
If the teacher has low expectations for a student, the most likely response is to move
quickly on to another student. Such behavior can be particularly noticeable for students
who are learning English (Rothenberg & Fisher, 2007). Teachers don’t want to put them
on the spot or embarrass them, but then they end up not requiring the same level of
participation and accountability from their English learners.

TABLE 3.1 Some Teacher Behaviors Related to Assessment That Communicate Low Expectations
• Waiting less time for low achievers to answer a question.
• Giving answers to low achievers or calling on someone else rather than trying to improve responses by giving clues or rephrasing.
• Generally paying less attention to low achievers and interacting with them less frequently.
• Calling on low achievers less often, or asking them only easier, nonanalytic questions.
• Generally demanding less from low achievers (e.g., accepting low-quality or incorrect responses).
• Not giving low achievers the benefit of the doubt as much as high achievers when grading tests or assignments.
Adapted from Brophy, J. 1998. Motivating students to learn. Copyright McGraw-Hill Companies, Inc. Used with permission.
In contrast, a teacher who has high expectations is more likely to probe for an
answer, clarify the question, or do something to communicate that the teacher expects
the student to be able to answer correctly. These teacher behaviors then influence the
learning process. The latter student will feel encouraged and be more likely to answer
correctly more often and thus probably will learn more.
Different teacher behavior patterns toward low and high achievers can help us
understand the connection between student performance and teacher expectations.
You can see how gradually across time, students get discouraged when they are ignored,
criticized, or skipped over. A year’s accumulation of less attention, easier work, and
more criticism has a major impact on a student’s learning. Often teachers’ perception
of students heavily influences the pupils’ perceptions of themselves. Students buy into
the teacher’s perceptions and don’t try as hard and don’t see themselves as successful
or competent. Such perceptions then lead to student behaviors that result in lower
achievement, fulfilling the teacher’s original expectations.
Figure 3.1 shows how teacher expectations could translate into different out-
comes for two groups. If teachers tend to believe a particular type of student (perhaps
the poor or minority children on the wrong side of the achievement gap) won’t do
well, their own instructional behaviors could in fact lead to lower achievement for
those less-favored groups.
Researchers have indeed uncovered a strong link between teacher expectations,
which can be formed very early in the school year, and student achievement. One early
study illustrating this point well was done with first-grade teachers (Palardy, 1969). At
the beginning of a school year, a researcher asked a group of teachers about whether
boys were likely to make as much progress in reading as girls during first grade. Ten
of these teachers felt that boys tend to do as well as girls, and ten felt that boys usually
do significantly worse than girls in reading in first grade. In this study, according to
scores on aptitude tests, all students had equivalent ability to do well. At the end of
the school year, the researcher looked at the achievement levels in reading of first-grade
girls and boys who had been students in these teachers' classrooms. He found that the
girls' achievement did not differ across the two groups of teachers. The boys' achievement,
however, differed depending on which teacher they had. If they had a teacher with high expectations
for boys, their scores did not differ from the girls’. If they had a teacher with low
expectations for boys, their scores were significantly lower than the girls’.
Remember that these differences in teacher behaviors are not necessarily conscious
and deliberate. Teachers care about their students and want them to learn as much as they
can. But we all have some unexamined biases or stereotypes and assumptions that can
influence our behavior about certain students. For example, one African American teacher
we know was observed by her principal her first year of teaching. He marked on a seating
chart whom she called on over a 20-minute period. When he showed her the results, she
was horrified to see that she called on her European American children more often than
on her African American children, even though her class had equivalent numbers of each.
From that day forward, she has made a concerted effort to be mindful of her behaviors
toward students and has developed a strategy for calling regularly on all students.
Careful efforts by a teacher to convey high expectations through her behavior
toward her students are captured in one boy's remark to a new student in his class (cited
by Eggen & Kauchak, 2004): “Put your hand down. Mrs. Lawless calls on all of us. She
thinks we’re all smart.” If you are making an effort to close the achievement gap, you will
not only take deliberate, conscious steps to make sure you avoid behaviors such as pro-
viding briefer and less informative answers to questions of low achievers, or rewarding
incorrect answers by low achievers (see Table 3.1) as you engage in informal assessment
practices in your classroom. You will also design your more formal diagnostic assess-
ments based on the assumption that all students will meet the class learning goals.
FIGURE 3.2 Using the Process of Triangulation to See a Pattern in Achievement for a Student Who Did Not Score as Well on a Standardized Test as on Other Measures
Table 3.2 provides a list of sources of information that are usually available to teachers
at the beginning of the school year.
Grades
End-of-term grades can be a particularly limited source of information because they
are a fairly crude summary of a student’s performance. Most students don’t perform
consistently across all the learning goals in one semester. A grade is an average that
can obscure information needed about the skills and knowledge that students bring to
your classroom. Examining the pattern of grades across years and subjects, though,
can give you a rough idea of a student’s strengths and weaknesses.
TABLE 3.2 Sources of Information Available to Teachers at the Beginning of the School Year
Nonacademic Records: Written comments from previous teachers
Academic Records: Eligibility for and records from special programs (Special Education, Gifted, English as a Second Language)
Test. These tests are useful for comparing a student’s performance to a national
sample of students on broad knowledge across several fields such as language arts,
math, and reading. In addition, all states require achievement tests in grades 3
through 8 to determine adequate yearly progress of schools for the No Child Left
Behind Act. Score interpretation for large-scale tests is explained in Chapter 11.
These scores can give you a picture of which of your students are in the typical range
in the content areas tested.
Scores from any of these large-scale tests are not usually useful for targeting
specific skills that children need to work on, unless wide discrepancies appear. Stan-
dardized tests are useful for giving you clues about additional information you might
want to gather yourself. Remember that most published standardized achievement tests
provide a panoramic view, with only one or two items examining each skill. To get the
close-up snapshot needed for designing specific instruction, you need to gather more
detailed information yourself.
School Records
Other schools devise forms such as the one from Cotton Belt Elementary School in
Figure 3.3. These forms are designed to give helpful information gathered from a cur-
rent teacher’s experience for use by the next year’s teacher. These forms have academic
and social information that offer clues about students. To avoid bias, remember the
need for triangulation as you gather this information and check it against other sources
about the student.
Special Education The federal government requires public schools to provide a free
and appropriate education to students with disabilities who qualify under the Indi-
viduals with Disabilities Education Act (IDEA). Each student has an Individualized
Education Program (IEP) tailored to that student’s needs. All teachers who work with
that student must be aware of and implement that IEP. Many students have accom-
modations for both instruction (e.g., preferential seating, large-print texts) and assess-
ment (e.g., extended time for assessments, oral administration of an assessment) that
each teacher must honor.
Academic Performance. Please check those that apply for the UPCOMING school year:
• Recommended for Reading Assistance
• Qualifies for ESL
• Qualifies for speech therapy
• Qualifies for gifted
• Qualifies for Special Services (explain)
FIGURE 3.3 Student Data Form from Cotton Belt Elementary School
English as a Second Language Programs for students who are learning English as a
Second Language (ESL) vary from state to state and even from district to district. Often
these students have some time each day with an ESL instructor, but they are also
expected to function in their regular classes. With students who are learning English,
your nonverbal cues and gestures are very important for the learning process. Know-
ing you have ESL students will allow you to plan to amplify this dimension of your
teaching. Getting a clear sense of the level of these students’ English language skills in
consultation with the ESL instructor will also allow you to plan your instruction and
assessment to accommodate their needs.
invest in explicitly teaching social skills, such as showing respect for others, sharing
materials, and handling disagreements. Finally, these activities also help you get to
know your students’ interests and preferences, which will come in handy as you design
your future instruction and your assessment.
TABLE 3.3
Questions Systematic Preassessment Can Address
What is students’ current level of knowledge about this unit? Do they have common
misconceptions?
Have all students mastered the foundational skills and knowledge necessary for this unit?
introverted may never raise their hand. Students who take more time to process infor-
mation may not have a chance to get an answer out fast enough. As a result, most
teachers also use more systematic ways to gather information about student under-
standing before a unit begins to make sure they get a clear picture of all of their
children’s understanding.
will probably focus on the most central or important for your preassessment. For
example, suppose you have one learning goal: the students will be able to distinguish
the five cultural regions of the United States, including language, customs, culture, and
geography. For the preassessment you might want to determine whether students have
any understanding of cultural regions and can distinguish among regions in any way.
You wouldn’t want to give a long preassessment of 50 multiple-choice questions with
minute details about each region’s specific customs, culture, and geography, especially
if you suspect your students don’t even know what a cultural region is.
If you want to know whether your students can use their knowledge to perform
a desired skill, you will be assessing the skill or procedure that you expect them to
master. Asking math students to solve a specific problem taps this kind of skill. For
example, To the nearest hundred, calculate the sum of 87,500,912 and 92,431,331. Solv-
ing this problem correctly will show the teacher that the students can use their under-
standing of place value in a mathematical procedure.
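A quick worked solution, assuming the two addends are 87,500,912 and 92,431,331:

87,500,912 + 92,431,331 = 179,932,243, which rounds to 179,932,200 to the nearest hundred.

A student who adds correctly but rounds to the wrong place (for example, to the nearest thousand) reveals the kind of place-value confusion the item is meant to surface.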
Measuring Knowledge/Understanding
If you are measuring knowledge and understanding, you can have students answer
questions orally (if you note their answers on a checklist) or on paper. Usually two or
three questions per learning goal are sufficient, especially when you are quite certain
that students are not likely to have encountered the content of the unit before, such as
in advanced math or physics.
In returning to the earlier example, let’s suppose you are getting ready to
preassess the level of knowledge students have of your learning goal on the five
regions of the United States. You might give students a map of the United States
and ask them to color the five regions with different colors and then, at the bottom,
list all the things they can think of that each region is known for. They could also
briefly list any similarities or differences they see across regions. Such an assessment
is brief, covers the key content, and allows students who may already be familiar
with culture, geography, or customs to demonstrate their specific knowledge. It also
allows students who don’t know much to avoid suffering through numerous detailed
questions about culture, geography, and customs. Other examples of brief, system-
atic preassessments for knowledge and understanding appear in Table 3.4. Notice
that each of the measures goes beyond rote knowledge to address the “comprehen-
sion” and “analysis” levels of Bloom’s taxonomy. Using higher order questions allows
the teacher to get a sense of the way students think about the topic and provides
a challenge for all students. It also enables the teacher to access multiple levels of
understanding. Ideally, no student should get all of the items correct or all of the
items wrong (Chapman & King, 2005).
Measuring Skills/Strategies/Procedures
If your learning goal requires students to perform skills, strategies, or procedures,
before instruction begins you need to set up a method for observing them perform
the key skills or a representative subset of skills. You then rate their level of perfor-
mance with a scoring guide. Measuring skills is crucial in disciplines where most learn-
ing is centered on performing specific skills, such as physical education, art, music, or
dance. If you decide on one or more basic skills or procedures, you will design a
checklist or other scoring guide that captures the crucial differences in student levels.
For example, if a physical education teacher is teaching second graders how to catch
a ball, she will have them take turns throwing and catching a ball a few times. As she
scans the group, she will note the children’s skill level, starting with those who don’t
have the skill at all. A scoring chart that lists the three levels of skill down one side
and the names of the students across the top (as in Table 3.5) is useful for completing
this task quickly.
This activity will help the teacher determine which aspects of the catching pat-
tern (e.g., arm position) need to be worked on most with the students. In addition, if
she finds that many students already have a mature catching pattern, she will revise
her goals for the unit and move on to a more difficult skill. Table 3.6 shows several
examples of diagnostic measures for key skills and procedures.
TABLE 3.4 Examples of Knowledge/Understanding Diagnostic Measures Before Instructional Units

Social studies. Learning Goal: Student will be able to distinguish the five cultural regions of the United States, including language, customs, culture, and geography. Diagnostic Measure: Give students a U.S. map and ask them to color the five cultural regions. Then list things associated with each region at the bottom, and list similarities or differences between regions. Sample Scoring: 1 point for each correct region; 1 point for each item associated correctly with a region, up to 5; 1 point for each similarity or difference, up to 2.

Algebra. Learning Goal: Student will be able to understand and represent relations and functions in math. Diagnostic Measure: Ask students to describe the difference between a relation and a function and give an example of each. Sample Scoring: Develop a scoring guide that lists the key distinctions and misconceptions. Score each answer on a 1- to 4-point scale based on the list.

Secondary English. Learning Goal: Student will be able to compare and contrast utopias and dystopias in literature. Diagnostic Measure: Have students write the definition of utopia and dystopia. Then provide any examples they can think of from previously read literature. Sample Scoring: 1 point for each correct definition; 1 point for each example.

Music. Learning Goal: Student will be able to explain distinguishing characteristics of musical genres. Diagnostic Measure: Ask students to listen to two different pieces of music and write down distinctions between them. Sample Scoring: 1 point for each distinction (e.g., rhythm, melody, harmony, expression, and timbre).

Vocabulary for multiple subjects. Learning Goal: Students will be able to understand and use key terms of the discipline in writing and speaking. Diagnostic Measure: Provide students a list of vocabulary words relevant to the unit and ask them to match words with their definitions. Sample Scoring: 1 point for each correctly matched word.
TABLE 3.5
Scoring Guide for Catching a Ball
Level Student 1 Student 2 Student 3 Student 4 Student 5
ONE
• Arms are outstretched, elbows extended and palms upward.
• Ball is contacted with arms and elbows flexed.
• Object is trapped against the body.
TWO
• Arms are in front of body, elbows slightly flexed.
• Arms encircle the ball against the chest.
• Hands, arms hold the ball to chest.
THREE
• Arms are slightly ahead of body, elbows flexed.
• Ball is contacted with hands and is grasped with fingers.
• Palms are adjusted to size and flight of ball.
From Chepko, S.F., and R.K. Arnold. 2000. Guidelines for physical education programs: Standards, objectives, and assessments for Grades K–12. New York: Allyn &
Bacon. Reprinted by permission of Pearson Education.
TABLE 3.6 Examples of Skills/Strategies/Procedures Diagnostic Measures Before Instructional Units

Physical education. Learning Goal: Student will be able to execute a mature overhand throwing pattern. Diagnostic Measure: Have students throw the ball in pairs. Sample Scoring: Check off their skill level on a scoring guide with elements of a good throw listed.

Music. Learning Goal: Student will be able to maintain a steady beat in 4/4 time. Diagnostic Measure: Have students listen to a recorded song and clap a steady beat. Sample Scoring: Use the class list to check off each student who maintains a steady beat for four or more measures.

Theater. Learning Goal: Student will be able to write a script for a play based on personal experience, heritage, imagination, literature, and history. Diagnostic Measure: Provide a paragraph describing a new play. Have students develop the first five lines of dialogue for the opening scene based on two elements (e.g., personal experience and literature) of the learning goal. Sample Scoring: Develop a scoring guide that addresses the level of quality of the dialogue in reference to the two elements.

Elementary language arts. Learning Goal: Student will be able to read grade-level material fluently. Diagnostic Measure: Have students take turns reading aloud to a partner. Sample Scoring: Teacher circulates and notes number and kinds of miscues per sentence on clipboard chart.
TABLE 3.7 Preassessment Information on Five Cultural Regions Ordered from Least Knowledgeable to Most Knowledgeable Student
Columns: Regions Indicated on Map? (100% = 5/5); Associations with Region? (100% = 5/5); Similarities Between Regions? (100% = 2/2); Differences Between Regions? (100% = 2/2)
Student 1: 0%, 0%, 0%, 0%
Student 2: 0%, 0%, 0%, 0%
Student 3: 0%, 0%, 0%, 0%
Student 4: 20%, 0%, 0%, 0%
Student 5: 20%, 20%, 0%, 0%
Student 6: 40%, 20%, 0%, 0%
Student 7: 40%, 20%, 0%, 0%
Student 8: 60%, 40%, 0%, 0%
Student 9: 80%, 40%, 0%, 0%
They are only a way for you to get a rough picture of where the members of your class
fall in relation to the learning goals you will be addressing. These scores are just an
estimate to help you get a sense of where you will probably need to put the most energy
and instructional time on the unit.
We recommend assigning scores based on percentage correct. You can interpret
a score of 25 percent more quickly than a score of 3 out of 12 or 6 out of 24. Table 3.7
shows the preassessment information in percentages from one class of second graders
related to the five cultural regions learning goal, arranged from lowest to highest scores.
You can see from looking at the first question in the table that the last three students
have an emerging factual understanding of the boundaries of U.S. cultural regions, but
the first three students in the table appear to have no understanding of this most basic
element of the unit.
Furthermore, these first three students had 0 percent correct across all items.
Seeing 0 percent correct across all questions should function as a red flag for you.
Students who attain a score of 0 percent on a preassessment need additional preas-
sessment attention. If students have no prior knowledge or skill, the learning goal,
as it stands, may be too difficult for them. At this point, you need to drop back to
check whether they have the prior knowledge and skills needed for the unit you are
preparing. For example, in the unit on the five cultural regions, you would want
to check on whether these three students have worked with maps before and whether
they understand how maps represent places. You would want to find out if they
understand the relationship between individual states, regions, and the United States.
You would want to ask them questions about what they know about their own state
and region. If they have only recently arrived in the United States, you may need to
provide additional background information before they are ready for the learning
goals associated with this unit.
In Table 3.7 you will also note that all students except Student 10 had difficulty
comparing and contrasting regions, even if they showed some basic knowledge about
region facts. This finding should let you know that you need to focus on higher-level
thinking skills, such as analysis, as you teach this unit. Encouraging comparisons
across regions as you study each new region could help your students develop facility
with analysis.
For students who perform significantly better than others on the preassess-
ment, such as Student 10, you need to make sure that they are not bored and have
new challenges. You may decide to differentiate activities to include more depth or
breadth. For example, with the regions unit, Student 10 will need different tasks (not
just more tasks) than some of the other students. This student could be given tasks
that emphasize higher-level thinking skills and could go beyond the grade-level
learning goals, such as analyzing how the geography of each region influences its
culture and economy.
TABLE 3.8
Three Rules to Guide Your Diagnostic Classroom Assessment
Hold high expectations for all your students and engage in assessment behaviors that
convey high expectations from the first day of school.
Choose brief measures that focus on the most important elements of your learning goals,
including higher-order thinking, before beginning a unit.
Rely most on the information you gather systematically yourself across time.
lengthy pretests. You are not conducting a research study where you are trying to keep
pre- and post-tests perfectly consistent and where you are controlling all variables.
Instead, you are collecting some basic information to help you plan your unit and to
be able to document what your students know and don’t know beforehand. After the
unit, you can do more extensive assessment to demonstrate how much they have
learned, using your preassessment as baseline information.
Because these measures must be brief, you want to design them economically so
that a broad range of knowledge and skill levels can be demonstrated by students. You
want to make sure you are getting at higher levels of Bloom’s taxonomy, as well as
assessing the most basic principles. If your unit focuses on both knowledge and skills,
be sure to find ways to measure both, and not just one or the other. For example, some
of the best math teachers tell us that students can work lots of problems using a par-
ticular algorithm, but they have a much harder time with the more basic understand-
ing of knowing when that procedure should and should not be used.
Third, rely most on the information you gather systematically yourself across time.
You should weight most heavily the information you collect that is current and observ-
able. Last year’s information can be outdated or erroneous unless you can detect strong
patterns through triangulation of different sources. Because your fleeting impressions
can be distorted, erroneous, or forgotten if they don’t conform to your first impres-
sions, you should gather information in a systematic way. Writing samples, documen-
tation of contributions to discussion, pretests, early homework, and journal entries are
all useful sources of information.
TABLE 3.9 Accommodation Considerations for Diagnostic Assessments

Issue: Difficulty focusing attention. Types of Accommodation:
• Organize the assessment content (e.g., graphic organizers, diagrams, checklists)
• Allow completion one step at a time
• Limit time for questions, tasks
• Increase novelty of assessment tasks

Issue: Literacy skills below those of typical peers (e.g., learning disability). Types of Accommodation:
• Use language without complex syntax
• Check understanding of specific vocabulary needed for the unit and offer opportunities for repeated access
• Use visual cues, questions allowing short answers
• Use oral more than written questions
• Provide nontraditional opportunities to demonstrate understanding, such as diagrams, concept maps, drawings

Issue: Lack of familiarity with school culture. Types of Accommodation:
• Provide opportunities for multiple types of formats beyond teacher questioning
• Determine extent of knowledge of the "hidden rules" of school culture (e.g., how to ask questions, student and teacher roles)
participation strong. This last issue is more relevant to ongoing instruction and will
be addressed in Chapter 4.
We have listed in Table 3.9 some considerations for accommodations in diag-
nostic assessments that can help address these considerations for students. We have
organized them by six common issues of concern you are likely to encounter among
your students.
TABLE 3.10
Background Information Helpful in Teaching Students Whose First
Language Is Not English
What language is spoken at home?
How long has student been exposed to English at school? In other contexts?
BOX 3.1 EXAMPLE OF A MATH QUESTION BEFORE AND AFTER MODIFICATIONS FOR ENGLISH LANGUAGE LEARNERS

Before: Patty just received a letter in the mail telling about a new promotion with stuffed animals. When Patty has collected and shown proof of owning 125 stuffed animals, she will receive the new Million Dollar Bear free. Patty has 79 animals right now. Write an equation to show how many more animals Patty will need to collect to get her free Million Dollar Bear.
A. ___ − 125 = 79   B. 79 + ___ = 125   C. 79 − ___ = 125   D. 125 + 79 = ___

After: A class has 79 stars. They need 125 stars. How many more stars do they need? Choose the correct equation.
A. ___ − 125 = 79   B. 79 + ___ = 125   C. 79 − ___ = 125   D. 125 + 79 = ___

From Kopriva, R. 2008. Improving testing for English Language Learners. New York: Routledge. Reprinted with permission.
and skills needed to address the learning goal, but you simplify the sentence structure
and vocabulary, you eliminate idiomatic expressions, and you avoid hypothetical state-
ments. See Box 3.1 for an example.
Because each discipline has specific vocabulary, you want to be sure this vocabu-
lary is preassessed and then focused on and learned early in any unit by your English
language learners, and by your other students as well. Using as many concrete visual cues
and prompts as possible for accurate preassessment as well as during later instruction is
helpful (Haynes, 2007). Charts and drawings of scientific processes such as those devel-
oped for Jocelyn’s biology class can help students grasp and convey concepts that they
may not be able to describe verbally. In math, we often see students using number lines
taped to their desks. Many teachers also have students make a representational drawing
of a problem before they start to solve it. In social studies, maps and globes are helpful,
as well as pictures and artifacts. In language arts classes, charts with step-by-step proce-
dures for common writing tasks can be designed by students and posted for future use.
Such posters can serve as an indicator of their understanding of the process.
Another consideration is to limit use of extended writing in demonstrating
understanding. When ESL students write in English, they have to use a good bit of
their attention on mechanics and grammar, limiting their ability to attend to higher-
level skills related to the content. Thus, an extended writing task for a diagnostic assess-
ment might impede these students’ demonstration of their learning and its application.
Offering opportunities to brainstorm in groups or individually, or to make outlines
first, can help make a writing task less daunting. These accommodations help you
access students’ understanding of the content apart from their understanding of the
English language. Such strategies will give you a more accurate sense of the student,
which will help you provide better feedback and instruction.
higher-order thinking skills that go beyond the basic content of the learning goals. If
you don’t allow for a wide range, they may get a perfect score without your learning
the full extent of their knowledge. Researchers call this a ceiling effect. A ceiling effect
occurs when a student attains the maximum score, or "ceiling," for that assessment.
For example, if a student had a perfect score on the five cultural regions preassessment,
you would know that they knew all of that content, but you wouldn't know what else
they might know and where your instruction for them should begin.

Ceiling Effect: When a student attains the maximum score, or "ceiling," for an assessment, thus preventing appraisal of the full extent of the student's knowledge.
TABLE 3.11
Questions About Typical Classroom Behavior Illustrating Possible
Cultural Differences
1. What is the typical purpose and format of questions and answers?
2. What are the typical student and teacher roles?
3. Is cooperation valued over competition among students?
4. What are typical gender roles?
classroom behavior related to assessment are listed in Table 3.11, based on work by
Dunlap and Weisman (2006) and Kusimo et al. (2000).
Some divergent traditions exist around teacher questioning, a common diag-
nostic assessment practice in the United States, but not in all cultures (Kusimo et
al., 2000). For example, in some cultures, adults will never use the pedagogical tech-
nique of asking children direct questions about something the adult obviously knows.
This common learning strategy in our culture seems silly to them—why would you
waste your time asking a question you already know the answer to? In other cultures,
such as some American Indian groups, learners are not expected to demonstrate skills
publicly until they have completely mastered them. This expectation could interfere with a
teacher’s desire to engage students in public conversation to detect the extent of their
prior knowledge.
In some cultures, people are in the habit of pausing a few seconds when answer-
ing an oral question. This pause is meant to show the person is offering respect to the
questioner by considering the question carefully. In our culture, the teacher faced with
the pause may move quickly on to another student, assuming that the first doesn’t
know the answer. Finally, among some groups, if questions are phrased implying a
choice (e.g., “Would you like to show us a cartwheel?”), students will assume they can
refuse. They are more accustomed to direct commands.
Students also may not know how to explicitly ask questions as a method for
accessing information, which can be another important focus for diagnostic assess-
ment. Students who have acquired this skill tend to do better academically than stu-
dents who have not (Payne, 2008). A lack of this skill may be related to students'
unfamiliarity with the more formal, less casual language typically used in school,
which can put them at a disadvantage.
Beyond questioning patterns and formality of language, other cultural customs
may involve different roles for learners (Kusimo et al., 2000). In some cultures, students
are taught to be respectful and quiet, so you may have difficulty getting them to par-
ticipate in discussions or to venture an opinion of their own during informal diagnos-
tic assessments. Some cultures, such as those in Mexico and Central America, are more
oriented toward the group’s goals, whereas the dominant culture in the United States
is more oriented toward individual success. The more group-oriented cultures foster
interdependence and working together, and so students from these cultures may be
quite comfortable with cooperative group projects and activities. Using these students’
strength—the ability to work together—can foster their learning and help develop a
positive classroom atmosphere. Such students may not, however, be as comfortable
with activities that emphasize individual competition or individual assessment. For
example, some students will purposely answer a question wrong if a peer answered it
wrong so the peer will not be embarrassed.
TA B L E 3 . 1 2
Preassessment Results for One Learning Goal for Unit on
Characteristics of Liquids and Solids
Pre-unit assessment results for the learning goal: Students will take appropriate steps to test an unknown liquid or solid.

Student        Percent correct on Solids (6 possible)    Percent correct on Liquids (5 possible)    Total percent correct (11 possible)
Student 8      0%      0%      0%
Student 10     0%      0%      0%
Student 11     0%      0%      0%
Student 15     0%      0%      0%
Student 16     0%      20%     9%
Student 12     17%     0%      9%
Student 5      17%     0%      9%
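A quick arithmetic check (our own, not part of the original table) shows how the total column combines the two parts. Student 16, for example, answered one of the five liquid items and none of the six solid items correctly:

\[
\frac{1}{5} = 20\% \ \text{(liquids)}, \qquad \frac{0 + 1}{6 + 5} = \frac{1}{11} \approx 9\% \ \text{(total)}.
\]

The same reasoning gives 17 percent (one of six items) on solids and 9 percent overall for Students 12 and 5.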
You might also notice that four students did not describe either object with any
words. As we have pointed out, when students cannot complete any element of the
preassessment, you should further assess the knowledge and skills that you assume
they should have mastered earlier in order to be ready for this unit. In a small group,
Ebony worked with these four students to learn more about their level of understanding.
She used another “feely box” task in which every student put a hand in the box in
turn, and they talked about what they felt so she could see how different children
observed differently. She also had each one write one descriptive word after the discus-
sion. She found that two of the students lagged behind typical peers in vocabulary
because they were not native speakers of English, and the two others had great diffi-
culty with writing. This information helped Ebony become sensitive to these students’
needs as she designed her lessons. It also alerted her that they would require additional
accommodations and instruction in these areas.
6. You have given a preassessment to your class, and the results are as follows:
Student 1     0%      0%      0%
Student 2     0%      0%      0%
Student 4     20%     0%      80%
Describe in words what the table tells you in terms of what your next steps
should be. How might these results lead you to differentiate your instruction?
7. Describe accommodations related to assessment that you have seen teachers
implement for each of the groups in Table 3.9. Were these accommodations
effective? Why or why not? Have you seen other types of accommodations for
other types of issues students bring to the classroom?
8. Using Table 3.3, critique Ebony’s diagnostic assessment process. What did she do
well? How specifically might she have improved her diagnostic assessment?
9. Analyze the extent to which Ebony followed each of the four steps in designing
brief pre-unit diagnostic assessments described in Figure 3.4.
HELPFUL WEBSITES
http://www.cast.org
This website for the Center for Applied Special Technology (CAST) provides many resources for
helping you modify your instruction and assessment using universal design for learning
(UDL) principles. You can find lesson builders, UDL guidelines, and excerpts from
Teaching every student in the digital age, as well as reports and case studies detailing
effective classroom practices.
http://www.cfa.harvard.edu/smgphp/mosart/about_mosart.html
MOSART (Misconceptions-Oriented Standards-based Assessment Resources for Teachers) was
funded by the National Science Foundation to provide assessment support for teachers.
The website provides items linked to the K–12 science curriculum as well as a tutorial on
how to use and interpret them. These items can be particularly useful for pre-unit
assessment because they focus on common misconceptions.
REFERENCES
CAST. 2008. Universal design for learning guidelines version 1.0. Wakefield, MA: Author. Retrieved
April 14 from http://www.cast.org/publications/UDLguidelines/UDL_Guidelines_
v1.0-Organizer.pdf.
Chapman, C., and R. King. 2005. Differentiated assessment strategies. Thousand Oaks, CA:
Corwin Press.
Dunlap, C. A., and E. M. Weisman. 2006. Helping English language learners succeed. Huntington
Beach, CA: Shell Educational Publishing.
Eggen, P., and D. Kauchak. 2004. Educational psychology: Windows on classrooms. 6th ed. Upper
Saddle River, NJ: Pearson.
Haynes, J. 2007. Getting started with English language learners. Alexandria, VA: ASCD.
Kusimo, P., M. Ritter, K. Busick, C. Ferguson, E. Trumbull, and G. Solano-Flores. 2000. Making
assessment work for everyone: How to build on student strengths. Southwest Educational
Development Laboratory. Retrieved June 19, 2007, from http://www.sedl.org/pubs/
catalog/items/t105/assessment-full.pdf.
Moon, T. 2005. The role of assessment in differentiation. Theory into Practice 44(3): 226–233.
Palardy, J. M. 1969. What teachers believe—What children achieve. The Elementary School
Journal 69: 370–374.
Payne, R. 2003. A framework for understanding poverty. 3rd ed. Highlands, TX: aha! Process, Inc.
Payne, R. 2008. Nine powerful practices. Educational Leadership 65(7): 48–52.
Rakow, S. 2007. All means all: Classrooms that work for advanced learners. National Middle
School Association Middle Ground 11(1): 10–12.
Rose, D. H., and A. Meyer. 2002. Teaching every student in the digital age: Universal design for
learning. Alexandria, VA: Association for Supervision and Curriculum Development.
Rothenberg, C., and D. Fisher. 2007. Teaching English language learners: A differentiated
approach. Upper Saddle River, NJ: Pearson.
Schunk, D., P. Pintrich, and J. Meece. 2008. Motivation in education. 3rd ed. Upper Saddle River,
NJ: Pearson.
ﱟﱟﱟﱟﱠﱟﱟﱟﱟ
CHAPTER 4
FORMATIVE ASSESSMENT:
ONGOING ASSESSMENT TO
PROMOTE STUDENT SUCCESS
. . . students and teachers [must begin] to look to assessment as a source of insight
and help instead of an occasion for meting out rewards and punishments.
–Lorrie Shepard (2000)
INTRODUCTION
As a teacher preparing students to participate in our democratic society, you may find
this chapter more directly helpful than any other in this text. Formative assessment,
as we discussed in Chapter 1, is the monitoring of student progress during instruction
that includes feedback and opportunities to improve. These frequent and more infor-
mal assessments help you get a picture of the changing knowledge, skills, and strate-
gies of your students, and they show you how they are thinking and learning. This
understanding then allows you to offer tailored feedback to students, permitting them
to close the gap between where they are and where they need to be. Lorrie Shepard,
whose quotation opens this chapter, cites research suggesting that the formative
assessment strategies that we describe in this chapter can improve student achieve-
ment as effectively as or more effectively than many other traditional academic interven-
tions, such as one-on-one tutoring (Shepard, 2005). In particular, a groundbreaking
review by Black and Wiliam (1998) demonstrated that formative assessment can
dramatically increase student achievement, especially among lower achievers.
First, formative assessment can help you provide equal access to educational
opportunities for your students (Guskey, 2007). After your diagnostic assessments have
assisted you in differentiating appropriate instruction for students at all levels in your
classroom, formative assessment helps you check on student progress and provide mid-
course correction so that students see exactly how to close the gap between where they
are and where they need to reach to master each learning goal. This process is particu-
larly helpful for closing the achievement gap because it enables all students to do what
high-achieving students typically do on their own. Susan Brookhart (2001) found that
high achievers naturally search for what they can learn from the assessments in which
they participate. Lower-achieving students often need more help from the teacher in
making these connections to enhance their learning and increase achievement. Forma-
tive assessment is a tool that provides such assistance (Stiggins, 2008).
Second, formative assessment helps you promote self-governing skills for demo-
cratic participation. Many traditional assessments tell you only which students your
instruction was effective for, and the rest are no better off. Formative assessment lets you
take students wherever they start, and then aids them all in making progress and getting
smarter. Through carefully crafted feedback and opportunities for improvement, these
practices will enhance the development of your students’ self-governing skills. Self-
governance comes into play when students use this information to direct their attention
to the concepts and skills they next need to attain to close the gap between where they
are and where they need to be. These efforts eventually lead to independence of thought
as students internalize standards and depend less upon the judgment of others.
Third, formative assessment is a key to helping your students develop critical-
thinking skills. Formative assessment, when used correctly, requires higher-order skills, such
as application, analysis, and synthesis from Bloom’s taxonomy (Guskey, 2007; Shepard,
2008). As your students begin to look at their work in light of the standards you show
them and the comments you make about their work, they must actively struggle with
applying that information to improve their work. This process involves metacognition, a
crucial set of skills described in Chapter 1 and included as a type of knowledge in the
revised Bloom’s taxonomy (Anderson & Krathwohl, 2001). Formative assessment by its
nature requires the metacognitive skills of developing awareness of and monitoring steps
in problem solving or creation, offering estimates of whether answers make sense, decid-
ing what else one needs to know, and developing judgment skills to evaluate work.
To understand how formative assessment can fulfill all these functions, we will
examine its characteristics and how it works. Especially important is distinguishing
formative assessment from the typical ways that assessment operates in classrooms
(McMillan, 2007). The following sections of the chapter address each of the six key
elements of formative assessment, which we describe as specific actions that teachers
take. We will also contrast these actions with many of the assessment practices we all
grew up with. Table 4.1 illustrates these elements as teacher actions.
TA B L E 4 . 1
Key Elements of Formative Assessment as Teacher Actions
Element Teacher Action
2 Provide formative tasks involving understanding and application more often than
rote memorization.
3 Give feedback to students providing information on how to close the gap between
where they are and the evaluation standards for which they are aiming.
5 Offer students an opportunity to close the gap between where they are and the
evaluation standards.
Ultimately, we want students to develop, with the help of their teachers, internal standards of quality that allow them
to self-assess. We want to help students gradually discard the idea that only teachers
can or should judge the quality of their work. The first step in doing this is to ensure
that students know what they are aiming for.
A number of school districts are taking this idea very seriously. They require
their teachers to write the learning goal for each lesson on the board at the front of
the class before the lesson starts. This practice alone, however, doesn’t give students
access to the elements that define a quality performance or the criteria that must be
met for the learning goal to be achieved. Teachers must do more to communicate these
often complex aspects of performance.
One way to communicate the elements required in a performance or assignment
is to provide several positive examples of the final products or outcomes, and then
work through them with the class using your scoring guide. A similar task involves
having the class work together to improve an anonymous and less-than-perfect sample
of an assignment. When designing the scoring guide for specific assignments related
to the learning goal, some teachers also have students take part in defining with them
what each level of performance looks like in concrete, clear language that students
provide. Processes such as these take time, but they pay off for both students and
teachers. One key benefit is that when students have a clear understanding of the
criteria used for evaluation, they see the evaluation process as fair.
Ethics Alert: When you use samples of student work, students must not recognize the au-
thors of these examples, or you are violating confidentiality. Make sure all identifying infor-
mation has been removed.
This semester, one of our students provided an example of the unfairness that
can result when scoring criteria are left ambiguous. In high school he had participated
in chorus for four years. The chorus teacher had one grading criterion called “par-
ticipation” related to her goal of enhancing students’ love of music. But she never
explained to the students what she expected from them. Every term this student
received a low participation grade and assumed the teacher didn’t like him and was
just partial to a few favorite students. Based on this assumption, he felt it would be
fruitless to ask what counted as good participation. This teacher missed an opportunity
to be perceived as fair and impartial by laying out specific criteria that would count
for participation related to her learning goal of creating lifelong music lovers. More
students are willing to rise to a challenge if they know what it takes to be successful.
When students learn to focus their efforts on the learning goals, another benefit
is increased autonomy. Students no longer need to rely solely on the teacher’s judgment
of their work, and they can start to develop confidence in their own understanding of
the definition of success they are aiming for. They can learn to see critiques of their
work as information about how to improve toward their own goal. Although the
teacher is still ultimately responsible for guiding students and providing feedback, the
power of the teacher as authority becomes shared with students.
Spending time on evaluation standards sometimes seems like a waste of time if
the teacher already believes the assignment and the standards for scoring it are con-
crete and clear. But don’t be too quick to make this decision on your own. In Table 4.2
you see a scoring guide that one high school teacher recently passed out to her class
for a culminating assignment on a unit related to analysis of newspaper articles. The
assignment required them, in a five-minute speech, to choose the two most important
articles they had read during the unit, to describe how they defined “most important,”
and to describe one similarity and one difference between the two articles. She was
ready to move on when hands started going up. After all of their previous discussions,
she was surprised that the students wanted to know how to define “important.” She
turned the question back to them, and the class had a useful 10-minute discussion on
the variety of definitions of important, and several ways importance could be deter-
mined. She reported that students were intrigued with the range of possibilities and
came up with quite varied presentations as a result of this discussion.
TA B L E 4 . 2
Rubric for Grading Brief Oral Presentation
Scoring for Grading Brief Presentation (points: 0, 0.5, 1)
Rote facts are easy for students to produce on demand, and they are not usually challenging or difficult to master. One
of our students had a teacher who gave back a multiple-choice exam so students could
see right and wrong answers, and called this formative assessment. But true formative
assessment requires spending time in dialogue between teachers and students on the
gray areas (like what does “important” really mean when you are choosing an impor-
tant article?), not the black and white ones. With formative assessment, teachers need
to work with knowledge and skills that take practice and that are less accessible and
more elusive to students, not simply cut-and-dried facts.
Formative assessment tasks can range from informal questions and discussions
during class time to more complex assignments, such as term papers. Knowing when
to use a particular strategy to solve a problem and when not to, learning what types of
writing are persuasive and why, and understanding the difference between good and
bad composition in a painting are just a few examples. In fact, almost any assignment
can be used for formative assessment if it includes the teacher actions in Table 4.1. Some
examples of typical formative assessment tasks used by teachers appear in Table 4.3. In
the following subsections of the chapter we explore several kinds of tasks for formative
assessment in more detail.
TA B L E 4 . 3
Examples of Formative Assessment Tasks
• Oral questions asking for student reasoning behind a position on a political issue.
• Assign homework asking students to write three examples of persuasive techniques discussed in class that they see in television ads that evening.
• Informal observations of quality of student gymnastic routines.
• In theater class, have students observe and critique a video of their most recent public performance.
• Ungraded quiz covering that day’s biology lecture on organelles.
• As students solve math problems, ask them to explain how they got their answer and why it is a reasonable answer.
• Authors’ chair where students share their writing and receive comments from students and teacher.
• Band students record themselves individually playing a piece and have themselves and a peer look for two musical strengths and target one area for improvement.
• Quick writes, or brief written answers to questions at the end of class such as, “What was the most important idea you learned today?” or “Describe which aspects of today’s lesson were particularly difficult and list questions you still have.”
• In an art class, have students write a brief reflection about a work in progress or use a scoring guide to analyze aspects of the piece they want to improve.
• In ESL class or a class with English language learners, audio record a conversation designed to reach a decision or solve a problem. Have students in the group evaluate the performance and discuss communication pitfalls.
• Hold individual conferences with students to check progress on a major paper or project. (See the Tracie Clinton example in Chapter 1.)
B O X 4 . 1 S A M P L E Q U E S T I O N S
T O E N C O U R A G E H I G H E R
L E V E L T H I N K I N G
Quality Questions
Comprehension (Understand):
TA B L E 4 . 4
Rules for Asking Questions That Enhance Formative Assessment
Questioning Rules Example
It can feel uncomfortable to let the room stay quiet. But “wait time” (usually three seconds is enough) has been shown to improve
student answers. Another way to allow some thinking time is to have your students
jot a brief answer on paper and then discuss it with a partner before sharing with a
larger group. This procedure is often termed “Think, pair, share.”
TA B L E 4 . 5
Strategies for “Bumping Up” Your Questions from the Basic Knowledge
Level When Working on Defining a New Concept
Avoid (knowledge-level example): “What does listless mean?”
Strategy: Ask students to provide examples of concepts from their own experience rather than definitions.
Instead (higher-level alternative): “Describe a time when you felt listless.”

Avoid (knowledge-level example): “Define equity.”
Strategy: Ask students to apply the concept to something they have seen or read recently.
Instead (higher-level alternative): “Where have you seen equity demonstrated in current events you have read about or seen on TV?”

Avoid (knowledge-level example): “Describe osmosis.”
Strategy: Ask students how they would explain this concept to a younger student.
Instead (higher-level alternative): “How could you use a visual concrete method to explain osmosis to a first grader?”
One common method that teachers use to systematically gather answers to ques-
tions from the whole class is individual white boards or chalk boards. You can use
“The In and Out Game” or the “Always, Sometimes, Never Game” described in Chap-
ter 3 as formative assessment in the midst of a new lesson and have the students write
their answer on their board, then hold them up so only you can see them. You can do
a quick check of the answers to get a sense of the level of understanding across the
whole class. Even our college students have enjoyed using white boards and have told
us that the boards keep them involved and give them an opportunity to experiment
with answers without fear of being wrong in front of their peers.
Quick Write
A brief written response to a question or probe.

Quick Writes A quick write is a brief written response to a question or probe. Quick writes have been developed in the context of science units as a technique for gauging change in student understanding over time, allowing teachers to meet the learning
needs of students, and obtaining valuable information for instructional changes where
necessary (Bass, 2003; Green et al., 2007). Quick writes have also been used in many
other learning contexts. Many teachers use a quick write at the end of class to ask
students to summarize the key points of the lesson that day or to describe what ques-
tions they still have. Quick writes are easy and flexible—they can be adapted to almost
any content you work with and can address any area you might want to probe with
your students. When consolidating new information, quick writes have the advantage
of requiring students to develop their own representations of content rather than
checking a right answer, filling in a blank, or memorizing what the teacher has told
them. They also provide opportunities for teacher feedback that can be easily differ-
entiated according to the needs of the students.
Collaborative Quizzes Quizzes are a staple of most classrooms, and they can be used
for formative assessment. One variation on quizzes that we recently read about is called
the collaborative quiz (Rao et al., 2002). Students take a quiz in a small group where
they must agree on correct answers and report back to the class. This process can
develop student reasoning skills, promote useful discussion over disagreements about
answers, and help students fill in gaps in knowledge. To be effective, you must ensure
that the discussion is structured so that students who have wrong answers will not be
stigmatized.
Self- and Peer Assessments Perhaps the best way for students to learn from formative
assessment is to enlist them to do it for themselves. Getting students in the habit of
reflecting about their work, rather than just doing it and turning it in, helps them begin
to assume a more active role in applying the criteria for success. An active role increases
student willingness to take more general responsibility for learning. You are encourag-
ing them to take a metacognitive step and make observations about their work, then
judge their work against what is expected. Eventually they develop the capacity to set
standards and goals for their own work. Table 4.6 provides sample questions that stu-
dents can ask themselves to start an internal reflective dialogue.
Students need instruction in the skill of self-assessment because most students
do not automatically know how to apply evaluation standards to their own work.
Remember that evaluation skills in Bloom’s taxonomy are one of the most difficult
cognitive levels to master. Students must know what is central and what is peripheral.
They must know the vocabulary that describes the different levels of quality so they
can see the differences between success and failure. Students also need feedback on their judgments so that those judgments gradually become more accurate.
To assist students in this process, Heidi Andrade (2008) suggests having students
use different colored pencils to underline important elements in the scoring guide and
then use the same color to note evidence of each element in their paper. Practice with
assignments from an earlier class (with all identification removed) can also be helpful.
At first you might find that lower-achieving students tend to inflate their ratings of their performance, and higher-achieving students tend to undervalue their work. But practice across time
helps students develop more accurate self-assessments.
Another way that students can be involved in formative assessment is through
peer assessment. Students can profit from examining others’ work. It gives them
practice applying the evaluation standards and ultimately will give them more
TA B L E 4 . 6
Sample Questions for Self-Assessment Reflection
• What goals do I want to accomplish with this assignment?
• Do I really understand the difference between a utopia and a dystopia?
• Am I executing my dance routine with proper alignment and balance?
• How could I improve this paragraph to make it more persuasive?
• Am I communicating what I want to with my performance of this musical piece?
• How could I apply this new concept to my own life?
• What skill should I work on next?
• What strategies have I used successfully before that might work with this new word problem?
• How will I be able to tell when I make a mistake?
• How would I summarize the paragraph I just read?
• Does my answer cover all parts of the question?
• What are ways I can improve the next time I do something like this?
• Does this answer make sense for this problem?
• Do I understand and have I followed all directions?
insight into their own work. The key is to make sure that they know the goal is to
offer constructive suggestions for improvement. As in self-assessment, students
must be trained to understand the learning goals and the evaluation standards. They
must know what to look for in the assignments they are examining. More specifi-
cally, they must learn the evaluative standards that are important to a task. This
topic will be further addressed in Chapter 9 when we discuss performance tasks
and scoring guides.
Peer assessment also requires training in interpersonal sensitivity. We know one
elementary teacher who spends two weeks every year modeling and coaching appro-
priate ways to offer feedback and suggestions to peers. Only then does she let her
students engage in peer assessment using a simple scoring guide to examine stories
their classmates are working on. She also structures peer-assessment activities so that
students discuss strengths as well as ways to improve. She asks students to describe
two “pats on the back” and one suggestion each time they peer assess. You as a teacher
must develop a positive climate in your classroom where students are honestly work-
ing together to improve the quality of each other’s work in constructive ways. Students
must feel comfortable and supported by the teacher and each other.
The hard work required in developing constructive peer and self assessment
practices in your classroom will pay off. Students become more familiar with the learn-
ing goals and evaluation standards when they are asked to use them in judging their
own and others’ work. They develop more autonomy and independence as well as
confidence in their judgment. They practice sophisticated high-level thinking skills. In
addition, students receive more input on their work from a variety of points of view
and learn to consider alternative perspectives seriously. Often students use more basic
language in their interactions than the teacher does, which may help promote new
TA B L E 4 . 7
Homework Strategies for Formative Assessment
1. Explain why the homework assignment is valuable in terms of reaching the learning goal.
2. Plan homework assignments that reflect the range of student needs related to the
learning goal.
3. Focus on quality and not quantity, and vary the types of assignments.
6. Provide prompt feedback on how to close the gap between where students are and what
they are aiming for.
understanding. Eventually this process allows students to develop the capacity to set
standards and goals for their own work.
Homework
Homework can be a controversial subject because it is not always used wisely. Table 4.7
offers suggestions for making homework most effective for formative assessment pur-
poses. In keeping with the key elements of formative assessment, you must first make
sure students understand what the homework assignment is meant to accomplish in
terms of the learning goal. Before you assign any homework, you want to be sure it has
a clear purpose that the students understand. Students need to believe that the home-
work is valuable and will lead to academic improvement. As you may recall from your
own personal experience, many students dislike homework and experience negative
feelings while they are doing it. They haven’t internalized the conviction of some adults
that all homework is good for building character and developing a sense of responsibil-
ity. If students can understand directly the value and purpose of a homework assign-
ment, negative feelings are less likely to surface about homework.
Next, you should assign homework that addresses the specific needs of your
students. This usually requires flexible differentiation. For students who need practice
with a skill, you can use homework to reinforce what has been taught. For students
who need to be stretched to apply a skill to a new situation, you may focus on appli-
cations. Tanis Bryan and her colleagues (2001) also point out the importance of taking
into account differing homework completion rates for students with disabilities, who
can take up to eight times as long to complete homework as other students.
In addition, some students may require cues and prompts for completing an assign-
ment. For example, you might develop a checklist for an assignment where students can
check off each step as they complete it. You might also set up homework-checking groups that work together and graph their progress across time. Finally, understanding students’ home situations can help you tailor assignments to them. For example, you must consider the time available for homework for a student in charge of younger siblings after school or for students who must work to help support their family.
In research studies, students have reported that their homework is usually boring and routine. They often are required to do the same tedious tasks (e.g., answer chapter
questions, do 10 math problems from the text, write definitions for the words of the
week) over and over. Homework assignments that are brief but valuable will enhance
interest in school and help promote mastery goals. The quality of an assignment is
much more important than the quantity. Particularly useful are assignments demon-
strating that students are able to apply concepts learned in class to situations in the
real world. For example, in a unit on persuasive writing, students could look for per-
suasive strategies used in articles in a newspaper or on a website. In a unit on under-
standing proportions in math, you could have students measure the dimensions of one
piece of furniture in a room at home and construct a drawing to scale of the room
and that object.
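To make the proportional reasoning concrete (the numbers below are our own illustration, not from the text), a student who chooses a scale of 1 inch = 2 feet would convert every measured dimension by the same ratio:

\[
\text{drawing length} = \text{actual length} \times \frac{1\ \text{inch}}{2\ \text{feet}}, \qquad
12\ \text{ft} \times 10\ \text{ft room} \;\rightarrow\; 6\ \text{in} \times 5\ \text{in}, \qquad
6\ \text{ft sofa} \;\rightarrow\; 3\ \text{in}.
\]

Checking that every object in the drawing uses the same ratio is the kind of real-world application such an assignment is meant to elicit.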
Use of students’ family and cultural background can help you develop homework
assignments that are meaningful to students. When you have assignments related to
family, you should be aware of the diversity in your students’ lives and point out that
by family you mean those people whom your students consider as family. For social
studies and history units, you can ask students to find out about their family’s experi-
ence with and connection to historical events. A student we know in a history class
recently surprised his teacher with a battered bronze bust of Hitler that his grandfather
had found in a German factory at the end of World War II. These family experiences
can be compared and contrasted with other information about the events.
Family members also often have skills and knowledge that can help students
move toward mastering learning goals through activities at home. Tapping such infor-
mation, especially when working with students from culturally marginalized groups,
can help students connect the school and home cultures and can help them feel that
their home culture is valued at school. For example, family traditions related to music
and art, business and trade, or farming can be applied to units in science, math, social
studies, or the arts.
One problem that can arise with this approach, if you are not careful, is that
students learn a lot about parts of the topic that are interesting to them, but they don’t
make the needed connections to the learning goals. Their understanding remains
superficial. Your job is to make sure you always connect your learning activities to
mastery of the skills and knowledge your students need. Your goal is not simply to
keep your students entertained.
The last thing to remember about homework is that assigning it as a punishment
is a guaranteed way to discourage motivation and learning. All of us remember teach-
ers saying things like, “All right, class, since you can’t settle down, I am doubling your
homework tonight and you must do all the problems instead of just the even ones.”
Please make a quiet pledge to yourself right now never to do this, no matter how much
your students annoy you on a bad day. Too often in the past, teachers have assumed
that punishment or the threat of punishment works best to get students to comply with
their requirements. Using homework as a punishment sends the message that learning
activities are negative and something to avoid—not meaningful, and certainly not
enjoyable. Imagine our loss if our favorite authors—Stephen King, Emily Dickinson,
Toni Morrison, or Joan Didion—had perceived writing as a form of punishment
because of a teacher’s misuse of assessment.
Ethics Alert: Behavior problems and assessment practices should be kept separate. The
purpose of assessment is for us to understand what the student understands, not to manage
behavior through punishment.
TA B L E 4 . 8
Characteristics of Useful Feedback
Information on the assignment’s knowledge/understanding (WHAT)
Questions it answers:
• What level of knowledge/understanding related to the learning goals did the student demonstrate, and what needs work?
• What actions can the student take to improve that knowledge/understanding?
• In what ways does this assignment show improvement over previous assignments?
Useful examples:
• “Your understanding of musical forms needs tweaking, especially the difference between sonatas and symphonies.”
• “Review the musical selections from Lesson 3 in the listening lab.”
• “You have now clearly mastered an understanding of the causes of World War II.”

Information on the assignment’s skills/strategies/procedures (HOW)
Questions it answers:
• What skills/strategies/procedures related to the learning goals did the student use well, and which need more work?
• What actions can the student take to improve that skill/strategy/procedure?
• What kind of improvement has occurred in progress toward learning goals? What is the reason for the improvement?
• What strategies should be continued? Changed?
• Is there evidence of self-monitoring and self-correction?
• Is there evidence of improvement in self-monitoring? Self-correction?
Useful examples:
• “You need to work on seeing more than one perspective on an issue.”
• “I suggest you may want to look at your argument from your opponent’s point of view.”
• “You are making progress in your drawing because you are now incorporating balance into your composition.”
• “You did a good job keeping a steady beat in that piece. Keep it up.”
• “I like how you noticed that you forgot to regroup in the 100’s column and went back to fix it. You didn’t do that last week, so you are making progress.”
TA B L E 4 . 9
Characteristics of Feedback That Are Not Useful
Type Questions It Answers Not Useful Examples
versus potentially harmful kinds of oral and written feedback. Notice that Table 4.8
addresses both knowledge/understanding (WHAT) and skills/strategies/procedures
(HOW) previously seen in Chapter 2 for designing learning goals and in Chapter 3
for designing diagnostic assessments.
In looking at the differences between Tables 4.8 and 4.9, you can see that useful
feedback informs, reminds, questions, suggests, and describes. It gives students some-
thing to think about and helps them know what to do next in relation to both the
knowledge/understanding and skills/strategies/procedures components of the learning
goals. Poor feedback, on the other hand, offers judgments about the work or the stu-
dent and not much else. Useful feedback helps students feel they are making discover-
ies that help them grow. Poor feedback makes them think they have made mistakes
that reflect low ability or other negative character traits. As early as second grade,
children can sense the difference between feedback that is descriptive or informational
and feedback that is more evaluative. When students feel judged, their motivation and
effort usually suffer. Often in the past, when the sorting function of schools was most prominent, teachers emphasized the evaluative function of feedback over the informative function. But now, as we all work to help all children learn as much
as they can, informative feedback must be our goal. We will now look at each of the
four kinds of feedback in Tables 4.8 and 4.9 in more detail.
Your feedback should also address progress you have seen since the last assignment to
help students make connections across assignments. Focusing explicitly on progress
also provides encouragement that effort is making a difference in their learning,
promoting effort optimism.
Different students will need different kinds of feedback, and this is an important opportunity to differentiate your instruction for them individually. You must make sure students don’t
perceive the gap between where they are and where they need to be as too large,
or they will find it unattainable and not worth the effort. Similarly, if you give stu-
dents too many things to improve at once, you will overload them and they will get
discouraged. Usually you will want to pick out only one or two areas for improve-
ment at a time.
Telling students which criteria your feedback will focus on narrows the scope of your formative assessment and will let the students know that all other aspects of quality are up to them. Because formative assessment can increase a teacher’s workload, you must set explicit limits.
When assessment results show that students are not learning, effective teachers take responsibility to make an instructional change. This approach leads to what we often call a “Don’t blame the kid first” philosophy of teaching. Teachers, as professionals, have an obligation to look to themselves for solutions to classroom problems before they write students off as unable to learn.
TA B L E 4 . 1 0
Accommodation Considerations for Multiple Means of Engagement
for Formative Assessments
Issue: Difficulty with fine motor skills
• Consider choice in terms of fine motor limits.
• Provide scaffolding related to skill level.

Issue: Literacy skills below those of typical peers (e.g., learning disability)
• Provide opportunities for small-group discussions.
• Focus early formative assessments on ensuring that specific vocabulary of the discipline is learned.
• Consider interests, relevance, choice in terms of literacy skill levels.
• Provide scaffolding related to literacy skill levels (e.g., structure of tasks, language demands).
expressions to fit the situation” (p. 27). Table 4.10 summarizes accommodation consid-
erations for formative assessments related to a sample of needs your students may have.
Also remember our point in Chapter 3 that many students, not just students with these
specific needs, will benefit from your considering such accommodations.
B O X 4 . 2 M A T H J O U R N A L P R O M P T S F O R
U N I T O N S O L V I N G M A T H P R O B L E M S
R E L A T E D T O P R O P O R T I O N S
Journal Prompt 1
A tree casts a shadow of 15 meters when a 2-meter post nearby casts a shadow of 3 meters. Find the height of the tree.

Journal Prompt (partial)
. . . already have a scale model of your room showing 1 foot = 1.5 inch. The space where you want to put the table is 3 inches by 4.5 inches on your scale model. Should you buy it?
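As a quick check of the mathematics behind these prompts (a worked example of our own, not part of the original box), the tree’s height in Prompt 1 follows from setting up equivalent ratios, and the scale-model prompt converts inches on the model back to feet:

\[
\frac{h}{15\ \text{m}} = \frac{2\ \text{m}}{3\ \text{m}} \;\Rightarrow\; h = 15 \times \frac{2}{3} = 10\ \text{m};
\qquad
\frac{3\ \text{in}}{1.5\ \text{in/ft}} = 2\ \text{ft}, \quad \frac{4.5\ \text{in}}{1.5\ \text{in/ft}} = 3\ \text{ft}.
\]

So the space on the model corresponds to a 2-foot by 3-foot area, and whether to buy the table depends on whether its actual footprint fits within that area.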
Her approach involves designing three equally difficult journal prompts that
require students to analyze a problem, solve it, and document how they know their
answer is reasonable. Each prompt addresses problems incorporating geometric
concepts and similar and congruent figures, the mathematical content to be mas-
tered in this unit. Thus the prompts constitute one measure of the skills and knowl-
edge students should be able to demonstrate when they have mastered the learning
goals.
She plans to use the first prompt as a diagnostic assessment and springboard for
class discussion at the beginning of the unit. The second prompt serves as a formative
assessment involving individual work and class discussion in the middle of the unit to
check on student understanding and progress toward the learning goals. The final
prompt will be given as a review before Lynn’s summative assessment for the unit to
determine what remaining aspects of the goals may need additional work. An example
of the three prompts is displayed in Box 4.2.
To keep track of student progress across time, Lynn uses the same scoring
guide each time to score student responses. The scoring guide is shown in Table 4.11.
After the last prompt, most students should have checks in the column farthest to
the right for each checklist item because students should reach mastery on each ele-
ment by that time.
In addition, Lynn can examine the pattern of checks for all students in her class
when she examines the students’ journals after these prompts. If she notices, for
example, that certain mathematical terms aren’t being used correctly by a group of
students, she can design a mini-lesson for them. If some students are getting stuck
on a specific procedure, she can highlight that in upcoming instruction. The scores
on the checklist, when scanned for the whole class, are an important source of infor-
mation for designing feedback and later instruction to ensure all students meet the
learning goal.
Lynn’s example also illustrates several important points we have made about
formative assessment. First, her journal prompts are aligned with all three of her learning
TA B L E 4 . 1 1
Scoring Guide for Scoring Math Journal Prompt
Each element is rated in one of three columns: Needs Further Instruction and Guidance (list), Minor Assistance Needed to Reach Understanding (list), or Mastery of This Element of the Problem.

Elements rated:
• Diagram Is Accurate
• Explanation Demonstrates a Thorough Understanding of the Problems
• Follows Appropriate Procedures to Solve the Problem
• Uses Appropriate Units of Measurement
• Uses Proper Mathematical Terminology
• Documents Check on Reasonableness of Answer
goals. When students respond to these prompts correctly, they will provide one kind
of evidence that they have mastered the content of the learning goals. In addition,
Lynn’s prompts require both types of content knowledge discussed in Chapter 3—
skills/strategies/procedures and knowledge/understanding. These tasks are consistent
with learning goals that require skills to do certain types of problems as well as a clear
understanding of the concepts involved.
Math teachers often tell us that their students learn how to solve problems fairly
easily, but they have more difficulty knowing when to use a specific strategy or know-
ing the reasoning behind following a particular procedure. This kind of understanding,
in addition to the problem solution, is facilitated by the questions Lynn asks. Because
students reveal their level of understanding in their responses to these prompts as well
as in class discussion following them, Lynn can provide them with useful feedback on
how to close the gap between where they are and where they need to be to successfully
meet the learning goals. Finally, students have an opportunity after the formative feed-
back from Lynn to demonstrate their understanding and show progress in learning
during a summative assessment.
HELPFUL WEBSITES
http://www.caroltomlinson.com/
Carol Tomlinson, one of the foremost advocates of differentiated instruction, provides articles,
books, presentations, and other resources on her website.
http://www.tki.org.nz/r/assessment/one/formative_e.php
This website offers a variety of presentations from experts about formative assessment, including
a general introduction, how to offer quality feedback, teacher-student conversations that
enhance learning, and practical formative strategies.
7. Describe tasks you have seen teachers use that enable their students to close
the gap between where they are and where they need to be. What kind of
feedback do teachers offer these students before they take action to close
the gap?
8. Analyze the feedback you received on a paper or project in terms of the
characteristics of useful and poor feedback in Tables 4.8 and 4.9. Describe how
you might improve the feedback you received.
9. For the formative assessment task you designed for question 2, describe several
accommodations you might make for a student whose first language is not
English and for a student who has difficulty paying attention based on UDL
principles described in Chapters 3 and 4.
10. Describe accommodations you have seen teachers implement for one of the
groups in Table 4.10 related to multiple means of engagement. Were these
accommodations effective? Why or why not?
11. Analyze Lynn McCarter’s formative assessment approach in terms of how
adequately it addresses the key elements of formative assessment as teacher
actions (Table 4.1). How might you improve Lynn’s formative assessment?
REFERENCES
Andrade, H. 2008. Self-assessment through rubrics. Educational Leadership 65(4): 60–63.
Bass, K. 2003. Monitoring understanding in elementary hands-on science through short writing
exercises. Unpublished doctoral dissertation, University of Michigan.
Black, P., and D. Wiliam. 1998. Assessment and classroom learning. Assessment in Education:
Principles, Policy, and Practice 5(1): 7–74.
Brookhart, S. 2001. Successful students’ formative and summative uses of assessment information.
Assessment in Education 8: 153–169.
Bryan, T., K. Burstein, and J. Bryan. 2001. Students with learning disabilities: Homework problems
and promising practices. Educational Psychologist 36: 167–180.
CAST. 2008. Universal design for learning guidelines version 1.0. Wakefield, MA: Author. Retrieved
April 14 from http://www.cast.org/publications/UDLguidelines/UDL_Guidelines_v1.0-
Organizer.pdf.
Goldsmith, M. 2002. Try feedforward instead of feedback. Leader to Leader. Retrieved May 20,
2008, from http://www.marshallgoldsmithlibrary.com/cim/articles_print.php?aid=110.
Green, S., J. Smith, and E. K. Brown. 2007. Using quick writes as a classroom assessment tool:
Prospects and problems. Journal of Educational Research & Policy Studies 7(2): 38–52.
Guskey, T. 2007. Formative classroom assessment and Benjamin S. Bloom: Theory, research
and practice. In J. McMillan (ed.), Formative classroom assessment: Theory into practice,
pp. 63–78. New York: Teachers College Press.
Hattie, J., and H. Timperley. 2007. The power of feedback. Review of Educational Research 77:
81–112.
McMillan, J. 2007. Formative classroom assessment: Theory into practice. New York: Teachers
College Press.
Rao, S., H. Collins, and S. DiCarlo. 2002. Collaborative testing enhances student learning. Advances
in Physiology Education 26: 37–41.
Ruiz-Primo, M., M. Li, C. Ayala, and R. Shavelson. 2004. Evaluating students’ science notebooks
as an assessment tool. International Journal of Science Education 26: 1477–1506.
Shepard, L. A. 2000. The role of assessment in a learning culture. Educational Researcher 29(7):
4–14 (p. 10).
Shepard, L. A. 2005. Assessment. In L. Darling-Hammond and J. Bransford (eds.), Preparing
teachers for a changing world: What teachers should learn and be able to do. San Francisco:
Jossey-Bass.
Shepard, L. 2008. Formative assessment: Caveat Emptor. In C. A. Dwyer (ed.), The future of
assessment: Shaping teaching and learning, pp. 279–303. New York: Lawrence Erlbaum
Associates.
Shute, V. 2008. Focus on formative feedback. Review of Educational Research 78: 153–189.
Smith, E., and S. Gorard. 2005. ‘They don’t give us our marks’: The role of formative feedback in
student progress. Assessment in Education 12: 21–38.
Stiggins, R. 2008. Correcting “Errors of measurement” that sabotage student learning. In
C. A. Dwyer (ed.), The future of assessment: Shaping teaching and learning, pp. 229–243.
New York: Lawrence Erlbaum Associates.
Woelders, A. 2007. “It makes you think more when you watch things”: Scaffolding for historical
inquiry using film in the middle school classroom. Social Studies 98(4): 145–152.
ﱟﱟﱟﱟﱠﱟﱟﱟﱟ
CHAPTER 5
PROGRESS MONITORING:
ASSESSMENT AS A
MOTIVATIONAL TOOL
For goals to be effective, people need summary feedback that reveals progress in
relation to their goals.
–Edwin Locke and Gary Latham
INTRODUCTION
In this chapter we are tightening the assessment focus to help you find ways to
discover and use tangible evidence of individual and class-wide growth in learning.
One of the most powerful motivators we have seen in classrooms—for teachers as
well as students—is concrete evidence of student progress. Remember the six stu-
dents in Chapter 1 who set a goal for the specific number of words they would read
in one minute and then charted their progress on a graph for 10 weeks? As they
saw the numbers rise, these students not only increased their reading fluency, they
also became more interested in reading and took more books home after school.
The instructional gains of these students, as well as 35 years of research (e.g.,
Locke & Latham, 2002; Zimmerman, 2008), provide us with concrete evidence that
when learners focus on personal goals and track their own progress, they begin to
develop mastery goals. As you recall from Chapter 1, mastery goals involve learning
for the sake of mastery, not just working to perform well in the eyes of others.
Encouraging mastery goals can lead to more positive attitudes, higher motivation,
and increased effort and engagement. Mastery goals also promote independent
thinking and personal responsibility, important self-governing skills for democratic
participation (Urdan & Schoenfelder, 2006). This chapter will focus on several
approaches you can use to capture and showcase progress across time.
Commitment
Goals work best for progress monitoring when students are committed to them. One
way you can ensure commitment is by allowing students to choose the goals them-
selves so they are tailored to their needs and interests (see Table 5.1). As you recall
from Chapter 2, Jo Ellen Hertel, the middle-school computer teacher, found that
involving students in designing personal goals increased their interest in the unit.
When this is not possible, students are still likely to commit to a goal if they under-
stand its significance. For example, if a third-grade teacher explains that memorizing
multiplication tables will be useful when students want to quickly determine how much
money they will be paid for cat sitting when their neighbor is out of town (or for
similar relevant calculations), students are more willing to work toward that goal
because they see the value concretely.
Another factor that influences commitment is how official or “public” it is. Stu-
dents who sign a copy of their goals, for example, are more likely to take them seriously
than students who don’t have an official record. The other ingredient for commitment
is ensuring that students are challenged by the goal but can reach it. Do not waste time
monitoring progress toward goals that are easily achieved. You also want to make sure
that students don’t get discouraged by an impossible goal. For progress monitoring to
be motivating, you want to see steady gains across time.
TA B L E 5 . 1
Considerations for Effective Goals for Progress Monitoring
Consideration: Commitment to the goal increases performance toward the goal.
Suggestions:
• Allow students to help set their own goals.
• Help students understand why the goal is important to learning.
• Encourage public commitment to the goal.
• Ensure that the goal is challenging but attainable for the student.

Consideration: Specific and shorter-term goals work better than vague goals or urging to “Do your best” or “Try hard.”
Suggestions:
• If addressing learning goals for progress monitoring, design goals based on procedures described in Chapter 2.
• Work with students to personalize goals.
• Break long-range goals into subgoals that can be accomplished in a shorter period of time.
TA B L E 5 . 2
Mastery Monitoring Chart for Chemistry Unit
Chemistry Standards: Gases and Their Properties
Name:
Sample learning goal: Define standard temperature and pressure (STP).
Rating columns for each learning goal: Don’t Get It Yet*, Action**, Need More Practice*, Action**, Ready for Test*
*List evidence for your self-rating: quiz score, homework, success on review questions, etc.
**List what steps you will take to improve your preparedness for the test.
Adapted from Costa, A., & Kallick, B. 2004. Assessment strategies for self-directed learning. Thousand Oaks, CA: Corwin Press. p. 39. Reprinted with permission.
Requiring students to monitor their progress so carefully helps them develop the meta-
cognitive skills that increase their ability to evaluate themselves.
FORMATIVE TASKS AS THE FOUNDATION FOR MONITORING GROWTH
As we explained in Chapter 1, specific kinds of formative assessment strategies
promote mastery goals (see Figure 1.3). We added to that foundation in Chapter 4
by introducing you to the key elements of formative assessment (see Table 4.1). As
you recall, formative assessment involves assignments where mistakes are an expected
part of learning, and improvement is expected across time. Such assignments might
include turning in drafts of a paper or project for feedback before a final grade is
given, or self-assessing at each step in an art project. So, to notice progress over time,
you must use tasks designed so students can improve as they progress through them
(Table 5.3).
Your next step is to decide how to track this improvement over the course of the
unit of instruction and the school year. For example, to note student progress during
a unit on essay writing, you can use a checklist of steps that must be in place before
the final draft is complete. You can also rate each draft using the scoring guide that
TABLE 5.3
Steps Required for Examining Progress Over Time

Step 1: Set specific goal.
Examples:
• I will be able to catch a ball in my hands with my elbows bent and my arms in front of my body (P.E.).
• I will be able to write journal entries that show connections between the period we are studying and current events (Social Studies).
• I will be able to read 100 words per minute with fewer than three errors in third-grade stories.

Step 2: Use formative assessment: assignments where mistakes are an expected part of learning and improvement is expected across time.
Examples:
• Use videos of students catching a ball and use scoring guide for self- and peer-assessment at two points during a unit (P.E.).
• Progression of journal reflections across the year incorporating teacher feedback (Social Studies).
• Students read aloud from an unfamiliar grade-level passage for a minute several times across the year (Elementary language arts).

Step 3: Choose a method to track improvement over time, representing gains visually.
Examples:
• Use scoring guide for catching a ball before, during, and after instruction (P.E.).
• On a personal chart, track changes in teacher feedback and scores on scoring guide for reflections across the year (Social Studies).
• Graph changes in number of words read correctly and miscues over time (Elementary language arts).
will be employed with the final draft. Both of these methods document progress as
students develop the essay.
The earlier education model discussed in Chapter 1, in which sorting and judg-
ing were the dominant influence, had a pervasive impact on the way assessment was
conceived and designed. Under that model, researchers and educators devoted much
time and effort to developing tests that would sort students by determining their rela-
tive position in a larger group.
Rather than simply comparing the individual student to others, the newer assess-
ment approach looks instead for individual growth. You compare the individual student
to him- or herself across time in relation to the learning goal. This approach is espe-
cially relevant for classroom teachers because their primary concern is growth in stu-
dents’ learning. Because this approach is newer, it has less historical research and
accumulated standard practice associated with it. Because we believe this newer
approach is the wave of the future, we want to use this chapter to describe practices
found to be effective that you can adapt in your own classroom.
We believe focusing on individual growth across time, especially with a visual
representation, is a key strategy for enabling you to close the achievement gap for
students (often poor and minority students) who have consistently performed lower
than other demographic groups in academic achievement and graduation rates in the
United States. Keeping track of individual growth ensures that you know when a stu-
dent is falling behind. When you see someone lagging, you can do something about
it, such as changing instructional strategies, finding time for more practice, or making
accommodations.
The best teachers engage in some kind of progress monitoring as the basis for
differentiating their instruction for learners at different levels (Moon, 2005). If
you have no way to check on who is failing to accomplish key tasks necessary to
master the learning goals, you won’t have the information you need to help strug-
gling students catch up to their peers. You are trying to teach blindfolded. Progress
monitoring focuses your attention squarely on your efforts and your students’ efforts
to master the learning goals. You are conveying to students that you won’t give up
on them and you will do everything you can to help them master the learning
goals. This understanding helps you foster high expectations for your students. You
signal to them that you expect them all to keep working and to keep learning more.
And they see this work pay off as their graph or checklist illustrates tangible
improvement.
Systematic progress monitoring motivates students (Bandura, 1997; Zimmerman,
2008). Children we have worked with look forward to noting their accomplishments
on checklists or graphs. Some of you probably had a teacher who had you check
off each multiplication table as you learned it, and you may have felt encouraged
to see the growing number of checks over time. The same dynamic works when you
make a “To Do” list and you cross off items as you finish them. Many students can
feel overwhelmed by all they have to accomplish, so some method of monitoring
lets them see that they are progressing and learning new things. Even asking stu-
dents to set a goal and keep track of homework completion can enhance achieve-
ment (Trammel et al., 1994), probably because it increases the awareness necessary
for developing self-control (Zimmerman, 2002). Even more important, evidence is
accumulating that frequent progress monitoring and multiple opportunities for
improvement are a common characteristic of high-achievement schools, even in the
face of significant obstacles such as student bodies with high levels of poverty and
diversity (Reeves, 2004). The advantages of systematic progress monitoring are sum-
marized in Table 5.4.
TABLE 5.4
Advantages of Systematic Progress Monitoring

Advantages for Teachers:
• Provides clear signal when student falls behind or gets ahead of others
• Provides foundation for differentiating instruction
• Allows tangible picture of growth useful for communicating with students, parents, and administrators
• Focuses teacher effort and attention on students mastering learning goals

Advantages for Students:
• Promotes mastery goals and a sense of responsibility for own learning, especially when students keep track of their own progress
• Provides opportunity for setting personal goals
• Allows students to see tangible evidence of their own growth and develop effort optimism
• Focuses student effort and attention on mastering learning goals
Mastery Monitoring
Mastery monitoring: a method of monitoring progress by tracking student completion of different tasks that, when completed, indicate achievement of the learning goal.
The first approach is termed mastery monitoring. It requires three steps: (1) choosing a learning goal or standard, (2) designing several tasks that if completed satisfactorily indicate mastery of the standard, and (3) documenting when each student completes each task. Table 5.2, which we have discussed in terms of goals for a chemistry unit, depicts a mastery monitoring approach. Similarly, in his elementary
P.E. program at Orchard school, Jim Ross expects students to master the learning
goal of executing game strategies and skills (see Table 5.5) by age 12. He has broken
down this learning goal into several observable tasks, such as moves safely within
boundaries and receives an object from a teammate in a game. As you can see on the
checklist, he can keep track of each student’s accomplishment of each task that con-
tributes to the learning goal. The shaded column represents mastery of the learning
goal. The previous skills, when mastered, should lead to this proficiency level. Note
that more advanced skills are also presented in the table to challenge those who
exceed mastery of the learning goal.
For mastery monitoring to make sense, the tasks you choose to monitor should
each contribute new understanding and skills leading toward mastery of the learning
goal. For example, you can see how overtakes an escaping player and able to escape a
defender both play a role in executing game strategies and skills.
TABLE 5.5
[Checklist recording the date each student masters each task contributing to the learning goal. Sample row: Student 1: Mastered 9-15, Mastered 9-15, Mastered 9-15, Mastered 10-15, Mastered 11-30.]
If you want your students to use a graph to illustrate mastery monitoring for a
specific learning goal, it could look like the graph in Figure 5.1. The number of tasks
successfully accomplished is listed cumulatively from 1 to 7 on the left side of the
graph, and the expected completion dates are noted across the bottom. You can see
that a student who is on target for completing the tasks that lead to mastery of a learn-
ing goal is depicted in Figure 5.1. The line shows a steady increase in the number of
tasks completed over time. Figure 5.2 shows a student who is getting behind and may
need assistance. For this student, the graph depicts accomplishment of tasks 1–3, but
the student does not master task 4, so the line flattens out between dates 3 and 4.
Figure 5.3 depicts a student who has forged ahead quickly to master the learning goal
[Figure 5.1: mastery monitoring chart showing number of tasks completed (vertical axis) plotted against time (Date 1 through Date 7) for a student who is on target.]
[Figure 5.2: number of tasks completed (vertical axis) plotted against time (Date 1 through Date 6).]
FIGURE 5.2 Example of a Mastery Monitoring Chart for a Student Who Has
Not Successfully Completed Task 4 by the Expected Date and May Require
Additional Assistance
[Figure 5.3: number of tasks completed (vertical axis) plotted against time (Date 1 through Date 6).]
FIGURE 5.3 Example of a Mastery Monitoring Chart for a Student Who has
Completed All Seven Tasks Before Due Date and Requires Enrichment
and could benefit from enrichment activities. The graph shows the student successfully
completed tasks 2 and 3 by the second date and all of the tasks by the third date.
Ethics Alert: To protect the confidentiality of a student’s performances, graphs should not
be posted on classroom walls or bulletin boards. Instead, each student should have a folder
or portfolio for keeping his or her mastery graphs.
The amount of time and organizational skills required for such progress monitor-
ing can be daunting. To reduce teacher workload, we suggest that students from second
or third grade and above can often be in charge of monitoring their own progress,
with the teacher prompting graph updates at appropriate intervals. Checking these
graphs periodically can give you and your students a snapshot of how they are pro-
gressing on accomplishing a major learning goal that spans many tasks and requires
considerable time. Computer spreadsheet programs can also help organize data with
students in rows and specific skills in columns. Such a setup allows for reviewing
progress quickly and efficiently.
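If you keep such records electronically, the same layout can be built in a few lines of code. The sketch below is a minimal illustration in Python (the student and task names are hypothetical, and the book does not prescribe any particular tool); it stores the date each task was mastered and produces the cumulative counts you would plot on a mastery monitoring graph.

from datetime import date

# Hypothetical mastery-monitoring records: one row per student, one column per task,
# holding the date the task was mastered (None = not yet mastered).
records = {
    "Student 1": {"Task 1": date(2009, 9, 15), "Task 2": date(2009, 9, 15),
                  "Task 3": date(2009, 10, 15), "Task 4": None},
    "Student 2": {"Task 1": date(2009, 9, 20), "Task 2": None,
                  "Task 3": None, "Task 4": None},
}

def tasks_completed_by(student, as_of):
    """Cumulative number of tasks a student has mastered by a given check date."""
    return sum(1 for d in records[student].values() if d is not None and d <= as_of)

check_dates = [date(2009, 9, 30), date(2009, 10, 31), date(2009, 11, 30)]
for student in records:
    # These cumulative counts are the points you would plot against the check dates.
    print(student, [tasks_completed_by(student, d) for d in check_dates])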
You should also be selective about which learning goals you choose for progress
monitoring, and you may want to start with a small group of students first. Monitor-
ing progress in this way can provide important insights for your decisions about what,
when, and how to teach. Visual displays such as graphs can also clearly communicate
to other interested parties such as parents and administrators. To summarize mastery
monitoring procedures, Table 5.6 provides the steps in developing a mastery monitor-
ing system for progress monitoring.

General Outcome Measurement
General outcome measurement: a method of monitoring progress that uses several brief yet similar tasks (e.g., reading aloud in grade-level stories for one minute) that can indicate achievement of the learning goal.
The second method for individual progress monitoring that Deno describes is termed general outcome measurement. Instead of monitoring the achievement of several sequential tasks necessary to demonstrate mastery of a learning goal (as in
TABLE 5.6
Steps in Designing a Mastery Monitoring System for Progress Monitoring
1. Select an important learning goal for which you want to track progress.
2. Identify the knowledge/understandings and skills/strategies/procedures that lead to mastery of that learning goal.
3. Design a checklist for each student listing these knowledge/understandings and skills/strategies/procedures in a logical sequence.
4. Choose or design a sequence of assessment tasks that represent mastery of each of the knowledge/understanding and skills/strategies/procedures.
TABLE 5.7
Steps in Designing a General Outcome Measurement System
for Progress Monitoring
1. Choose an important achievement or “general outcome” that students should work
toward across a semester or school year that is consistent with your learning goals.
2. Choose or develop a brief, repeatable measure that validly represents the general
outcome in the content area, and collect or design a variety of equivalent versions.
3. Develop or choose a standard method of administration and scoring for your measure
so that any increase in scores can be attributed only to student growth.
4. Administer different versions of the measure at suitable intervals (e.g., once per month,
once per grading period).
5. Graph the results across time to track individual progress toward the general outcome.
TABLE 5.8
General Scoring Scale for Evaluating Multiple Tasks Within a
Learning Goal
Score Description of Place on Scale
2 No major errors or omissions regarding the simpler details and processes, but major
errors or omissions regarding the more complex ideas and processes.
1 With help, a partial understanding of some of the simpler details and processes
and some of the more complex ideas and processes.
Source: Marzano, R. 2006. Classroom assessment and grading that work. Alexandria, VA: Association for Supervision
and Curriculum Development. Copyright 2004 by Marzano & Associates. Reprinted with permission.
improvement, because students won’t have mastered most of the tasks (writing com-
plete sentences with subjects and verbs, etc.) involved. But gradually their paragraphs
would become more sophisticated as the students mastered more of the key elements
with practice.
The formative assessment assignment chosen by Lynn McCarter described in the
case study at the end of Chapter 4 offers an example of general outcome measurement.
As you may recall, she designed three equally difficult journal entry prompts that
addressed mastery of all three learning goals for her math unit. She uses a different
one at three different points during the unit to assess student progress.
Marzano (2006) suggests an approach similar to general outcome measurement.
He offers a 5-point scoring scale that can be used when evaluating a range of tasks
that could represent a learning goal (Table 5.8). Most student work can be rated from
0, which is characterized as “no understanding or skill demonstrated,” to 4, “in-depth
inferences and applications that go beyond what was taught,” in reference to a specific
standard or learning goal. Scores based on this scale will gradually improve as mastery
increases on the tasks across time, documenting learning gains.
For general outcome measurement progress monitoring, graphing of results is
useful and important, according to both Deno and Marzano. Deno suggests that
these graphs often look like height and weight charts that pediatricians use. You are
measuring a desired broad outcome that represents academic “health” as it increases
across time. Student performance on your graph starts out low but then improves as
instruction proceeds. Figure 5.4 illustrates a graph that tracks writing progress across
a school year for Brittany, a second-grader. For each instance represented by a point
on this graph, the teacher gave students a writing probe (e.g., “Yesterday a monkey
climbed in our classroom window and . . .”) and had them write for three minutes.
She then counted words spelled correctly (WSC) for each probe and plotted it on
this graph.
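If you wanted to automate part of this scoring, a rough sketch of the words-spelled-correctly count might look like the following. It rests on a big assumption: it checks each word against a fixed list of correctly spelled words, whereas a teacher scoring young children's writing makes judgment calls that a word list cannot capture.

import re

# Hypothetical reference list of correctly spelled words for this probe.
known_words = {"yesterday", "a", "monkey", "climbed", "in", "our", "classroom",
               "window", "and", "it", "jumped", "on", "my", "desk"}

def score_probe(sample):
    """Return (total words written, words spelled correctly) for one writing probe."""
    words = re.findall(r"[a-zA-Z']+", sample.lower())
    correct = sum(1 for w in words if w in known_words)
    return len(words), correct

tww, wsc = score_probe("Yesterday a monkey climbd in our classroom window and it jumped on my desk")
print("Total words written:", tww, "  Words spelled correctly:", wsc)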
You can see scores for Brittany’s very first writing sample from August and her
last writing sample from the following April in Figure 5.5. Notice that the line in
[Figure 5.4: words spelled correctly (vertical axis, 0 to 50) plotted for writing probes given from August through April.]
FIGURE 5.4 Example of General Outcome Measurement Graph for Paragraph Writing
FIGURE 5.5 Writing Samples from August and the Following April for Brittany
Figure 5.4 shows rising scores and slight declines, which are to be expected. The key
is that overall Brittany shows considerable progress between August and the following
April. And this progress can be clearly seen only because the teacher took the time
to work with Brittany to make this chart. As you look at these writing samples, think
about other ways you might gauge progress. Brittany’s teacher also counted total
words written (TWW) to note progress, although she did not include that measure
on the graph.
You might instead decide to score such writing samples with a scoring guide that
assigns points for each of the key elements (writing complete sentences with subjects
and verbs, using correct spelling and punctuation, writing topic sentences, developing
supporting details). Over time, as Brittany made progress, the points she would earn
on the scoring guide would increase. The key for general outcome measurement is that
you use the same type of task and the same method of scoring each time so you can
legitimately compare scores across time and get a clear picture of progress.
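Because every probe is scored the same way, plotting a general outcome measurement graph is straightforward. A minimal matplotlib sketch with made-up monthly scores (not Brittany's actual data) might look like this:

import matplotlib.pyplot as plt

# Hypothetical words-spelled-correctly scores from equivalent monthly writing probes.
months = ["Aug", "Sep", "Nov", "Dec", "Feb", "Mar", "Apr"]
wsc = [8, 12, 15, 14, 22, 27, 33]

plt.plot(months, wsc, marker="o")
plt.xlabel("Probe date")
plt.ylabel("Words spelled correctly (3-minute probe)")
plt.title("General outcome measurement graph for paragraph writing")
plt.show()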
BOX 5.1 TEACHER COMMENTS ON ADVANTAGES OF MONTHLY CBM PROGRESS MONITORING

Instructional influence. It reminds me to check on everybody. I think with a room full of kids it's sometimes easy to get slack about it, but you really can't. I can catch those middle of the road kids, you know, the average kids who don't get a lot of attention. I can supplement and find out who needs enrichment and who needs reteaching in certain areas. It's good for me to know where my children's strengths and weaknesses are. And I know for right now, vocabulary is a real weakness for my class.

Effects on parents. If I can show them that "other fourth graders are able to read the same passage and get at least 100 words correct and your child is reading 15," they can see the deficit. That's really important because that's how people nowadays understand facts. Some parents don't take my word for it. You've got to have data.

Impact of noticing progress. Sometimes it's easy to get discouraged and not think anything you are doing is working, so it is a good measure for me. Day-to-day stuff isn't as sensitive to growth. Even the lowest students make steady little gains with CBM. For instance, I have had one special education student for two years. When she came to me she started in the 30s and now she's in the 80s. It's a morale booster for both of us.
who are responsive to the intervention would continue to be monitored in the general
education classroom.
Because determination of responsiveness to intervention occurs in general educa-
tion rather than special education, all teachers should be familiar with this assessment
process. The potential benefits of RTI approaches include increased accountability for
all learners, reduced special education referrals, and decreased overidentification of
minority students for special education (National Joint Committee on Learning Dis-
abilities, 2005; Marston et al., 2003). Thus combining CBM with responsiveness to
intervention provides information to better serve students.
TABLE 5.9
A Comparison of Mastery Monitoring and General Outcome Measurement
[Columns: Focus for Progress Monitoring | Graphic Display | Content | Advantages | Concerns]
with CBM and art (see Box 5.2). Similar procedures can be developed in other con-
tent areas. The key is to ensure that the task represents the desired outcome, that it
can be completed quickly and efficiently, and that you can develop a number of ver-
sions that are equivalent.
Finding those appropriate repeatable measures that represent important general
outcomes can be difficult, and this is one major concern with general outcome mea-
surement. Mastery monitoring, on the other hand, can be flexibly applied to more
BOX 5.2 STUDENT REFLECTION ON CURRICULUM-BASED MEASUREMENT APPLICATIONS TO ART

I am thinking of ways to use the basic idea of CBM in art. One idea that comes to mind is that I could use it to track progress of students' drawing abilities over time for mastery of specific skills. I would set up a still life with two or three items and ask students to draw it and then assess specific skills, such as shadows, values, etc. If I wanted to track shading abilities, I could assess shading skills at various times throughout the year. I could vary the still life each time, but keep the same number of items so the drawing would be about the same level of difficulty each time. I would use the same rubric that focuses on shading each time and see how scores increase. It would be even nicer if I had some of the same students year after year, and then it would really give an accurate measurement of their progress. The only drawback I see is time constraints. I don't think three minutes would be appropriate; however, ten to fifteen minutes might work. I'm eager to try it out!
fields of study. Another concern with general outcome measurement has been the time
required, as it must be implemented in addition to ongoing assessment requirements
related to specific learning goals for units of instruction.
Systematically monitoring progress—whether using mastery monitoring or gen-
eral outcome measurement—has many advantages for both students and teachers,
described at the beginning of this chapter. We believe these advantages make system-
atic progress monitoring an important component of classroom assessment. Teachers
we know agree. For instance, Tracie Clinton, third-grade teacher at Cotton Belt Ele-
mentary School, uses both types of progress monitoring in her classroom and describes
in Box 5.3 what she sees as the advantages.
We urge you to begin progress monitoring by systematically choosing one learn-
ing goal or general outcome in your classroom, perhaps as part of an action research
project. Focus on a specific content goal and define your strategies for choosing assess-
ments, scoring them, and quantifying the outcomes to track across time. Your experi-
ences and your students’ reactions can help you decide how best to implement progress
monitoring to improve your students’ academic achievement. Recall from Chapter 1
that “Does this help my students learn?” is the overriding question for making these
or any other classroom decisions.
Frequency Distributions
Let’s say you just gave your first test in a class, and you are sitting at your desk with a
pile of graded papers with the following percent-correct scores for each student: 85, 20,
100, 90, 80, 10, 85, 60, 80, 95, 50, 75, 85, 70, 90, 75, 90, 85.
BOX 5.3 THIRD GRADE USES AND COMMENTS ON SYSTEMATIC PROGRESS MONITORING

On General Outcome Measurement with CBM:
"Using CBM written expression once a month does take time, but it is worth it in the long run. My students enjoy seeing their progress throughout the year as they are able to monitor their growth. The CBM results provide the students as well as myself with a sneak peek in the areas they need to improve in and the areas the students are doing well in. This information is also valuable to parents during parent conferences. I try to use as many student work samples as possible when talking with parents. The CBM is a great tool for this. Also, the graphs are useful for students because they are visual. One of the standards in third grade is being able to read charts and graphs. Reading CBM graphs is good practice for the students."

On Mastery Monitoring:
"We use several other tools to measure progress over time. I use a graph in Accelerated Reader to show progress over a nine-week period. The students place a sticker in their folder every time they pass a test. I also use a similar process when looking at multiplication tables. Students move a tag according to where they are on their tables. Also, as we move through the writing process on a long assignment, the students move their tag according to where they are in writing. These procedures help with motivation as we see steady growth."
How do you make sense of these scores? Did most students do well or poorly?
One of the first things you might do is reorder the tests from lowest to highest: 10, 20,
50, 60, 70, 75, 75, 80, 80, 85, 85, 85, 85, 90, 90, 90, 95, 100. Now you can quickly see
the lowest and highest scores attained on your test. The distance between these is
called the range. The range between scores in this example is 100 - 10 = 90 points.
Range: The distance between the lowest and highest scores attained on an assessment.
The range is useful to begin to get a sense of how the class did. A wide range, as in this case, suggests that at least one student (with a score of 10) has not mastered the goals, and that at least one student (with a score of 100) has mastered them. A narrow
range, if the top score is high, suggests that most students did well. For example, if
the range for a test is 10 and the top score is 95, then grades for that test range from
85 to 95—quite strong scores! A narrow range, if the top score is low, suggests that
most students did poorly. Consider, for example, a test with a range of 10 and a top
score of 70. In this instance the range of grades is 60–70, which means we still have
some teaching to do.
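The same calculation in code is a single line; here it is in Python for the set of scores above.

scores = [85, 20, 100, 90, 80, 10, 85, 60, 80, 95, 50, 75, 85, 70, 90, 75, 90, 85]
score_range = max(scores) - min(scores)  # 100 - 10 = 90 points
print(score_range)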
Because the range takes only the top and bottom score into account, you prob-
ably want to explore in a little more depth the information you get from the group of
scores. A frequency distribution should be your next step. A frequency distribution lists the number of students who attained each score ordered from highest to lowest scores.
Frequency distribution: A display of the number of students who attained each score, in order from lowest to highest.
Table 5.10, section A illustrates the frequency distribution for the set of scores with which we have been working. You can also group the scores into larger intervals
for a grouped frequency distribution, which is displayed in Table 5.10, section B. You
may recognize a grouped frequency distribution as similar to the ones that teachers
sometimes post for students to show the range of grades. We, of course, discourage
this practice of identifying students because it encourages students to compare them-
selves with others instead of focusing on their own score and progress.
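Both kinds of frequency distribution can be tallied automatically. The sketch below uses Python's collections.Counter; the 10-point intervals mirror section B of Table 5.10.

from collections import Counter

scores = [85, 20, 100, 90, 80, 10, 85, 60, 80, 95, 50, 75, 85, 70, 90, 75, 90, 85]

# Simple frequency distribution, listed from the highest score to the lowest.
freq = Counter(scores)
for score in sorted(freq, reverse=True):
    print(score, freq[score])

# Grouped frequency distribution using the intervals 0-9, 10-19, ..., 80-89, 90-100.
grouped = Counter(min(score // 10 * 10, 90) for score in scores)
for low in range(90, -10, -10):
    high = 100 if low == 90 else low + 9
    print(f"{low}-{high}", grouped.get(low, 0))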
Some teachers prefer a graph to a frequency distribution because it communi-
cates the information about the pattern of frequencies visually. In that case, you would design a frequency polygon, which is a line graph in which the frequencies for each score are plotted as points, and a line connects those points.
Frequency polygon: A line graph that plots the frequencies of each score.
The scores or score
TABLE 5.10
Frequency Distributions for a Set of 18 Test Scores

(A) Simple frequency distribution (Score: Frequency)
100: 1, 95: 1, 90: 3, 85: 4, 80: 2, 75: 2, 70: 1, 60: 1, 50: 1, 20: 1, 10: 1

(B) Grouped frequency distribution (Score interval: Frequency)
90–100: 5, 80–89: 6, 70–79: 3, 60–69: 1, 50–59: 1, 40–49: 0, 30–39: 0, 20–29: 1, 10–19: 1, 0–9: 0
intervals are listed along the bottom of the graph, and the frequencies (used for plot-
ting number of students at each score) are listed up the left side. See Figure 5.6 for a
frequency polygon using the set of scores in Table 5.10.
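Drawing the polygon follows directly from that description: put every possible score along the bottom, its frequency up the side, and connect the points. A matplotlib sketch for the same 18 scores:

import matplotlib.pyplot as plt
from collections import Counter

scores = [85, 20, 100, 90, 80, 10, 85, 60, 80, 95, 50, 75, 85, 70, 90, 75, 90, 85]
freq = Counter(scores)

possible_scores = list(range(0, 105, 5))           # 0, 5, 10, ..., 100 along the bottom
frequencies = [freq.get(s, 0) for s in possible_scores]

plt.plot(possible_scores, frequencies, marker="o")
plt.xlabel("Score")
plt.ylabel("Frequency")
plt.show()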
You can see from examining the frequency polygon in Figure 5.6 that just look-
ing at the range of scores (90 points) does not tell you enough about the pattern of
scores for your students. You have two students who attained very low scores, but you
have eleven students who attained a score of at least 80 percent, which you believe
reflects a fairly high level of mastery, given how you constructed your assessment. If
this had been a preassessment before a unit, you would probably want to redesign your
unit to address more challenging goals based on this pattern of scores. If this assess-
ment were at the midpoint of a unit, you would want to analyze what content the
low-scoring students found difficult and work with them to master it.
[Figure 5.6: frequency polygon with scores from 0 to 100 along the horizontal axis and frequency on the vertical axis.]
FIGURE 5.6 Frequency Polygon for the Set of 18 Test Scores in Table 5.10
When there is an even number of scores, the median is the value halfway between the middle two scores. In our example, we
have 18 scores, so the median would fall between the ninth and tenth scores. Since both
the ninth and tenth scores are 85, and there are no values between them, the median
would be 85. (If the middle two scores had been 75 and 85, however, the median would
be 80, or the value halfway between them.)
The measure of central tendency that takes into account all scores is the mean.
Mean: A measure of central tendency that is the average of all scores in a distribution.
This is the measure with which you may be the most familiar. The mean for the class is calculated by adding up every student's score, then dividing by the number of scores. In our example, 10 + 20 + 50 + 60 + 70 + 75 + 75 + 80 + 80 + 85 + 85 + 85 + 85 + 90 + 90 + 90 + 95 + 100 = 1325. When you divide this total by 18, the number of scores, the mean is 73.6. Because the mean is calculated using every score, it is used
most commonly as the measure of central tendency.
Notice that the mean score of 73.6 for our example is much lower than the
median score of 85. Scores that are extremely different from the others (in this case,
the low scores of 10 and 20) disproportionately influence the mean. In contrast, the
median is not influenced by extremely high or low scores (often called outliers) because its calculation relies only on the scores at the middle of the distribution.
Outliers: Extremely high or low scores that differ from typical scores.
Sometimes teachers use the median instead of the mean for summarizing scores if an extreme
score distorts the mean. When reporting measures of central tendency for certain
figures, such as average national income, experts often use the median rather than the
mean because the outliers who make billions of dollars a year would considerably alter
the figures and give an erroneous impression. We discuss other aspects of score dis-
tributions (e.g., standard deviation) in Chapter 11.
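You can verify both measures of central tendency for this distribution with Python's statistics module; the two low outliers are what drag the mean so far below the median.

import statistics

scores = [10, 20, 50, 60, 70, 75, 75, 80, 80, 85, 85, 85, 85, 90, 90, 90, 95, 100]

print(statistics.median(scores))           # 85.0: halfway between the 9th and 10th scores
print(round(statistics.mean(scores), 1))   # 73.6: pulled down by the outliers 10 and 20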
TABLE 5.11
Steps for Designing a Table That Communicates Information About
Student Performance
1. List data for each student in the column next to that student’s name (or student number,
when protecting confidentiality) with the scores in order from lowest to highest.
2. Give your table a title that communicates the content of the table.
3. At the top of each column, indicate what that list of scores represents. Include any
information needed to interpret a score (e.g., number correct, percent correct, number
of items).
4. As the last entry for each column, provide a measure of central tendency (mean or
median).
tool for class-wide data. The steps in designing a good table are listed in Table 5.11. We
want to walk you through these steps one by one so you see the rationale for each aspect
of table design. We have found in the past that many of our students do not realize how
important each element of a table is for communicating information effectively.
A table allows you to visually represent relationships between different pieces of
information. For the table to be useful, however, it needs to communicate what those
relationships are. For example, some of our teacher candidates, when looking at the
performance of their students, have designed a table that looks like this:
Eleanor 30 84
JaMarcus 10 66
Toby 10 50
Sam 20 60
LaKendra 40 95
Leah 50 90
William 40 84
Carol 50 90
LaTrellini 60 86
Buzz 40 45
We can see the relationship between each of the students and their scores, but we don’t
know what the scores represent, and we can’t easily discern any pattern in these scores.
The first step is to put the scores in some order. One common strategy is to put the scores
in order from lowest to highest. We can do this for the first column of scores:
JaMarcus 10 66
Toby 10 50
Sam 20 60
Eleanor 30 84
LaKendra 40 95
William 40 84
Buzz 40 45
Leah 50 90
Carol 50 90
LaTrellini 60 86
This step provides order to the scores so you can quickly scan the range of scores and
see whether they cluster together or tend to cover a wide spectrum. You can see that
the scores in the first column have a 50-point range with a top score of only 60.
The table also needs a title to communicate what information is presented. Care-
fully labeling what each column of scores represents also provides important back-
ground for understanding what the scores can tell you:
JaMarcus 10 66
Toby 10 50
Sam 20 60
Eleanor 30 84
LaKendra 40 95
William 40 84
Buzz 40 45
Leah 50 90
Carol 50 90
LaTrellini 60 86
As you can see, this new information provides context for interpreting the scores. We
now know that the scores are from a specific type of unit in a social studies class. We
also know that the scores show the percent correct rather than the number of items
correct. We understand that the scores are all quite low on the preassessment because
it was completed before any of the material was taught. We can also tell that the teacher
would expect progress between the first assessment and the second one. Another piece
of information that would be useful, however, would be to know the total number of
items in each assessment. When you have only percentages listed, you need to explain
what the percentages represent. Was the midterm a major test or more like a quiz? What
about the preassessment? Adding those details communicates more useful information:
JaMarcus 10 66
Toby 10 50
Sam 20 60
Eleanor 30 84
LaKendra 40 95
William 40 84
Buzz 40 45
Leah 50 90
Carol 50 90
LaTrellini 60 86
We now know that the second assessment had many more items than the first, so it
was likely a fairly major assessment. We can look at students’ scores individually to see
whether any made gains, but the table could add means, the measure of central ten-
dency that averages every score, so we can look at the class as a whole:
JaMarcus 10 66
Toby 10 50
Sam 20 60
Eleanor 30 84
LaKendra 40 95
William 40 84
Buzz 40 45
Leah 50 90
Carol 50 90
LaTrellini 60 86
Class mean 35 75
From this addition to the table, the scores suggest that the class as a whole has made
impressive progress between the preassessment and the midterm.
Under the sorting model of education, teachers might look at the gains in this
table and feel quite confident that the unit was progressing well. The difference in
means between the preassessment and the midterm seems to show that the class as a
whole has benefited from the teacher’s instruction. The students seem to be making
good progress toward the learning goal.
However, the more recent model that emphasizes helping all children to learn
asks us to dig a little deeper. Did some children start with less background knowledge
that might prevent them from learning all they could from this unit? Did some children
make less progress than others? Can we discern any patterns in these results that will
help us differentiate our instruction for the rest of the unit so we can teach the students
more effectively? Because we want to explore any potential patterns in score differences,
we will start by examining more closely the preassessment scores from lowest to high-
est to see whether any relationship between scores and students emerges.
First, you may notice that the top three scorers were girls, and the lowest three
scorers were boys. You decide you need to explore this possible relationship between
gender and scores more thoroughly. One way to do so would be to order the scores
from lowest to highest for each gender separately. Then, to get a sense of where each
group falls in relation to the other, you can calculate means for each gender. Add this
information to your table:
Boys
JaMarcus 10 66
Toby 10 50
Sam 20 60
William 40 84
Buzz 40 45
Boys’ mean 24 61
Girls
Eleanor 30 84
LaKendra 40 95
Leah 50 90
Carol 50 90
LaTrellini 60 86
Girls’ mean 46 89
Class mean 35 75
Now you see a very strong relationship between gender and scores. On both the
preassessment and the midterm, boys performed less well than girls. When you just
looked at the mean for the whole class, this relationship was hidden because boys’ and
girls’ scores were averaged together.
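The same disaggregation can be done in a few lines of plain Python, which is handy once a class list gets long. This sketch uses the scores from the table above.

# (student, group, preassessment %, midterm %) for the social studies unit example.
data = [
    ("JaMarcus", "boy", 10, 66), ("Toby", "boy", 10, 50), ("Sam", "boy", 20, 60),
    ("Eleanor", "girl", 30, 84), ("LaKendra", "girl", 40, 95), ("William", "boy", 40, 84),
    ("Buzz", "boy", 40, 45), ("Leah", "girl", 50, 90), ("Carol", "girl", 50, 90),
    ("LaTrellini", "girl", 60, 86),
]

def mean(values):
    return sum(values) / len(values)

for group in ("boy", "girl"):
    pre = [p for _, g, p, _ in data if g == group]
    mid = [m for _, g, _, m in data if g == group]
    print(group, round(mean(pre)), round(mean(mid)))   # boys: 24, 61   girls: 46, 89

print("class", round(mean([row[2] for row in data])), round(mean([row[3] for row in data])))  # 35, 75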
From this example, you can also see why examining your preassessment scores
before you teach a unit can help you plan your unit and differentiate your instruction.
The preassessment showed that some of the boys may have needed some different types
of activities to provide background and engage their interest—perhaps a mini-lesson
on important vocabulary and an ongoing journal assignment to describe the daily life
of revolutionary soldiers. Similarly, several girls who showed they had already mastered
some of the content could use varied activities as well, such as delving into some of
the facets of revolutionary war culture (e.g., music, popular books, gender roles) that
influenced the events of the war. Even after a midterm, of course, the information from
the scores in the table on these gender differences can be used to differentiate your
instruction to help the boys catch up and to enrich the girls’ learning. All of these
suggestions would, of course, be tied to your learning goals for the unit.
TABLE 5.12
Disaggregated Student Achievement Data for Language Arts Scores on One Test

European American students: Third Grade: 65% Advanced and Proficient, 35% Need Improvement and Failing. Fourth Grade: 55% Advanced and Proficient, 45% Need Improvement and Failing.

This table shows what percentage of each group, regardless of the size of that
group, has met the standard. This procedure allows clearer comparisons among groups.
Reporting percentages by groups ensures that the smaller groups don’t get
neglected. If all 900 students had been averaged together, the high scores of the 600
European American students would have had the most influence on the total percent-
ages of advanced and proficient students. We would not have seen the discrepancies
between their scores and the scores of the other two groups.
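The arithmetic behind reporting by group is simply the fraction of each group that met the standard, computed within that group. A short sketch with hypothetical counts (the actual counts behind Table 5.12 are not given in this excerpt) shows how a pooled figure lets the largest group dominate:

# Hypothetical counts: (students scoring advanced/proficient, total students) per group.
groups = {"Group A": (390, 600), "Group B": (120, 300)}

for name, (proficient, size) in groups.items():
    print(name, f"{100 * proficient / size:.0f}% advanced or proficient")   # 65% and 40%

# Pooling all 900 students hides the gap because the larger group dominates the average.
total_proficient = sum(p for p, _ in groups.values())
total_students = sum(n for _, n in groups.values())
print(f"All students combined: {100 * total_proficient / total_students:.0f}%")  # 57%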
At this point we offer a caution. Although these disaggregated test scores show
average differences for European American students and African American students, your
instructional decisions must be made for individual students, not for groups. Some of your
African American students may be at the Need Improvement and Failing levels, but
others will likely be at the Advanced and Proficient levels. The same thing can be said for
your Latino students and your European American students. Disaggregation can help
sensitize us to the needs of our students, but it should not be used indiscriminately.
A persistent pattern of lower achievement among minorities was one impetus
behind the No Child Left Behind Act (NCLB) passed by the U.S. Congress and implemented nationwide since 2002.
No Child Left Behind Act (NCLB): U.S. federal law aimed to improve public schools by increasing accountability standards.
The goal was to increase academic achievement of lower performing groups and to have all students in the United States achieve proficiency in math and reading by 2014. Under NCLB, each state accepting federal assistance is
required to assess students in math and reading in grades 3 through 8 each year and
once in high school. (Science assessment requirements were also added in 2007–2008.)
Public reporting of these results must be disaggregated by race, ethnicity, gender, dis-
ability status, migrant status, English proficiency, and status as economically disadvan-
taged. If any of the groups fail to make adequate yearly progress (AYP), which is determined
by each state, sanctions follow. We discuss NCLB at greater length in Chapter 11.
The stated purpose of disaggregation of achievement data required by NCLB has
been to increase accountability of schools and teachers. Teachers and principals now
pay close attention to the achievement of all groups. They have begun examining more
carefully the patterns of achievement of their students using many assessment measures
in classrooms. Clues from disaggregation of classroom assessment can help teachers to
close the achievement gap by identifying students who lag behind. Some schools, such
as J.E.B. Stuart Elementary in Richmond, VA, have weekly grade-level meetings where
teachers look at disaggregated data on attendance, teacher observations, and various
assessments for low achieving groups as a diagnostic tool to provide students who are
behind the help they need to do better.
Sometimes data disaggregation can show that some of our assumptions may not
be true about low-achieving groups. For example, the staff at one high school decided
to examine their assumption that low achievement on state standardized tests was
caused by high student absence rates. They disaggregated these test scores by attendance
rates (high attendance vs. low attendance). They did find that a large percentage of
students who were absent often did poorly. But they also found that an equally large
percentage of students who had high attendance also did poorly on the test. This find-
ing led to useful discussions about ways to improve the content and quality of their
instruction. They began to look for other causes over which they had more influence
than absences as reasons for their school’s poor performance (Lachat & Smith, 2005).
goal, Ebony had each student explore a solid and a liquid in zipper plastic bags inside a
“feely box,” where they could touch objects but not see them. Students then made a list of
descriptive words corresponding to each test they could think of to perform on the object
(e.g., does it have a shape?). After instruction and exploration with a number of materials,
she had students come up with rules for why one item is a liquid and one is a solid so
they could later do their own specific tests on unknown objects whenever they wanted.
For example, rules for one group included “If it has a definite shape in a cup but not in
a zipper bag, it must be a liquid.” “If it doesn’t change shape, it’s a solid.” Then, partway
through the unit, as a measure of progress, she repeated the "feely box" assessment with a
new liquid and a new solid. This second trial served as a mid-unit check on progress.
Now we can compare these results with the preassessment results, which are both
displayed in Table 5.13. In this table, Ebony added the findings for liquids and solids
together for each assessment, so the total possible number correct is 11. First note that
two students withdrew from school before the mid-unit assessment was conducted.
TABLE 5.13
Preassessment and Mid-unit Assessment Results for One Learning Goal for Unit on Characteristics of Liquids and Solids
Pre-unit and Mid-unit Assessment Results for Learning Goal: Students Will Demonstrate Appropriate Steps to Test an Unknown Solid or Liquid
Student | Percent Correct on Preassessment (11 possible) | Percent Correct on Mid-unit Assessment (11 possible) | Percentage Points Gained Between Preassessment and Mid-unit Assessment
Student 8 0% 64% 64
Student 10 0% 54% 54
Student 11 0% 27% 27
Student 15 0% withdrew withdrew
Student 16 9% 45% 36
Student 12 9% 64% 55
Student 5 9% 45% 36
Student 2 18% withdrew withdrew
Student 3 18% 36% 18
Student 9 18% 54% 36
Student 6 18% 82% 64
Student 17 27% 45% 18
Student 4 27% 45% 18
Student 13 27% 36% 9
Student 14 27% 82% 55
Student 7 45% 73% 28
Student 1 54% 27% −27
Class mean 18% 51.9% 32.7
The school where Ebony worked has a high rate of student mobility, so the composi-
tion of classes changes often because students transfer in and out.
Now let’s look at progress across time. Ebony’s preassessment measure and her mid-
unit measure were identical, except that she used a different liquid and a different solid each
time. Because the measures were so similar, she believed that comparing scores on the two
measures would let her know what students had learned so far on this learning goal. If you
look at the class means for preassessment and mid-unit assessment, you see an average
gain of over 30 percentage points. This definitely shows progress for the whole class.
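Each percentage in Table 5.13 is simply the number of correct responses out of 11 possible items, converted to a percent, and the gain is the difference in percentage points. Here is that arithmetic for a few students; the raw item counts shown are back-calculated from the reported percentages and are only illustrative.

# (student, items correct on preassessment, items correct at mid-unit), out of 11 items.
results = [("Student 8", 0, 7), ("Student 16", 1, 5), ("Student 6", 2, 9), ("Student 7", 5, 8)]

def percent(correct, possible=11):
    return round(100 * correct / possible)

for name, pre, mid in results:
    print(name, f"pre {percent(pre)}%", f"mid {percent(mid)}%",
          f"gain {percent(mid) - percent(pre)} percentage points")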
But as we saw in a previous example, sometimes averages for the whole class
mask important differences among groups. Ebony decided to disaggregate these scores
by comparing students who qualified for free or reduced-fee lunch (one measure of
lower income and socioeconomic status) with students who paid full price for lunch.
She did not include the two students who withdrew from the school in this analysis.
(One came from each group.) You can see her findings in Table 5.14.
Notice that both groups were within three points of each other on the preassess-
ment, a relatively insignificant difference. But you see a much larger disparity on the
mid-unit assessment—more than a 20-point difference. These findings suggested to Ebony
that she needed to look more closely at differences in what students had learned so far
so she could decide how to close this achievement gap. She realized she needed to attend
to these students more closely as she finished the unit, especially students 1 and 11. Were
there certain rules about the characteristics of liquids or solids they had not yet grasped?
Was some of the vocabulary of the unit difficult because of lack of previous experience?
For example, one test she introduced students to for examining solids was magnetism.
Did all students have necessary background knowledge about magnets?
One bright spot was that Student 8 and Student 10, who had writing difficulties,
were able to perform in the average range at mid-unit. She had accommodated these
two students, after consulting with her school’s occupational therapist, by letting them
use pencil grips that were easier to grasp and paper with raised lines. (She could have
let these students report answers orally, but she wanted to be sure to work on students’
weaknesses and not just rely on their strengths.) She had also worked with Student 11,
the remaining student who was learning English (the other had withdrawn), by provid-
ing more visual examples, pairing her in activities with native speakers, and providing
mini-lessons on unit vocabulary.
Now we can revisit some of our themes in relation to Ebony’s experience, first stress-
ing the value of critical thinking at higher levels of Bloom’s taxonomy. Ebony has first
graders examine unfamiliar liquids and solids and apply rules to classify them, demonstrating
that even young children can be expected to go beyond memorizing facts. She also uses her
assessments to promote active rather than passive learning among the students by provid-
ing for feedback and subsequent improvement before summative assessment. This also
encourages the students to develop effort optimism and take charge of their learning.
In addition, by repeating the same task with different liquids and solids, Ebony
uses a type of general outcome measurement. We hope you have learned from Ebony’s
case study that carefully analyzing and disaggregating your assessment data can yield
important information that will enhance your teaching and help you move toward
closing achievement gaps. As we mentioned in Chapter 1, many people just shut down
when they see numbers or data. But as you have seen with Ebony’s assessment data,
the numbers summarizing student responses yielded important information that
provided a window on her students’ needs. She knew that even though her lower scor-
ing students had made gains, they needed additional assistance to master the learning
goal. The only way to maximize achievement is to hold high expectations for all stu-
dents and not settle for measures of achievement that mask children left behind.
TABLE 5.14
Preassessment and Mid-unit Assessment Results Disaggregated by Lunch Status
Pre-unit and Mid-unit Assessment Results for Learning Goal: Students Will Demonstrate Appropriate Steps to Test an Unknown Solid or Liquid
Student | Percent Correct on Pre-unit Assessment (11 items) | Percent Correct on Mid-unit Assessment (11 items)
Free/Reduced Lunch Students:
Student 10 0% 54%
Student 11 0% 27%
Student 5 9% 45%
Student 12 9% 64%
Student 3 18% 36%
Student 4 27% 45%
Student 13 27% 36%
Student 1 54% 27%
Free/Reduced-Fee 18% 41.7%
Lunch Students’ Mean
Regular Fee Lunch
Students:
Student 8 0% 64%
Student 16 9% 45%
Student 9 18% 54%
Student 6 18% 82%
Student 17 27% 45%
Student 14 27% 82%
Student 7 45% 73%
Regular-Fee Lunch 20.6% 63.6%
Students’ Mean
Class mean 18% 55%
As an additional note from this case study, you may get the idea that one assess-
ment is all that you need to do. This would be a misconception. How many times have
you heard the old adage, “Don’t put all your eggs in one basket”? It definitely applies to
assessment. You need more than one assessment “basket” because students have different
strengths and weaknesses. For example, you may be terrible at multiple-choice questions,
but you shine on essay questions. How would you feel if one of your professors gave only
one assessment—a multiple-choice final exam? As a teacher, you want to give your stu-
dents opportunities to work to their strengths, but you also must give them opportunities
to improve their weaknesses. Helping students work on content and strategies they have
difficulty with is crucial for closing achievement gaps. Also, different kinds of assessments
work best for detecting different types of learning. In Chapter 6 you will learn more
about why multiple assessments are needed when we discuss validity and reliability of
assessments. Then, in Chapters 7, 8, and 9 you will learn more about different methods
of assessment, their strengths and weaknesses, and when to use them.
HELPFUL WEBSITES
http://www.studentprogress.org/
The National Center on Student Progress Monitoring website answers questions and provides
information, resources, a free newsletter, discussion board, and training opportunities for
those interested in curriculum-based measurement.
http://ed.sc.gov/topics/assessment/scores/pact/2007/statescoresdemo.cfm
Most state departments of education post disaggregated scores from their statewide achievement
tests on a website. You can examine these scores to find the types and severity of the
achievement gaps in your state. This website features scores from the Palmetto
Achievement Challenge Test in South Carolina.
2. Describe a situation (academic or not) when you set a goal and monitored your
progress toward that goal. Would you categorize the process as closer to mastery
monitoring or to general outcome measurement? Why?
3. Choose a learning goal in your content area for which you would like to set up
a progress monitoring system. Using your understanding of the advantages and
concerns for both mastery monitoring and general outcome measurement,
describe the progress monitoring system you would design, including a visual
representation, and justify your approach.
4. Think of a specific, short-term goal you would like to achieve in your personal
life and describe what benchmarks you would use to monitor progress toward it.
What visual representation of progress would you use and why?
5. Describe the similarities and differences between a frequency distribution and a
frequency polygon.
6. Describe a situation when you would want to use a frequency distribution or a
frequency polygon to examine scores of students in one of your classes.
7. Choose a learning goal in your content area and assume that the following set of
scores represent student performance in your class related to that learning goal.
Develop a frequency distribution, a grouped frequency distribution, and a frequency
polygon for the preassessment scores. You may use a spreadsheet software program.
8. Now calculate the mean, median, and mode for all 12 students for the pre-
assessment. What differences do you see between mean and median? Why?
9. Now revise your frequency distribution by disaggregating the preassessment data
by gender. Then make a new frequency distribution disaggregated by lunch
status. Interpret what you have found in these analyses for the whole group, the
gender groups, and the lunch groups. Do you see any indications that some
students may not have the background knowledge required for the unit? What
implications do the preassessment data have for your instructional planning?
What would you do next, given the content of your learning goal?
References 151
10. One student attained a score of 100% on the pre-unit assessment. What actions
should you take to differentiate instruction for this student given the content of
your learning goal?
11. Now design a frequency distribution for the mid-unit assessment data for the
whole class; for the gender groups; and for the lunch groups. Calculate means
and medians for all groups. What implications do the mid-unit assessment data
have for your instructional planning for the rest of the unit?
12. Describe at least three ways systematic progress monitoring can help you
address achievement gaps in your classroom.
REFERENCES
Baker, S. K., K. Smolkowski, R. Katz, H. Fien, J. Seeley, E. Kame’enui, and C. T. Beck. 2008.
Reading fluency as a predictor of reading proficiency in low-performing high-poverty
schools. School Psychology Review 37: 18–37.
Bandura, A. 1997. Self-efficacy: The exercise of control. New York: Freeman.
Deno, S. L. 1985. Curriculum-based measurement: The emerging alternative. Exceptional
Children 52: 219–232.
Deno, S. L. 1997. “Whether” thou goest . . . Perspectives on progress monitoring. In E. Kame'enui,
J. Lloyd, and D. Chard (eds.), Issues in educating students with disabilities. Saddle River,
NJ: Erlbaum.
Deno, S. L. 2003. Developments in Curriculum-Based Measurement. Journal of Special Education
37: 184–192.
Lachat, M. A., and S. Smith. 2005. Practices that support data use in urban high schools. Journal
of Education for Students Placed at Risk 10: 333–349.
Locke, E., and G. Latham. 1990. A theory of goal setting and task performance. Englewood Cliffs,
NJ: Prentice Hall.
Locke, E., and G. Latham. 2002. Building a practically useful theory of goal setting and task
motivation: A 35-year odyssey. American Psychologist 57: 705–717.
Marston, D., P. Muyskens, M. Lau, and A. Canter. 2003. Problem-solving model for decision
making with high incidence disabilities: The Minneapolis experience. Learning
Disabilities: Research and Practice 18: 187–200.
Marzano, R. J. 2006. Classroom assessment and grading that work. Alexandria, VA: Association
for Supervision and Curriculum Development.
Moon, T. 2005. The role of assessment in differentiation. Theory into Practice 44(3): 226–233.
National Research Center on Learning Disabilities. 2005. Understanding responsiveness to
intervention in learning disabilities determination. Retrieved October 10, 2007, from
http://nrcld.org/publications/papers/mellard.shtml.
Reeves, D. B. 2004. The 90/90/90 schools: A case study. In D. B., Reeves, Accountability in action:
A blueprint for learning organizations. 2nd ed. Englewood, CO: Advanced Learning Press.
Shinn, M. R. 1989. Curriculum-based measurement. New York: Guilford.
Shinn, M. R. 1998. Advanced applications of curriculum-based measurement. New York: Guilford Press.
Stecker, P., L. Fuchs, and D. Fuchs. 2005. Using curriculum-based measurement to improve
student achievement: Review of research. Psychology in the Schools 42: 795–819.
Trammel, D., P. Schloss, and S. Alper. 1994. Using self-recording, evaluation, and graphing to
increase completion of homework assignments. Journal of Learning Disabilities 27(2): 75–81.
Urdan, T., and E. Schoenfelder. 2006. Classroom effects on student motivation: Goal structures,
social relationships, and competence beliefs. Journal of School Psychology 44: 331–349.
Zimmerman, B. 2002. Becoming a self-regulated learner: An overview. Theory into Practice
41(2): 64–70.
Zimmerman, B. 2008. Investigating self-regulation and motivation: Historical background,
methodological developments, and future prospects. American Educational Research
Journal 45: 166–183.
ﱟﱟﱟﱟﱠﱟﱟﱟﱟ
CHAPTER 6
ESSENTIAL CHARACTERISTICS
OF ASSESSMENT
INTRODUCTION
We begin with a quotation, "garbage in, garbage out," that was widely used by technicians in the early days of computers to caution that if the data entered into a computer program are problematic or nonsensical, the computer output will also be meaningless. Just because
you use a computer doesn’t mean it can magically transform bad raw material into
a wonderful finished product. Now the phrase is used in many situations where the
quality of the “input” will affect the value of the “output.”
Classroom assessment is a prime example. If you design a poor assessment
(the input), you could still use its results to assign grades (the output) that seem
official and objective. But you will really end up with “garbage” that does not tell
you what you need to know about what students are learning. One of our students
describes just such a situation: “In my freshman year of high school, my geom-
etry teacher taught mostly triangles and the Pythagorean Theorem. However, for
our final grade, he handed out directions to make a 3-dimensional dodecahedron.
The key to passing the project was not to know anything about triangles, but to
use sturdy paper so that when he dropped your dodecahedron on the floor, it
wouldn’t fall apart. If your dodecahedron fell apart . . . so did your grade.” This
teacher was able to put a score in the grade book for each student, but what did
it really mean?
Ethics Alert: This is an example of score pollution. The teacher is using a task and criteria
that have little relationship to the course subject matter.
In this chapter we discuss concepts that have a strong impact on the quality
of the input—the assessment opportunities you provide your students. These con-
cepts include reliability, validity, bias, and representing diversity in your assessments.
If you take great care with these concerns as you design your assessments, you will
ensure that the output—the scores resulting from the responses of your students—
will be of high quality and will therefore provide you with meaningful information
about student learning.
RELIABILITY: ARE WE GETTING CONSISTENT INFORMATION?
Reliability: The degree to which scores on an assessment are consistent and stable.
Error: The element of imprecision involved in any measurement.
The first aspect of assessment that we want to consider is reliability. Reliability is the degree to which scores on an assessment are consistent and stable. A reliable assessment will produce roughly similar scores for a student even if the assessment is taken on different occasions, if it has slightly different items, or if it happens to be graded by a different person. The results are reproducible and not a fluke. If you do find significant differences in scores across these situations, they are caused by some form of error. Error is the element of imprecision involved in any measurement. Error can change a score in either direction from where it would actually fall. It can boost a score higher or decrease it. The more error, the lower the reliability of the score.
Knowing that every single score has an error component leads us to the logical
conclusion that no important decision about any student should be made on the basis of a
single score. This conclusion is not unique to us; professionals in assessment have long
understood the role of error and the need to make decisions based on multiple assess-
ments. For example, most college admissions decisions are based on several factors, such
as high school grade-point average, letters of reference, SAT or ACT scores, and an essay.
Sources of Error
In thinking about error and testing circumstances, one of our students shared with us
her experience about taking the SAT for the first time. The heat wasn’t working in the
scheduled classroom, and so the group was moved to the gym. The proctor used the
time clock on the scoreboard to keep track of the minutes remaining for the test.
Unfortunately, this clock loudly ticked off each second. To top everything off, the proc-
tor had a big bunch of keys that jingled loudly with each step as he constantly patrolled
the room. As you might imagine, this student’s score was lower than she had expected,
and it increased by 150 points the second time she took the SAT. Her experience
illustrates error due to the occasion of testing. Factors from the assessment occasion
that can introduce error into scores include environmental sources shown in Table 6.1.
In the case of our student (and probably others who took the test at the same time)
these sources had a negative impact on scores.
Assessment occasion is only one source of error influencing scores. Other sources
of error are listed in Table 6.1. These include items as well as scoring issues. We discuss
each one in turn.
TABLE 6.1
Potential Sources of Error (Source | Examples)
Assessment Occasion
The SAT example with the ticking clock is a perfect illustration of how environmental
distractions related to the specific occasion of an assessment can introduce error into the
scores that students receive. You can see how noises from the clock and the keys as well
as the disruption of moving from room to room could have distracted our student,
producing a poorer score for her. Some variations in assessment occasion, however, do
not occur during the assessment, but can still have an impact (negative or positive), such
as class parties. We know one high school teacher who never schedules important sum-
mative assessments during prom week because of the increased potential for error.
Respondent: The student completing the assessment.
Error in scores due to the occasion of the assessment is also influenced by the respondent, or the student completing the assessment. As you know from your experience, students vary from day to day and even from moment to moment in their interest in the concepts and skills being introduced, their mood, their level of energy,
whether they are sick or healthy, their degree of carelessness, their level of motivation,
and how their relationships with family and other significant people are going, to name
a few issues. These individual variations also contribute to errors in assessment out-
comes. If teachers use more frequent assessments, errors due to individual variation
average out across time with many assessments.
Assessment Items
Internal Consistency: The degree to which all of the items in an assessment are related to each other.
Another aspect of an assessment that contributes to reliability is the pool of items. Reliability is increased to the degree that the items display internal consistency. Internal consistency refers to the degree that all the items in an assessment are related to one another and therefore can be assumed to measure the same thing. Internal consistency is high if all items are strongly related to each other. Internal consistency is low if they are not strongly related to each other.
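The chapter does not prescribe a statistic for internal consistency, but one widely used index is Cronbach's alpha, which grows as item scores rise and fall together. The sketch below is ours, with hypothetical item-level scores, and simply illustrates the idea.

from statistics import pvariance

# Hypothetical item-level scores: rows are students, columns are items (1 = correct).
item_scores = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0],
]

k = len(item_scores[0])                         # number of items
items = list(zip(*item_scores))                 # regroup the scores by item
item_variances = [pvariance(item) for item in items]
total_scores = [sum(student) for student in item_scores]

# Cronbach's alpha: one common index of internal consistency
# (values closer to 1.0 mean the items hang together more strongly).
alpha = (k / (k - 1)) * (1 - sum(item_variances) / pvariance(total_scores))
print(f"Cronbach's alpha: {alpha:.2f}")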
Also important to reliability is the number of items. Let us exaggerate for a
moment to show the importance of the number of items. What if your professor were
to create a summative, final examination for this assessment course with only one ques-
tion? The one item may address any concept covered in Assessment Is Essential. Are you
comfortable with a one-item test assessing everything you learned this semester? What
if the test had five items? Are you comfortable yet? What if the test had from 35 to 40 items?
Intuitively, you realize that a one-item or five-item test might fail to capture all the ideas
that you’ve learned this semester. In contrast, you see that a 40-item test that is based
on the learning goals associated with the text chapters could be a fairer assessment of
your knowledge gained in the course. Thus, in considering consistency in test scores,
use enough items to adequately cover the concepts learned.
Errors related to this type of evidence for reliability will not be problematic in
classrooms if teachers assess often. Teachers usually gather information so frequently
that any random error resulting from one assessment is likely to be quickly corrected
with the next, which often occurs the very next day.
Scoring Issues
A third factor that contributes to reliability involves the way assessments are scored,
particularly when scoring requires careful judgment. Have you ever been in a class
where you compared tests with someone next to you and found that they got more
points for an answer than you did, even though the answers seemed almost identical
to you? Sometimes teachers may not carefully delineate the criteria they are using to
score answers, and so errors in judgment can result. If the criteria for grading are not
as explicit as possible (to both teacher and students), all sorts of extraneous factors
that have nothing to do with quality of the student performance will influence the
score. Significant error will be introduced to the actual scores students deserve. For
example, if a teacher grades some papers at night when she is tired, she might not
notice that several students left out an important point. This would be particularly
likely to occur if she had not specified that the idea was worth a specific number of
points when she allocated points in her scoring guide before she began grading any
papers. Stating explicit criteria in the scoring guide, such as accuracy and number of supporting facts, presence of particular kinds of arguments, or adherence to conventions of written language, makes these types of problems less likely. Even when
teachers are tired or distracted, if they have spelled out exactly what they are looking
for, they will more likely be consistent in their scoring. You will learn more about
establishing scoring criteria in Chapters 8 and 9.
Interrater Reliability
Interrater Reliability: A measure of the degree of agreement between two raters.
Interrater reliability is the measure of the consistency between two raters. When interrater reliability is high, the two raters strongly agree on the score that should be assigned to the same piece of work. When interrater reliability is low,
the two raters disagree on what score should be assigned. We once had a student who
suffered firsthand from interrater reliability problems. Her high school social studies
teacher had everyone in the class rate each student’s oral presentation on the connec-
tion between a current event and a historical event. Each student’s final score for the
project was to be the sum of 20 different peers’ scores. This student had recently bro-
ken up with a boyfriend. The boyfriend convinced eight of his friends in the class to
assign his former girlfriend’s presentation an “F.” With the inconsistency between these
scores and other students’ ratings, the student ended up with a “D” on the project. The
teacher did not take into account the degree of error involved, so the student’s grade
did not reflect her level of mastery.
In contrast, most teachers know how important consistency across raters can be
for making accurate judgments about student work. We have worked in several schools
TABLE 6.2
Reliability Questions for Classroom Assessment
• Have I ensured that the assessment occasion is free of distractions?
• Have I designed an assessment that gathers sufficient information for the decisions to be made?
• Have I developed scoring criteria that are clear and unambiguous for both myself and my students?
• Have I compared my scoring standards with other teachers who teach the same content?
in which a common scoring guide is used for writing. In these schools, teachers make
an effort to check out their degree of interrater reliability. They practice scoring papers of varying quality from several students, and then they discuss any differences in
their resulting scores in hopes of improving their consistency. These teachers want to
make sure that they agree on the standards of quality and what the scores for different
levels of work represent. They want to reduce errors in their scoring and increase their
reliability so that they provide the most precise possible representations of their stu-
dents’ writing quality.
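As a rough illustration (ours, not from the text), two teachers who practice-score the same ten papers could gauge their interrater reliability with the percentage of exact agreements and the correlation between their two sets of scores.

from statistics import correlation   # requires Python 3.10 or later

# Hypothetical rubric scores (1-4) assigned by two teachers to the same ten papers.
rater_1 = [4, 3, 2, 4, 1, 3, 3, 2, 4, 1]
rater_2 = [4, 3, 3, 4, 1, 3, 2, 2, 4, 2]

exact_agreement = sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)
print(f"exact agreement: {exact_agreement:.0%}")                 # share of papers given identical scores
print(f"correlation:     {correlation(rater_1, rater_2):.2f}")   # do the two sets of ratings track each other?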
Sufficiency of Information
Sufficiency of Information: Ensuring the collection of adequate data to make good decisions.
Recently, educators in the area of classroom assessment have suggested that for the classroom teacher, reliability may be defined as "sufficiency of information" (Brookhart, 2003; Smith, 2003). These authors suggest that sufficiency of information means that the teacher needs enough (that is, sufficient) information from any assessment to make
a good decision based on the results. With a formative assessment, the teacher must
have enough information to make good decisions about the next step in instruction.
With a summative assessment, the teacher must have enough information to be able
to determine the level of mastery of the learning goals for each student. Usually, your
confidence in the results of your assessment and the reliability of your assessment
increases as the number of items and occasions increases.
An analogy that we heard once from one of our professors is helpful in under-
standing the idea of increasing the sample to increase sufficiency of information (and,
therefore, reliability). Assume that a friend asks you how the food is at a local restau-
rant. If you have eaten several meals there, you will be more confident of your recom-
mendation than if you have eaten there only once. This is because you have a more sufficient sample of information. Similarly, you will feel more confident about making decisions
about student understanding of new concepts and skills if you use a sufficient sample
of items related to the learning goals on different assessment occasions. See Table 6.2
for a list of reliability questions for classroom assessment.
IMPROVING RELIABILITY IN CLASSROOM ASSESSMENTS
To finish our discussion of reliability, we have a few suggestions about ways to improve
reliability in classroom assessments. These suggestions are listed in Table 6.3. Note that
a common theme is the importance of providing many opportunities for students to
demonstrate what they have learned.
TABLE 6.3
Suggestions for Improving Reliability in Classroom Assessments
Assessment Occasion
  Suggestions: Ensure that the assessment occasion is free of distractions.
  Examples: Avoid scheduling important assessments at the same time as other important events. Provide multiple assessment opportunities.
The conclusion that you should provide many opportunities for assessments is a
logical extension of your understanding that every score contains error. The more
assessment opportunities you offer your students, the more likely you are to have a
reliable picture of their performance. Across time, different sources of error cancel each
other out. For example, in a chemistry class on a unit about energy transformation,
you could give students (1) a set of problems calculating the energy required for several
transformations, and (2) a multiple-choice quiz on the three types of transformations
of matter. Some students may make careless errors on the calculation problems that
lower their score. But on the multiple-choice items they make a few lucky guesses, and
the error contribution instead raises their score. One assessment occasion has produced
negative error and the other has produced positive error. When you average these two,
you are likely closer to an accurate score than the score on either single assessment
opportunity. The more scores you have, the more stable is the resulting average. The
fewer scores you have, the more likely the error component is large. Box 6.1 describes
BOX 6.1  EXCERPT ON RELIABILITY CONCERNS FROM TEACHER INTERVIEW
". . . This fourth-grade teacher and her team do a number of things to ensure reliability. When they have open-ended questions on a test, all three teachers in the grade read all students' answers, and scores from each teacher are compared so they gather information on interrater reliability. This teacher also uses several test questions to assess each skill so that the possibility of being lucky or unlucky is removed."
one teacher’s approach to reliability in her classroom assessments that reflects these
points. And, to conclude our discussion of reliability, we reiterate the point made
earlier in the chapter: No important decision about any student should be made on the
basis of a single score.
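A small simulation, ours rather than the authors', can make the point concrete: if each observed score is the true score plus some random error, the average of several occasions usually lands closer to the true score than a single occasion does.

import random

random.seed(1)                                   # fixed seed so the sketch is repeatable
true_score = 80
observed = [true_score + random.gauss(0, 8) for _ in range(6)]   # six occasions, random error with SD 8

print("single-occasion scores:", [round(s) for s in observed])
print("average of the six scores:", round(sum(observed) / len(observed), 1))
# The average tends to sit nearer the true score of 80 because positive and
# negative errors cancel out across occasions.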
TABLE 6.4
Validity Concerns Depend on the Purpose of the Assessment
Formative: Degree to which the practice supports the learning required to master classroom learning goals.
Summative: Degree to which the interpretation of the results provides accurate evidence about achievement of the classroom learning goals.
FIGURE 6.2 Process of Moving from Measure to Valid Inference About a Construct. (The figure illustrates the construct "Attitudes About Universal Healthcare" with two potential measures: attitudes toward Medicare and Medicaid, which supply construct-related evidence supporting a valid inference, and level of factual knowledge about healthcare alternatives, which supplies no construct-related evidence.)
TABLE 6.5
Sources of Construct-Related Evidence for Validity
• Evidence based on logic and common sense
• Content-related evidence
• Criterion-related evidence
connections to geometry proficiency. The manual dexterity required and the quality
of materials (e.g., whether the paper was sturdy) are two factors involved that clearly
have absolutely no relationship to geometry. These factors did, however, weigh heavily in
whether students achieved success on the task.
Another common factor that can detract from construct evidence for validity is
reading proficiency. Many assessments aimed at constructs such as history, or even
mathematical skill, may require strong reading-comprehension skills that are irrelevant
to the construct but can strongly influence scores. When assessments require skills or
knowledge clearly irrelevant to the construct being measured, those assessments fall
at the lower end of the validity continuum.
Additional elements of an assessment that may detract from construct validity
include unclear directions, confusing items, unusual formats, and typographical errors.
Each of these factors clouds score interpretation; did the student do poorly because he
didn’t understand the directions or because he did not learn the material? For example,
one student we know misread two questions on a recent test because they both included
the word “not” and he didn’t notice the use of the negative (e.g., “Which of the follow-
ing is NOT an amphibian?”). If the test had included only positively worded questions,
his score likely would have been higher. In later chapters we provide guidelines to help
you in the writing of clear directions and items. In addition, we have stated that any
format that a teacher plans to use in the assessment of learning goals should also be
part of instruction, thus avoiding the problem of unusual item formats.
Variety in Formats
Another common problem in the classroom that can reduce our confidence that an assess-
ment fully addresses the relevant construct is a reliance on a single assessment format.
You can see this in the dodecahedron assessment example as well. The final grade in that
geometry class had only one task, and this task was unrelated to the activities students
had engaged in all semester, such as solving problems related to the Pythagorean Theorem.
One kind of construct-related evidence for validity, then, is checking on whether a sum-
mative assessment contains multiple assessment formats capturing the breadth and depth
of the content addressed. Formative assessments, of course, may often have a single format
if they address only a small portion of the knowledge and skills in a unit.
Another factor in considering formats is that some students vary in their preference
for and competence with different formats. For example, you know people who always
do better with multiple-choice tests, whereas others prefer sharing their understanding
through writing an essay. Researchers have also demonstrated that students perform dif-
ferently with different assessment formats such as teacher observations, short-answer
problems, or writing in science notebooks (Shavelson et al., 1993). As a classroom teacher,
you should vary the formats you use to increase the validity of your assessments and to
provide opportunities for every student to shine. Of course, students must have practice
in class with these various formats prior to any summative assessment.
Another important reason to vary formats is to give all students opportunities
to work on improving their less developed or weaker skills. If students are encouraged
to do only what they are good at, they will never get better at other assessment formats.
Although we have noted that writing may impede some students from expressing their
understanding of science or history, we also acknowledge the critical role of writing
in conveying information. So writing should be one of an array of assessment forms
during a unit. Using a range of assessment formats is crucial not only to validity, but
also to conveying high expectations to your students. It can be another opportunity to
help close achievement gaps.
Content-Related Evidence
Content-Related Evidence: The adequacy of the sampling of specific content, which contributes to validity.
Content-Representativeness: Ensures the assessment adequately represents the constructs addressed in the learning goals.
As we have emphasized, every assessment you design necessarily addresses only a sample of the total content that students are learning in your class. Content-related evidence for validity refers to the representativeness of the understanding and skills sampled. Often people term this type of evidence content representativeness because the key issue is ensuring that the assessment adequately represents the constructs addressed in the learning goals.
For classroom assessments, content-related evidence is easy to ensure if you align your learning goals, your instruction, and your assessment. Your assessments should
address both the content of the learning goals and the cognitive processes you expect
students to master. They should not address content or processes irrelevant to those goals.
For example, you would not use a multiple-choice test to determine whether students
can organize contradictory information into an argument on the pros and cons of impos-
ing a national carbon tax. This would require an essay. Similarly, if your summative
assessment includes writing a persuasive essay, you must also provide classroom activities
that allow students to hone their persuasive writing skills, including formative assessment
with feedback on how to improve. For a summative classroom assessment, we have sug-
gested in Chapter 2 that you design a table of specifications (sometimes called a “blue-
print”) to visually represent how your assessment covers the content of the learning goal
(see Table 2.15). When the table is completed, it should provide the evidence you need
about the degree to which your assessment has content representativeness. If all learning
goals are represented with items in proportion to the amount of instruction that they
received, you will be able to make a positive judgment about content representativeness.
In our classes, when we discuss validity, our students always have many examples
to offer from their experiences in which teachers either did not gather evidence for
content representativeness of a summative classroom test or assumed their test had
content representativeness when the students believed otherwise. For example, one
student told us, “When I was in biology in high school, my teacher would show video
after video of diseases and things which had nothing to do with our textbook. We were
expected to write twenty facts on each video, and we were told our test would come
from our facts. However, when test time came around, my teacher would ask questions
from the textbook and not the movies.” Assessments must be aligned with instruction
and the learning goals if you want your students to perceive them as reasonable.
Lack of content representativeness, in fact, is the most common problem we have
found among student complaints about grading and summative assessments. As a
teacher, you should see this as good news, because it is a problem you can easily rem-
edy with your own assessments. If you use a table of specifications to help align your
assessment with your instruction and the learning goals, your assessments will possess
content representativeness and students will believe they are fair.
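As a minimal sketch of that alignment check, assuming hypothetical learning goals, instructional-time percentages, and planned item counts, you could compare the share of items devoted to each goal with the share of instruction it received.

# Hypothetical learning goals, the share of instructional time each received, and the
# number of items planned for each goal on the summative assessment.
instruction_share = {"goal 1": 0.50, "goal 2": 0.30, "goal 3": 0.20}
planned_items = {"goal 1": 10, "goal 2": 6, "goal 3": 1}

total_items = sum(planned_items.values())
for goal, share in instruction_share.items():
    item_share = planned_items[goal] / total_items
    verdict = "OK" if abs(item_share - share) <= 0.10 else "check coverage"
    print(f"{goal}: {item_share:.0%} of items vs. {share:.0%} of instruction -> {verdict}")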
TABLE 6.6
Content-Related Evidence and Applications
Definition: Evidence that the assessment representatively samples items associated with the learning goals.
Classroom Applications:
• Formative: Design formative tasks that require practice and feedback and address higher-order thinking skills from the learning goals.
• Summative: Design a table of specifications to gauge alignment among learning goals, instruction, and assessment.
For formative classroom assessments, you must make sure that each task you
spend time on contributes to student advancement toward the learning goals. You also
want to allocate the most time to formative tasks essential to the most important learning
goals. You probably would not design even one formative task to address the learning
goal Recall forms of poetry and definitions (see Table 2.15). This information is at the basic
knowledge (“remember”) level. Students should be able to understand and recall defini-
tions without much assistance. You should save formative assessment tasks for goals
related to more complex understanding and for skills that take time, effort, and feedback
to master. Table 6.6 summarizes the discussion of content-related evidence for validity.
Criterion-Related Evidence
Criterion-Related Evidence: Information that examines a relationship between one assessment and an established measure for the purposes of discerning validity.
Criterion-related evidence refers to information that examines a relationship between one assessment and some other established measure that is gathered to provide evidence of validity. For example, you would have evidence for the validity of a survey measuring values associated with a political party if people who attained high scores on that survey also indicated they always voted for that party's candidates. Researchers
examining the relationship between SAT scores and college freshman grades are con-
ducting studies to assemble evidence of the criterion-related validity of using SAT
scores for predicting college performance.
As you recall, triangulation is the process of developing an accurate conclusion
based on comparison of several sources. In Chapter 3 we discussed drawing conclu-
sions about individual students using triangulation. You can also use triangulation
when drawing conclusions about the validity of your tests. If you look for patterns
across different assessments of similar content, you may find that your students did
much worse (or much better) on one of them. For example, if the geometry teacher
who gave the dodecahedron assignment had compared those scores to scores on pre-
vious geometry assessments, he would have seen that the dodecahedron scores were
much lower for a number of students. Evidence for validity based on the criterion of
a strong relationship to other previous assessments would have been slim.
Sometimes when students do more poorly than usual on an assessment, teachers
may assume they did not study or they were not motivated. These can be dangerous
assumptions because they prevent teachers from critically examining their own prac-
tices. We suggest that when your students do particularly poorly on an assessment, you
use the process of triangulation with previous assessments to look for criterion-related
evidence of validity. That way, instead of blaming students, you will consider whether
something could instead be wrong with the validity of the test you constructed.
Comparison with previous work can also help you ensure that the assessments you
design are at the right level of difficulty to challenge your students but not discourage them.
Sometimes new teachers design assessments that are too hard or too easy for their students,
and so the assessments are not useful. Assessments that are too easy do not give you any
TABLE 6.7
Criterion-Related Evidence and Applications (Definition | Classroom Application)
information about what concepts your students find challenging, so you do not gain ideas
about what they need to work on to improve. Easy assessments also convey low expecta-
tions for your students. Assessments that are too difficult are discouraging to students and
do not give you information about the steps needed to get them from where they are to
where they need to be. You only know they are not there yet. Comparing student perfor-
mance on current assessments to their performance on previous assessments will help you
identify assessment activities providing an appropriate level of challenge.
A final way to think about criterion-related evidence in classroom assessment is to
use your preassessment as a criterion measure. If your summative classroom assessment
for a unit is sensitive to instruction and squarely addresses the unit content, you should
see evidence that all students improved greatly, in terms of the percentage correct, between
the preassessment you conducted before you began the unit and the summative assessment
you design at the unit’s end. If, on the other hand, the percentage correct does not vary
much from the preassessment to the summative assessment, you have criterion-related
evidence that you are not measuring gains based on the instruction for the unit. Table 6.7
summarizes the discussion of criterion-related evidence for validity.
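A brief sketch, ours and with hypothetical percent-correct values, of using the preassessment as a criterion measure: compute each student's gain from preassessment to summative assessment and treat little or no growth as a warning sign.

# Hypothetical percent-correct scores on the preassessment and the end-of-unit assessment.
pre_percent = {"Ana": 35, "Ben": 50, "Chris": 20, "Dana": 45}
post_percent = {"Ana": 80, "Ben": 85, "Chris": 40, "Dana": 90}

for student, pre in pre_percent.items():
    gain = post_percent[student] - pre
    print(f"{student}: {pre}% -> {post_percent[student]}%  (gain of {gain} points)")

average_gain = sum(post_percent[s] - pre_percent[s] for s in pre_percent) / len(pre_percent)
print(f"average gain: {average_gain:.1f} points")
# Gains near zero would be criterion-related evidence that the assessment is not
# capturing what was learned during the unit.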
FIGURE 6.3 Which type of validity evidence does this cartoon best illustrate? Why?
Family Circus © Bil Keane, Inc. King Features Syndicate
IMPROVING VALIDITY IN CLASSROOM ASSESSMENTS
We close our section on validity by providing suggestions about methods for improv-
ing validity in classroom assessments, which are listed in Table 6.8. They stem from
the four sources of validity evidence we have discussed, so they should look familiar.
Because the biggest problem related to validity that we hear about from our students
is problems with content-related evidence, we start with and emphasize the suggestion
that you review every assessment task to ensure that it addresses the learning goals
and the associated important instructional content. We believe if all teachers con-
ducted such a review each time they designed an assessment, complaints from students
about unfair assessments would plummet dramatically.
TABLE 6.8
Suggestions for Improving Validity in Classroom Assessments (Suggestions | Examples)
RELATIONSHIP BETWEEN RELIABILITY AND VALIDITY
We have now explained reliability and validity as two attributes that affect the qual-
ity of the “input” you invest in designing your assessments. As you recall, reliability
involves minimizing random errors that can mistakenly increase or decrease the
true score a student should receive. To reduce the impact of random errors that can
lower reliability, you do your best to ensure the assessment occasion is free of
distractions, you ensure the number and types of assessment tasks give you suffi-
cient information, and you provide clear scoring criteria to support consistency in
your grading.
Validity, on the other hand, relates to the degree of confidence you have that the
inferences you are making based on assessment results are accurate. The primary way
to ensure validity in classroom assessment is alignment among learning goals, instruc-
tion, and assessment. One factor that could affect the degree of confidence about your
inferences could be scores that are not reproducible because they are plagued by ran-
dom errors. If scores are highly unreliable, then validity is also limited. High reliabil-
ity is necessary for high validity.
For example, one of our students recounted an incident involving a summative
assessment from her high school calculus class. The test she had to take consisted of
two very complex problems. The teacher gave credit for each problem only if the stu-
dent had the right answer. Our student received no credit for one of the two problems
because she made a simple mistake in addition, even though she had followed all the
necessary steps and used the correct formulas. Her score on this test, because of her
random calculation error and the limited number of problems, had low reliability and
resulted in a failing grade. This score also caused the teacher to make an invalid infer-
ence about the student’s calculus skills. This incident illustrates the necessity of high
reliability for high validity.
Simply having high reliability on an assessment, however, cannot guarantee
that the inferences you make about that assessment will be valid ones. You can
readily see this if we alter the calculus example. Consider what would happen if the
summative test the teacher designed had 20 items instead of 2 and used several
types of tasks to increase the sufficiency of information. Suppose she also gave
students as much time as they needed, ensured the environment was free of distrac-
tions, and had clear scoring criteria. These actions would definitely improve the
reliability of the assessment. However, if the content were algebra instead of calcu-
lus, no matter how high the reliability, the scores on the assessment would not help
the teacher make valid decisions about her students’ achievement of calculus learn-
ing goals. The teacher would not have carefully aligned the assessment with the
learning goals and the instruction, and so the inferences she could make would not
be related to the construct about which she needed information. These examples
illustrate why assessment theorists assert that reliability is a necessary but not suf-
ficient condition for validity.
score, but it is rarely intentional or conscious. Most teachers we know bend over back-
wards to be fair and give students the benefit of the doubt. Sometimes, however, an
item or an activity can put one group at a disadvantage so that the scores they receive
are systematically lower than their actual scores would be. This systematic bias will
lower the validity of your assessment.
As our culture diversifies, the danger of unintentional bias increases. As of the 2000
census, almost 20 percent of the people over age 5 in the United States spoke a language
other than English at home, and by 2005 the Hispanic/Latino population had increased to 14.4 percent of the total population, becoming our largest minority group. Furthermore, family
structure is changing, with 20 million children living with single parents (including more
than 3 million with single fathers) and 5.6 million with grandparents. Such changes sug-
gest that for us to provide equal access to education for all students, we must become
familiar with and respond to the increasing diversity of life and language experiences in
our classrooms. Attention to this issue is particularly important because statistics continue
to demonstrate that teacher education students are overwhelmingly European American
and female. We discuss some of the possible bias pitfalls so that you can avoid them in
your own classroom. Table 6.9 summarizes the pitfalls.
TABLE 6.9
Bias Pitfalls in Classroom Assessment
Unfair penalization
  Definition: Using content, examples, or language based on life experiences in an assessment that puts groups who are unfamiliar with them at a disadvantage.
  Example: Using exclusively American holidays such as Thanksgiving or the 4th of July as the basis for an essay on family experiences.
Lack of opportunity to learn
  Definition: Using assessments that do not take into account absences or other systematic differences among groups in exposure to the curriculum.
  Example: Testing students who leave the classroom for special classes on the content they missed without providing compensatory experiences.
Teacher bias
  Definition: Systematic but usually unintentional teacher preferences (e.g., microaggressions, preferential interaction patterns, scoring biases) that can influence assessments.
  Examples: Formative assessment: Asking questions requiring higher-order thinking skills of only the students who raise their hands. Summative assessment: Inadvertently marking down papers with messy handwriting.
design for an English class could involve a topic that males in your class may have
more experience and knowledge about than females, such as their perspective on how
the rules about football could influence other social interactions. This topic choice may
then influence the quality of the essays differentially for the two genders if the females
had less experience on which to draw. Similarly, in a social studies class you may
choose to ask students to write a letter to a person living during the U.S. Civil War as
an assessment. But you generate the list of eligible correspondents off the top of your
head and happen to limit it to famous officers or civic leaders who come to mind
quickly, all of whom happen to be European American and most of whom happen to
be male. Because your female students and your minority students may have a harder
time identifying with and picturing these figures and their interests, they could be at
a disadvantage in completing the assignment. You must make sure your activities and
assessments take into account important differences in your students’ life experiences.
These include variations ranging from exposure to different types of climate (e.g.,
snow), to family experiences (e.g., vacations, exposure to cultural events), to typical
interaction patterns (e.g., methods of problem solving).
We were reminded of unfair penalization recently when we gave a picture vocab-
ulary test to a bright young African American kindergartener who lived in subsidized
housing in the middle of a city. The procedure involved showing him pictures and
asking him to identify them. He flew through dozens of pictures until he encountered a
saddle for a horse. He was suddenly completely stumped, perhaps because he had never
had the opportunity to see horseback riders. In this city, riding stables could be found only out in the countryside and tended to draw an upper-middle-class and wealthy clientele.
In thinking about unfair penalization related to language, the suggestions we
made in Chapter 3 for accommodating English language learners are worth revisiting
because such accommodations are intended to reduce unfair penalization. For exam-
ple, you should consciously use visual as well as verbal representations of content, and
you should avoid complex syntax and exclusively U.S. cultural references. In formative
assessments, you should also make sure all the vocabulary building blocks necessary
for the unit content are in place for your English language learners.
Opportunity to Learn
Another factor that can introduce bias into assessments is a systematic difference
among students in their opportunity to learn the content of the curriculum. For exam-
ple, if your preassessment shows some students are far behind the rest in their ability
to use and interpret maps for the unit on the five regions of the United States, they
would not be able to benefit from any new instruction requiring map interpretation
until you had taken the time to teach this prerequisite skill. Another factor related to
opportunity to learn is attendance. Students who are absent often are at a disadvantage
because they do not receive the same amount of instruction as other students. Often
the students who are absent the most are also students who lag behind their peers.
Similarly, students who have no supervision in the evenings or no parental encourage-
ment to complete homework are at a disadvantage in their opportunities to learn.
Awareness of the bias that fewer opportunities to learn can breed will sensitize you to
the needs of your students. Efforts to equalize the opportunity to learn for all your
students are one more way to work to close achievement gaps. For example, we know
several teachers who allow students who have a disorganized home environment to
come to their classrooms as soon as the bus drops them off in the morning to work on
homework and ask questions rather than waiting outside until the bell rings.
Teacher Bias
Teachers are human, and all humans are susceptible to bias. You know from your own
experience that sometimes teachers have favorite students, and sometimes certain stu-
dents feel they could never do anything right in the eyes of a particular teacher. But
if you asked those teachers, they would probably believe that they are doing their best
to be fair. Bias is rarely intentional.
Because teachers don’t intend to show bias, many authors suggest that the first
step in reducing our biases is to bring our own social identities to our awareness so
we can see the limits of our experience (e.g., Bell et al., 1997; Sue et al., 2007). Willing-
ness to scrutinize our own assumptions, values, and habits generates self-awareness
that can provide insights enhancing our empathy. This in turn may help us accom-
modate diversity in the classroom successfully. Analyzing your own group member-
ship, history, and experiences allows for increased sensitivity to the kinds of assumptions
you make about your students. For example, teacher candidates who have been to
foreign countries where they do not speak the language find they later have more
empathy for the English language learners in their own classrooms than candidates
who have never had such experiences. Or breaking a leg and using a wheelchair can
provide surprising new insights into the barriers facing students with physical dis-
abilities to which a candidate otherwise would have been oblivious. You can see how
broadening your experience allows you to notice some unconscious biases you may
have had. We will now address some aspects of teacher bias that may help sensitize
you to the needs of students.
Microaggressions
Microaggression: Brief, everyday remark that inadvertently sends a denigrating message to the receiver.
Seemingly innocent questions or comments arising from lack of awareness of our own biases can damage our relationships with others from different backgrounds and be perceived as "microaggressions" (Sue et al., 2007). Microaggressions are "brief, everyday exchanges that send denigrating messages . . ." (p. 273). For example, when a middle-class European American asks an Asian American or Latino American, "Where are you
from?” the question implies that the person is a perpetual foreigner, and it negates their
American heritage. Several of our Asian American friends born in the United States
have told us they encounter this question frequently, and it is often repeated, “But where
are you really from?” when they say they are from the United States.
Another common microaggression that we find among our teacher candidates
is a statement such as “I am color blind,” or “When I look at my students, I don’t see
color.” Such comments deny a person’s racial or ethnic experiences and implicitly insist
that assimilation to the dominant culture is required (Sue et al., 2007). If you say you
don’t “see” color, you are probably trying to communicate that you are fair and do your
best not to discriminate on the basis of color. However, recognizing that race is per-
ceived differently depending on one’s position in society is an important step in the
process of becoming culturally sensitive.
A possible outgrowth of such comments (and the beliefs behind them) is insen-
sitivity in designing assessments. One of our African American friends told us about
a teacher of one of her sons who professed not to “see” color. When she had students
design a self-portrait with construction paper, she had only buff colored paper for the
faces and blue for the eyes. When some people don’t “see” color, they tend to ignore
facets of culture and experience beyond their own limited perspective. Their behaviors
can inadvertently alienate their students of color and marginalize them. When students
feel like they don’t belong, their achievement usually suffers. When academic perfor-
mance is depressed for such children, the achievement gap continues to widen.
Because we want to narrow achievement gaps, we must avoid letting our own
background and assumptions limit the learning experiences we provide. We must
learn how the cultural background of our students can influence their identity and
learning so we can take advantage of it in our teaching and in creating an inclusive,
welcoming classroom.
At the same time, we must avoid unconscious negative stereotypes that some-
times can accompany a limited understanding of another culture. For example, Paul
Gorski (2008) describes and debunks several myths of the “culture of poverty,” such
as assuming poor people have low motivation. Such stereotypes can derail high
expectations and undermine our efforts to promote achievement for every child. As
you might imagine, the balance is difficult to achieve. Gradually, with intentional
thoughtful awareness, you will build an understanding of your students, their fami-
lies, and the local community that can help you teach so your students can reach
their maximum potential.
Scoring Biases
Not only can unintended bias influence classroom interactions and formative assess-
ment, it can also affect scoring of summative assessments. Numerous studies have been
conducted over the years investigating scoring biases (see Fiske & Taylor, 1991). We
TABLE 6.10
Suggestions for Avoiding Teacher Bias
In General
Analyze the limits of your own assumptions, values, and experience and make an effort to expand them:
• Read widely about other cultures, social classes, ethnicities, and races.
• Seek out experiences with people unlike yourself by traveling and doing volunteer work.
Monitor teacher attention on a seating chart using checks or + and – for teacher comments.
When possible, remove clues to the identity of the student whose work you are grading.
When possible, have students type work they turn in to avoid penalizing hard-to-read handwriting.
want to alert you to some of these findings so you can avoid scoring bias in your own
classroom. In these studies, two groups usually rate the same essay or assignment.
Additional information about the student who completed the assignment is varied
between the two groups to see if it changes perceptions of the quality of the work. For
example, when one group is told the student is “excellent” and the other group is told
that the student is “weak,” those who believe they are grading the work of an “excel-
lent” student assign higher grades than those who think they are grading the “weak”
student. Similarly, when raters are given a photo of a student, those who receive an
attractive photo usually grade higher than those given an unattractive one. Research
such as this impels us to recommend that whenever possible, you grade assignments
without knowing the identity of the student.
Similarly, messy handwriting, length of a paper, and grammatical errors can
strongly influence grades, even when the teacher doesn’t intend it. A close friend who
teaches fourth grade has what he calls a “handwriting fixation” with which he grapples.
He recently caught himself just before lowering the grade for a student who wrote every
cursive “m” with two humps instead of three. To bring home the point about how eas-
ily biases can emerge, we know one professor who has his students grade the same essay
for practice every semester. Some students receive a messy version and others a neat version. Every single semester, the neat version emerges with a higher grade.
Another factor that can have an impact on grading is the variance in political or
social perspectives of teachers and students (Barbour, 2007). Teachers we know do
their best to grade fairly, but they sometimes agonize about how to maintain impartial-
ity and see this potential bias as an important ethical dilemma. A good scoring guide
helps enormously, and in Chapters 8 and 9 we discuss how to design scoring guides
that help you to maintain consistency and objectivity.
As you review Table 6.10, you will notice the repetition of a few recommenda-
tions that we made to enhance reliability and validity—such as providing frequent
and varied assessments and collaborating with peers on designing and scoring assess-
ments. This is because bias can influence the validity of the inferences we make about
students based on their test scores. Does a high score on a writing assignment reflect
that the student has mastered the writing strategies you have taught, or is the high
score because of the student’s neat handwriting? The safeguards we offer will help you
catch yourself and your implicit biases as you reflect on variations in the outcomes
of your assessments.
Stereotypical Representation
Stereotypical Representation: Depicting social groups in an oversimplified manner.
Stereotypical representation involves depicting social groups in an oversimplified, clichéd manner in your formative and summative assessments. For example, if your early childhood students were learning about the instruments in the orchestra, you
might want to be sure to include photographs of women playing brass instruments
and African Americans playing strings as you developed activities for assessments. Our students tell us that typical stereotypes in music depict males playing brass instruments and females playing stringed instruments, with African Americans few and far between.
TABLE 6.11
Problems to Avoid When Designing Assessments
Stereotypical representation
  Definition: Depiction of groups in an oversimplified, clichéd manner.
  Example: A math word problem that involves calculations about Juan picking lettuce or grapes.
Contextual invisibility
  Definition: Underrepresentation of certain groups, customs, or lifestyles in curricular or assessment materials.
  Example: Using only typically European American names as examples in assessments.
Historical distortions
  Definition: Presenting a single interpretation of an issue, perpetuating oversimplification of complex issues, or avoiding controversial topics.
  Example: Describing slavery exclusively in terms of economic "need" and avoiding a moral perspective.
Adapted from Macmillan-McGraw Hill. 1993. Reflecting diversity: Multicultural guidelines for educational publishing professionals.
TABLE 6.12
Avoiding Sexist Language
Avoid | Change To
"mankind" | "humankind"
"They will man the table for the bake sale." | "They will staff the table for the bake sale."
"Everyone should know his vote counts." | "People should know their votes count."
Adapted from Macmillan-McGraw Hill. 1993. Reflecting diversity: Multicultural guidelines for educational publishing professionals. Used with permission.
plural whenever possible allows both genders to be represented. See Table 6.12 for
examples of methods for avoiding sexist language and the hidden assumption of
female passivity.
Contextual Invisibility
Contextual Invisibility: The concept that certain groups, customs, or lifestyles are not represented in assessment materials.
Contextual invisibility means that certain groups, customs, or lifestyles are not represented or are underrepresented in your assessment materials (Macmillan-McGraw Hill, 1993). Underrepresentation implies that these groups, customs, or lifestyles are less important, and it marginalizes them. For instance, examples of males in parenting
roles are often much less visible in instructional materials than examples of males
actively involved in work outside the home, hinting that nurturing roles are less valued
for men in our society. Similarly, when using names in assessments, many teachers are
inclined to stick with common European American names such as Tom or Mary. In
the movie Stand and Deliver, Jaime Escalante combats contextual invisibility by point-
ing out to his students that the Mayans recognized and used the important concept of
zero well before the Greeks, Romans, or Europeans. Emphasizing the contributions of
multiple groups beyond European Americans to the development of our culture helps
students appreciate a pluralistic society.
Historical Distortions
Historical Distortion: Presenting a single interpretation of an issue, perpetuating oversimplification of complex issues, or avoiding controversial topics.
Historical distortion involves presenting a single interpretation of an issue, perpetuating oversimplification of complex issues, or avoiding controversial topics (Macmillan-McGraw Hill, 1993). Historical distortions can occur with past or present issues. The term "historical" used in the definition probably refers to the need to avoid the typically
Eurocentric and male perspective that has dominated much historical and contempo-
rary analysis of political and historical events. For example, in an analysis of textbooks
for English language learners, Ndura (2004) reports that controversial topics, such as
prejudice or racial and ethnic conflict, were avoided in all texts reviewed. She reports,
We hear about the Navajo code talkers [’ contribution to World War II] . . . but
nothing about why their recognition was delayed over three decades. We hear the
pleading voice of Chief Joseph, but nothing about why he and his people had to go
to war. The Universal Declaration of Human Rights is presented, but there is no
discussion of those rights that are often infringed upon in the lives of many
immigrants and people of color. . . . The Civil Rights Movement is featured as a piece
of history with no extension to today’s struggles. The Lewis and Clark story is told
from a White man’s perspective. The voices of York, the male slave, Sacagawea, the
female Indian helper, and of the Indians they met on the expedition are all silenced.
As we mentioned in Chapter 1, varied, meaningful and challenging assessment
tasks encourage students to develop mastery goals and increased motivation. Engaging
tasks that grapple with controversial issues can be ideally suited to preventing histori-
cal distortion. For example, to encourage perception of multiple perspectives on cur-
rent or historical events, you might assign different students to look at different groups’
perceptions when designing a web quest about a controversial issue. You can also
explore conflicts through mini-debates, class discussions, or persuasive essays required
to address opposing views. Such activities have the added benefit of requiring complex
thinking at the highest levels of Bloom’s taxonomy.
If instead you gloss over controversial subjects, you deny your students oppor-
tunities to develop the critical thinking skills essential for active democratic participa-
tion. Exploring controversy and understanding it from several points of view may also
allow students to develop confidence to contribute to solutions. A recent study found
that classroom opportunities such as learning about things in society that need to be
changed, focusing on issues students care about, and encouraging students to discuss
and make up their own minds about political and social issues help students develop
stronger commitments to participation in civic affairs (Kahne & Sporte, 2007).
With careful alignment between instruction and assessment, you can be certain you are addressing the appropriate content based on the standards your
students need to be successful. Finally, unbiased assessments that provide inclusive
perspectives promote student motivation and engagement and help avoid marginaliza-
tion of groups who have typically been underrepresented. These are often the groups
who lag behind peers, so enhancing their motivation and sense of belonging can lead
to accelerated progress in working to close achievement gaps.
TABLE 6.13
Table of Specifications for Assessments in Utopia and Dystopia Unit

Each learning goal is listed with the percentage of the unit it covers and the assessment items (knowledge level and above knowledge level) addressing that goal.

1. Student will be able to compare and contrast characteristics of a utopia and dystopia. (60%)
   • Formative assessment on characteristics of utopias and dystopias
   • 10 multiple-choice items
   • 4 essay items
   • Performance assessment on creating a society

2. Student will be able to compare and contrast multiple works concerning utopia/dystopia, paying particular attention to the role of the government and the role of the individual. (30%)
   • Formative assessment on characteristics of utopias and dystopias
   • Multiple-choice items 1–3
   • Essay items 1–3

3. Student will be able to use persuasive language in writing and/or speaking. (10%)
   • Essay item 4
   • Performance assessment on creating a society
In Table 6.13, you can see that Maria devotes 60 percent of the unit to the first learning goal; consequently, the majority of her assess-
ment also addresses this learning goal. In addition, you see she has also taken important
steps to make sure each of her assessments requires higher than knowledge-level cogni-
tive skills to complete. For example, she requires students to use comprehension, anal-
ysis, and evaluation skills from Bloom’s by comparing and contrasting the characteristics
of utopias and dystopias, choosing the most important characteristics, and describing
how different pieces of literature exemplify them in her formative assessment.
Notice that each learning goal has more than one assessment addressing it, and
the assessments require different kinds of skills and activities. Sufficiency of informa-
tion, the key issue for reliability of classroom assessments, requires multiple assessments.
With this variety, Maria will have evidence to determine whether different assessments
addressing the same learning goal result in similar levels of performance among stu-
dents. If she finds consistent patterns, she will be able to conclude that random error
does not contribute in a major way to the obtained scores of her students.
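To make this consistency check concrete, here is a minimal Python sketch (not from the text, and the score lists are hypothetical) that computes the correlation between students' results on two assessments targeting the same learning goal; a strong positive correlation is one sign that random error is not dominating either set of scores.

from statistics import mean, stdev

# Hypothetical totals for the same seven students on two assessments
# that address the same learning goal.
essay_scores = [18, 15, 12, 20, 16, 9, 14]
mc_scores = [8, 7, 5, 9, 7, 4, 6]

def pearson_r(x, y):
    # Pearson correlation between two equal-length lists of scores.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

print(f"Correlation between the two assessments: {pearson_r(essay_scores, mc_scores):.2f}")

A teacher would rarely need to compute this formally for every unit, but even an informal scan for students whose results diverge sharply across assessments can flag scores worth a second look.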
The other reliability issue important to consider in classroom assessments relates to
scoring. Maria has a range of assessment tasks. She includes multiple-choice items, which
have a clear-cut correct answer, as well as essay items and a performance task, which can
be more susceptible to random error associated with making judgments. She must pay
careful attention to her scoring criteria and procedures to ensure she grades them consis-
tently and reliably. Ideally, consulting with another teacher to examine interrater reliabil-
ity could also be helpful if time permits. We know of several schools where interrater
reliability is addressed in team meetings (see Box 6.1). Chapters 8 and 9 provide discus-
sion of procedures for construction of scoring guides that enhance reliability.
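When a colleague can score a sample of the same papers, agreement can be summarized in a few lines. The Python sketch below is illustrative only; it assumes hypothetical 1–4 rubric scores from two teachers and reports exact agreement and agreement within one point.

# Hypothetical rubric scores (1-4) assigned by two teachers to the same eight essays.
teacher_a = [4, 3, 2, 4, 1, 3, 2, 3]
teacher_b = [4, 3, 3, 4, 2, 3, 2, 2]

pairs = list(zip(teacher_a, teacher_b))
exact = sum(a == b for a, b in pairs) / len(pairs)
within_one = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)

print(f"Exact agreement: {exact:.0%}")
print(f"Agreement within one point: {within_one:.0%}")

Low agreement usually points to criteria in the scoring guide that the raters interpret differently, which is exactly the conversation a team meeting can resolve.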
Maria’s use of diverse forms of assessment will support the validity of her decisions
about students’ achievement of learning goals. Whether students are good at multiple-
choice tests is not the focus of an assessment. Rather, the focus is on whether students
show an understanding of the concepts tested in the multiple-choice format. Some stu-
dents best express their knowledge by being assessed with essay questions. Others best
express their grasp of new concepts when assessed by multiple-choice items. Strong
teachers use a mix of formats to assess student achievement because such variety is more
likely to provide valid inferences about student mastery of the learning goals.
In terms of avoiding bias, Maria addresses the opportunity to learn by ensuring
that students who don’t have computers at home will not be penalized by assignments
requiring technology such as PowerPoint slides. She plans to allocate class time in the
computer lab to complete the PowerPoint or other figures needed for any assignments
requiring computer technology. She will also ensure that students will not be tested on
content and skills that have not been thoroughly addressed in class—an issue of valid-
ity. She will take care to reduce any systematic scoring bias on her part through design-
ing scoring guides with relevant criteria, and she will spend time working with students
to clarify her scoring guides for them and for herself.
In terms of representing the diversity of the classroom, Maria began by including
writings by both male and female authors as resources (see Table 6.14). However, she
noticed most of these seemed to represent a Eurocentric perspective. After doing a
little research with an eye toward representing a wider range of groups, Maria decided
to add readings on Chinese notions of utopia/dystopia (e.g., Chairman Mao’s Little Red
Book) and the seventeenth-century social experiments begun by Jesuits in what are
now Paraguay, Argentina, and Brazil, as well as Roland Joffe’s film The Mission. She
also decided to make explicit in class discussions and assessments some of the under-
lying tensions at work in any society between freedom and equality and between
political majorities and minorities. She intends to address issues of politics and power
directly. In addition, she decided to offer students opportunities to do research on
sometimes marginalized perspectives and found information on feminist utopias, Jew-
ish utopias, and musical utopias to get them started in thinking about potential topics.
These additions fit well within the framework of Maria’s learning goals, and the added
complexity will stretch students’ critical thinking skills. The additions also provide
more conflicting perspectives and controversial issues that can deepen and enrich stu-
dent understanding of the facets of utopias and dystopias and the implications for their
lives and for their own civic participation.
TA B L E 6 . 1 4
Resources for Unit on Utopia and Dystopia
Literature Films
Declaration of Independence Gattaca
HELPFUL WEBSITES
http://www.ncel.org/sdrs/areas/issues/methods/assment/as500.htm
This section of the North Central Educational Laboratory website provides resources and video
clips related to enhancing the quality of assessments to support student learning.
http://www.understandingprejudice.org/
The Understanding Prejudice website provides resources, interactive activities, and
demonstrations that provide a variety of perspectives on prejudice, stereotypes, and
appreciation of diversity. It was established with funding from the National Science
Foundation and McGraw-Hill Higher Education.
2. Why is it important to explain that every score has an error component when
sharing a summative test score with parents or students?
3. Decide whether these student concerns are caused by problems with reliability
(R) or validity (V), and explain your reasoning:
V R The teacher tested us on the whole chapter, but she forgot to go over one section,
so the scores on the test of many students were much lower than they should
have been.
V R I had handed in a completed rough draft and she gave me a few comments on
how to improve what she herself called “a well-written paper.” After editing my
paper and incorporating all of her comments, I turned in my final draft. She
returned the paper a week later with a failing grade.
V R In my algebra class, all of the questions on tests were multiple choice and essay
even though during class we mostly worked problems.
V R In one of my high school classes, we would have short readings for homework.
To quiz us on the readings, the teacher would ask specific questions such as
“The word missing from this sentence is _____.” He didn’t test our overall
understanding.
V R In ninth grade I had a teacher who gave only one test for the entire semester. I
am not a good test taker, I get nervous and draw a blank. Everything I knew
left me and I received a low grade in the class even though I was there every
day, did all my homework and assignments, and participated regularly.
4. Think about a recent major classroom assessment you have completed in your
college classes. Did you feel it was a valid assessment? Explain why or why not,
demonstrating your understanding of at least two sources of validity evidence.
5. For another recent major classroom assessment you have completed, describe
the specific information your teacher ideally should have collected as
evidence for the validity of the assessment. Explain how you know whether
such evidence was collected or not. Would such evidence make the teacher
more or less confident of the decisions he or she made about the meaning of
the scores?
6. Why is content-related evidence the most important source of evidence for
validity in classroom assessment?
7. If you were chosen to debate whether validity or reliability is more important
for assessment, which side would you choose? Explain the reasoning for your
choice.
8. Think about your own experiences with people who are different from you.
Describe areas where you believe your experience is limited and could lead to
unintentional bias on your part as a classroom teacher.
9. Identify examples of microaggressions that you have observed or that have been
directed at you.
10. Describe classroom examples you have observed of unfair penalization, lack of
opportunity to learn, and teacher bias. With each example, describe what group
was targeted and what you would have done to rectify the situation.
11. Examine a textbook in your content area for stereotypical representation,
contextual invisibility, and historical distortions. Under each heading, provide
a list of specific examples.
12. In your estimation, does Maria provide sufficient content-related evidence for
the validity of her assessments in the unit on utopias and dystopias? Justify your
answer. What additional evidence would be useful?
REFERENCES
Barbour, J. 2007. Grading on the guilty-liberal standard. Chronicle of Higher Education 53(42). (ERIC Document Reproduction Service No. EJ770974). Retrieved June 10, 2008, from ERIC database.
Bell, L., S. Washington, G. Weinstein, and B. Love. 1997. Knowing ourselves as instructors. In
M. Adams, L. Bell, & P. Griffin (eds.), Teaching for diversity and social justice: A sourcebook,
pp. 299–310. New York: Routledge.
Brookhart, S. 2003. Developing measurement theory for classroom assessment purposes and
uses. Educational Measurement: Issues and Practices 22 (4): 5–12.
Fiske, S., and S. Taylor. 1991. Social cognition. 2nd ed. Reading, MA: Addison Wesley.
Gorski, P. 2008. The myth of the “culture of poverty.” Educational Leadership 65 (7): 32–36.
Gronlund, N. 2006. Assessment of student achievement. 8th ed. New York: Allyn and Bacon.
Kahne, J., and S. Sporte. 2007. Educating for democracy: Lessons from Chicago. Chicago: Consortium
on Chicago School Research. Retrieved October 2, 2007 from http://ccsr.uchicago.edu/
content/publications.php?pub_id=117.
Macmillan-McGraw Hill. 1993. Reflecting diversity: Multicultural guidelines for educational
publishing professionals. New York: Author.
Ndura, E. 2004. ESL and cultural bias: An analysis of elementary through high school textbooks in
the Western United States of America. Language, Culture, and Curriculum 17 (2): 143–153.
Popham, W. J. April, 2007. Instructional insensitivity of tests: Accountability’s dire drawback.
Paper presented at the annual meeting of the American Educational Research Association,
Chicago, IL.
Shavelson, R., G. Baxter, and X. Gao. 1993. Sampling variability of performance assessments.
Journal of Educational Measurement 30: 215–232.
Smith, J. 2003. Reconsidering reliability in classroom assessment and grading. Educational
Measurement: Issues and Practices 22 (4): 26–33.
Sue, D., C. Capodilupo, G. Torino, J. Bucceri, A. Holder, K. Nadal, and M. Esquilin. 2007. Racial
microaggressions in everyday life. American Psychologist 62: 271–286.
Switzer, J. 1990. The impact of generic word choices. Sex Roles 22: 69.
ﱟﱟﱟﱟﱠﱟﱟﱟﱟ
CHAPTER 7
TEACHER-MADE ASSESSMENTS:
MULTIPLE-CHOICE AND OTHER
SELECTED-RESPONSE ITEMS
Perhaps the most common argument proponents of alternative assessment bring
against the use of [selected-response items] is this: [they] measure only recall and
other lower-order thinking skills whereas alternative methods of assessment require
students to exhibit the higher-order skills such as critical thinking, analysis,
synthesis, reasoning, and problem solving. If this were true, it would be a very
damning argument indeed, but neither assertion is altogether accurate.
–William Sanders and Sandra Horn (1995)
INTRODUCTION
In this and the next two chapters, we examine three forms of assessment: selected-
response (e.g., multiple-choice, true-false, and matching), constructed-response
(e.g., short answer and essays), and performance assessments (e.g., science exper-
iments, music auditions, speeches). Used in combination, these forms of assessment
will provide you with information for making valid decisions about your students.
As we begin our look at these various forms of assessment, we start with the fol-
lowing guiding thoughts:
• A student should get an item or task correct because the student
understands the material, not because we gave clues to the answer or
because the student guesses well.
• A student should miss an item because the student does not understand the
material, not because we wrote confusing items or tasks, tricked the
student, or asked for trivial information.
These guiding thoughts apply whether we are talking about true-false items,
multiple-choice questions, essay questions, or performance tasks. They remind us that
the purpose of our tests is to find out what our students know and are able to do.
In this chapter we focus on selected-response items. Selected-response items
offer choices from which a student decides on an answer. We look at three types of
selected-response formats: multiple-choice, true-false, and matching.
ALIGNING ITEMS WITH LEARNING GOALS AND THINKING SKILLS
A first step in developing a test is to prepare a table of specifications based on the
learning goals that you covered during instruction (see Table 2.15). Based on the learn-
ing goals addressed in the unit, the table of specifications will ensure that you focus
your test development efforts on items related to instruction. The table of specifications
also prevents you from including test items based on learning goals that had to be
modified during the unit. Finally, the alignment between your instruction and the test
items will support you in making valid decisions about whether students have met the
learning goals associated with the instructional unit.
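As a rough self-check, the table of specifications can be kept in a form that is easy to tally. The Python sketch below uses made-up goals and weights (they are not drawn from any unit in this book) and compares each goal's share of the items you have written with its share of instructional emphasis.

# Hypothetical table of specifications: each goal's planned weight and items written so far.
spec = {
    "Goal 1": {"weight": 0.60, "items": 12},
    "Goal 2": {"weight": 0.30, "items": 5},
    "Goal 3": {"weight": 0.10, "items": 3},
}

total_items = sum(entry["items"] for entry in spec.values())
for goal, entry in spec.items():
    actual_share = entry["items"] / total_items
    # A large gap between planned and actual shares signals misalignment
    # between instruction and the test.
    print(f"{goal}: planned {entry['weight']:.0%}, actual {actual_share:.0%}")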
SELECTED-RESPONSE FORMATS
Having developed a table of specifications, you are now ready to consider the formats
you can use in writing the items.
Multiple-Choice Formats
Stem: the premise of a multiple-choice item, usually a question or an incomplete statement.
Options: potential answers, including one correct response.
Distracters: incorrect options in multiple-choice items.

Conventional multiple-choice items begin with a stem and offer three to five answer options. The stem takes the form of a question or an incomplete statement (Figure 7.1). Options are the potential answers, with one correct response and several plausible incorrect responses. Incorrect responses are referred to as distracters because they capture the attention of students who have not mastered the material. (This choice of term is conventional but unfortunate because it sounds like teachers are trying to trick students.) In Figure 7.1(a), Olive Crews, a high-school teacher, used the following question format: What are the class boundaries when the class limits are 8–12? In Figure 7.1(b), Rodney Grantham, who teaches Spanish to high school students, used the completion format: Mis amigos son __________.

Alternate-choice item: a selected-response item with only two options.

A second multiple-choice format is the alternate-choice item. This type of question has only two options instead of the three to five options of the conventional multiple choice. An example is shown in Figure 7.2. Experts suggest that conventional multiple-choice
(a) 1. What are the class boundaries when the class limits are 8–12?
(b) 8. Mis amigos son __________.
FIGURE 7.1 Examples of the (a) Question Format and (b) Completion Format for Multiple-Choice Items
[Pictures of six different kinds of animals, numbered 1–6]

8. Look at the different kinds of animals in the pictures above. Which three of these animals have a very large number of young every time they reproduce? [Stem]
A) 1, 2, and 4
B) 1, 4, and 6
C) 3, 5, and 6*
D) 4, 5, and 6
[Options with choices grouped into sets]
FIGURE 7.3 An Example of a Complex Multiple-Choice Item from the Fourth-Grade NAEP Science Assessment
Adapted from the National Center for Educational Statistics. 2005. NAEP question. Reprinted with permission.
items with three to five options often actually function as alternate-choice items because
they usually have only one plausible distracter (Haladyna & Downing, 1993). In the food
chain item, the choices are consumers and producers. As you recall from elementary
science class, food chains are composed of producers, consumers, and decomposers.
“Decomposers” is not a useful option, however, because these organisms are primarily
fungi and bacteria. Little would be gained by including decomposers as an option because
students would quickly eliminate this implausible distracter.
Complex multiple choice: a selected-response item with a stem followed by choices grouped into more than one set.

The third multiple-choice format is complex multiple choice. This type of item consists of a stem followed by choices that are grouped into sets (Figure 7.3). The complex multiple-choice format is more difficult than other formats. The student must consider each option, but also each combination within an option; so getting the answer correct may depend on logical skills more than mastery of the content. A student could know the material; however, in the juggling of the combinations within the options, the
student might select the wrong answer. This type of format, then, can violate our guid-
ing thought that students should miss an item only if they do not understand the mate-
rial, and so we do not recommend it.
The example of the complex multiple-choice format in Figure 7.3 is from the
fourth-grade science assessment of the National Assessment of Educational Progress
(NAEP), a national test administered to a representative sample of students in grades
4, 8, and 12 (Greenwald, Persky, Campbell, & Mazzeo, 1999). The NAEP Assessment
provides descriptive information about examinees’ achievement in such areas as read-
ing, writing, mathematics, and science. Assessment results are reported by student
groups (e.g., gender, ethnicity, family income level) at the state and national level. Items
on the test include multiple-choice, short constructed-response, extended constructed-
response, and performance tasks. Throughout this chapter we will use examples from
NAEP to illustrate the application of item-writing guidelines we are discussing.
True-False Formats
True-false items: items that provide a statement that the student must determine to be correct or incorrect.

True-false items provide a statement, also referred to as a proposition, and the student must determine whether the statement is correct or incorrect. An example of a true-false item is the following:
T* F One acceptable syllable pattern for a limerick is 9-9-5-5-9.
Sometimes the answer options are Yes/No or Correct/Incorrect.
A variation of the true-false format is to have students correct the false part of
a statement. For example, the question on the syllables pattern for limericks might read
as follows:
T F* One acceptable syllable pattern for a limerick is 8-8-9-9-8.
Then the students would be instructed, “For false statements, correct the underlined
part of the item to make the statement true.” In this case a student might change the
pattern to 8-8-6-6-8 or some other variation that falls within the general syllable pat-
tern for limericks. The requirement to correct the underlined part of the statement
reduces the effect of guessing, but it requires more detailed scoring and, thus, increases the amount of grading time.

Multiple true-false items: several choices follow a scenario or question and the student indicates whether each choice is correct or incorrect.

Another form of the true-false item is the multiple true-false (Figure 7.4). This format combines the multiple-choice and true-false formats. In the multiple true-false
A farmer has a small pond that has recently had a lot of algae growth. Which of the following
statements are actions the farmer might take and the possible results? Indicate whether each
statement is true (T) or false (F).
T F* 5. Adding fertilizer will kill the algae.
T* F 6. Adding small fish to the pond will reduce the amount of algae.
T F* 7. Introducing small fish to the pond will attract herbivores.
item, several choices follow a question or scenario, and the student indicates whether
each choice is true or false.
Matching Format
Matching format: two parallel columns of items (termed premises and answers) are listed and the student indicates which items from each column belong together in pairs.

A final form of selected response is matching. As shown in Figure 7.5, in a matching exercise two columns are formed. In the left-hand column, you list a set of premises (e.g., statements, questions, phrases). In the right-hand column, you list the answers (sometimes referred to as responses). Generally, the premises, or lengthier phrases,
Directions: Write the letter from the "Measurement Units" box on the line next to the best choice. Some measurement units are used more than once. Which measurement unit should you use?

PREMISES                                 ANSWERS (Measurement Units)
_____ 1. length of the playground        a. centimeter
_____ 2. the height of a house           b. kilometer
_____ 3. the distance between cities     c. meter
_____ 4. the length of a baby            d. millimeter
_____ 5. the tip of a pencil
_____ 6. the width of your hand
should be placed on the left. The answers, or briefer phrases, appear on the right. In
Figure 7.5, the answer column contains the measurement units and the premise col-
umn lists the objects (e.g., house, playground) to be measured.
Directions: Write the letter from the "Types of Organisms" box on the line next to the best choice. Some organism types are used more than once.

_____ 1. apple tree      Types of Organisms
_____ 2. fox             a. consumer
_____ 3. mold            b. decomposer
_____ 4. mushroom        c. producer
_____ 5. rabbit
Questions 8–9 [Directions]

The table below shows information about the weather in four cities on the same day. [Interpretive Material]

                                        City 1   City 2   City 3   City 4
High Temperature                        65°F     80°F     48°F     25°F
Low Temperature                         56°F     66°F     38°F     10°F
Precipitation, Rain or Snow (inches)    2 in     0 in     1 in     1 in

8. In which city did snow most likely fall at some time during the day?
A) City 1
B) City 2
C) City 3
D) City 4*
Items in an interpretive exercise should not be answerable without reference to the graph or picture. The multiple-choice item in Figure 7.7 is an interpretive exercise used in the NAEP science assessment for fourth grade.
Read the following entry of a soldier’s diary. From the information he provides, deter-
mine which “side” this soldier is fighting for. Then answer the questions based on his
view of the Civil War.
We get further and further from home everyday. After we lost the first big battle, my old
job at the factory does not seem so bad. Morale is low. We are tired. However, more sup-
plies and troops are coming which helps to build our strength. Gen. Grant says we move
out tomorrow towards the Mississippi River. I hope we succeed in our mission.
1. Which state do you think this soldier is from?
A. Maryland*
B. New Mexico
C. Virginia
2. How might this soldier defend his reason for going to war?
A. Federalism*
B. Fifth Amendment
C. Fugitive Slave Act
3. To which political party would this soldier most likely belong?
A. Constitutional Union
B. Democratic
C. Republican*
FIGURE 7.8 Example of an Interpretive Exercise That Uses a Fictional Entry from a
Soldier’s Diary
3. Look at the rhythm patterns below. Which of the 4 rhythm patterns matches the one you hear?
A. [notated rhythm pattern]
B. [notated rhythm pattern]
C. [notated rhythm pattern]
D. [notated rhythm pattern]
Sound is also incorporated into the interpretive exercises in music for the SCAAP
(Yap et al., 2005). To answer the item shown in Figure 7.10, fourth-grade students
listen to a music clip; then the students select the option that matches the rhythm
pattern that they heard.
Another advantage of the interpretive exercise is that after you have designed an
exercise addressing important information related to one of your learning goals, you
can use many variations of the same basic item as you modify your tests for new groups
of students. You can change the interpretive material (e.g., a different poem, music clip,
map, or literary passage), and/or you can change some of the questions about the
interpretive material (e.g., offer different rhythm patterns to choose, require different
distances to be calculated).
This format, however, takes more administration time than other selected-response
item formats. For example, the NAEP fourth-grade science item in Figure 7.11 requires
students to read the problem, draw conclusions about Julio’s pulse rate, then analyze
each bar graph to determine if it represents the pattern of Julio’s pulse rate. Because this
takes more time than other selected-response formats, the number of such items must
be limited.
FIGURE 7.11 Example of an NAEP Item with the Interpretive Material in the Options
SOURCE: National Center for Educational Statistics. 2005. NAEP question. Reprinted with permission.
TA B L E 7 . 1
General Guidelines for Writing Selected-Response Items
Each item should assess important content across the range of cognitive levels.
An assessment should not require advanced reading skills unless it is a test of literacy skills.
Selected-response items should follow the conventions of written language (e.g., spelling,
grammar, and punctuation).
Avoid writing test items that ask about details from a caption accompanying a diagram or information addressed in a footnote. Items should address the key facts, concepts, and procedures addressed in an instructional unit.
You should also incorporate novel material that has not been discussed explicitly
in class. In Figure 7.5, for example, the objects that are to be measured (e.g., length of
a playground, height of a house) should be different from the objects used during instruction. Also, keep the content of each item independent from the content of other items on the test. For example, refer to the eighth-grade NAEP civics items shown in Figure 7.12. After a student answers item 4, the student might think
that the answer to 4 should be the basis for answering item 5, thus influencing the
student’s answer to item 5. If the student got the first one wrong, the student would
also get the second one wrong.
[Bar graph: percentages of 37%, 52%, 66%, 76%, and 80% plotted for income categories of under $10,000; $10,000–19,999; $20,000–24,999; $25,000–49,999; and $50,000 and over]
FIGURE 7.12 Example of NAEP Items in Which a Student’s Answer May Influence Answers
to Other Items
SOURCE: National Center for Educational Statistics. 2005. NAEP questions. Reprinted with permission.
You can control the reading level required by your assessments by keeping vocabulary simple and by attending to the length of the sentences.
Another factor possibly contributing to reading demands is the use of negatives in an item. In developing items, when possible, word the stem positively and avoid negatives such as not or except. Sometimes, however, you want to know if students can recognize an exception; if you must use a negative, emphasize it by capitalizing or underlining it so students do not overlook it.
FIGURE 7.13 Illustration of the Calculation of the Grade Level of Test Materials
Reprinted with permission of Microsoft.
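Figure 7.13 shows a word processor's built-in readability statistics; the same kind of estimate can be approximated directly. The Python sketch below is not the procedure in the figure: it computes the commonly used Flesch-Kincaid grade level with a crude vowel-group syllable count, so the result should be treated only as a ballpark figure for a draft item.

import re

def count_syllables(word):
    # Rough estimate: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    # Flesch-Kincaid grade level (approximate).
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllable_total = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllable_total / len(words)) - 15.59

draft_item = ("A farmer thinks that the vegetables on her farm are not getting "
              "enough water. Is this a good idea?")
print(f"Estimated grade level: {flesch_kincaid_grade(draft_item):.1f}")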
11. Which of the following is a problem that could NOT be solved by one nation alone?
A) National debt
B) Highway traffic
C) Ocean pollution*
D) Government corruption
FIGURE 7.15 Fourth-Grade NAEP History Item with Options Written in Complete Sentences
SOURCE: National Center for Educational Statistics. 2005. NAEP question. Reprinted with permission.
14. A farmer thinks that the vegetables on her farm are not getting enough water. Her
son suggests that they use water from the nearby ocean to water the vegetables. Is this
a good idea?
A) Yes, because there is plenty of ocean water.
B) Yes, because ocean water has many natural fertilizers.
C) No, because ocean water is too salty for plants grown on land.*
D) No, because ocean water is much more polluted than rainwater.
One teacher stated that her tests serve as models of the conventions of language and that her unique use of capitalization did not provide appropriate models. In addition, the use of ALL
CAPS is likely to make reading more difficult (Thompson & Thurlow, 2002), thus
increasing reader fatigue.
BOX 7.1
A Reflection on Reviewing Test Items for Stereotypes

One of the word problems in the test described a girl who wanted to buy a new pair of jeans and babysat to earn the money she needed. This scenario may promote the idea that babysitting is typical for girls, or that girls are usually preoccupied by clothes. Therefore, a male was chosen as the character of this word problem. Additionally, the objective of buying a new pair of jeans was replaced with the one of buying a new pair of ice skates, because ice-skating is not typical for just one gender and does not have any stereotypical connotations.

Original Item:
Directions: Read the following sentences and use the information provided to answer questions 22, 23, and 24. Indicate your response by circling the letter next to the best answer.

Mary wants a new pair of jeans and she started babysitting to earn the amount of money she needs. On Monday, Mary babysat for 2 hours, and she earned 8 dollars. On Tuesday, she babysat for 5 hours, and she earned 20 dollars. The jeans Mary would like to buy cost 40 dollars.

22. How many more hours does she need to babysit to be able to buy the new jeans?
A. 3*
B. 10
C. 12
D. 40

Revised Item:
Sam wants new ice skates and he started babysitting to earn the amount of money he needs. On Monday, Sam babysat for 2 hours, and he earned 8 dollars. On Tuesday, he babysat for 5 hours, and he earned 20 dollars. The skates Sam would like to buy cost 40 dollars.

22. How many more hours does he need to babysit to be able to buy the new skates?
A. 3*
B. 10
C. 12
D. 40
Answer questions 5–7 based on the following excerpt from Beverly Buchanan’s artist
statement.
My work is about, I think, responses. My response to what I’m calling GROUNDINGS. A
process of creating objects that relate to but are not reproductions of structures, houses
mainly lived in now or abandoned that served as home or an emotional grounding. What’s
important for me is the total look of the piece. Each section must relate to the whole struc-
ture. There are new groundings, but old ones help me ask questions and see possible stories
as answers. Groundings are everywhere. I’m trying to make houses and other objects that
show what some of them might look like now and in the past. (Buchanan, 2007)
7. Which of the following artist quotes would Beverly Buchanan most agree with based
on her artist statement?
A. “Painting is easy when you don’t know how, but very difficult when you do.”
Edgar Degas
B. “Great things are not done by impulse, but a series of small things brought
together.” Vincent Van Gogh*
C. “I feel there is something unexplored about woman that only a woman can
explore.” Georgia O’Keeffe
FIGURE 7.17 Visual Arts Item About the Works of Beverly Buchanan, an African
American Artist
Does your instruction address the contributions of women artists? If so, does the assessment for the unit gauge stu-
dent learning about the role of women artists? A visual arts unit by Leslie Drews, an
art teacher, provides an illustration of making visible the contributions of women in
the arts. In Leslie’s unit, she draws from the works of Beverly Buchanan, an African
American artist. The interpretive exercise in Figure 7.17 requires students to analyze
a quote from Buchanan’s artist statement and then compare its meaning to quotes from
other artists—both male and female. In this manner, Leslie contributes to lifting the
veil of contextual invisibility and acknowledging the works of women and African
American artists.
Review the NAEP history item in Figure 7.18. Which people are the focus of this
question? Asking fourth-grade students about the Sinaguan people reminds students
that American history is multicultural with many diverse people contributing to the
story of the United States of America.
Historical distortions occur when instructional units and assessment materials
present only one interpretation of an issue or avoid sensitive topics. For example, a social
studies standard might indicate that students should be able to do the following:
Summarize the Holocaust and its impact on European society and Jewish culture,
including Nazi policies to eliminate the Jews and other minorities, the “Final
Solution,” and the war crimes trials at Nuremberg. (SCDE, 2005, p. 58)
Although focus would naturally be on Nazi actions and atrocities, historical com-
pleteness would require acknowledging the refusal of some countries to help Jewish
refugees fleeing the Third Reich (United States Holocaust Memorial Museum, 2007c).
Similarly, when allied forces liberated prisoners in concentration camps, they did not
free all those who were persecuted by the Nazis. Rather, allied forces transferred some
of the gay men who were liberated from concentration camps to German prisons to serve
the remainder of their sentences (United States Holocaust Memorial Museum, 2007a).
1. The remains of this Sinaguan cliff house tell us something about the way ancient people lived in what is now the
southwestern part of the United States. Which of the activities below would be the best way to learn how the Sinaguan
people lived in ancient times?
A) Talk to people living near the cliff houses
B) Study letters and diaries left in the cliff houses
C) Camp out in the cliff houses for a couple of days
D) Study tools, bones, and pottery left in the cliff houses*
FIGURE 7.18 NAEP Item That Presents the Multicultural Aspect of United States History
Adapted from National Center for Educational Statistics. 2005. NAEP question. Reprinted with permission.
A. 6. Which country contributed most to liberating Jews and others from Nazi concentration camps?
a. Italy
b. Japan
c. United States of America*

B. 11. Which country did NOT provide help to Jewish refugees on the St. Louis?
a. Belgium
b. Denmark
c. United States of America*

C. 15. Allied forces transferred members from which group to German prisons after their liberation from the concentration camps?
a. Gay men*
b. Jews
c. Soviets
FIGURE 7.20 NAEP Item That Presents Balanced View of a Historical Event
SOURCE: National Center for Educational Statistics. 2005. NAEP question. Reprinted with permission.
the Supreme Court’s decision in the case. Thus, the item presents a balanced view of
a historical event.
Avoid Bias
In addition to representing diversity, you should avoid introducing bias in an assess-
ment. For example, the fourth-grade science item from NAEP in Figure 7.21 may
introduce bias because of the options offered. Students may not have had equal oppor-
tunity to be familiar with the instruments. The options use language likely to be known
by students in upper middle class homes or wealthy schools. Whether students from
low-income homes or underfunded schools have access to such equipment is question-
able because such science equipment is expensive and, in the case of the “periscope”
option, obscure.
Evidence of bias in this item, however, is equivocal. The NAEP website also
provides statistics on the percentage of students who correctly answered an item. Per-
centage correct is provided for various student groups. The science item in Figure 7.21
was correctly answered by 90 percent of students not eligible for free and reduced-fee
lunch versus 81 percent of students who received free or reduced-fee lunch (National
Center for Educational Statistics [NCES], 2005). Percentage correct was 90 percent for
1. If you wanted to be able to look at the stars, the planets, and the Moon more closely,
what should you use?
A) Magnifying glass
B) Microscope
C) Periscope
D) Telescope*
FIGURE 7.21 An Item That Might Exhibit Bias for Students from Low-Income Homes
and Schools
Adapted from National Center for Educational Statistics. 2005. NAEP question. Reprinted with permission.
white students, 79 percent for African American students, and 82 percent for Hispanic
students. Reviewing items to determine if the concepts are addressed in instruction
and the language is accessible to all students helps to avoid bias.
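The comparison reported for this NAEP item can be made informally with your own classroom data. In the Python sketch below the responses are entirely hypothetical: for a single item, it reports the percentage answering correctly within each group you choose to examine. A large gap is a prompt to reread the item for unfamiliar vocabulary or contexts, not proof of bias on its own.

from collections import defaultdict

# Hypothetical responses to one item: (group label, answered correctly?).
responses = [
    ("group 1", True), ("group 1", True), ("group 1", False), ("group 1", True),
    ("group 2", True), ("group 2", False), ("group 2", False), ("group 2", True),
]

totals = defaultdict(lambda: [0, 0])  # group -> [number correct, number attempted]
for group, correct in responses:
    totals[group][0] += int(correct)
    totals[group][1] += 1

for group, (correct, attempted) in sorted(totals.items()):
    print(f"{group}: {correct / attempted:.0%} correct ({attempted} students)")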
Write Clear Stems One key to developing effective multiple-choice items is to ensure
that the stem clearly conveys what the test taker should do with the item. The stem
TA B L E 7 . 2
Guidelines for Writing Multiple-Choice Items
Write a clear stem for the multiple-choice item.
FIGURE 7.22 A Completion Item in Which the Stem Structures the Problem to Be Addressed
Adapted from National Center for Educational Statistics. 2005. NAEP question. Reprinted with permission.
FIGURE 7.23 Example of an Interpretive Exercise with Distracters That Provide Information
about Student Misconceptions
must structure the problem to be addressed. This is often best accomplished by having
the stem ask a direct question, especially for beginning item writers. Stems should also
be fairly short and the vocabulary simple. In the fourth-grade NAEP civics item in
Figure 7.22, the problem to be addressed is who the signatory parties in a peace treaty
are, so it communicates clearly what students should do with the item. When the item
uses the completion format, the omitted information should occur at the end, or near
the end, of the stem as shown in Figure 7.22.
Develop Distracters That Provide Insight into Student Learning In designing options
for selected-response items, develop distracters that can help you understand students’
thinking. Student choice of a distracter can provide diagnostic information about why
a student missed an item if distracters include misconceptions, faulty algorithms, or
incomplete calculations (Clarke et al., 2006). If the distracters contain common mis-
conceptions or errors typically made by students with partial knowledge, a test with
multiple-choice items can serve as a formative assessment that informs later instruc-
tional decisions. For example, in Figure 7.23, option B is the correct answer because
the phrase “melts away” completes the vignette described in the haiku and provides
the appropriate number of syllables (i.e., 5 syllables on the first line, 7 syllables on the
second line, and 5 syllables on the third line). If a student chose option A, the teacher
knows that the student likely realizes the poem must have meaning (i.e., Winter is
done); however, the student may not understand the 5-7-5 syllable pattern required
for a haiku. Option C indicates that the student may not understand the syllable pat-
tern and may think that haiku must rhyme.
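Because each distracter in Figure 7.23 is tied to a likely misunderstanding, a quick tally of students' choices turns the item into formative information. The Python sketch below assumes the correct answer is option B, as described above; the student responses are hypothetical.

from collections import Counter

key = "B"
# Interpretations of the distracters, following the discussion of Figure 7.23.
distracter_notes = {
    "A": "grasps the poem's meaning but may not understand the 5-7-5 syllable pattern",
    "C": "may not understand the syllable pattern and may think a haiku must rhyme",
}
choices = ["B", "A", "A", "B", "C", "B", "B", "A"]  # hypothetical responses from one class

tally = Counter(choices)
print(f"Correct ({key}): {tally[key]} of {len(choices)} students")
for option, note in distracter_notes.items():
    if tally[option]:
        print(f"Option {option} ({tally[option]} students): {note}")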
In addition, the options provided should be conceptually similar. In the fourth-
grade NAEP item in Figure 7.24, the options are conceptually similar: in options A,
[Options A–D each show a figure with shaded parts; the correct option shows 3 of 4 equal parts shaded]
FIGURE 7.24 Example of an NAEP Multiple-Choice Item with Distracters That Provide
Information About Student Misconceptions
SOURCE: National Center for Educational Statistics. 2005. NAEP question. Reprinted with permission.
B, and D, three parts of the figure are shaded. In option A, however, the shaded parts
are 3 of 7, not 3 of 4. In option D, 3 of 4 parts are shaded; however, the shaded parts
are not of equal size. These options, then, provide information about the partial knowl-
edge that students have and implications for additional instruction on fractions.
Develop Reasonable Options In writing the options, develop as many plausible ones
as you can, but make sure none are preposterous. Sometimes, in an effort to generate
the typical four or five options, teachers come up with absurd options that no one
would ever choose. These provide no useful information about student thinking and
increase the reading load. Research indicates that three options (i.e., two plausible
distracters and one right answer) are adequate for multiple-choice items (Haladyna,
2004). Another important consideration is that the choices are independent; that is,
the choices should not overlap (Figure 7.25).
If you use None of the above and All of the above, do so carefully. A problem with
this format is that for a four-option item, if two options are correct, the test-wise student
knows to select All of the above. This is so even if the student does not know whether
the third option is correct. Another concern with None of the above and All of the above
is that teachers tend to include these options only when they are the correct answer. If
you use these options as the correct answers, you should also use them as distracters in
some items. We suggest that you use All of the above and None of the above sparingly.
[Interpretive material: a table listing properties of the element carbon (Formula: C) and the compound carbon dioxide (Formula: CO2)]
14. Based on the information in the table above, which is a reasonable hypothesis
regarding elements and their compounds?
A) An element retains its physical and chemical properties when it is combined into
a compound.
B) When an element reacts to form a compound, its chemical properties are
changed but its physical properties are not.
C) When an element reacts to form a compound, its physical properties are changed
but its chemical properties are not.
D) Both the chemical and physical properties of a compound are different from the
properties of the elements of which it is composed.*
FIGURE 7.26 Example of a Science Multiple-Choice Item with Options Arranged Shortest
to Longest
SOURCE: National Center for Educational Statistics. 2005. NAEP question. Reprinted with permission.
Options should be arranged in a logical order (e.g., alphabetically, numerically, or from shortest to longest). The options are ordered from shortest to longest in the interpretive exercise from twelfth-grade NAEP science shown in Figure 7.26.
Arranging items in numerical order is another helpful strategy. In the eighth-
grade mathematics item in Figure 7.27, the options for the interpretive exercise have
been arranged in numerical order. If students know an answer, we want them to be
able to select it and move on. Little is gained if students spend time searching through
a set of options for an answer. A student who knows that the answer is 1.4 will select
the response and quickly move to the next problem.
For ease of reading, options should be listed in one column or in a single row.
As you see in Figure 7.28, teachers at times will incorrectly format the answer options
into two columns. A student could easily understand that he lives in North America
and, if a teacher has students record their answers on a separate answer sheet, the
student might record a “b.” A teacher might inaccurately interpret the student’s response
to mean the student does not even know the continent on which he lives. In actuality,
the formatting of the options likely confused the student. Thus, options should be listed
in one column or arranged in a single row.
You might wonder if ordering options in a logical manner makes a difference.
A history teacher shared with us that his students noticed when he started organizing
options logically. At the beginning of a test, students asked the teacher whether there
was a pattern to the answers. When the teacher told his class he arranged the choices
in alphabetical order and by length of answer, the students looked at him a bit aston-
ished. They appeared a little surprised at the amount of thought and effort that had
gone into making the test. Perhaps the greatest advantage of arranging options in
logical order is that it conveys to students that you are not trying to trick them with
a confusing order for the answers.
[Graph of a curve plotted on x- and y-axes, with unit marks at 1 on each axis]

11. On the curve above, what is the best estimate of the value of x when y = 0?
A) 2.0
B) 1.1
C) 1.4*
D) 1.7
E) 1.9
Avoid Providing Clues to the Answer In writing multiple-choice items, avoid provid-
ing clues to the answer. For example, use of the article an can lead a student to ignore
any options that begin with consonants. As shown in Figure 7.29, use of “a(n)” can
eliminate this type of grammatical clue. Similarly, to rule out an option, teachers some-
times add the qualifier always or never. Students soon learn that such absolutes are
rarely true and quickly eliminate such choices.
Another habit in item writing is the tendency to make the correct response the
longest response, as was the case in Figure 7.26. Students who are test-wise will pick
up on this tendency and use it when they must guess an answer. To avoid this problem,
some authors recommend keeping options about the same length. Another possibility
8. A living thing that uses the energy from the sun to make its own food is a(n) .
A) bird
B) frog
C) insect
D) tree*
FIGURE 7.30 SCAAP Theater Item with Options Arranged from Short to Long
SOURCE: South Carolina Arts Assessment Program (SCAAP). 2002. Reprinted with permission.
12. A green tree frog lives in a forest. How does the frog’s green color help it to survive?
A) By keeping the frog cool
B) By helping the frog find other frogs
C) By allowing the frog to make its own food
D) By making the frog hard to see when sitting on leaves*
FIGURE 7.31 Example of an NAEP Science Item That Avoids Giving Clues to the Answer
SOURCE: National Center for Educational Statistics. 2005. NAEP question. Reprinted with permission.
is to follow our previous recommendation to arrange options that are phrases or com-
plete sentences from shortest to longest, as you see in Figure 7.30. If you do so, you will
quickly notice if the correct answer is typically last, and you can revise some options.
Another clue to an answer can occur when a teacher includes a distracter that
is silly or outrageous. Quick elimination of that choice enables students to zero in on
the plausible answers. Our previous discussion on the inappropriate use of humor
during an assessment also recommends against using ridiculous distracters.
Finally, teachers sometimes repeat a word from the stem in the correct option.
The fourth-grade science item from NAEP, shown in Figure 7.31, shows an instance
where the writers avoided this mistake. If the writers had written option D as “By
making the frog hard to see when sitting on green leaves,” the student would have been
directed to the answer because “green” would appear in both the stem and the answer.
To avoid giving clues to items, attend to issues of grammar, qualifiers (e.g., all, none),
lengthy choice options, implausible distracters, and repetition of words in the stem and
the correct response.
True-False Format
Guidelines specific to writing true-false items are shown in Table 7.3. In this section,
we discuss these guidelines and provide examples of their application.
TA B L E 7 . 3
Guidelines for Writing True-False Items
Use interpretive material to develop true-false items that span the cognitive levels.
In writing a true-false item, you should not mix partially true with partially false statements.
Balance the number of items that are true and the items that are false.
Develop True-False Items That Span the Range of Cognitive Levels Use of interpre-
tive material can raise the cognitive level of an item from remember to apply or ana-
lyze. Figure 7.32 shows an example. In item 8, for example, the student will need to
balance the equations before answering true or false. Such thinking requires students
to apply the procedures for balancing chemical equations.
Avoid True-False Items That Are Partially True and Partially False In writing a true-
false item, you should not mix partly true with partly false statements. In Figure 7.33,
the first part of the statement is true (Foxes are carnivores); however, the second part
of the statement (that serve as decomposers in a food chain) is false, because foxes are
consumers. Your students are left with a dilemma. If they mark the item as true, and
you count their response wrong, then the students’ grades might not reflect their knowl-
edge of carnivores. True-false statements should be all true or all false.
Balance the Number of True and False Items As you develop items, you should make
sure you balance the number of items that are true and the number of items that are
false. However, people who do not know an answer tend to use the true response, so
it can be beneficial to use slightly more false than true statements.
Use the following chemical equation to answer items 6–9. Circle T if the statement is true and F if the statement is false.

NaOH + H2SO4 → Na2SO4 + H2O

6. T* F Water is one product of this chemical reaction.
7. T F* Sodium sulfate is a reagent in this chemical equation.
8. T F* The balanced equation will have 8 oxygen atoms.
9. T* F The same elements will be present in the balanced equation.
FIGURE 7.32 Example of the Use of Interpretive Material to Assess Higher-Order Cognitive
Skills with True-False Items
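As a worked check (not part of the original figure), balancing the equation in Figure 7.32 shows why items 8 and 9 are keyed the way they are:

\[
2\,\mathrm{NaOH} + \mathrm{H_2SO_4} \longrightarrow \mathrm{Na_2SO_4} + 2\,\mathrm{H_2O}
\]

Counting atoms on each side of the balanced equation gives Na = 2, S = 1, H = 4, and O = 6. Each side has 6 oxygen atoms rather than 8, so item 8 is false, and the same elements appear on both sides, so item 9 is true.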
TA B L E 7 . 4
Guidelines for Writing Matching Items
The premises and responses should be arranged in a logical order (e.g., alphabetically,
numerically, chronologically).
Use no more matching items than 10 to 15 for older students and 5 to 6 items for younger
students.
Limit guessing by using answers more than once or having more answers than premises so
that students cannot get items correct through the process of elimination.
Avoid Providing Clues to the Answer As with multiple-choice items, the use of qual-
ifiers (e.g., none, all) cue students that the statement is likely false because few rules,
guidelines, or concepts are appropriately described in such absolute terms. Thus, the
statement All bodies that orbit the sun are called “planets.” gives a clue to students that
any exception to this general statement makes it false.
Matching Items
Guidelines for writing matching items are shown in Table 7.4. In this section, we dis-
cuss these guidelines and examine examples of their application.
Arrange Matching Options in a Logical Order The premises and responses should
be arranged in a logical order (e.g., alphabetically, numerically, chronologically). Con-
sider a matching exercise in which a student is to match significant historical events
with the major figures involved with each. Ordering the list of historical figures alpha-
betically by name will assist the student in locating the appropriate person associated
with an event and then moving to the next event. You may be inclined to say, “I want
my students to search out the material; so it makes sense to jumble the names instead
of alphabetizing them.” But recall our introductory guiding thought suggesting that
students should miss an item only if they do not know the answer. If students miss an
item because they had difficulty finding the answer in a long, jumbled list of names,
we do not know what they actually understand.
Also, matching items should be arranged on a single page so students do not have
to flip back and forth between pages when answering the items. When considering the
number of matching items, you should limit yourself to no more than 10 to 15 per
exercise if you teach older students. For elementary students, you should use a maxi-
mum of 5 or 6 items. Also, in the directions prior to the matching items, state whether
each answer option is to be used once or if options are used more than once.
Avoid Providing Clues to the Answer To avoid providing clues for matching items,
you should make the responses homogeneous. In Figure 7.6 the answers are all general
categories of organisms associated with a food chain. The premises are all specific
instances of organisms in a food chain. Thus, the response options of consumers,
decomposers, and producers are all plausible answers for the premises.
Another method to avoid clues is to use the answers more than once or have
more answers than premises. Having a limited number of answers means each will be
used several times. Thus, students cannot use the process of elimination and save an
item they are unsure of to see which response is left and must be the correct answer.
For example, in Figure 7.5, the answer “meter” might be used for both “length of the
playground” and “the height of a house.” Having more answers than premises will also
prevent the process of elimination from providing the right answer. Getting matching
items right by the process of elimination violates the guiding thought at the beginning
of the chapter that students should get items correct only if they know the material,
not because they find clues to guide them to the right answer.
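To see why these tactics limit guessing, consider a hypothetical exercise (the numbers here are illustrative, not taken from the text) with five premises. If there are exactly five single-use responses, a student who knows four answers gets the fifth by elimination with certainty. With two extra plausible responses added, three options remain unused for that final premise, so a blind guess succeeds only with
\[
P(\text{correct guess}) = \tfrac{1}{3},
\]
and allowing responses to be used more than once removes the certainty of elimination altogether.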
As was true for multiple-choice and true-false items, the use of interpretive mate-
rial can raise the cognitive level of an item from lower cognitive levels (e.g., knowledge)
to higher ones. As Figure 7.34 shows, a student must remember the value of each coin
and then combine the values of the coins to come to the total.
Put the letter of the correct numerical amount in the blank to the left of each row of coins.
[The figure shows five numbered rows of coin pictures.]
a. $0.50
b. $0.75
c. $1.00
1. ______   2. ______   3. ______   4. ______   5. ______
FIGURE 7.34 Example of a Matching Item That Engages Students in Cognitive Skills at the Understanding
Level or Higher
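The arithmetic the item demands can be illustrated with a hypothetical row of coins (the actual coin rows are pictured only in the original figure): two quarters, two dimes, and one nickel give
\[
25 + 25 + 10 + 10 + 5 = 75 \ \text{cents} = \$0.75,
\]
which would be matched to option b.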
Focusing on trivia:
5. In A Wrinkle in Time, what was the color of Calvin's hair?
a. Black
b. Blond
c. Brown
d. Red*

Address important content:
5. Which theme is developed in A Wrinkle in Time when the children visit Camazotz?
a. Ambition
b. Conformity*
c. Truth

Requiring students only to remember facts:
8. A limerick is a
a. 3-line poem about nature.
b. 5-line poem about something funny.*
c. 9-line poem that describes a person.

Incorporate higher levels of cognitive skills:
Use the following poem to answer items 8–11.
There once were some very naughty cats
Who gave chase to some terrified rats.
The rats grew tired,
And a bull dog they hired.
Now the chase is on for the cats!
8. The above poem is a
a. cinquain.
b. haiku.
c. limerick.*

Using implausible or confusing distracters:
What is the national anthem of the United States?
a. No Air
b. Dream Big
c. Stop and Stare
d. The Star-Spangled Banner*

Use plausible distracters:
What is the national anthem of the United States?
a. America, The Beautiful
b. God Bless America
c. The Star-Spangled Banner*
d. This Land Is Your Land

Ignoring the conventions of written language in formatting items:
4. Annette completes an experiment in which she applies different amounts of fertilizer to her tomato plants. She applies one scoop of fertilizer to one plant, two scoops to the second plant, and three scoops to the third. During the growing season she records the weight of tomatoes she harvests from each plant. Which of the following is true?
A. the fertilizer is a control variable
B. the experiment lacks an independent variable
C. the weight of the tomatoes is the dependent variable*

Apply the conventions of written language:
4. Annette completes an experiment in which she applies different amounts of fertilizer to her tomato plants. She applies one scoop of fertilizer to one plant, two scoops to the second plant, and three scoops to the third. During the growing season she records the weight of tomatoes she harvests from each plant. Which of the following is true?
A. The fertilizer is a control variable.
B. The experiment lacks an independent variable.
C. The weight of the tomatoes is the dependent variable.*
TABLE 7.5
Guidelines for Constructing the Assessment
Prepare test directions that tell students how to answer the items and the number of points
each item is worth.
Provide new directions for each type of item format (e.g., multiple-choice, matching, essay).
Load the beginning of each section of the test with easier items to lower the anxiety level
of students.
Place items in the test so that you do not have patterns in correct answers (AABBCCAA).
Vary the location of the right answer so that the correct answer is fairly evenly distributed
across the options.
In other words, roughly the same number of correct answers should be associated with each of the options A, B, C, and D. Test-wise students quickly pick up on patterns and recognize, for example,
when a teacher rarely uses option A. Or, test-wise students quickly grasp when a
teacher uses a repeating answer pattern to make grading easier. When a student gets
an item correct, it should be because the student knows the answer, not because we
gave clues.
THE VALUE OF STUDENT-GENERATED ITEMS AND CRITICAL THINKING
If you teach students in upper-elementary grades or higher, consider involving students
in the development of some items. This idea is not as farfetched as it may seem initially.
Such a practice is consistent with math exercises in which students are given a number
sentence and then asked to give an example of a story that fits the problem. Such tasks
allow students to practice the skill of establishing a problem consistent with a math-
ematics operation.
In one of our classes, a kindergarten teacher shared her experience in writing
multiple-choice items with her students. Students helped in formulating the wording
of some of the questions for a life science unit on the characteristics of organisms. In
one instance, the teacher could not decide if she should use the word “regurgitate” or
the words “cough up” to tell how the mother penguin feeds the baby. As we discussed
earlier, difficult-to-read test items should not be used because they can interfere with
students’ expression of their knowledge in science, social studies, or mathematics. The
teacher addressed the issue of readability by involving her kindergarten students in
the selection of key terms for inclusion in the multiple-choice items. In the case of the
penguin example, the children voted, and the word “regurgitate” was the one they
wanted included. The teacher indicated that the involvement of her kindergarteners in
developing the questions gave them ownership. In addition, to reduce reading demands
for her kindergarteners, the teacher administered the assessment orally.
Involving students in designing items offers the potential to deepen their under-
standing of the learning goals that are the focus of the assessment. For example, as
students in social studies learn to write interpretive exercises that use graphs to show
trends, the students also begin to understand the manner in which they should approach
a line graph in an interpretive exercise on a test they are taking. The possibility also
exists that students will begin to look at items in published tests in a more analytic
manner. Thus, a skill you have taught and assessed in your classroom may generalize
to other testing situations. Engaging students analytically also assists them in acquiring
the ability to think critically—an important skill for citizens in a democracy.
Student development of items will also help you understand their depth of
knowledge and their misconceptions. For example, we have worked with our teaching
assistants to develop tests for our undergraduate courses in assessment. Our teaching
assistants, who are advanced doctoral students, follow the item-writing guidelines in
preparing items for a test; however, their initial items focus on less-important concepts
and skills than those we consider to be key ideas. This item-writing experience gives
us insights into our teaching assistants’ grasp of the concepts. Also, if a student has
marked the wrong option as the answer, or cannot develop an item related to the
learning goals, you have information about concepts the student has not mastered.
Writing selected-response items is a complex task; so if you plan to involve students
in writing items, you should model the skill of item writing. At first your students’
items are likely to require only remembering of factual knowledge. However, if you
model for students the construction of an interpretive exercise, they can begin to
develop items that require analysis or application. A group activity might involve stu-
dents analyzing some well-written examples and differentiating them from problematic
items. Then students should practice writing items in small groups and, ultimately,
begin writing items on their own.
The items students develop can be used for formative purposes and as practice for a summative assessment. However, you should not use student-written items on a summative test, because at least some students will already know the correct answers and therefore would not have to use higher-order thinking skills to answer. Also, errors students make in writing items may make an item unclear to their peers. Another reason for not using student-developed items on a summative test is that students will have unequal access to the answers. As they write their items, students will likely share them with friends, so some students will have seen several of the items while others will have seen few of them.
ACCOMMODATIONS FOR DIVERSE LEARNERS: SELECTED-RESPONSE ITEMS
Individual teachers, as well as state testing programs, have developed a multitude of methods by which diverse student needs can be accommodated to help ensure that the
performance of students on an examination depends only on their understanding of
the content the examination measures. In this section we consider methods for accom-
modating (i.e., differentiating) assessments that use selected-response items to gauge
student attainment of learning goals. We examine each of the student considerations
first described in Chapter 3 and then discuss appropriate accommodations for them
related to the selected-response formats presented in this chapter.
TABLE 7.6
Accommodation Considerations for Selected-Response Items
Issue: Sensory challenges (e.g., visual, auditory)
Type of Accommodation:
• Provide signing of oral directions for students with auditory impairments.
• Enhance legibility.
• Limit use of graphics to only those that contain information being assessed.
• Use large print or Braille for visual impairments.
Issue: Learning goal already mastered
Type of Accommodation:
• Use interpretive exercises to engage students at the analysis and application levels of cognition.
Visual cues in the form of photographs and graphics might provide information
that clarifies the meaning of the text for students who are learning English. These
students might also benefit from the use of bilingual dictionaries if the assessment is
focused on areas other than reading. Several teachers we know with experience with
English language learners also suggest that students learning English sometimes
respond more positively to written language when it is presented one sentence at a
time rather than in a continuous paragraph. The teacher should also consider use of
alternate-choice items to reduce the reading load.
Such students' understanding will also benefit from visual cues that clarify the meaning of the text or the intent of the test item.
Read the following statements made by imaginary people. Based on what we have
learned about the characteristics of utopian and dystopian societies and literature,
decide if the imaginary person lives in a utopia or dystopia.
If the imaginary person lives in a utopia, circle U.
If the imaginary person lives in a dystopia, circle D.
U* D Hiking in the great outdoors is what I like to do with my free time. Not only do I
enjoy its beauty, but I also feel like I can really learn a lot from nature.
U D* As the mayor of our town, I’m pleased to report that crime has dropped
considerably, largely due to the fact that our police can now monitor every
portion of our city with real-time video surveillance.
HELPFUL WEBSITES
http://nces.ed.gov/nationsreportcard/
A website with items from the National Assessment of Educational Progress (NAEP). Includes items
in the subject areas of civics, economics, geography, mathematics, reading, science, United
States history, and writing. All NAEP questions in this chapter come from this website.
http://www.scaap.ed.sc.edu/
A website with sample selected-response items for the South Carolina Arts Assessment
Programs. Arts areas assessed include dance, music, theater, and the visual arts. SCAAP
art questions in this chapter come from this website.
REFERENCES
Buchanan, B. 2007. Artist’s statement. Retrieved October 26, 2007 from http://www.
beverlybuchanan.com/statement.html.
Clarke, N., S. Stow, C. Ruebling, and F. Kaynoa. 2006. Developing standards-based curricula and
assessments: Lessons for the field. The Clearing House 79(6): 258–261.
Downing, S. M., R. A. Baranowski, L. J. Grosso, and J. R. Norcini. 1995. Item type and cognitive
ability measured: The validity evidence for multiple true-false items in medical specialty
certification. Applied Measurement in Education 8: 189–199.
Greenwald, E., H. Persky, J. Campbell, and J. Mazzeo. 1999. The NAEP 1998 writing report card for the
nation and the states (NCES 1999–462). Washington, DC: U.S. Government Printing Office.
Haladyna, T. 2004. Developing and validating multiple-choice test items. 3rd ed. Hillsdale, NJ:
Lawrence Erlbaum Associates.
Haladyna, T. M., and S. M. Downing. 1993. How many options is enough for a multiple-choice
item? Educational and Psychological Measurement 53: 999–1010.
Haladyna, T., S. Downing, and M. Rodriguez. 2002. A review of multiple-choice item-writing
guidelines for classroom assessment. Applied Measurement in Education 15(3): 309–334.
L’Engle, M. (1962). A Wrinkle in Time. New York, NY: Ferrar, Strauss, and Giroux.
Microsoft Office Word 2007. 2007. Test your document’s readability. Redmond, WA: Author.
Myers, M., and P. D. Pearson. 1996. Performance assessment and the literacy unit of the New
Standards Project. Assessing Writing 3(1): 5–29.
National Center for Educational Statistics. 2005. NAEP questions. Retrieved June 19, 2008 from
http://nces.ed.gov/nationsreportcard/itmrls/.
Sanders, W., and P. Horn. 1995. Educational assessment reassessed. Education Policy Analysis
Archives 3(6). Retrieved October 11, 2007 from http://epaa.asu.edu/epaa/v3n6.html.
South Carolina Arts Assessment Program (SCAAP). 2002. Sample test. Retrieved October 26,
2007 from http://www.scaap.ed.sc.edu/sampletest/.
South Carolina Department of Education. 2005. South Carolina social studies academic standards.
Retrieved June 2, 2008 from http://ed.sc.gov/agency/offices/cso/standards/ss/.
Thompson, S., and M. Thurlow. 2002. Universally designed assessments: Better tests for everyone!
NCEO Policy Directions No. 14. Minneapolis, MN: National Center on Educational
Outcomes.
United States Holocaust Memorial Museum. 2007a. Press Kit: Nazi persecution of homosexuals,
1933–1945. Retrieved October 26, 2007 from http://www.ushmm.org/museum/press/kits/
details.php?content=nazi_persecution_of_homosexuals&page=02-background.
United States Holocaust Memorial Museum. 2007b. Voyage of the Saint Louis. Holocaust
Encyclopedia. Retrieved September 9, 2007 from http://www.ushmm.org/wlc/article.php?
lang=en&ModuleId=10005267.
United States Holocaust Memorial Museum. 2007c. Wartime fate of the passengers of the St. Louis.
Holocaust Encyclopedia. Retrieved September 9, 2007 from http://www.ushmm.org/
wlc/article.php?lang=en&ModuleId=10005431.
Worthen, B., and V. Spandel. 1991. Putting the standardized test debate in perspective.
Educational Leadership 48(5): 65–69.
Yap, C. 2005. Technical documentation for the South Carolina Arts Assessment Project (SCAAP):
Entry-level dance & theatre assessment field test 2005. Columbia: University of South
Carolina, Office of Program Evaluation.
Yap, C., M. Moore, and P. Peng. 2005. Technical documentation for the South Carolina Arts
Assessment Project (SCAAP) Year 3: 4th-grade music and visual arts assessments.
Columbia: University of South Carolina, Office of Program Evaluation.
ﱟﱟﱟﱟﱠﱟﱟﱟﱟ
CHAPTER 8
TEACHER-MADE ASSESSMENTS:
SHORT ANSWER AND ESSAY
Many issues that confront individuals and society have no generally
agreed-on answers. Yet understanding issues like these constitutes much
of what is genuinely vital in education.
–Robert M. Thorndike
INTRODUCTION
The quote from Thorndike reminds us that many questions can be answered from
more than one point of view. For many, a hallmark of a democratic society is the
freedom to express diverse opinions. The ability to present an argument and to
justify decisions are key skills. Thus, in assessing students’ understanding of com-
plex topics, at times you will want to use an assessment format that requires this
range of thinking skills and allows students to express their understanding in their
own words. Constructed-response items, such as short-answer items and essays,
offer such a framework. This approach can be contrasted with selected-response
items (see Chapter 7), which provide possible answers from which a student
chooses.
We have observed firsthand the power of essays to engage students in critical
thinking. In a unit on historical fiction, students in one of our sixth-grade classes
read Friedrich (Richter, 1987), a novel about two friends, one a Jewish boy and the
other a Christian boy, growing up during the rise of Nazism. The book portrays the
different fates of the boys as they grew into their teens as the Nazi movement grew
in power. Students also read Sadako and the Thousand Paper Cranes (Coerr, 1977),
which related the story of a Japanese girl who developed leukemia from the fallout
from the nuclear bomb dropped on Hiroshima. Students then wrote an essay in which
they argued whether contemporary society visits such injustices on groups of people.
Students’ essays were graded not on the position they took but on their use of evidence
to develop their argument. In grappling with an issue with no simple, generally agreed
upon answer, the students worked on the critical skills necessary for participation in
a democratic society.
ALIGNING ITEMS WITH LEARNING GOALS AND THINKING SKILLS
In considering the use of constructed-response items for formative assessment or in
a test, you first should review the learning goals and cognitive strategies that are the
focus of your instruction. Constructed-response items will be useful in assessing stu-
dent achievement of the learning goals associated with application, analysis, evalua-
tion, and creation. For example, if one learning goal for students is to “Establish and
use criteria for making judgments about works of art” (Ohio State Department of
Education, 2003), then the assessment will require students to develop criteria and
apply them in critiquing paintings or sculptures. The constructed-response format
lends itself well to assessment of students’ ability to engage in the cognitive process
of evaluation.
CONSTRUCTED-RESPONSE FORMATS
Three forms of constructed-response assessments are the short answer, the essay, and
the performance task. In this chapter we focus on short-answer and essay forms. We
discuss performance tasks in Chapter 9.
Short-Answer Formats
Short-Answer Items
Items that require students to supply short responses such as a single word, a phrase, or a few sentences.

Short-answer items take the form of a question or an incomplete statement. Student responses may be a single word, a phrase, or a few sentences. In Figure 8.1, the first item is written as a question. The second item is written as an incomplete statement and requires only a single-word response. The science item from NAEP shown in Figure 8.2 is an example of a short-answer item that requires students to write a few sentences.
Question Format:
[Bar graph: Ice Cream Flavors (vanilla, chocolate, strawberry) on the x-axis and Number of Students (0–12) on the y-axis, accompanied by a question-format item about the graph.]
Incomplete Statement Format:
The flavor of ice cream that the most students ordered was ___________________.
FIGURE 8.1 Question and Incomplete Statement Formats for Short-Answer Items
5. You are going to the park on a hot day and need to take some water with you. You have
three different bottles, as shown in the picture below. You want to choose the bottle that
will hold the most water. Explain how you can find out which bottle holds the most water.
Complete
Student demonstrates an understanding of how to measure and compare the volumes of
three different bottles by outlining a method for finding which bottle holds the most water.
a. The bottles are filled with water and the water is measured in a graduated cylinder,
measuring cup, or other measuring device to see which bottle holds the most water.
b. Using a displacement method, each bottle is filled with water. Then placing each
bottle into a measured volume of water, the amount of displaced water is measured.
The bottle displacing the most water has the greatest volume.
c. Filling one bottle with water and pouring it into the other bottles, the one that holds
the most water can be determined.
d. Weighing the bottles with and without water, the bottle that holds the greatest weight
in water can be determined.
e. Student fills each bottle at the same constant rate to determine which takes longest to fill.
FIGURE 8.3 Possible Complete Answers for the Short-Answer Item from the Fourth-Grade
NAEP Science Assessment (Figure 8.2)
SOURCE: National Center for Educational Statistics. 2005. NAEP question. Reprinted with permission.
FIGURE 8.4 An Example of an NAEP Essay Item That Limits the Content of the Answer
SOURCE: National Center for Educational Statistics. 2006. NAEP question. Reprinted with permission.
of the response. In the eighth-grade NAEP civics item in Figure 8.4, the question is
structured to limit the students’ response to two types of information needed and a
justification of the importance of that information.
In other instances, essays allow students to organize information to express their own
ideas. For example, the essay prompt in Figure 8.5 establishes a purpose for the writer—that
is, to persuade a friend that the writer’s position on registering to vote is the right one.
However, the intent of this task is to assess students’ ability to build a persuasive argument,
not to see if they can provide correct factual information. This essay allows flexibility for
students in the choice of arguments and type of support provided for them.
1. Your school is sponsoring a voter registration drive for 18-year-old high school
students. You and three of your friends are talking about the project. Your friends say
the following,
Friend 1: “I’m working on the young voters’ registration drive. Are you going to come
to it and register? You’re all 18, so you can do it. We’re trying to help increase
the number of young people who vote and it shouldn’t be too hard—I read
that the percentage of 18- to 20-year-olds who vote increased in recent
years. We want that percentage to keep going up.”
Friend 2: “I’ll be there. People should vote as soon as they turn 18. It’s one of the re-
sponsibilities of living in a democracy.”
Friend 3: “I don’t know if people should even bother to register. One vote in an elec-
tion isn’t going to change anything.”
Do you agree with friend 2 or 3? Write a response to your friends in which you explain
whether you will or will not register to vote. Be sure to explain why and support your posi-
tion with examples from your reading or experience. Try to convince the friend with whom
you disagree that your position is the right one.
FIGURE 8.5 An Example of a Persuasive Essay Item from the Twelfth-Grade NAEP
Writing Assessment
SOURCE: National Center for Educational Statistics. 2005. NAEP question. Reprinted with permission.
Consider again the NAEP civics item in Figure 8.4 concerning the conflict between Teresia and Corollia. In responding to the item in an essay format,
the student must
1. Provide the additional information required for making decisions about
involvement in conflicts.
2. Justify the importance of the required information.
3. Compose a brief answer.
A disadvantage of essays is that they assess less broadly than other formats, even
though they probe in more depth. In a multiple-choice format, the student reviews the
options with plausible types of information and justifications and selects an acceptable
response. Most students can answer a multiple-choice item in about one minute. The
essay in Figure 8.5, in contrast, will require between 15 and 30 minutes to compose. To
balance the need for in-depth coverage of concepts with the need for broad sampling
of ideas that are the focus of the learning goals, one option we recommend is to use
both multiple-choice and essay items in a test.
TABLE 8.1
General Guidelines for Writing Constructed-Response Items
Develop items requiring the same knowledge, concepts, and cognitive skills as practiced
during instruction.
Incorporate novelty so that the task requires students to generalize the skills and concepts
presented during instruction.
Consider the developmental level of students in terms of the complexity of the task and
the length of the response.
Incorporate Novelty
If you use the same constructed-response items in your assessment that you used in class
activities, the cognitive strategy you tap is remembering. By including novel material as
an integral part of a constructed-response task, you will be able to write short-answer
items and essay prompts that require students to analyze, evaluate, and create. By novel
material, we are not suggesting you should assess students on new concepts and strategies
that were not part of a learning unit. Instead, we mean new examples related to the
concepts being learned. For example, during instruction you might engage students in
the interpretation of political cartoons. An essay item, then, would have students inter-
pret a political cartoon from the same historical period, but a cartoon the students had
not discussed in class. Students would analyze the new cartoon and link it to events
occurring in the historical period that is the focus of the instructional unit.
[Figure: a pair of scissors pictured beside a centimeter ruler. 23. What is the approximate length in centimeters of the scissors in this picture?]
Consider the Developmental Level of Students The level of student development should be reflected in the complexity of the item and the length of the expected response. For example, topics for younger students might refer to specific situations (e.g., school rules), whereas older students can write about more abstract concepts (e.g., civil liberties).
Determine Whether Short-Answer Format Is Best First, you must decide whether
an item functions better as a short-answer item or a multiple-choice item. Look at item 3
in Figure 8.7. When the item is offered as a multiple-choice item, students have various
equations to analyze to determine which equation models the rate plan. By contrast,
TABLE 8.2
Guidelines for Writing Short-Answer Items
Determine whether an item should be short answer or multiple choice.
Use interpretive material to develop items that engage students at the higher cognitive levels.
Omit only key words and phrases when using the completion form of short answer.
Multiple-Choice
Use the following information to answer item 3.
A cellular phone company charges monthly rates according to the following plan:
• Monthly fee of $23.95
• The first 100 minutes of calling time are free
• $0.08 charge per minute of calling time over 100 minutes
3. If c is the total monthly cost, and m is the number of minutes of calling time, which equation models this rate plan when m is greater than 100 minutes?
A. c = 0.08m − 76.05
B. c = 0.08(m − 100) − 23.95
C. c = 23.95 + 0.08(m − 100)*
D. c = 23.95 + 0.08m

Short-Answer
3. A cellular phone company charges monthly rates according to the following plan:
• Monthly fee of $23.95
• The first 100 minutes of calling time are free
• $0.08 charge per minute of calling time over 100 minutes
If c is the total monthly cost, and m is the number of minutes of calling time, write the equation that models this rate plan when m is greater than 100 minutes.
the short-answer version of the item requires students to create an equation to model
the rate plan. Both versions of the item are challenging; however, the short-answer item
requires more of the students. If your learning goal is for students to be able to analyze
information to determine the appropriate model, and your instruction involves students
analyzing verbal information and determining the correct equation, a multiple-choice
item might be appropriate. However, if your learning goals and instruction focus on
students writing (creating) a linear model, then the short-answer version is more
appropriate for assessing student knowledge.
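Substituting an illustrative calling time of 150 minutes (a value chosen for this example, not given in the item) shows the computation either version requires students to model:
\[
c = 23.95 + 0.08(150 - 100) = 23.95 + 4.00 = \$27.95.
\]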
Engage Students at Higher Cognitive Levels Let’s dispel the myth that short-answer
items necessarily focus on the simple recall of facts. The modeling of an equation in
Figure 8.7 and the calculating of density in Figure 8.8 both require students to use
cognitive levels higher than remembering. If combined with interpretive materials,
such as charts, quotes, or short musical pieces, an item may require the student to
analyze the material prior to answering the item.
Conversely, we do not want you to think that writing an item in short-answer form
will always increase the cognitive level of the item. Look at the science item in Figure 8.8.
Cover the answer options and read the item in a short-answer form. Notice that your
answer to the question is going to be sample 1, 2, 3, or 4. The multiple-choice form simply
gives you a method for recording your response. So, sometimes the cognitive level will
not be influenced by the use of short-answer items versus multiple-choice items.
Use Brief Stems In writing short-answer items, keep the stem brief. Notice the format
of our examples that use interpretive material. A set of directions introduces the interpre-
tive material, indicating that the student should use the following material to answer the
Physical Properties
Sample Number | Mass | Volume
1 | 89 g | 10 mL
2 | 26 g | 10 mL
3 | 24 g | 100 mL
4 | 160 g | 100 mL
13. Given that the density of water is 1 g/mL, which of the samples is most likely cork?
A. 1
B. 2
C. 3*
D. 4
(Cover the answer options to see that the item requires the same cognitive skills whether written as a short-answer item or a multiple-choice item.)
FIGURE 8.8 Example of an Item with Similar Cognitive Demands for a Short-Answer or
Multiple-Choice Item
SOURCE: California Department of Education. 2008. California standards test released test questions: Grade 8 science.
From http://www.cde.ca.gov/ta/tg/sr/css05rtq.asp. Reprinted with permission.
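Working out the densities from the table (a check added here for illustration, not part of the released item) shows why option C is keyed correct:
\[
\rho_1 = \tfrac{89}{10} = 8.9, \quad \rho_2 = \tfrac{26}{10} = 2.6, \quad \rho_3 = \tfrac{24}{100} = 0.24, \quad \rho_4 = \tfrac{160}{100} = 1.6 \ \mathrm{g/mL}.
\]
Only sample 3 is less dense than water (1 g/mL), so it is the sample most likely to be cork.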
6 | 0 3
5 | 1 2 5 7
4 | 1 3 3 6
3 | 1 3 4 5 5 6 7 7 7 8 9
2 | 0 5 6 9
Key: 3 | 9 means 39 blinks
16. Write the number that is the mode for blinking. (Edited version)
item. Next is the interpretive material, which includes scenarios, diagrams, and tables,
followed by the item number and a stem that presents the problem. In Figure 8.9 in which
students are to interpret a stem-and-leaf plot, the stem is “Write the number that is the
mode for blinking.” Setting the stem apart from and after the interpretive material serves
to highlight the actual problem; thus, a student does not have to search for the question.
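For reference, reading the plot as reconstructed above, the leaves under stem 3 include three 7s, so
\[
\text{mode} = 37 \ \text{blinks},
\]
which occurs three times, more often than any other value.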
Use the underlined words in the following sentence to indicate the parts of speech in
items 11 and 12.
Casting a silver light, the moon rose over the valley.
11. The word "silver" is used as a(n) ______________.
12. The word "rose" is used as a(n) ______________.
(Annotations: use of "a(n)" to avoid a grammatical clue; use of equal-size blanks to avoid clues to the length of an answer.)
Ensure Only One Answer Is Correct In some ways, the greatest challenge is to write
short-answer items that avoid ambiguous answers. The item should be worded so that
only one answer is correct. To help you meet this challenge, we suggest looking at the
short-answer as if it were a selected-response item. Then ask yourself, “What is the
right answer?” Next ask yourself, “What are the possible wrong answers a student
might use?” Is your item written so that only the right answer can complete the
response? For example, for Figure 8.9, we first wrote, “What is the mode for blinking?”
However, with a little thought, we realized some students would write, “The most
frequently occurring number.” Well, those students would be correct. But we want to
know whether the student can read the stem-and-leaf plot to determine the most
frequently occurring response for number of blinks. We then edit the item to read,
“Write the number that is the mode for blinking.”
Omit Only Key Words and Phrases For the completion form, the word or phrase
that has been omitted should consist of key concepts rather than trivial details. In
addition, the omitted material should be near the end of the incomplete statement. A
quick look back at the bar graph item in Figure 8.1 shows the blank at the end of the
incomplete statement.
Avoid Clues to the Answer Grammatical clues can inappropriately point students to
a correct answer. In Figure 8.10, the issue of appropriate use of “a” or “an” before a word
with an initial consonant or vowel is handled by using the “a(n)” designation. In addi-
tion, avoid letting blanks in a completion item offer hints as to the length of the response
or the number of words that are needed to complete the statement. Instead, place only
one blank of a standard size at the end of the completion-form item to avoid clues.
TABLE 8.3
Guidelines for Developing Content-Based Essay Items
Write directions that clearly define the task.
Write the prompt to focus students on the key ideas they should address in their response.
Use formatting features, such as bullets and ALL CAPS, to clarify the task.
Write Directions That Define the Task Clearly As was true for short-answer items, your
first challenge is to write the essay item so the task is clearly defined for the student. One
way to do so is to identify the cognitive process you want to assess by beginning the essay
question with a verb that cues that cognitive process for answering. Using Table 2.9 from
Chapter 2, choose verbs at the comprehension levels and higher. Examples include sum-
marize, compare, provide original examples of, predict what would happen if.
In the NAEP mathematics item in Figure 8.11, students analyze a graph to indi-
cate the activity that would be occurring during each time interval. Notice the use of
[Graph: Marisa's riding speed in miles per hour (0 to 8) plotted against time in minutes (0 to 80).]
19. The graph above represents Marisa’s riding speed throughout her 80-minute bicycle
trip. Use the information in the graph to describe what could have happened on the
trip, including her speed throughout the trip.
During the first 20 minutes, Marisa
the term “describe” in the directions for the mathematics item. If you relied only on
the verb “describe” to determine the cognitive processes a student is using to solve the
problem, you would miss the need for the student to analyze the graph and make the
biking activities consistent with the graph. Sometimes the meaning of the verb depends
in part on the context of the task, and the task demands can make the actual cognitive
process higher. “Describe” means analyze in this case.
Using terms such as when, where, or list will result in essays based on students
remembering factual information, which can be more effectively done with selected-
response items. Essays, however, can be developed that require students to both recall
factual information and apply the information to a novel context. For example, to
respond to the essay item in Figure 8.12, students must recall the meaning of the terms
associated with the elements of art. They then apply the terms in an analysis of artist
Laura McFadden’s pen and ink drawing.
Focus Students on the Key Ideas to Address in Their Response In writing the essay prompt, you should specify the points students are to address. Notice
how the item writer focuses students in Figure 8.13 by specifying the elements of the
answer. These will then provide the basis for creating a scoring guide. When you
specify the elements of a good answer, you ensure that the requirements of the task
are clear to you and to your students.
Use Formatting Features to Clarify the Task Attention to the formatting of essay
items can also clarify the task. For example, clarity can be enhanced through the use
of bullets to draw students’ attention to the aspects of the essay that they must address.
The prompt in Figure 8.13 uses bullets to highlight the countries about which the
student might write. In addition, the prompt uses the letters “a,” “b,” and “c” to list
(and highlight) the three aspects of the task the student must address.
Using ALL CAPS or italics can bring emphasis when exceptions are being made,
as in Figure 8.4. The use of ALL CAPS in the direction to “Identify two pieces of
information NOT given above that you would need before you could decide whether
or not the United States military should help Teresia” cues students to take careful note
of the exception.
Structuring responses also may contribute to item clarity. The NAEP mathemat-
ics bicycling item (Figure 8.11) provides an illustration. Notice the problem is struc-
tured to draw students’ attention to three time periods in the bicycle trip. The
developers of the item could have stopped after writing, “The graph above represents
Marisa’s riding speed throughout her 80-minute bicycle trip. Use the information in
the graph to describe what could have happened on the trip, including her speed
throughout the trip.” Such a response format requires the students to create their own
framework for their response. The item developer, however, then used paragraph start-
ers, such as “During the first 20 minutes, Marisa . . .” to structure the task and direct
the students to the relevant time spans. This presents a slightly different task in that
students have the structure of the response provided.
Providing such paragraph starters may make some teachers wince because they
prefer that students write their own short paragraph to convey the information. How-
ever, the bicycling item is from a mathematics assessment, so the developer of the
item did not want students’ writing abilities to influence their mathematics scores.
A CONVERSATION REMEMBERED
The pen and ink drawing below, A Conversation Remembered, is by artist Laura McFadden. In an essay, describe Laura's work using the elements and principles of design. Also, discuss your response to the painting and how the artist uses the elements and principles of design to create the response.
FIGURE 8.12 Essay Task That Requires Students to Remember Art Terms in Analyzing Laura McFadden’s
A Conversation Remembered
Reprinted with permission.
34. Each of the following world regions was a “hot spot” in the 1900s.
• Israel
• The Korean Peninsula
• Vietnam
• The Persian Gulf
Select one world region from the list and answer the following three questions about
conflict in that area.
a. Identify the area you selected and summarize the conflict that occurred.
In your summary, identify the major participants in the conflict.
b. Explain the historical reasons for the conflict.
c. Explain why the conflict has had a major effect on the rest of the world.
FIGURE 8.13 A Released History Item from the 2004 PACT Social Studies Field Test
SOURCE: South Carolina Department of Education. 2004b. Reprinted with permission.
Writing prompts generally assess three forms of writing: narrative, informative, and persuasive (NCES, 2008). Narrative writing includes stories or personal essays. Students use informative writing to describe ideas, convey messages, and provide instructions. Informative writing includes reports, reviews, and letters. Students use persuasive writing to influence the reader to take action or create change. Examples of each type of prompt appear in Table 8.4, and Table 8.5 lists guidelines for constructing them.

Narrative Writing
A composition that relates a story or a personal experience.
Informative Writing
A composition that describes ideas, conveys messages, or provides instructions.
Persuasive Writing
A composition that attempts to influence the reader to take action or create change.

Develop Interesting Prompts To create writing prompts that will encourage students to enjoy writing and to excel at it, your first consideration is to create topics of interest to students. You will also want the writing prompts to be of interest to you because you will dedicate a substantial amount of time to reading your students' thoughts on a topic. Also, bias can creep into your scoring of essays if you become bored with a topic; so develop essay prompts that are likely to stimulate thought.
Incorporate Key Elements In developing writing prompts, you should consider the
following elements:
Subject: Who or what the response is about
Occasion: The situation that requires the response to be written
Audience: Who the readers will be
Purpose: The intent of the writing (e.g., to inform, narrate, or persuade)
Writer's role: The role the student assumes in the writing (e.g., a friend, student, concerned citizen)
Form: The type of writing (e.g., poem, letter, realistic fiction)
(Albertson, 1998; Nitko & Brookhart, 2007)
We apply these elements to the prompt in Figure 8.14. The subjects of the prompt are
the student writer and a person whom the student has read about or seen in a movie.
The occasion is a day the student would spend with this person. The audience is not
specified; however, the prompt was part of a high school exit examination and stu-
dents would be aware that raters would read their essays. The purpose is for students
to write a narrative in which they explain how they would spend the day with this
person and why it would be so special. The form of the writing is a narrative. Notice how this prompt uses bullets to focus students on key qualities to include in their writing. Not every element is directly evident in the prompt because these elements serve to help you in developing prompts. They are not meant to dictate the structure of your prompts.

TABLE 8.4
Examples of Prompts Used to Assess Writing Skills

Narrative (Fourth Grade): One morning a child looks out the window and discovers that a huge castle has appeared overnight. The child rushes outside to the castle and hears strange sounds coming from it. Someone is living in the castle!
The castle door creaks open. The child goes in.
Write a story about who the child meets and what happens inside the castle.

Informative (Twelfth Grade): Your school has a program in which a twelfth grader acts as a mentor for a tenth grader at the beginning of each school year. The mentor's job is to help the tenth grader have a successful experience at your school. The tenth grader you are working with is worried about being able to write well enough for high school classes.
Write a letter to your tenth grader explaining what kind of writing is expected in high school classes and what the student can do to be a successful writer in high school.
As you plan your response, think about your own writing experiences. How would you describe "good" writing? What advice about writing has been helpful to you? What writing techniques do you use?

Persuasive (Eighth Grade): Many people think that students are not learning enough in school. They want to shorten most school vacations and make students spend more of the year in school. Other people think that lengthening the school year and shortening vacations is a bad idea because students use their vacations to learn important things outside of school.
What is your opinion?
Write a letter to your school board either in favor of or against lengthening the school year. Give specific reasons to support your opinion that will convince the school board to agree with you.
TABLE 8.5
Guidelines for Developing Writing Prompts
Develop prompts that are interesting to your students and you.
Incorporate the elements of subject, occasion, audience, purpose, writer’s role, and form.
Avoid prompts that require students to (a) write about their personal values or their
religious beliefs or (b) criticize others.
WRITING
Write your response on the lined pages in your test booklet. Use only the lines provided
in your test booklet. Do not write beyond the lines or in the margins.
Writing Prompt
If you could spend a day with a person you have read about or seen in a movie, who would
that person be? How would you spend your day with that person?
Write an essay in which you explain how you would spend your day with this person
and why it would be so special. Include detailed descriptions and explanations to support
your ideas.
As you write, be sure to
• consider the audience.
• develop your response around a clear central idea.
• use specific details and examples to support your central idea.
• organize your ideas into a clear introduction, body, and conclusion.
• use smooth transitions so that there is a logical progression of ideas.
• use a variety of sentence structures.
• check for correct sentence structure.
• check for errors in capitalization, punctuation, spelling, and grammar.
Finally, avoid prompts that require students to write about their personal values or religious beliefs or to criticize others (et al., 2009; Nitko & Brookhart, 2007). Prompts that require students to criticize their parents or community members may create controversy that will take time away from instruction and student learning.
Scoring Essays
Scoring Guide
An instrument that specifies the criteria for rating responses.
Performance Criteria
The key elements of a performance specified in a scoring guide.

Essay tests pose greater challenges in scoring than do selected-response items and short-answer items. A critical tool developed to address these challenges is the scoring guide. A scoring guide presents the elements of a performance to be scored. These elements are referred to as performance criteria. The checklist, the analytic rubric, and the holistic rubric are three forms of scoring guides.
Checklist
Score Description
Content:
________ Presents a comprehensive, in-depth understanding of the subject content.
________ Defines, describes, and identifies the terms, concepts, principles, events,
interactions, changes, or patterns completely and accurately.
________ Uses subject-appropriate terminology to discuss the primary concepts or
principles related to the topic.
Note. May include a minor error, but it does not detract or interfere with the overall
response.
Supporting Details:
________ Develops the central idea by fully integrating prior knowledge or by exten-
sively referencing primary sources (as appropriate).
________ Provides a variety of facts (to explore the major and minor issues).
________ Incorporates specific, relevant examples to illustrate a point, and ties
examples, facts, and arguments to the main concepts or principles.
Analysis, Explanation, Justification:
________ Thoroughly analyzes the significance of a term, concept, principle, event,
interaction, change, or pattern by drawing conclusions, making inferences,
and/or making predictions.
________ Conclusions, inferences, and/or predictions are consistent with the sup-
porting details.
Note. Student receives 1 point for each complete and accurate descriptor or criterion in the
checklist.
Checklist
Key elements of a task organized in a logical sequence allowing confirmation of each element.
Analytic Rubric
A scoring guide that contains one or more performance criteria for evaluating a task with proficiency levels for each criterion.
Proficiency Levels
The description of each level of quality of a performance criterion.

A checklist is a list of key task elements that are organized in a logical sequence. A checklist allows a teacher to document the presence (or absence) of important dimensions of a performance task. The teacher indicates the presence of observed dimensions by placing a check in the appropriate blank. Figure 8.15 is a checklist for scoring the history item about world conflict. Note that for each criterion the student receives 1 point.

Analytic rubrics identify criteria to use in reviewing the qualities of a student's performance. Notice that the analytic rubric in Figure 8.16 has six performance criteria (e.g., has five lines, tells a funny story). In the analytic rubric, proficiency levels accompany each performance criterion. Proficiency levels provide a description of each level of that performance criterion along a continuum. For example, in the rubric for scoring limericks in Figure 8.16, the highest proficiency level of the performance criterion "tells a funny story" is "The story that the limerick tells is complete and funny." In this rubric, one performance criterion has two proficiency levels and the others have three.
When using this analytic rubric, a teacher would assign a score for each perfor-
mance criterion. Thus, a student would get six scores. These scores may be reported
separately or added for a total score.
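A hypothetical set of criterion scores (invented here to illustrate the scoring, not taken from the figure) shows how the six separate scores can be combined:
\[
3 + 2 + 3 + 2 + 1 + 3 = 14 \ \text{points},
\]
which could be reported as the six individual scores, as the total of 14, or both.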
Tells a funny story
1: The story that the limerick tells is incomplete and is not funny.
2: The story that the limerick tells is incomplete or is not funny.
3: The story that the limerick tells is complete and funny.

Has correct rhyming pattern
1: Limerick does not rhyme.
2: Limerick has some rhymes, but the rhyming pattern is not correct.
3: Limerick has correct rhyming pattern (AABBA).

Has correct capitalization
1: Capitals are not used or are used incorrectly.
2: Capitals are sometimes correctly used.
3: Capitals are correctly used in the poem and title.

Has correct punctuation
1: Limerick is not punctuated or is punctuated incorrectly.
2: Limerick has some correct punctuation.
3: Limerick is correctly punctuated.

Contains descriptive words
1: Specific nouns, adjectives, and adverbs to "paint a picture" for the reader are not used.
2: The selection of specific nouns, adjectives, and adverbs to "paint a picture" for the reader is attempted.
3: Specific nouns, adjectives, and adverbs to "paint a picture" for the reader are effectively selected.
Holistic Rubric
A scoring guide that provides a single score representing overall quality across several criteria.

In contrast to an analytic rubric, a holistic rubric provides a single score that represents overall quality across several performance criteria. The scoring guide shown in Figure 8.17 is an example of a holistic rubric. Take a moment to read the highest quality level (i.e., 4 points) of the holistic rubric. Similar to the checklist, this
scoring guide describes student responses in terms of the performance criteria con-
tent, supporting details, and analysis. Unlike the analytic rubric, the performance
criteria are not separated into their own categories with points assigned for each one.
Instead, all performance criteria are joined together to describe general proficiency
levels on the essay. Thus, in scoring holistically a teacher arrives at a single overall
judgment of quality.
Figures 8.18 and 8.19 show two rubrics for scoring writing prompts. Figure 8.18
is an analytic rubric with four performance criteria (i.e., content/development, organiza-
tion, voice, and conventions). Figure 8.19 presents a holistic rubric for scoring student
writing. Notice that for the holistic rubric a student would receive only one score
between 1 and 6.
Holistic Rubric
Score Description
4 points Superior Achievement
The response presents a comprehensive, in-depth understanding of the subject content. Using subject-
appropriate terminology, the response completely and accurately defines, describes, or identifies the
primary terms, concepts or principles, events, changes, or patterns. The response develops the central idea
by extensively referencing and integrating prior knowledge, citing relevant details from primary sources,
incorporating appropriate examples to illustrate a point, and/or providing a variety of facts that support the
premise. The response includes a thorough analysis of the significant events by drawing conclusions, making
inferences, and/or making predictions that are consistent with the supporting details. The response may
include a minor error in the subject content, but it does not detract or interfere with the overall response.
0 points No Achievement
The response is blank, inaccurate, too vague, missing, unreadable, or illegible.
[Figure 8.18, an analytic rubric for scoring writing, assigns a score for each of four performance criteria: Content/Development, Organization, Voice, and Conventions.]
6 Points: The writing is focused, purposeful, and reflects insight into the writing situation. The paper conveys a sense of completeness and wholeness with adherence to the main idea, and its organizational pattern provides for a logical progression of ideas. The support is substantial, specific, relevant, concrete, and/or illustrative. The paper demonstrates a commitment to and an involvement with the subject, clarity in presentation of ideas, and may use creative writing strategies appropriate to the purpose of the paper. The writing demonstrates a mature command of language (word choice) with freshness of expression. Sentence structure is varied, and sentences are complete except when fragments are used purposefully. Few, if any, convention errors occur in mechanics, usage, and punctuation.

5 Points: The writing focuses on the topic, and its organizational pattern provides for a progression of ideas, although some lapses may occur. The paper conveys a sense of completeness or wholeness. The support is ample. The writing demonstrates a mature command of language, including precision in word choice. There is variation in sentence structure, and, with rare exceptions, sentences are complete except when fragments are used purposefully. The paper generally follows the conventions of mechanics, usage, and spelling.

4 Points: The writing is generally focused on the topic but may include extraneous or loosely related material. An organizational pattern is apparent, although some lapses may occur. The paper exhibits some sense of completeness or wholeness. The support, including word choice, is adequate, although development may be uneven. There is little variation in sentence structure, and most sentences are complete. The paper generally follows the conventions of mechanics, usage, and spelling.

3 Points: The writing is generally focused on the topic but may include extraneous or loosely related material. An organizational pattern has been attempted, but the paper may lack a sense of completeness or wholeness. Some support is included, but development is erratic. Word choice is adequate but may be limited, predictable, or occasionally vague. There is little, if any, variation in sentence structure. Knowledge of the conventions of mechanics and usage is usually demonstrated, and commonly used words are usually spelled correctly.

2 Points: The writing is related to the topic but includes extraneous or loosely related material. Little evidence of an organizational pattern may be demonstrated, and the paper may lack a sense of completeness or wholeness. Development of support is inadequate or illogical. Word choice is limited, inappropriate, or vague. There is little, if any, variation in sentence structure, and gross errors in sentence structure may occur. Errors in basic conventions of mechanics and usage may occur, and commonly used words may be misspelled.

1 Point: The writing may only minimally address the topic. The paper is a fragmentary or incoherent listing of related ideas or sentences or both. Little, if any, development of support or an organizational pattern or both is apparent. Limited or inappropriate word choice may obscure meaning. Gross errors in sentence structure and usage may impede communication. Frequent and blatant errors may occur in the basic conventions of mechanics and usage, and commonly used words may be misspelled.

Unscorable: The paper is unscorable because
• the response is not related to what the prompt requested the student to do.
• the response is simply a rewording of the prompt.
• the response is a copy of a published work.
• the student refused to write.
• the response is illegible.
• the response is incomprehensible (words are arranged in such a way that no meaning is conveyed).
• the response contains an insufficient amount of writing to determine if the student was attempting to address the prompt.
• the writing folder is blank.
TABLE 8.6
Guidelines for Developing Scoring Guides
Decide whether a checklist, analytic rubric, or holistic rubric is appropriate for reviewing
student responses (see Table 8.7).
For all scoring guides: Determine a small number of non-overlapping performance criteria
that students should demonstrate in their responses.
For analytic and holistic rubrics: Develop a continuum of proficiency levels from least to
most skilled for each performance criterion.
• Use parallel language to describe achievement across the levels of proficiency.
• Avoid use of negative language.
• Design descriptive rather than evaluative proficiency levels that can be clearly
distinguished from one another.
TABLE 8.7
Aspects of Checklists, Holistic Rubrics, and Analytic Rubrics to Consider in Selecting a Scoring Guide

Description
• Checklist: Simple list of requirements to check off if present.
• Holistic Rubric: A single score represents overall quality across several criteria.
• Analytic Rubric: Separate scores are provided for each criterion.

Communicate Expectations to Students
• Checklist: Useful when more than two or three unambiguous elements (e.g., page length, format) are required in the product.
• Holistic Rubric: Useful when communicating a general impression of the levels of overall quality for the product.
• Analytic Rubric: Useful when each criterion has important distinctions among the levels of a quality performance that students must learn.

Feedback to Students
• Checklist: Offers feedback on presence or absence of unambiguous criteria.
• Holistic Rubric: Offers feedback providing a general impression of the product.
• Analytic Rubric: Offers feedback with detail and precision on all criteria.

Communication Among Faculty Across Sections/Courses
• Checklist: Provides information on basic elements required.
• Holistic Rubric: Provides general grading guidelines.
• Analytic Rubric: Provides most information about requirements to facilitate consistency in grading.

Optimal Use
• Checklist: Best used with assignments with several clear-cut components; drafts or preliminary work.
• Holistic Rubric: Best used with brief assignments carrying less grading weight.
• Analytic Rubric: Best used with products involving complex learning outcomes; opportunities for formative assessment and improvement.
you will find it helpful to think about the qualities you will see in a top-notch essay. The
qualities you are looking for in your students’ essays will be the performance criteria.
Unfortunately, sometimes performance criteria focus on less relevant aspects of
a performance, such as length, neatness, and punctuality. Ideally, a scoring guide focuses
on the dimensions of an essay that are the focus of the learning outcomes. When we
include performance criteria that assess frivolous aspects of a student essay, such as
neatness, error is introduced to the scores.
Developing a Checklist
Having established the performance criteria of interest, let’s consider the elements to
develop in checklists.
List the Key Performance Criteria To develop a checklist, you list the performance
criteria with a space beside each (see Figure 8.15). Each criterion should be a key
quality of the essay. For example, in the checklist for the social studies item, one criterion
The emphasis in the descriptors is on what the student should do in future story
writing.
You should also determine the appropriate number of proficiency levels for
each performance criterion. Each proficiency level must be clearly distinguishable
from the others if your rubric is to be useful (Lane & Stone, 2006). Sometimes our
teacher candidates believe they should have the same number of proficiency levels
for all performance criteria. However, you should develop the number of proficiency
levels you can describe distinctly enough that your assignment of scores will
be consistent. The analytic rubric in Figure 8.18 makes our point. Those who devel-
oped the rubric could describe only three proficiency levels for voice and wisely did
not write a level 4 descriptor that could not be distinguishable from the level 3
descriptor.
The scoring rubric should be drafted as the essay item is being developed. You
will then be able to ensure alignment between them. For content-based essay items,
developing the scoring guide will be facilitated by first reviewing the materials on
which the test is based. The scoring of content-based essays can be facilitated by devel-
oping a model answer or an outline of key points. In addition, before scoring, read a
sample of the responses to determine whether your expectations are realistic. Based
on the review, make any necessary edits to the scoring guide.
TABLE 8.8
Factors Contributing Error to Essay Scores and Recommended Solutions

Scoring Challenge: Test-to-test carryover effects
Possible Solutions: Recalibrate by reviewing the scoring guide and rereading a sample of previously scored papers. After completing the scoring of one essay item, shuffle student papers before scoring the next item.

Scoring Challenge: Item-to-item carryover effects
Possible Solutions: Score the same question for all students before moving to the next item.

Scoring Challenge: Other issues (language mechanics, handwriting effects, essay length)
Possible Solutions: Note on the scoring guide that these criteria are not the focus of scoring for content-based essay items.
Teacher Unreliability Probably the most common criticism of essays is the unreli-
ability of teacher scores. Nearly a century ago two studies showed the unreliability of
the scoring of students’ work in English and mathematics (Starch & Elliot, 1912; Starch
& Elliot, 1913). As we have become more aware over the years of the various factors
that contribute error to scores, we have developed methods for reducing that error.
For instance, to reduce error due to teacher judgment, in addition to carefully design-
ing a scoring guide, you should frame the question to focus student attention on the
key issues. When the question is overly broad, students’ answers will diverge and
require more teacher judgment about the correctness of the response. The use of bul-
lets in Figure 8.13 provides an example of framing the question to focus students’
attention. To improve your scoring, you can select examples of student work for each
proficiency level. These serve as anchors as you grade.
Halo Effect The halo effect occurs when teachers’ judgments about one aspect of a
student influence their judgments about other qualities of that student. In such an
instance, a teacher may score an essay more leniently for a student who is generally a
strong student. On the other hand, the teacher might score a different student severely
because of perceptions that the student is weaker (Nitko & Brookhart, 2007). The best
way to guard against the halo effect is to mask students’ names when scoring.
Pitfalls to Avoid: Top Common Challenges in Constructed-Response Items and Scoring Guides
Essay Pitfalls
Requiring Higher-Order Thinking Aligned with Learning Goals
When we addressed the pitfalls in writing multiple-choice items, our first example was
failing to address important content tied to the learning goals. Avoiding this problem
with essay questions is even more important because an essay covers fewer topics in
more depth than a group of multiple-choice items requiring similar testing time. If you
write an essay prompt that doesn’t relate to critical content or skills, your assessment
does not do justice to your instruction or to student learning.
In addition, requiring students to employ higher-order thinking skills is also
crucial. Essay questions are more time-consuming to score, as well as more susceptible
Challenge: Requiring students to list facts in response to an essay item.
Example: List and explain the factors leading to World War I.
Recommendation: Include items that require use of higher-order thinking skills aligned with learning goals.
Improved example: 1. In your opinion, of the factors leading to World War I discussed in class, which factor do you believe was most important? Support your answer with at least three examples of evidence from the chapter and lectures.

Challenge: Writing prompts that are broad or vague.
Example: Compare President Nixon to President Reagan.
Recommendation: Clearly define the task.
Improved example: 2. Compare President Nixon to President Reagan in terms of the similarities and differences in their foreign policy initiatives.

Challenge: Lacking consistency in criteria addressed in the task directions and the scoring guide.
Example: No statement about the importance of writing conventions in task directions but included in scoring.
Recommendation: Align scoring guide and task directions.
Improved example: Number of points for writing conventions specified in both task directions and rubric.

Challenge: Evaluative terms used in scoring guide.
Example: "Bad title."
Recommendation: Use descriptors that foster learning rather than evaluative terms.
Improved example: "Title needs to more accurately summarize the main idea."

Challenge: Frivolous or untaught performance criteria included.
Example: Neatness, length (frivolous). Creativity (if not taught).
Recommendation: Align performance criteria with instruction and learning goals.
Improved example: Persuasiveness. Organization around central idea. Creativity (if taught).
FIGURE 8.20 Top Challenges in Designing Constructed-Response Items and Scoring Guides
to error in scoring than selected-response or short-answer items. This means you want
to use essays only when you are tapping understanding and skills you can’t assess with
more direct, less error-prone methods.
Making sure you are appraising higher-order thinking skills can be tricky.
Recently, one of our teacher candidates developed a history essay prompt that required
students to list and explain the factors leading to World War I. Although such an essay
prompt might appear to require students to use critical thinking skills from higher
levels of Bloom’s taxonomy, the factors leading to the war had been thoroughly dis-
cussed in class. For such an essay, students could simply list facts memorized from
class instruction. To avoid using essays for listing memorized facts, decide whether
the essay question requires students to recall a list of facts based on instruction or
whether students are required to analyze new material or evaluate a situation from a
different perspective.
THE VALUE OF STUDENT-GENERATED ITEMS AND CRITICAL THINKING
Recall that one of the first guidelines for developing writing prompts was to create
topics of interest to both students and yourself. By involving students in developing
ideas for essays, you are more likely to generate topics of interest. Students can help
in identifying key concepts in content areas. If students help you do this, they begin
to learn to tease out what is important for themselves.
Students also can contribute to developing rubrics. After an essay prompt is devel-
oped, you can work with students to identify key criteria for assessing their responses.
A benefit of student involvement in developing checklists and rubrics is the discussion
of the criteria. Evaluative criteria are often very abstract concepts. For example, the first
time we encountered the criterion of voice we had little clue to its meaning. Only with
the help of a rubric did we understand that elements of voice include the following:
• Precise and/or vivid vocabulary appropriate for the topic
• Effective rather than predictable or obvious phrasing
• Varied sentence structure to promote rhythmic reading
• Awareness of audience and task; consistent and appropriate tone
During the development of a checklist or rubric, discussing important features of an
essay and linking each feature with your scoring criteria will help students understand
the connection between the rubric and the qualities they are aiming for in their essays.
This involvement serves as useful instructional time because the development process
helps students understand and internalize the elements of quality essays.
ACCOMMODATIONS FOR DIVERSE LEARNERS: CONSTRUCTED-RESPONSE ITEMS
In Chapter 7 we described various forms of test accommodations that may assist
diverse learners in your class to complete an examination with multiple-choice, true-
false, and matching items. In this chapter we focus on accommodations that may assist
your students in completing short-answer items and essays. We do not repeat sugges-
tions that we offered in Chapter 7, so you may want to revisit Table 7.6 to gain addi-
tional ideas of the accommodations diverse learners may benefit from.
A critical issue in accommodations for diverse learners is the demand of writing
when short-answer items and essays are used to assess student learning. Writing can
TABLE 8.9
Accommodation Considerations for Constructed-Response Items

Issue: Difficulty with fine motor skills
• Use speech-recognition software for word processing.

Issue: Learning goal already mastered
• Write prompts to engage student at the evaluation and creation levels of cognition.

Issue: Literacy skills below those of typical peers (e.g., learning disability)
• Allow use of spell check and grammar check in word-processing program.
• Explicitly discuss rubric and criteria.
• Review sample essays.
present physical demands for these learners in terms of manually recording thoughts
on paper. Writing also presents cognitive demands in terms of organizing the ideas a
student wants to use in a response. In examining the accommodations for diverse
learners, we consider the need for the assessment to avoid interfering with a student
demonstrating achievement of learning goals. See Table 8.9 for accommodation sug-
gestions for several groups of diverse learners.
TABLE 8.10
Essay Questions on Utopia/Dystopia Unit
1. Given what we know about utopia—that it is an ideal place, state, or condition—we
know that the existence of a utopia is impossible. Conversely, a dystopia often reflects a
society taken to the worst possible extreme. Given the two extremes, do you think the
United States is more utopic or dystopic? Support your answer with three characteristics
of utopia or dystopia and relate these reasons to examples from works we have read.
(Example: You might say that the United States is more like a dystopia when police want
to use video surveillance to monitor public places. In 1984, we see constant surveillance
when the government uses telescreens to monitor the citizens of Oceania.)
2. Each group in this class has created a version of a utopian society. Write an
(individual) 2–3 page essay that will persuade other classmates that they would want
to move to your group’s community. Your essay should describe at least three societal
aspects. One must be the government, and two others can be chosen from work,
family, religion, education, attitude toward technology, or others on which your group
has focused. A good essay will briefly describe the aspect, then use persuasive
language to tell other people why they will want to live there.
Description (6 / 9 / 12 points)
6 = Describes fewer than 3 aspects and/or the descriptions are very limited.
9 = Describes 3 aspects; however, descriptions are vague and need more detail.
12 = Clearly describes 3 aspects with vivid, detailed language.

Persuasion (6 / 9 / 12 points)
6 = Support for position is limited or missing.
9 = Position supported but some reasons or examples are not relevant or well-chosen.
12 = Position supported with relevant reasons and well-chosen examples.

Organization (2 / 4 / 6 points)
2 = Is disorganized and lacks transitions.
4 = Is organized, but may lack some transitions.
6 = Is focused and well organized, with effective use of transitions.

Mechanics (2 / 4 / 6 points)
2 = Frequent errors in grammar, spelling, or punctuation. These errors inhibit understanding.
4 = Errors in grammar, spelling, or punctuation do not inhibit understanding.
6 = Few errors in grammar, spelling, or punctuation. These errors do not inhibit understanding.
Maria limits the scoring guide to four criteria: description, persuasion, organiza-
tion, and mechanics. The criteria description and persuasion most directly relate to the
unit learning goals. When students write persuasively about the United States as a
utopia or dystopia, the description dimension of the task requires them to support their
argument by providing examples of either utopic or dystopic qualities. Similarly, in the
second essay students must describe the utopian qualities of the society that they create.
The descriptions the students provide will help Maria to gauge whether students have
developed an understanding of the utopia concept. The language and arguments the stu-
dents use in their essays will allow Maria to assess their grasp of writing persuasively.
The criteria of organization and mechanics are less directly related to the unit.
However, these two criteria are part of the content standards for writing in most states.
Many English language arts teachers will address organization and mechanics (i.e., con-
ventions) in their instruction all year long. Thus, including them in the scoring guide
makes sense.
The organization and mechanics criteria are generic in that they will apply across
writing assessments throughout the course of the year. However, given their indirect
relation to the learning goals of the unit, the lower weighting of organization and mechan-
ics (12 total possible points) as compared to description and persuasion (24 possible
points) helps to ensure most of the grade for these essays will be based on the learning
goals that are specifically the focus of the unit.
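To make the weighting concrete, here is a quick arithmetic check using the maximum point values listed in the scoring guide above:

Description + Persuasion: 12 + 12 = 24 possible points
Organization + Mechanics: 6 + 6 = 12 possible points
24 / (24 + 12) = 24/36, so roughly two-thirds of the total grade rests on the criteria tied directly to the unit's learning goals.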
Also notice that the proficiency levels for each performance criterion use specific
language to establish a continuum of performance. Descriptors for persuasion, for
example, state the following:
• Support for position is limited or missing.
• Position supported, but some reasons or examples are not relevant or well chosen.
• Position supported with relevant reasons and well-chosen examples.
The descriptors avoid evaluative terms, such as “poor,” “good,” and “excellent.” In addition,
the four criteria address key qualities and do not focus on criteria that are frivolous
(e.g., neatness) or untaught (e.g., creativity).
HELPFUL WEBSITES
http://nces.ed.gov/nationsreportcard/
A website with constructed-response items and scoring guides from the National Assessment of
Educational Progress (NAEP). Includes items in the subject areas of civics, economics,
geography, mathematics, reading, science, United States history, and writing. All NAEP
questions and rubrics in this chapter come from this website.
http://rubistar.4teachers.org/index.php
A website that provides templates for constructing rubrics. You also can look at rubrics made by
teachers.
REFERENCES
Albertson, B. 1998. Creating effective writing prompts. Newark: Delaware Reading and Writing
Project, Delaware Center for Teacher Education, University of Delaware.
Arter, J., and J. Chappuis. 2007. Creating and recognizing quality rubrics. Upper Saddle River, NJ:
Pearson Education.
California Department of Education. 2008. California standards test released test questions: Grade
8 science. Retrieved February 18, 2008, from http://www.cde.ca.gov/ta/tg/sr/css05rtq.asp.
Chase, C. 1986. Essay test scoring: Interaction of relevant variables. Journal of Educational
Measurement 23: 33–41.
Coerr, E. 1977. Sadako and the thousand paper cranes. New York: Puffin Books.
Creighton, S. 2006. Examining alternative scoring rubrics on a statewide test: The impact of
different scoring methods on science and social studies performance assessments.
Unpublished doctoral dissertation, University of South Carolina, Columbia.
Daly, J., and F. Dickson-Markman. 1982. Contrast effects in evaluating essays. Journal of
Educational Measurement 19(4): 309–316.
Dunbar, S., D. Koretz, and H. Hoover. 1991. Quality control in the development and use of
performance assessments. Applied Measurement in Education 4(4): 289–303.
Florida Department of Education. 2005. Florida Writing Assessment Program (FLORIDA
WRITES!). Retrieved December 26 from http://www.fldoe.org/asp/fw/fwaprubr.asp.
Gredler, M., and R. Johnson. 2004. Assessment in the literacy classroom. Needham Heights, MA:
Allyn & Bacon.
Grobe, C. 1981. Syntactic maturity, mechanics, and vocabulary as predictors of quality ratings.
Research in the Teaching of English 13: 207–215.
Hogan, T., and G. Murphy. 2007. Recommendations for preparing and scoring constructed-
response items: What the experts say. Applied Measurement in Education 20(4): 427–441.
Hopkins, K. 1998. Educational and psychological measurement and evaluation. 8th ed. Englewood
Cliffs, NJ: Allyn and Bacon.
Hopkins, K. D., J. C. Stanley, and B. R. Hopkins. 1990. Educational and psychological measurement
and evaluation. Englewood Cliffs, NJ: Prentice Hall.
Johnson, R., J. Penny, and B. Gordon. 2009. Assessing performance: Developing, scoring, and
validating performance tasks. New York: Guilford Publications.
Kuhs, T., R. Johnson, S. Agruso, and D. Monsod. 2001. Put to the test: Tools and techniques for
classroom assessment. Portsmouth, NH: Heinemann.
Lane, S., and C. Stone. 2006. Performance assessment. In R. Brennan (ed.), Educational
measurement, 4th ed., pp. 387–431. Westport, CT: American Council on Education and
Praeger Publishers.
Markham, L. 1976. Influences of handwriting quality on teacher evaluation of written work.
American Educational Research Journal 13: 277–284.
National Center for Educational Statistics. 2005. NAEP questions. Retrieved June 19, 2008, from
http://nces.ed.gov/nationsreportcard/itmrls/.
National Center for Educational Statistics. 2008. What does the NAEP Writing Assessment
measure? Retrieved June 19, 2008, from http://nces.ed.gov/nationsreportcard/writing/
whatmeasure.asp.
Nitko, A., and S. Brookhart. 2007. Educational assessment of students. 5th ed. Upper Saddle River,
NJ: Pearson.
Ohio State Department of Education. 2003. Academic content standards: K–12 fine arts.
Retrieved June 11, 2007, from http://www.ode.state.oh.us/GD/Templates/Pages/ODE/
ODEDetail.aspx?page=3&TopicRelationID=336&ContentID=1388&Content=12661.
Richter, H. 1987. Friedrich. New York: Puffin Books.
South Carolina Department of Education. 2004a. Extended response scoring rubric. Retrieved
May 24, 2008, from http://ed.sc.gov/agency/offices/assessment/pact/rubrics.html.
South Carolina Department of Education. 2004b. Released PACT history item. Columbia, SC:
Author.
South Carolina Department of Education. 2006. South Carolina High School Assessment Program:
Release form. Retrieved July 1, 2008, from http://ed.sc.gov/agency/offices/assessment/
programs/hsap/releaseitems.html.
South Carolina Department of Education. 2007. PACT math release items. Retrieved February 18,
2008, from http://ed.sc.gov/agency/offices/assessment/pact/PACTMathReleaseItems.html.
Starch, D., and E. Elliot. 1912. Reliability of grading high school work in English. School Review
20: 442–457.
Starch, D., and E. Elliot. 1913. Reliability of grading high school work in mathematics. School
Review 21: 254–259.
Stewart, M., and C. Grobe. 1979. Syntactic maturity, mechanics of writing, and teachers’ quality
ratings. Research in the Teaching of English 13: 207–215.
Thorndike, R. 2005. Measurement and evaluation in psychology and education. 7th ed. Upper
Saddle River, NJ: Pearson Education.
ﱟﱟﱟﱟﱠﱟﱟﱟﱟ
CHAPTER 9
TEACHER-MADE ASSESSMENTS:
PERFORMANCE ASSESSMENTS
We cannot be said to “understand” something unless we can employ it wisely,
fluently, flexibly, and aptly in particular and diverse contexts.
–Grant Wiggins
INTRODUCTION
Now that we have described teacher-made selected-response and constructed-
response formats for designing assessments in Chapters 7 and 8, we move on to the
most complex teacher-designed assessment type. Performance assessments require
students to construct a response to a task or prompt to demonstrate their achieve-
ment of a learning goal. In the previous chapter, we noted that the essay is a special
case of a performance assessment. In responding to an essay prompt, students
retrieve the required facts and concepts, organize this information, and construct a
written response. However, not all performance assessments require a written
response. In music, for example, students sing or play an instrument. In masonry,
students might lay a brick corner.
To illustrate the power of performance assessments, let us take you into one
of our classes. Actually, we are taking you into the media center at Germanton
Elementary. In the biography section of the media center, you see a group of fifth-
grade students standing by the bookshelves. At each shelf a student counts the
books from left to right. After counting five books, the student pulls the fifth book,
reads the title to determine who the biography is about, and sometimes reads the
blurb inside the book cover. Each student then tells another student, who is serving
as a recorder, the ethnicity of the person who is the focus of the biography. The
student then returns the book to the shelf, counts five books, and pulls the next
biography. After completing this process for each of the shelves in the biography sec-
tion, the students return to the classroom to compile their data. In the classroom, the
students graph their results, showing that a majority of the biographies focused on
European Americans. Students then write a brief report about their findings and pre-
sent the report to the media specialist.
What was the purpose of this exercise? The students had just completed a
learning unit on the use of pie charts and bar graphs to show patterns in data.
Why bother with students actually counting the biographies? Why not simply give
students some fabricated data to graph? Enhancing the meaningfulness of a prob-
lem is a key quality in performance assessment (Johnson et al., 2009). By involving
students in the data collection, we make real the use of visual representations of
data to detect patterns and answer significant questions. This particular exercise
also involves students in examining equity in representing the diversity of our
culture—linking their critical thinking skills with issues relevant to participation
in a democracy.
In performance assessment, the form of response required of students may be a process (e.g., delivering a speech) or a product (e.g., a house plan). Process should be the focus of a performance assessment when (a) the act of performing is the target learning outcome, (b) the performance is a series of procedures that can be observed, or (c) a diagnosis of performance might improve learning outcomes (Gronlund, 2004). For example, in elementary schools, when a student engages in the process of retelling a story, the teacher can examine the student's ability to capture the main idea of the text, identify key details of the narrative, and use this information to summarize the story.

Process: The procedure a student uses to complete a performance task.

A product, on the other hand, should be the focus of learning outcomes when (a) a product is the primary outcome of a performance, (b) the product has observable characteristics that can be used in evaluating the student's response, and (c) variations exist in the procedures for producing the response (Gronlund, 2004). For example, in the visual arts the student engages in a process to create a sculpture; however, the focus should be the qualities expressed in the final product.

Product: The tangible outcome or end result of a performance task.
A systematic method for scoring student responses must accompany any perfor-
mance assessment. In Chapter 8, we described three types of scoring guides: checklists,
holistic rubrics, and analytic rubrics. We will revisit analytic rubrics later in this chap-
ter as the preferable means of scoring performance assessments.
ALIGNING ITEMS WITH LEARNING GOALS AND THINKING SKILLS
As with any item type, a critical component of developing a quality performance
assessment is the alignment of the task with the understanding and skills that were the
focus of instruction. Figure 9.1 presents an example of how content of learning goals
can affect the selection of performance tasks versus selected-response, short-answer,
or essay formats. As shown, to assess whether a student uses appropriate instruments
and tools safely and accurately, a teacher will need to use a performance assessment
and observe the student in the process of conducting a scientific investigation. How-
ever, the teacher can use selected-response items to assess students’ learning to classify
observations as quantitative or qualitative or to categorize statements as observations,
predictions, or inferences.
Standard 4-1:
The student will demonstrate an understanding of scientific inquiry, including the processes,
skills, and mathematical thinking necessary to conduct a simple scientific investigation.
Indicators (i.e., Learning Goals) | Possible Assessment Formats
4-1.6 Construct and interpret diagrams, tables, and graphs made from recorded measurements and observations. | Performance Assessment
FIGURE 9.1 Aligning Content Standards and Learning Goals with Assessment Forms
Adapted from South Carolina Department of Education. 2005. South Carolina science academic standards. Columbia,
S.C.: Author. From http://ed.sc.gov/agency/offices/cso/standards/science/. Reprinted with permission.
frequency counts for the types of vegetables or fruits displayed in textbooks. Instead,
students were engaged in the more meaningful task of investigating the representation
of the diversity of our society, summarizing the data in graphs, and writing a
report with recommendations. Performance assessments are best utilized when the con-
sequences broaden your students’ understanding and contribute to their development.
TABLE 9.1
Guidelines for Developing Performance Assessment Prompts
Specify the content knowledge and cognitive strategies the student must demonstrate in
responding to the prompt.
Determine the format (i.e., oral, demonstration, or product) of the final form of the
student’s response.
Keep the reading demands at a level that ensures clarity of the task directions.
LIBRO DE NIÑOS SOBRE EL MEDIO AMBIENTE (Children's Book About the Environment)
FIGURE 9.2 A Performance Task for Students in a High School Spanish Class
Reprinted with permission.
Gloria did, you can determine whether the task addresses the knowledge, skills, and
strategies that were the focus of instruction. In addition, a review of the content
requirements allows you to make sure the cognitive strategies required for the task
engage students in analyzing, evaluating, and creating.
In some instances, teachers write performance tasks to assess student learning
across content areas. As discussed in Chapter 8, assessing student development of
writing skills can be integrated with assessing their content knowledge. In Figure 9.3,
the prompt for an art class directs students to create a painting in one of the artistic
styles they have studied. In addition, students develop a placard describing the ele-
ments of the painting. In this task, the teacher assesses in an integrated way her stu-
dents’ learning in both the visual arts and writing.
We have been studying several artistic styles, such as cubism, impressionism, and pointillism. We are going to turn our classroom into an art gallery for when your families come to our school's open house. Galleries usually have a variety of art styles in their exhibits and hanging beside each piece is a placard that tells about the artwork. Your contribution to our gallery is to create a painting of some person or event in your life using one of the styles of art that we studied. Also, write a description of what it is that your family member should see in your painting that will help him or her understand which style you used. Your description should address the elements and principles of the visual arts. In writing your description, be sure to follow the conventions of written language that we have studied.
FIGURE 9.3 Performance Task That Integrates Art and Writing Skills
QUADRATIC RELATIONS AND CONIC SECTIONS
¿CÓMO LLEGO? (How Do I Get There?)
Request directions. | Student does not know how to ask for directions. | Student asks for directions but with some difficulty. | Student asks for directions with no difficulty. | 2
Follow directions. | Student can't follow directions given by partner. | Student can follow directions with some difficulty. | Student can follow directions without hesitation. | 3
Identifying places. | Student can't identify places in and around the destination. | Student identifies places but with some difficulty. | Student can identify the places without mistakes. | 5
Earned points_______
Total possible points: 50
FIGURE 9.5 A Spanish Performance Assessment That Incorporates the Element of “Meaningfulness”
Reprinted with permission.
algebraic models to make a decision about evacuating areas of a county that may be
affected by a toxic spill. In “Quadratic Relations and Conic Sections,” students see real-
life implications for the algebraic skills they are developing.
Carmen Nazario, a middle-school Spanish teacher, incorporated meaningfulness
into the task she developed in which students ask for directions and provide directions
in Spanish (Figure 9.5). Students in her class recognize giving directions as a necessary
skill and, thus, the task is likely to be more meaningful to students than textbook
exercises at the end of a unit.
Ethics Alert: In the “¿Cómo Llego?” task in Figure 9.5, the grade of the student depends
in part on the quality of the instructions he receives in Spanish from his partner. Because
the student’s grade should be based only on his understanding of directions stated in the
correct Spanish form, the teacher will need to correct any directions in which the Spanish
is incorrect. Then the grade of the student who is receiving directions will represent his
understanding of directions stated in Spanish.
DATA ANALYSIS AND PROBABILITY
Angela Black
Student Name:_________________ Date:____________
Standard: Mathematics Grade 4: Data Analysis and Probability
Standard 4-6.3: Organize data in graphical displays (tables, line graphs, bar graphs with scale increments greater than one).
We have been learning about collecting data, interpreting (reading) the data, organizing the data for a reader, and the different
types of graphs that are used for data.
You will create a graph using Microsoft Excel from the data that were collected in class from the survey on the sports that
fifth-grade students liked to watch on television. The graph should represent the data clearly and meet the grading categories
in the rubric.
Directions:
1. Bring collected data to Computer Class during Related Arts time.
2. Enter data into Microsoft Excel spreadsheet.
3. Create graph using Chart Wizard.
4. Label and adjust graph to make graph clear and accurate to the reader.
5. Attach your graph to this sheet before turning it in.
FIGURE 9.6 A Data Analysis Performance Task That Links to Students’ Personal Experiences
Reprinted with permission.
The “¿Cómo Llego?” task in Figure 9.5 engages students in providing directions
using Spanish—an example of a process performance task. Other process tasks include
making patterns with math manipulatives, passing a basketball, and playing a musical pas-
sage. The outcomes in the previous performance tasks in which students write a bilingual
children’s book (Figure 9.2) and create a graph (Figure 9.6) are examples of products.
Today, the distinction between process and product can become blurred because
students’ responses or demonstrations can be recorded and scored later. Previously,
students’ performances of a play or a dance could be observed and evaluated only at
the time students engaged in the process. We do not advocate delayed scoring because
the amount of time required to grade a recorded performance is likely to be added
to the time spent watching the original presentation.
TABLE 9.2
Examples of Process and Product Performance Tasks
Processes (Oral | Demonstration) | Products
SONNET SEQUENCE: ENGLISH III COLLEGE PREP
Frank Harrison

Instructions: Write a Petrarchan sonnet about any topic, character, theme, setting, period, etc., you've encountered thus far in your study of American literature. Use your text, required novels, summer reading, videos shown in class, and/or class discussions/notes for sources.
1. Write exactly fourteen lines, putting one syllable in each box of the grid below.
2. In the first eight lines, describe a problem, introduce an issue, or pose a question.
3. In the last six lines, resolve the problem, make general comments or conclusions, or answer the question.
4. The grid will help you as you draft your sonnet. Follow the pattern for stressed and unstressed syllables shown along the top of the grid.
5. Follow the pattern for end rhymes shown along the right side of the grid.
6. Final drafts should be typed and double-spaced.
[Sonnet-drafting grid: fourteen numbered rows with one box per syllable; the unstressed/stressed iambic pattern is marked along the top, and the Petrarchan end-rhyme scheme (a b b a a b b a c d e c d e) is marked down the right side.]
FIGURE 9.7 An English Performance Task That Uses a Grid to Guide Student Development of a Sonnet
Reprinted with permission.
When students are working on long-term projects, teachers should review stu-
dents’ progress at several points during the process. (We have addressed some of the
benefits of such formative assessment in Chapter 4.) Periodic reviews also teach stu-
dents to pace themselves and not wait until the last minute to complete a task. No one
wants to grade projects done the night before.
Another benefit to incorporating checkpoints and working with students during the
process is to reduce the likelihood of shoddy last-minute work or even plagiarism. For
[Item from an accompanying figure: 4. Using the axes below, construct a graph showing the number of each species of Paramecium the student found each day. Be sure to label the axes.]
example, after reading and discussing a historical novel, students in one class prepared to
write their own historical novelettes. To frame the collection of information about a time
period, the teacher prepared a chart with the headers of historical events, clothing, food,
etc. (Kuhs et al., 2001). Students used the chart to take notes about people during the
historical period that was the focus of their novel. After reviewing their notes with the
teacher, students then began writing a five-chapter novelette. Students reviewed each 3-
to 4-page chapter with the teacher and revised as needed. At the end of the process, students
had an original novelette of approximately 15 pages. Think of the different position of the
students if the teacher had simply instructed them to write a historical novelette, assigned
a due date for the final project, and not monitored student work as they progressed.
FIGURE 9.9 The Scoring Guide Used with the Task “Sonnet Sequence: English III College Prep”
Reprinted with permission.
Performance Criteria | Proficiency Levels (1–4) | Score

Data
1 = Data in the table need to be organized, accurate, and easy to read.
2 = Data in the table are somewhat organized, accurate, and easy to read.
3 = Data in the table are organized, accurate, and easy to read.
4 = Data in the table are well organized, accurate, and easy to read.

Title (three levels, lowest to highest)
• A title that clearly relates to the data should be printed at the top of the graph.
• Title is somewhat related to the data; it is present at the top of the graph.
• Title clearly relates to the data graphed and is printed at the top of the graph.

Labeling of X axis
1 = The X axis needs an accurate and clear label that describes the units.
2 = The X axis has a label, but the label is not clear or accurate.
3 = The X axis has a clear label that describes the units, but the label lacks accuracy.
4 = The X axis has an accurate and clear label that describes the units.

Labeling of Y axis
1 = The Y axis needs an accurate and clear label that describes the units.
2 = The Y axis has a label, but the label is not clear or accurate.
3 = The Y axis has a clear label that describes the units, but the label lacks accuracy.
4 = The Y axis has an accurate and clear label that describes the units.

Design (three levels, lowest to highest)
• The graph is plain. It needs an attractive design. Colors that go well together should be used to make the graph more readable.
• The graph is somewhat plain. It needs an attractive design, or colors should be used that make the graph more readable.
• The graph has an attractive design. Colors that go well together are used to make the graph more readable.
FIGURE 9.10 Analytic Rubric and Grading Scale for the Task “Data Analysis and Probability”
Reprinted with permission.
Students then use the conversion chart and calculate their grades from performance
tasks and their other grades during a nine-week grading period.
Challenge: Developing a task that generates excitement but is detached from the learning goals.
Example: We have been studying the use of propaganda posters in WWI and WWII. Develop a propaganda poster that will get students involved in the upcoming clean-up day at school.
Recommendation: Keep the task related to learning goals.
Improved example: We have been studying the use of propaganda posters in WWI and WWII. Develop a poster that America could have used for anti-German propaganda.

Challenge: Assigning a task without providing students with practice in the knowledge and skills required in the task.
Example: A school prepares students for the annual science fair by sending home a booklet on how to do a science fair project.
Recommendation: Incorporate all the elements of the task into instruction and class activities.
Improved example: Preparation for the science fair is integrated with science instruction and activities. Prior to the science fair, in class activities, teachers and students conduct experiments, record the outcomes, and write summaries of their findings.

Challenge: Including too many criteria in a rubric.
Example: To guide students in writing a novelette, a teacher developed a checklist with 15 criteria that reflected the skills learned during the year.
Recommendation: Limit the number of criteria in the scoring guide.
Improved example: The teacher focused the checklist on the criteria of character and plot development, voice, and conventions.
FIGURE 9.11 Top Challenges in Designing Performance Tasks and Scoring Guides
performance task emphasized the propaganda aspects of the lesson. This type of error
also occurs when teachers find an interesting activity on the Internet but forget to
check the alignment with their learning goals. Care must be taken to align the activi-
ties in the performance task with the learning goals.
THE VALUE OF STUDENT-GENERATED ITEMS AND CRITICAL THINKING
Performance assessments allow you to gauge student understanding of complex con-
cepts and skills. The meaningfulness of the task to students is a central quality of this
form of assessment. To establish meaningfulness, a teacher specifies an audience for
the task and provides a context. Establishing an audience and context, however,
increases the cognitive load of the task. Students have to read directions and then
determine the demands of the task. To keep the task directions and response require-
ments from becoming too overwhelming, we recommend you draft the task and then
review it with students to put it into their language. This process supports you in mak-
ing valid decisions about student learning because your students are more likely to
clearly understand the task. Students benefit from being engaged in a process that
allows them to see the meaning of the directions and the form their response should
take. If students are similarly engaged in developing the scoring guide, they are more
likely to see the connections between the task expectations and the way their response
will be scored.
Developing or reviewing the task and scoring guide with students helps them to
internalize the criteria and develop a metacognitive awareness of the critical elements
they should attend to in a task (Shepard, 2006). In developing these skills, students
can more easily take responsibility for their own learning and develop the self-governing
skills needed for participation in civic culture.
Also consider involving your students in brainstorming the form (i.e., oral, dem-
onstration, or product) of the performance assessment. Use Table 9.2 to show students
the possible forms an assessment might take. Then discuss potential assessment activi-
ties and the reasons some would be appropriate and others not. For example, if you are
studying visual artists’ use of the elements of art to create a response in the viewer, a
timeline about an artist’s life would be inappropriate. Instead, you might discuss the
appropriateness of an assessment in which your students write journal reflections on
artists and the responses the artists’ work provokes. Such participation in decision mak-
ing gives students a sense of control over their learning, which promotes mastery goals
and increases how much they value the tasks (Urdan & Schoenfelder, 2006).
ACCOMMODATIONS FOR DIVERSE LEARNERS: PERFORMANCE TASKS
When diverse learners use writing in their response to a performance task, the chal-
lenges they face will be the same as those discussed in Chapter 8 for short-answer and
essay items. Also relevant are the accommodations related to reading demands and the
testing environment discussed in Chapter 7 about tests with selected-response items.
So, you will want to revisit Table 7.6 and Table 8.9 to gain additional ideas for accom-
modations that diverse learners may benefit from when completing constructed-
response assessments.
Because not all forms of performance assessment require writing, we expand our
consideration of accommodations here. Products such as photographs, graphs, and
framing for a house do not require written responses. Neither do process-oriented
assessments, such as story retelling (an oral assessment) or application of first-aid skills
(a demonstration).
Before considering the needs of specific groups of diverse learners, we want to
discuss a few issues relevant to all learners. For example, all students grade 2 and above
are likely to benefit from having a set of written directions and a copy of the rubric.
Often our directions are communicated orally and students have to recall the various
tasks they must complete. Their memory, like ours, is faulty, so students will benefit
from having directions presented in both written and oral forms. Similarly, providing
a rubric with the task and discussing it will benefit all students.
As you write the performance assessment, use plain language (i.e., simple vocabu-
lary and sentence structure). We are not suggesting you “dummy down” your tasks. For
instance, it is appropriate to use the new vocabulary terms that are part of the learning
unit. Our point here is that as you write a performance task, keep simplicity and clarity
in mind. If you have the choice between writing two simple sentences or combining
them into one complex sentence, choose the simpler structure of two sentences.
As you plan an assessment, decide whether, for all students, the form of the
assessment (i.e., oral, demonstration, or product) is the only appropriate manner to
gauge student achievement of learning goals. For example, if you have a student with
TABLE 9.3
Accommodation Considerations for Performance Tasks

Issue: Difficulty with fine motor skills
• Use assistive devices to control tools (e.g., paint brushes).
• Use head/eye controlled input pointers for computer.

Issue: Difficulty focusing attention
• Provide structure within the task (e.g., list of steps to complete).
• Keep oral assessments and demonstrations to a short period of time.

Issue: Literacy skills below those of typical peers (e.g., learning disability)
• Keep oral assessments and demonstrations to a short time frame.
• Provide voice-recorded and written directions.
a speech impairment, is retelling a story the only method to assess her understanding?
If so, you may need to consider speech production devices for students with severe
impairments. Also ask yourself whether you might on some occasions allow the stu-
dent to write a brief paragraph about the story she read.
You should also consider whether the activity you developed is the only possible
mode for students to use in demonstrating their knowledge. A well-intentioned teacher
may decide that during the year all students will present a newscast on the school’s
closed-circuit broadcast. However, the stress of being broadcast on closed-circuit tele-
vision will have a negative effect on some students’ performance. Students who are
English language learners and students with speech impairments may perform poorly
in such a situation; however, they may be quite effective in a small-group setting. We
want to provide an opportunity for students to demonstrate their achievement of learn-
ing goals, and this may be achieved using different forms of performance assessment
or different types of activities for different students.
to complete a task. For example, some students may benefit from hand grips that can
be used with pencils or paint brushes. Some performance tasks may require students
to use a computer to work on a spreadsheet or use graphics to develop a presentation.
However, lack of fine motor control may make work at the computer difficult. An assis-
tive device referred to as a head pointer allows students to complete such activities.
Student Will Be Able To (SWBAT) compare and contrast characteristics of a utopia and
dystopia.
SWBAT use persuasive language in writing and/or speaking.
Assignment:
Based on the characteristics we have learned about utopias and dystopias, work to-
gether in groups (3–4 people) to apply your knowledge and create your own society.
An anonymous billionaire has donated a large, remote island and practically unlimited
funds to your group so that you may create a new society. Your new country exists in the
present time, on earth, and all natural laws apply—anything else is up for debate. Name
your country and give it a meaningful motto or creed. Then, create some form of govern-
ment and fully describe at least two other institutions or aspects of society like work, family,
religion, education, entertainment, or technology (including the government, your group
should present as many institutions as there are group members). These institutions should
represent your motto. Finally, your group must present your country to the class (each
group member should present an institution or aspect) and the class will vote to determine
which new society is “best.” An important part of this assignment is persuading others why
they might want to become a citizen of your new country.
Group presentations will take the form of a PowerPoint presentation. Presentations
should be 10–12 minutes long, and each member must present information. The group will
receive a grade and each individual will receive a grade. Keep in mind public speaking con-
cerns we have discussed like clarity, volume, and eye contact.
Component (see Figures 9.13–9.15). When Maria includes the rubric with the task
directions, it can provide students some structure for completing the task. Maria’s
rubric reminds us of the possibility of weighting the evaluative criteria. Notice for the
performance criterion of persuasive language, at the highest proficiency level a student
Presentation Content
Performance Criteria and Proficiency Levels

Country Name
0 = No
1 = Yes

Country Motto
0 = No motto, or motto does not describe any beliefs.
1 = Motto makes an attempt to describe society, but it is unclear.
2 = Motto describes the basic beliefs of the society.
FIGURE 9.13 Rubric Section for Scoring the Content in the Presentation
Reprinted with permission.
Presentation Quality
Performance Criteria and Proficiency Levels

Visuals
1–3 = Slides are very hard to read/see. Effects are distracting.
4–6 = Slides are not as easy to read/see as they could be. A few distractions.
7–9 = Slides are easy to see and read. Effects enhance rather than distract.
FIGURE 9.14 Rubric Section for Scoring the Quality of the Presentation
Reprinted with permission.
Individual Component
Performance Criteria and Proficiency Levels

Preparedness
1 = Speaker gets lost in the presentation.
2 = Speaker needs to practice a few more times.
3 = Speaker knows and is comfortable with the material.

Clarity
1 = Speaker mumbles, is unintelligible, or speaks too quickly.
2 = Speaker attempts to enunciate, but occasionally slips. Speed needs work.
3 = Speaker enunciates well. Speed is appropriate.

Volume
1 = Only people right next to the speaker can hear.
2 = People in the middle can hear the speaker.
3 = People in the back can hear the speaker.

Body Language
1 = Speaker does not attempt to make eye contact, reads the slides, and fidgets.
2 = Speaker tries to make some eye contact, mostly uses the slide for reference, and tries to use appropriate movements.
3 = Speaker makes eye contact, does not read the slides, does not fidget, and uses appropriate movements like pointing.
FIGURE 9.15 Rubric Section for Scoring the Individual Contribution to a Presentation
Reprinted with permission.
might earn from 16 to 20 points and for the lowest level from 1 to 5 points. A student’s
score for country motto, in contrast, can be a maximum of 2 points and a minimum
of 0 points. The differences in possible points make sense because persuasive writing
is a key component of the learning unit, and the motto is less central.
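If it helps to see the arithmetic of a weighted rubric spelled out, the short Python sketch below is our own illustration rather than part of Maria's materials; the criteria and point values are hypothetical. It simply shows how criterion scores with different maximum points combine into a single task score.

    # Hypothetical weighted rubric: each criterion maps to (points earned, points possible).
    rubric = {
        "Persuasive language": (16, 20),  # heavily weighted criterion
        "Country motto": (2, 2),          # lightly weighted criterion
        "Visuals": (7, 9),
    }

    earned = sum(score for score, _ in rubric.values())
    possible = sum(total for _, total in rubric.values())
    print(f"Task score: {earned}/{possible} ({100 * earned / possible:.0f}%)")  # 25/31 (81%)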
Notice also that Maria includes a section in the rubric for an individual component
(see Figure 9.15). Group projects teach our students to work together to accomplish a
goal, a key skill in a democracy. However, if the students are to be graded, the teacher
must include some aspects of the task that students complete individually. This is impor-
tant because we want a student’s grade to reflect his or her learning. The grade should
not reflect which group the student was in. Members of a group bring different levels of
participation to a task: some students contribute a great deal while others do little, and as a
teacher you can monitor these participation levels. In addition, some groups will simply
function better than others. Focusing on whether each individual has met the learning goals
is the key to fair judgment. Therefore, incorporating tasks individuals can complete independently of
the group is important. As we saw in previous chapters, Maria also assesses individual
mastery through several other summative assessments targeting the learning goals.
Ethics Alert: When students are assessed for their work in a group, you must include an
individual component of the assessment that allows you to gauge each student’s achieve-
ment of the learning goals. A student’s grade should not be largely determined by the group
to which the student was assigned.
HELPFUL WEBSITES
http://www.intranet.cps.k12.il.us/assessments/Ideas_and_Rubrics/Assessment_Tasks/Ideas_Tasks/
ideas_tasks.html
An extensive list of activities to consider in designing your performance tasks. Lists tasks in the
visual and performing arts, health and physical development, language arts, mathematics,
biological and physical sciences, and social science. Published by Chicago Public Schools.
http://palm.sri.com/palm/
A website with mathematics performance assessments and scoring guides aligned with the
mathematics standards of the National Council of Teachers of Mathematics (NCTM).
http://pals.sri.com/
Website with science performance assessments for grades K–12. Each performance assessment
has student directions and response forms, administration procedures, scoring rubrics,
and examples of student work.
6. What criteria would you use in a rubric for the visual arts performance task in
Figure 9.3? Which criteria would you weight more than others?
7. Review the grading scale in Figure 9.10. What would be a reasonable grading
scale for the rubric for scoring sonnets (Figure 9.9)?
8. A teacher and students read limericks and then identify common characteristics
of the poems. Subsequently, the teacher assigns a performance task in which the
students must write an original limerick. What type of common error does this
exemplify?
9. After finding a copy of a performance assessment and accompanying rubric
from the Internet or from a teacher, analyze whether and how it avoided the
most common errors for designing performance assessments and rubrics
described in Chapters 8 and 9.
10. Design a performance assessment that addresses one or more key learning goals
in your content area. Make sure you carefully address each of the guidelines for
development of performance tasks.
11. Design an analytic rubric for the performance assessment you designed in item
10. Include the relevant number of performance criteria for the age group you
work with. Carefully label each proficiency level for each performance criterion
using guidelines in Chapters 8 and 9.
REFERENCES
Angle, M. 2007. An investigation of construct validity in teaching American history portfolios.
Unpublished doctoral dissertation, University of South Carolina, Columbia.
Arter, J., and J. Chappuis. 2007. Creating and recognizing quality rubrics. Upper Saddle River, NJ:
Pearson Education.
Arter, J., and J. McTighe. 2001. Scoring rubrics in the classroom: Using performance
criteria for assessing and improving student performance. Thousand Oaks, CA:
Corwin Press.
California State Board of Education. 1997. Mathematics content standards for California Public
Schools kindergarten through grade twelve. Retrieved April 18, 2008, from http://www.cde.
ca.gov/re/pn/fd/.
Gredler, M., and R. Johnson. 2004. Assessment in the literacy classroom. Needham Heights, MA:
Allyn & Bacon.
Gronlund, N. 2004. Writing instructional objectives for teaching and assessment. 7th ed. Upper
Saddle River, NJ: Pearson Education.
Hogan, T., and G. Murphy. 2007. Recommendations for preparing and scoring constructed-
response items: What the experts say. Applied Measurement in Education 20(4):
427–441.
Johnson, R., J. Penny, and B. Gordon. 2009. Assessing performance: Developing, scoring, and
validating performance tasks. New York: Guilford Publications.
Kuhs, T., R. Johnson, S. Agruso, and D. Monrad. 2001. Put to the test: Tools and techniques for
classroom assessment. Portsmouth, NH: Heinemann.
Ovando, C. 2005. Language diversity and education. In J. Banks and C. Banks (eds.), Multicultural
education: Issues and perspectives, 5th ed., pp. 289–313. New York: Wiley.
Shepard, L. 2006. Classroom assessment. In R. Brennan (ed.), Educational measurement,
4th ed., pp. 623–646. Westport, CT: American Council on Education and Praeger
Publishers.
South Carolina Department of Education. 2005. South Carolina science academic standards.
Columbia, SC: Author. Retrieved May 2, 2008, from http://ed.sc.gov/agency/offices/cso/
standards/science/.
Urdan, T., and E. Schoenfelder. 2006. Classroom effects on student motivation: Goal
structures, social relationships, and competence beliefs. Journal of School
Psychology 44:331–349.
Virginia Department of Education. 2003. English standards of learning curriculum framework.
Retrieved September 7, 2006, from http://www.pen.k12.va.us/VDOE/Instruction/
English/englishCF.html.
Wiggins, G. 1993. Assessing student performance. San Francisco: Jossey-Bass.
CHAPTER 10
INTRODUCTION
In Chapter 1 we touched on grading as one area where we have found teachers
often have difficulty agreeing about what practices are ethical. We recently asked
more than 100 teachers to write about a situation related to classroom assessment
where they found it difficult to know what was the right or wrong thing to do.
About 40 percent of them described a dilemma related to grading. This category
had the largest number of responses by far—issues related to large-scale standard-
ized testing were a distant second with only about 20 percent of the responses. You,
too, can no doubt draw on personal experiences where a grade caused some con-
flict. In this chapter, we address the thorny issue of grading and do our best to give
you practical guidelines to help you grade confidently and fairly. We define a grade
as a score, letter, or narrative that summarizes the teacher’s judgment of student
achievement of the relevant learning goals.
WHY DOES GRADING CAUSE SO MANY PROBLEMS?
First, we discuss several reasons why grading frequently seems to produce pre-
dicaments and quandaries for teachers. Table 10.1 presents some issues that we
will consider.
TA B L E 1 0 . 1
A Few Reasons Why Grading Can Cause Problems for Teachers
Grading brings into conflict teachers’ role as student advocate vs. their role as evaluator
and judge of student achievement.
Grades are a powerful symbol and have the capacity to impact students profoundly in
positive and negative ways.
TA B L E 1 0 . 2
Practices That Overstress the Advocate and Nurturer Role
1. Giving numerous extra credit opportunities.
TA B L E 1 0 . 3
Practices That Overstress the Judging and Sorting Role of Teachers
1. Test questions covering obscure points or skills/understanding not addressed in
learning goals.
2. Test questions intentionally designed to mislead the test taker into providing the wrong
answer (“trick” questions).
4. Comments suggesting that doing well is very rare. (“I hardly ever give an ‘A’. ”)
their duty to encourage every student to learn. Table 10.3 describes several such prac-
tices that often appear unnecessarily punitive to students. Such practices can under-
mine the integrity of the student-teacher relationship. Teachers instead must take seriously
their role as adult authority figures who use power fairly.
Finding the right balance between the advocate role and the judge role can be
difficult. If teachers accentuate advocacy too much when assigning grades, they may
overestimate students’ level of mastery by always giving them “the benefit of the doubt.” If they
emphasize judging too much, they may end up sorting students based on the under-
standing and skills they bring to class rather than focusing on teaching everyone the new
content to master. The middle ground requires development of assessment practices and
grading policies that explicitly take into account both of these important roles.
Ethics Alert: No teacher should ever assign artificially low grades to “motivate” students.
Instead, grades should communicate what students know.
Such stories, the numerous blog entries we have encountered about perceived
unfair grading, and perhaps some experiences in your own academic career illustrate
the intense impact grades can have on our psyches. As one student pointed out, “OK,
that grade, that’s me. I am the grade. I did that” (Thomas & Oldfather, 1997). Thus
another reason grading practices so often cause problems is the sway they hold over
us and our sense of self-worth. This relevance to our ego magnifies the effects of the
judgments that accompany the translation of assessments into grades. As teachers, we
must remember the power grades possess so we grade thoughtfully and judiciously.
what would be the most logical interpretation I could make about what the grade
represents about the student’s knowledge of that academic subject?” He believes this
question should focus us on the essential meaning each grade should communicate.
You can see that Allen’s question assumes the primary purpose of grading is to
communicate about levels of student achievement. Most members of the assessment com-
munity in education strongly agree (Brookhart, 2004; Guskey & Bailey, 2001; Marzano,
2000, 2006; O’Connor, 2002). Surveys of practicing teachers, however, suggest that many
nonetheless weight heavily some factors unrelated to level of achievement (e.g., Brookhart,
1994; Cross & Frary, 1999; McMillan et al., 2002). These findings illustrate some of the
dimensions of this ongoing struggle.
TA B L E 1 0 . 4
Guidelines for Fair and Accurate Grading Decisions
1. Learn and follow your school district’s grading policy.
2. Design a written policy in which you base grades on summative assessments that
address your learning goals derived from the standards.
5. Allow recent and consistent performance to carry more weight than strict averages.
6. Avoid unduly weighting factors unrelated to a student’s mastery of the learning goals,
such as the following:
• Effort
• Participation
• Attitude or behavior
• Aptitude or ability
• Improvement
• Extra credit
• Late work
• Zeroes
• Group grades
for a culminating project, and use of Venn diagrams to compare and contrast beliefs
and practices, material adaptations to the environment, and social organizations. These
could allow students to work on reasoning and research skills as well as to organize and
master the content of the unit.
Summative assessments for grading purposes could then include students inde-
pendently constructing a Venn diagram (addressing a different focus from any earlier
Venn diagrams), and projects or presentations on specific persons of the time, focus-
ing on their relationship to their culture and the impact of the war. Tests and quizzes
could also be administered at several points during the unit, and they might use
excerpts from original sources, asking students to explain their significance or compare
perspectives of people. The summative assessments would cover similar content to the
formative assessments, but they would also provide novel opportunities for students
to summarize their understanding of the period and its cultures and to apply their
knowledge to new examples to demonstrate mastery. Grades for the unit based on
these summative assessments would then reflect the degree to which each student had
met the learning goals. This is because the instruction, as well as both formative and
summative assessments, would have been carefully aligned with those goals.
TA B L E 1 0 . 5
Greenville County Schools Grade Weighting for English Language Arts at Elementary, Middle, and
High School Levels
Elementary Language Arts (Writing, Research, Communication and Language Skills)
• Minor Assessments (7): 60%. Response Journals, Learning Logs, Writer’s Craft, Writing Conventions, Writing Process, Writing Rubrics, Research Process, Reference Materials, Use of Technology, Presentation Rubrics, Writing Prompts, Constructed Responses, Anecdotal Records, Observation Checklists, etc.
• Major Assessments: 30%. (1) Writing Portfolio, (1) Major Test
• Other: Spelling (8–9), 10%
can save you grading time, and your summative assessments will allow students to
thoughtfully demonstrate what they have learned.
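To make the arithmetic behind a weighting scheme such as the one in Table 10.5 concrete, here is a brief Python sketch of our own; the category averages are invented for illustration, and the 60/30/10 split follows the elementary language arts row.

    # Hypothetical category averages combined with a 60/30/10 weighting.
    weights = {"Minor assessments": 0.60, "Major assessments": 0.30, "Spelling": 0.10}
    averages = {"Minor assessments": 85, "Major assessments": 78, "Spelling": 92}

    grade = sum(weights[category] * averages[category] for category in weights)
    print(f"Marking-period grade: {grade:.1f}")  # 51.0 + 23.4 + 9.2 = 83.6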
Involve Students
As we have discussed previously—in relation to formative assessment, designing rubrics,
and progress monitoring—involving students in the assessment process fosters their
development of mastery goals and a sense of ownership of their learning. It also helps
to ensure that your grading is perceived as fair. At a minimum, before any summative
assignment or project is due, you must inform students of the specific characteristics
you are looking for in it. We will never forget our frustration as students when we
received a paper back with 10 points marked off for not including a diagram we had
not known was expected and had therefore not included. Those of us in the class never
saw the rubric until we found it stapled to our paper with our (unsatisfactory) grade.
We also recommend you share with your students a clear picture of the types of
questions you will ask on exams as well as a sense of the areas of content on which
the exam will focus and the relative weight of the different areas. We have found that
providing students with a list of topics or learning goals and the number of questions
or percentage of points on each one helps reduce anxiety and gives students an under-
standing of the relative importance of different topics.
Remember also the benefits of not just informing students about how assess-
ments will be graded, but also allowing students to help discuss and contribute to some
aspects of assessments. Discussing and defining criteria on a rubric using good and
poor models can help students clarify and internalize the standards of quality. Design-
ing sample questions can help students take a teacher’s perspective on what’s important
and how to think critically.
TA B L E 1 0 . 6
Calculating Grades Based on Median Scores Versus Mean Scores
Consistent Student Inconsistent Student A Inconsistent Student B
Test Grade 82 82 82
Presentation 85 85 100
Quiz Grades 83 83 83
Venn Diagrams 80 40 80
capture more accurately the level of student achievement (Marzano, 2006; O’Connor,
2002). However, you must not build disincentives into your grading policy. Students
may take advantage of a median-based grading policy and “take a break” on the last
assessment or two without affecting their grade if they have done well on a majority
of the earlier ones (see Inconsistent Student A).
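A quick way to see the difference the two summary statistics make is to compute both for the same set of scores. The Python sketch below uses hypothetical scores patterned on an inconsistent student, not the exact figures in Table 10.6.

    from statistics import mean, median

    # Hypothetical scores for a student with one uncharacteristically low result.
    scores = [82, 85, 83, 40]

    print(f"Mean:   {mean(scores):.1f}")    # 72.5: pulled down by the one low score
    print(f"Median: {median(scores):.1f}")  # 82.5: closer to the student's typical work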
Effort
The most common nonachievement factor teachers like to include in grades is effort
(Marzano, 2000). We have heard teachers say that students who work hard should
receive a higher grade than students at the same level of mastery who do not try hard.
Similarly, many teachers also believe that they should lower the grade for “lack of
effort” if a student turns in no homework but gets “A”s on summative assessments.
Teachers like to take effort into account because they feel they can then use
grades as a motivational tool to increase effort (Gronlund, 2006). However, measuring
effort is not easy. Some teachers try to look at specific behaviors, such as turning in
homework. But a student may not turn in homework because he is the sole caretaker
of his siblings every night after school, or because she has already mastered the mate-
rial and doesn’t need additional practice. Besides, if homework is considered formative
assessment, scores on it should make a minimal contribution to the total grade.
Others look at demeanor of students during class. But most students at one time
or another have pretended to be diligent when their minds were a million miles away.
Having a systematic means of choosing behaviors that represent effort and assigning
points for their presence or absence would be time consuming, not to mention contro-
versial. But simply adding a few points to one student’s grade and subtracting a few from
another’s depending on your momentary perception of effort is arbitrary and unfair.
We believe other, more effective methods for increasing effort exist than assign-
ing a bad grade. For example, as we have discussed throughout this text, use of forma-
tive assessment, student involvement in decision making, and challenging, relevant
assessment tasks have been shown to increase mastery goals on the part of students
and enhance motivation and persistence (Ames, 1992; Schunk et al., 2008). Ultimately,
effort will be reflected directly in the level of mastery attained on the learning goals.
Students who invest effort to take advantage of feedback on formative assessments and
revise their work or sharpen their thinking will perform better on the summative
assessments you provide.
Participation
Some teachers like to allocate some percentage of the grade to participation because
they believe students learn more when they actively take part in classroom activities,
and participation enlivens instruction. However, defining participation (like defining
effort) can be a difficult task. If you simply count the number of times students speak
up in class without regard to the quality of their comments, you could be encouraging
a lot of talk without a lot of thought. Also, introverts and students from cultures that
discourage assertive speaking from children are penalized unfairly if they participate
less than other students.
Jeffrey Smith and his colleagues (2001) suggest instead that you consider “con-
tribution” as a small percentage of the grade rather than participation. You can clearly
specify to your students what counts as a contribution, including a range of possibili-
ties, such as bringing in articles or artifacts that offer new information on what the
class is studying, asking an important question, and making comments in class or in
written work that relate the subject matter to previous topics of study. These authors
suggest you note contributions weekly in your grade book, but we know of at least one
teacher who passes out “Constructive Contribution” slips at appropriate moments and
has students keep track of them until the end of each marking period. If you use this
type of approach, it should count for 10 percent or less of the grade.
Attitude or Behavior
Teachers sometimes include attitude or behavior as a factor in grading, sometimes
weighted quite heavily (Marzano, 2000). They naturally prefer students who conform
to their rules and have a positive attitude and may tend to favor them when it’s time
to calculate grades. Such considerations dilute the key role of grades as an indicator
of achievement. Similar to the problems with considering effort, figuring out systematic
and fair ways to quantify teachers’ range of impressions of “appropriate” attitudes and
behavior is daunting. Usually, attitudes and behaviors are more effectively addressed
in the teacher’s day-to-day classroom management plans.
In addition, researchers have found some systematic patterns suggesting that
behavior is taken into account in grading more often in “troubled” schools, where
discipline is a problem and teachers may be concerned more with compliance than
with fostering achievement (Howley et al., 2001). Because these schools tend to serve
students from poorer and minority backgrounds, the emphasis on behavior issues
may overshadow and interfere with encouraging the advances in learning that must
be gained to close achievement gaps. In other words, when the focus on discipline
and control is manifested in grades, high expectations for achievement may drift to
lesser status.
Cheating is one of the most difficult issues related to behavior and grading
because it concerns both behavior and achievement. You should have a clear policy on
cheating (usually determined at the school or district level) that addresses plagiarism,
cheating on tests, and other academic dishonesty issues, as well as their consequences.
Carefully explaining this policy to students with explicit examples educates them about
appropriate academic behavior and demonstrates that you take cheating seriously.
Because cheating involves achievement concerns as well as behavioral ones, academic
consequences should be applied (O’Connor, 2002). Such consequences would probably
include, for example, a reduction in grade on the assignment. Notice we stated a reduc-
tion in grade, not assigning a zero. Again, the issue is moderation. Just as effort or
participation should have a minor effect on grades, so should behavior have a minor
effect. Lowering the student’s grade, and/or having the student complete another task
to show actual achievement, communicates that cheating doesn’t have a role in educa-
tion and maintains the overall integrity of the student’s grade.
Aptitude or Ability
Some teachers, in their role as advocate, believe they must give a student of “low abil-
ity” high grades even if the student doesn’t meet the learning goals as long as the
student is “working to potential.” On the other hand, teachers may believe that a stu-
dent with high ability should not receive an “A,” even if the student masters the learn-
ing goals, if the student is not “working to potential.” If you had this perspective and
these two students were in your classroom, you might give the first student an “A” and
the second a “B” for similar performances on summative assessments. We exaggerate
a bit here to show how this would distort the usefulness of the grades in communicat-
ing levels of mastery.
In addition, experts in cognition over a number of years have found aptitude or
ability very elusive and difficult to measure apart from achievement (see Chapter 11).
We as teachers, then, should hesitate to jump to conclusions about the learning poten-
tial of students in our classrooms. As we noted in Chapter 3, conveying high expecta-
tions for all students is important. In addition, if we are modeling mastery goals for
our students, we must help them internalize the conviction that they can learn new
things and master new challenges.
Improvement
Sometimes teachers see a student improve across a marking period and want to give
that student’s grade a boost because an average grade doesn’t reflect the gains. If you
follow our suggestion to weight recent and consistent performance more heavily than
strict averages, this problem is considerably diminished. In addition, if you rely on
formative assessments to help students move toward mastery of the learning goals,
improvement across time should be expected from everyone, and scores on the sum-
mative assessments should accurately reflect gains the students have made.
Extra Credit
Sometimes teachers offer extra credit assignments. We (and probably most other
teachers) also find a few students pleading for extra credit opportunities if they are
worried about a grade, especially as the end of the term approaches. If you do offer
extra credit, it must be offered to all students. It should also be crafted as an oppor-
tunity for showing mastery of some aspect of the learning goals rather than busywork
(e.g., bringing in photos from magazines). Finally, it should contribute a very small
amount to a final grade. If students do not do the extra credit assignment, they should
not be penalized.
Late Work
Most teachers have policies for penalizing students who turn in work after the due date.
In addition to encouraging responsibility among students, specific due dates and late pen-
alties allow us to sequence instruction based on knowing exactly which content students
have mastered. They also ensure our grading workload is manageable. Often, however,
penalties are so severe (e.g., 10% for each day late) that they reduce student motivation—
why turn in something four or five days late if you will get an “F” on it anyway? We
agree with Marzano (2000), who suggests that teachers deduct no more than 1 percent or
2 percent per day with a maximum of 10 percent. We also recommend that permission
for extensions be secured before the due date. Using this approach ensures that a grade
still primarily represents the quality of the work. Chronically late work is, however, important
to address as part of developing a sense of responsibility and the other personal
qualities that contribute to academic success. But punitive grading has not been shown
to be effective for accomplishing this (Guskey & Bailey, 2001; O’Connor, 2002). Instead,
supporting students by learning why their work is late and establishing assistance or
supervision of make-up work can be more helpful in addressing the problem.
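As a simple illustration of the moderate penalty Marzano recommends, the sketch below (our own example in Python, not a prescribed formula) deducts 2 percent per day late and never more than 10 percent in total.

    def late_penalty_score(score, days_late, percent_per_day=2, cap_percent=10):
        """Deduct a small percentage per day late, capped so the work still counts."""
        penalty = min(percent_per_day * days_late, cap_percent)
        return score * (1 - penalty / 100)

    print(f"{late_penalty_score(90, days_late=3):.1f}")  # 84.6 (a 6% deduction)
    print(f"{late_penalty_score(90, days_late=8):.1f}")  # 81.0 (deduction capped at 10%)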
Zeroes
Teachers sometimes assign zeroes for missing assignments or for misbehavior. But if
grades are to communicate the level of achievement, zeroes may significantly distort
them. Such distortion is particularly significant if grades are averaged. A zero is an
outlier that noticeably lowers the mean. In Table 10.7 we changed one of Inconsistent
Student A’s grades (from Table 10.6) to zero. Note the mean fell from 75 to 62.5, while
the median remained the same. Zeroes have an enormous impact because there is a
much bigger numerical distance between them and the next higher grade—usually at
least 50 points—than there is between any other two grades. For example, the distance
between any “B” and an “A” on a grading scale (where 90–100 = A and 80–89 = B)
is only 10 points. For this reason, some schools have grading policies that require
teachers to give no score lower than one point less than the lowest “D.”
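The sketch below, with hypothetical scores, shows how a single zero drags a mean down far more than a median, and how a minimum-score floor such as the “one point below the lowest D” policy softens that distortion. The floor of 59 is our own assumption for illustration.

    from statistics import mean, median

    scores = [82, 85, 83, 0]  # hypothetical scores with one zero for missing work
    print(f"Mean: {mean(scores):.1f}   Median: {median(scores):.1f}")    # Mean: 62.5   Median: 82.5

    # With a hypothetical floor one point below the lowest "D" (here, 59):
    floored = [max(score, 59) for score in scores]
    print(f"Mean: {mean(floored):.2f}   Median: {median(floored):.1f}")  # Mean: 77.25  Median: 82.5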
Group Grades
If a grade is meant to represent the individual’s level of mastery of the learning goals,
you can already see the argument against all students in a group receiving exactly the
same score on a group project intended as a summative assessment. Chances are the
TA B L E 1 0 . 7
Showing the Impact of Zeroes on Mean Versus Median Scores
Consistent Student Inconsistent Student A
Test Grade 82 82
Presentation 85 85
Quiz Grades 83 83
Venn Diagrams 80 0
score will represent the performance level of some, but probably not all, students in
the group. If you want to use a team project in your classes, it should be designed to
have a large individual accountability component and a smaller group component. For
example, in a group presentation, individuals can be scored on their delivery and
content contribution, and then a smaller group score related to some aspects of the
overall presentation might be given. The key in designing group projects is keeping in
mind the understanding and skills you want individuals to demonstrate.
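If you want to see how such a split might be combined into one project grade, the sketch below is a hypothetical illustration; the 80/20 weighting is our own example, not a recommended ratio.

    # Hypothetical: a large individual component combined with a smaller group component.
    individual_score = 88   # delivery and content contribution, scored individually
    group_score = 75        # score for aspects of the overall group presentation

    project_grade = 0.80 * individual_score + 0.20 * group_score
    print(f"Project grade: {project_grade:.1f}")  # 70.4 + 15.0 = 85.4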
TA B L E 1 0 . 8
Elements of an Effective Grading Policy to Communicate to Students
Describes formative assessments and their purpose and distinguishes them from
summative assessments
Addresses cheating
Addresses consequences for late work
Ethics Alert: Parents must participate in this decision-making process about grades. If
parents are not made aware of modifications to grading, they may be surprised if their child
does not do well on a state test. Remember, the role of the grade is to communicate to others
(e.g., parents) about a student’s achievements.
Grading as a Skill
Learning to grade consistently and fairly is a complex set of skills that requires careful
thought and practice (Brookhart, 2004). This issue doesn’t come into play when you
are grading multiple-choice tests with only one correct answer. But when you are scor-
ing writing assignments or performance assessments, your first step in fair grading is
to be sure to formulate a clear scoring guide to delineate key elements of quality for
you as well as your students, as we discussed in Chapters 8 and 9. You must then take
special care to apply the criteria from the scoring guide consistently. The first few times
you grade a set of papers or projects, we recommend reviewing the suggestions for
avoiding the pitfalls listed in Table 8.8 in Chapter 8.
Norm-Referenced Grading
Norm-referenced grading, sometimes referred to as “grading on the curve,” requires you
to arbitrarily decide how many students can achieve each letter grade. For example, you
may decide the grades in your classroom should approximate a normal distribution:
10% of the students will get an “A” and 10% will receive an “F.”
20% will get a “B” and 20% will earn a “D.”
40% will receive a “C” (Frisbie & Waltman, 1992).
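To show how mechanical this approach is, the short Python sketch below (hypothetical scores, our own illustration) assigns letter grades purely by rank using the 10/20/40/20/10 split above.

    # Hypothetical class of ten scores graded strictly on the curve above.
    scores = [95, 93, 91, 90, 88, 87, 85, 83, 80, 78]
    cutoffs = [("A", 0.10), ("B", 0.30), ("C", 0.70), ("D", 0.90), ("F", 1.00)]

    for rank, score in enumerate(sorted(scores, reverse=True), start=1):
        fraction_at_or_above = rank / len(scores)
        letter = next(grade for grade, cut in cutoffs if fraction_at_or_above <= cut)
        print(score, letter)
    # The student scoring 78 receives an "F" purely because of rank,
    # even though 78 might represent a reasonable level of mastery.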
Unfortunately, if 15 percent of your students have high levels of mastery, 5 percent of the
class must still receive a “B” because only 10 percent can earn an “A” under this cutoff. This approach is clearly arbi-
trary, and the meaning of an “A” will vary with every new group of students. A student’s
grade depends on how many people do better. You can see why norm-referenced grading
often weakens the morale and effort of poorer performing students. It also functions
as a method of “sorting” students from best to worst and thus fosters competition among
students. We recently heard from a student who was taking a class where the teacher
used norm-referenced grading. He told us that as students finished each exam, they
would make as much noise as possible when leaving the classroom—dropping books,
scraping their chairs, slamming the door—to distract the remaining students and interfere
with their concentration in hopes of lowering their grades.
In our own classes, we often ask students which form of grading they prefer.
Surprisingly, we always have a few who favor norm-referenced grading. They inevita-
bly reveal they have had several classes where everyone would have failed tests if
criterion-referenced grading were used, so norm-referenced grading was their only
shot at doing well. But if you align your assessment with your instruction and use
formative assessment with feedback to help students improve before the summative
assessments, students should not be failing your tests in droves.
Letter Grades
Letter grades are the most familiar method of summarizing student achievement. For
many years, teachers have used letter grades, from “A” to “F,” for individual assignments
and then have combined them for final grades, usually weighting more important
assignments more heavily than less important ones.
Even though letter grades are so common, their actual meaning can vary and even cause
confusion. Sometimes the label for each of the five grade categories
can suggest a norm-referenced interpretation (e.g., “C” = Average), whereas at other
times it can suggest a criterion-referenced interpretation (e.g., “C” = Satisfactory)
(Guskey & Bailey, 2001). If you grade using a criterion-referenced approach, the labels
conveying the meaning of your grades should be consistent with this approach.
Standards-Based
To tie grading categories more closely to standards on which learning goals are
based, some school districts have abandoned letter-grade categories for categories
that indicate degree of mastery on the standards. They explicitly design assessments
termed benchmarks that indicate levels of proficiency in meeting the standards, and grades are based on these benchmarks. Grading status is then assigned based on categories such as below basic, basic, proficient, and advanced, or beginning, progressing, proficient, and exceptional. Some educators find this approach particularly useful for connecting student performance to the standards and view it as the best approach for capturing student performance (Marzano, 2006; O’Connor, 2002). For example, when you design scoring guides, the categories can be tied explicitly to the proficiency levels of the standards (see Table 5.8). For this approach to be effective, the standards designated must be broad enough to communicate an overall sense of student achievement, but narrow enough to provide useful information for future learning (Guskey & Bailey, 2001).

Benchmark Tests: Assessments tied explicitly to academic standards and administered at regular intervals to determine the extent of student mastery.
Percentage
A common method for summarizing grades is to calculate the percentage of total
points earned on each assessment. After weighting assignments proportionally, letter
grades can then be assigned based on specified ranges, or in some schools, the percent-
ages themselves are reported. Many teachers prefer using percentages because they
believe they can make finer discriminations between students with a 100-point range
than with the 5-point letter grade range (i.e., A–F). Because of the error involved in
all measurement, however, the difference between a 72 and a 75 may not actually be
meaningful. In addition, a score of 100 percent, especially if it is on an easy test, may
not necessarily equate to complete mastery of the material. You can’t necessarily assume
that the percentage correct increases directly with the degree of mastery of the learn-
ing goals (McMillan, 2007). In addition, students may perform quite differently on the
different learning goals addressed during a marking period, and a single total doesn’t
allow a picture of strengths and needs in these different areas.
Total Points
A point system is another method teachers often use in calculating grades. The
outcome is similar to the percentage method. Instead of weighting assignments after
the percentage correct is calculated to get a total, you assign a weighted proportion
of the total points to each assignment at the beginning of the term. For example,
instead of weighting the percentage correct on a test twice as much as the percentage
correct on a quiz, you simply have a quiz worth 20 points and a test worth 40 points.
You add up all the points, calculate the percentage of total points earned, and assign
grades on this basis. The drawbacks of this method are similar to those for the
percentage system.
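The sketch below, with invented scores, shows why the two methods usually land in the same place: a 40-point test already counts twice as much as a 20-point quiz under the total points method, which matches weighting the test’s percentage twice as heavily.

    # Hypothetical scores: a 20-point quiz and a 40-point test.
    quiz_earned, quiz_possible = 17, 20
    test_earned, test_possible = 33, 40

    # Total points method: add up points, then take the overall percentage.
    total_points_pct = 100 * (quiz_earned + test_earned) / (quiz_possible + test_possible)

    # Percentage method: average the two percentages, weighting the test twice as much.
    weighted_pct = (100 * quiz_earned / quiz_possible
                    + 2 * (100 * test_earned / test_possible)) / 3

    print(f"Total points: {total_points_pct:.1f}%   Weighted percentages: {weighted_pct:.1f}%")
    # Both come out to 83.3% for these scores.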
“Not Yet” Grading

“Not Yet” Grading: A grading system in which students receive either an “A,” a “B,” or “NY” (not yet) for each learning goal.

“Not yet” grading assumes not all students learn at the same pace, but all students will learn. On benchmark assessments tied to grade-level standards, students receive an “A,” a “B,” or “NY” for each learning goal. “Not Yet” implies
all students will reach the goal when they put in more work, conveying high expecta-
tions and promoting the connection between student effort and achievement. These
systems also offer students systematic opportunities to improve their “NY” grades, so
this approach is one way to capture the positive results of differentiated instruction
and formative assessment. Students who did not do well on an assessment the first
time may need different instructional strategies or more time. Instead of assuming the
fault is in the student, this approach provides more opportunities to master the content,
thus encouraging rather than demoralizing anyone who tends to lag behind peers. You
may protest that this method is not “fair” to those who do well on the first try. But we
must remember that in providing equal access to educational opportunity, we have an
obligation to help all students learn rather than merely sorting them into “strong” or
“slow” learners.
Narrative Reports

Narrative Reports: Written descriptions of student achievement and progress.

Rather than using report cards with a single letter grade for each subject, some elementary schools have teachers write narrative reports describing student achievement and progress each marking period. The difficulty of combining a range of disparate factors into one letter or number is thus eliminated. Teachers can
describe student work habits, reasoning skills, communication skills, and other facets
of learning and performance that may not be captured by a single grade in a content
area. This type of summary of student achievement can work effectively when teachers
employ a structured approach that addresses specific learning goals in their reports
(Guskey & Bailey, 2001). Otherwise, narrative reports can be too vague in their descrip-
tions to pinpoint achievement strengths and needs. Narrative reports are rarely used
in middle and high schools because they are quite time consuming for teachers who
may have more than 100 students each term. In addition, earning credits toward grad-
uation requires more explicit tracking of accumulated understanding and skills.
We next turn to a discussion of portfolios. Portfolio assessment is a popular and
growing means of documenting and communicating about learning in a more com-
prehensive way than a single grade or score might offer.
Portfolio Purposes
Because portfolios require time and energy, they should be used only if they enhance
communication about learning. Therese Kuhs and her colleagues (2001) suggest that
teachers ask themselves, “What is it I want to know about student learning that I can-
not find out in another way?” and “What can a portfolio tell about student learning
that cannot be told by a single performance task?” We thus first turn to the purposes
portfolios can serve so you get a clear picture of the functions they can perform.
Growth Portfolios
Growth Portfolio: An organized collection of students’ work gathered to document development and growth over time.

If you intend to promote mastery goals among the students in your classroom, a growth portfolio could be a helpful strategy. The growth portfolio is designed to document student growth over time. It therefore contains items such as progress monitoring charts and graphs tracking progress toward specified goals; samples of similar assignments (e.g., lab reports, critiques, perspective drawings, descriptive paragraphs) across time; and performance assessments, plus other projects, including early drafts and the final product, with the feedback from earlier drafts incorporated.
In our own teaching, we have used portfolios to document student growth in
writing. At the beginning of each school year, our third-grade students wrote a fictional
story. These stories generally consisted of four or five sentences on 1-inch lined paper.
By the end of the year, stories had greatly improved. They often filled two or more
pages of regular notebook paper. Students now established a setting, developed a char-
acter, presented a problem, and brought the problem to a resolution. They also had
learned to follow the conventions for capitalization, punctuation, quotations for dia-
logue, and paragraphing. At the year-end conference, parents viewed their child’s learn-
ing progression through the stories, which dramatically illustrated their child’s growth.
Thus, in our experience, a portfolio of student writing can be a powerful method for
documenting and communicating about learning.
Another essential element of growth portfolios is focused student reflection. For
example, students may be asked to describe their strengths and weaknesses at the
beginning of the year and then note how they have changed, using evidence from the
items collected over several months in the portfolio. Students may choose a personal
goal and then later describe how portfolio entries illustrate their progress toward
reaching that goal. Sometimes students write a “biography” of a piece of work that
describes the development over time of the assignment. These types of assignments
help students focus on their own progress and improvement rather than comparing
themselves to others, a key element of promoting mastery goals. Stronger mastery goals
then tend to increase student motivation, enhance persistence, and increase student
willingness to attempt challenging tasks (see Chapter 1).
Reflective assignments also encourage students to take responsibility for their
own learning by providing them with opportunities to think about their work as mean-
ingful evidence of their learning rather than as arbitrary assignments to get out of the
way. Growth portfolios also allow students to personalize the learning goals and to
internalize the standards for evaluation, which help them develop the self-governing
skills needed for continued learning and, ultimately, for democratic participation.
Showcase Portfolios
Showcase Portfolio: An organized collection of students’ work designed to illustrate their finest accomplishments in meeting learning goals.

The showcase portfolio is designed to display a collection of work illustrating students’ finest accomplishments in meeting the learning goals. It therefore contains only the best examples of student work. For example, a math showcase portfolio might show a student’s most effective attempts related to specific problem-solving abilities and communication skills (Kuhs, 1997). A writing showcase portfolio might show a student’s
best writing for different purposes, such as one journal entry on current events, one
poem or piece of fiction, one persuasive essay, and one nonfiction report.
If you decide to use showcase portfolios, you must be sure students have more
than one opportunity to complete each type of assignment. One goal for the showcase
portfolio is to encourage students to learn to apply the evaluation standards to their
work, a strategy we have suggested can help students develop self-governing skills. For
this reason, students usually choose the portfolio entries from categories designated
by the teacher. They must, therefore, have several pieces to compare in order to choose
the one they believe represents their best work. A reflective piece explaining why each
item was chosen for the showcase portfolio can reinforce students’ use and internaliza-
tion of the standards.
Documentation Portfolios
Documentation Portfolio: An organized collection of students’ work that provides information for evaluating achievement status for accountability purposes.

Some states and school districts, because of dissatisfaction with large-scale standardized tests, have made an effort to use portfolios for large-scale assessment for reports to government agencies and the public. The documentation portfolio is intended to provide an evaluation of student achievement status for accountability purposes. Entries for these portfolios must be strictly standardized so they are comparable across students and classrooms. Entries may include performance assessments and written assignments, as well as teacher-completed checklists (especially in the early grades). Entries are determined at administrative levels rather than by teachers and/or students. In contrast to growth and showcase portfolios, student self-assessment is not usually an essential component. Documentation portfolios are also sometimes employed by classroom teachers for evaluating levels of student achievement. Table 10.9 presents an overview of the three types of portfolios.
TA B L E 1 0 . 9
Types of Portfolios
Growth portfolio
Purpose: Show progress over time
Types of entries:
• Progress monitoring charts and graphs tracking progress toward specified goals
• Samples of similar assignments (e.g., lab reports, critiques) across time
• Performance assessments or other projects, including early drafts and final product with feedback incorporated
• Student reflections on progress
Importance of standardized entries and procedures: Low
Importance of student self-assessment: High
Who chooses entries? Teacher and student

Showcase portfolio
Purpose: Demonstrate best accomplishments
Types of entries: A student’s best work, such as the following:
• Most effective attempts related to specific problem-solving abilities and communication skills in math
• Best writing for different purposes in language arts
• Best pinch pot, best hand-built pot, and best thrown pot in art class on pottery
• Student reflection on rationale for entries
Importance of standardized entries and procedures: Low
Importance of student self-assessment: High
Who chooses entries? Student

Documentation portfolio
Purpose: Provide a record of student achievement status
Types of entries: Comparable entries across students and classrooms, such as
• Performance assessments
• Written assignments
• Teacher-completed checklists (especially in the early grades)
Importance of standardized entries and procedures: High
Importance of student self-assessment: Low
Who chooses entries? Administrators
Implementation Issues
Several issues must be addressed if you decide to implement some variation of growth
or showcase portfolios in your classroom. These issues are not as salient for documen-
tation portfolios required by external authorities.
Student Involvement
Another issue in getting started is to foster meaningful student participation. Students,
too, need to understand what they will get out of developing portfolios. Helping students
feel responsible for their work is a key issue facing teachers today, and portfolios can
help. Preparing an opening statement for the portfolio at the beginning of the year, such
as describing themselves and what they hope to gain in school this year (“Who are you
and why are you here?”), focuses students on ownership of their work and provides a
benchmark for examining progress later (Benson & Barnett, 2005). Some of the sugges-
tions we have made for promoting mastery goals are also relevant here. Allowing students
choice and control in the types of items to be included is one step. Another factor we
have discussed in terms of fostering mastery goals is relevance. Students need to under-
stand why the work they are doing is important and has value for them and their future.
Involving them in analyzing their own work related to learning goals whose significance
they understand helps make their efforts meaningful and relevant.
Systematically requiring student reflection is probably the most important aspect
of generating student engagement in the portfolio process. Involving students in
explaining and reviewing the process of creating their entries as well as evaluating their
products is useful. These reflections should allow them to analyze their work in com-
parison to the learning goals, to see their strengths and weaknesses, and to develop
new strategies to improve future efforts.
One important job of the teacher is to help students see how such efforts foster
significant insights that expand their learning, and three guidelines are recommended
(Fernsten & Fernsten, 2005). First, you must create a supportive environment in which
you encourage candid reflection, even if student revelations don’t correspond to your
ideas of appropriate strategies (e.g., doing a large project at the last minute). Only hon-
est dialogue about work habits or study strategies can lead to new learning insights.
Second, you must take special care in designing effective prompts for student
reflection. These should be based on the learning goals for the portfolio entries. For
example, if students are to use imagery in their writing, you might ask them how and
why they chose the imagery they used rather than asking a vague question about the
writing process. Similarly, if an entry requires revision after formative feedback, you
might ask students to specify the changes made and how they helped the student move
toward mastery of the learning goals.
Third, for effective student reflection, you and your students must develop a
common language. Your modeling of the language to use when analyzing work is
crucial in the early stages (Gredler & Johnson, 2004). You must also designate periods
for specific instruction related to student learning of evaluation and reflection skills.
Such instruction could include class discussions involving careful analysis of work
samples based on scoring guides and comparison of one piece to another. Discussion
and examples of appropriate goals that can be monitored across time can also be
helpful. Opportunities for students to examine samples of portfolios and generate their
own questions about them may help make the process more concrete for them.
Logistics
After you have clearly established a purpose and have your students involved, you must
choose work samples on which to focus, incorporate the portfolio process into your
instructional activities, and decide how to organize and store the portfolios.
Choosing Work Samples Bailey and Guskey (2001) suggest that the most effective
approach in selecting pieces for inclusion involves a “combination of teacher direction
and student selection.” For example, for growth portfolios, teachers may mandate cer-
tain progress monitoring charts be included (e.g., documenting writing fluency growth),
but students may also choose an additional goal and a personal method to monitor
progress toward it. For showcase portfolios, a language arts teacher may specify that
three types of writing progress will be monitored, but students may be able to choose
which three out of the four types of writing studied they want to represent, and they
also decide which pieces constitute their best work. Sometimes teachers specify a few
entries that must be included (e.g., monthly performance of scales on audiotape for
instrumental music students, written reflection on progress in a specified area in art).
These are added to other entries students choose, including favorite pieces, work
related to their special interests, or items into which they put the most effort or learned
the most. In addition, items chosen collaboratively by students and teachers can be
included (Gredler & Johnson, 2004).
TA B L E 1 0 . 1 0
Growth Portfolio Reflection Form
Portfolio Reflection
Date: ______________
3. In what specific ways have I improved from the last assignment (or last draft of this
assignment)?
Evaluation
The evaluation approach to portfolios depends on the type of portfolio. For growth and
showcase portfolios, the emphasis is on student self-evaluation and reflection rather
than on using portfolios for traditional grades. The key to encouraging accurate student
self-evaluation is establishing clear criteria and providing practice with these skills. If
you include a learning goal related to self-assessment or reflection, the most recent
portfolio entries in which students demonstrate these skills may be used as a summative
assessment for grading purposes. You must make clear what you are looking for in these
reflections, preferably with discussion of examples using a well-designed scoring guide.
Then the early reflections for the portfolio should serve as formative assessments on
which the same rubric is used, and feedback and suggestions for improvement are
offered. Many other key items in growth and showcase portfolios (e.g., persuasive essays,
performance assessments) probably serve as summative assessments in their own right
and therefore will have been graded separately by the teacher.
If you do want to assign grades using portfolio assignments, developing a written
assessment plan such as the one depicted in Table 10.11 for a social studies unit can
be helpful (Rushton & Juola-Rushton, 2007). Such a plan provides an opportunity for
TA B L E 1 0 . 1 1
Portfolio Assessment Plan
Columns: Learning Goal | Student Assignment (Choice, Score) | Teacher Assignment (Choice, Score)
both teachers and students to choose entries that exemplify the student’s level of per-
formance for each learning goal.
Documentation portfolios, in contrast to growth and showcase portfolios, are
formally designed to demonstrate student level of mastery of the learning goals, and
entries are not usually a matter of student or teacher choice because of the need for
standardization across many students. Currently, judges in large-scale documentation
portfolio programs have difficulty scoring the portfolios consistently, but efforts are
continuing to improve reliability through better training and scoring guides (Azzam,
2008). The variations in conditions under which the entries in the portfolios are devel-
oped may also cloud the meaning of scores. Circumstances often vary between stu-
dents and between classrooms. For example, the amount of support students have
received from others may fluctuate from classroom to classroom, and revision oppor-
tunities may also differ (Gredler & Johnson, 2004). Such discrepancies may detract
from the validity of the judgments made about students’ level of mastery.
FIGURE 10.1 Student-Led Conference Plan for Grade Three Language Arts
TA B L E 1 0 . 1 2
Advantages and Limitations of Portfolios
Advantages: Provide opportunities for in-depth communication with parents and others about student learning and progress.

Limitations: Management tasks may require time that detracts from attention to mastering learning goals.
HELPFUL WEBSITES
http://www.guilderlandschools.org/district/General_Info/Grading.htm
The Guilderland, New York Central School District grading policies provide explicit
guidelines for elementary, middle, and high school teachers, including purposes of
grades, type of criteria, borderline cases, weighting of grades, and number of grades
required per marking period.
http://essdack.org/?q=digitalportfolios
This site for the Educational Services and Staff Development Association of Central
Kansas presents information about designing digital portfolios and provides a
related blog.
REFERENCES
Allen, J. G. 2009. Grades as valid measures of academic achievement of classroom learning.
In K. Cauley and G. Pannozzo (eds.), Annual Editions Educational Psychology, 23rd ed.,
pp. 203–208.
Ames, C. 1992. Classrooms: Goals, structures, and student motivation. Journal of Educational
Psychology 84(3): 261–271.
Azzam, A. 2008. Left behind—By design. Educational Leadership 65(4): 91–92.
Bailey, J., and T. Guskey. 2001. Implementing student-led conferences. Thousand Oaks, CA:
Corwin Press.
Belgrad, S., K. Burke, and R. Fogarty. 2008. The portfolio connection: Student work linked to
standards. 3rd ed. Thousand Oaks, CA: Corwin Press.
Benson, B. P., and S. P. Barnett. 2005. Student-led conferencing using showcase portfolios. Thousand
Oaks, CA: Corwin Press.
Brookhart, S. 1994. Teachers’ grading: Practice and theory. Applied Measurement in Education 7:
279–301.
Brookhart, S. 2004. Grading. Upper Saddle River, NJ: Pearson Merrill Prentice Hall.
Callison, D. 2007. Portfolio revisited with digital considerations. School Library Media Activities
Monthly 23(6): 43–46.
Cross, C. 1997. Hard questions, “standard answers.” Basic Education 42(3): 1–3.
Cross, L. H., and R. B. Frary. 1999. Hodgepodge grading: Endorsed by students and teachers
alike. Applied Measurement in Education 12(1): 53–72.
Fernsten, L., and J. Fernsten. 2005. Portfolio assessment and reflection: Enhancing learning
through effective practice. Reflective Practice 6: 303–309.
Frisbie, D., and K. Waltman. 1992. Developing a personal grading plan. Educational
Measurement: Issues and Practices 11(3): 35–42.
Gredler, M., and R. Johnson. 2004. Assessment in the literacy classroom. Boston: Pearson.
Gronlund, N. 2006. Assessment of student achievement. 8th ed. Boston: Pearson.
Guskey, T., and J. Bailey. 2001. Developing grading and reporting systems for student
learning. Thousand Oaks, CA: Corwin Press.
Howley, A., P. Kusimo, and L. Parrott. 2001. Grading and the ethos of effort. Learning
Environments Research 3: 229–246.
Klein-Ezell, C., and D. Ezell. 2004. Use of portfolio assessment with students with cognitive
disabilities. Assessment for Effective Intervention 30(4): 15–24.
Kuhs, T. 1997. Measure for measure: Using portfolios in K–8 mathematics. Portsmouth, NH:
Heinemann.
Kuhs, T., R. Johnson, S. Agruso, and D. Monrad. 2001. Put to the test. Portsmouth, NH:
Heinemann.
Marzano, R. 2000. Transforming classroom grading. Alexandria, VA: Association for Supervision
and Curriculum Development.
Marzano, R. 2006. Classroom assessments that work. Alexandria, VA: Association for Supervision
and Curriculum Development.
McMillan, J. 2007. Classroom assessment. Boston: Pearson Allyn & Bacon.
McMillan, J., S. Myran, and D. Workman. 2002. Elementary teachers’ classroom assessment and
grading practices. Journal of Educational Research 95(4): 203–213.
Munk, D., and W. Bursuck. 2003. Grading students with disabilities. Educational Leadership
61(2): 38–43.
O’Connor, K. 2002. How to grade for learning: Linking grades to standards. Glenview, IL: Pearson
SkyLight.
Pilcher, J. 1994. The value-driven meaning of grades. Educational Assessment 2(1): 69–88.
Rushton, S., and A. M. Juola-Rushton. 2007. Performance assessment in the early grades. In
P. Jones, J. Carr, and R. Ataya, eds. A pig don’t get fatter the more you weigh it. New York:
Teacher’s College Press, pp. 29–38.
Schunk, D., P. Pintrich, and J. Meece. 2008. Motivation in education: Theory, research, and
applications. 3rd ed. Upper Saddle River, NJ: Pearson.
Shepard, L. A. 2006. Classroom assessment. In R. L. Brennan (ed.), Educational
measurement, 4th ed., pp. 623–646. Westport, CT: American Council on Education/
Praeger Publishers.
Smith, G., L. Smith, and R. DeLisi. 2001. Natural classroom assessment. Thousand Oaks, CA:
Corwin Press.
Stiggins, R. 2008. An introduction to student-involved assessment for learning. 5th ed. Boston:
Pearson.
Stiggins, R., and J. Chappuis. 2005. Using student-involved classroom assessment to close
achievement gaps. Theory into Practice 44(1): 11–18.
Thomas, S., and P. Oldfather. 1997. Intrinsic motivations, literacy, and assessment practices:
“That’s my grade. That’s me.” Educational Psychologist 32(2): 107–123.
Urdan, T., and E. Schoenfelder. 2006. Classroom effects on student motivation: Goal
structures, social relationships, and competence beliefs. Journal of School Psychology
44: 331–349.
Whittington, D. 1999. Making room for values and fairness: Teaching reliability and
validity in the classroom context. Educational Measurement: Issues and Practice 18(1):
14–27.
Winger, T. 2005. Grading to communicate. Educational Leadership 63(3): 61–65.
Zoeckler, L. G. 2007. Moral aspects of grading: A study of high school English teachers’
perceptions. American Secondary Education 35(2): 83–102.
CHAPTER 11
LARGE-SCALE STANDARDIZED
TESTS AND THE CLASSROOM
As long as learning is connected with earning, as long as certain jobs can only be
reached through exams, so long must we take this examination system seriously.
–E. M. Forster, British novelist and essayist
INTRODUCTION
This chapter addresses large-scale, standardized, summative tests. They are the focus
of possibly the most controversial discussions in education today, with passionate
defenders and detractors. One key reason large-scale tests loom so large in the
national discourse on education is that they are the primary means by which states
assess student performance to meet the requirements of the No Child Left Behind
Act (NCLB) of 2001. They are also used for high-stakes decisions related to college
and graduate school entry and for access to some special programs.
As you may recall from Chapter 5, the goal of NCLB is to increase academic
achievement of lower-performing groups and to have all students in the United
States achieve proficiency in math and reading by 2014. Each state accepting federal
assistance is required to assess students in math and reading in grades 3 through 8
FIGURE 11.1
© The New Yorker Collection 2002 Edward Koren from cartoonbank.com.
All rights reserved.
each year and once in high school. Science must also be assessed at least once in elemen-
tary, middle, and high school. The results must be disaggregated by major ethnicity
groups, disability status, English proficiency, and status as economically disadvantaged.
These results are reported publicly. If any of the groups in a school fail to make adequate
yearly progress (AYP) two years in a row, the school is designated as needing improve-
ment, and students may transfer to another public school in the district. If the school
continues to fail to make adequate yearly progress, other corrective actions must be taken,
eventually leading to “restructuring” the school with curriculum and staff changes.
Because such important consequences result from these tests, they have become
high-stakes assessments. With so much riding on the results, these tests definitely get
“inspected.” If “respect” equates to crowding other issues out of the spotlight, these
tests do that, too. The cartoon in Figure 11.1 underlines the burgeoning prominence
of such test scores.
To begin navigating this high-profile controversy, you must become familiar with
large-scale summative assessments. Thus, we devote this chapter to the issues sur-
rounding them. We first describe some basic definitions and misconceptions related
to large-scale standardized tests, then we appraise their benefits and pitfalls. Next, we
explain preparation for and administration of these tests, including accommodations
that can be made for diverse learners. We also discuss how to interpret these tests and
describe how classroom teachers can use the results.
Criterion-referenced scoring provides information about the degree to which test content is mastered (similar to
criterion-referenced grading in Chapter 10). For example, most states set a criterion level
of performance for their accountability tests at each grade that represents proficiency. If
students reach this predetermined score, they are categorized as “proficient” or “meeting
standards.” Criterion-referenced scoring is also termed standards-based scoring in educa-
tion because students’ outcomes determine whether they meet the standards the state has
chosen for levels of understanding and skills for that subject and grade.
When we address criterion-referenced scoring in our classes, we ask students for
examples from their own experiences. An example often given is the test to get a
driver’s license. You must obtain a score at or above a certain cutoff level to pass it.
Our students think of this example because that cutoff score still looms large in their
memories, especially if they had to take the test more than once. Most teachers grade
classroom summative tests using criterion-referenced scoring. If a teacher sets the cutoff
score at 94 percent correct for an A, everyone who scores at or above that cutoff receives
an A, even if that is half of the class.
Norm-referenced scoring, on the other hand, compares individual scores to the scores of a group of students who have taken the test. Instead of indicating the level of content mastered, scores indicate standing relative to that group. Your score reflects a comparison to other people taking the test rather than to the amount of content learned. The score you are assigned depends on where your performance (number of items correct) fits in the range of scores obtained by a group, called the norm group.

Norm-Referenced Scoring: Compares a student's score to a group of other students who have taken the test.

Norm Group: A carefully constructed sample of test takers used for evaluating the scores of others.

When we ask our students for examples from their experience that illustrate norm-referenced scoring, they mention their high school class rank or their band or football team's annual ranking. These examples also illustrate why careful attention must be directed to just who composes the norm group. If you had 10 National Merit Semi-Finalists in your graduating class, your class rank is likely to be higher than if you had 40.
Similarly, your scores may look very different if you are compared to a norm group of students from a nationwide sample rather than to local norms, which are based on school district or state performance. Figure 11.2 illustrates this point. It represents the distribution of test scores along a continuum of low scores (on the left), average scores (in the middle), and high scores (on the right). If your school district is above the national average, your scores will be lower when compared to local norms.

Local Norm: A norm group made up of test takers from a restricted area (e.g., a school district).
FIGURE 11.2 With Norm-Referenced Scoring, the Group You Are Compared to Determines Your Score (the same score falls at different points on the percentile-rank scales for national test takers and for local test takers)
For example, in Figure 11.2, your test score at the 70th percentile nationally (see the
lighter gray scale) will be at the 50th percentile locally (see the dark scale). If, instead,
your school district is below the national average, your scores will be higher when
compared to local norms than when compared to national norms.
A test using norm-referenced scoring is not useful unless the norm group is
composed of a representative sample of individuals from the entire population of the
type of people expected to take that test. If a test is used only in one school district,
local norms will suffice. If a test is used widely across the whole country, the sample
of students for the norm group should also represent the student population in the
United States in terms of demographics, geography, race, gender, age, and socioeco-
nomic and cultural factors.
If the norm group does not include a representative sample, using that sample’s
distribution of scores to interpret scores in your school will provide misleading infor-
mation and lead to decisions that lack validity. For example, one of our students found
a published norm-referenced test whose norm group consisted of students from only
two states, yet the test developer suggested the norms based on this group could be
applied to students across the United States. Similarly, questions have been raised about
the legitimacy of comparing English language learners to norm groups made up of
native English speakers (Solorzano, 2008).
Because norm- and criterion-referenced scoring require different processes for interpreting
scores, people often assume that you cannot use both methods on the same test. Many tests
do offer only one or the other kind of scoring option. How-
ever, any test can be scored both ways if (1) appropriate criteria for acceptable perfor-
mance have been specified (for criterion-referenced scoring), and (2) if representative
norms have been developed (for norm-referenced scoring). Sometimes people are
interested in indicators of content mastery and comparisons to a national or state norm
group for the same set of scores.
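To make the two interpretations concrete, here is a minimal sketch in Python (our own illustration, not a scoring program used by any test publisher); the cutoff score, the norm-group scores, and the student's raw score are all invented for the example.

def criterion_referenced(raw_score, cutoff=38):
    # Criterion-referenced: compare the score to a predetermined standard (cutoff is invented).
    return "proficient" if raw_score >= cutoff else "not yet proficient"

def percentile_rank(raw_score, norm_group):
    # Norm-referenced: percentage of the norm group scoring at or below this score.
    at_or_below = sum(1 for s in norm_group if s <= raw_score)
    return round(100 * at_or_below / len(norm_group))

norm_group = [22, 25, 28, 30, 31, 33, 35, 36, 38, 40, 41, 43, 45, 47, 50]  # hypothetical norm group
student_score = 40

print(criterion_referenced(student_score))         # proficient (meets the invented cutoff)
print(percentile_rank(student_score, norm_group))  # 67: standing relative to this norm group

The same raw score of 40 is reported as "proficient" under the criterion-referenced rule and as roughly the 67th percentile for this invented norm group under the norm-referenced rule, which is the kind of dual reporting described above.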
TABLE 11.1
Questions Similar to Those on Aptitude Versus Achievement Tests
Aptitude: What's the right thing to do if you see a car accident?
Achievement: Explain the difference between the rotation and the revolution of the earth.
Aptitude: What is the next number in the following series? 5 10 20 40
Achievement: If there are five ducks in the water and three fly away, how many ducks are left?
MISCONCEPTIONS RELATED TO
LARGE-SCALE TESTING
In our years of teaching, we have found that teacher candidates have certain miscon-
ceptions about large-scale standardized tests, especially published norm-referenced
tests. Table 11.2 lists the five most common misconceptions.
TABLE 11.2
Misconceptions About Standardized Tests
1. The obtained score always accurately represents the test taker’s true score.
3. Norm-referenced tests always compare people who took the test at the same time.
4. Standardized tests with multiple-choice items address only basic facts at the knowledge
level of Bloom’s taxonomy.
5. Teachers should be able to use large-scale test results to address individual student
needs in the classroom.
Ethics Alert: The report is simulated and the student’s name is fictional. Providing an
actual student’s report would violate confidentiality. As teachers, we should share stu-
dent test scores only with the student, his or her parents, and other school faculty with
a need to know.
FIGURE 11.3 TerraNova Test Scores for Fictitious Students. (Simulated three-page Individual Profile report for "Pat Washington," Grade 4: page 1 shows Objectives Performance Index scores in Reading, Language, Mathematics, and Social Studies; page 2 shows norm-referenced scale scores, national percentile bands, national stanines, and narrative descriptions of the skills demonstrated in Reading, Language Arts, Mathematics, Science, and Social Studies; page 3 shows InView cognitive ability scores reported as national percentiles by age and by grade, along with the Cognitive Skills Index.)
apples” rather than “apples and oranges.” The equivalence of administration and scor-
ing methods supplies you with the confidence that you can attribute gains to student
growth as you see scores improve across time. Without standardization, gains could
be attributed to irrelevant things such as changes in your directions, variations in time
for the assessment, or idiosyncratic scoring.
BENEFITS AND PITFALLS OF
LARGE-SCALE ASSESSMENTS
We will discuss the complexities of large-scale assessments primarily in the context of
No Child Left Behind. Since the law was implemented, researchers have begun to
gather information about its intended and unintended impact on education in the
United States. We can glean some important lessons from these trends that apply to
other high-stakes tests as well.
As we begin to sort out the issues associated with the increased use of large-scale
achievement tests, we must remember the basic purpose of the NCLB law. It started
with very good intentions—closing the widely acknowledged achievement gaps that
have developed in the United States. The law had bipartisan support in Congress
because everyone supports this important goal. We address three issues related to
large-scale assessments: comparisons, curriculum concerns, and addressing improve-
ment. Table 11.3 summarizes these issues.
Comparisons
One of the most important benefits of large-scale tests is their standardization,
which is implemented to allow systematic, meaningful comparisons across different
groups on the same content. Evidence has begun to accumulate that the large-scale
tests initiated to comply with the NCLB legislation have helped school districts
focus on struggling children by using disaggregated data to compare groups and
pinpoint weaknesses. Districts that previously looked only at average scores often
discovered that minority or poor children in their districts were not faring as well
as their middle-class European American peers.

TABLE 11.3
Benefits and Pitfalls Related to Use of Large-Scale Tests for NCLB Accountability
Issue | Benefit | Pitfall

In a recent analysis, Time magazine
suggested that spotlighting “the plight of the nation’s underserved kids” is NCLB’s
“biggest achievement” (Wallis & Steptoe, 2007). One principal we know who is an
expert on working with low-achieving African American children attributes the
burgeoning number of requests he gets to consult with other schools about how to
enhance achievement of African American males to NCLB and the awareness it has
generated. Working to close achievement gaps is an important assessment focus we
have stressed throughout this text as necessary for preparing students for participa-
tion in a democratic society.
Unfortunately, different states require different levels of proficiency on their
large-scale tests. Because levels of quality are determined by each state, wide variations
in what students know and can do to be “proficient” for NCLB occur (Nichols &
Berliner, 2007; Wallis & Steptoe, 2007). For example, to achieve proficiency in Wyo-
ming, you must perform significantly better than you have to in Oklahoma. The lower
Oklahoma standards make it appear that more students are proficient. One way this
happens is through variations in the reading difficulty of items. For example, two states
may both require fourth graders to be able to distinguish between fact and opinion,
but one state may use simple sentences for this task while another uses complex pas-
sages. Even with standardized administration and scoring, you can only compare
scores within a state and not across states when this is the case.
Paradoxically, states with higher requirements for proficiency find themselves at
a disadvantage with NCLB accountability. Fewer of their students, compared to other
states, meet annual goals, and so they are more likely to miss adequate yearly progress
targets and to receive sanctions. The fact that standards of proficiency declined in
30 states between 2000 and 2006 is a worrisome trend suggesting possible “watering
down” of expectations under the pressure of NCLB sanctions (Wallis & Steptoe, 2007).
To address this issue, some national policy groups are making the case for international
benchmarks (McNeil, 2008).
We can draw lessons from these events about the importance of comparisons for
large-scale tests. For any standardized large-scale test taken by your students, whether
norm- or criterion-referenced, you should know the basic facts about the test. If the
test uses criterion-referenced scoring, you should know the understanding and skills
required for attaining proficiency so you know what your students must aim for. If the
test uses norm-referenced scoring, you should know whether the norm group is appro-
priate for your students. For tests with either (or both) kinds of scoring interpretations,
you and your fellow teachers and administrators should be able to use the scores to
disaggregate the data for use in pinpointing groups who may need extra attention.
Familiarity with standards and understanding what comparisons are meaningful will
help you keep these tests in perspective.
Curriculum Concerns
The second issue related to the high-stakes test controversy we must address is cur-
riculum concerns. How do these tests affect the curriculum? Several studies have sug-
gested that NCLB assessments have influenced educators to develop policies and
programs that better align curriculum and instruction with state standards and assess-
ments (Center on Education Policy, 2006). Other research suggests that a well-designed
exit exam, another type of high-stakes test, can produce positive impact on curriculum
alignment, improve the quality of instruction, and generate positive attitudes and use-
ful dialogue among teachers (Yeh, 2005). Most assessment experts believe alignment
between learning goals, instruction, and assessment ensures that instruction addresses
the understanding and skills students need to learn. It also helps guarantee that students
are fairly assessed on that specific content, and it helps keep standards high for all
students, as discussed in Chapter 2. NCLB outcomes related to enhancing alignment,
then, suggest a positive influence of high-stakes tests on instruction consistent with
recommendations for best practice.
On the other hand, evidence suggests that curriculum is narrowing excessively
as a result of the NCLB focus on math, reading, and science (Au, 2007). Educators
fear subjects not tested, such as art, history, or the skills needed for democratic
participation, will not be taught. We have anecdotal indicators of this, such as recent
complaints we heard from a middle school principal that student handwriting is
becoming impossible to decipher because elementary schools no longer teach cur-
sive writing because “it’s not in the standards.” Broader evidence comes from a study
that queried educators from all 50 states. In that study, 71 percent of the respon-
dents reported they had increased time for reading and math and decreased time
for science and social studies in response to NCLB (Center on Education Policy,
2006). Even more alarming, this curriculum narrowing occurred more in high-
poverty districts (97%) than in others (55–59%). Critics of high-stakes tests have
suggested teachers perceive pressure to “teach to the test” and therefore spend time
having students drill on facts as they eliminate more demanding activities they see
as unlikely to contribute to increasing scores. These lackluster strategies in turn
reduce student interest and motivation, leading to further depression of educational
achievement.
Such actions, however, are not effective and they do not increase either test
scores or student achievement. To buttress this point, research is emerging suggest-
ing that teachers who teach to high standards (and not to the test) have students
who excel on large-scale tests. A study in the Chicago public schools analyzed teach-
ers’ assignments to students in math and writing in Grades 3, 6, and 8 (Newman
et al., 2001). The researchers found that teachers who gave more challenging work had students who
made larger than average gains on annual standardized tests. Similarly, Reeves (2004)
studied schools with 90 percent low-income and ethnic minority students, yet with
90 percent also achieving mastery on annual state or district tests. He found
schools with these characteristics emphasized higher-order thinking, in particular by
requiring performance assessments with written responses demonstrating students’
thinking processes. Finally, studies show teachers who take time to systematically
develop formative assessment strategies have students who do better on general abil-
ity tests or end-of-year tests (e.g., Wiliam et al., 2004). Formative assessment and
more challenging work involving critical thinking allow students to develop and
apply multiple, flexible strategies so they can capitalize on their strengths and com-
pensate for their weaknesses when confronted with the broad content of a large-scale
test (Sternberg, 2008).
These findings suggest a lesson to take away from the controversy about the
impact of large-scale high-stakes tests on curriculum. “Teaching to the test” by focus-
ing on knowledge-level thinking and rote memory learning is a no-win strategy. Stu-
dents are shortchanged, class is boring for teacher and learners, and test scores will
not rise. Instead, providing students with thought-provoking work and tasks with real-
world applications aligned to standards and learning goals will be more effective. The
other element, of course, is plenty of formative assessment along the way so students
receive feedback on how to close the gap between where they are and where they need
to be and are provided opportunities to improve. See Box 11.1 for the experience of
one of our own colleagues who took this path.
BOX 11.1
Preparing Students for Annual Large-Scale Tests?

In her fourth year of teaching, Barbara Blackburn was assigned two language arts classes filled with students who had scored below average on the annual large-scale test the year before. The principal provided her with lots of "test prep" materials and told her this was a "make or break" year for the students. She found her students responded much better to activities aligned with standards but with real-life implications, such as using the state driver's manual and USA Today for teaching reading, and stressing writing for authentic purposes (e.g., writing editorials). So she ended up using only about a third of the material in the test prep notebooks. At the end of the year, when test scores were received, the principal called Barbara into his office. She was apprehensive, worrying her students had failed because she had not spent enough time specifically on preparation for the tests. Instead, the principal was overjoyed. Almost all the students in those language arts classes had achieved proficiency. He was eager to know how Barbara had worked this miracle.
Addressing Improvement
The third issue we must address related to the high-stakes test controversy is improve-
ment. The point of implementing NCLB was to improve the nation’s schools and close
achievement gaps. Has the use of large-scale tests enhanced achievement, narrowed
achievement gaps, and helped improve schools that fail to make annual yearly prog-
ress? Evidence is currently mixed.
Extensive studies of all 50 states by the Center on Education Policy (2007, 2008)
report that scores on reading and math have climbed since NCLB was enacted in 2002.
They also report evidence that achievement gaps between groups are narrowing. The
authors temper the reports, however, by stating they cannot necessarily attribute these
gains to NCLB because many schools have implemented other significant policies to
increase student achievement since NCLB began. The impact of NCLB is entangled
with the impact of these other programs.
Critics believe the problems with NCLB overshadow any gains, even if gains can
be directly attributed to this law. They believe NCLB accountability tests have become
an end in themselves, rather than a check on the real goal, which is enhanced under-
standing and skills for all children. As a farmer relative of ours is fond of saying, “The
hogs won’t gain weight just because you weigh them more often.”
An inordinate focus on the “weighing” (in this case, the NCLB accountability
test) can lead to problems. As you already know from our discussions of reliability and
validity in Chapter 6, you should never make important decisions on the basis of one
test because error is always involved. In addition, the fact that these test scores are
used to decide on significant sanctions leads to great pressure to enhance scores with-
out necessarily enhancing student achievement (which we termed score pollution in
Chapter 1). This trend has been called “Campbell’s Law” (Nichols & Berliner, 2007).
Donald Campbell, a social psychologist, philosopher of science, and social researcher,
pointed out that when a single social indicator is increasingly used to make important
decisions, it will also come under increasing pressure to be corrupted and will become
less and less useful as an indicator of progress. Examples of augmenting scores at state
and local levels using unethical means (e.g., teachers reviewing specific test content
before the test) are documented by Nichols and Berliner. Their examples provide con-
firmation of such pressure and evidence the scores themselves may not always be an
accurate and valid indicator of student achievement gains.
In our view, if the results of any large-scale assessments are to be at all useful
for addressing educational improvement, the focus must be on increasing learning, and
not simply on increasing test scores. Test scores must be seen merely as one indicator
of broader learning, and not as an end in themselves. After all, when students have
left school and become adults participating in a democratic society, they are judged
on their accomplishments and their character, not on their test scores. We lose sight
of these larger issues when we focus on improving test scores rather than on develop-
ing all facets of learning. From this larger perspective, test scores can provide general
information used to modify educational programs and procedures put in place by
schools. Results can generate ideas for improving teaching and learning at the school,
district, and state level. Examples could include using scores to decide to increase staff
development in areas where a school shows weakness, or using scores to see whether
programs in place are having the intended impact on learning. When scores are used
to provide clues about how to fine-tune educational processes, they are used forma-
tively rather than summatively.
In contrast, efforts will be superficial and test results will become less valid and
less useful if educators merely aim to increase scores. Examples include teachers
reviewing specific test content before the test, schools removing less able students from
testing, or districts focusing interventions only on students close to the proficiency
cutoff while ignoring lower achievers. Such practices may influence test scores, but
they will not enhance overall achievement for all students.
Ethics Alert: All these practices contribute to score pollution and may warrant an ethics
review of a teacher.
TABLE 11.4
Suggestions for Test Preparation
1. Know the standards and teach them.
TABLE 11.5
Suggestions for Knowing and Teaching the Standards
1. Collaborate with other teachers to design instruction and assessments aligned with the
standards.
2. Make students aware of the learning goals and evaluation standards you are working
on together.
4. Provide equal access by differentiating instruction using more than one method of
gaining and demonstrating learning.
Making students aware of the learning goals and evaluation standards helps them not only in your
classroom but also on large-scale tests and in their daily life, and it can provide meaning
as well as a larger context to your classroom activities. It can also help students see the
large-scale tests as nonthreatening and more connected to their ongoing work (Taylor
& Nolan, 2008).
Another part of knowing and teaching the standards is stressing instruction and
assessment activities in your classroom that address higher-order thinking skills, espe-
cially if you want students to use the learning they acquire in your classroom in other
contexts. They must be able to apply concepts they are learning to novel situations if
they are to use them effectively in large-scale assessments, or, of course, in daily life
or for future work or learning.
A final issue related to knowing the standards and teaching them is providing
equal access to the content for all of your students. Differentiated instruction, including
supplying students with alternative methods of gaining information and demonstrating their
learning, is a key strategy.
TABLE 11.6
Skills Useful for Taking Tests
1. Use time wisely.
   • Answer all easier items your first time through the test while skipping the harder ones.
   • Give yourself enough time to address all questions (e.g., allocate more time for questions with more points).
4. Use logic.
   • With multiple-choice questions, read all choices and eliminate incorrect answers (distracters) before choosing the correct answer.
   • Make sure you take into account any words that reverse the meaning of a question, such as "LEAST," "NOT," or "EXCEPT."
   • With complex multiple-choice items, first eliminate any single answer you know is incorrect, as well as any combined answer choices that include it.
5. Use shrewd guessing.
   • Be sure to answer all questions, especially if you can eliminate at least one alternative.
Some educators also recommend occasionally showing students released items from
previous tests related to a learning goal you are working on. Such a practice familiar-
izes students with the format and style of the test as well as reassuring them about the
familiar content.
A standardized test score is meaningful only if each student takes the test under the same circumstances as the norm group. Also, the only way
results from different classrooms, schools, and states can be compared fairly is if the
conditions under which all students take the test are the same.
Testing directions go into detail about the arrangement of desks and the class-
room (e.g., educational posters or academic information on the classroom walls must
be covered), how to pass out the tests, whether students can have scratch paper, and
how to time the test precisely. Written instructions you read aloud to the students are
also provided. If you and your students remember these procedures are designed to
make the test fair and to allow useful interpretations, they will make sense and not
feel intimidating to students.
LARGE-SCALE TEST ACCOMMODATIONS
FOR DIVERSE LEARNERS
Now that we have discussed the importance of standardization in the administration of large-scale tests, we must also point out that a few exceptions to these strict standardization practices, called testing accommodations, are permitted for certain children with disabilities and for students with limited English proficiency. Accommodations are formally chosen by a team of educators that includes classroom teachers and others who regularly work with the designated children. Each state has a list of acceptable accommodations your team of educators may draw on when making accommodation decisions based on students' demonstrated needs. These accommodations should be in place for all assessments, not just for large-scale assessments. The accommodations should be consistent with those incorporated into typical classroom practices throughout the year (Thompson et al., 2002).

Testing Accommodations: Exceptions to the strict standardization practices that allow students to demonstrate their learning without altering the basic meaning of the score.
The purpose of such accommodations is to allow students to demonstrate their
understanding and skills without being unfairly restricted by a disability or limited
English proficiency. For example, one of the most common accommodations is extend-
ing the amount of time available for an assessment. Extra time can remove an obstacle
to doing well for those who work slowly because of either a disability or limited pro-
ficiency with English.
No accommodation is supposed to change the constructs the test measures. Change
to the content brings into question the validity of the student’s results and eliminates the
value of the comparisons that can be made with them. For example, simplification of
test questions or eliminating some alternatives could alter the meaning of a student’s test
score. This practice would be considered a testing modification, or a change made to the testing process that does change the construct being measured.

Testing Modification: A change in the testing procedures that alters the meaning of the score.

Sometimes, a modification depends on the content of the test. If a reading com-
prehension test is read aloud to a student, the test no longer measures the construct
of reading comprehension. Instead, it measures listening comprehension. Similarly,
using spell-check devices for a test measuring writing skills, or using a calculator for
a test measuring math computation skills, are testing modifications because they alter
the interpretations that can be made about students’ level of proficiency based on the
test results.
The most common testing accommodations are grouped into four categories:
flexibility in scheduling, flexibility in the setting in which the test is administered,
changes in the method of presentation of the test, and changes in the method of stu-
dent response to the test. The effectiveness of accommodations should be regularly
evaluated by the team working with any student who receives them. Table 11.7 lists
examples of accommodations.
TABLE 11.7
Examples of Common Testing Accommodations for Use with Students with Disabilities or Students with Limited English Proficiency
Type of Accommodation | Examples
Reliability
To examine reliability related to assessment occasion, test developers have a group of
students take the same test twice, usually several weeks apart. They then conduct cal-
culations to see how much variation occurs between the two scores for each student.
TABLE 11.8
Questions That Should Be Addressed in the Technical Manual for a Large-Scale Test
Reliability
Validity
• What empirical evidence is available for the legitimacy of comparing results of this test
across persons, classrooms, and schools?
• Has a table of specifications been provided to gauge alignment between local learning
goals and this assessment?
• Is the process used in consulting experts and reviewing major textbooks in the
development of test content described?
The resulting measure of stability is termed test-retest reliability. Another form of reliability provides a measure of internal consistency by comparing student scores on items within the test. An example of this is to split the test in half, correlating students' scores on the even items with their scores on the odd items.

Test-Retest Reliability: The degree of stability between two administrations of the same test to the same students.
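For readers who want to see these reliability checks as calculations, here is a minimal sketch in Python (an illustration with invented score data, not output from any actual test's technical manual). It correlates two administrations for test-retest reliability and correlates odd-item and even-item subscores for split-half reliability, applying the Spearman-Brown adjustment that test developers commonly use to estimate full-test reliability from two half-tests.

from statistics import correlation  # requires Python 3.10 or later

# Test-retest: the same students take the same test twice, several weeks apart (invented data).
first_administration  = [55, 62, 70, 48, 81, 66, 73, 59]
second_administration = [53, 65, 68, 50, 79, 64, 75, 61]
test_retest_r = correlation(first_administration, second_administration)

# Split-half internal consistency: each student's odd-item subscore vs. even-item subscore.
odd_item_scores  = [27, 30, 36, 24, 41, 33, 38, 29]
even_item_scores = [28, 32, 34, 24, 40, 33, 35, 30]
half_r = correlation(odd_item_scores, even_item_scores)
split_half_r = (2 * half_r) / (1 + half_r)  # Spearman-Brown adjustment to full test length

print(f"test-retest reliability: {test_retest_r:.2f}")
print(f"split-half reliability:  {split_half_r:.2f}")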
Validity
Designers and evaluators of large-scale tests also assemble evidence supporting the
accuracy of test score interpretations. They must make sure that the norm group is
appropriate and that the test content representatively samples the subject areas
addressed. This process should be carefully described in the technical manual, and a
table of specifications is often provided. Large-scale tests of achievement in a particu-
lar subject usually cover a broader range of material than is covered in any one school
district’s assessments. The degree of overlap between the large-scale test’s coverage and
local coverage can vary. The degree of overlap should be considered when drawing
inferences about children’s achievement. The test publisher must demonstrate in the
technical manual that evidence relating this test to other established measures shows
it is an accurate, representative, and relevant measure of student performance for the
intended purpose.
FIGURE 11.4 Two Sets of Scores with the Same Mean but Different Standard Deviations (scores plotted along a scale from 10 to 90)
FIGURE 11.5 The Normal Curve with Estimates of the Percentage of Scores That Fall Along Each Part of the Distribution (the vertical axis runs from no people to many people; the horizontal axis marks one and two standard deviations above and below the mean)
Most of us are about average in height. As you move away from the mean in either direction,
fewer and fewer people appear, until only a very few can be found either at the extreme
short end of the height continuum or at the extreme tall end. Figure 11.5 presents the
normal distribution.
When you transform a large group of raw scores to the bell shape of the normal
distribution, the pattern becomes regular and predictable. As you see in Figure 11.5,
68 percent of a group of scores is predicted to fall within one standard deviation on
either side of the mean. Even though each normalized distribution has its own mean
and standard deviation, the pattern of percentages of scores across the normal distribu-
tion always stays constant. If you know the mean and standard deviation of the norm
group, you can estimate with accuracy the percentage of scores that will lie along each
part of the range of possible scores. Whether the mean of the norm group is 15 or 50,
50 percent of the scores will always lie at or below that mean for the group. Similarly,
whether the standard deviation is 15 or 50, 16 percent of the scores will always fall at
or below one standard deviation below the mean.
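As a small illustration of this regularity, the following Python sketch (our own example with an invented mean and standard deviation, not data from any actual norm group) estimates the percentage of a normally distributed group scoring at or below a given point.

from math import erf, sqrt

def percent_at_or_below(score, mean, sd):
    # Cumulative proportion of a normal distribution at or below the given score.
    z = (score - mean) / sd
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))

mean, sd = 50, 10  # invented norm-group mean and standard deviation
print(round(percent_at_or_below(50, mean, sd)))  # at the mean: 50 percent
print(round(percent_at_or_below(40, mean, sd)))  # one SD below the mean: about 16 percent
print(round(percent_at_or_below(60, mean, sd)))  # one SD above the mean: about 84 percent

The output reproduces the familiar figures from the normal curve: about 50 percent of scores at or below the mean and about 16 percent at or below one standard deviation below the mean.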
FIGURE 11.6 A Distribution of Actual Test Scores That Could Be Represented by the Normal Curve (number of students on the vertical axis; test scores from 1 to 11 on the horizontal axis)
It can be hard to remember that the foundation of this smooth curve consists of
the actual test scores for a group of test takers. For example, in Figure 11.6 you see a
pattern of actual scores that could underlie a normalized distribution like that in Fig-
ure 11.5. You can see that one person earned a score of 1 on this test, and six people
earned a score of 5. Remember, when you look at a normal distribution (Figures 11.2,
11.4, 11.5), it actually represents a group of real test takers’ scores.
Percentile Rank
In our experience, the most common type of score used in schools is percentile rank.
A percentile or percentile rank indicates the percentage of students in the norm group at or below the same score as your student. By looking at Figure 11.5, you can see that a student who scores at the mean is at the 50th percentile. If you add up the percentages of the number of students who scored below the mean (2% plus 14% plus 34%), you arrive at 50 percent. If instead your student's score is one standard deviation below the mean, you can see that the student's score is at the 16th percentile because 2 percent plus 14 percent of the norm group was at the same score or below.

Percentile or Percentile Rank: A score that indicates the percentage of students in the norm group at or below the same score.
Percentile ranks range between 1 and 99.9. They reflect the pattern shown in the
normal curve. Typically, average scores are considered to range between the 25th and
75th percentiles. Because more students are clustered around the middle of the distri-
bution, the distance between percentile ranks is smaller near the mean than at the
extremes. Because percentile ranks don’t have equal intervals between them, you should
not add them or do other calculations with them.
Sometimes our teacher candidates confuse percentile ranks with percentage cor-
rect. Remember that percentile rank tells you your students’ relative standing com-
pared to students in the norm group. So, if your SAT percentile rank is 87 for the
critical reading section, you did better on the test than 87 percent of the students in
the norm group. Percentile rank therefore provides a norm-referenced interpretation
of scores. Percentage correct instead provides an indicator of the amount of content
mastered and therefore can be used only in criterion-referenced interpretations.
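The distinction can be seen in a short Python sketch (invented numbers, for illustration only): the same performance yields one figure for percentage correct and a different figure for percentile rank, because the two indicators answer different questions.

# Percentage correct: a criterion-referenced indicator of content mastered (invented numbers).
items_on_test = 40
items_correct = 30
percentage_correct = 100 * items_correct / items_on_test

# Percentile rank: how the same raw score stands relative to a hypothetical norm group.
norm_group_raw_scores = [18, 21, 22, 24, 25, 27, 28, 29, 31, 33, 34, 36, 38]
at_or_below = sum(1 for s in norm_group_raw_scores if s <= items_correct)
percentile_rank = round(100 * at_or_below / len(norm_group_raw_scores))

print(f"percentage correct: {percentage_correct:.0f}")  # 75 percent of the items
print(f"percentile rank:    {percentile_rank}")         # about the 62nd percentile in this group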
Instead of providing a single percentile rank score, many commercial test designers now provide a range of percentile scores incorporating the standard error of measurement for each section of a test. These are called percentile bands, confidence intervals, or percentile ranges. The higher the reliability of a test, the narrower the band appears. Percentile bands are useful for determining individual students' strengths and weaknesses. If the bands overlap across different parts of the test, differences appearing in the specific percentile score on each part are likely due to measurement error. If the bands do not overlap, a real difference in performance is more likely.

Percentile Bands: A range of percentile scores that incorporates the standard error of measurement.
For example, if you return to the TerraNova results in Figure 11.3, Pat Washington's
percentile bands are pictured on the right on page 2. You can see that his social stud-
ies percentile band does not overlap with his language percentile band, suggesting a
real difference for him between these areas. On the other hand, his reading, language,
and mathematics percentile bands do overlap, suggesting they are not significantly
different. Also notice the average range (25th–75th percentile) is shaded on the
national percentile scale. You can see that all of Pat’s scores fall within the average
range, even though his language score is his strongest and his social studies score is
his weakest.
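If you want to see where such a band comes from, here is a minimal Python sketch (with an invented reliability coefficient, scale mean, and standard deviation; this is not the actual TerraNova procedure). It computes the standard error of measurement from the test's reliability and the norm group's standard deviation and then converts the resulting score range into a percentile band.

from math import erf, sqrt

def standard_error_of_measurement(sd, reliability):
    # SEM shrinks as reliability rises; a perfectly reliable test would have an SEM of zero.
    return sd * sqrt(1 - reliability)

def to_percentile(score, mean, sd):
    z = (score - mean) / sd
    return round(100 * 0.5 * (1 + erf(z / sqrt(2))))

mean, sd, reliability = 100, 15, 0.91  # invented scale and reliability coefficient
observed_score = 108
sem = standard_error_of_measurement(sd, reliability)  # about 4.5 scale-score points

low, high = observed_score - sem, observed_score + sem
print(f"percentile band: {to_percentile(low, mean, sd)} to {to_percentile(high, mean, sd)}")

Notice that a higher reliability coefficient shrinks the standard error and therefore narrows the band, which is the relationship described above.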
Whenever we explain percentile ranks and percentile bands to our students, we
are reminded of a poignant story told to us by one of our teacher candidates. During
the class on interpreting standardized test scores, another student, who happened to
be a P.E. major, wondered aloud why the class needed to learn to interpret these scores.
This candidate spoke about her brother and told the story that appears in Box 11.2.
BOX 11.2

Stanines
Whereas normal curve equivalents divide the distribution of the norm group's scores into 99 equal parts, stanines divide the norm group's scores into only 9 equal parts, with 1 representing scores in the lowest part and 9 representing scores in the highest part. Scores around the middle of the distribution are represented by a stanine of 5 (see Figure 11.7). With only 9 scores possible, a stanine score therefore represents an approximate score within a range rather than pinpointing a single score. Because stanine scores fall in a range, they often fluctuate less across time than percentile ranks or normal curve equivalents, which fall at a single point. For this reason, many teachers find them useful for examining student performance across years to see if a student's relative standing changes compared to peers. Teachers can also use stanines to compare student performance across different kinds of test content. For example, a student who attains a stanine of five in math and a stanine of two in English appears stronger in math. Generally, a difference of two stanines suggests a significant difference in performance, whether across years on the same test or between two different types of tests taken the same year. For example, in Figure 11.3 you can see the stanine range for Pat Washington's scores at the bottom of the figure on the right of page 2. His stanine score for social studies is four, and his stanine score for language is six, again suggesting a significant difference between his performance in these two areas.

Stanine: A standardized score that indicates where a student's score falls in relation to the norm group when the distribution of the scores is divided into nine equal parts.
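As an illustration, the following Python sketch converts a percentile rank into a stanine using commonly cited cumulative cutoffs (4, 11, 23, 40, 60, 77, 89, 96); treat the cutoffs as an assumption for the example rather than the exact rule used by any particular test publisher.

from bisect import bisect_right

# Percentile ranks at which each stanine band ends (an assumed, commonly cited set of cutoffs).
STANINE_CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(percentile_rank):
    # Stanines run from 1 (lowest band) through 9 (highest band).
    return bisect_right(STANINE_CUTOFFS, percentile_rank) + 1

print(stanine(50))  # 5: the middle of the distribution
print(stanine(30))  # 4
print(stanine(85))  # 7
print(stanine(97))  # 9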
Grade equivalent scores are not systematically derived from the characteristics of the norm group like stan-
ines, percentile ranks, or normal curve equivalents. Therefore, grade equivalent scores
are seldom useful for interpreting individual students’ scores on tests or subtests, for
comparing scores on different types of tests, or for exploring growth across time
(Sattler, 1992). Many test publishers are eliminating the reporting of age or grade
equivalent scores for these reasons.
INTERPRETING LARGE-SCALE TESTS FOR STUDENTS AND PARENTS
To help you to summarize the most important points related to interpreting large-scale
test scores, we now turn to the steps to take in explaining them to students and parents.
(You have probably heard one way to learn something really well is to explain it to
someone else, and that is likely to happen as you begin summarizing test scores for
others.) One middle school teacher we know has listed the common questions he gets
about large-scale tests, and these are presented in Table 11.9. As you can see, parents
want to put these scores in context (e.g., What is required to pass? What was the
average of other students? Has my child made progress?). They also need some basic
information about the test (What does this score mean?) and how scores are derived
(How was this score calculated?).

TABLE 11.9
One Teacher's List of Parents' Common Questions About Large-Scale Tests
1. What is the required score to pass this test?

To answer these questions satisfactorily, we have devised a few rules for you to
follow. These are useful steps to take when interpreting scores for your own use, and
they can be especially helpful as you think about sharing test results with your students
and their parents. We have followed these rules ourselves in many parent conferences.
Table 11.10 shows a summary of the rules.

TABLE 11.10
Key Large-Scale Test Interpretation Rules for Students and Parents
1. Explain the purpose of the test and exactly what it measures.
4. Choose one type of score (e.g., percentiles, stanines, proficiency level), explain how it works, and then use that type of score to describe all parts of the test.
6. Work with the parent and student as partners in designing recommendations for what the next steps should be given the information you have all gleaned from these scores.
Over-interpretation: Incorrect assumptions and conclusions that are not warranted by the available data.

If the test score report illustrates percentile bands, these can be used to explain the range of
error around each score. Specific score numbers should not be overinterpreted. Over-
interpretation occurs when conclusions are drawn that are not warranted by the data.
For example, if a student scores at the 45th percentile in math and at the 55th percentile
in language arts on a nationally normed large-scale test, you should not conclude
the student is “below average” in math. In addition, if you take the standard error of
measurement into account, you would see that a student has a good chance of scoring
higher than the 45th percentile if the mathematics test were taken again. (Of course,
the percentile band will also show the student may score lower if that student were to
take the test again.) Using percentile bands or stanines to explain scores helps reduce
overinterpretation.
TABLE 11.11
Uses of Large-Scale Tests for Classroom Teachers
Determine general strengths and needs of individuals for
• Partnering with student and family to enhance achievement
• Differentiating instructional tasks
process skills, such as careful observation. Encouraging your school to sponsor faculty
workshops related to effective methods for teaching science and instituting family
science nights could also be helpful.
Large-scale test results, because they provide only a broad sweep across the
content without a lot of detail, cannot be used for precise diagnostic purposes. From
large-scale test results, for example, you can see whether students tend to do better
in math computation than in math concepts, but the results don’t give you enough
information to know specifically which math concepts to teach or how to teach them.
Instead, you must design your classroom diagnostic and formative assessments for
that purpose. Large-scale test results can point you in the right direction and provide
clues to orient your classroom approach. They are most useful for providing general
information about trends within and across years and across subject content for
groups and individual students. Table 11.11 summarizes the ways you can use large-
scale test scores to add to your understanding of your students’ academic abilities
and skills.
used for students with disabilities and for English language learners to allow students
to demonstrate their understanding and skills of the construct tested without being
unfairly restricted by a disability or by limited English proficiency.
We next discussed reliability and validity in the context of large-scale tests,
then turned to interpreting norm-referenced tests, showing how the mean and stan-
dard deviation of the norm group are used to compare your students’ scores to those
of the norm group using percentile ranks, percentile bands, normal curve equiva-
lents, and stanines. We next explained why grade and age equivalent scores are
problematic.
We moved on to interpreting criterion-referenced tests and then discussed the
steps in interpreting large-scale tests for students and parents. Finally, we talked about
the ways that classroom teachers can use large-scale testing results in their work.
HELPFUL WEBSITES
http://www.cep-dc.org/
The Center on Education Policy is a national, independent advocate for public education and for
more effective public schools. The website provides recent research on current topics such as
No Child Left Behind and standards-based education reform. The organization is funded by
charitable foundations and does not represent special interest groups. Its goal is to provide
information to help citizens understand various perspectives on current educational issues.
www.centerforpubliceducation.org
The Center for Public Education website is an initiative of the National School Board
Association. It offers up-to-date, balanced information on research involving current
issues in public education, such as high-stakes testing. It also provides access to
summaries of recent education reports, polls, and surveys related to improving student
achievement and the role of public schools in society.
interpreting large-scale test results? Complete the table below to answer this
question.
Percentile rank
Percentile band
Stanine
Scale score
Grade- or age-equivalent
9. Provide two reasons for not using grade- or age-equivalent scores when
interpreting large-scale test results.
10. Address the points in Table 11.10 for the test results in Figure 11.3.
11. If a student’s score falls one standard deviation above the mean, what will be her
percentile rank?
12. Describe at least two specific ways you might use the results of large-scale tests
for individual students. Describe two specific ways you might use the results for
a whole class.
REFERENCES
Au, W. 2007. High-stakes testing and curricular control: A qualitative metasynthesis. Educational
Researcher 36: 258–267.
Center on Education Policy. 2006, March. From the capital to the classroom. Year 4 of the No
Child Left Behind Act. Washington, DC. Retrieved October 23, 2007, from http://www.
cep-dc.org/index.cfm?fuseaction=Page.viewPage&pageId=497&parentID=481.
Center on Education Policy. 2007, June. Answering the question that matters most: Has student
achievement increased since No Child Left Behind? Washington, DC. Retrieved November 2,
2007, from http://www.cep-dc.org/document/docWindow.cfm?fuseaction=document.viewDocument&documentid=200&documentFormatId=3620.
Center on Education Policy. 2008, June. Has student achievement increased since 2002? State test
score trends through 2006–07. Retrieved June 25, 2008, from http://www.cep-dc.org/document/docWindow.cfm?fuseaction=document.viewDocument&documentid=241&documentFormatId=3769.
Darling-Hammond, L. 2007. The flat earth and education: How America’s commitment to equity
will determine our future. Educational Researcher 36: 318–334.
Flynn, J. R. 1999. Searching for justice: The discovery of IQ gains over time. American
Psychologist 54: 5–20.
McNeil, M. 2008. Benchmarks momentum on increase. Education Week 27(27): 1, 12–13.
Newmann, F., A. Bryk, and J. Nagaoka. 2001. Authentic intellectual work and standardized tests:
Conflict or coexistence? Chicago: Consortium on Chicago School Research. Retrieved
November 5, 2007, from http://ccsr.uchicago.edu/content/publications.php?pub_id⫽38.
Nichols, S., and D. Berliner. 2007. Collateral damage: How high-stakes testing corrupts America’s
schools. Cambridge, MA: Harvard Education Press.
Reeves, D. B. 2004. The 90/90/90 schools: A case study. In D. B. Reeves, Accountability in action:
A blueprint for learning organizations, 2nd ed. Englewood, CO: Advanced Learning Press.
Sattler, J. 1992. Assessment of Children. 3rd ed. San Diego: Jerome M. Sattler, Publisher, Inc.
Solorzano, R. W. 2008. High stakes testing: Issues, implications, and remedies for English
language learners. Review of Educational Research 78: 260–329.
Sternberg, R. 2008. Assessing what matters. Educational Leadership 65(4): 20–26.
Taylor, C., and S. Nolan. 2008. Classroom assessment: Supporting teaching and learning in real
classrooms. Saddle River, NJ: Pearson Education.
Thompson, S., C. Johnstone, and M. Thurlow. 2002. Universal design applied to large scale
assessments (Synthesis Report 44). Minneapolis, MN: University of Minnesota, National
Center on Educational Outcomes. Retrieved November 13, 2007, from http://education.
umn.edu/NCEO/OnlinePubs/Synthesis44.html.
Wallis, C., and S. Steptoe. 2007. How to fix No Child Left Behind. Time 169(23): 34–41.
Wiliam, D., C. Lee, C. Harrison, and P. Black. 2004. Teachers developing assessment for learning:
Impact on student achievement. Assessment in Education 11: 49–65.
Yeh, S. 2005. Limiting the unintended consequences of high-stakes testing. Education Policy
Analysis Archives 13: 43. Retrieved October 23, 2007, from http://epaa.asu.edu/epaa/
v13n43/.
CHAPTER 12
INTRODUCTION
We wholeheartedly agree with the quotation beginning this chapter—we already
know what to do to successfully teach all children. In fact, we now know even more
about what makes for effective schools than we did when Edmonds wrote those
words in 1979. In particular, studies of schools and classrooms have shown that
effective classroom assessment practices can be one of the most important ingredi-
ents for enhancing student achievement (Reeves, 2004; Wiliam et al., 2004). Through-
out this book we have acquainted you with classroom assessment practices to
motivate students and help them learn to their full potential. In this chapter we take
the opportunity to review some key guidelines, discuss issues related to efficiency in
assessment practices, show you illustrative examples from real classrooms, and help
you set goals for your own classroom assessment practices to equip your students to
become lifelong learners and active participants in a democratic society.
TABLE 12.1
Six Essential Assessment Guidelines
1. Begin with the end in mind: Have clear, high expectations for developing understanding
and skills that can be flexibly used across contexts.
2. Find out what students know: Use diagnostic assessment supporting differentiated
instruction.
3. Check as you go: Use flexible formative assessment to help students close the gap
between where they are and where they need to be.
4. Teach students to check as you go: Teach self- and peer-assessment to promote
internalization of goals and performance criteria.
5. Use rubrics creatively to reinforce attainment of student learning goals: Set clear and
concrete criteria for excellence.
fields, from music and theater to English and biology, and from elementary through
college levels, critical themes converge that guide effective teachers.
These guidelines are based on the foundational assumption that teachers must
be reflective in their practice and aware of the instructional climate they are creating
in their classroom. Reflective teachers weigh what they do against the question we
introduced in Chapter 1: “Will this help my students learn?” Their primary purpose
for assessment is formative rather than summative in nature. They use assessment to
be advocates who help their students learn more, not simply evaluators who gauge
where students are and assign grades accordingly. Using this approach, assessment
becomes a fluid, continuing activity intertwined with instruction rather than mere
documentation of student levels of attainment on end-of-unit tests.
Teachers sometimes struggle to provide rationales for their grading practices and
their assessment strategies. When they explain how and why they assess, they tend to
give more weight to their individual experiences, beliefs, and external pressures than
to principles of effective assessment (McMillan, 2003). We hope these guidelines,
which are based on principles of effective assessment, can serve as an important source
for the justification of the assessment practices you put in place in your classroom. We
believe they should coincide with, rather than contradict, your beliefs about assessment
and your experiences with it. Table 12.1 lists the guidelines.
A problem we sometimes see with teacher candidates is that they can be capti-
vated by interesting topics or activities (e.g., “I want to do a zoo unit.” or “I want to
design several lessons around the upcoming election.”) without having learning goals
to anchor and guide their planning. With clear goals, you can easily capitalize on the
teaching potential of creative strategies, materials, and topics (the zoo or the election).
A laser-like focus on the end you have in mind is crucial for separating wheat from
chaff as you design engaging instruction and figure out how to maximize learning.
Remember that merely offering appealing activities such as games does not guarantee
high motivation or strong learning (Sanford et al., 2007). You should build all your
instructional and assessment activities on the foundation of your learning goals and
your curriculum. This habit will make it easy to justify clearly to yourself and others
why you are doing what you are doing in your classroom.
Some music teachers we know want more than anything to instill a love of music in their students,
and they keep this goal in mind as they design each lesson. Other teachers stress pride
in doing a good job as a goal and encourage students to believe that several drafts of
an essay should be the norm and not the exception. Others emphasize the importance
of seeing the classroom as a supportive community for learning, encouraging students
to rely on each other and celebrating their diversity. We also hope many aim to equip
their students as well-informed, active participants in a democracy. The point is to
articulate for yourself the overarching personal goals you care about so you can build
in activities to support them every day.
This is why we believe that "checking as you go" using formative assessment is essential. Evi-
dence continues to accumulate that, when used correctly, it is one of the most effective
interventions available to enhance student achievement.
If you want students to move beyond simple tasks and memorization of facts, you need learn-
ing activities that require students to manipulate ideas through thinking, writing, and
other complex understanding and skills. Rubrics are a valuable technique to help stu-
dents systematically work through and understand the characteristics of excellent work
for these more complex tasks. They aid students in envisioning the goal and analyzing
the parts that will add up to the whole. With an eye to the important understandings
and skills in your learning goals, you can construct rubrics to help both you and your
students remember what is important.
As discussed in Chapter 8, useful rubrics must communicate the central features
of a quality performance. You must figure out how to put into words the elusive factors
that determine whether writing is "vivid" or "concise" or whether a project is "well-organized" or
“creative.” One of the best ways to do this is to have a discussion with your students
to help you define the levels of quality. Structuring assessment activities with rubrics
is a useful way to get students in the habit of thinking about and evaluating their own
and each other's work, so that formative and self-assessment automatically become
incorporated into the process.
when problems arise, checking their instructional toolbox for new solutions for
reaching their goals.
We believe these six principles can be the foundation for excellent assessment
that promotes student growth. By now you understand well that assessment is not just
an afterthought at the end of the term. It is an ongoing process intertwined with your
classroom activities.
TABLE 12.2
Efficiency Strategies for Effective Assessment

Strategy: Carefully target the feedback you provide.
• Choose only one element for feedback and vary it across time.
• Choose only each student's most problematic element for feedback.

Strategy: Build in time for self- and peer review and feedback.
• Have students take responsibility for comparing their work to a standard and make changes consistent with it.

Strategy: Structure record keeping to encourage student self-monitoring.
• Provide progress monitoring tools and class time for updates.
Station 2
Slices or Chunks?
Price of Kroger Pineapple Chunks: $1.89
Price of Dole Pineapple Slices: $3.29
1. What is the price per ounce for the Kroger Pineapple Chunks? (show the rate!)
2. What is the price per ounce of the Dole Pineapple Slices? (show the rate!)
3. Which is the better buy?
4. Why might someone choose to purchase the more expensive product?
FIGURE 12.1 Classroom Activity and Formative Assessment for Seventh-Grade Math Unit
Learning Goal, “Apply ratios, rates, and proportions to discounts, taxes, tips, interest, unit
costs, and similar shapes.”
We wean our own students from expecting every piece of work to be graded by assigning
homework almost every day, but we award points for it on a random basis. We always
collect the work and read through it to look for patterns of errors in thinking or use of
skills, but that usually goes quickly compared to scoring every paper. On average, we
give points for roughly half of the assignments. To ensure quality work, we often use
homework questions as part of the foundation for more involved exam questions. When
we do assign points for homework, if students do not get full credit, they can redo the
assignment. These practices usually stimulate students to produce quality work consis-
tently even when points aren’t awarded. We also encourage them to think about the
benefits of their changing work habits in terms of development of mastery goals.
Grading and feedback on student writing can be a particularly problematic and
time-consuming area for teachers. For example, if eighth graders write just 15 minutes
per day, a class of 25 students produces an average of 375 sentences (Heward et al.,
1991). If you multiply this by several classes, providing feedback can be overwhelming
for a teacher. However, with selective checking it can be managed. Heward and his
colleagues suggest a strategy: The teacher reads and evaluates 20–25% of the papers
written each day. All students’ papers are returned the next day, and the teacher uses
the work she evaluated to develop instruction, similar examples, and feedback for
groups of students who need to work on those particular writing skills. (We caution
teachers not to use too many bad examples directly from student work to avoid dis-
couraging them.) Because this strategy is useful for differentiating instruction, some-
times the whole class and sometimes smaller groups will be involved, depending on
need. Students can receive points for producing writing each day (or on random days).
The whole class can receive additional bonus points if a high percentage of the papers
evaluated on a given day meet the criteria the teacher emphasized in a classwide daily
lesson. A small sample of representative students’ work can provide information about
understanding and skills to work on with a larger group, and this procedure can be
generalized to most subject matter. The teacher can also use this process with other
methods of examining student understanding, such as small discussion groups or
informal teacher conferences.
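As a rough sketch of how the daily sampling in this strategy might be organized, a teacher could draw a random subset of papers each day. The roster and the sampling fraction below are placeholders, and this is our illustration of the idea rather than Heward and colleagues' exact procedure.

import random

def todays_papers(roster, fraction=0.25, seed=None):
    """Choose roughly 20-25 percent of the class's papers to read closely today."""
    rng = random.Random(seed)
    k = max(1, round(len(roster) * fraction))
    return rng.sample(roster, k)

roster = ["Student %d" % i for i in range(1, 26)]   # a class of 25
print(todays_papers(roster, fraction=0.2))          # about five papers to evaluate in depth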
still limiting the scope of your comments to concentrate on improving one element of
the work. Even with a narrowed focus for explicit feedback, you should also try to
point out at least one exemplary aspect of student work to reinforce and review previ-
ous instruction.
TABLE 12.3
Sources for Assessment Banks
Materials from teachers you know in your content area
different groups of students, modifying the basic questions to address each graph’s
content. As long as students have not seen the content of the graph before, such items
can allow you to check on this competency. Similarly, many teachers develop perfor-
mance assessments that can be tweaked to address different types of content. For
example, one social studies teacher we know developed a performance assessment
requiring students to design a page of a newspaper reflecting a historical period studied
in the class. The assignment calls for different articles displaying both news and
commentary as well as pictures, drawings, or political cartoons reflecting the period
studied. He uses this assignment only once per class, but he has used it for a variety
of significant periods in U.S. history. His effort to develop a clear assignment and
effective rubric has paid off because he can use and refine the basic elements many
times. Such item “shells” or “models” can be the foundation for a systematic and effi-
cient approach to item writing (Haladyna, 1994; Johnson et al., 2009).
ASSESSMENT IN THE CONTEXT OF A DEMOCRATIC SOCIETY: CLASSROOM EXAMPLES
We now turn to actual classrooms to demonstrate assessment practices in action that
incorporate the essential assessment guidelines and promote the understanding and
skills needed for participation in a democratic society. We visit two schools with very
different approaches and assumptions about teaching and learning. We use them to
illustrate how the elements of effective classroom assessment can encompass various
teaching philosophies and teaching methods. We describe some of the typical assess-
ment practices at each school and illustrate how they address our three themes (equal
access to educational opportunity, promotion of self-governing skills, and development
of critical thinking skills) related to preparation for democratic participation.
TABLE 12.4
Sample Schedule from a Center for Inquiry Third-Grade Class
Time Activity Description
8:10–8:45 Exploration: Children work in areas of personal interest (e.g., make entries in class journals, read independently, play chess) and interact with the teacher and each other informally.
8:45–9:30 Morning Meeting: Teacher and students engage in wide-ranging discussion (e.g., current events, science, music) to foster connections between school content and personal lives.
9:30–11:00 Reading Workshop (MWF) and Writing Workshop (TTh): Reading Workshop includes read-alouds, independent reading, reading conferences, and literature study groups. Writing Workshop includes activities at all phases in the writing cycle, from choosing a topic, to learning and sharing skills and strategies of good writers, to self-editing and author's circle.
12:00–12:15 Chapter Book Read-Aloud: Teacher chooses selections to read that are tied to aspects of the curriculum.
12:15–1:15 Math Workshop: Children investigate the purposes, processes, and content of the mathematical system in natural contexts (e.g., calculating the most economical plan for a field trip or interpreting results of a survey they designed).
1:15–2:30 Focused Studies (Science or Social Studies Units): Children explore integrated units of study related to broad themes such as change, cycles, or systems, which include emphasis on the central role of oral and written language.
2:30–2:50 Reflection, Friendship Circle, and Homework: The class gathers materials to take home; then they reflect on the day together.
the standards depending more on children’s interests and needs than on the more
specific scope and sequence formats used in many traditional schools.
These teachers also have overarching personal and schoolwide learning goals
related to their shared philosophy of fostering a culture of inquiry to guide the struc-
ture of their classroom activities and their assessments. This philosophy involves inten-
tionally designing learning experiences connected to life outside the classroom based
on broad, integrative concepts that involve collaboration of teachers and students in
the learning process.
TABLE 12.5
Key Assessment Practices at the Center for Inquiry
Kidwatching: Assessment strategy requiring teachers to watch carefully and listen closely to
provide optimal learning experiences. Sample sources for kidwatching include the following:
Reading Conferences: Teacher listens to and tape-records a student reading a book. Both listen to the tape together and discuss their insights on the student's strengths, needs, and preferences as a reader.
Written Conversations: Teacher and student hold a back and forth discussion in writing about a book the student or the class is reading.
Morning Math Messages: Teacher designs a math challenge related to the class's current math investigations. Students work individually on the morning message, then one child describes the strategies he used to solve the challenge. Other students then respond with questions, connections, and/or appreciations.
Strategy Sharing: At the end of the writing workshop, several authors describe a writing strategy they believe others might be able to use, or describe a writing dilemma they are experiencing and solicit advice from classmates.
the writing workshop allows attention to specific needs through authors’ circles and
individual writing conferences. During the math workshop, the students work together
in groups and coach each other in problem solving, so children at all levels benefit.
Another opportunity for providing additional literacy experiences for students
who need them is the weekly Literacy Club after school. Activities for this targeted
small group include fellowship and a snack, word games, and intensive reading and
writing activities aimed at accelerating growth, such as helping each other learn to use
effective strategies when encountering a difficult passage in the text. When the class is
working on a literature study of a particular book, Mr. O’Keefe has Literacy Club
students read ahead with coaching on strategies so they can participate more indepen-
dently and successfully with peers.
TABLE 12.6
Center for Inquiry Assessment Strategies to Enhance Mastery Goals
Classroom Element Center for Inquiry Strategies
for a rain forest protection organization. Because these explorations were grounded in
the natural progression of student interests, they also addressed real-world issues.
These engaging instructional units also incorporated key state standards related to
reading, writing, science, and social studies.
Second, at the Center for Inquiry, student participation in decision making, not
only about instruction but also about assessment, is part of everyday classroom life. For
example, when Ms. Shamlin’s class was designing the expert projects on animals, she
negotiated with them the required criteria and a list of suggestions. When Mr. O’Keefe
conducts reading conferences, he first solicits the students’ perceptions about themselves
as readers. When students participate in strategy-sharing sessions after the writing work-
shop, they, themselves, decide which strategies are worth sharing. And, at report-card
time, students complete a self-assessment of their progress during that marking period.
Finally, the individualized assessment embodied in the various kidwatching
strategies encourages students to focus on their own progress and not to compare
themselves to other students. In fact, the students would have a difficult time making
such comparisons because most assessments, such as reading conferences and written
conversations, are geared to the individual student’s level, interests, and needs.
All these elements contribute to Center for Inquiry students developing mastery
goals and high motivation for academic tasks. They know their opinions and questions
are valued. They know how to take action to achieve a goal. They are practicing the
self-governing skills necessary to make good decisions and to develop control over
their own lives as they become active, contributing citizens.
TABLE 12.7
Typical Schedule at KIPP Charlotte
Time Activity
7:30–8:00 a.m. Morning Meeting: All students and staff have breakfast, announcements,
discussion of whole school issues, and silent reflection.
8:00–9:20 Math
9:25–10:45 English
10:50–12:15 History
1:15–2:30 Study hall and “No Shortcuts” (reading intervention for lowest third of
students)
2:35–4:00 Science
4:00–5:00 Silent sustained reading (SSR) and “No Excuses” (math intervention for
lowest third of students)
the students are African American, 5 percent are Hispanic, and 65 percent are eligible
for free or reduced-fee lunch.
We focus on the fifth-grade class. These students are termed “the Pride of 2015”
because they will begin college in 2015, and each class is a family (as in a pride of
lions). The outline of a sample daily schedule appears in Table 12.7. Notice the school
day is much longer than the typical middle-school day. This is because all KIPP schools
accept many students with poor academic records—the average KIPP student starts
fifth grade at the 34th percentile in reading and the 44th percentile in math. To enable
KIPP students to catch up to peers, they spend 60 percent more time in school than
average public school students. In addition to the longer day, “KIPPsters” attend school
every other Saturday and for three weeks during the summer.
TABLE 12.8
Key Assessment Strategies at KIPP Charlotte
Diagnostic Assessments: At the beginning of the school year, all students take a diagnostic assessment designed by the teachers directly aligned with the state standards in each discipline.
Benchmark Tests: Every six weeks in each core subject, students take a test tied to the standards worked on during that marking period. For each subject, students receive results keyed to each objective with a grade of "A" (90%), "B" (80%), or "NY" (Not Yet—below 80%).
Checks for Understanding: During instruction, teachers frequently ask for thumbs up if students agree or thumbs down if they don't.
to carefully keep track of student performance on each objective also helps shape the
next steps in instruction. You can see key assessment strategies in Table 12.8.
These assessments offer a range of information for teachers and students to
enhance learning. For example, the diagnostic tests let Ms. Adams, the math teacher,
know that she needed to start with third- and fourth-grade objectives because her fifth
graders needed a thorough review. The diagnostic tests also determined which students
needed more intense intervention, and the lowest third of the class receives extra
instruction in math and reading. Similarly, tracking benchmark test results (Figure 12.2)
allows students and teachers to determine strengths and weaknesses and to work to
improve in specific areas by later taking mastery quizzes. At first, the teachers weren’t
sure whether students would appreciate such detailed feedback, but as Mr. Pomis, the
science teacher, told us, “The more we share the more they get excited about it.”
I Am a Meteorologist.
I am a science genius.
My name is Keyshawn Davis.
My overall average on Benchmark One is 86%.
Objective 1 (Benchmark: 80%): I can explain how the sun heats the earth during day and night.
Objective 2 (Benchmark: 60%; Mastery quiz: 80%): I can explain how the sun heats the earth during the four seasons.
Objective 3 (Benchmark: 100%): I can explain how the sun heats the land, water, and air.
Objective 4 (Benchmark: 60%; Mastery quiz: 90%): I can identify major air masses in North America.
FIGURE 12.2 Sample Student Results After Benchmark One in Science and Subsequent
Mastery Quiz
Equal Access to Educational Opportunity One important way all KIPP schools focus
on closing the achievement gap and providing equal access to educational opportunity
is through extra instructional time allowing for accelerated learning. Because many
KIPPsters start fifth grade well behind typical peers, a basic necessity is to provide a
60 percent longer school day than typical middle schools so that they will have more
time to catch up. They must make more than a year’s growth in a year’s time. In addition,
KIPP Charlotte provides extensive intervention programs in math and English for the
lowest third of the class during times that do not interfere with the four core courses.
Another element related to KIPP Charlotte efforts to help students close the
achievement gap is a culture of high expectations. For example, when Ms. Young’s
English students were designing sentences using the prefix “bi,” one of the students
offered a sentence with the word "bicycle." She stopped him, saying, "Why did you use
bicycle? You forfeited an opportunity to expand your vocabulary,” because he did not
choose a less-common word.
Toward the end of science class, Mr. Pomis said, “If your table is super focused, you’ll
get the homework first.” This statement conveyed the belief that getting the homework was
something to be anticipated with excitement, not something to dread. Similarly, when ask-
ing a question, he paused as hands flew up saying, “I’m waiting for 100%,” communicating
the assumption that every student should know the answer and should be participating.
The use of “Not Yet” grading by all KIPP Charlotte teachers also sends a message
of high expectations. On benchmark tests, students receive an “A,” a “B,” or “NY” for
each objective. “Not Yet” implies the student certainly will reach the objective when
the student puts in more work. Providing opportunities to work on “NY” objectives
also conveys the expectation that everyone will reach them.
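The grading rule itself is simple enough to express in a few lines. In the sketch below, the cutoffs (an "A" at 90 percent and above, a "B" at 80 to 89 percent, and "NY" below 80 percent) are our reading of the benchmark reports described above; KIPP Charlotte's actual records may handle borderline cases differently. The objective scores come from Keyshawn's Benchmark One results in Figure 12.2.

def objective_grade(percent_correct):
    """Map an objective score to an 'A', 'B', or 'NY' (Not Yet) label."""
    if percent_correct >= 90:
        return "A"
    if percent_correct >= 80:
        return "B"
    return "NY"   # flagged for another attempt via a mastery quiz

benchmark_one = {"Objective 1": 80, "Objective 2": 60, "Objective 3": 100, "Objective 4": 60}
grades = {obj: objective_grade(score) for obj, score in benchmark_one.items()}
print(grades)   # {'Objective 1': 'B', 'Objective 2': 'NY', 'Objective 3': 'A', 'Objective 4': 'NY'}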
A recent independent evaluation of a national sample of KIPP schools showed stu-
dents who spent fifth through seventh grades at a KIPP school improved on the Stanford
Achievement Test, a nationally norm-referenced large-scale achievement test, from aver-
age scores at the 34th percentile to average scores at the 58th percentile in reading. In
math they grew from the 44th percentile to the 83rd percentile in those three years.
TABLE 12.9
KIPP Charlotte Assessment Strategies That Embody Classroom
Elements to Enhance Mastery Goals
Classroom element: Students participate in decision making.
• Students decide when they are ready to take mastery quizzes.
• Self-assessment opportunities during instruction.
• Classes "co-investigate" benchmark test trends and problem-solve next steps.
TABLE 12.10
Student-Generated Persuasive Writing Topics
Should KIPPsters have more days at school?
Should KIPPsters have the opportunity to eat dinner with teachers every Friday?
When students do independent reading, they complete written questions about their
book, such as “Explain how this book connects to your life.” and “To whom would
you recommend this book and why?”
In relation to the second classroom element, student participation in decision mak-
ing, the assessments at KIPP are set up so that students must make crucial decisions about
their learning. They can individually decide how to study for and when to take mastery
quizzes to improve their grades on benchmark tests. After benchmark tests, an entire class
“co-investigates” the resulting data on a class spreadsheet. They discuss trends, major
successes and shortcomings, and areas on which the class should focus. As a class, they
look for root causes of poor performance and problem solve how to address them.
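A class spreadsheet like the one described here boils down to a simple summary: average each objective across students and flag the objectives the class has not yet mastered. The sketch below is only an illustration of that summary step; Keyshawn's scores are taken from Figure 12.2, while the other students and the 80 percent threshold are invented.

benchmark = {
    "Keyshawn": {"Objective 1": 80, "Objective 2": 60, "Objective 3": 100, "Objective 4": 60},
    "Maria":    {"Objective 1": 90, "Objective 2": 70, "Objective 3": 90,  "Objective 4": 80},
    "Jordan":   {"Objective 1": 70, "Objective 2": 50, "Objective 3": 80,  "Objective 4": 90},
}

objectives = sorted({obj for scores in benchmark.values() for obj in scores})
class_means = {obj: sum(s[obj] for s in benchmark.values()) / len(benchmark) for obj in objectives}
focus_areas = [obj for obj, mean in class_means.items() if mean < 80]   # objectives still in "Not Yet" territory

print(class_means)
print("Class focus areas:", focus_areas)   # Objectives 2 and 4 with these made-up numbers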
In relation to the third classroom element, personal goals and progress monitor-
ing, students are encouraged in every core subject to focus on their individual goals
and track their own progress toward meeting the state standards. The careful alignment
of the benchmark assessments to the standards allows students to see their strengths
and weaknesses clearly. The “Not Yet” grading system and the use of mastery quizzes
allows them to work toward their personal goals and master the understanding and
skills they personally need.
Development of Critical Thinking Skills The ability to think critically is essential for
citizens of a democracy. The assessment strategies at KIPP Charlotte provide some
examples of fostering critical thinking among students.
Students are challenged to think critically with many types of writing assignments.
In science, at the end of a unit, students write five-paragraph information essays to sum-
marize what they have learned about, for example, meteorology. In English, they con-
struct persuasive essays. This focus on nonfiction writing helps students learn to
formulate ideas with clarity and precision. It also provides teachers with insights about
student strategy use, thinking processes, and what challenges and misconceptions remain.
This information is invaluable for identifying obstacles—vocabulary issues, reasoning
problems, writing fluency—to student learning to be addressed later in instruction.
To encourage analysis skills, teachers design mastery quizzes to cover more than
one skill (e.g., both ordering and estimating in math). In the directions, students are
told to do only the items addressing the skill on which they are working. Thus they
must be able to discriminate among different types of problems, which requires them
to move up a level from solving problems to analyzing characteristics of problems
related to different skills.
Similarly, in English class, Ms. Young often asks students to describe strategies
they found useful in completing a homework assignment. Students must think not
merely about appropriate answers to questions; they must also decide which of the steps they
took to arrive at them were most effective. Stressing these metacognitive skills helps
students monitor themselves so they become more efficient at learning.
KEY TO ASSESSMENT IN THE CONTEXT OF DEMOCRATIC PARTICIPATION
As we look at what our two schools have in common, we see that formative assess-
ment—checking where students are in relation to the learning goals, providing feed-
back, and then offering opportunities for improvement—is the key. We believe that
formative assessment is the foundation on which skills for democratic participation
can be built. Using formative assessment also addresses all three themes we have woven
throughout this book, about using assessment practices to help students become effec-
tive citizens in a democracy (see Table 12.11).
TABLE 12.11
Formative Assessment and Promoting Democratic Values
Formative Assessment Characteristic Democratic Theme Addressed
Center for Inquiry, students engage in frequent writing tasks and other complex assign-
ments as formative assessment. Finally, because formative assessment involves compar-
ing one’s own work to a standard for the purpose of improvement, engaging in self- and
peer-assessment fosters the growth of critical thinking. Activities such as strategy shar-
ing, checks for understanding, and student contributions to rubric development are
examples we have seen at our two schools.
We hope this review has consolidated your understanding about formative
assessment and the many ways formative assessment can be embedded in your
classroom practice. As a guide for applying formative assessment across learning
activities, Table 12.12 provides formative strategies for informal classroom activities,
Table 12.13 offers strategies for adapting quizzes and tests for formative purposes,
and Table 12.14 lists some suggestions for formative assessment related to perfor-
mance products.
TABLE 12.12
Formative Assessment Strategies for Informal Classroom Activities
Oral questions requiring student justification of their answers.
Oral questions at higher levels of Bloom’s taxonomy (see Box 4.1 and Tables 4.4 and 4.5 in
Chapter 4).
Quick writes to key questions about learning as homework or at the end of class (e.g., exit
or entry tickets).
Instructional tasks that build in opportunities for formative assessment (e.g., grocery store
“stations,” reading conversations).
Use of white boards or hand signals for student answers or reactions during
instruction.
TABLE 12.13
Strategies for Adapting Quizzes and Tests for Formative Purposes
Ungraded quizzes.
Morning math messages where students work on a problem or challenge alone, then share
questions, connections, appreciations.
Label test questions by content area and allow students to analyze strengths and
weaknesses of their results across the content areas tested.
After feedback on a test or quiz, provide an opportunity for students to close the
gap between where they are and where they need to be using mastery quizzes or
other methods.
TABLE 12.14
Strategies for Formative Assessment Related to Performance Products
Discuss rubric criteria and their application to sample products.
Authors’ Chair where students share writing and receive feedback from classmates and
teachers.
Critical reflection by student on a work in progress using rubric for the product.
NOW IT'S YOUR TURN: SETTING PERSONAL GOALS FOR CLASSROOM ASSESSMENT
We have learned many important lessons about human behavior throughout our
careers in education. One of the most important, as we discussed in Chapter 5, is the
effectiveness of setting goals and monitoring progress for achieving important ends—
including efforts to help us become better teachers. When you write down your goals,
you make them tangible. Explicit goal setting allows you to develop a clear personal
vision of what you want to accomplish based on a strong rationale for why you want
to accomplish it. Monitoring progress toward your goal also provides feedback and
reinforcement that keeps you on track and bolsters your persistence (Locke & Latham,
2002). In terms of usefulness of application to work settings, goal setting is one of the
highest-rated theories of motivation (Latham et al., 1997). We would like you to take
a few moments now to follow a few steps to set a goal to incorporate assessment
strategies that foster preparation for democratic life among your students.
of video clips of a speech as they are preparing it for presentation,” you have something
a lot more concrete (and challenging) to work from.
If you think you lack the understanding and skills to use some of the types of
formative assessment, you are not alone. A gap between understanding and use of
effective strategies often occurs as teachers try new formative practices (Gearhart et al.,
2006). If you think you need more training, exposure, or experience, you can develop
a goal providing for that. For example, your initial goal might be to (1) find two teach-
ers at your school who use students to help design rubrics for self-assessment, and (2)
secure their permission to allow you to observe the process during your planning
period. The key is first choosing a specific goal to accomplish that fits you and your
school environment and then getting it down on paper. This action fosters your com-
mitment to achieve the goal (Locke & Latham, 2002).
3. Develop a Plan
Making a list of obstacles that may impede completion of your goal along with actions to
take to eliminate those threats is the first important step in developing your plan. This step
helps you foresee problems you may run into and gives you an opportunity to anticipate
ways to deal with them and make them manageable. It also increases the likelihood of
success, so your motivation increases. For example, you may expect to have difficulty
squeezing time out of your busy schedule to spend with students designing that
rubric for your speech class. Pull out your weekly planner and decide right away which
of your usual activities you might shorten or forgo so this activity can be scheduled.
After you have identified and dealt with potential obstacles, you can make a list
of the actions you will take to reach the goal. For example, consultation with other
teachers could be useful, as well as developing documents or figures to support the
process. One of the actions you list should be reviewing progress toward your goal
each week. In fact, you may want to design a mastery monitoring checklist or graph
(Chapter 5) to help you plot your progress toward the goal. Each time you complete
an action, you can then tangibly see your movement toward the goal.
4. Self-Evaluate
Here we find again the importance of teacher reflection. The inquiry stance we have
talked about since Chapter 1 can aid you as you aim to reach new personal goals
associated with designing classroom assessments fostering the understanding and skills
students need to become responsible citizens in a democracy. A formative-assessment
approach to dealing with setbacks realistically acknowledges that reaching your goal
may require some trial and error.
We hope that as you have studied this text, you have learned new skills to help
you achieve your goals as a professional who meets students where they are and helps
all students learn to their potential. Perhaps more important, we hope we have con-
vinced you why that aspiration is so crucial for our future, our children’s future, and
the future of democratic societies around the world.
11. What evidence for the six essential guidelines for assessment do you see among
practices at KIPP Charlotte and the Center for Inquiry?
12. Design a specific, challenging short-term goal related to incorporating
assessment strategies fostering skills for democratic participation among your
students using the four steps for personal goal setting at the end of the chapter.
HELPFUL WEBSITES
http://www.nsta.org/
The National Science Teachers Association website has a wealth of resources targeted to different
grade levels that can contribute to your assessment notebook, as well as listserv
connections, publications, and recommendations on lessons and materials.
http://www.ncte.org/
The National Council of Teachers of English website has teaching resource collections related to
many aspects of teaching English, including classroom assessments. It also features
materials related to a wide range of topics from grammar to poetry.
REFERENCES
Ames, C. 1992. Classrooms: Goals, structures, and student motivation. Journal of Educational
Psychology 84(3): 261–271.
Andrade, H. 2008. Self-assessment through rubrics. Educational Leadership 65: 60–63.
Bandura, A. 1997. Self-efficacy: The exercise of control. New York: Freeman.
Edmonds, R. 1979. Effective schools for the urban poor. Educational Leadership 37: 23.
Gearhart, M., S. Nagashima, J. Pfotenhauer, S. Clark, C. Schwab, T. Vendlinski, E. Osmundson,
J. Herman, and D. Bernbaum. 2006. Developing expertise with classroom assessment in
K–12 science: Learning to interpret student work. Educational Assessment 11(3 & 4):
237–263.
Haladyna, T. 1994. Developing and validating multiple-choice test items. Hillsdale, NJ: Lawrence
Erlbaum Associates.
Heward, W., T. Heron, R. Gardner, and R. Prayzer. 1991. Two strategies for improving students’
writing skills. In G. Stoner, M. Shinn, and H. Walker (eds.), Interventions for achievement
and behavior problems, pp. 379–398. Silver Spring, MD: National Association of School
Psychologists.
Jennings, L. 2001. Inquiry for professional development and continual school renewal. In H. Mills
and A. Donnelly (eds.), From the ground up: Creating a culture of inquiry, pp. 33–54.
Portsmouth, NH: Heinemann.
Johnson, R., J. Penny, and B. Gordon. 2009. Assessing performance: Developing, scoring, and
validating performance tasks. New York: Guilford Publications.
Latham, G., S. Daghighi, and E. Locke. 1997. Implications of goal-setting theory for faculty
motivation. In J. Bess (ed.), Teaching well and liking it, pp. 125–142. Baltimore: Johns
Hopkins.
Leary, M. R. 1999. Making sense of self-esteem. Current Directions in Psychological Science 8: 32–35.
Locke, E., and G. Latham. 2002. Building a practically useful theory of goal setting and task
motivation: A 35-year odyssey. American Psychologist 57: 705–717.
Marsh, H. W., and R. Craven. 1997. Academic self-concept: Beyond the dustbowl. In G. D. Phye
(ed.), Handbook of classroom assessment: Learning, achievement, and adjustment. San Diego,
CA: Academic Press.
McMillan, J. 2003. Understanding and improving teachers’ classroom assessment decision
making: Implications for theory and practice. Educational Measurement: Issues and
Practice 22(4): 34–37.
O’Keefe, T. 2005. Knowing kids through written conversation. School Talk 11(1): 4.
Okyere, B., and T. Heron. 1991. Use of self-correction to improve spelling in regular education
classrooms. In G. Stoner, M. Shinn, and H. Walker (eds.), Interventions for achievement
and behavior problems, pp. 399–413. Silver Spring, MD: National Association of School
Psychologists.
Penso, S. 2002. Pedagogical content knowledge: How do student teachers identify and describe
the causes of their pupils’ learning difficulties? Asia-Pacific Journal of Teacher Education
30: 25–37.
Rader, L. 2005. Goal setting for students and teachers: Six steps to success. Clearing House 78(3):
123–126.
Reeves, D. B. 2004. The 90/90/90 schools: A case study. In D. B. Reeves (ed.), Accountability in
action: A blueprint for learning organizations. 2nd ed. pp. 185–208. Englewood, CO:
Advanced Learning Press.
Saddler, B., and H. Andrade. 2004. The writing rubric. Educational Leadership 62: 48–52.
Sanford, R., M. Ulicsak, K. Facer, and T. Rudd. 2007. Teaching with games. Learning, Media and
Technology 32(1): 101–105.
Shamlin, M. 2001. Creating curriculum with and for children. In H. Mills and A. Donnelly (eds.),
From the ground up: Creating a culture of inquiry, pp. 55–77. Portsmouth, NH: Heinemann.
Stiggins, R., and N. Conklin. 1992. In teachers’ hands: Investigating the practices of classroom
assessment. Albany: State University of New York Press.
Wiliam, D., C. Lee, C. Harrison, and P. Black. 2004. Teachers developing assessment for learning:
Impact on student achievement. Assessment in Education 11: 49–65.
GLOSSARY
A

achievement gap The disparity in performance between student groups (e.g., ethnicity, gender, and/or socioeconomic status) on achievement measures such as large-scale tests and graduation rates.
achievement tests Instruments designed to measure how much students have learned in various academic content areas, such as science or math.
action research The process of examining and improving teaching practices and student outcomes using an inquiry stance.
affective domain Responses related to students' emotional or internal reaction to subject matter.
alignment The adjustment of one element or object in relation to others.
alternate-choice item Type of question that has only two options instead of the three to five options of a conventional multiple-choice item.
analytic rubric A scoring guide that contains one or more performance criteria for evaluating a task and proficiency levels for each criterion.
aptitude tests Instruments that estimate students' potential or capacity for learning apart from what they have already achieved and with less reference to specific academic content.
assessment The variety of methods used to determine what students know and are able to do before, during, and after instruction.
assessment bank An organized collection of assessments that can be modified and reused with different content and different groups.

B

backward design A process of planning instruction and assessment. After learning goals are identified, assessments measuring student achievement of learning goals are designed, and only then is instruction developed.
basic interpersonal communication skills (BICS) The informal language used to communicate in everyday social situations where many contextual cues exist (such as gestures, facial expressions, and objects) to enhance comprehension.
benchmark tests Assessments tied explicitly to academic standards and administered at regular intervals to determine extent of student mastery.
bias An inclination or a preference that interferes with impartiality.

C

ceiling effect Occurs when a student attains the maximum score or "ceiling" for an assessment and thus prevents appraisal of the full extent of the student's knowledge.
checklist A list of key elements of a task organized in a logical sequence allowing confirmation of each element.
cognitive academic language proficiency (CALP) The formal language that requires expertise in abstract scholarly vocabulary used in academic settings in which language itself, rather than contextual cues, must bear the primary meaning.
cognitive domain Processes related to thinking and reasoning.
complex multiple-choice A type of selected-response item that consists of a stem followed by choices that are grouped into more than one set.
confidence interval The range of scores that a student would attain if the test were taken many times.
construct The basic idea, theory, or concept in mind whenever measurement is attempted.
construct-related evidence All information collected to examine the validity of an assessment.
constructed-response items Questions in which students must create the response rather than select one provided. Examples include short-answer and essay items.
content-based essays A prompt that presents students with a question or task that assesses student knowledge in a subject area. Students respond in a paragraph or several paragraphs.
content-related evidence Refers to the adequacy of the sampling of specific content, which contributes to validity.
content representativeness Ensuring that the assessment adequately represents the constructs addressed in the learning goals.
contextual invisibility The concept that certain groups, customs, or lifestyles are not represented or are underrepresented in curriculum and assessment materials.
conventional multiple-choice items Questions that begin with a stem and offer three to five answer options.
criterion-referenced grading A grading process in which student work is compared to a specific criterion or standard of mastery.
criterion-referenced scoring A scoring process in which the individual student's score is compared to predefined standards. Also termed standards-based scoring.
criterion-related evidence Information that examines a relationship between one assessment and an established measure for the purposes of discerning validity.
curriculum-based measurement One form of general outcome measurement in which frequent brief assessments are used in the basic skill areas of reading, math, spelling, and written expression to monitor student progress and provide information for teacher decision making.
D
diagnostic assessment Assessment at the early stages of a school year or a unit that provides the teacher with information about what students already know and are able to do.
differentiation or differentiated instruction Using students’ current understanding and skills, readiness levels, and interests to tailor instruction to meet individual needs.
disaggregation of scores Separating the scores of a large group of students into smaller, more meaningful groups (such as gender, disability status, or socioeconomic status) to determine whether differences among these groups exist related to achievement.
distracters Incorrect options in multiple-choice questions.
documentation portfolio An organized body of student work that provides information for evaluating student achievement status for accountability purposes.
E
effort optimism The idea that effort brings rewards. For example, if you work hard in school, you will learn more.
environmental sources of error Extraneous factors in the assessment context that can influence scores on an assessment.
error The element of imprecision involved in any measurement.
essay item A prompt that presents students with a question or task to which the student responds in a paragraph or several paragraphs.
extrapolated Estimated outside the known range.
F
formative assessment Monitoring student progress during instruction and learning activities, which includes feedback and opportunities to improve.
frequency distribution A display of the number of students who attained each score, ordered from lowest to highest score.
frequency polygon A line graph in which the frequencies for each score are plotted as points, and a line connects those points.
G
general outcome measurement A method of monitoring progress that uses several brief, similar tasks (e.g., reading aloud in grade-level stories for one minute) that can indicate achievement of a learning goal.
grade equivalent score A number representing the grade level at which the median student gets the same number of questions correct as your student.
grouped frequency distribution A frequency distribution that groups scores in intervals (e.g., all scores in the 90–100% range are plotted together as one point on the distribution).
growth portfolio An organized collection of student work gathered to document student development of skills and strategies over time.
H
halo effect Effect that occurs when a teacher’s judgments about one aspect of a student influence the teacher’s judgments about other qualities of that student.
historical distortion Presenting a single interpretation of an issue, perpetuating oversimplification of complex issues, or avoiding controversial topics.
holistic rubric A scoring guide that provides a single score representing overall quality across several criteria.
I
individualized education plan (IEP) The program designed each year for a student with disabilities by the parents and teachers who work with that student.
inferences Assumptions or conclusions based on observations and experience.
informative writing A composition in which the author describes ideas, conveys messages, or provides instructions.
inquiry stance An approach to dealing with challenges in the classroom that involves identifying problems, collecting relevant data, making judgments, and then modifying practices based on the process to bring about improvement in teaching and learning.
internal consistency The degree to which all items in an assessment are related to one another and therefore can be assumed to measure the same thing.
interpretive exercise A selected-response item that is preceded by material that a student must analyze to answer the question.
interrater reliability The measure of the agreement between two raters.
item-to-item carryover Effect that occurs when a teacher’s scoring of an essay is influenced by a student’s performance on the previous essay items in the same test.
J–K
kidwatching An informal assessment technique requiring teachers to watch carefully and listen closely in an effort to provide optimal experiences for learning.
L
learning goals Learning outcomes, objectives, aims, and targets.
local norm A norm group made up of test takers from a restricted area (e.g., school district).
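To make the frequency-related terms above concrete, here is a brief worked illustration; the scores are hypothetical and chosen only for illustration:

\text{Hypothetical scores: } 72,\ 75,\ 75,\ 80,\ 84,\ 84,\ 84,\ 91
\text{Frequency distribution: } f(72)=1,\quad f(75)=2,\quad f(80)=1,\quad f(84)=3,\quad f(91)=1
\text{Grouped frequency distribution (10-point intervals): } 70\text{–}79: 3,\qquad 80\text{–}89: 4,\qquad 90\text{–}99: 1

A frequency polygon for these data would plot the frequencies 1, 2, 1, 3, and 1 as points above the corresponding scores and connect the points with a line.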
M
mastery goals Academic goals that focus on a desire to understand and accomplish the task and that assume ability can increase.
mastery monitoring A method of monitoring progress by tracking student completion of several different tasks that, when all are completed, indicates achievement of the learning goal.
matching format Assessment form in which two parallel columns of items (termed premises and responses) are listed and the student indicates which items from each column belong together in pairs.
mean A measure of central tendency that is the average of all scores in a distribution. It is calculated by adding all the scores together, then dividing by the number of scores.
median A measure of central tendency that is the exact midpoint of a set of scores.
metacognition The process of analyzing and thinking about one’s own thinking, enabling such skills as monitoring progress, staying on task, and self-correcting errors.
microaggressions Brief, everyday remarks or behaviors that inadvertently send denigrating messages to the receiver.
misalignment Occurs when learning goals, instruction, and/or assessment are not congruent.
mode The score achieved by more students than any other.
morning math messages A math challenge each morning related to the class’s current math investigations.
multiple means of engagement The variety of possible methods used to keep student interest and participation strong in a universal design for learning framework.
multiple means of expression The variety of possible methods in which students may respond to instruction or assessment tasks in a universal design for learning framework.
multiple means of representation The variety of possible methods by which instructional or assessment material is presented to students in a universal design for learning framework.
multiple true-false items Several choices follow a scenario or question and the student indicates whether each choice is correct or incorrect.
N
narrative reports Written descriptions of student achievement and progress each marking period.
narrative writing A form of composition in which the author relates a story or personal experience.
negative suggestion effect Concern that appearance in print lends plausibility to erroneous information.
No Child Left Behind Act (NCLB) U.S. federal law aimed to improve public schools by increasing accountability standards.
normal curve or normal distribution The theoretically bell-shaped curve thought to characterize many attributes in nature and many human traits.
normal curve equivalent (NCE) A scale ranging between 1 and 99.9 that has equal intervals between all the scores along the continuum.
norm group The group chosen by a test designer for the initial administration of a test to obtain a distribution of typical scores. Subsequently, test takers’ performances are compared to the scores of the norm group to determine their percentile ranks and similar scores when using a norm-referenced scoring system.
norm-referenced grading Assigning grades according to a distribution (e.g., 10% of the students will get an "A," 20% will get a "B," 40% will receive a "C").
"Not Yet" grading A grading system in which students receive an "A," a "B," or "NY" (not yet) for each learning goal. "NY" implies that all students will reach the goal when they put in more work, conveying high expectations and promoting the connection between student effort and achievement.
O
objectives Measurable and concrete descriptions of desired skills that are derived from learning goals.
options The potential answers with one correct response and several plausible incorrect responses in the multiple-choice assessment format.
outliers Extremely high or low scores differing from typical scores.
overinterpretation Incorrect assumptions and conclusions drawn when interpreting scores that are not warranted by the data available.
P
percentile or percentile rank A score indicating the percentage of students in the norm group at or below the same score.
percentile bands A range of percentile scores that incorporates the standard error of measurement.
performance assessments Assessments that require students to construct a response to a task or prompt or to otherwise demonstrate their achievement of a learning goal.
performance criteria The key elements of a performance specified in a scoring guide.
performance goals Academic goals that focus on performing well in front of others and that assume ability is fixed. Often contrasted with mastery goals.
persuasive writing A composition in which the author attempts to influence the reader to take action or create change.
portfolio A purposefully organized collection of student work.
process The procedure a student uses to complete a performance task.
product The tangible outcome or end result of a performance task.
proficiency levels The description of each level of quality of a performance criterion along a continuum in an analytic or holistic rubric.
prompt The stem (i.e., directions) of an essay.
psychomotor domain Processes related to perceptual and motor skills.
Q–R
range The distance between the lowest and highest scores attained on an assessment.
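As a brief worked illustration of the measures of central tendency and related terms defined above (the scores and norm-group figures are hypothetical):

\text{Hypothetical scores: } 70,\ 80,\ 80,\ 90,\ 95
\text{Mean} = \frac{70 + 80 + 80 + 90 + 95}{5} = \frac{415}{5} = 83
\text{Median} = 80 \qquad \text{Mode} = 80 \qquad \text{Range} = 95 - 70 = 25
\text{Percentile rank of a score of 83} = \frac{20}{25} \times 100 = 80,\ \text{assuming 20 of 25 students in the norm group scored at or below 83}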
ﱟﱟﱟﱟﱟﱟﱟﱟﱟﱟﱟﱟﱟﱟﱟﱟ ﱲ
INDEX
Note: The italicized f and t following page numbers refer to figures and tables, respectively.
Black, Angela, 270, 271f reliability, 154–157, 155t, 157t scoring short-answer items,
Black, P., 95 reliability improvement, 157–159, 158t 227, 242
Blackburn, Barbara, 336 reliability vs. validity, 168 short-answer formats, 226–227,
Bloom’s cognitive taxonomy, sufficiency of information, 157, 157t, 228f, 232–235, 232t
46–48, 47t 158t, 179 student-generated items, 255
Bloom’s cognitive taxonomy, revised, teacher biases, 171–174, 173t writing prompts for essay
49–50, 49t validity, 159–166, 160t, 161f, 162f, items, 227, 236–242, 238f,
boys. See gender differences 163t, 165t, 166t 240t, 241t
Bryan, Tanis, 106 validity improvement, 167, 167t constructing assessments, 213,
Buchanan, Beverly, 201 cheating, 25, 301–302 215–216, 215t
Bursuck, William, 305 checklists for scoring essays, 243, 243f, construct-related evidence
248, 249–250, 249t format variety, 163–164, 167t
C chemistry, mastery monitoring chart inferences and, 160–161, 161f, 162f
CALP (cognitive academic language for, 123t logic and common sense, 162–163,
proficiency), 85 Clinton, Tracie, 12 163t, 167t
Campbell, Donald, 336 Cochran-Smith, M., 21 content-based essays, 235–237, 235f,
“Campbell’s Law,” 336 cognitive academic language proficiency 236f. See also essay items
capitals, writing in all, 198–199, (CALP), 85 content-related evidence, 164–165,
213, 237 cognitive taxonomies, 45–50, 47t, 49t 165t, 167t
case studies collaborative quizzes, 104 content representativeness, 164–165, 165t
in characteristics of assessments, communication skills content standards, 39–41, 40t
178–180, 179t, 180t learning goals for, 38–39, 38t, 39f context, in performance assessments,
in constructed-response formats, by students learning English, 85 270
256–258, 257f, 257t complex multiple choice format, contextual invisibility, 176, 200–201
in diagnostic assessments, 89–90, 90t 187–188, 187f conventional multiple-choice items,
in formative assessments, 114–116, computer spreadsheets for mastery 186–187
115f, 116t monitoring, 129 Covey, Stephen, 36, 360
in learning goals, 58–61, 58t, 59t, 60t conceptual knowledge, 50, 51t criterion-referenced grading, 306
in performance assessments, 283–286, confidence intervals, 328, criterion-referenced scoring, of
283f, 284f, 285f 329f–331f, 347 standardized tests, 306,
in preassessment, 89–90, 90t confidentiality 324–325, 352. See also
in progress monitoring, 145–149, of mastery monitoring charts, 129 standardized, large-scale tests
146t, 148t of student examples, 97 criterion-related evidence, 165–166,
in reliability and diversity, of student scores, 26, 328 166t, 167t
178–180, 180t construct, definition of, 160 critical thinking
in selected-response assessments, 220 constructed-response formats, application to new contexts, 361
categorization, 191, 191f 225–259 at Center for Inquiry, 375
CBM (Curriculum-Based Measurement), accommodations for diverse cognitive taxonomies and, 45–50,
133–134, 136, 137 learners, 255–256, 256t 47t, 49t
ceiling effect, 87 case study, 256–258, 257f, 257t in essay items, 253–254
Center for Inquiry, 371–375 constructing the assessment, 242 in formative assessments, 96,
aligning goals, instruction, and content-based essay guidelines, 100, 101
assessment, 371–372 235–237, 235f, 236f importance of, 14, 178
equal access to educational essay format, 227–230, 229f, 230f, informal diagnostic assessment of, 74
opportunity, 373–374 235–237 at Knowledge Is Power Program,
key assessment practices, 372t general guidelines, 230–235, 380, 381–382
kidwatching, 371, 372–373 231t, 232t in multiple choice formats, 332
sample schedule, 371t learning goals and thinking skills, 226 in selected-response assessments,
teacher self-assessment, 373 novelty in, 231 216–217
characteristics of assessments pitfalls to avoid, 253–254, 253t in standardized tests, 335, 336, 339
assessment biases, 168–171, 169t scoring-error sources, 251–252, 251t culture. See also school culture,
case study, 178–180, 179t, 180t scoring essays, 242–247, 243f, 244f, students unfamiliar with
democratic values and, 177–178 245f, 246f, 247f competition vs. cooperation and, 88
diversity representation, 174–177, scoring guides, 242, 248–251, 248t, questioning patterns and, 88
175t, 176t 254–255 unintentional bias and, 169
curriculum, No Child Left Behind and, before classes start, 70–74, 71t, 73f building in time for self- and peer
334–335, 336 design steps summary, 77f review, 368
Curriculum-Based Measurement high expectations in, 68–70, combining instruction with
(CBM), 133–134, 136, 137 68t, 69f assessment, 366, 366f
importance of, 362 developing “assessment banks,”
D kidwatching, 372–373 369, 369t
data summaries learning goals and, 76–77, 86–87 encouraging student self-
confidence intervals, 328, meeting students, 74–75 monitoring, 368
329f–331f, 347 before new unit of instruction, enlisting students in assessment
disaggregation of scores, 144–145, 75–76, 76t design, 370
144t, 147–148, 148t overview, 15t, 16–17 summary of, 365t
frequency distributions, 136–138, pitfalls of informal methods, 75–76 targeting feedback, 367–368
138t, 139f preassessment analysis, 80–82, 81t effort optimism, 10–11
measures of central tendency, preassessment design and eligibility for special programs, 72
138–139, 344–345, 345f administration, 77–80, 79t, 80t Elkin, Naomi, 366
percentile rank, 325–326, 325f, preassessment KWL charts, 80 English language learners
346–347, 348f, 352 questions addressed by, 76, 76t background information needs, 85t
standard error of measurement, 328, triangulation of results, 70, 71f constructed-response items, 256t
329f–331f utilization of, 82–83, 82t diagnostic assessments, 84t,
tables, 139–144, 140t, 141t, 142t, 143t differentiation, 16 85–86, 85t
decision-making. See also grading directions eligibility for, 73
decision process for essay items, 236–237, 236f formative assessments, 113, 114t
ethical, 27–28, 28t ethics of accurate, 270 learning goals case study,
student participation in, 9t, 10, 375 for performance assessments, 59, 59t
democratic values 272–274, 274f, 280 norm-referenced scoring, 326
assessment practices and, 3–4, 4t scoring and, 254 performance assessments,
at Center for Inquiry, 373–374, 374t for tests, 213, 215 281t, 282
critical thinking. See critical thinking disabilities. See students with disabilities selected-response items, 217–219, 218t
equal access to educational disaggregation of scores, 144–145, 144t, vocabulary specific to disciplines, 86
opportunity. See equal access 147–148, 148t environmental distractions, 155, 155t
to educational opportunity distracters, in multiple-choice items, 186, equal access to educational
at Knowledge is Power Program, 186f, 205–206, 205f, 206f, 213 opportunity. See also
377–380, 377t, 378f, 379t district curriculum guides, 41 achievement gap
mastery goals and, 7–9, 7t diverse learners. See accommodations assessment bias from inequality
self-governing skills. See for diverse learners in, 170
self-governing skills diversity of the classroom. See also at Center for Inquiry, 373–374
Deno, Stan, 126, 129, 130, 131, 133, 134 minorities cultural biases in assessments
design of assessments case study on representing, 180 and, 169
backward design process, 36–37, contextual invisibility, 176, 200–201 as democratic value, 4, 4t
36f, 37t depicted in selected-response items, formative assessments and, 96, 381
construction process, 213, 215–216, 199–203, 201f, 202f, 203f importance of assessments in,
215t historical distortions, 176–177, 5, 177–178
diagnostic assessments, 76–77, 77f, 201–203, 202f, 203f at Knowledge Is Power Program,
89–90, 90t stereotypical representation, 174–176, 378–379
differentiation, 16 175t, 200 error. See also biases, in assessments
preassessment, 77–80, 79t, 80t documentation portfolios, 311, 311t, 315 definition of, 154
selected-response assessments, “do no harm,” 24 due to the occasion of testing,
213, 214f Drews, Leslie, 201 154–155, 155t, 158t
student help in, 370 in essay scores, 251–252, 251t
tables of specifications, 56, 57t E internal consistency, 155–156, 155t
Dewey, John, 19 educational opportunity. See equal access scoring issues, 156–157, 158t, 173t
diagnostic assessments, 67–91 to educational opportunity single scores as estimates, 351–352
accommodations for diverse efficiency strategies, 365–370 sources of, 154–157, 155t, 157t
learners, 83–89, 84t, 85t, 88t analyzing selected student work, standard error of measurement,
case study, 89–90, 90t 366–367 328
essay items. See also constructed- in diagnostic assessments, 68–70, formative task examples, 99t, 100f
response formats 68t, 69f functions of, 96
accommodations for diverse fostering high, 361 homework for, 106–107, 106t
learners, 255–256, 256t gender differences in, 69–70 importance of, 362–363
advantages and disadvantages, from learning goals, 361 informal classroom activities for,
229–230 outcomes and, 69–70, 69f 382, 382t, 383t
in case study, 256–258, scoring biases and, 173 journals or notebooks, 103, 115–116,
257f, 257t teacher behaviors communicating, 115f, 116t
content-based, 235–237, 68–69, 68t kidwatching, 372
235f, 236f extrapolated grade equivalent overview, 15t, 17, 23, 97t
format of, 227–229, 229f, 230f scores, 349 peer assessments, 104–106
in large-scale, standardized in progress monitoring, 124–125,
tests, 328 F 124t, 126t
performance criteria, 242–244, factual knowledge, 50, 51t purpose and validity, 159–160
244f, 245f, 248–249 families. See also parents quick writes, 104
persuasive essay assignments, assignments related to, 107 scaffolding and, 113
379t, 380 disorganized home life, 170 self-assessment, 96–98, 104, 105t
pitfalls to avoid, 253–254, 253t family homework packets, 22t simple responses in, 98–99
scoring by analytic rubrics, 243, feedback student awareness of learning goals,
244f, 246f, 249t, 250–251 careful targeting of, 367–368 96–98
scoring by holistic rubrics, 244, characteristics of useful, 108–109, student opportunities to make
245f, 247f, 249t, 250–251 108t, 109t improvements, 112
scoring checklists, 243, 243f, 248, 249t in formative assessments, 108–111, students comparing current and
scoring error sources, 251–252, 251t 108t, 109t previous assignments, 103
scoring guides, 242, 248–251, 248t, journal prompts as, 115–116, teacher bias, 173t
249t, 254–255 115f, 116t teacher observations and questions,
when to use, 229 judgment in, 110–111 100–103, 101t, 102t
writing prompts for, 236–242, 236f, on knowledge/understanding, formative tasks, in progress
238f, 240t, 241t 109–111 monitoring, 99t, 100f,
ethics, 23–28 on skills, strategies, and 124–125, 124t, 126t
behavior problems and procedures, 110 format variety, 163–164, 167t
assessment, 107 technology and, 111 frequency distributions, 136–138,
confidentiality, 97, 129, 328 fine motor difficulties 138t, 139f
“do no harm,” 24 constructed-response items frequency polygons, 137–138, 139f
giving low grades as motivation, 293 and, 256t
judgment calls, 26–27, 26t, 27t diagnostic assessments and, G
parents’ participation in grading 84, 84t gender differences
process, 306 formative assessments and, 114t in assessment depiction, 199–200,
receiving accurate directions, 270 performance assessments and, 199f
recommendations for making 281–282, 281t sexist language, 175–176, 176t
ethical decisions, 28, 28t selected-response items and, in stereotypical representation,
score pollution and, 24–25, 154, 337 217, 218t 175, 175t, 200
scoring of major assessments Flesch Reading Ease, 196 in teacher expectations, 69–70
responsibility of teacher, 156 formative assessments, 95–117 test scores and, 143–144
unfair treatment examples, 25 accommodations for diverse general outcome measurement,
evaluation standards learners, 113–114, 114t 129–134
feedback to students on, 109–110 avoiding grading, 111–112, 296 Curriculum-Based Measurement,
student understanding of, 98, 98t case study, 114–116, 115f, 116t 133–134, 136, 137
evaluator role, 292–293, 292t, 293t collaborative quizzes, 104 graphs of, 131–132, 132f
exit exams, 334 content-related evidence from, 165, mastery monitoring vs., 129–131,
exit tickets, 377t 165t, 167t 134–136, 135t
expectations democratic values and, 96, 381 scoring scale for, 131, 131t
achievement gap and, 378 enhancing instruction with, steps in, 130t
avoiding negative information from 112–113 Germundson, Amy, 113–114
previous years, 82 feedback in, 108–111, 108t, 109t gifted programs, eligibility for, 74
girls. See gender differences grading decision process, 295–306. home graphs, 12
goals. See also learning goals; See also data summaries homework, 106–107, 106t
mastery goals in borderline cases, 304, 305t humor, avoided in test settings, 204, 209
of No Child Left Behind Act, guidelines for, 295t
323–324 learning goals and, 300–304 I
performance goal orientation, 6, 7t median vs. mean scores, 299–300, In and Out Game, 75–76, 103
personal goals of students, 9t, 299t, 303, 303t Individualized Education Plan (IEP),
10–11, 122 number and type of assessments, 72, 305
personal goals of teachers, 361–362, 297–298, 298t Individuals with Disabilities Education
383–384 school district policy, 296 Act (IDEA), 72, 133
plan development for completing, 384 student involvement, 298–299, 305t inferences, from assessments, 161–162,
progress monitoring, 122–124, students with disabilities, 305 162f, 168
122t, 123t summative assessments, 296–297 informal assessments, pitfalls of, 75–76
steps in setting, 383–384 grading on the curve, 307 informative writing, 239, 240t
grade equivalent scores, 349–350 graphs inquiry stance, 19–20
grades of general outcome measurement, instructional content standards, student
ambiguous meaning of, 294–295 131–132, 132f, 135t familiarity with, 338–339, 338t
definition of, 291 in mastery monitoring, 126–129, interpersonal sensitivity, 105
as diagnostic data, 71 128t, 129t, 135t interpretation of tests. See test
in formative tasks, 111–112 grouped frequency distributions, interpretation
group grades, 303–304 137, 138t interpretive exercises, 191–194. See also
problems with, 291, 292t group projects, 303–304 selected-response assessments
as rewards and punishments, 304 growth portfolios, 310, 311t, advantages and disadvantages of,
symbolic role of, 293–294 313, 314t 192–194, 193f, 194f, 195f
grading, 291–318. See also scoring guided reading, action research overview, 191–192, 192f
advocate vs. evaluator in, 292–293, on, 22t when to use, 192
292t, 293t guidelines for assessments, 359–365 interrater reliability, 156–157
aptitude vs. ability, 302 beginning with the end in mind, interventions, formative assessment
attitude or behavior, 301–302 360–362 during, 23
cheating punishment, 302 finding out what students invented spelling, 19
extra credit, 302 know, 362 item-to-item carryover effects, 251t, 252
impact of zeroes, 303, 303t monitoring progress, 362–363
improvement and, 302 summary of, 360t J
lack of agreement on judgment teaching students to check as you journal prompts, 115–116, 115f,
process, 294 go, 363 116t, 131
late work, 303 using rubrics in attainment of judging role of teachers, 293–294, 293t
participation vs. contribution, 301 learning goals, 363–364
portfolios, 314–315, 315t Gundersheim, Stephen, 103 K
role of effort, 300–301 Guskey, Thomas, 53, 313 “kidwatching,” 371–372
selected student work only, knowledge and understanding, 50
366–367 H Knowledge Is Power Program (KIPP),
as a skill, 306 halo effect, 251t, 252 375–380
grading approaches hand-strengthening exercises, 21, 22t aligning goals, instruction, and
criterion-referenced, 306, handwriting biases, 173, 252 assessment, 376
324–325, 352 Hattie, John, 108 critical thinking skills development,
letter grades, 307 Hertel, Jo Ellen, 122 380, 381–382
narrative reports, 309, 373 higher-order thinking skills. See critical demographics of, 375–376
norm-referenced, 307, 325–326, thinking equal access to educational
325f, 332 high expectations. See expectations opportunity, 378–379
“not-yet,” 309, 377t, 379 high-stakes tests, 342. See also key assessment strategies,
percentage of total points, 308 standardized, large-scale tests 376–377, 377t
portfolios. See portfolios of historical distortions, 176–177, 201–203, self-governing skills, 379–380,
student work 202f, 203f 379t, 381
standards-based, 307–308 holistic rubrics, 244, 245f, 247f, 249t, teacher self-assessments, 377
total points, 308 250–251 typical schedule, 376t
knowledge types, 50, 51t student input in, 45, 46f matching items. See also selected-
Krathwohl, D., 52t tables of specifications, 53–57, 55t, response assessments
Kuhs, Therese, 309 57t, 178–180, 179t advantages and disadvantages of, 191
KWL charts, 80 from teacher editions, 41 form of, 190–191, 190f
writing, 43–45 guidelines to writing, 211–212, 211t
L learning goals, students who have when to use, 191, 191f
language already mastered mathematics
conventions of written, 198–199, constructed-response items diagnostic assessment of, 74
199f, 213 and, 256t for English language learners, 86f
reading demands in prompts, 272 diagnostic assessments and, 86–87 journal prompts for, 115–116,
simple, in constructed-response formative assessments and, 114t 115f, 116t
items, 231, 232f performance assessments and, McCarter, Lynn, 58, 58t, 60,
stereotypes in, 175–176, 176t 281t, 282 114–116, 131
large-scale standardized testing. See selected-response items and, McTighe, J., 36–37, 36f, 37t, 43, 360
standardized, large-scale tests 218t, 219 mean, 139, 344–345, 345f
learning disabilities. See literacy skills letter grades, 307 measures of central tendency, 138–139,
below typical peers literacy, societal impacts of, 21 344–345, 345f
learning goals, 33–64 Literacy Club, 374 median, 138–139
affective taxonomies, 51–53, 52t literacy skills below typical peers Mensik, Maria
assessment content representativeness constructed-response items essays in assessments, 256–258,
and, 164 and, 256t 357f, 357t
assessment selection and, 265f diagnostic assessments and, 87 learning goals, 58–59, 58t
in backward design, 36–37, formative assessments and, 114t matching items in assessments,
36f, 37t performance assessments and, 220, 220f
benefits of specifying, 37–39 281t, 282 performance assessments, 283–286,
case studies, 58–61, 58t, 59t, 60t, selected-response items and, 114t, 283f, 284f, 285f
178–180, 179t 218t, 219 reliability and diversity, 178–180, 180t
cognitive taxonomies and, 45–50, local norms, in scoring standardized tables of specifications, 178–179, 179t
47t, 49t tests, 325–326, 325f metacognition, 11–12, 96, 375
for communication skills, 38–39, in loco parentis, 24 metacognitive knowledge, 50, 51t
38t, 39f Lytle, S. L., 21 microaggressions, 171–172
in diagnostic assessment design, Mills, Elizabeth, 20–21
76–77, 89–90, 90t M Mîndril™, Diana Luminiτa, 200
from district curriculum guides, 41 Marzano, R. J., 131, 294, 297 minorities. See also diversity of
examples of, 34t mastery goals the classroom
filters for selecting, 41–43, 42t assessment tasks to enhance, achievement gap and, 4–5,
in grading process, 296 9–11, 9t 144–145
high expectations from, 361 Center for Inquiry strategies for, contextual invisibility of, 176
overview, 33–36, 34t, 35t 374–375, 374t depicting diversity, 199–203, 200f,
performance assessments and, individual progress and, 12–13 201f, 202f, 203f
264, 265f Knowledge Is Power Program historical distortions of, 176–177, 203f
performance criteria and, 248, 255 strategies for, 379t stereotypical representation of,
planning activities with the end in performance goals vs., 7, 7t, 13 174–175, 175t
mind, 361 self-assessment and, 11–12 misalignment, 35–36
portfolios of student work and, in standardized test preparation, mistakes, value of, 11
312–313, 315t 339 mode, 138
preassessment for, 89 student characteristics with, 7–9, monitoring. See mastery monitoring;
psychomotor taxonomies, 53, 54t 7t, 121 progress monitoring
as series of objectives, 376 mastery monitoring, 126–129 morning math messages, 373
simplifying, 45 charts and graphs in, 127t, 128–129, motivation
specificity of, 44–45, 44t 128f, 129f giving low grades as, 293
from state and national content comments on, 137 mastery goals and, 374–375
standards, 39–41, 40t general outcome monitoring vs., systematic progress monitoring and,
student awareness of, 37–38, 134–136, 135t 125, 126t, 136
96–98 steps in, 126, 130t in test preparation, 339
multiple-choice items. See also selected- national tests. See standardized, large- participation, as grading criterion,
response assessments scale tests 97–98
advantages and disadvantages of, Nazario, Carmen, 269 PDAs (Personal Digital Assistants), 111
188–189 NCE (normal curve equivalents), peer assessments, 104–106, 363, 368
alternate-choice items, 186–187, 187f 348, 348f percentile ranges/bands (confidence
cognitive levels and, 234f NCLB Act. See No Child Left intervals), 328, 329f–331f, 347
complex, 187–188, 187f Behind Act percentile rank (percentile)
conventional, 186–187, 186f negative language, 82, 250 in explaining to students and
distracters that provide insight, negative-suggestion effect, 190 parents, 352
186, 186f, 205–206, 205f, No Child Left Behind (NCLB) Act, interpreting, 346–347, 348f
206f, 213 17–18, 145. See also norm group used in, 325–326, 325f
in large-scale, standardized tests, standardized, large-scale tests performance assessments, 263–286
328, 332 achievement tests, 326–327, 327t accommodations for diverse
multiple true-false format, benefits and pitfalls of large-scale learners, 280–283, 281t
189–190, 189f tests, 333t advantages and disadvantages, 266
options arranged in logical order, comparisons of school districts, attaching a grade scale, 276–278, 277f
206–207, 207f, 208f 333–334 case study, 283–286, 283f,
providing clues to the answer, curriculum effects of, 334–335, 336 284f, 285f
208–209, 209f evaluation of results, 336–337 degree of structure in tasks,
reasonable options in, 206, 206f goals of, 323–324 272, 273f
stems written clearly in, 186, 186f, inappropriateness of tests for essay items. See essay items
204–205, 205f purposes in, 159 learning goals and, 264, 265f,
when to use, 188, 232–233, 233f score augmentation, 336 278–279, 278t
multiple means of engagement, 83–84, standardized tests required in, limiting performance criteria
113, 114t 323–324 in, 279
multiple means of expression, 83 state differences in requirements, 334 logistics in task directions, 272–274,
multiple means of representation, 83 normal curve equivalents (NCE), 274f, 280
multiple true-false format, 189–190, 189f 348, 348f materials and resources
Munk, Dennis, 305 normal distribution (normal curve), availability, 272
MySpace, portfolio use of, 314 344–346, 345f, 346f meaningfulness of tasks in, 268–270,
norm groups, in test interpretation, 268f, 269f, 271f, 279
N 325, 344–346, 345f, 346f periodic reviews in, 273–274
narrative reports, 309, 373 norm-referenced grading, 307 prompt guidelines, 267t
narrative writing, 239, 240t norm-referenced scoring, of providing practice before assigning
National Assessment of Educational standardized tests, 325–326, tasks, 279
Progress (NAEP) 325f, 332 reading demands in, 272
balanced views of historical events, “not-yet” (NY) grading, 309, 377t, 379 response format, 264, 270–271, 271t
201, 202f, 203f noun phrases, in learning goals, 44 scoring guides, 274–276, 275f, 279,
bias avoidance, 203–204, 204f 280, 284f, 285f
content-based essays, 236–237, 236f O specifying skills to be addressed,
content independent of other objectives, from learning goals, 376 266–268, 267f, 268f, 280
items, 196 options, in multiple-choice formats, 186 student-generated items, 279–280
conventions of written language, 199f oral presentations, rubric for grading, 98t weighting performance criteria, 276
distracters providing information outcome measurement. See general when to use, 265–266
about misconceptions, outcome measurement performance criteria, in essay scoring
205–206, 206f outliers, 139 in analytic and holistic rubrics,
diversity depictions, 199f overinterpretation, 352 243–244, 245f, 250–251
essay items, 229f in checklists, 242–243, 243f,
interpretive exercise format, P 249–250
194, 195f parents. See also families learning goals and, 248, 255
keeping answers independent of interpretation of large-scale tests for, proficiency levels and, 243, 244f,
other answers, 196, 197f 350–353, 350t, 351t 246f, 250–251, 254
multiple-choice format, 188 monthly progress monitoring performance criteria, in performance
short-answer items, 228f and, 134 assessments, 275f, 276, 279, 283
national content standards, 39–41 participation in grading process, 306 performance goal orientation, 6, 7t
curriculum impacts from, student self-monitoring, 368 expectations of, 68–70, 68t, 69f, 82,
334–335, 336 students unfamiliar with school culture. 173, 361
definition of, 19 See school culture, students judging role of, 293–294, 293t
as diagnostic data before classes unfamiliar with personal goals, 361–362,
start, 71–72 students who have already mastered 383–384
foundational issues in learning goals. See learning questioning by, 88, 100–103,
interpretation, 344 goals, students who have 101t, 102t
information on individual students already mastered self-assessment by, 364–365,
from, 332–333 students with disabilities. See also 373, 377, 384
interpretation for students and accommodations for diverse “teaching to the test,” 335, 336
parents, 350–353, learners; specific disabilities technical manuals, for large-scale tests,
350t, 351t homework assignments and, 106 342–343, 343t
interpretation of criterion-referenced Individualized Education Plans, technology, feedback using, 111
tests, 350 72, 305 Terra Nova scoring example,
interpretation of norm-referenced responsiveness to intervention 329f–331f, 347
tests, 343–350, 345f, measure, 133–134 testing accommodations. See
346f, 348f stereotypical representation of, accommodations for diverse
misconceptions about, 327–333, 327t 175, 175t learners
multiple choice formats, 328, 332 writing difficulties of, 147, testing modifications, 341
norm group importance in inter- 255–256 test interpretation, 343–350
pretation, 344–346, sufficiency of information, 157, 157t, foundational issues, 344
345f, 346f 158t, 179 grade equivalent scores,
overemphasis on increasing scores summative assessments. See also 349–350
on, 336–337 standardized, large-scale tests measures of central tendency, 139,
overinterpretation, 352 avoiding teacher bias, 173t 344–345, 345f
question formats, 328, 332 characteristics, 15t normal curve equivalents,
reliability, 342–343, 343t classroom vs. large-scale level, 348, 348f
skills useful for, 340, 340t 17–19, 18t norm group importance, 344–346,
standard error of measurement, content-related evidence from, 164, 345f, 346f
328, 329f–331f 165t, 167t percentile rank, 325–326, 325f,
student attitudes towards, essays, 231 346–347, 348f, 352
339–340 kidwatching, 373 proficiency levels, 352
as summative assessments, 17 purpose and validity, 160, 160t scale scores, 349
Terra Nova scoring example, stanines, 348–349, 352
329f–331f, 347 T state-level performance levels, 350
test preparation, 338–340, 338t, 340t tables, data, 139–144, 140t, 141t, test modifications, 341
test presentation changes, 342t 142t, 143t. See also test preparation, 338–340, 338t, 340t
validity, 343, 343t data summaries test-retest reliability, 343, 343t
standardized classroom tests, 328, 332 tables of specifications, 53–57 test-to-test carryover effects,
standards, student familiarity with, benefits, 55–56, 55t 251t, 252
338–339, 338t case study, 178–180, 179t Timperley, Helen, 108
standards-based scoring. See criterion- challenges in using, 56–57 Tomlinson, Carol, 113–114
referenced scoring for content-related evidence, triangulation
stanines, 348–349, 352 164, 167t assessment validity and, 165, 166t,
state content standards, 39–41, 40t developing, 53–55, 54t 167t
stems for selected-response items, 186 in diagnostic assessment, 70, 71f
in multiple-choice formats, 186, in test design, 56, 57t “trick” questions, 293t
204–205, 205f taxonomies, 45–53 true-false items. See also
in short-answer formats, affective and psychomotor, 51–53, selected-response assessments
233–234, 234f 52t, 54t advantages and disadvantages
stereotypical representation, 174–176, cognitive, 45–50, 47t, 49t of, 190
175t, 200 teacher editions, learning goals forms of, 189–190, 189f
student-led conferences, in portfolio from, 41 guidelines for writing, 209–211,
evaluation, 315–316, 316f teachers 210f, 210t
student reflection, 310, 313, 314t biases in, 25, 171–174, 173t when to use, 190