FNL - Rev - Assessment and Evaluation of Learning
December 2013
Addis Ababa
Ministry of Education
Module Title: Assessment and Evaluation of
Learning
Prepared by: Mekelle University
Module Writer: Yohannes Geretsadik
Technical Advisor: PRIN International Consultancy &
Research Services PLC
Assessment and Evaluation of Learning
Credit hours: 3
Table of Contents
Module Introduction
Module Contents
Unit 1: Assessment: Concept, Purpose, and Principles
Unit 2
Unit 3: Item Analysis
Unit 4: Interpretation of Scores
Unit 5
Icons Used
Dear Learner: the following icons are used throughout this module. Critically study what each icon
represents before using the module.
This tells you there is an introduction to the module, unit and section.
This tells you there is a question to answer or think about in the text.
This tells you that these are the answers to the activities and self-test
questions.
This tells you that there are learning outcomes to the Module or Unit
Module Introduction
This module is designed to equip you, as prospective secondary school teachers, with the
conceptual and practical skills of assessing students’ learning. It incorporates descriptions of
important concepts that help to clarify the assessment process; elaboration of the major
principles and procedures of the assessment process; the different tools and strategies that can be
used for assessing students’ learning; different mechanisms that are used to maintain the quality of
assessment tools and procedures; and the ethical standards of assessment. There are two
prerequisite courses that you need to take before learning this module: Secondary School
Curriculum & Instruction; and Psychological Foundations of Learning & Development.
Throughout this module there are different in-text questions which help you to pause your reading
for a moment and reflect on what you are studying. In addition, there are many activities (at least
one in each section) that you should attempt before proceeding from one section to the next.
Therefore, you need to seriously try to reflect on and answer each question and activity if you are
to gain a deep and meaningful understanding of the concepts under discussion and be a successful
learner. You will also complete two assignments, to be submitted to your course tutor, which will
be graded out of 30%.
We wish you a good and successful learning journey. You may start studying your module
right now.
Apply proper procedures when administering assessment tools
Analyze items to increase the fit for purpose of classroom assessment tools.
Interpret assessment results to understand the implications and thereby make appropriate
decisions.
Conduct self-assessment of their teaching in classrooms in view of student learning and
standards of teacher professionalism.
Adhere to professional assessment ethical standards in assessing student learning,
handling records, using or communicating assessment results and making decisions.
Unit 1: Assessment: Concept, Purpose, and Principles
Introduction
Welcome to the first unit of the “Assessment and Evaluation” course module. This is an introductory
unit that is intended to familiarize you with some basic concepts that you will encounter while
studying this course. Specifically, the concepts test, measurement, assessment and evaluation will
be elaborated. Following this, the purposes of educational assessment are described. Next, there is
a brief explanation of the role of educational objectives in assessment. This unit also presents
you with the important principles that have to be adhered to when assessing students’ learning.
Finally, the importance of involving students in the assessment process is highlighted, followed
by the most important competencies that professional teachers are expected to possess in order
to assess their students effectively.
Apply the principles of assessment and evaluation of learning in the local context.
1.1 Concepts
Dear learner, before you start studying educational assessment and evaluation, you need to have
a clear understanding of certain related concepts. You might have come across the concepts
test, measurement, assessment and evaluation.
Reflection:
What do you know about these concepts? Can you differentiate these concepts? Please try.
You might have found it difficult to come up with a clear distinction in meaning between these
concepts. This is because they may all be involved in a single process. There is also some
confusion, and some variation in usage, in the literature. Now let us see the meaning of these
concepts as used in this module.
Test: Perhaps test is the concept that you are most familiar with. You have been taking tests
ever since you started schooling, to determine your academic performance. Tests are also used in
workplaces to select individuals for a job vacancy. In an educational context, a test refers to
the presentation of a standard set of questions to be answered by students. It is one instrument
that is used for collecting information about students’ behaviors or performances. Please note
that there are many other ways of collecting information about students’ educational
performances, such as observations, assignments, project work, portfolios, etc.
Measurement: In our day-to-day life there are different things that we measure. We measure our
height and express it in meters and centimeters. We measure some of our daily consumption, such
as sugar in kilograms and liquids in liters. We measure temperature and express it in degrees
Celsius (also called degrees centigrade). How do we measure these things? We definitely need
appropriate instruments, such as a measuring tape, a weighing scale, or a thermometer, in order
to obtain reliable measurements.
Similarly, in education, measurement is the process by which the attributes of a person are
measured and described in numbers. It is a quantitative description of the behavior or
performance of students. As educators we frequently measure human attributes such as attitudes,
academic achievement, aptitudes, interests, personality and so forth. Measurement permits a more
objective description of traits and facilitates comparisons. Hence, to measure we have to use
certain instruments, which allow us to conclude, for example, that one student performs better in
a subject than another. How do we measure performance in mathematics? We use a mathematics
test, which is an instrument containing questions and problems to be solved by students. The
number of right responses obtained is an indication of the performance of individual students in
mathematics. Thus, the purpose of educational measurement is to represent, using numbers, how
much of ‘something’ a person possesses. Note that we are only collecting information. We are not
evaluating! Evaluation is therefore quite different from measurement.
Measurement is also not the same as testing. While a test is an instrument for collecting
information about students’ behaviors, measurement is the assignment of quantitative values to the
results of a test or other assessment technique. Measurement can refer both to the score obtained
and to the process itself.
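To make the idea of measurement as “assigning numbers according to a rule” concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not part of the module’s own materials: it applies one fixed rule (one point for each response that matches the answer key) to turn a student’s test responses into a number. The function name `raw_score`, the five-item answer key and the responses are all hypothetical.

```python
# Minimal sketch: measurement as rule-based scoring of test responses.
# raw_score, the answer key and the responses are hypothetical illustration data.

def raw_score(responses: list[str], answer_key: list[str]) -> int:
    """Apply a fixed rule: one point for every response that matches the key."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer key must have the same length")
    return sum(1 for given, correct in zip(responses, answer_key) if given == correct)


answer_key = ["B", "D", "A", "C", "A"]  # hypothetical 5-item mathematics test
student_a = ["B", "D", "C", "C", "A"]
student_b = ["A", "D", "C", "B", "A"]

print(raw_score(student_a, answer_key))  # 4
print(raw_score(student_b, answer_key))  # 2
```

The function only describes how much: it reports that student A answered four items correctly, but says nothing about whether four out of five is good enough. That judgment is evaluation.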
Assessment: In educational literature the concepts ‘assessment’ and ‘evaluation’ have been used
with some confusion. Some educators have used them interchangeably to mean the same thing.
Others have used them as two different concepts. Even when they are used differently, there is
considerable overlap in the interpretations of the two concepts.
Cizek (in Phye, 1997) provides a comprehensive definition of assessment that incorporates its
key elements:
the planned process of gathering and synthesizing information relevant to the purposes of
(a) discovering and documenting students' strengths and weaknesses, (b) planning and
enhancing instruction, or (c) evaluating progress and making decisions about students.
How do teachers collect information about their students’ academic progress as well as
about their own teaching? Please list the tools as exhaustively as possible.
Generally, educational assessment is viewed as the process of collecting information with the
purpose of making decisions about students. We may collect information using various
instruments including tests, observations of students, checklists, questionnaires and interviews.
Rowntree (1974) views assessment as a human encounter in which one person interacts with
another directly or indirectly with the purpose of obtaining and interpreting information about
the knowledge, understanding, abilities and attitudes possessed by that person. The key words in
the definition of assessment are collecting data and making decisions. Hence, to make decisions
one has to evaluate, that is, make a judgment about a given situation.
Evaluation: This concept refers to the process of judging the quality of student learning on the
basis of established performance standards and assigning a value to represent the worthiness or
quality of that learning or performance. It is concerned with determining how well students have
learned. When we evaluate, we are saying that something is good, appropriate, valid, positive,
and so forth. Evaluation is based on assessment that provides evidence of student achievement at
strategic times throughout the grade/course, often at the end of a period of learning.
What types of decisions might teachers make based on the information they collect about the
learning and teaching process in general and students’ learning in particular? Please discuss
this question with a colleague.
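Evaluation includes both quantitative and qualitative descriptions of student behavior plus a value
judgment concerning the desirability of that behavior. The following simple arrangement shows the
relationship between measurement and evaluation:
Evaluation = Quantitative description of students (measurement) + Value judgment
Evaluation = Qualitative description of students (non-measurement) + Value judgment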
Thus, evaluation may or may not be based on measurement (or tests) but when it is, it goes
beyond the simple quantitative description of students’ behavior. Evaluation involves judgment.
The quantitative values that we obtain through measurement will not have any meaning until
they are evaluated against some standards. Educators are constantly evaluating students and it is
usually done in comparison with some standard. For example, if the objective of the lesson is for
students to solve quadratic equations and if, having given them a test related to this objective, all
learners are able to solve at least 80% of the problems, then the teacher may conclude that his or
her teaching of the topic was quite successful.
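The judgment in this example can be written out as a short Python sketch, offered as an illustration under stated assumptions rather than as the module’s procedure. The measured scores are the input, and the 80% standard from the quadratic-equation example turns them into a conclusion. The function names `meets_criterion` and `teaching_successful`, the learner names and the marks are hypothetical.

```python
# Minimal sketch: evaluation as judging measured scores against a standard.
# Mirrors the quadratic-equation example: teaching is judged successful only
# if every learner solves at least 80% of the problems.
# Function names, learner names and marks are hypothetical.

CRITERION_PERCENT = 80  # the performance standard from the example


def meets_criterion(score: int, total_items: int, percent: int = CRITERION_PERCENT) -> bool:
    """Judge one learner: did they solve at least `percent`% of the items?"""
    # Integer arithmetic avoids floating-point rounding at the boundary (e.g. 8 out of 10).
    return score * 100 >= percent * total_items


def teaching_successful(scores: dict[str, int], total_items: int) -> bool:
    """The teacher's conclusion: successful only if all learners meet the criterion."""
    return all(meets_criterion(s, total_items) for s in scores.values())


scores = {"Almaz": 9, "Dawit": 8, "Selam": 10}  # marks out of 10 problems
print(teaching_successful(scores, total_items=10))  # True

scores["Selam"] = 7  # one learner now falls below 80%
print(teaching_successful(scores, total_items=10))  # False
```

The same marks could lead to a different conclusion under a different standard: the judgment lies in the criterion, not in the numbers themselves.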
So, we can describe evaluation as the comparison of what is measured against some defined
criteria to determine whether it has been achieved, whether it is appropriate, whether it is
good, whether it is reasonable, whether it is valid and so forth. Evaluation accurately
summarizes and communicates to parents, other teachers, employers, institutions of further
education, and students themselves what students know and can do with respect to the overall
curriculum expectations.
Now, let’s summarize the differences and relationship between the four concepts. A test is a
particular type of assessment instrument that typically consists of sets of questions administered
during a fixed period of time under reasonably comparable conditions for all students.
Measurement is the assigning of numbers to the results of a test or other forms of assessment
according to a specific rule. Assessment is a much more comprehensive and inclusive concept
than testing and measurement. It includes the full range of procedures (observations, rating of
performances, paper and pencil tests, etc.) used to gain information about students’ learning. It
may also include quantitative descriptions (measurement) and qualitative descriptions
(non-measurement) of students’ behaviors. Evaluation, on the other hand, consists of making
judgments about the level of students’ achievement for purposes of grading and accountability
and for making decisions about promotion and graduation. To make an evaluation, we need
information, which is often obtained by measuring with a reliable instrument.
1) Have you ever heard of or experienced the process of assessment? What does the word
“assessment” mean to you?
3) Define the terms test, measurement and evaluation in your own words.
One of the first things to consider when planning for assessment is its purpose. Who will use the
results? How will they use them? As prospective teachers, you also need to have a clear idea of
the purposes that assessment serves. So let’s discuss the following questions:
Activity 4: Think-Pair-Share
In the previous section we have seen that assessment is the process of collecting information and
making decisions. Why do we need assessment in education? What do you think is the purpose of
assessment? Why do teachers assess their students? Reflect on these questions, write your ideas
on a piece of paper and share these ideas with your colleagues before you continue reading
the contents of this section.
Classroom assessment involves students and teachers in the continuous monitoring of students'
learning. It provides teachers with feedback about their effectiveness, and it gives students a
measure of their progress as learners. Through close observation of students in the process of
learning and the collection of frequent feedback on students' learning, teachers can learn much
about how students learn and, more specifically, how students respond to particular teaching
approaches. Classroom assessment helps individual teachers obtain useful feedback on what, how
much, and how well their students are learning. Teachers can then use this information to refocus
their teaching and help students make their learning more efficient and more effective.
Thus, based on the reasons for assessment described above, it can be summarized that
assessment in education focuses on the learner and on teaching.
With regard to the learner, assessment is aimed at providing information that will help us make
decisions concerning remediation, enrichment, selection, exceptionality, progress and
certification. With regard to teaching, assessment provides information about the attainment of
objectives and the effectiveness of teaching methods and learning materials.
1) Assessment is used to inform and guide teaching and learning: A good classroom
assessment plan gathers evidence of student learning that informs teachers' instructional
decisions. It provides teachers with information about what students know and can do. To
plan effective instruction, teachers also need to know what the student misunderstands
and where the misconceptions lie. In addition to helping teachers formulate the next
teaching steps, a good classroom assessment plan provides a road map for students.
Students should, at all times, have access to the assessment so they can use it to inform
and guide their learning.
2) Assessment is used to help students set learning goals: Students need frequent
opportunities to reflect on where their learning stands and what needs to be done to achieve
their learning goals. When students are actively involved in assessing their own next
learning steps and creating goals to accomplish them, they make major advances in
directing their own learning and in understanding themselves as learners.
3) Assessment is used to assign report card grades: Grade reports provide parents,
schools, and other stakeholders, including the government, post-secondary
institutions and employers, with summary information about student learning.
4) Assessment is used to motivate students: Research has shown that students will be
confident and motivated when they experience progress and achievement, rather than the
failure and defeat associated with being compared to more successful peers.
As you might remember from your “Secondary School Curriculum and
Instruction” course, the first step in planning any good teaching is to clearly define the learning
objectives or outcomes. A learning objective is an outcome statement that captures specifically
what knowledge, skills and attitudes learners should be able to exhibit following instruction.
Defining learning objectives is also essential to the assessment of students’ learning. Effective
assessment practice requires relating the assessment procedures as directly as possible to the
learning objectives.
Instructional objectives, which are commonly known as learning outcomes, play a key role in both
the instructional process and the assessment process. They serve as guides for both teaching and
learning, communicate the intent of instruction to others, and provide guidelines for assessing
students’ learning.
Instructional objectives or learning outcomes are stated in terms of what the students are
expected to be able to do at the end of the instruction. For instance, after teaching students how
to solve quadratic equations, we might expect them to have the skill of solving any quadratic
equation. A learning outcome stated in this way clearly indicates the kind of performance
students are expected to exhibit as a result of the instruction. It also makes clear the
intent of our instruction and sets the stage for assessing students’ learning. Well-stated learning
outcomes make clear the types of student performance we are willing to accept as evidence that
the instruction has been successful.
1. To what extent were the objectives of the courses you have taken directly related to the
types of assessment your instructors used to measure your learning progress?
2. How frequently did your instructors assess your progress to check
whether the objectives were achieved?
3. Have you ever thought about the objectives of the courses you take, both during
the learning process and when you study in preparation for exams?
Assessment principles consist of statements highlighting what are considered critical elements
of a system designed to assess student progress. These principles are expressed in terms of
elements for a fair (reliable and valid) assessment system. Thus, each principle introduces an
issue that must be addressed when evaluating a student assessment system. Assessment
principles guide the collection of meaningful information that will help inform instructional
decisions, promote student engagement, and improve student learning.
Different educators and school systems have developed somewhat different sets of assessment
principles. Miller, Linn and Gronlund (2009) have identified the following general principles of
assessment.
Perhaps the assessment principles developed by the New South Wales Department of
Education and Training (2008) in Australia are more inclusive than those listed by
other educators. Let us look at these principles and compare them with those developed by
Miller, Linn and Gronlund as described above.
3. Assessment should be fair. Assessment needs to provide opportunities for every student
to demonstrate what they know, understand and can do. Assessment must be based on a
belief that all learners are on a path of development and that every learner is capable of
making progress. Students bring a diversity of cultural knowledge, experience, language
proficiency and background, and ability to the classroom. They should not be advantaged
or disadvantaged by such differences that are not relevant to the knowledge, skills and
understandings that the assessment is intended to address. Students have the right to
know what is assessed, how it is assessed and the worth of the assessment. Assessment
will be fair or equitable only if it is free from bias or favoritism.
as the first person. Assessment will be fair to all students if it is based on reliable,
accurate and defensible measures.
6. Assessment should be integrated into the teaching and learning cycle. Assessment
needs to be an ongoing, integral part of the teaching and learning cycle. It must allow
teachers and students themselves to monitor learning. From the teacher perspective, it
provides the evidence to guide the next steps in teaching and learning. From the student
perspective, it provides the opportunity to reflect on and review progress, and can provide
the motivation and direction for further learning.
1.5. Assessment and Some Basic Assumptions
Reflection:
What assumptions should one keep in mind when planning to assess students? What should be
considered when preparing tools for assessing students?
Angelo and Cross (1993) have listed seven basic assumptions of classroom assessment which
are described as follows:
1. The quality of student learning is directly, although not exclusively, related to the
quality of teaching. Therefore, one of the most promising ways to improve learning is
to improve teaching. If assessment is to improve the quality of students’ learning, both
teachers and students must become personally invested and actively involved in the
process.
Reflection: What should be the roles of students and teachers in classroom assessment
so that it helps students’ learning?
2. To improve their effectiveness, teachers need first to make their goals and objectives
explicit and then to get specific, comprehensible feedback on the extent to which they
are achieving those goals and objectives. Effective assessment begins with clear goals.
Before teachers can assess how well their students are learning, they must identify and
clarify what they are trying to teach. After teachers have identified specific teaching goals
they wish to assess, they can better determine what kind of feedback to collect.
Reflection: How do you think feedback and self-assessment will help to improve
students’ learning?
4. The type of assessment most likely to improve teaching and learning is that
conducted by teachers to answer questions they themselves have formulated in
response to issues or problems in their own teaching. To best understand their students’
learning, teachers need specific and timely information about the particular individuals in
their classes. Because students’ needs differ, there is often a gap between
assessment and student learning. One goal of classroom assessment is to reduce this gap.
Reflection: How does classroom assessment help to reduce this gap between
assessment and student learning?
6. Classroom assessment does not require specialized training; it can be carried out by
dedicated teachers from all disciplines. To succeed in classroom assessment, teachers
need only a detailed knowledge of the discipline, dedication to teaching, and the
motivation to improve.
Reflection: Can you explain how teachers’ collaboration with colleagues can be more
effective in enhancing learning and personal satisfaction than working alone?
During teaching, you will be assessing students’ learning continuously. You will be interpreting
what the students say and do in order to make judgments about their achievements. The ability to
analyze the students’ learning is vital if you are to make appropriate teaching points which help
the students develop their knowledge and/or competence. You will be using your subject
knowledge to help you identify what to look for and where to take the student next. You will
need to listen, observe and question in ways which will enable you to give appropriate feedback
or further instruction.
There is considerable evidence that assessment is a powerful process for enhancing learning.
Black and Wiliam (1998) synthesized over 250 studies linking assessment and learning. They
found that the intentional use of assessment in the classroom to promote learning resulted in
improved student achievement. Classroom assessment promotes learning when teachers use it in
the following ways:
• When they use it to become aware of the knowledge, skills, and beliefs that their
students bring to a learning task; and
• When they use this knowledge as a starting point for new instruction, and monitor
students’ changing perceptions as instruction proceeds.
As prospective teachers, how do you think you will use the information you collect
through different methods of assessment to improve the teaching and learning process?
When learning is the goal, teachers and students collaborate and use ongoing assessment and
pertinent feedback to move learning forward. When classroom assessment is frequent and varied,
teachers can learn a great deal about their students. They can gain an understanding of students’
existing beliefs and knowledge, and can identify incomplete understandings, false beliefs, and
immature interpretations of concepts that may influence or distort learning. Teachers can observe
and probe students’ thinking over time, and can identify links between prior knowledge and new
learning.
Learning is also enhanced when students are encouraged to think about their own learning, to
review their experiences of learning and to apply what they have learned to their future learning.
Assessment provides the feedback loop for this process. When students (and teachers) become
comfortable with a continuous cycle of feedback and adjustment, students begin to internalize
the process of standing outside their own learning and considering it against a range of criteria,
not just the teacher’s judgment about quality or accuracy. When students engage in this ongoing
metacognitive experience, they are able to monitor their learning along the way, make
corrections, and develop a habit of mind for continually reviewing and challenging what they
know.
Assessment also enhances students’ learning by increasing their motivation. Motivation is
essential for students’ engagement in their learning. The higher the motivation, the more time
and energy a student is willing to devote to any given task. Even when a student finds the content
interesting and the activity enjoyable, learning requires sustained concentration and effort.
Reflection: How do you think assessment will help to increase students’ motivation?
According to current cognitive research, people are motivated to learn by success and
competence. When students feel ownership and have choice in their learning, they are more
likely to invest time and energy in it. Assessment can be a motivator, not through reward and
punishment, but by stimulating students’ intrinsic interest. Assessment can enhance student
motivation by:
• reinforcing the idea that students have control over, and responsibility for, their own
learning
When students learn, they make meaning for themselves, and they approach learning tasks in
different ways. They bring with them their own understanding, skills, beliefs, hopes, desires, and
intentions. It is important to consider each individual student’s learning, rather than talk about
the learning of “the class.” Assessment practices lead to differentiated learning when teachers
use them to gather evidence to support every student’s learning, every day in every class. The
learning needs of some students may require individualized learning plans.
There is strong evidence that involving students in the assessment process can have very definite
educational benefits. Now stop reading for a moment and reflect on the following questions.
1) As prospective teachers how do you think you can involve your students in the assessment
process?
2) In what ways can students benefit if they are involved in the assessment process?
One way in which we can involve our students in the assessment process is to establish the
standards or assessment criteria with them. This will help students understand what is to be
assessed. Working with students to develop assessment tools is a powerful way to help students
build an understanding of what a good product or performance looks like. It helps students
develop a clear picture of where they are going, where they are now and how they can close the
gap. This does not mean that each student creates his or her own assessment criteria. You, as a
teacher, have a strong role to play in guiding students to identify the criteria and features of
understandings you want your students to develop.
Another important aspect is to involve students in trying to apply the assessment criteria for
themselves. The evidence is that through trying to apply criteria, or mark using a model answer,
students gain much greater insight into what is actually being required and subsequently their
own work improves in the light of this.
An additional benefit is that students can be given more learning activities with feedback than
would otherwise be possible, since the teacher alone would not have time to provide feedback on
all of them.
There are two ways in which students can be involved in this type of assessment: self-assessment
and peer assessment. Self-assessment involves students judging their own work. It
begins with students understanding the learning intentions or objectives for the particular lesson
and the success criteria for the specific task or activity. It develops into students’ awareness of
their own strengths and weaknesses in a particular subject (and as a learner in general) and the
ability to identify their own ‘next steps’ or targets. Self-assessment allows students to think more
carefully about what they do and do not know, and what they additionally need to know to
accomplish certain tasks.
Peer assessment, by contrast, involves students making judgments about other students’ work.
Students learn how to make better sense of assessment criteria if they have to give feedback
and/or marks against them. Giving and receiving feedback is an important aspect of student
learning and a valuable skill in professional contexts and for future learning.
Assessment takes up a great deal of a teacher’s professional time, both inside and outside the
classroom. Therefore, a teacher should have some basic competencies in classroom assessment
so as to be able to assess his or her students’ learning effectively.
As prospective teachers, what competencies do you think you should have in the area of
assessment? Write down your ideas and compare them with those of a colleague.
A teacher's professional role and responsibilities for student assessment can be conceptualized as
falling along a time continuum. Assessment activities occur prior to instruction, during
instruction, and after instruction. Assessment prior to instruction provides a teacher with
information about individual differences among students as well as an understanding of the
background or prior knowledge of the class as a whole. These assessment activities provide the
basis for planning instruction.
Assessment during instruction provides information about the overall progress of the whole class
as well as specific information about individual students. These assessment activities provide the
basis for monitoring progress during learning.
Following the teaching of a specific unit, semester, academic year, or the like, decisions must be
made about the achievement of short- and long-term instructional goals. This is assessment after
instruction.
In addition to these activities, communication skills are needed to interpret and report
performance standards or levels of achievement to students and parents.
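In the American education system, a set of seven standards for teacher competence in the
educational assessment of students has been developed. These standards were developed on the
view that student assessment is an essential part of teaching and that effective teaching cannot
exist without appropriate student assessment. The seven standards articulating teacher
competence in the educational assessment of students are described below.
1. Teachers should be skilled in choosing assessment methods appropriate for instructional
decisions.
2. Teachers should be skilled in developing assessment methods appropriate for instructional
decisions.
3. Teachers should be skilled in administering, scoring, and interpreting the results of both
externally produced and teacher-produced assessment methods.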
4. Teachers should be skilled in using assessment results when making decisions about
individual students, planning teaching, developing curriculum, and school improvement.
5. Teachers should be skilled in developing valid student grading procedures that use pupil
assessments. Grading students is an important part of professional practice for teachers.
6. Teachers should be skilled in communicating assessment results to students, parents, other lay
audiences, and other educators. Furthermore, teachers will sometimes be in a position that will
require them to defend their own assessment procedures and their interpretations of them. At
other times, teachers may need to help the public to interpret assessment results appropriately.
7. Teachers should be skilled in recognizing unethical, illegal, and otherwise inappropriate
assessment methods and uses of assessment information. Teachers must be well-versed in their
own ethical and legal responsibilities in assessment. In addition, they should also attempt to have
the inappropriate assessment practices of others discontinued whenever they are encountered.
In Ethiopia, the Ministry of Education (MoE) has also developed assessment-related competencies
that professional teachers are expected to possess.
In small groups:
1) Find Ethiopia’s standards of teacher competence in assessment from your nearby educational
organization (woreda, zone or regional education office), compare them with the American
standards, and report the similarities and differences.
2) Discuss and report on the importance and use of having standards of teacher competence in
assessment for a particular school and the whole education system in general.
Unit Summary
Test, measurement, assessment and evaluation are concepts that are frequently used in the
area of educational assessment and evaluation, often with varying meanings and some
confusion. Although they overlap, they vary in scope and have different meanings.
Assessment serves many important purposes including: informing and guiding teaching
and learning; helping students set learning goals; assigning report card grades; motivating
students.
Assessment should be designed in such a way that it will elicit information about students’
progress towards the educational objectives.
There are some important principles that professional teachers should be aware of that
guide the assessment process of students’ learning.
Any assessment process is based on certain basic assumptions.
Assessment is an integral part of the teaching and learning process and is an important
tool for enhancing learning.
In order to maximize the benefits students can get out of assessment, they should be
involved in the assessment process.
There are certain assessment competencies that teachers need to possess so as to
effectively carry out their professional responsibilities.
Self-Check Exercises
In order to check your understanding of what you have learned in this unit, answer the following
questions and compare your answers with what is discussed in the material. If you cannot
answer any of these questions adequately, go back and study the relevant part of the material
again.
References
Angelo, T.A. & Cross, K.P. Classroom Assessment Techniques: A Handbook for College
Teachers. 2nd Ed. San Francisco: Jossey-Bass Publishers.
Braun, H., Kanjee, A., Bettinger, E., & Kremer, M. (2006). Improving Education Through
Assessment, Innovation, and Evaluation. American Academy of Arts and Sciences.
Ellis, V. (Ed.). (2007). Learning and Teaching in Secondary Schools. 3rd Ed. Learning
Matters Ltd.
McDonald, E.S. & Hershman, D.M. (2010). Classrooms that Spark! Recharge and Revive Your
Teaching. 2nd Ed. San Francisco: Jossey-Bass Publishers.
Mehrens, W.A. & Lehman, I.J. Measurement and Evaluation in Education. 4th Ed. New York:
Harcourt Brace College Publishers.
Miller, M.D., Linn, R.L. & Gronlund, N.E. (2009). Measurement and Assessment in Teaching.
10th Ed. Upper Saddle River: Pearson Education, Inc.
Western and Northern Canadian Protocol for Collaboration in Education. (2006). Rethinking
Classroom Assessment with Purpose in Mind: Assessment for Learning, Assessment as
Learning, and Assessment of Learning.
Unit Two: Assessment Strategies, Methods, and Tools
2.1 Introduction
In the previous unit you were introduced to the major concepts of educational assessment
and evaluation. You also learned about the purposes and principles of assessment. In this unit
you will learn about various assessment strategies that can be used in secondary
education. You will also learn about planning, constructing and administering classroom
tests.
Compare assessment and evaluation methods and tools to select appropriate ones.
Assessment procedures can be classified according to their functional role during classroom
instruction. One such classification system follows the sequence in which assessment procedures
are likely to be used in the classroom. The most commonly referred to and used categories in this
regard are formative assessment and summative assessment. Can you differentiate these
concepts? Please try to describe them before you continue to the following section.
a) Formative Assessment: Formative assessments are used to shape and guide classroom
instruction. They can include both informal and formal assessments (which will be discussed
later in this section) and help us to gain a clearer picture of where our students are and what
they still need help with. They can be given before, during, and even after instruction, as long
as the goal is to improve instruction.
Formative assessment is also known by the name ‘assessment for learning’. The core idea behind
this name is that the primary purpose of assessment should be to enhance students’ learning.
There is still another name associated with the concept of formative assessment:
‘continuous assessment’. Continuous assessment (as opposed to terminal assessment) is
based on the premise that if assessment is to help students improve their learning, and
if a teacher is to determine students’ progress towards the learning goals, it has to be
conducted on a continuous basis. Thus, continuous assessment is a teaching
approach as well as a process of deciding to what extent the educational objectives are
actually being realized during instruction. In schools, continuous assessment of learning is
usually carried out by teachers on the basis of impressions gained as they observe their
students at work, or through various kinds of tests given periodically. Each decision is
therefore based on various types of information gathered through different assessment
methods at different times.
In order to assess your students' understanding, there are various strategies that you can use.
Can you mention some of the strategies that you can use to assess your students for formative
purposes? Please try to list as many strategies as you can.
The following are some of the strategies of assessment you can employ in your classrooms:
o You can make your students write their understanding of vocabulary or concepts
before and after instruction.
o You can ask students to summarize the main ideas they've taken away from your
presentation, discussion, or assigned reading.
o You can make students complete a few problems or questions at the end of
instruction and check answers.
o You can interview students individually or in groups about their thinking as they
solve problems.
o You can assign brief, in-class writing assignments (e.g., "Why is this person or
event representative of this time period in history?").
Tests and homework can also be used formatively if teachers analyze where students are in
their learning and provide specific, focused feedback regarding performance and ways to
improve it.
The techniques used in summative assessment are determined by the instructional goals.
Typically, however, they include teacher made achievement tests, ratings of various types of
performance, and assessment of products (reports, drawings, etc.).
A particular assessment task can be both formative and summative. For example, students
could complete unit 1 of their module and then an assessment task for which they
earn a mark that counts towards their final grade. In this sense, the task is summative.
They could also receive extensive feedback on their work. Such feedback would guide
learners to achieve higher levels of performance in subsequent tasks. In this sense, the task
is formative – because it helps students form different approaches and strategies to improve
their performance in the future.
Assessment can also be either formal or informal. Let us try to understand their differences
from the following paragraphs.
Formal Assessment: This usually implies a written document, such as a test, quiz, or paper.
A formal assessment is given a numerical score or grade based on student performance. We
will deal with formal assessment strategies, particularly tests, in more detail in a later section.
Informal Assessment: "Informal" is used here to indicate techniques that can easily be
incorporated into classroom routines and learning activities. Informal assessment techniques
can be used at any time without interfering with instructional time. Their results are
indicative of the student's performance on the skill or subject of interest. Can you think of
the informal assessment strategies that you can use in your classes? What informal
assessment strategies have your teachers used when you were a student?
An informal assessment usually occurs in a more casual manner and may include
observation, inventories, checklists, rating scales, rubrics, performance and portfolio
assessments, participation, peer and self evaluation, and discussion. Formal tests assume a
single set of expectations for all students and come with prescribed criteria for scoring and
interpretation. Informal assessment, on the other hand, requires a clear understanding of the
levels of ability the students bring with them. Only then can the teacher select assessment
activities that students can reasonably attempt. Informal assessment seeks to identify the
strengths and needs of individual students without regard to grade or age norms.
Methods for informal assessment can be divided into two main types: unstructured (e.g.,
student work samples, journals) and structured (e.g., checklists, observations). The
unstructured methods frequently are somewhat more difficult to score and evaluate, but they
can provide a great deal of valuable information about the skills of the students. Structured
methods can be reliable and valid techniques when time is spent creating the "scoring"
procedures. Another important aspect of informal assessments is that they actively involve
the students in the evaluation process - they are not just paper-and-pencil tests.
How the results of tests and other assessment procedures are interpreted also provides a
method of classifying these instruments. There are two ways of interpreting student
performance – criterion-referenced and norm-referenced.
remaining 20% of students have scored above that particular student. Ranking students is
another example of norm-referenced interpretation of students’ performance.
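To make the contrast between the two interpretations concrete, the sketch below reads the same raw score both ways. It is a minimal Python illustration under stated assumptions: the ten class scores, the 50-item test and the 80 percent mastery cutoff are all hypothetical, and the percentile rank uses one common convention (students scoring below, plus half of those tied), not the only one in use.

```python
from typing import Sequence

MASTERY_CUTOFF = 80.0  # hypothetical criterion: percent correct needed for "mastery"


def percent_correct(raw_score: int, total_items: int) -> float:
    """Criterion-referenced view: share of the tested content the student got right."""
    return 100.0 * raw_score / total_items


def percentile_rank(score: float, group_scores: Sequence[float]) -> float:
    """Norm-referenced view: standing relative to the other students.

    Counts scores below `score` plus half of the scores tied with it
    (the student's own score counts as a tie).
    """
    below = sum(1 for s in group_scores if s < score)
    tied = sum(1 for s in group_scores if s == score)
    return 100.0 * (below + 0.5 * tied) / len(group_scores)


# Hypothetical class of ten students on a 50-item test
class_scores = [22, 28, 31, 33, 35, 38, 40, 42, 45, 48]
student_score = 42

pct = percent_correct(student_score, 50)
print(pct, pct >= MASTERY_CUTOFF)                    # 84.0 True
print(percentile_rank(student_score, class_scores))  # 75.0
```

The same score of 42 reads as "met the criterion" in one interpretation and "ahead of about three quarters of the class" in the other. The norm-referenced figure would change if the comparison group changed; the criterion-referenced one would not.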
When selecting assessment strategies in our subject areas, there are a number of things that we
have to consider. First and foremost, it is important that we choose the assessment technique
appropriate for the particular behavior being assessed. We have to use a strategy that can give
students an opportunity to demonstrate the kind of behavior that the learning outcome demands.
Assessment strategies should also be related to the course material and relevant to students’
lives. Therefore, we have to provide assessment strategies that relate to students’ future work.
There are many different ways to categorize learning goals for students. Categorizing helps us to
thoroughly think through what we want students to know and be able to do. One way in which
the different learning outcomes that we want our students to develop can be categorized is
presented as follows:
Knowledge and understanding: What facts do students know outright? What
information can they retrieve? What do they understand?
Reasoning proficiency: Can students analyze, categorize, and sort into component
parts? Can they generalize and synthesize what they have learned? Can they evaluate
and justify the worth of a process or decision?
Skills: We have certain skills that we want students to master such as reading fluently,
working productively in a group, making an oral presentation, speaking a foreign
language, or designing an experiment.
Ability to create products: Another kind of learning target is student-created
products - tangible evidence that the student has mastered knowledge, reasoning, and
specific production skills. Examples include a research paper, a piece of furniture, or
artwork.
Dispositions: We also frequently care about student attitudes and habits of mind,
including attitudes toward school, persistence, responsibility, flexibility, and desire to
learn.
Activity: In groups, discuss and identify the assessment strategies that you consider best for
assessing each of these categories of learning goals and compare your work with that of other
groups.
From among the various assessment strategies that can be used by classroom teachers, some are
described below for your consideration as student teachers.
Conferences: A conference is a formal or informal meeting between the teacher and a student
for the purpose of exchanging information or sharing ideas. A conference might be held to
explore the student’s thinking and suggest next steps; assess the student’s level of understanding
of a particular concept or procedure; and review, clarify, and extend what the student has already
completed. What advantages do you think conference as a method of assessment will have?
individual achievement of specific skills and knowledge. What type of objectives do you think
this assessment strategy could serve to measure?
Interviews: You should be familiar with the interviews journalists conduct with different
personalities. An interview can also be used for assessment purposes in educational settings. In
such applications, an interview is a face-to-face conversation in which teacher and student use
inquiry to share their knowledge and understanding of a topic or problem. This form of
assessment can be used by the teacher to:
explore the student’s thinking;
assess the student’s level of understanding of a concept or procedure; and
gather information, obtain clarification, determine positions, and probe for motivations.
Observation: Observation is a process of systematically viewing and recording students while
they work, for the purpose of making instructional decisions. Observation can take place at any
time and in any setting. It provides information on students' strengths and weaknesses, learning
styles, interests, and attitudes. Observations may be informal or highly structured, and incidental
or scheduled over different periods of time in different learning contexts.
Performance tasks: During a performance task, students create, produce, perform, or present
works on "real world" issues. The performance task may be used to assess a skill or proficiency,
and provides useful information on the process as well as the product. Please mention some
examples of performance tasks that students can do in your subject area.
Portfolios: A portfolio is a collection of samples of a student’s work over time. It offers a visual
demonstration of a student’s achievement, capabilities, strengths, weaknesses, knowledge, and
specific skills, over time and in a variety of contexts. For a portfolio to serve as an effective
assessment instrument, it has to be focused, selective, reflective, and collaborative. Portfolios can
be prepared for different subjects in any educational level. What type of materials can be
included in the portfolio of students in relation to your subject?
Attention: At this point of your study, you will be required to start filing
samples of your work (those that are indicated) as part of your portfolio to
serve as evidence of your performance on this course.
Questions and answers: This is perhaps one of the most widely used strategies teachers employ
to involve their students in the teaching and learning process. In this strategy, the teacher poses a
question and the student answers verbally, rather than in writing. This strategy helps the teacher
to determine whether students understand what is being, or has been, presented; it also helps
students to extend their thinking, generate ideas, or solve problems. Strategies for effective
question and answer assessment include:
Apply a wait time or 'no hands-up rule' to provide students with time to think after a
question before they are called upon randomly to respond.
Ask a variety of questions, including open-ended questions and those that require more
than a right or wrong answer.
During what time of the lesson do you think question and answer strategy will be more useful?
Why?
Checklists, Rating Scales and Rubrics: These are tools that state specific criteria and allow
teachers and students to gather information and to make judgments about what students know
and can do in relation to the outcomes. They offer systematic ways of collecting data about
specific behaviors, knowledge and skills.
Checklists usually offer a yes/no format in relation to student demonstration of specific criteria.
They may be used to record observations of an individual, a group or a whole class.
Rating Scales allow teachers to indicate the degree or frequency of the behaviors, skills and
strategies displayed by the learner. Rating scales state the criteria and provide three or four
response selections to describe the quality or frequency of student work.
Rubrics use a set of criteria to evaluate a student's performance. They consist of a fixed
measurement scale and detailed description of the characteristics for each level of performance.
These descriptions focus on the quality of the product or performance and not the quantity.
Rubrics may be used to assess individuals or groups and, as with rating scales, the results may be
compared over time.
One-Minute Paper: During the last few minutes of the class period, you may ask students to
answer on a half-sheet of paper: "What is the most important point you learned today?" and,
"What point remains least clear to you?" The purpose is to obtain data about students'
comprehension of a particular class session. Then you can review responses and note any useful
comments. During the next class periods you can emphasize the issues illuminated by your
students' comments.
Muddiest Point: This is similar to ‘One-Minute Paper’ but only asks students to describe what
they didn't understand and what they think might help. It is an important technique that will help
you to determine which key points of the lesson were missed by the students. Here too, you should
review the responses before the next class meeting and use them to clarify, correct, or elaborate.
Student-generated test questions: You may allow students to write test questions and model
answers for specified topics, in a format consistent with course exams. This gives students
the opportunity to evaluate the course topics and to reflect on what they understand and on what
makes a good test item. You may evaluate the questions and use the good ones as prompts for
discussion.
Tests: This is the type of assessment that you are most familiar with. A test requires students
to respond to prompts in order to demonstrate their knowledge (orally or in writing) or their
skills (e.g., through performance). We will learn much more about tests later in this section.
Activity: Let’s say you need to assess student achievement on each of the following learning
targets. Which assessment strategy would you choose? Please jot down your answers with their
justifications and file it in your portfolio for later reference.
1. Ability to write clearly and coherently
2. Group discussion proficiency
3. Reading comprehension
4. Proficiency using specified mathematical procedures
5. Proficiency conducting investigations in science
The existing educational literature has identified various assessment issues associated with large
classes. They include:
a) Surface Learning Approach: Traditionally, teachers rely on time-efficient, exam-based
assessment methods for assessing large classes, such as multiple-choice and short-answer
examinations. These assessments often assess learning only at the lower levels of
intellectual complexity. Furthermore, students tend to adopt a surface, rote-learning approach
when preparing for these kinds of assessment. Higher-level learning such as critical thinking
and analysis is often not fully assessed.
b) Feedback is often inadequate: Feedback plays an important role in the learning process of
students. Particularly, if students can receive feedback at an early stage of their learning
process, this will help them identify their own problems and improve their learning.
However, with a large class, teachers may not have time to give detailed and constructive
feedback to every student. Most teachers can usually afford to give only general feedback to
their students on written assignments and tests.
c) Inconsistency in marking: A large class usually consists of a diverse and complex group of
students. The issues of different perception towards assessments, cultural and educational
background, prior knowledge and level of interest to the subject all pose challenges to the
fairness of marking and grading. Teachers have to take all these into account in order to
ensure the consistency and fairness in marking and grading.
e) Lack of interaction and engagement: Students are often not motivated to engage in a
large-sized lecture. When teachers raise questions in large classes, not many students are
willing to respond. Students are less likely to interact with teachers because they feel less
motivated and tend to hide themselves in a large group. In fact, interacting with students in
class is important for teachers because they can receive immediate feedback from students
regarding their quality of teaching.
Although these issues can be problems in assessment for any class size, they are worse in large
classes because of the additional limitations and strain on resources. They are problems that are
applicable whether the function of the assessment is to facilitate learning via feedback, or to
classify students via grading.
There are a number of ways to make the assessment of large numbers of students more effective
whilst still supporting effective student learning. These include:
1. Front ending: The basic idea of this strategy is that by putting in increased effort at the
beginning to prepare students for the work they are going to do, the quality of the work submitted
can be improved. As a result, the time needed to mark it is reduced (and time is also saved
because there are fewer requests for tutorial guidance).
2. Making use of in-class assignments: In-class assignments are usually quick and therefore
relatively easy to mark and provide feedback on, but help you to identify gaps in
understanding. Students could be asked to complete a task within the timeframe of a
scheduled lecture, field exercise or practical class. This might be a very quick task, for
example, completing a graph, doing some calculations, answering some quick questions,
making brief notes on a piece of text etc. In some cases it might be possible to merge the in-
class assignment with peer assessment.
3. Self- and peer-assessment: Students can perform a variety of assessment tasks in ways
that both save the tutor’s time and bring educational benefits, especially the development
of their own judgment skills. These include self-assessment and peer assessment strategies.
i. Self-assessment can reduce the marking load because it tends to raise the quality of the
work submitted, thereby reducing the time spent on marking and feedback.
The emphasis on student self-assessment represents a fundamental shift in the teacher-student
relationship, placing the primary responsibility for learning with the student.
However, there are problems involved in self-assessment for grading purposes pertaining
to their validity and reliability. If self-assessment is utilized for the purposes of grading, it
is imperative to employ peer or staff cross-marking to ensure the validity of the results.
Self-assessment should also be confined to certain limited objectives such as ascertaining
whether all of the required components of an answer are present, or the articulation of
very transparent assessment criteria and standards, possibly accompanied by examples of
work of varying standards. In this regard, self-assessment can decrease the marking load
of teachers and provide students with a positive learning experience by compelling them
to examine their work from the perspective of a marker as well as a participant.
undertake the marking of those assignments in class. For example, you can ask students
to exchange works with one another or collect in all of the named works and randomly
assign student markers to them.
However, as with any form of peer-assessment it needs to be carefully designed. Students need
to know what to do and there needs to be a transparent system by which students can appeal their
marks (especially if used in a summative rather than formative context). The benefits of this
approach are that:
• students can get to see how their peers have tackled a particular piece of work,
• they can see how you would assess the work (e.g. from the model answers/answer
sheets you've provided) and;
• they are put in the position of being an assessor, thereby giving them an
opportunity to internalize the assessment criteria.
5. Changing the assessment method, or at least shortening it: Being faced with large
numbers of students will present challenges but may also provide opportunities to
either modify existing assessments or to explore new methods of assessment. You might, for
example, be able to reduce the length of the assessment task you are currently using without
detracting from your module's learning outcomes. Alternatively a large class may provide a
new opportunity to make use of peer and self-assessment.
Assignment: Visit any one of the schools in your vicinity and interview at least three teachers in
your subject area using questions you have prepared for the purpose. The questions should be
related to 1) the problems they have faced in assessing students of large classes; and 2) the
strategies they have used to tackle the problems. Based on the information you have collected
prepare a report of 1-2 pages. You have to file the report as part of your portfolio.
Activity: List all of the forms of assessment that you have experienced during your school years.
Are there other approaches to assessment with which you are familiar even if you haven't
personally experienced them as a student?
A wide variety of tools are available for assessing student performance and there are approaches
that are suitable for essentially any educational objective you want to test. Examples include
objective exams, short answer and essay exams, portfolios, projects, practical exams,
presentations, and combinations of these. Appropriate tools or combinations of tools must be
selected and used if the assessment process is to successfully provide information relevant to
stated educational outcomes.
Constructing Tests
There is a wide variety of styles and formats for writing test items. Miller, Linn, & Gronlund
(2009) distinguish between classroom tests that consist of objective test items and
performance assessments that require students to construct responses (e.g., write an essay) or
perform a particular task (e.g., measure air pressure). Objective tests are highly structured and
require the test taker to select the correct answer from several alternatives or to supply a word or
short phrase to answer a question or complete a statement. They are called objective because
they have a single right or best answer that can be determined in advance. Performance
assessment tasks permit the student to organize and construct the answer in essay form. Other
types of performance assessment tasks may require the student to use equipment, generate
hypotheses, make observations, construct something or perform for an audience. For most
performance assessment tasks, there is not a single best or right response. Expert judgment is
required to score the performances.
Each type of test has its unique characteristics, uses, advantages, limitations, and rules for
construction.
Activity: 1) As students, you have taken tests in different formats: multiple-choice
items, true/false items, short-answer items, etc. Which of these item types did you
feel most comfortable with? What are your reasons? Write down your
answers and compare them with those of your friends.
2) In groups discuss the advantages and limitations of the different types of test items.
Present the results of your discussion to the whole class.
The chief advantage of true/false items is that they do not take students much time to
answer. This allows a teacher to cover a wide range of content by using a large number of
such items. In addition, true/false test items can be scored quickly, reliably, and objectively by
anybody using an answer key. If carefully constructed, true/false test items can also measure
higher mental processes such as understanding, application and interpretation.
The major disadvantage of true/false items is that when they are used exclusively, they tend to
promote memorization of factual information: names, dates, definitions, and so on. Some argue
that another weakness of true/false items is that they encourage guessing, because a student
who guesses blindly on any single item has a 50 percent probability of getting it right. In
addition, true/false items:
Can often lead a teacher to write ambiguous statements due to the difficulty of writing
statements which are clearly true or false
Do not discriminate between students of varying ability as well as other item types do
Can often include more irrelevant clues than do other item types
Can often lead a teacher to favour testing of trivial knowledge
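The 50 percent figure applies to one item at a time. A short sketch shows what it implies for a whole test, assuming every answer is an independent blind guess (no partial knowledge) and using a hypothetical 20-item test: the expected score from guessing is half the items, but the chance of reaching a high score by guessing alone is small, which is one reason a larger number of items helps.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability of getting at least k of n items right by blind guessing,
    treating each item as an independent guess with success chance p
    (a binomial model)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_items = 20                        # hypothetical test length
print(n_items * 0.5)                # expected score from pure guessing: 10.0
print(prob_at_least(15, n_items))   # chance of 15 or more correct: about 0.0207
```

With only a handful of items, a lucky guesser can easily score well; with 20 items, the chance of getting 15 or more right by guessing alone is roughly 2 percent.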
The following suggestions may help teachers to construct good-quality true/false test
items.
Avoid negative statements, and never use double negatives. In Right-Wrong or True-False
items, negatively phrased statements make it needlessly difficult for students to decide
whether that statement is accurate or inaccurate.
Restrict single-item statements to single concepts. If you double up two concepts in a
single item statement, how does a student respond if one concept is accurate and the other
isn’t?
Use an approximately equal number of items, reflecting the two categories tested. If you
typically include many more false items than true ones in your True-False tests, students who
are completely unsure about an item will tend to opt for a false answer and will probably be correct.
Make statements representing both categories equal in length. Again, to avoid giving
away the correct answers, don’t make all your false statements brief and (in an effort to
include necessary qualifiers) make all your true statements long. Students catch on quickly to
this kind of test-making tendency.
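You can check the last two guidelines mechanically before a test is printed. The sketch below is hypothetical. The function `audit_true_false`, the 40 to 60 percent balance range and the 1.5 length ratio are illustrative thresholds chosen for the example. They are not rules stated in this module. Statement length is approximated by a word count.

```python
from statistics import mean

def audit_true_false(items, balance_range=(0.4, 0.6), max_length_ratio=1.5):
    """Flag a draft true/false test that breaks two of the guidelines above.

    items: list of (statement, is_true) pairs.
    balance_range and max_length_ratio are illustrative thresholds (assumptions).
    Statement length is measured in words.
    """
    true_lengths = [len(s.split()) for s, is_true in items if is_true]
    false_lengths = [len(s.split()) for s, is_true in items if not is_true]
    warnings = []

    # Guideline: roughly equal numbers of true and false items
    share_true = len(true_lengths) / len(items)
    low, high = balance_range
    if not low <= share_true <= high:
        warnings.append(f"{share_true:.0%} of the items are true; aim for about half")

    # Guideline: true and false statements of similar length
    if true_lengths and false_lengths:
        ratio = mean(true_lengths) / mean(false_lengths)
        if ratio > max_length_ratio or ratio < 1 / max_length_ratio:
            warnings.append(f"true statements average {ratio:.1f} times the length of false ones")
    return warnings

# Hypothetical draft items written for illustration only
draft = [
    ("Water boils at 100 degrees Celsius at sea level under normal atmospheric pressure.", True),
    ("The moon is a star.", False),
    ("Clouds form when water vapour in rising air cools and condenses into tiny droplets.", True),
    ("Wind blows from low to high pressure.", False),
]
print(audit_true_false(draft))
```

In the sample draft, half the items are true, so the balance check passes. However, the true statements are more than twice as long as the false ones, so the function returns a single length warning.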
Matching Items
A matching item consists of two lists of words or phrases. The test-taker must match components
in one list (the premises, typically presented on the left) with components in the other list (the
responses, typically presented on the right), according to a particular kind of association
indicated in the item’s directions.
Like True-False items, matching items can cover a good deal of content in an efficient fashion.
They are a good choice if you’re interested in finding out if your students have memorized
factual information. Matching items sometimes can work well if you want your students to cross-
reference and integrate their knowledge regarding the listed premises and responses.
The major advantage of matching items is their compact form, which makes it possible to measure
a large amount of related factual material in a relatively short time. Another advantage is their
ease of construction.
The main limitation of matching items is that they are restricted to the measurement of
factual information based on rote learning. Another limitation is the difficulty of finding
homogeneous material that is significant from the perspective of the learning outcomes. As a
result, test constructors tend to fill their matching items with less significant material.
The following suggestions are important guidelines for the construction of good matching items.
Use fairly brief lists, placing the shorter entries on the right. If the premises and
responses in a matching item are too long, students tend to lose track of what they originally
set out to look for. The words and phrases that make up the premises should be short, and
those that make up the responses should be shorter still.
Employ homogeneous lists. Both the list of premises and the list of responses must be
composed of similar sorts of things. If not, an alert student will be able to arrive at the correct
associations simply by elimination, because some entries in the premises or responses will
clearly stand out from the others.
Include more responses than premises. If you use exactly the same number of responses as
premises in a matching item, then a student who knows half or more of the correct
associations stands a very good chance of guessing the rest.
List responses in a logical order. This rule is designed to make sure you don’t accidentally
give away hints about which responses connect with which premises. Choose a logical
ordering scheme for your responses (say, alphabetical or chronological) and stick with it.
Describe the basis for matching and the number of times a response can be used. To
satisfy this rule, you need to make sure your test’s directions clarify the nature of the
associations you want students to use when they identify matches. Regarding the student’s
use of responses, a phrase such as the following is often employed: “Each response in the list
at the right may be used once, more than once, or not at all.”
Try to place all premises and responses for any matching item on a single page. This
rule’s intent is to free your students from lots of potentially confusing flipping back and forth
in order to accurately link responses to premises.
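A small calculation shows why you should include more responses than premises. The sketch below makes two assumptions. Each response may be used only once. A student who knows some matches guesses the remaining ones at random from the unused responses. The function name `chance_rest_by_guessing` is hypothetical.

```python
from math import perm

def chance_rest_by_guessing(n_premises: int, n_responses: int, n_known: int) -> float:
    """Probability that a student who correctly matches n_known premises also gets
    every remaining premise right by blind guessing.

    Assumes each response is used at most once and the student guesses uniformly
    among the responses not already used for the known matches.
    """
    remaining = n_premises - n_known
    unused_responses = n_responses - n_known
    # number of equally likely ways to assign the unused responses to the remaining premises
    return 1 / perm(unused_responses, remaining)

print(chance_rest_by_guessing(6, 6, 5))  # 1.0: the last match is a give-away
print(chance_rest_by_guessing(6, 6, 3))  # about 0.167 (1 in 6)
print(chance_rest_by_guessing(6, 9, 3))  # about 0.0083 (1 in 120)
```

With six premises and six responses, a student who knows five matches gets the sixth for free. A student who knows three has a 1-in-6 chance of guessing the rest. Adding three extra responses cuts that chance to 1 in 120.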
Short-answer items and completion items are essentially the same: both can be answered
by a word, phrase, number or formula. They differ in the way the problem is presented. The
short-answer type uses a direct question, whereas the completion item consists of an incomplete
statement that the student must complete. This can be demonstrated by the following
examples:
Short answer item: In which year did the Ethiopians defeat the Italian invaders at Adwa?
Completion item: The Ethiopian forces defeated the Italian invaders at Adwa in the year _____.
Short-answer items are among the easiest to construct, partly because of the relatively
simple learning outcomes they usually measure. Except for the problem-solving outcomes
measured in Mathematics and Science, they are used almost exclusively to measure the recall of
memorized information.
A more important advantage of the short-answer item is that the students must supply the
answer. This reduces the possibility that students will obtain the correct answer by guessing.
They must either recall the information requested or make the necessary computations to solve
the problem presented to them. Partial knowledge, which might enable them to choose the
correct answer on a selection item, is insufficient for answering a short answer test item
correctly.
There are two commonly cited limitations of short-answer items. One is that they are
unsuitable for assessing complex learning outcomes. The other is the difficulty of scoring. This is
especially true when the item is not phrased clearly enough to require a single, definitely correct
answer, and when the scorer has to decide how to treat errors in the student’s spelling.
The following suggestions will help short-answer items to function as intended.
Word the item so that the required answer is both brief and specific.
Example: Poorly stated: An animal that eats the flesh of other animals is _____.
Better item: An animal that eats the flesh of other animals is classified as _____.
Do not take statements directly from textbooks to use as a basis for short-answer items.
When taken out of context, such statements are frequently too general and ambiguous to
serve as good short-answer items.
A direct question is generally more desirable than an incomplete statement.
If the answer is to be expressed in numerical units, indicate the type of answer wanted.
For computational problems, it is usually preferable to indicate the units in which the
answer is to be expressed.
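Much of the scoring difficulty described above comes from deciding which answers to accept. The sketch below shows one way to make this more consistent, under stated assumptions. First, list the acceptable answers in advance. Next, apply a light normalization: ignore case, surrounding spaces, a leading article and final punctuation. Finally, send every other response to the teacher for judgment instead of marking it wrong automatically. The function names, the accepted answers and the student responses are all hypothetical.

```python
import re

def normalize(answer: str) -> str:
    """Lower-case, trim, and drop a leading article and trailing punctuation."""
    text = answer.strip().lower()
    text = re.sub(r"^(a|an|the)\s+", "", text)
    return text.rstrip(".!?")

def score_short_answer(responses, accepted):
    """Split students into those whose answer matches the key and those whose
    answer the teacher should review by hand.

    responses: dict of student id -> answer; accepted: list of acceptable answers.
    """
    accepted_set = {normalize(a) for a in accepted}
    correct, review = [], []
    for student, answer in responses.items():
        (correct if normalize(answer) in accepted_set else review).append(student)
    return correct, review

# Hypothetical responses to the "classified as _____" item above
responses = {"S1": "Carnivore", "S2": "a carnivore.", "S3": "carnivor", "S4": "meat eater"}
print(score_short_answer(responses, ["carnivore", "carnivorous"]))
# (['S1', 'S2'], ['S3', 'S4'])
```

Here S3 (“carnivor”) and S4 (“meat eater”) are set aside for the teacher, who decides whether a misspelling or a descriptive phrase should earn credit. Applying the same decision to every paper keeps the scoring consistent.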
Multiple-Choice Items
This is the most popular type of selected-response item. It can effectively measure many of the
simple learning outcomes measured by the short-answer item, the true-false item, and the
matching item types. In addition, it can measure a variety of complex cognitive learning
outcomes.
A multiple-choice item consists of a problem and a list of suggested solutions. A student is first
given either a question or a partially complete statement. This part of the item is referred to as
the item’s stem. Then three or more potential answer-options are presented. These are usually
called alternatives, choices or options.
A key advantage of the multiple-choice item is its widespread applicability to the assessment of
cognitive skills and knowledge, as well as to the measurement of students’ affect. Another
advantage is that multiple-choice items can be made to vary widely in difficulty. Cleverly
constructed multiple-choice items can present very high-level cognitive challenges to students.
And, of course, as with all selected-response items, multiple-choice items are fairly easy to score.
The key weakness of multiple-choice items is that when students review a set of alternatives for
an item, they may be able to recognize a correct answer that they would never have been able to
generate on their own. In that sense, multiple-choice items can present an exaggerated picture of
a student’s understanding or competence, which might lead teachers to invalid inferences.
Another serious weakness, one shared by all selected-response items, is that multiple-choice
items can never measure a student’s ability to creatively synthesize content of any sort. Finally,
in an effort to come up with the necessary number of plausible alternatives, novice item-writers
sometimes toss in some alternatives that are obviously incorrect.
Well-constructed multiple-choice items, when deployed along with other types of items, can
make a genuine contribution to a teacher’s assessment arsenal. Here are some useful rules for
you to follow.
The question or problem in the stem must be self-contained. The stem should contain
as much of the item’s content as possible, thereby rendering the alternatives much shorter
than would otherwise be the case.
Avoid negatively stated stems. Just as with the True/False items, negatively stated stems
can create genuine confusion in students.
Each alternative must be grammatically consistent with the item’s stem. When some
answer-options do not fit the stem grammatically, students are given an unintended clue to
the correct answer, because the grammatically consistent option stands out.
Make all alternatives plausible, but be sure that one of them is indisputably the
correct or best answer. As I indicated when describing the weaknesses of multiple-
choice items, teachers sometimes toss in one or more implausible alternatives, thereby
diminishing the item substantially. Although avoiding that problem is important, it’s even
more important to make certain that you really do have one valid correct answer in any
item’s list of alternatives, rather than two similar answers, either of which could be
arguably correct.
Randomly use all answer positions in approximately equal numbers. If you use four-
option items, make sure that roughly one-fourth of the correct answers turn out to be A,
one fourth B, and so on.
Never use “all of the above” as an answer choice, but use “none of the above” to
make items more demanding.
Students often become confused when confronted with items that have more than one correct
answer. Usually, they see one correct alternative and instantly opt for it, without recognizing
that there are other correct options later in the list. In addition, students who realize that two
of the alternatives are correct will opt for the “all of the above” option without considering the
third option. However, we can increase the difficulty level of a test item by presenting three or
four answer options, none of which is correct, followed by a correct “none of the above” option.
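You can follow the rule about using all answer positions in approximately equal numbers by deciding the position of each correct answer before arranging the alternatives. The sketch below is one possible way to do this, not a procedure prescribed by the module. It gives every letter the same share of a hypothetical 25-item key, spreads any remainder over different letters, and shuffles the result. The function name `balanced_key` is an assumption. So is the fixed seed, which is used only to make the output reproducible.

```python
import random

def balanced_key(n_items, options="ABCD", seed=None):
    """Random answer-key positions with each letter used as equally as possible.

    Every letter is used n_items // len(options) times; any remainder is given
    to distinct, randomly chosen letters, so counts differ by at most one.
    """
    rng = random.Random(seed)
    per_option, extra = divmod(n_items, len(options))
    key = [letter for letter in options for _ in range(per_option)]
    key += rng.sample(list(options), extra)  # remainder goes to different letters
    rng.shuffle(key)                         # random order of correct positions
    return key

key = balanced_key(25, seed=2013)
print("".join(key))
print({letter: key.count(letter) for letter in "ABCD"})
```

The teacher would then write, or reorder, the alternatives of each item so that its correct answer falls at the assigned letter.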
Activity: Examine the following faulty multiple choice items and identify their problems.
1. The term "side effect" of a drug refers to:
A. additional benefits from the drug.
B. the chain effect of drug action.
C. the influence of drugs on crime.
D. any action of a drug in the body other than the one the doctor wanted the drug to have.
2. When linking two clauses, one main and one subordinate, one should use a:
A. coordinate conjunction such as and or so
B. subordinate conjunction such as because or although.
C. preposition such as to or from.
D. semicolon.
3. Entomology is:
A. the study of birds.
B. the study of fish.
C. the study of insects.
4. The promiscuous use of sprays, oils, and antiseptics in the nose during acute colds is a pernicious
practice because it may have a deleterious effect on
A. the sinuses.
B. red blood cells.
C. white blood cells.
5. An electric transformer can be used:
A. for storing electricity
B. to increase the voltage of alternating current
C. It converts electric energy in mechanical energy
D. alternating current is changed to direct current
Constructing Performance Assessments
In the previous paragraphs you learned how objective test items should be
constructed. You also learned that well-constructed objective tests can measure a variety of
learning outcomes, from simple to complex. Despite this wide applicability of objective-item
types, there remain significant learning outcomes for which no satisfactory objective
measurements have been developed. These include such outcomes as the ability to recall,
organize, and integrate ideas; the ability to express oneself in writing; and the ability to create
rather than merely identify interpretations and applications of data. Such outcomes require less
structuring of responses than objective test items, and it is in the measurement of these outcomes
that written essays and other performance-based assessments are of great value.
In this section, you will be presented with the most familiar form of performance-based
assessment: the essay question. The distinctive feature of essay questions is that students are free to
construct, relate, and present ideas in their own words. Learning outcomes concerned with the
ability to conceptualize, construct, organize, relate, and evaluate ideas require the freedom of
response and the originality provided by essay questions.
Essay questions can be classified into two types: restricted-response essay questions and
extended-response essay questions. Let us now briefly look at each type.
Restricted-response essay questions: These types of questions usually limit both the content
and the response. The content is usually restricted by the scope of the topic to be discussed.
Limitations on the form of response are generally indicated in the question. This can be
demonstrated in the following example:
In what ways are essay questions more preferable than objective test items? Answer in a
brief paragraph.
This freedom enables them to demonstrate their ability to analyze problems, organize their ideas,
describe in their own words, and/or develop a coherent argument.
In addition to the capacity to measure higher-order thinking skills described above, essay
questions have some further advantages, which include the following:
Extended-response essays focus on the integration and application of thinking and
problem-solving skills.
Essay assessments enable the direct evaluation of writing skills.
Essay questions, as compared to objective tests, are easy to construct.
Essay questions have a positive effect on students’ learning.
On the other hand, essay questions also have some limitations that you need to be aware of.
Perhaps the most commonly cited problem with essay questions is the unreliability of their scoring.
Thus, the same paper may be scored differently by different teachers, and even the same teacher
may give different scores for the same paper at different times. Another limitation is the amount
of time required for scoring the responses. Still another problem with essay tests is the limited
sampling of content they provide.
information they need to respond appropriately to an essay item. The less guessing that
your students are obliged to do about how they’re supposed to respond, the less likely it
is that you’ll get lots of off-the-wall essays that don’t give you the evidence you need.
Employ more questions requiring shorter answers rather than fewer questions
requiring longer answers. This rule is intended to foster better content sampling in a
test’s essay items. With only one or two items on a test, chances are awfully good that
your items may miss your students’ areas of content mastery or non-mastery.
Don’t employ optional questions. When students are made to choose their essay items
from several options, you really end up with different tests, unsuitable for comparison.
Test a question’s quality by creating a trial response to the item. A great way to
determine if your essay items are really going to get at the responses you want is to
actually try writing a response to the item, much as a student might do.
As we have seen earlier, the most serious limitation of essay questions is related to scoring.
Therefore, the following guidelines would be helpful in making the scoring of essay items easier
and more reliable.
1. Make sure you are emotionally and mentally settled (for example, not tired or upset) before scoring
2. Score all responses to one item before moving on to the next item
3. Write out in advance a model answer to guide yourself in grading the students’ answers
4. Shuffle the exam papers after scoring each question, before moving on to the next
5. Keep the names of test takers hidden while scoring to avoid bias
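Guidelines 2, 4 and 5 describe a marking order that you can prepare before scoring starts. The sketch below is a hypothetical illustration: the function `scoring_plan`, the paper codes and the student names are invented for the example. It replaces names with random codes and lists every paper for one question before moving to the next. It also reshuffles the pile for each question.

```python
import random

def scoring_plan(student_names, n_questions, seed=None):
    """Build an essay-marking order that follows the guidelines above.

    - Papers carry only a random code, so the scorer does not see names.
    - All responses to one question are scored before the next question.
    - The pile is reshuffled before each question.
    Returns (code -> name lookup, list of (question number, marking order)).
    """
    rng = random.Random(seed)
    names = list(student_names)
    rng.shuffle(names)  # so codes do not follow the class-list order
    lookup = {f"P{i:03d}": name for i, name in enumerate(names, start=1)}
    pile = list(lookup)
    plan = []
    for question in range(1, n_questions + 1):
        rng.shuffle(pile)  # fresh order for every question
        plan.append((question, list(pile)))
    return lookup, plan

# Hypothetical class of four; the lookup is kept aside until marking is finished.
lookup, plan = scoring_plan(["Abebe", "Hana", "Meron", "Yonas"], n_questions=3, seed=1)
for question, order in plan:
    print(f"Question {question}: mark papers in order {order}")
```

The code-to-name lookup would be kept by someone other than the scorer, or set aside, until all questions have been scored.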
Table of Specification
The development of valid, reliable and usable questions involves proper planning. The plan
entails designing a framework that can guide test developers in the item development
process. This is necessary because the classroom test is a key factor in the evaluation of learning
outcomes. The validity, reliability and usability of such a test depend on the care with which it
is planned and prepared. Planning helps to ensure that the test covers the pre-specified
instructional objectives and the subject matter (content) under consideration. Hence, planning
a classroom test involves identifying the instructional objectives stated earlier and the subject
matter (content) covered during the teaching/learning process. This leads to the preparation of
a table of specification (the test blueprint) for the test, bearing in mind the type of test that
would be relevant for the purpose of testing.
To plan a classroom test that is both practical and effective in providing evidence of
mastery of the instructional objectives and content covered, several considerations are relevant.
The following steps serve as a guide in planning a classroom test.
i. Determine the purpose of the test;
ii. Describe the instructional objectives and content to be measured.
iii. Determine the relative emphasis to be given to each learning outcome;
iv. Select the most appropriate item formats (essay or objective);
v. Develop the test blue print to guide the test construction;
vi. Prepare test items that are relevant to the learning outcomes specified in the test plan;
vii. Decide on the pattern of scoring and the interpretation of results;
viii. Decide on the length and duration of the test, and
ix. Assemble the items into a test, prepare direction and administer the test.
The instructional objectives of the course are critically considered while developing the test
items. This is because the instructional objectives are the intended behavioural changes or
intended learning outcomes of instructional programs which students are expected to possess at
the end of the instructional process. The instructional objectives usually stated for the assessment
of behavior in the cognitive domain of educational objectives are classified by Bloom (1956) in
his taxonomy of educational objectives into knowledge, comprehension, application, analysis,
synthesis and evaluation. The objectives are also given relative weights according to their
importance and the emphasis given to them. Educational objectives and the content of a course are
the basis on which test development rests.
A table of specification is a two-way table that matches the objectives and content you have
taught with the level at which you expect your students to perform. It contains an estimate of the
percentage of the test to be associated with each topic at each level at which it is to be measured.
In effect, it establishes how much emphasis to give to each objective or content area. A table of
specification guides the selection of test items, which in turn ensures that the test measures a
representative sample of instructionally relevant tasks.
               Instructional Objectives
Contents         Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation  Total  Percent
Air pressure             2              2            1         1          -           -      6      24%
Wind                     1              1            1         1          -           -      4      16%
Temperature              2              2            1         1          -           1      7      28%
Rainfall                 1              2            1         -          1           -      5      20%
Clouds                   1              1            -         1          -           -      3      12%
Total                    7              8            4         4          1           1     25
Percent                28%            32%          16%       16%         4%          4%            100%
As can be observed from the table, the rows show the content areas from which the test is to be
sampled, and the columns indicate the level of thinking students are required to demonstrate in
each of the content areas. Thus, the test items are distributed among the five content
areas with their corresponding representation among the six levels of the cognitive domain. The
percentage row and column also show the degree of representation of both the contents and the
levels of the cognitive domain in this particular test. Objectives you consider more
important should get more representation in the test items. Similarly, content areas on which you
have spent more instructional time should be allotted more test items.
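The totals and percentages in the table follow directly from the cell counts. The sketch below recomputes them from the counts in the example table, entering a dash as 0. This gives you a quick way to check a blueprint for arithmetic slips before writing items. The variable and function names are illustrative.

```python
# Item counts from the example table of specification (a dash is entered as 0).
LEVELS = ["Knowledge", "Comprehension", "Application", "Analysis", "Synthesis", "Evaluation"]
TABLE = {
    "Air pressure": [2, 2, 1, 1, 0, 0],
    "Wind":         [1, 1, 1, 1, 0, 0],
    "Temperature":  [2, 2, 1, 1, 0, 1],
    "Rainfall":     [1, 2, 1, 0, 1, 0],
    "Clouds":       [1, 1, 0, 1, 0, 0],
}

def margins(table):
    """Return row totals, column totals, the grand total, and both margins as percentages."""
    row_totals = {area: sum(counts) for area, counts in table.items()}
    col_totals = [sum(column) for column in zip(*table.values())]
    grand = sum(row_totals.values())
    row_pct = {area: 100 * n / grand for area, n in row_totals.items()}
    col_pct = [100 * n / grand for n in col_totals]
    return row_totals, col_totals, grand, row_pct, col_pct

rows, cols, total, row_pct, col_pct = margins(TABLE)
for area in TABLE:
    print(f"{area:<13}{rows[area]:>3}{row_pct[area]:>6.0f}%")
print(dict(zip(LEVELS, cols)), total)
```

For this table, the script gives 25 items in total and column totals of 7, 8, 4, 4, 1 and 1. The row percentages come out as 24, 16, 28, 20 and 12, matching the margins shown in the table.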
Which of the objectives in the example above were given more emphasis? Which of them
obtained the least emphasis? Which content areas obtained the highest representation? Which one
obtained the least representation? What are the implications of these differences?
There are also other ways of developing a test blueprint. One of these shows the
distribution of test items among the content areas and the type of test items to be developed from
each content area. For example, the table of specification that we have seen earlier can be
prepared in the following way.
               Item Types
Contents         True/False  Matching  Short Answer  Multiple Choice  Total  Percent
Air pressure              1         1             1                3      6      24%
Wind                      1         1             1                1      4      16%
Temperature               1         2             1                3      7      28%
Rainfall                  1         1             1                2      5      20%
Clouds                    1         -             1                1      3      12%
Total                     5         5             5               10     25
Percent                 20%       20%           20%              40%            100%
Arranging the sections of a test in the order shown in this table (true/false, matching, short
answer, multiple choice) produces a sequence that roughly approximates the complexity of the
outcomes measured, ranging from the simple to the complex. It is then merely a matter of
grouping the items within each item type. For this purpose, items that measure
similar outcomes should be placed together and then arranged in order of ascending difficulty.
For example, the items under the multiple choice section might be arranged in the following
order: knowledge of terms, knowledge of specific facts, knowledge of principles, and application
of principles. Keeping together items that measure similar learning outcomes is especially
helpful in determining the type of learning outcomes causing students the greatest difficulty.
If, for any reason, it is not feasible to group the items by the learning outcomes measured, then it
is still desirable to arrange them in order of increasing difficulty. Beginning with the easiest
items and proceeding gradually to the most difficult has a motivating effect on students. Also,
encountering difficult items early in the test often causes students to spend a disproportionate
amount of time on such items. If the test is long, they may be forced to omit later questions that
they could easily have answered. With the items classified by item type, the sections of the test
and the items within each section can be arranged in order of increasing difficulty.
To summarize, the most effective method for organizing items in the typical classroom test is to:
1. Form sections by item type
2. Group the items within each section by the learning outcomes measured, and
3. Arrange both the sections and the items within sections in an ascending order of
difficulty.
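The three-step procedure above amounts to sorting the items by three keys. In the sketch below, the section order follows the item-type table and the outcome order follows the multiple-choice example in the text. The `Item` record, the item identifiers and the difficulty estimates are hypothetical. Each difficulty estimate is the proportion of students expected to answer correctly, so a higher value means an easier item. Sections are kept in a fixed order because that order already approximates increasing complexity.

```python
from dataclasses import dataclass

# Section order from the item-type table; outcome order from the multiple-choice example.
SECTION_ORDER = ["True/False", "Matching", "Short Answer", "Multiple Choice"]
OUTCOME_ORDER = ["knowledge of terms", "knowledge of specific facts",
                 "knowledge of principles", "application of principles"]

@dataclass
class Item:
    item_id: str
    item_type: str     # one of SECTION_ORDER
    outcome: str       # learning outcome measured
    difficulty: float  # expected proportion correct: higher means easier

def arrange(items):
    """Sort items by section, then by outcome group, then from easiest to hardest."""
    def sort_key(item):
        section = SECTION_ORDER.index(item.item_type)
        # outcomes not in the list go to the end of their section
        outcome = (OUTCOME_ORDER.index(item.outcome)
                   if item.outcome in OUTCOME_ORDER else len(OUTCOME_ORDER))
        return (section, outcome, -item.difficulty)
    return sorted(items, key=sort_key)

draft = [
    Item("q1", "Multiple Choice", "application of principles", 0.45),
    Item("q2", "True/False", "knowledge of specific facts", 0.80),
    Item("q3", "Multiple Choice", "knowledge of terms", 0.70),
    Item("q4", "True/False", "knowledge of specific facts", 0.90),
]
print([item.item_id for item in arrange(draft)])  # ['q4', 'q2', 'q3', 'q1']
```

Here the two true/false items come first, with the easier of the two leading. They are followed by the multiple-choice knowledge item and then the application item.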
Project Work: In groups of four, take one exam paper from the school at which you are placed for your
practicum experience that includes at least three types of test items. Then evaluate the items
and the test in general based on the guidelines of test construction you have learned in the unit.
You have to prepare and submit a report of your evaluation to your instructor. The test paper
you have evaluated should also be attached to your report.
their best efforts and the control of factors such as malpractice and unnecessary threats from
test administrators that may interfere with valid measurement. It is also concerned with selecting
convenient and accurate procedures for scoring the results.
There are a number of conditions that may create test anxiety in students and should therefore be
avoided during test administration. These include:
Threatening students with tests if they do not behave
Warning students to do their best “because the test is important”
Telling students they must work fast in order to finish on time.
Threatening dire consequences if they fail.
recipients and users of the results of assessment place on the result with respect to the grades
obtained, certificates issued or the issuing institution. Civility, on the other hand, asks
whether the persons being assessed are in such conditions as to give their best without
hindrances and burdens in the attributes being assessed and whether the exercise is seen as
integral to or as external to the learning process.
Hence, in test administration, effort should be made to see that the test takers are given a fair and
unaided chance to demonstrate what they have learnt with respect to:
a) Instructions: A test should contain a set of instructions, which are usually of two types. One
is the instruction to the test administrator, while the other is to the test taker. The
instruction to the test administrator should explain how the test is to be administered, the
arrangements to be made for proper administration of the test, and the handling of the scripts
and other materials. The instructions to the administrator should be clear for effective
compliance. For the test takers, the instruction should direct them on the amount of work to
be done or the tasks to be accomplished. The instruction should explain how the test should
be performed. Examples may be used for illustration and to clarify what
should be done by the test takers. The language used for the instruction should be
appropriate to the level of the test takers. Test administrators should explain the
instructions to the test takers to ensure proper understanding, especially when the ability to
understand and follow instructions is not part of what is being tested.
b) Duration of the Test: The time for accomplishing the test is technically important in test
administration and should be clearly stated for both the test administrators and test takers.
Ample time should be provided for candidates to demonstrate what they know and what
they can do. The duration of test should reflect the age and attention span of the test takers
and the purpose of the test.
c) Venue and Seating Arrangement: The test environment should be learner-friendly, with
adequate physical conditions such as work space, good and comfortable writing desks,
proper lighting, good ventilation, moderate temperature, conveniences within reasonable
distance and the serenity necessary for maximum concentration. It is important to provide
enough comfortable seats, arranged so as to ensure the test takers’ comfort
and to reduce collaboration between them. Adequate lighting, good ventilation and moderate
temperature reduce test anxiety and loss of concentration, which invariably affect
performance in the test. Noise is another undesirable factor that has to be adequately
controlled both within and outside the test’s immediate environment, since it affects
concentration and test scores.
d) Other necessary conditions: The questions and the question paper should be reader-friendly,
with bold characters, and should be neat, decent, clear and appealing rather than laid out in a way
that intimidates test takers into mistakes. All materials relevant to carrying out the demands of
the test should be provided in reasonable number and quality, and on time.
All these are necessary to enhance test administration and to make assessment civil in
manifestation.
On the other hand, to ensure credibility, effort should be made to moderate the test questions before
administration based on laid-down standards. It is also important to ensure that valid questions are
constructed following the procedures for test construction which you have already learned in the
earlier sections of this unit.
Secure custody should be provided for the questions from the point of drafting to constituting the
final version of the test, to provision of security and safe custody of live scripts after the
assessment, transmitting them to the graders and provision of secure custody for the grades
arising from the assessment against loss, mutilation and alteration. The test administrators and
the graders should be of proven moral integrity and should hold appropriate academic and
professional qualifications. The test scripts are to be graded and marks awarded strictly by using
itemized marking schemes. All these are necessary because an assessment situation in which
credibility is seriously called to question cannot really claim to be valid.
Unit Summary
In this unit you were introduced to different types of assessment approaches, namely formal vs.
informal, criterion referenced vs. norm referenced, formative vs. summative assessments. You
also learned about various assessment strategies. These include: classroom presentations,
exhibitions/demonstrations, conferences, interviews, observations, performance tasks, portfolios,
question and answer, students’ self assessment, checklists, rating scales and rubrics, one-minute
paper, muddiest point, students-generated questions and tests.
You also learned about the challenges in the assessment of large classes and their consequences,
and some of the strategies that we can use to minimize those challenges. These strategies
include front ending, making use of in-class assignments, self and peer assessment, group
assessment, and changing the assessment method, or at least shortening it.
Much of this unit was devoted to the construction of the most widely used assessment technique,
tests. In this regard, tests were classified into two broad categories: objective tests and
performance assessment tasks (essay tests). Objective tests were further divided into supply-type
items and selection-type items. Supply-type items include short-answer and completion items,
whereas selection-type items include true/false items, matching items and multiple-choice
items. Essay items were also classified into restricted-response and extended-response essay items. Here
you have learned about the strengths and limitations of these different test item types. You were
also introduced to the major guidelines you should follow in constructing these test item types.
This unit also covered the planning of tests, particularly the preparation of a table of
specification or test blueprint. You were also familiarized with how test item types should be
arranged. Finally, you also learned about the techniques and procedures we should follow
during test administration.
Self-Check Exercises
1. State the differences between formative and summative assessment, criterion referenced
and norm referenced assessment, and formal and informal assessment.
2. What conditions do we consider in selecting assessment strategies in our subject?
3. List down the major assessment strategies that you can use in your subject and classify
them as formal and informal strategies.
4. What are the major problems associated with assessing students in large classes? What
strategies can we use to minimize these problems?
5. What is the difference between objective tests and essay tests?
6. What are the advantages of objective tests as compared to essay tests?
7. What are the advantages of essay tests as compared to objective tests?
8. What is a table of specification and what major purposes does it serve?
9. What are the major procedures we need to follow during test administration?
Unit 3: Item Analysis
Introduction
In unit two you learned about various assessment strategies that can be used in the context of secondary
education. You were also introduced to the planning, construction and administration of classroom
tests. In this unit, you are going to learn the techniques of analyzing responses to test items so as to
determine their validity and reliability. You will also learn about the advantages and techniques of test
item banking.
Learning Outcomes
Upon completion of this unit, you should be able to:
Item analysis is an important phase in the construction of tests. It is the process of examining
testees' responses to each item on a test with the basic intent of judging the quality of the item. Item
analysis helps to determine the adequacy of the items within a test as well as the adequacy of the test
itself. There are several reasons for analyzing questions and tests that students have completed and that
have already been graded. Some of the reasons that have been cited include the following:
1. Identify content that has not been adequately covered and should be re-taught,
2. Provide feedback to students,
3. Determine if any items need to be revised in the event they are to be used again or become
part of an item file or bank,
4. Identify items that may not have functioned as they were intended,
5. Direct the teacher's attention to individual student weaknesses.
The results of an item analysis provide information about the difficulty of the items and the ability of the
items to discriminate between better and poorer students. If an item is too easy or too difficult, fails to
show a difference between skilled and unskilled examinees, or has even been scored incorrectly, an item analysis
will reveal it. The two most common statistics reported in an item analysis are the item difficulty and the
item discrimination. An additional analysis that is often reported is the distractor analysis. Once the item
analysis information is available, an item review is often conducted. In the following sections you are
going to learn the statistical techniques used to analyse responses to test items.
Item difficulty index is one of the most useful, and most frequently reported, item analysis statistics. It is
a measure of the proportion of examinees who answered the item correctly; for this reason it is frequently
called the p-value. If scores from all students in a group are included, the difficulty index is simply the
total percent correct. When a sufficient number of scores is available (i.e., 100 or more), difficulty
indexes are calculated using scores from the top and bottom 27 percent of the group. The steps are:
1. Rank the papers in order from the highest to the lowest score.
2. Select one-third of the papers with the highest total scores and another one-third of the papers with the
lowest total scores (or the top and bottom 27 percent when 100 or more papers are available).
3. For each test item, tabulate the number of students in the upper and lower groups who selected each
option.
4. Compute the difficulty of each item (the percentage of students who answered the item correctly).
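$$P = \frac{HSG + LSG}{N}$$
Where, HSG = number of students in the high scoring group who answered the item correctly, LSG = number of students in the low scoring group who answered the item correctly, and N = total number of students in both groups.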
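As a minimal sketch in Python, the difficulty index for the upper and lower groups is the number of correct answers in both groups divided by the number of students in both groups. The function name `difficulty_index` and its parameters are hypothetical, chosen only for illustration.

```python
def difficulty_index(hsg_correct: int, lsg_correct: int, n_total: int) -> float:
    """Proportion of students in the upper and lower groups who answered correctly."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (hsg_correct + lsg_correct) / n_total


# Hypothetical item: 8 high scorers and 4 low scorers answer correctly,
# out of 16 students in the two groups together.
print(difficulty_index(8, 4, 16))  # 0.75
```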
The difficulty indexes can range between 0.0 and 1.0 and are usually expressed as a percentage. A higher
value indicates that a greater proportion of examinees responded to the item correctly, and it was thus an
easier item. The average difficulty of a test is the average of the individual item difficulties. For
maximum discrimination among students, an average difficulty of .60 is ideal. For example: If 243
students answered item no. 1 correctly and 9 students answered incorrectly, the difficulty level of the item
would be 243/252 or .96.
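For a whole class, the same idea applies item by item: each item's p-value is the proportion of all students who answered it correctly, and the average difficulty of the test is the mean of those p-values. The sketch below assumes the scores are stored as a hypothetical 0/1 matrix (one row per student, one column per item); the data and function names are made up for illustration.

```python
from __future__ import annotations


def item_difficulties(scores: list[list[int]]) -> list[float]:
    """p-value of each item from a 0/1 score matrix (rows = students, columns = items)."""
    n_students = len(scores)
    n_items = len(scores[0])
    return [sum(row[i] for row in scores) / n_students for i in range(n_items)]


def average_difficulty(p_values: list[float]) -> float:
    """Average difficulty of a test: the mean of its item difficulties."""
    return sum(p_values) / len(p_values)


# Hypothetical responses of four students to three items (1 = correct, 0 = wrong).
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
p = item_difficulties(scores)
print(p)                                # [0.75, 0.75, 0.25]
print(round(average_difficulty(p), 2))  # 0.58
print(round(243 / 252, 2))              # 0.96, the worked example in the text
```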
In the example below, five true-false questions were part of a larger test administered to a class of 20
students. For each question, the number of students answering correctly was determined, and then
converted to the percentage of students answering correctly.
Activity: Calculate the item difficulty level for the following four-option multiple-choice test item. (The
asterisk (*) marks the correct answer.)
                 Response Options
Groups           A     B     C     D*    Total
High Scorers     0     1     1     8     10
Low Scorers      1     1     5     3     10
Total            1     2     6     11    20
For criterion-referenced tests (CRTs), with their emphasis on mastery-testing, many items on an exam
form will have p-values of .9 or above. Norm-referenced tests (NRTs), on the other hand, are designed to
be harder overall and to spread out the examinees’ scores. Thus, many of the items on an NRT will have
difficulty indexes between .4 and .6.
The index of discrimination is a numerical indicator that enables us to determine whether the question
discriminates appropriately between lower scoring and higher scoring students. When students who earn
high scores are compared with those who earn low scores, we would expect to find more students in the
high scoring group answering a question correctly than students from the low scoring group. In the case
of very difficult items which no one in either group answered correctly or fairly easy questions which
even the students in the low group answered correctly, the numbers of correct answers might be equal for
the two groups. What we would not expect to find is a case in which the low scoring students answered
correctly more frequently than students in the high group.
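$$D = \frac{HSG - LSG}{N/2}$$
where HSG and LSG are the numbers of students in the high and low scoring groups who answered the item correctly, and N is the total number of students in both groups.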
In the example below, there are 8 students in the high scoring group and 8 in the low scoring group (with
12 students between the two groups who are not represented). For question 1, all 8 in the high scoring group
answered correctly, while only 4 in the low scoring group did so. Thus, successes in the HSG minus successes in
the LSG = 8 - 4 = +4. The last step is to divide +4 by half the total number of students in both groups
(16 / 2 = 8), which gives D = .5.
Question   Correct in HSG   Correct in LSG   HSG - LSG    D
1          8                4                8 - 4 = 4    .5
2          7                2
3          5                6
Activity 2: Calculate the item discrimination index for the questions 2 & 3 on the table above.
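The discrimination index can be sketched the same way, assuming D = (HSG - LSG) / (N/2), where N is the number of students in both groups. The function name `discrimination_index` is hypothetical.

```python
def discrimination_index(hsg_correct: int, lsg_correct: int, n_total: int) -> float:
    """D = (HSG - LSG) / (N / 2), where N counts the students in both groups."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (hsg_correct - lsg_correct) / (n_total / 2)


print(discrimination_index(8, 4, 16))  # 0.5 (question 1)
```

Calling `discrimination_index(7, 2, 16)` and `discrimination_index(5, 6, 16)` lets you check your answers to Activity 2.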
The item discrimination index can vary from -1.00 to +1.00. A negative discrimination index (between
-1.00 and zero) results when more students in the low group answered correctly than students in the high
group. A discrimination index of zero means equal numbers of high and low students answered correctly,
so the item did not discriminate between groups. A positive index occurs when more students in the high
group answer correctly than in the low group. If the students in the class are fairly homogeneous in ability
and achievement, their test performance is also likely to be similar, resulting in little discrimination
between high and low groups.
Questions that have an item difficulty index (NOT item discrimination) of 1.00 or 0.00 need not be
included when calculating item discrimination indices. An item difficulty of 1.00 indicates that everyone
answered correctly, while 0.00 means no one answered correctly. We already know that neither type of
item discriminates between students.
When computing the discrimination index, the scores are divided into three groups with the top 27% of
the scores in the upper group and the bottom 27% in the lower group. The number of correct responses
for an item by the lower group is subtracted from the number of correct responses for the item in the
upper group. The difference is divided by the number of students in either group. The process is repeated
for each item.
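The whole procedure (rank by total score, take the top and bottom 27 percent, subtract, divide by group size) can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, ties in total score are broken by the students' order in the list, items with a difficulty of 1.00 or 0.00 are skipped as described above, and the six-student data set is invented and far smaller than the 100 or more papers the 27 percent rule is meant for.

```python
from __future__ import annotations


def upper_lower_groups(totals: list[int], fraction: float = 0.27) -> tuple[list[int], list[int]]:
    """Return the indices of the top and bottom `fraction` of students by total score."""
    # sorted() is stable, so tied students keep their original order
    ranked = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    k = max(1, round(fraction * len(totals)))  # size of each group
    return ranked[:k], ranked[-k:]


def item_discrimination(scores: list[list[int]], fraction: float = 0.27) -> list[float | None]:
    """D for each item of a 0/1 score matrix (rows = students, columns = items).

    Items that everyone or no one answered correctly (p = 1.00 or 0.00) get None.
    """
    totals = [sum(row) for row in scores]
    upper, lower = upper_lower_groups(totals, fraction)
    k = len(upper)
    result: list[float | None] = []
    for item in range(len(scores[0])):
        column = [row[item] for row in scores]
        p = sum(column) / len(column)
        if p in (0.0, 1.0):
            result.append(None)
            continue
        upper_correct = sum(column[i] for i in upper)
        lower_correct = sum(column[i] for i in lower)
        result.append((upper_correct - lower_correct) / k)
    return result


# Hypothetical 0/1 scores of six students on four items.
scores = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]
print(item_discrimination(scores))  # [None, 0.5, 1.0, -0.5]
```

Item 4 comes out negative because, in this made-up data, a low scorer answered it correctly while both high scorers missed it.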
For a small group of students, an index of discrimination for an item that exceeds .20 is considered
satisfactory. For larger groups, the index should be higher because more difference between groups would
be expected. The guidelines for an acceptable level of discrimination depend upon item difficulty. For
very easy or very difficult items, low discrimination levels would be expected; most students, regardless
of ability, would get the item correct or incorrect as the case may be. For items with a difficulty level of
about 70 percent, the discrimination should be at least .30.
When an item is discriminating negatively, overall the most knowledgeable examinees are getting the
item wrong and the least knowledgeable examinees are getting the item right. A negative discrimination
index may indicate that the item is measuring something other than what the rest of the test is measuring.
More often, it is a sign that the item has been mis-keyed.
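The rules of thumb above can be turned into a simple review checklist. This is a hedged sketch: the function `review_flag` and its messages are hypothetical, and the 0.20 default is taken from the small-group guideline in the text; you would raise it (for example to 0.30 for items of about 70 percent difficulty) as the text suggests.

```python
from __future__ import annotations


def review_flag(d: float | None, min_d: float = 0.20) -> str:
    """Suggest a next step for one item from its discrimination index D."""
    if d is None:
        return "skip: everyone or no one answered correctly"
    if d < 0:
        return "negative: check the key and what the item measures"
    if d < min_d:
        return "low: review or revise the item"
    return "ok"


for number, d in enumerate([0.50, 0.10, -0.50, None], start=1):
    print(number, review_flag(d))
```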
Just as the key, or correct response option, must be definitively correct, the distractors must be clearly
incorrect (or clearly not the "best" option). In addition to being clearly incorrect, the distractors must also
be plausible. That is, the distractors should seem likely or reasonable to an examinee who is not
sufficiently knowledgeable in the content area.
If a distractor appears so unlikely that almost no examinee will select it, it is not contributing to the
performance of the item. In fact, the presence of one or more implausible distractors in a multiple choice
item can make the item artificially far easier than it ought to be. The following table, which shows the
responses of eight students to five multiple-choice questions, illustrates this.
A B C D
TEST ITEM NO 1 5** 1 1 1
TEST ITEM NO 2 0 2 6** 0
TEST ITEM NO 3 2** 2 2 2
TEST ITEM NO 4 0 3** 0 5
TEST ITEM NO 5 2 1 0 5**
** Denotes Correct Answer
Over 50% of the students answered question number 1 correctly, and each of the distractors was selected.
The distractors have functioned as they should. The teacher may be less than satisfied with only 5 of 8
students answering correctly, but a class would generally have more than eight students and could well
have a higher percentage of correct answers while still having effective distractors.
It is not desirable to have one of the distractors chosen more often than the correct answer, as occurred
with question 4. This result indicates a potential problem with the question. Distractor D may be too
similar to the correct answer and/or there may be something in either the stem or the alternatives that is
misleading.
If students do not know the correct answer and are purely guessing, their answers would be expected to be
distributed among the distractors as well as the correct answer, much like question 3. If one or more
distractors are not chosen, as occurs in questions 2, 4, and 5, the unselected distractors probably are not
plausible. If the teacher wants to make the test more difficult, those distractors should be replaced in
subsequent tests.
In a simple approach to distractor analysis, the proportion of examinees who selected each of the response
options is examined. The proportion of examinees who select each of the distractors can be very
informative. For example, it can reveal an item mis-key. Whenever the proportion of examinees who
selected a distractor is greater than the proportion of examinees who selected the key, the item should be
examined to determine if it has been mis-keyed or double-keyed. A distractor analysis can also reveal an
implausible distractor. In criterion referenced tests, where the item p-values are typically high, the
proportions of examinees selecting all the distractors are, as a result, low. Nevertheless, if examinees
consistently fail to select a given distractor, this may be evidence that the distractor is implausible or
simply too easy.
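A simple distractor analysis of the five-item table in this section can be sketched in Python. The counts and keys are copied from that table; the function name and the two flags it reports (a distractor chosen more often than the key, and a distractor nobody chose) are illustrative choices that mirror the checks described above.

```python
from __future__ import annotations

# Option counts from the distractor table (eight students); keys are the starred answers.
counts = {
    1: {"A": 5, "B": 1, "C": 1, "D": 1},
    2: {"A": 0, "B": 2, "C": 6, "D": 0},
    3: {"A": 2, "B": 2, "C": 2, "D": 2},
    4: {"A": 0, "B": 3, "C": 0, "D": 5},
    5: {"A": 2, "B": 1, "C": 0, "D": 5},
}
keys = {1: "A", 2: "C", 3: "A", 4: "B", 5: "D"}


def analyse_distractors(option_counts: dict[str, int], key: str) -> dict[str, list[str]]:
    """Flag distractors chosen more often than the key, and distractors nobody chose."""
    total = sum(option_counts.values())
    share = {option: n / total for option, n in option_counts.items()}
    distractors = [option for option in option_counts if option != key]
    return {
        "chosen_more_than_key": [o for o in distractors if share[o] > share[key]],
        "never_chosen": [o for o in distractors if option_counts[o] == 0],
    }


for item in counts:
    print(item, analyse_distractors(counts[item], keys[item]))
```

Item 4 is flagged because distractor D outdraws the key, and items 2, 4 and 5 each have at least one distractor that nobody chose, matching the discussion of the table.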
Project Work
In the school where you are placed for your Practicum activities, collect the corrected exam papers of one section
from your cooperating teacher and, taking 10 multiple-choice questions:
3.2.4 Item Banking
Building a file of effective test items and assessment tasks involves recording the items or tasks, adding
information from analyses of students' responses, and filing the records by both the content area and the
objective that the item or task measures. Thus, items and tasks are recorded as they are
constructed; information from the analysis of students' responses is added after the items and tasks have been
used; and then the effective items and tasks are deposited in the file. After a few years, it is possible to start
using some of the items and tasks from the file and supplement these with new items and tasks. As the file
grows, it becomes possible to select the majority of the items and tasks for any given test or
assessment from the file without repeating them frequently. Such a file is especially valuable in areas of complex
achievement, where the construction of test items and assessment tasks is difficult and time consuming.
When enough high-quality items and tasks have been assembled, the burden of preparing tests and
assessments is considerably lightened. Computer item banking makes the task even easier.
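The filing scheme described above can be sketched as a small data structure. Everything here is hypothetical: the record fields, the class and method names, the sample items, and the 0.20 selection threshold are assumptions made for illustration, not a prescribed item-bank format.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ItemRecord:
    """One card in a hypothetical item file."""
    stem: str
    key: str
    content_area: str
    objective: str
    p_values: list[float] = field(default_factory=list)  # difficulty from each use
    d_values: list[float] = field(default_factory=list)  # discrimination from each use


class ItemBank:
    """Files items by (content area, objective)."""

    def __init__(self) -> None:
        self._files: dict[tuple[str, str], list[ItemRecord]] = defaultdict(list)

    def deposit(self, item: ItemRecord) -> None:
        self._files[(item.content_area, item.objective)].append(item)

    def select(self, content_area: str, objective: str, min_d: float = 0.20) -> list[ItemRecord]:
        """Items filed under this area and objective whose latest D is at least min_d."""
        filed = self._files.get((content_area, objective), [])
        return [item for item in filed if item.d_values and item.d_values[-1] >= min_d]


bank = ItemBank()
bank.deposit(ItemRecord("Which average is least affected by extreme scores?", "B",
                        "Interpretation of scores", "Choose a measure of central tendency",
                        p_values=[0.62], d_values=[0.41]))
bank.deposit(ItemRecord("The mode of 7, 7, 7, 20, 23 is:", "A",
                        "Interpretation of scores", "Choose a measure of central tendency",
                        p_values=[0.95], d_values=[0.05]))
print(len(bank.select("Interpretation of scores", "Choose a measure of central tendency")))  # 1
```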
Summary
In this unit you learned how to judge the quality of a classroom test by carrying out item analysis, which is
the process of “testing the item” to ascertain specifically whether the item is functioning properly in
measuring what the entire test is measuring. You also learned about the process of item analysis and how
to compute item difficulty and item discriminating power and evaluate the effectiveness of distractors. You
have learned that item difficulty indicates the percentage of testees who get the item right; item
discriminating power is an index which indicates how well an item is able to distinguish between the high
achievers and low achievers given what the test is measuring; and the distracting power of a distractor is
its ability to differentiate between those who do not know and those who know what the item is
measuring.
Finally, you learned that after item analysis, items may still be usable after modest changes
are made to improve their performance on future exams. Thus, good test items should be kept in test item
banks, and in this unit you were given highlights on how to build a test item file or item bank.
Self-check Exercises
5. From the data presented below (where alternative A is the correct answer), compute the
difficulty level and the discrimination power and comment on the effectiveness of the
distractors.
              Alternatives
Group    N     A*    B     C     D
Upper    28    14    5     6     3
Lower    28    7     15    1     5
6. What information should be included with the test item we put in our item bank?
Reading Materials
Abay Tekle (1982). Evaluation in Education (part one). Bahir Dar Teachers College
(Unpublished teaching material)
Bigge, J.L. and Colleen Shea Stump (1999). Curriculum, Assessment, and Instruction. Boston:
Wadsworth Publishing Company.
Unit 4: Interpretation Of Scores
Introduction
In unit three you learned how to analyse test items in order to judge the quality of each test item and of the
overall test. You also learned how to build your test item banks. In this unit you are
going to be familiarized with the idea of test score interpretation and the major statistical techniques that
can be used to interpret test scores. Particularly, you will learn about the methods of interpreting test
scores, measures of central tendency, measures of dispersion or variability, measures of relative position,
and measures of relationship or association.
Learning Outcomes
Imagine that you receive a grade of 60 for a midterm exam in one of your university classes.
What does the score mean, and how should we interpret it?
Test interpretation is the process of assigning meaning and usefulness to the scores obtained from a
classroom test. This is necessary because a raw score obtained from a test rarely has meaning on its own.
For instance, a score of 60% in one Assessment and Evaluation of Learning test cannot be said to
be better than a score of 50% obtained by the same test taker in another test of the same subject. Test
scores on their own lack a true zero point and equal units. Moreover, they are not based on the same
standard of measurement, so meaning cannot be read directly into the raw scores as a basis for
academic and psychological decisions.
4.1 Kinds of scores
Data differ in terms of what properties of the real number series (order, distance, or origin) we can
attribute to the scores. The most common kinds of scores include nominal, ordinal, interval, and ratio
scales.
A nominal scale involves the assignment of different numerals to categories that are qualitatively
different. For example, we may assign the numeral 1 for males and 2 for females. These symbols do not
have any of the three characteristics (order, distance, or origin) we attribute to the real number series. The
2 does not indicate more of something than the 1.
An ordinal scale has the order property of a real number series and gives an indication of rank order. For
example, ranking students based on their performance on a certain athletic event would involve an ordinal
scale. We know who is best, second best, third best, etc. But the ranks do not tell us anything about the
size of the differences between the scores.
With interval data we can interpret the distances between scores. If, on a test with interval data, Almaz
has a score of 60, Abebe a score of 50, and Beshadu a score of 30, we could say that the distance between
Abebe's and Beshadu's scores (50 to 30) is twice the distance between Almaz's and Abebe's scores (60
to 50).
If one measures with a ratio scale, the ratio of the scores has meaning. Thus, a person whose height is 2
meters is twice as tall as a person whose height is 1 meter. We can make this statement because a
measurement of 0 actually indicates no height. That is, there is a meaningful zero point. However, if a
student scored 0 on a spelling test, we would not interpret the score to mean that the student had no
spelling ability.
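A raw score can be given meaning either by converting it into a description of the specific tasks that the
student can perform (criterion-referenced interpretation) or by converting it into some type of derived score
that indicates the student's relative position in a clearly defined reference group (norm-referenced
interpretation). In some cases both types of interpretation may be appropriate and useful.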
Criterion-referenced interpretation is the interpretation of a raw test score based on the conversion of the
raw score into a description of the specific tasks that the learner can perform. That is, a score is given
meaning by comparing it with the standard of performance that is set before the test is given. It permits
the description of a learner’s test performance without referring to the performance of others. Thus, we
might describe a pupil’s performance in terms of the speed with which a task is performed, the precision
with which a task is performed, or the percentage of items correct on some clearly defined set of learning
tasks. The percentage-correct score is widely used in criterion-referenced test interpretation.
Criterion referenced interpretation of test results is most meaningful when the test has been specifically
designed for this purpose. This typically involves designing a test that measures a set of clearly stated
learning tasks. Enough items are used for each interpretation to make it possible to describe test
performance in terms of students’ mastery or non-mastery of learning tasks.
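A criterion-referenced report can be sketched as a percentage-correct score per learning task compared with a standard set before testing. The 80 percent cutoff, the function names, the task names and the counts are all hypothetical.

```python
from __future__ import annotations


def percent_correct(correct: int, n_items: int) -> float:
    return 100 * correct / n_items


def mastery_report(results: dict[str, tuple[int, int]], cutoff: float = 80.0) -> dict[str, str]:
    """Label each learning task 'mastery' or 'non-mastery' against a preset cutoff.

    results maps a task to (items answered correctly, items on that task).
    """
    return {
        task: "mastery" if percent_correct(correct, n) >= cutoff else "non-mastery"
        for task, (correct, n) in results.items()
    }


print(mastery_report({"Compute the mean": (9, 10), "Compute the median": (6, 10)}))
# {'Compute the mean': 'mastery', 'Compute the median': 'non-mastery'}
```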
Norm-referenced interpretation is the interpretation of a raw score based on the conversion of the raw
score into some type of derived score that indicates the learner's relative position in a clearly defined
reference group. This type of interpretation tells us how an individual compares with other persons who
have taken the same test.
Norm-referenced interpretation is usually used in classroom test interpretation by ranking the test
takers' raw scores from highest to lowest. Each score is then interpreted by noting the position of an
individual's score relative to those of the other test takers. An interpretation such as "third from the
top" or "about average in the class" provides a meaningful report for the teacher and the test takers on
which to base decisions. In this type of test score interpretation, what is important is a sufficient spread
of test scores to provide a reliable ranking. The percentage score or the relative ease or difficulty of the
test is not necessarily important in the interpretation of test scores in terms of relative performance.
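A norm-referenced position in class can be sketched by counting how many test takers scored higher. The function name and the scores are hypothetical; giving tied students the same position is one common convention, assumed here.

```python
from __future__ import annotations


def class_position(scores: dict[str, float], student: str) -> int:
    """Position from the top (1 = highest); tied students share a position."""
    return 1 + sum(1 for s in scores.values() if s > scores[student])


# Hypothetical raw scores
scores = {"Almaz": 60, "Dawit": 55, "Abebe": 50, "Beshadu": 30}
print(class_position(scores, "Abebe"))  # 3, i.e. third from the top
```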
4.2.1 Measures of Central Tendency
It is often important to summarize characteristics of a distribution of test scores. One
characteristic of particular interest is a measure of central tendency. The goal of the measures of
central tendency is to come up with the one single score that best describes a distribution of
scores. They let us know if the distribution of scores tends to be composed of high scores or low
scores.
There are three basic measures of central tendency – the mean, the mode and the median - and
choosing one over another depends on two different things:
1. The scale of measurement used, so that a summary makes sense given the nature of the
scores.
2. The shape of the frequency distribution, so that the measure accurately summarizes the
distribution.
The Mean
The mean, or arithmetic average, is the most widely used measure of central tendency. It is the average of
a set of scores computed simply by adding together all scores and dividing by the number of scores. The
mean takes into account the value of each score, and so one extremely high or low score could have a
considerable effect on it. It is helpful to know the mean because then you can see which numbers are
above and below the mean.
Here is an example of test scores for a Math class: 82, 93, 86, 97, 82. To find the mean, first add
up all of the numbers (82 + 93 + 86 + 97 + 82 = 440). Since there are 5 test scores, we next
divide the sum by 5 (440 ÷ 5 = 88). Thus, the mean is 88. The formula used to compute the mean is as
follows:
$$\bar{X} = \frac{\sum X}{N}$$
Where, $\bar{X}$ = Mean
∑ = the sum of
X = any score
N = Number of scores
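The arithmetic can be checked in Python; `statistics.mean` from the standard library applies the same formula.

```python
from statistics import mean

scores = [82, 93, 86, 97, 82]
total = sum(scores)
print(total)                # 440
print(total / len(scores))  # 88.0
print(mean(scores))         # 88
```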
The Median
In some circumstances, the mean may not be the best indicator of student performance. If there
are one or a few students who score considerably lower (or higher) than the other students, their
scores tend to pull the mean in their direction. In this case the median is usually considered a
better indicator of student performance. There are also some types of scores that are reported for
standardized tests for which the mean is not appropriate (percentile scores), so the median is
used.
The median is a counting average. It is the number that divides a distribution of scores exactly in
half. It is determined by arranging the scores in order of size and counting up to (or down to) the
midpoint of the set of scores. The median will usually be around where most scores fall. When the
number of scores is odd, the median is the middle score. If the number of scores is even, the
median will be halfway between the two middlemost scores. In this case the median is not an
actual score earned by one of the students.
Example 1   Example 2   Example 3   Example 4
50          50          49          50
48          49          48          49
48          48          48          47
47          46          47          47
45          46          45          45
44          43          44          45
43          43          43          45
42          42          42          44
42          41          42          42
41          41          41          41
                        38          41
In example 1, our line would be between 44 and 45, so the median would be halfway between them at
44.5. In this case the median is not an actual score earned by one of the students. In example 2, the
distance between the two middle scores (43 and 46) is more than one, so we again find the point halfway
between them for our median of 44.5. If the number of students is odd, the median is the one score
that is the middle score in the frequency distribution, having equal numbers of scores above and below it.
Thus, the median is 44 in example 3, and 45 in example 4. It does not matter if more than one student
earns that score, as in example 4.
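Python's `statistics.median` follows the same rule (the middle score for an odd count, halfway between the two middle scores for an even count), so it reproduces the four examples. The lists below are the four example distributions read column by column from the table above.

```python
from statistics import median

examples = {
    1: [50, 48, 48, 47, 45, 44, 43, 42, 42, 41],
    2: [50, 49, 48, 46, 46, 43, 43, 42, 41, 41],
    3: [49, 48, 48, 47, 45, 44, 43, 42, 42, 41, 38],
    4: [50, 49, 47, 47, 45, 45, 45, 44, 42, 41, 41],
}
for number, scores in examples.items():
    print(number, median(scores))  # 44.5, 44.5, 44 and 45
```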
The Mode
This is the score (or scores) that occurs most frequently and is determined by inspection. It is the
least reliable type of statistical average and is frequently used merely as a preliminary estimate of
central tendency. A set of scores may sometimes have two or more modes; such distributions are
called bimodal or multimodal, respectively.
If the data is categorical (measured on the nominal scale), then only the mode can be calculated.
The mode can also be calculated with ordinal and higher data, but it often is not appropriate. If
other measures can be calculated, the mode would never be the first choice. For example, the
following test scores, 7, 7, 7, 20, 23, 23, 24, 25, 26 have a mode of 7, but obviously it doesn’t
make much sense. Remember, measures of central tendency look for the one number which best
describes all of the numbers.
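The example can be checked with Python's standard library; `statistics.multimode` returns every most-frequent value, which also handles bimodal sets. The second data set is hypothetical.

```python
from statistics import mean, median, multimode

scores = [7, 7, 7, 20, 23, 23, 24, 25, 26]
print(multimode(scores))  # [7]
print(median(scores))     # 23
print(mean(scores))       # 18

# A hypothetical bimodal set: two scores tie for most frequent.
print(multimode([40, 40, 45, 50, 50]))  # [40, 50]
```

Here the mode (7) sits far below both the median and the mean, which is why it is a poor summary of this set.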
There is one important situation in which all three measures of central tendency are identical. This occurs
when a distribution is symmetrical, that is, when the right half of the distribution is the mirror image of
the left half of the distribution. In this case the mean will fall exactly at the middle of the distribution (the
median position) and the value at this central point will be the most frequently observed data value, the
mode. The reverse does not always hold, however: identical values of the mean, the mode and the median
do not by themselves guarantee that a distribution is symmetrical.
Figure 1: Shape of distribution of scores
To the extent that differences are observed among these three measures, the distribution is asymmetrical
or “skewed”. These include positively skewed distributions and negatively skewed distributions. In a
positively-skewed distribution (see figure 1 above) most of the scores concentrate at the low end of the
distribution. This might occur, for example, if the test was extremely difficult for the students. In a
negatively-skewed distribution, as shown in figure 1 above, the majority of scores are toward the high end
of the distribution. This could occur if we gave a test that was easy for most of the students.
Points to note
With perfectly bell shaped distributions, the mean, median, and mode are identical.
With positively skewed data, the mode is lowest, followed by the median and mean.
With negatively skewed data, the mean is lowest, followed by the median and mode.
A set of scores can be more adequately described if we know how much they spread out above and below
the measure of central tendency. For example, we might have two groups of students with a mean score
of 70, but in one group the span of scores is from 60 to 80 and in the other group the span is from 50 to
100. These represent quite different spreads of performance. We can identify such differences by numbers
that indicate how much scores spread out in a group. These are called measures of variability or
dispersion. The three most commonly used measures of variability are the range, the quartile deviation,
and the standard deviation.
The Range
It is the simplest and crudest measure of variability calculated by subtracting the lowest score from the
highest score. For example, if the score of 10 students in a certain test is: 5, 7, 8, 10, 12, 13, 14, 15, 17,
19, then the range will be 19 - 5 = 14. The range provides a quick estimate of variability but is
undependable because it is based on the position of the two extreme scores. The addition or subtraction of
a single score can change the range significantly.
Interquartile range (IQR) is another range measure, but this time it looks at the data in terms of quarters or
percentiles. The IQR is the distance between the 25th and 75th percentiles, that is, between the first and
third quartiles. The data are divided into four equal parts or quarters of 25% each, and the IQR is the range
of the middle 50% of the data. Because it uses only the middle 50%, it is not affected by outliers or extreme
values, which is why the IQR is often used with skewed data.
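Both measures can be computed for the ten scores in the range example. A hedged note: quartiles depend on the interpolation convention. The sketch uses `statistics.quantiles` with its default method, and other textbook or spreadsheet conventions can give slightly different quartiles for the same data.

```python
from statistics import quantiles

scores = [5, 7, 8, 10, 12, 13, 14, 15, 17, 19]
score_range = max(scores) - min(scores)
q1, q2, q3 = quantiles(scores, n=4)  # default method="exclusive"
iqr = q3 - q1
print(score_range, q1, q3, iqr)      # 14 7.75 15.5 7.75
```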
Let us say that two classes took a quiz. There were 10 students in each class, and each class had an
average score of 81.5. Since the averages are the same, can we assume that the students in both classes
have the same performance on the exam?
The answer is… No. The average (mean) does not tell us anything about the distribution or variation in
the grades. So, we need to come up with some way of measuring not just the average, but also the spread
of the distribution of our data.
The most useful measure of variability, or spread of scores, is the standard deviation. It is essentially an
average of the degree to which a set of scores deviates from the mean. If the Standard Deviation is large,
it means the numbers are spread out from their mean.
If the Standard Deviation is small, it means the numbers are close to their mean. Because it takes into
account the amount that each score deviates from the mean, it is a more stable measure of variability than
either the range or quartile deviation.
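The procedure for calculating a standard deviation involves the following steps:
1. Compute the mean of the scores.
2. Subtract the mean from each score to find its deviation.
3. Square each deviation.
4. Add up the squared deviations.
5. Divide the sum by the number of scores; the result is the variance.
6. Take the square root of the variance.
Thus the formula for the standard deviation (SD) is:
$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$
where $\bar{X}$ is the mean and N is the number of scores.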
Now let us take the previous scenario of two groups of students who took a Math quiz with a mean score
of 81.5, and calculate and compare their standard deviations. The individual scores of group A are: 72, 76,
80, 80, 81, 83, 84, 85, 85 and 89. The individual scores of group B are: 57, 63, 65, 71, 83, 93, 94, 95, 96 and 98.
Let us start with group A. The first step in finding the standard deviation is to find the distance of each
score from the mean. Squaring each distance then gives the following results.
Score (X)   Deviation (X - 81.5)   Squared deviation
72          -9.5                   90.25
76          -5.5                   30.25
80          -1.5                   2.25
80          -1.5                   2.25
81          -0.5                   0.25
83          1.5                    2.25
84          2.5                    6.25
85          3.5                    12.25
85          3.5                    12.25
89          7.5                    56.25
Then we add up all of the squared distances, which gives 214.5. This is divided by the total
number of scores in the group: 214.5 / 10 = 21.45. This is the variance of the data set.
Variance is the average squared deviation from the mean of a set of data. It is used to find the standard
deviation. Finally, we calculate the Square Root of the variance. This will give us 4.63, which is the
standard deviation.
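The steps for group A can be reproduced in a short Python sketch; the helper names are hypothetical. Like the text, it divides by the number of scores N (a population standard deviation, the same as `statistics.pstdev`); `statistics.stdev` divides by N - 1 and so gives a slightly larger value.

```python
from __future__ import annotations

import math
from statistics import pstdev, stdev


def population_variance(scores: list[float]) -> float:
    """Average squared deviation from the mean (dividing by N, as in the text)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)


def population_sd(scores: list[float]) -> float:
    """Square root of the variance."""
    return math.sqrt(population_variance(scores))


group_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
group_b = [57, 63, 65, 71, 83, 93, 94, 95, 96, 98]

print(round(population_variance(group_a), 2))  # 21.45
print(round(population_sd(group_a), 2))        # 4.63
print(round(population_sd(group_b), 1))        # 15.1
print(round(pstdev(group_a), 2))               # 4.63 (same formula)
print(round(stdev(group_a), 2))                # 4.88 (divides by N - 1)
```

The same function can be applied to the five Math scores in the activity below to check your answer.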
Activity: Using the same procedures calculate the standard deviation for the scores of Group B.
I am sure you have come up with 15.1 as a standard deviation for the distribution of scores of group B.
Now, let’s compare the two groups of students again.
                      Group A   Group B
Mean                  81.5      81.5
Standard deviation    4.63      15.1
What is your interpretation of the test scores of the two groups based on their standard deviations?
Activity: The Math test scores of five students are: 92,88,80,68 and 52. Find the variance and standard
deviation.
The standard deviation, like other measures of variability, represents a distance. If we move the distance
equal to one SD above and below the mean, we will find that somewhere between 60% and 75% of the
scores fall in that region of most distributions of scores. In a normal distribution, 68% of the scores are
included between the mean minus one SD and the mean plus one SD.
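The 68 percent figure for a normal distribution can be checked with `statistics.NormalDist` from Python's standard library (Python 3.8 or later).

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1
share = z.cdf(1) - z.cdf(-1)
print(round(share, 4))  # 0.6827
```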
The quartile deviation is used with the median and is satisfactory for analyzing a small number of scores.
Because the quartiles are obtained by counting, and are thus not affected by the value of each score, the
quartile deviation is especially useful when one or more scores deviate markedly from the others in the set.
The standard deviation is used with the mean. It is the most reliable measure of variability, and is
especially useful in testing. In addition to describing the spread of scores in a group, it serves as a basis
for computing standard scores, the standard error of measurement, and other statistics used in analyzing
and interpreting test scores.
On the surface it might look bad, but what if that was the highest score in the class, or if that score was better
than 80% of the class? This is what we mean by relative position.
Percentiles
A percentile is a score that indicates the rank of the student compared to others (same age or same grade),
using a hypothetical group of 100 students. It tells you what percentage of people you did better than. A
percentile of 25 (25th percentile), for example, indicates that the student's test performance equals or
exceeds 25 out of 100 students on the same measure. A percentile of 87 indicates that the student equals
or surpasses 87 out of 100 (or 87% of) students. A percentile must always refer to a student’s percentile
rank as relative to a particular norm group. If you scored at the 80th percentile, what does that mean?
2. Count how many items are below your value. If, for example, your score is 85 and there are several
85s, count how many scores are below the first 85.
For example, among the students' scores of 76, 77, 80, 83, 85, 85, 85, 90, 96, 97 there are 4 items below 85,
out of 10 scores in total.
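The counting step can be sketched in Python as follows. This assumes the simple convention that the percentile rank is the number of scores below the value, divided by the total number of scores, times 100. Some textbooks also give credit for half of the tied scores, which would change the result; `percentile_rank` is only an illustrative name.

```python
def percentile_rank(scores: list[float], value: float) -> float:
    """Percentage of scores strictly below `value` (assumed convention:
    count below / total * 100, with no credit for tied scores)."""
    below = sum(1 for s in scores if s < value)
    return 100 * below / len(scores)

scores = [76, 77, 80, 83, 85, 85, 85, 90, 96, 97]
print(percentile_rank(scores, 85))  # 40.0
```

For a score of 85 in the ten scores listed above, this gives 4 / 10 x 100 = 40.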
Quartiles
Quartiles are closely related to percentiles. The total of 100% is broken into four equal parts (25%, 50%, 75%
and 100%), so the first, second and third quartiles correspond to the 25th, 50th (median) and 75th percentiles.
Standard Scores
Another method of indicating a pupil's relative position in a group is by showing how far the raw score is
above or below average. This is the approach used with standard scores. Basically, standard scores
express test performance in terms of standard deviation units from the mean. Standard scores are
based on the mean and the standard deviation.
Z Score: For data distributions that are approximately symmetric, a measure of relative position that is
often used is the z-score. The z-score tells us how many standard deviations a particular
score lies from the mean.
We define the z-score as $z = \frac{X - \bar{X}}{S}$, where $X$ is the raw score, $\bar{X}$ is the mean and $S$ is the standard deviation.
For instance, if a person scored a 70 on a test with a mean of 50 and a standard deviation of 10, then they
scored 2 standard deviations above the mean. So, a z score of 2 means the original score was 2 standard
deviations above the mean.
If the z-score > 0 (positive), the data value is above the mean.
If the z-score < 0 (negative), the data value is below the mean; a z-score of 0 means it equals the mean.
Example. Almaz scored 25 on her math test. Suppose the mean for this exam is 21, with a standard
deviation of 4. Dawit scored 60 on an English test which had a mean of 50 with a standard deviation of 5.
Who did relatively better?
Since standardized tests typically have score distributions which are approximately symmetric, we will
find the respective z-scores for Almaz and Dawit.
Almaz's z-score: $z = \frac{25 - 21}{4} = 1$
Dawit's z-score: $z = \frac{60 - 50}{5} = 2$
Since Dawit had a higher z-score, we say Dawit did relatively better.
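The z-score calculation and the comparison above can be reproduced with a few lines of Python. The function name `z_score` is an illustrative choice, not a standard library function.

```python
def z_score(raw: float, mean: float, sd: float) -> float:
    """Number of standard deviations a raw score lies above (+) or below (-) the mean."""
    return (raw - mean) / sd

almaz = z_score(25, mean=21, sd=4)  # math test
dawit = z_score(60, mean=50, sd=5)  # English test
print(almaz, dawit)                 # 1.0 2.0
print("Dawit" if dawit > almaz else "Almaz", "did relatively better")
```

Because both scores are expressed in the same units (standard deviations from each test's own mean), they can be compared even though the two tests have different means and spreads.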
T Scores: This refers to any set of normally distributed standard scores that has a mean of 50 and a
standard deviation of 10. The T-score is obtained by multiplying the z-score by 10 and adding the
product to 50. That is, $T = 50 + 10z$. A T-score of 60 is one standard deviation above the mean,
while a T-score of 30 is two standard deviations below the mean.
Example
A test has a mean score of 40 and a standard deviation of 4. What are the T-scores of two test takers
who obtained raw scores of 30 and 45 respectively?
Solution
The first step in finding the T-scores is to obtain the z-scores for the test takers. The z-scores are then
converted to T-scores.
For the test taker with a raw score of 30 (X = 30, M = 40, SD = 4), the z-score is:
$z = \frac{X - M}{SD} = \frac{30 - 40}{4} = \frac{-10}{4} = -2.5$
The T-score is then obtained by converting the z-score (-2.5) to a T-score. Thus:
$T = 50 + 10z = 50 + 10(-2.5) = 50 - 25 = 25$
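The two-step conversion (raw score to z-score, then z-score to T-score) can be written as one small Python function. `t_score` is a hypothetical helper name used only for illustration.

```python
def t_score(raw: float, mean: float, sd: float) -> float:
    """Convert a raw score to a T-score (mean 50, SD 10) via its z-score."""
    z = (raw - mean) / sd  # step 1: raw score -> z-score
    return 50 + 10 * z     # step 2: z-score -> T-score

print(t_score(30, mean=40, sd=4))  # 25.0
```

The same function can be used to check your answer to the activity that follows.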
Activity: Following the same procedure, find the T-score for the second test taker, whose raw score is 45.
4.2.4 Measures of Relationship
If we have two sets of scores from the same group of people, it is often desirable to know the degree to
which the scores are related. For example, we may be interested in the relationship between students' test scores
in English and their overall scores in other subjects. The degree of relationship is
expressed as a coefficient of correlation, whose value ranges from -1.00 to +1.00. A perfect positive
correlation is indicated by a coefficient of +1.00 and a perfect negative correlation by a coefficient of
-1.00. A correlation of .00 indicates no linear relationship between the two sets of scores. The larger
the coefficient in absolute value (whether positive or negative), the higher the degree of relationship expressed.
There are several different measures of relationship expressed as correlation coefficients. One of these is
the product-moment correlation coefficient, which is by far the most commonly used and most useful
correlation coefficient. It is indicated by the symbol r.
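The product-moment correlation coefficient r can be computed directly from its definition: the sum of the cross-products of deviations from the two means, divided by the square root of the product of the two sums of squared deviations. The Python sketch below implements that definition. `pearson_r` is an illustrative name, and the English and other-subject scores are hypothetical. Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.

```python
from math import sqrt

def pearson_r(x: list[float], y: list[float]) -> float:
    """Product-moment correlation coefficient r between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length with at least 2 pairs")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the two means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each set of scores.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when all scores in a list are identical")
    return sxy / sqrt(sxx * syy)

# Perfectly related scores give the extreme values of r.
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # -1.0

# Hypothetical scores of five students in English and in their other subjects.
english = [55, 60, 65, 70, 80]
other_subjects = [50, 58, 66, 69, 82]
print(round(pearson_r(english, other_subjects), 2))
```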
Project work: In groups of five, take the roster of one cooperating teacher at the school where you are
placed for your practicum experience and do the following tasks:
a) Calculate the average marks of the students of the section by taking five subjects
b) Based on the calculated averages, find the mode, the median, the range, the inter-quartile
range, and the standard deviation
c) Find the average scores that lie in the 25th, 50th, and 75th percentiles
d) Take the scores of two subjects and calculate the coefficient of correlation
You have to prepare a report of your work and submit it for correction.
Unit Summary
In this unit you learned that test interpretation is a process of assigning meaning and usefulness to the
scores obtained from classroom tests, and you were introduced to how to interpret test scores. This includes
criterion-referenced and norm-referenced interpretation. Criterion-referenced interpretation is the
interpretation of test raw score based on the conversion of the raw score into a description of the specific
tasks that the learner can perform. Norm-referenced interpretation is the interpretation of raw score based
on the conversion of the raw score into some type of derived score that indicates the learner’s relative
position in a clearly defined reference group.
This unit also introduced you to different statistical techniques that are useful in interpreting test scores.
These are classified into measures of central tendency, measures of dispersion, measures of
relative position and measures of association or relationship. The measures of central tendency
help us to come up with a single score that best describes a distribution of scores. The most
commonly used measures of central tendency are the mean, the mode and the median. The measures of
dispersion tell us how much the scores spread out above and below the measure of central tendency as
well as how much they are spread out from one another. These measures include the range, the inter-
quartile range and the standard deviation. The measures of relative position are techniques that will show
us the relative standing of individual scores within a certain set of scores. Measures that are used here
include percentile ranks, quartiles, and standardized scores such as the z scores and t scores. The
measures of relationship or association help us to know the degree to which sets of scores are related. The
most commonly used measure of relationship is the product-moment correlation coefficient.
Self-Check Exercises
4. Suppose you have given an exam to your students. Scores on this exam are normally distributed with
mean = 40 and standard deviation = 6.
a) What score would a student need to be in the top 15%?
b) What score represents the 45th percentile?
c) If 200 students took the exam, how many would you expect to score below 30?
Reading Materials
Cohen, Louis and M. Holliday ( ). Statistics for Education and Physical Education. New York:
Harper and Row Publishers.
Hinkle, D.E. et al. (1994). Applied Statistics for the Behavioral Sciences. Boston: Houghton
Mifflin Company.
McClave, J.T. and Terry Sincich (2003). Statistics (9th ed.). New Jersey: Prentice Hall.
UNIT 5: Ethical Standards of Assessment
Introduction
In the previous units you have learned about different assessment-related concepts, different
strategies and techniques for assessing students' learning, and methods of maintaining the
quality of tests. In this unit you will be introduced to ethics as a mechanism for maintaining
quality in our assessment practice. You will be familiarized with some basic standards that
professional teachers are expected to meet in order to be ethical in their assessment practices. You will also be
familiarized with some general considerations in addressing diversity in the classroom so as to
make assessment procedures accessible and free of bias.
Learning Outcomes
Upon completion of this unit, you should be able to:
The following are some ethical standards that teachers may consider in their assessment
practices.
2. Teachers should develop tests that meet the intended purpose and that are appropriate for
the intended test takers. This requires teachers to:
Define the purpose for testing, the content and skills to be tested, and the intended
test takers.
Develop tests that are appropriate in content, skills tested, and content coverage
for the intended purpose of testing.
Develop tests that have clear, accurate, and complete information.
Develop tests with appropriately modified forms or administration procedures for
test takers with disabilities who need special accommodations.
3. The teacher should be skilled in administering, scoring and interpreting the results from
diverse assessment methods. It is not enough that teachers are able to select and develop
good assessment methods; they must also be able to apply them properly. This requires
teachers to:
Follow established procedures for administering tests in a standardized manner.
Provide and document appropriate procedures for test takers with disabilities who
need special accommodations or those with diverse linguistic backgrounds.
Protect the security of test materials, including eliminating opportunities for test
takers to obtain scores by fraudulent means.
Develop and implement procedures for ensuring the confidentiality of scores.
4. Teachers should be skilled in using assessment results when making decisions about
individual students, planning teaching, developing curriculum, and school improvement.
Assessment results are used to make educational decisions at several levels: in the
classroom about students, in the community about a school and a school district, and in
society, generally, about the purposes and outcomes of the educational enterprise.
Teachers play a vital role when participating in decision-making at each of these levels
and must be able to use assessment results effectively.
5. Teachers should be skilled in developing valid pupil grading procedures which use pupil
assessments. Grading students is an important part of professional practice for teachers.
Grading is defined as indicating both a student's level of performance and a teacher's
valuing of that performance. The principles for using assessments to obtain valid grades
are known and teachers should employ them.
Teachers should also participate with the wider educational community in defining the
limits of appropriate professional behavior in assessment.
In addition, the following are principles of grading that can guide the development of a grading
system.
1. The system of grading should be clear and understandable (to parents, other stakeholders,
and most especially students).
3. Grading should be fair for all students regardless of gender, socioeconomic status or any
other personal characteristics.
Project work: In groups of five, prepare interview questions or a questionnaire to assess the
extent to which assessment ethics is respected in the school where you are placed for your practicum
experience. Using the instrument you have prepared, collect data from the concerned members of
the school community (teachers, students), analyze the data and reach valid conclusions. You
have to prepare a report of your conclusions as well as the procedures you went through to
reach them.
Do you believe that culture and ethnicity have any role in teachers' assessment practices? In your
university experience, have you observed situations where instructors were biased in the
assignment of grades to students based on culture and ethnicity? If so, do you think that was
fair?
Students represent a variety of cultural and linguistic backgrounds. If the cultural and linguistic
backgrounds are ignored, students may become alienated or disengaged from the learning and
assessment process. Teachers need to be aware of how such backgrounds may influence student
performance and the potential impact on learning. Teachers should be ready to provide
accommodations where needed.
Classroom assessment practices should be sensitive to the cultural and linguistic diversity of
students in order to obtain accurate information about their learning. Assessment practices that
attend to issues of cultural diversity include those that
acknowledge students’ cultural backgrounds.
are sensitive to those aspects of an assessment that may hamper students’ ability to
demonstrate their knowledge and understanding.
use that knowledge to adjust or scaffold assessment practices if necessary.
Assessment practices that attend to issues of linguistic diversity include those that
acknowledge students’ differing linguistic abilities.
use that knowledge to adjust or scaffold assessment practices if necessary.
use assessment practices in which the language demands do not unfairly prevent the
students from understanding what is expected of them.
use assessment practices that allow students to accurately demonstrate their
understanding by responding in ways that accommodate their linguistic abilities, if the
response method is not relevant to the concept being assessed (e.g., allow a student to
respond orally rather than in writing).
Teachers must make every effort to address and minimize the effect of bias in classroom
assessment practices. Bias occurs when irrelevant or arbitrary factors systematically influence
the results, or the interpretations made of them, for an individual student or a
subgroup of students. For example, bias may occur when variables such as cultural and
language differences and socioeconomic status are not fairly accounted for when interpreting
results from an assessment.
Assessment should be culturally and linguistically appropriate, fair and bias-free. It may not be
possible to eliminate all forms of bias from classroom assessments entirely. However, teachers
and others who assess students' learning should recognize that bias is an ever-present concern in
student assessment, remain vigilant against its sources, and have plans for
identifying and addressing it. For an assessment task to be fair, its content, context, and
performance expectations should:
performance expectations should:
reflect knowledge, values, and experiences that are equally familiar and appropriate to all
students;
tap knowledge and skills that all students have had adequate time to acquire;
be as free as possible of cultural and ethnic stereotypes.
Activity: In groups of five, find and discuss the following documents and briefly report the
ideas each document addresses in relation to inclusive education:
1. The Dakar Framework for Action (2000)
2. The Salamanca Statement and Framework for Action on Special Needs Education (1994)
3. UN Convention on the Rights of Persons with Disabilities (2006)
Each group should work on one document; the documents can be found on the internet.
Inclusive education is based on the idea that all students, including those with disabilities, should
be provided with the best possible education to develop themselves. This implies the
provision of all possible accommodations to address the educational needs of disabled students.
Accommodations should not be limited to the teaching and learning process; they should also
cover assessment mechanisms and procedures.
Activity: In small groups, discuss what types of accommodations can be made to make
assessment practices accessible to students with different types of disabilities. Each group may
discuss one type of disability and share its ideas with the other groups.
There are different strategies that can be considered to make assessment practices accessible to
students with disabilities depending on the type of disability. In general terms, however, the
following strategies could be considered in summative assessments:
Modifying assessments: This should enable disabled students to have full access to the
assessment without giving them any unfair advantage.
Others' support: Disabled students may need the support of others in certain
assessment activities which they cannot do independently. For instance, they may
require readers and scribes in written exams; they may also need others' assistance in
practical activities, such as using equipment, locating materials, drawing and measuring.
Time allowances: Disabled students should be given additional time to complete their
assessments; the individual instructor has to decide how much, based on the purpose and nature
of the assessment.
Rest breaks: Some students may need rest breaks during the examination. This may be
to relieve pain or to attend to personal needs.
Flexible schedules: In some cases disabled students may require flexibility in the
scheduling of examinations. For example, some students may find it difficult to manage a
number of examinations in quick succession and need to have examinations scheduled
over a period of days.
Alternative methods of assessment: In certain situations where formal methods of
assessment may not be appropriate for disabled students, the instructor should assess
them using non-formal methods such as class work, portfolios, oral presentations, etc.
Assistive Technology: Specific equipment may need to be available to the student in an
examination. Such arrangements often include the use of personal computers, voice
activated software and screen readers.
Teachers’ assessment practices can also be affected by gender stereotypes. The issues of gender
bias and fairness in assessment are concerned with differences in opportunities for boys and
girls. A test is biased if boys and girls with the same ability levels tend to obtain different scores.
If the questions involve objects and ideas that are more familiar or less offensive to members of
one gender, then the test may be easier for individuals of that gender. Standards for achievement
on such a test may be unfair to individuals of the gender that is less familiar with or more
offended by the objects and ideas discussed, because it may be more difficult for such
individuals to demonstrate their abilities or their knowledge of the material.
Unit Summary
In this unit you have learned that ethics is a very important consideration in our
assessment practices, and that the most important ethical consideration is fairness. If we are to draw
reasonably good conclusions about what our students have learned, it is imperative that we make
our assessments, and our uses of the results, as fair as possible for as many students as
possible. A fair assessment is one in which students are given equitable opportunities to
demonstrate their abilities and knowledge.
Teachers must make every effort to address and minimize the effect of bias in classroom
assessment practices. Biases in assessment can occur because of differences in culture or
ethnicity, disability or gender. To ensure suitability and fairness for all students, teachers
need to check the assessment strategy for its appropriateness and for any cultural, disability
or gender bias.
Equitable assessment means that students are assessed using methods and procedures most
appropriate to them. Classroom assessment practices should be sensitive and diverse enough to
accommodate all types of diversity in the classroom in order to obtain accurate information about
students' learning.
Self-Check Exercises
1. What is the meaning of fairness in assessment?
2. What are the basic ethical standards that teachers may consider in their assessment
practices?
3. How do culture and ethnicity influence teachers' assessment practices?
4. What strategies can teachers follow to address the special needs of disabled students
during tests?