Ministry of Education

Assessment and Evaluation of Learning

December 2013
Addis Ababa

Ministry of Education
Module Title: Assessment and Evaluation of
Learning
Prepared by: Mekelle University
Module Writer: Yohannes Geretsadik
Technical Advisor: PRIN International Consultancy &
Research Services PLC

Assessment and Evaluation of Learning

Course Code: PGDT 412

Credit hours: 3

Contact hours per week: 4 hours

Table of Contents

Module Introduction
Module Learning Outcomes
Module Contents
Unit 1: Assessment: Concept, Purpose, and Principles
Unit 2: Assessment Strategies, Methods, and Tools
Unit 3: Item Analysis
Unit 4: Interpretation of Scores
Unit 5: Ethical Standards of Assessment

Icons Used

Dear Learner: the following icons are used throughout this module. Study carefully what each icon
represents before using the module.

This tells you there is an introduction to the module, unit and section.

This tells you there is a question to answer or think about in the text.

This tells you there is an activity to do.

This tells you to note and remember an important point.

This tells you there is a self-test for you to do.

This tells you there is a checklist of the main points.

This tells you there is a written assignment.

This tells you that these are the answers to the activities and self-test
questions.

This tells you that there are learning outcomes for the Module or Unit.


Module Introduction

This module is designed to equip you, as prospective secondary school teachers, with the
conceptual and practical skills of assessing students’ learning. It incorporates descriptions of
important concepts that help to clarify the assessment process; elaboration of the major
principles and procedures of the assessment process; the different tools and strategies that can be
used for assessing students’ learning; different mechanisms that are used to maintain the quality of
assessment tools and procedures; and the ethical standards of assessment. There are two
prerequisite courses that you need to take before studying this module: Secondary School
Curriculum & Instruction; and Psychological Foundations of Learning & Development.

Throughout this module there are in-text questions that help you to pause your reading
for a moment and reflect on what you are studying. In addition, there are many activities (at
least one in each section) that you should attempt before proceeding from one section to
the next. You need to seriously try to reflect on or answer each question and
activity if you are to gain a deep and meaningful understanding of the concepts under discussion
and be a successful learner. You will also complete two assignments and submit them to your course
tutor; these will be graded out of 30%.

We wish you a good and successful learning journey. You may now start studying your
module.

Module Learning Outcomes 


This module is aimed at ensuring the following learning outcomes. On successful completion of
this module you will be able to:
• Describe concepts related to student learning assessment.
• Develop classroom assessment plans.
• Develop techniques for assessing the performance of students based on sound principles
and educational objectives.
• Apply proper procedures when administering assessment tools.
• Analyze items to increase the fitness for purpose of classroom assessment tools.
• Interpret assessment results to understand their implications and thereby make appropriate
decisions.
• Conduct self-assessment of your teaching in classrooms in view of student learning and
standards of teacher professionalism.
• Adhere to professional ethical standards of assessment in assessing student learning,
handling records, using or communicating assessment results and making decisions.

Unit 1: Assessment: Concept, Purpose, and Principles

Introduction

Welcome to the first unit of the “Assessment and Evaluation of Learning” course module. This is an
introductory unit that is intended to familiarize you with some basic concepts that you will encounter while
studying this course. Specifically, the concepts of test, measurement, assessment and evaluation will
be elaborated. Following this, the purposes of educational assessment are described. Next there is
a brief explanation of the role of educational objectives in assessment. This unit also presents
the important principles that have to be adhered to when assessing students’ learning.
Finally, the importance of involving students in the assessment process is highlighted, followed
by the most important competencies that professional teachers are expected to possess
so as to assess their students effectively.

Unit Learning Outcomes



Upon successful completion of this unit, you will be able to:

• Define the meaning of test, measurement, assessment and evaluation.

• Examine the purposes of assessment and evaluation of learning.

• Identify the principles of assessment and evaluation of learning.

• Apply the principles of assessment and evaluation of learning in the local context.

1.1 Concepts

Dear learner, before you start studying educational assessment and evaluation, you need to have
a clear understanding of certain related concepts. You might have come across the concepts
test, measurement, assessment, and evaluation.

Reflection:
What do you know about these concepts? Can you differentiate these concepts? Please try.

You might have found it difficult to come up with a clear distinction in meaning between these
concepts. This is because they may all be involved in a single
process. There is also some confusion and inconsistency in the usage of these concepts in
the literature. Now let us see the meaning of these concepts as used in this module.

Test: Perhaps test is a concept that you are more familiar with than the other concepts. You have
been taking tests ever since you started schooling to determine your academic performance.
Tests are also used in workplaces to select individuals for a certain job vacancy. Thus, a test in the
educational context refers to the presentation of a standard set of questions to be answered by
students. It is one instrument that is used for collecting information about students’ behaviors or
performances. Please note that there are many ways of collecting information about
students’ educational performances other than tests, such as observations, assignments, project
work, portfolios, etc.

Measurement: In our day-to-day life there are different things that we measure. We measure our
height and express it in meters and centimeters. We measure some of our daily
consumables, like sugar, in kilograms and liquids in liters. We measure temperature and express
it in degrees Celsius (centigrade). How do we measure these things? We
definitely need appropriate instruments, such as a meter rule, a weighing scale, or a
thermometer, in order to have reliable measurements.

Similarly, in education measurement is the process by which the attributes of a person are
measured and described in numbers. It is a quantitative description of the behavior or
performance of students. As educators we frequently measure human attributes such as attitudes,
academic achievement, aptitudes, interests, personality and so forth. Measurement permits more
objective description concerning traits and facilitates comparisons. Hence, to measure we have to
use certain instruments so that we can conclude that a certain student is better in a certain subject
than another student. How do we measure performance in mathematics? We use a mathematics
test which is an instrument containing questions and problems to be solved by students. The
number of right responses obtained is an indication of performance of individual students in
mathematics. Thus, the purpose of educational measurement is to represent how much of
‘something’ is possessed by a person using numbers. Note that we are only collecting
information. We are not evaluating! Evaluation is therefore quite different from measurement.
Measurement is also not the same as testing. While a test is an instrument to collect information
about students’ behaviors, measurement is the assignment of quantitative value to the results of a
test or other assessment techniques. Measurement can refer both to the score obtained and to
the process itself.
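
To make the distinction concrete, the short Python sketch below treats a five-item mathematics test as the instrument and the count of right responses as the measurement. The answer key, the student responses, and the function name `score_test` are hypothetical and chosen only for illustration. Note that the code assigns numbers but makes no judgment about them.

```python
# Hypothetical answer key for a five-item mathematics test (the instrument).
ANSWER_KEY = ["b", "d", "a", "c", "b"]


def score_test(responses, key=ANSWER_KEY):
    """Measurement: count the responses that match the answer key."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(1 for given, correct in zip(responses, key) if given == correct)


# Hypothetical responses from two students.
student_a = ["b", "d", "a", "a", "b"]
student_b = ["b", "c", "a", "a", "d"]

score_a = score_test(student_a)  # number of right responses for student A
score_b = score_test(student_b)  # number of right responses for student B
print(score_a, score_b)
```

The two scores allow a comparison between the students, but whether either score is "good enough" is a question of evaluation, not measurement.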

Assessment: In educational literature the concepts ‘assessment’ and ‘evaluation’ have been used
with some confusion. Some educators have used them interchangeably to mean the same thing.
Others have used them as two different concepts. Even when they are used differently there is
too much overlap in the interpretations of the two concepts.

Cizek (in Phye, 1997) provides a comprehensive definition of assessment that incorporates its
key elements:

the planned process of gathering and synthesizing information relevant to the purposes of
(a) discovering and documenting students' strengths and weaknesses, (b) planning and
enhancing instruction, or (c) evaluating progress and making decisions about students.

Activity 1: Individual Activity

How do teachers collect information about their students’ academic progress as well as
about their own teaching? Please list the tools as exhaustively as possible.

Generally, educational assessment is viewed as the process of collecting information with the
purpose of making decisions about students. We may collect information using various
instruments including tests, observations of students, checklists, questionnaires and interviews.
Rowntree (1974) views assessment as a human encounter in which one person interacts with
another directly or indirectly with the purpose of obtaining and interpreting information about
the knowledge, understanding, abilities and attitudes possessed by that person. The key ideas in
the definition of assessment are collecting data and making decisions. Hence, to make decisions
one has to evaluate, which is the process of making a judgment about a given situation.

Evaluation: This concept refers to the process of judging the quality of student learning on the
basis of established performance standards and assigning a value to represent the worthiness or
quality of that learning or performance. It is concerned with determining how well students have
learned. When we evaluate, we are saying that something is good, appropriate, valid, positive,
and so forth. Evaluation is based on assessment that provides evidence of student achievement at
strategic times throughout the grade/course, often at the end of a period of learning.

Activity 2: Pair Activity

What types of decisions might teachers make based on the information they collect about the
learning and teaching process in general and students’ learning in particular? Please discuss
this question with your colleague.

Evaluation includes both quantitative and qualitative descriptions of student behavior plus value
judgment concerning the desirability of that behavior. The following simple mathematical
arrangement shows the relationship between measurement and evaluation.

Evaluation = quantitative description of students’ behavior (measurement)
           + qualitative description of students’ behavior (non-measurement)
           + value judgment

Thus, evaluation may or may not be based on measurement (or tests) but when it is, it goes
beyond the simple quantitative description of students’ behavior. Evaluation involves judgment.
The quantitative values that we obtain through measurement will not have any meaning until
they are evaluated against some standards. Educators are constantly evaluating students and it is
usually done in comparison with some standard. For example, if the objective of the lesson is for
students to solve quadratic equations and if, having given them a test related to this objective, all
learners are able to solve at least 80% of the problems, then the teacher may conclude that his or
her teaching of the topic was quite successful.
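
The quadratic-equation example can be sketched in a few lines of Python. The scores, the ten-problem test length, and the names `MASTERY_THRESHOLD` and `evaluate_lesson` are assumptions made for illustration; only the 80% standard comes from the example above. The sketch shows that the judgment appears only when the measured scores are compared with the standard.

```python
MASTERY_THRESHOLD = 0.80  # the 80% standard from the example in the text


def evaluate_lesson(scores, max_score, threshold=MASTERY_THRESHOLD):
    """Evaluation: judge the lesson successful only if every learner
    solved at least `threshold` of the problems."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    proportions = [score / max_score for score in scores]
    return all(p >= threshold for p in proportions)


# Hypothetical scores on a ten-problem test (the measurement).
class_scores = [8, 9, 10, 8, 9]
lesson_successful = evaluate_lesson(class_scores, max_score=10)
print(lesson_successful)

# One learner below the standard changes the judgment.
print(evaluate_lesson([8, 9, 7], max_score=10))
```

The same scores could lead to a different conclusion under a different standard, which is why the numbers have no meaning until a standard is chosen.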

So, we can describe evaluation as the comparison of what is measured against some defined
criteria in order to determine whether it has been achieved, whether it is appropriate, whether it is
good, whether it is reasonable, whether it is valid and so forth. Evaluation accurately
summarizes and communicates to parents, other teachers, employers, institutions of further
education, and students themselves what students know and can do with respect to the overall
curriculum expectations.

Now, let’s summarize the differences and relationship between the four concepts. A test is a
particular type of assessment instrument that typically consists of sets of questions administered
during a fixed period of time under reasonably comparable conditions for all students.
Measurement is the assigning of numbers to the results of a test or other forms of assessment
according to a specific rule. Assessment is a much more comprehensive and inclusive concept
than testing and measurement. It includes the full range of procedures (observations, rating of
performances, paper-and-pencil tests, etc.) used to gain information about students’ learning. It
may also include quantitative descriptions (measurement) and qualitative descriptions (non-
measurement) of students’ behaviors. Evaluation, on the other hand, consists of making
judgments about the level of students’ achievement for purposes of grading and accountability
and for making decisions about promotion and graduation. To make an evaluation, we need
information, and it is obtained by measuring using a reliable instrument.

Activity 3: Individual Activity

1) Have you ever heard of or experienced the process of assessment? What does the word
“assessment” mean to you?
2) Define the terms test, measurement, and evaluation in your own words.

1.2 Importance and Purposes of Assessment

One of the first things to consider when planning for assessment is its purpose. Who will use the
results? How will they use them? As prospective teachers, you also need to have a clear idea of
the purposes that assessment serves. So let’s discuss the following questions:

Activity 4: Think-Pair-Share

In the previous section we have seen that assessment is the process of collecting information and
making decisions. Why do we need assessment in education? What do you think is the purpose of
assessment? Why do teachers assess their students? Reflect on these questions, write your ideas
on a piece of paper and share these ideas with your colleagues before you proceed to read
the contents of this section.

Classroom assessment involves students and teachers in the continuous monitoring of students'
learning. It provides teachers with feedback about their effectiveness, and it gives
students a measure of their progress as learners. Through close observation of students in the
process of learning and the collection of frequent feedback on students' learning, teachers can
learn much about how students learn and, more specifically, how students respond to particular
teaching approaches. Classroom assessment helps individual teachers obtain useful feedback on
what, how much, and how well their students are learning. Teachers can then use this
information to refocus their teaching to help students make their learning more efficient and
more effective.

Thus, based on the reasons for assessment described above, it can be summarized that
assessment in education focuses on:

• helping LEARNING, and

• improving TEACHING.

With regard to the learner, assessment is aimed at providing information that will help us make
decisions concerning remediation, enrichment, selection, exceptionality, progress and
certification. With regard to teaching, assessment provides information about the attainment of
objectives, the effectiveness of teaching methods and learning materials.

Overall, assessment serves the following main purposes.

1) Assessment is used to inform and guide teaching and learning: A good classroom
assessment plan gathers evidence of student learning that informs teachers' instructional
decisions. It provides teachers with information about what students know and can do. To
plan effective instruction, teachers also need to know what the student misunderstands
and where the misconceptions lie. In addition to helping teachers formulate the next
teaching steps, a good classroom assessment plan provides a road map for students.
Students should, at all times, have access to the assessment so they can use it to inform
and guide their learning.

2) Assessment is used to help students set learning goals: Students need frequent
opportunities to reflect on where they are in their learning and what needs to be done to achieve
their learning goals. When students are actively involved in assessing their own next
learning steps and creating goals to accomplish them, they make major advances in
directing their learning and what they understand about themselves as learners.

3) Assessment is used to assign report card grades: Grade reports provide parents,
schools, and other stakeholders, including the government, post-secondary
institutions and employers, with summary information about student learning.

4) Assessment is used to motivate students: Research has shown that students will be
confident and motivated when they experience progress and achievement, rather than the
failure and defeat associated with being compared to more successful peers.

Activity 5: Group Activity


In small groups discuss the extent to which each of the purposes of assessment was served
by the different assessment activities you went through while you were at your respective
universities, and report the results of your discussions.

1.3. The Role of Educational Objectives in Assessment

Activity 6: Individual Activity


Based on what you have learned from previous courses, reflect on the following questions.

1) Define educational or learning objectives and learning outcomes.
2) Describe “Bloom’s Taxonomy of Educational Objectives”.
3) Discuss the importance of educational objectives to the instructional process.

As you might remember from what you have learned in your “Secondary School Curriculum &
Instruction” course, the first step in planning any good teaching is to clearly define the learning
objectives or outcomes. A learning objective is an outcome statement that captures specifically
what knowledge, skills, and attitudes learners should be able to exhibit following instruction.
Defining learning objectives is also essential to the assessment of students’ learning. Effective
assessment practice requires relating the assessment procedures as directly as possible to the
learning objectives.

Instructional objectives, which are commonly known as learning outcomes, play a key role in both
the instructional process and the assessment process. They serve as guides for both teaching and
learning, communicate the intent of instruction to others, and provide guidelines for assessing
students’ learning.

Instructional objectives or learning outcomes are stated in terms of what the students are
expected to be able to do at the end of the instruction. For instance, after teaching students how
to solve quadratic equations, we might expect them to have the skill of solving any quadratic
equation. A learning outcome stated in this way clearly indicates the kind of performance
students are expected to exhibit as a result of the instruction. It also makes clear the
intent of our instruction and sets the stage for assessing students’ learning. Well-stated learning
outcomes make clear the types of student performance we are willing to accept as evidence that
the instruction has been successful.

Activity 7: Group activity

In small groups discuss the following questions.

1. To what extent were the objectives of the courses you took directly related to the
assessment types your instructors used to measure your learning progress?
2. How frequently did your instructors assess your progress to check
whether the objectives were achieved or not?
3. Have you ever thought about the objectives of the course(s) you take during
the learning process and when you study in preparation for exams?

1.4. Principles of Assessment

Assessment principles consist of statements highlighting what are considered as critical elements
of a system designed to assess student progress. These principles are expressed in terms of
elements for a fair (reliable and valid) assessment system. Thus, each principle introduces an
issue that must be addressed when evaluating a student assessment system. Assessment
principles guide the collection of meaningful information that will help inform instructional
decisions, promote student engagement, and improve student learning.

Different educators and school systems have developed somewhat different sets of assessment
principles. Miller, Linn and Gronlund (2009) have identified the following general principles of
assessment.

1. Clearly specifying what is to be assessed has priority in the assessment process.
2. An assessment procedure should be selected because of its relevance to the characteristics
or performance to be measured.
3. Comprehensive assessment requires a variety of procedures.
4. Proper use of assessment procedures requires an awareness of their limitations.
5. Assessment is a means to an end, not an end in itself.

Perhaps the assessment principles developed by the New South Wales Department of
Education and Training (2008) in Australia are more inclusive than those principles listed by
other educators. Let us look at these principles and compare them with those developed by
Miller, Linn and Gronlund as described above.

1. Assessment should be relevant. Assessment needs to provide information about
students’ knowledge, skills and understandings of the learning outcomes specified in the
syllabus.

2. Assessment should be appropriate. Assessment needs to provide information about the
particular kind of learning in which we are interested. This means that we need to use a
variety of assessment methods because not all methods are capable of providing
information about all kinds of learning. For example, some kinds of learning are best
assessed by observing students; some by having students complete projects or make
products and others by having students complete paper and pen tasks. Conclusions about
student achievement in an area of learning are valid only when the assessment method we
use is appropriate and measures what it is supposed to measure.

3. Assessment should be fair. Assessment needs to provide opportunities for every student
to demonstrate what they know, understand and can do. Assessment must be based on a
belief that all learners are on a path of development and that every learner is capable of
making progress. Students bring a diversity of cultural knowledge, experience, language
proficiency and background, and ability to the classroom. They should not be advantaged
or disadvantaged by such differences that are not relevant to the knowledge, skills and
understandings that the assessment is intended to address. Students have the right to
know what is assessed, how it is assessed and the worth of the assessment. Assessment
will be fair or equitable only if it is free from bias or favoritism.

4. Assessment should be accurate. Assessment needs to provide evidence that accurately
reflects an individual student’s knowledge, skills and understandings. That is,
assessments need to be reliable or dependable in that they consistently measure a
student’s knowledge, skills and understandings. Assessment also needs to be objective so
that if a second person assesses a student’s work, they will come to the same conclusion
as the first person. Assessment will be fair to all students if it is based on reliable,
accurate and defensible measures.

5. Assessment should provide useful information. The focus of assessment is to establish
where students are in their learning. This information can be used for both summative
purposes, such as the awarding of a grade, and formative purposes to feed directly into
the teaching and learning cycle.

6. Assessment should be integrated into the teaching and learning cycle. Assessment
needs to be an ongoing, integral part of the teaching and learning cycle. It must allow
teachers and students themselves to monitor learning. From the teacher perspective, it
provides the evidence to guide the next steps in teaching and learning. From the student
perspective, it provides the opportunity to reflect on and review progress, and can provide
the motivation and direction for further learning.

7. Assessment should draw on a wide range of evidence. Assessment needs to draw on a
wide range of evidence. A complete picture of student achievement in an area of learning
depends on evidence that is sampled from the full range of knowledge, skills and
understandings that make up the area of learning. An assessment program that
consistently addresses only some outcomes will provide incomplete feedback to the
teacher and student, and can potentially distort teaching and learning.

8. Assessment should be manageable. Assessment needs to be efficient, manageable and
convenient. It needs to be incorporated easily into usual classroom activities and it needs
to be capable of providing information that justifies the time spent.
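
Principle 4 states that a second person assessing the same work should reach the same conclusion. One simple way to check this, offered here only as an illustrative sketch, is to compute the proportion of scripts on which two markers award the same grade. The grades and the function name `percent_agreement` are hypothetical, and simple percent agreement is just one of several possible indices; the principles above do not prescribe a particular method.

```python
def percent_agreement(marker_1, marker_2):
    """Return the proportion of scripts given the same grade by both markers."""
    if len(marker_1) != len(marker_2) or not marker_1:
        raise ValueError("both markers must grade the same non-empty set of scripts")
    matches = sum(1 for a, b in zip(marker_1, marker_2) if a == b)
    return matches / len(marker_1)


# Hypothetical grades awarded independently to the same five scripts.
first_marker = ["A", "B", "B", "C", "A"]
second_marker = ["A", "B", "C", "C", "A"]

agreement = percent_agreement(first_marker, second_marker)
print(f"Agreement: {agreement:.0%}")
```

A low value would suggest that the marking criteria need to be made more explicit before the results are used for decisions.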

Activity 8: Group Activity


In small groups discuss the following questions.
1) Are there any similarities between the two sets of principles discussed above? What about
their differences?
2) What is the importance of each of these principles to the teaching and learning process?
3) List possible constraints that could hinder the application of these principles, and identify
solutions for each constraint so as to make your assessment comprehensive and effective.
4) Based on your experiences, compare and contrast the extent to which each of these principles was
followed at secondary and university education levels.

1.5. Assessment and Some Basic Assumptions

Reflection:
When planning to assess students, what assumptions should one hold in mind? What
should be kept in mind when preparing tools for assessing students?

Angelo and Cross (1993) have listed seven basic assumptions of classroom assessment which
are described as follows:

1. The quality of student learning is directly, although not exclusively, related to the
quality of teaching. Therefore, one of the most promising ways to improve learning is
to improve teaching. If assessment is to improve the quality of students’ learning, both
teachers and students must become personally invested and actively involved in the
process.

Reflection: What should be the roles of students and teachers in classroom assessment
so that it helps students’ learning?

2. To improve their effectiveness, teachers need first to make their goals and objectives
explicit and then to get specific, comprehensible feedback on the extent to which they
are achieving those goals and objectives. Effective assessment begins with clear goals.
Before teachers can assess how well their students are learning, they must identify and
clarify what they are trying to teach. After teachers have identified specific teaching goals
they wish to assess, they can better determine what kind of feedback to collect.

3. To improve their learning, students need to receive appropriate and focused
feedback early and often; they also need to learn how to assess their own learning.

Reflection: How do you think feedback and self-assessment will help to improve
students’ learning?

4. The type of assessment most likely to improve teaching and learning is that
conducted by teachers to answer questions they themselves have formulated in
response to issues or problems in their own teaching. To best understand their students’
learning, teachers need specific and timely information about the particular individuals in
their classes. Because students’ needs differ, there is often a gap between
assessment and student learning. One goal of classroom assessment is to reduce this gap.

Reflection: How does classroom assessment help to reduce the gap between
assessment and student learning?

5. Systematic inquiry and intellectual challenge are powerful sources of motivation,
growth, and renewal for teachers, and classroom assessment can provide such
challenge. Classroom assessment is an effort to encourage and assist those teachers who
wish to become more knowledgeable, involved, and successful.

6. Classroom assessment does not require specialized training; it can be carried out by
dedicated teachers from all disciplines. To succeed in classroom assessment, teachers
need only a detailed knowledge of the discipline, dedication to teaching, and the
motivation to improve.

7. By collaborating with colleagues and actively involving students in classroom
assessment efforts, teachers (and students) enhance learning and personal
satisfaction. By working together, all parties achieve results of greater value than those
they can achieve by working separately.

Reflection: Can you explain how teachers’ collaboration with colleagues can be more
effective in enhancing learning and personal satisfaction than working alone?

1.6. Assessment, Learning, and the Involvement of Students

During teaching, you will be assessing students’ learning continuously. You will be interpreting
what the students say and do in order to make judgments about their achievements. The ability to
analyze the students’ learning is vital if you are to make appropriate teaching points which help
the students develop their knowledge and/or competence. You will be using your subject
knowledge to help you identify what to look for and where to take the student next. You will
need to listen, observe and question in ways which will enable you to give appropriate feedback
or further instruction.

There is considerable evidence that assessment is a powerful process for enhancing learning.
Black and Wiliam (1998) synthesized over 250 studies linking assessment and learning. From
this they concluded that the intentional use of assessment in the classroom to
promote learning resulted in improved student achievement. Classroom assessment promotes
learning when teachers use it in the following ways:

 When they use it to become aware of the knowledge, skills, and beliefs that their
students bring to a learning task, and;

 When they use this knowledge as a starting point for new instruction, and monitor
students’ changing perceptions as instruction proceeds.

Activity 9: Group Activity


 In small groups, discuss the following issue.

 As prospective teachers, how do you think you will use the information you collect
through different methods of assessment to improve the teaching and learning process?

When learning is the goal, teachers and students collaborate and use ongoing assessment and
pertinent feedback to move learning forward. When classroom assessment is frequent and varied,
teachers can learn a great deal about their students. They can gain an understanding of students’
existing beliefs and knowledge, and can identify incomplete understandings, false beliefs, and
immature interpretations of concepts that may influence or distort learning. Teachers can observe
and probe students’ thinking over time, and can identify links between prior knowledge and new
learning.

Learning is also enhanced when students are encouraged to think about their own learning, to
review their experiences of learning and to apply what they have learned to their future learning.
Assessment provides the feedback loop for this process. When students (and teachers) become
comfortable with a continuous cycle of feedback and adjustment, students begin to internalize
the process of standing outside their own learning and considering it against a range of criteria,
not just the teacher’s judgment about quality or accuracy. When students engage in this ongoing
metacognitive experience, they are able to monitor their learning along the way, make
corrections, and develop a habit of mind for continually reviewing and challenging what they
know.

Assessment also enhances students’ learning by increasing their motivation. Motivation is
essential for students’ engagement in their learning. The higher the motivation, the more time
and energy a student is willing to devote to any given task. Even when a student finds the content
interesting and the activity enjoyable, learning requires sustained concentration and effort.

Reflection: How do you think assessment will help to increase students’ motivation?

According to current cognitive research, people are motivated to learn by success and
competence. When students feel ownership and have choice in their learning, they are more
likely to invest time and energy in it. Assessment can be a motivator, not through reward and
punishment, but by stimulating students’ intrinsic interest. Assessment can enhance student
motivation by:

• emphasizing progress and achievement rather than failure

• providing feedback to move learning forward

• reinforcing the idea that students have control over, and responsibility for, their own
learning

• building students’ confidence so that they can take the risks that learning requires

• being relevant, and appealing to students’ imaginations

• providing the scaffolding that students need to genuinely succeed

Assessment is also an important instrument for implementing differentiated learning. Classes
consist of students with different needs, backgrounds, and skills. Each student’s learning is
unique. The contexts of classrooms, schools, and communities vary. As well, the societal
pressure for more complex learning for all students necessitates that teachers find ways to create
a wide range of learning options and paths, so that all students have the opportunity to learn as
much as they can, as deeply as they can, and as efficiently as they can.

When students learn, they make meaning for themselves, and they approach learning tasks in
different ways. They bring with them their own understanding, skills, beliefs, hopes, desires, and
intentions. It is important to consider each individual student’s learning, rather than talk about
the learning of “the class.” Assessment practices lead to differentiated learning when teachers
use them to gather evidence to support every student’s learning, every day in every class. The
learning needs of some students may require individualized learning plans.

There is strong evidence that involving students in the assessment process can have very definite
educational benefits. Now stop reading for a moment and reflect on the following questions.

Activity 10: Think-Pair-Share

1) As prospective teachers how do you think you can involve your students in the assessment
process?

2) In what ways can students benefit if they are involved in the assessment process?

One way in which we can involve our students in the assessment process is to establish the
standards or assessment criteria with them. This will help students understand what is to be
assessed. Working with students to develop assessment tools is a powerful way to help students
build an understanding of what a good product or performance looks like. It helps students
develop a clear picture of where they are going, where they are now and how they can close the
gap. This does not mean that each student creates his or her own assessment criteria. You, as a
teacher, have a strong role to play in guiding students to identify the criteria and features of
understandings you want your students to develop.

Another important aspect is to involve students in trying to apply the assessment criteria for
themselves. The evidence is that through trying to apply criteria, or mark using a model answer,
students gain much greater insight into what is actually being required and subsequently their
own work improves in the light of this.

An additional benefit is that students can complete more learning activities and receive
feedback on them, feedback that the teacher alone would not have the time to provide.

Students can be involved in this type of assessment in two ways: self-assessment and peer
assessment. Self-assessment involves students judging their own work. It
begins with students understanding the learning intentions or objectives for the particular lesson
and the success criteria for the specific task or activity. It develops into students’ awareness of
their own strengths and weaknesses in a particular subject (and as a learner in general) and the
ability to identify their own ‘next steps’ or targets. Self-assessment allows students to think more
carefully about what they do and do not know, and what they additionally need to know to
accomplish certain tasks.

Peer assessment, by contrast, involves students making judgments about other students’ work.
Students learn how to make better sense of assessment criteria if they have to give feedback
and/or marks against them. Giving and receiving feedback are important aspects of student
learning and will be valuable skills for them in professional contexts and for future learning.

1.7. Assessment and Teacher Professional Competence in Ethiopia

Assessment takes up much of a teacher’s professional time, both inside and outside the
classroom. Therefore, a teacher should have some basic competencies in classroom assessment
so as to be able to assess his/her students’ learning effectively.

Activity 11: Think-Pair-Share

As prospective teachers, what competencies do you think you should have in the area of
assessment? Write down your ideas and compare them with those of a colleague.

A teacher's professional role and responsibilities for student assessment can be conceptualized as
falling along a time continuum. Assessment activities occur prior to instruction, during
instruction, and after instruction. Assessment prior to instruction provides a teacher with
information about individual differences among students as well as an understanding of the
background or prior knowledge of the class as a whole. These assessment activities provide the
basis for planning instruction.

Assessment during instruction provides information about the overall progress of the whole class
as well as specific information about individual students. These assessment activities provide the
basis for monitoring progress during learning.

Following the teaching of a specific unit, semester, academic year, or the like, decisions must be
made about the achievement of short- and long-term instructional goals. This is assessment after
instruction.

In addition to these activities, communication skills are needed to interpret and report
performance standards or levels of achievement to students and parents.

In the American education system a list of seven standards for teacher competence in educational
assessment of students has been developed. These standards for teacher competence in student
assessment have been developed with the view that student assessment is an essential part of
teaching and that effective teaching cannot exist without appropriate student assessment. The
seven standards articulating teacher competence in the educational assessment of students are
described below.

1. Teachers should be skilled in choosing assessment options appropriate for instructional
decisions. They need to be well-acquainted with the kinds of information provided by a broad
range of assessment alternatives and their strengths and weaknesses. In particular, they should be
familiar with criteria for evaluating and selecting assessment methods in light of instructional
plans.

2. Teachers should be skilled in developing assessment methods appropriate for instructional
decisions. Assessment tools may be accurate and fair (valid) or invalid. Teachers must be able to
determine the quality of the assessment tools they develop.

3. Teachers should be skilled in administering, scoring, and interpreting the results of
assessment methods. It is not enough that teachers are able to select and develop good
assessment methods; they must also be able to apply them properly.

4. Teachers should be skilled in using assessment results when making decisions about
individual students, planning teaching, developing curriculum, and school improvement.

5. Teachers should be skilled in developing valid student grading procedures that use pupil
assessments. Grading students is an important part of professional practice for teachers.

6. Teachers should be skilled in communicating assessment results to students, parents, other lay
audiences, and other educators. Furthermore, teachers will sometimes be in a position that will
require them to defend their own assessment procedures and their interpretations of them. At
other times, teachers may need to help the public to interpret assessment results appropriately.

7. Teachers should be skilled in recognizing unethical, illegal, and otherwise inappropriate
assessment methods and uses of assessment information. Teachers must be well-versed in their
own ethical and legal responsibilities in assessment. In addition, they should also attempt to have
the inappropriate assessment practices of others discontinued whenever they are encountered.

In our country, Ethiopia, the Ministry of Education (MoE) has also developed assessment-related
competences which professional teachers are expected to possess.

Activity 12: Group Activity

In small groups:

1) Find Ethiopia’s standards of teacher competence in assessment from a nearby educational
organization (woreda, zone or regional education office), compare them with the American
standards, and report the similarities and differences.

2) Discuss and report on the importance and use of having standards of teacher competence in
assessment for a particular school and the whole education system in general.

Unit Summary

 Test, measurement, assessment and evaluation are concepts that are frequently used in the
area of educational assessment and evaluation, often with varying meanings and some
confusion. However, although they overlap, they vary in scope and have different
meanings.
 Assessment serves many important purposes including: informing and guiding teaching
and learning; helping students set learning goals; assigning report card grades; motivating
students.
 Assessment should be designed in such a way that it will elicit information about students’
progress towards the educational objectives.
 There are some important principles that professional teachers should be aware of that
guide the assessment process of students’ learning.
 Any assessment process is based on certain basic assumptions.
 Assessment is an integral part of the teaching and learning process and is an important
tool for enhancing learning.
 In order to maximize the benefits students can get out of assessment, they should be
involved in the assessment process.
 There are certain assessment competencies that teachers need to possess so as to
effectively carry out their professional responsibilities.

Self-Check Exercises

In order to check your understanding of what you have learned in this unit, answer the following
questions and compare your answers with what is discussed in the material. If you cannot
answer any one of these questions adequately, go back and study again the part of the
material that you did not understand.

1. Define the concepts test, measurement, assessment and evaluation.
2. Describe the main purposes of assessment.
3. Discuss the importance of assessment for learning and teaching.
4. What are the strategies that can be used to involve students in the assessment of
their learning?
5. What are the major assessment competencies that Ethiopian professional teachers
are expected to possess?

References

Angelo, T.A. & Cross, K.P. (). Classroom Assessment Techniques: A Handbook for College
Teachers. 2nd Ed. San Francisco: Jossey-Bass Publishers.

Braun, H., Kanjee, A., Bettinger, E., & Kremer, M. (2006). Improving Education Through
Assessment, Innovation, and Evaluation. American Academy of Arts and Sciences.

Educational Testing Service. Linking Classroom Assessment with Student Learning.

Ellis, V. (Ed.). (2007). Learning and Teaching in Secondary Schools. 3rd Ed. Learning
Matters Ltd.

McDonald, E.S. & Hershman, D.M. (2010). Classrooms that Spark! Recharge and Revive Your
Teaching. 2nd Ed. San Francisco: Jossey-Bass Publishers.

Mehrens, W.A. & Lehman, I.J. (). Measurement and Evaluation in Education. 4th Ed. New York:
Harcourt Brace College Publishers.

Miller, D.M., Linn, R.L. & Gronlund, N.E. Measurement and Assessment in Teaching. 10th Ed.
Upper Saddle River: Pearson Education, Inc.

NSW Department of Education & Training. (2008). Principles of Assessment and Reporting in
NSW Public Schools.

Phye, G.D. (Ed.). (1997). Handbook of Classroom Assessment: Learning, Achievement, and
Adjustment. San Diego: Academic Press.

Spiller, D. (2009). Assessment: Feedback to promote student learning. Teaching Development |
Wāhanga Whakapakari Ako.

Western and Northern Canadian Protocol for Collaboration in Education. (2006). Rethinking
Classroom Assessment with Purpose in Mind: Assessment for Learning, Assessment as
Learning, and Assessment of Learning.

Unit Two: Assessment Strategies, Methods, and Tools

2.1 Introduction

In the previous unit you were introduced to the major concepts of educational assessment
and evaluation. You also learned about the purposes and principles of assessment. In this unit
you will learn about various assessment strategies that can be used in the context of secondary
education. You will also learn about planning, construction and administration of classroom
tests.

2.2 Learning Outcomes

At the end of this unit you should be able to:



 Identify relevant assessment strategies, methods and tools.

 Compare assessment and evaluation methods and tools to select appropriate ones.

 Develop formative and summative assessment tools as per the principles of
assessment.

 Complete tasks on linking formative with summative assessment.

 Evaluate assessment tools by identifying relevant criteria.

2.3 Specific Contents

2.3.1. Types of assessment


There are different approaches to conducting assessment in the classroom. Here we are going to
see three pairs of assessment types: formative vs. summative, formal vs. informal, and
criterion-referenced vs. norm-referenced assessments.

1. Formative and Summative Assessments

Assessment procedures can be classified according to their functional role during classroom
instruction. One such classification system follows the sequence in which assessment procedures
are likely to be used in the classroom. The most commonly referred to and used categories in this
regard are formative assessment and summative assessment. Can you differentiate these
concepts? Please try to describe them before you proceed studying the following section.

a) Formative Assessment: Formative assessments are used to shape and guide classroom
instruction. They can include both informal and formal assessments (which will be discussed
later in this section) and help us to gain a clearer picture of where our students are and what
they still need help with. They can be given before, during, and even after instruction, as long
as the goal is to improve instruction.

Formative assessments are ongoing assessments, reviews, and observations in a classroom.
They serve a diagnostic function for both students and teachers. Students receive feedback
that they can use to adjust and improve their performance or other aspects of their
engagement in the unit, such as study techniques. Teachers receive feedback on the quality of
learners’ understanding and, consequently, can modify their teaching approaches to provide
enrichment or remedial activities to more effectively guide learners. For example, if a teacher
observes that some students do not grasp a concept, he/she can design a review activity to
reinforce the concept or use a different instructional strategy to re-teach it. Teachers can
conduct formative assessment at any point in a unit of study.

Formative assessment is also known by the name ‘assessment for learning’. The idea behind
this name is that the basic purpose of assessment should be to enhance students’ learning.

There is still another name which is associated with the concept of formative assessment:
‘continuous assessment’. Continuous assessment (as opposed to terminal assessment) is
based on the premise that if assessment is to help students improve their learning, and
if a teacher is to determine the progress of students towards the achievement of the learning
goals, it has to be conducted on a continuous basis. Thus, continuous assessment is a teaching
approach as well as a process of deciding to what extent the educational objectives are
actually being realized during instruction. In schools, continuous assessment of learning is
usually carried out by teachers on the basis of impressions gained as they observe their
students at work or by various kinds of tests given periodically. Therefore, each decision is
based on various types of information that teachers gather through different assessment
methods at different times.

In order to assess your students' understanding, there are various strategies that you can use.
Can you mention some of the strategies that you can use to assess your students for formative
purposes? Please, try to mention as many strategies as you can.

The following are some of the strategies of assessment you can employ in your classrooms:

o You can ask your students to write their understanding of vocabulary or concepts
before and after instruction.
o You can ask students to summarize the main ideas they've taken away from your
presentation, discussion, or assigned reading.

o You can have students complete a few problems or questions at the end of
instruction and check answers.

o You can interview students individually or in groups about their thinking as they
solve problems.

o You can assign brief, in-class writing assignments (e.g., "Why is this person or
event representative of this time period in history?").

Tests and homework can also be used formatively if teachers analyze where students are in
their learning and provide specific, focused feedback regarding performance and ways to
improve it.

b) Summative Assessment: Summative assessment typically comes
at the end of a course (or unit) of instruction. It evaluates the quality of students’ learning
and assigns a mark to each student’s work based on how effectively learners have addressed
the performance standards and criteria. Assessment tasks conducted during the progress of a
semester may be regarded as summative in nature if they only contribute to the final grades
of the students.

The techniques used in summative assessment are determined by the instructional goals.
Typically, however, they include teacher-made achievement tests, ratings of various types of
performance, and assessment of products (reports, drawings, etc.).

A particular assessment task can be both formative and summative. For example, students
could complete unit 1 of their Module and complete an assessment task for which they
earned a mark that counted towards their final grade. In this sense, the task is summative.
They could also receive extensive feedback on their work. Such feedback would guide
learners to achieve higher levels of performance in subsequent tasks. In this sense, the task
is formative – because it helps students form different approaches and strategies to improve
their performance in the future.

2. Formal and Informal Assessment

Assessment can also be either formal or informal. Let us try to understand their differences
from the following paragraphs.

Formal Assessment: This usually implies a written document, such as a test, quiz, or paper.
A formal assessment is given a numerical score or grade based on student performance. We
will deal more with formal assessment strategies, particularly tests, in a later section.

Informal Assessment: "Informal" is used here to indicate techniques that can easily be
incorporated into classroom routines and learning activities. Informal assessment techniques
can be used at any time without interfering with instructional time. Their results are
indicative of the student's performance on the skill or subject of interest. Can you think of
the informal assessment strategies that you can use in your classes? What informal
assessment strategies did your teachers use when you were a student?

An informal assessment usually occurs in a more casual manner and may include
observation, inventories, checklists, rating scales, rubrics, performance and portfolio
assessments, participation, peer and self evaluation, and discussion. Formal tests assume a
single set of expectations for all students and come with prescribed criteria for scoring and
interpretation. Informal assessment, on the other hand, requires a clear understanding of the
levels of ability the students bring with them. Only then may assessment activities be
selected that students can attempt reasonably. Informal assessment seeks to identify the
strengths and needs of individual students without regard to grade or age norms.

Methods for informal assessment can be divided into two main types: unstructured (e.g.,
student work samples, journals) and structured (e.g., checklists, observations). The
unstructured methods frequently are somewhat more difficult to score and evaluate, but they
can provide a great deal of valuable information about the skills of the students. Structured
methods can be reliable and valid techniques when time is spent creating the "scoring"
procedures. Another important aspect of informal assessments is that they actively involve
the students in the evaluation process - they are not just paper-and-pencil tests.

3. Criterion-referenced and Norm-referenced Assessments

How the results of tests and other assessment procedures are interpreted also provides a
method of classifying these instruments. There are two ways of interpreting student
performance – criterion-referenced and norm-referenced.

a) Criterion-referenced Assessment: This type of assessment allows us to quantify the
extent to which students have achieved the goals of a unit of study and a course. It is carried out
against previously specified criteria and performance standards. Where a grade is assigned,
it is assigned on the basis of the standard the student has achieved on each of the criteria.
This type of assessment is most appropriate for quickly assessing what concepts and skills
students have learned from a segment of instruction. Criterion-referenced classrooms are
mastery-oriented, informing all students of the expected standard and teaching them to
succeed on related outcome measures. Criterion-referenced assessments help to eliminate
competition and may improve cooperation.

b) Norm-referenced Assessment: This type of assessment determines a student’s
performance from his/her position within a cohort of students, the norm group. This type
of assessment is most appropriate when one wishes to make comparisons across large numbers
of students or important decisions regarding student placement and advancement. For
example, students’ results in grade 8 national exams in our country are determined based on
their relative standing in comparison to all other students who have taken the exam. Thus,
when we say that a student is at the 80th percentile, it does not mean that the student has
scored 80%. Rather, it means that the student’s score stands above the scores of about 80%
of the students, and the remaining students, about 20%, have scored as high as or higher
than that particular student. Ranking students is another example of norm-referenced
interpretation of students’ performances.

To summarize, criterion-referenced assessment emphasizes description of a student’s
performance, and norm-referenced assessment emphasizes discrimination among
individual students in terms of relative level of learning.
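
The difference between the two interpretations can be illustrated with a small calculation. The Python sketch below is only an illustration: the cohort scores, the student’s score and the cut score of 75 are hypothetical values chosen for the example, and the percentile rank is computed under one common convention (the percentage of the cohort scoring strictly below the student). It is not the procedure used for any actual national exam.

```python
def percentile_rank(score, cohort_scores):
    """Norm-referenced view: percentage of the cohort scoring strictly below `score`."""
    below = sum(1 for s in cohort_scores if s < score)
    return 100.0 * below / len(cohort_scores)


def meets_criterion(score, cut_score):
    """Criterion-referenced view: has the student reached the pre-set standard?"""
    return score >= cut_score


# Hypothetical raw scores (out of 100) for a cohort of ten students.
cohort = [35, 42, 48, 50, 55, 58, 60, 62, 64, 70]
student_score = 64   # hypothetical student, included in the cohort above
cut_score = 75       # hypothetical mastery standard set before the test

rank = percentile_rank(student_score, cohort)
mastered = meets_criterion(student_score, cut_score)

print(f"Raw score: {student_score}%")
print(f"Percentile rank in this cohort: {rank:.0f}")
print(f"Meets the criterion of {cut_score}%: {mastered}")
```

In this hypothetical cohort the same raw score of 64% places the student at the 80th percentile, yet falls short of the criterion of 75%. The norm-referenced result describes the student’s standing relative to others, while the criterion-referenced result describes what the student has achieved against a fixed standard.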

2.3.2. Assessment Strategies


Assessment strategy refers to those assessment tasks (methods/approaches/activities) in which
students are engaged to ensure that all the learning objectives of a subject, a unit or a lesson have
been adequately addressed. Assessment strategies range from informal, almost unconscious,
observation to formal examinations. Although subject areas may differ in the assessment
strategies they use, there is a variety of methods that can be used in most subjects.

When selecting assessment strategies in our subject areas, there are a number of things that we
have to consider. First and foremost it is important that we choose the assessment technique
appropriate for the particular behavior being assessed. We have to use a strategy that can give
students an opportunity to demonstrate the kind of behavior that the learning outcome demands.
Assessment strategies should also be related to the course material and relevant to students’
lives. Therefore, we have to provide assessment strategies that relate to students’ future work.

There are many different ways to categorize learning goals for students. Categorizing helps us to
thoroughly think through what we want students to know and be able to do. One way in which
the different learning outcomes that we want our students to develop can be categorized is
presented as follows:
 Knowledge and understanding: What facts do students know outright? What
information can they retrieve? What do they understand?
 Reasoning proficiency: Can students analyze, categorize, and sort into component
parts? Can they generalize and synthesize what they have learned? Can they evaluate
and justify the worth of a process or decision?

 Skills: We have certain skills that we want students to master such as reading fluently,
working productively in a group, making an oral presentation, speaking a foreign
language, or designing an experiment.
 Ability to create products: Another kind of learning target is student-created
products - tangible evidence that the student has mastered knowledge, reasoning, and
specific production skills. Examples include a research paper, a piece of furniture, or
artwork.
 Dispositions: We also frequently care about student attitudes and habits of mind,
including attitudes toward school, persistence, responsibility, flexibility, and desire to
learn.

Activity: In groups, discuss and identify the assessment strategies that you consider best for
assessing each of these categories of learning goals and compare your work with that of other
groups.

From among the various assessment strategies that can be used by classroom teachers, some are
described below for your consideration as student teachers.

Classroom presentations: A classroom presentation is an assessment strategy that requires
students to verbalize their knowledge, select and present samples of finished work, and organize
their thoughts about a topic in order to present a summary of their learning. It may provide the
basis for assessment upon completion of a student’s project or essay. For example, students can
be asked to present a report after an educational visit. What other educational activities can you
imagine in your subject area where students can present their work?

Conferences: A conference is a formal or informal meeting between the teacher and a student
for the purpose of exchanging information or sharing ideas. A conference might be held to
explore the student’s thinking and suggest next steps; assess the student’s level of understanding
of a particular concept or procedure; and review, clarify, and extend what the student has already
completed. What advantages do you think conferences have as a method of assessment?

Exhibitions/Demonstrations: An exhibition/demonstration is a performance in a public setting,
during which a student explains and applies a process, procedure, etc., in concrete ways to show
individual achievement of specific skills and knowledge. What type of objectives do you think
this assessment strategy could serve to measure?

Interviews: You should be familiar with the interviews journalists conduct with different
personalities. An interview can also be used for assessment purposes in educational settings. In
such applications, an interview is a face-to-face conversation in which teacher and student use
inquiry to share their knowledge and understanding of a topic or problem. This form of
assessment can be used by the teacher to:
 explore the student’s thinking;
 assess the student’s level of understanding of a concept or procedure; and
 gather information, obtain clarification, determine positions, and probe for motivations.

Observation: Observation is a process of systematically viewing and recording students while
they work, for the purpose of making instructional decisions. Observation can take place at any
time and in any setting. It provides information on students' strengths and weaknesses, learning
styles, interests, and attitudes. Observations may be informal or highly structured, and incidental
or scheduled over different periods of time in different learning contexts.

Performance tasks: During a performance task, students create, produce, perform, or present
works on "real world" issues. The performance task may be used to assess a skill or proficiency,
and provides useful information on the process as well as the product. Please mention some
examples of performance tasks that students can do in your subject area.

Portfolios: A portfolio is a collection of samples of a student’s work over time. It offers a visual
demonstration of a student’s achievement, capabilities, strengths, weaknesses, knowledge, and
specific skills, over time and in a variety of contexts. For a portfolio to serve as an effective
assessment instrument, it has to be focused, selective, reflective, and collaborative. Portfolios can
be prepared for different subjects at any educational level. What types of materials can be
included in the portfolio of students in relation to your subject?

Attention: At this point of your study, you will be required to start filing
samples of your work (those that are indicated) as part of your portfolio to
serve as evidence of your performance on this course.

Questions and answers: This is perhaps the strategy most widely used by teachers to involve
their students in the learning and teaching process. In this strategy, the teacher poses a
question and the student answers verbally, rather than in writing. This strategy helps the teacher
to determine whether students understand what is being, or has been, presented; it also helps
students to extend their thinking, generate ideas, or solve problems. Strategies for effective
question and answer assessment include:
• Apply a wait time or 'no hands-up' rule to give students time to think after a question
before they are called upon at random to respond.
• Ask a variety of questions, including open-ended questions and those that require more
than a right or wrong answer.
At what point in the lesson do you think the question and answer strategy will be most useful?
Why?

Students’ self-assessments: Self-assessment is a process by which the student gathers
information about, and reflects on, his or her own learning. It is the student’s own assessment of
personal progress in terms of knowledge, skills, processes, or attitudes. Self-assessment leads
students to a greater awareness and understanding of themselves as learners.

Checklists, Rating Scales and Rubrics: These are tools that state specific criteria and allow
teachers and students to gather information and to make judgments about what students know
and can do in relation to the outcomes. They offer systematic ways of collecting data about
specific behaviors, knowledge and skills.

Checklists usually offer a yes/no format in relation to student demonstration of specific criteria.
They may be used to record observations of an individual, a group or a whole class.

Rating Scales allow teachers to indicate the degree or frequency of the behaviors, skills and
strategies displayed by the learner. Rating scales state the criteria and provide three or four
response selections to describe the quality or frequency of student work.

Rubrics use a set of criteria to evaluate a student's performance. They consist of a fixed
measurement scale and detailed description of the characteristics for each level of performance.
These descriptions focus on the quality of the product or performance and not the quantity.

Rubrics may be used to assess individuals or groups and, as with rating scales, the results may
be compared over time.

The purpose of checklists, rating scales and rubrics is to:
• provide tools for systematic recording of observations
• provide tools for self-assessment
• provide samples of criteria for students prior to collecting and evaluating data on their
work
• record the development of specific skills, strategies, attitudes and behaviours necessary
for demonstrating learning
• clarify students' instructional needs by presenting a record of current accomplishments.
In what specific instances can these assessment strategies (rating scales, checklists and
rubrics) be used in your area of study? Think of specific examples and share your ideas with
your colleagues.

One-Minute Paper: During the last few minutes of the class period, you may ask students to
answer on a half-sheet of paper: "What is the most important point you learned today?" and
"What point remains least clear to you?" The purpose is to obtain data about students'
comprehension of a particular class session. You can then review the responses and note any
useful comments. During the following class periods you can emphasize the issues highlighted by
your students' comments.

Muddiest Point: This is similar to ‘One-Minute Paper’ but only asks students to describe what
they didn't understand and what they think might help. It is an important technique that will help
you to determine which key points of the lesson were missed by the students. Here, too, you
should review the responses before the next class meeting and use them to clarify, correct, or
elaborate.

Student-generated test questions: You may allow students to write test questions and model
answers for specified topics, in a format consistent with course exams. This will give students
the opportunity to evaluate the course topics and to reflect on what they understand and on what
good test items are. You may evaluate the questions and use the good ones as prompts for discussion.

Tests: This is the type of assessment that you are most familiar with. A test requires students
to respond to prompts in order to demonstrate their knowledge (orally or in writing) or their
skills (e.g., through performance). We will learn much more about tests later in this section.

Activity: Let’s say you need to assess student achievement on each of the following learning
targets. Which assessment strategy would you choose? Please jot down your answers with their
justifications and file them in your portfolio for later reference.
1. Ability to write clearly and coherently
2. Group discussion proficiency
3. Reading comprehension
4. Proficiency using specified mathematical procedures
5. Proficiency conducting investigations in science

2.3.3. Assessment in large classes


It is quite obvious that student numbers in a class limit the teaching methods available to
teachers. Similarly, assessment methods are restricted by class size. Due to time and resource
constraints, teachers often use less time-demanding assessment methods which, however, may
not always optimize student learning.

Activity: What problems do you think teachers will face when assessing students in large
classes? In your school years, what problems have you observed in assessment as a result of
large class size? Please discuss these questions in groups.

The existing educational literature has identified various assessment issues associated with large
classes. They include:
a) Surface Learning Approach: Traditionally, teachers rely on time-efficient, exam-based
assessment methods for assessing large classes, such as multiple-choice and short-answer
examinations. These assessments often assess learning only at the lower
levels of intellectual complexity. Furthermore, students tend to adopt a surface, rote
learning approach when preparing for these kinds of assessment. Higher-level
learning such as critical thinking and analysis is often not fully assessed.

b) Feedback is often inadequate: Feedback plays an important role in the learning process of
students. Particularly, if students can receive feedback at an early stage of their learning
process, this will help them identify their own problems and improve their learning.
However, with a large class, teachers may not have time to give detailed and constructive
feedback to every student. Most teachers usually can only afford to give general feedback to
their students on written assignments and tests.

c) Inconsistency in marking: A large class usually consists of a diverse and complex group of
students. Differences in perceptions of assessment, cultural and educational
background, prior knowledge, and level of interest in the subject all pose challenges to the
fairness of marking and grading. Teachers have to take all these into account in order to
ensure consistency and fairness in marking and grading.

d) Difficulty in monitoring cheating and plagiarism: Plagiarism is another challenge in
assessing large classes. Some students deliberately cheat in large classes because they think
that they are less likely to be identified within a large group. In addition, as teachers usually
have a heavy workload and tight marking schedule, they do not have enough time to
thoroughly check the work submitted by their students. To minimize plagiarism, assessment
tasks must be well thought out and well designed.

e) Lack of interaction and engagement: Students are often not motivated to engage in a
large-sized lecture. When teachers raise questions in large classes, not many students are
willing to respond. Students are less likely to interact with teachers because they feel less
motivated and tend to hide themselves in a large group. In fact, interacting with students in
class is important for teachers because they can receive immediate feedback from students
regarding the quality of their teaching.

Although these issues can be problems in assessment for any class size, they are worse in large
classes because of the additional limitation and strain on resources. They are problems that are
applicable whether the function of the assessment is to facilitate learning via feedback, or to
classify students via grading.

There are a number of ways to make the assessment of large numbers of students more effective
whilst still supporting effective student learning. These include:
1. Front ending: The basic idea of this strategy is that by putting in an increased effort at the
beginning in setting up the students for the work they are going to do, the work submitted can
be improved. The time needed to mark it is therefore reduced (and time is also saved through
fewer requests for tutorial guidance).
2. Making use of in-class assignments: In-class assignments are usually quick and therefore
relatively easy to mark and provide feedback on, but help you to identify gaps in
understanding. Students could be asked to complete a task within the timeframe of a
scheduled lecture, field exercise or practical class. This might be a very quick task, for
example, completing a graph, doing some calculations, answering some quick questions,
making brief notes on a piece of text etc. In some cases it might be possible to merge the in-
class assignment with peer assessment.
3. Self- and peer-assessment: Students can perform a variety of assessment tasks in ways
that both save the tutor’s time and bring educational benefits, especially the development
of their own judgment skills. These include self-assessment and peer-assessment strategies.
i. Self-assessment reduces the marking load because it ensures a higher quality of work is
submitted, thereby minimizing the amount of time expended on marking and feedback.
The emphasis on student self- assessment represents a fundamental shift in the teacher-
student relationship, placing the primary responsibility for learning with the student.
However, there are problems involved in self-assessment for grading purposes pertaining
to its validity and reliability. If self-assessment is utilized for the purposes of grading, it
is imperative to employ peer or staff cross-marking to ensure the validity of the results.
Self-assessment should also be confined to certain limited objectives such as ascertaining
whether all of the required components of an answer are present, or the articulation of
very transparent assessment criteria and standards, possibly accompanied by examples of
work of varying standards. In this regard, self-assessment can decrease the marking load
of teachers and provide students with a positive learning experience by compelling them
to examine their work from the perspective of a marker as well as a participant.

ii. In a similar fashion to self-assessment, peer-assessment can provide useful learning
experiences for students at the same time as reducing the marking load of staff. The use
of peer-assessment can be an effective way of ensuring students get individual feedback
that staff may be too busy to provide in a timely manner given the class numbers
involved. This could involve providing students with answer sheets or model answers to a
piece of coursework that you had set them previously and then requiring students to
undertake the marking of those assignments in class. For example, you can ask students
to exchange their work with one another, or collect all of the named work and randomly
assign student markers to it.

However, as with any form of assessment, peer-assessment needs to be carefully designed. Students need
to know what to do and there needs to be a transparent system by which students can appeal their
marks (especially if used in a summative rather than formative context). The benefits of this
approach are that:

• students can get to see how their peers have tackled a particular piece of work,

• they can see how you would assess the work (e.g. from the model answers/answer
sheets you've provided); and

• they are put in the position of being an assessor, thereby giving them an
opportunity to internalize the assessment criteria.

4. Group Assessments: The most obvious advantage of group-based assessment is that it
significantly reduces the marking load if the group submits only one piece of assessable
work. The major problem of course is that group members may not contribute equally, so
how are they to be rewarded fairly? There is probably no easy solution to this but there is a
range of possible strategies which may go at least some way to addressing the problem.

5. Changing the assessment method, or at least shortening it: Being faced with large
numbers of students will present challenges but may also provide opportunities to
either modify existing assessments or to explore new methods of assessment. You might, for
example, be able to reduce the length of the assessment task you are currently using without
detracting from your module's learning outcomes. Alternatively a large class may provide a
new opportunity to make use of peer and self-assessment.

Assignment: Visit any one of the schools in your vicinity and interview at least three teachers in
your subject area using questions you have prepared for the purpose. The questions should be
related to 1) the problems they have faced in assessing students in large classes; and 2) the
strategies they have used to tackle the problems. Based on the information you have collected
prepare a report of 1-2 pages. You have to file the report as part of your portfolio.

2.3.4. Selecting and developing assessment methods and tools


One of the most difficult tasks for most teachers is assessing the performance of their students,
determining to what extent each individual student has attained the level of mastery defined by
the course outcomes. Students’ learning behaviors will be determined by the examinations you
administer. If your goal is to establish an active learning environment in which students are
expected to learn the facts and learn to apply them, but your exams test only the students' ability
to recite memorized facts, you will have made it unlikely that the students will engage in
meaningful learning. Thus, the assessment tools that you employ in your subject area have an
enormously important role in determining how and what students will learn. The process of
assessing student performance in your subject area must therefore begin with a close look at your
educational outcomes. What you will discover is that some outcomes are relatively easy to
assess, while other outcomes are much harder to test.

Activity: List all of the forms of assessment that you have experienced during your school years.
Are there other approaches to assessment with which you are familiar even if you haven't
personally experienced them as a student?

A wide variety of tools are available for assessing student performance and there are approaches
that are suitable for essentially any educational objective you want to test. Examples include
objective exams, short answer and essay exams, portfolios, projects, practical exams,
presentations, and combinations of these. Appropriate tools or combinations of tools must be
selected and used if the assessment process is to successfully provide information relevant to
stated educational outcomes.

Constructing Tests
There is a wide variety of styles and formats for writing test items. Miller, Linn, & Gronlund
(2009) make distinctions between classroom tests that consist of objective test items and
performance assessments that require students to construct responses (e.g., write an essay) or
perform a particular task (e.g., measure air pressure). Objective tests are highly structured and
require the test taker to select the correct answer from several alternatives or to supply a word or
short phrase to answer a question or complete a statement. They are called objective because
they have a single right or best answer that can be determined in advance. Performance
assessment tasks permit the student to organize and construct the answer in essay form. Other
types of performance assessment tasks may require the student to use equipment, generate
hypotheses, make observations, construct something or perform for an audience. For most
performance assessment tasks, there is not a single best or right response. Expert judgment is
required to score the performances.

Constructing Objective Test Items


There are various types of objective test items. These can be classified into those that require the
student to supply the answer (supply type items) and those that require the student to select the
answer from a given set of alternatives (selection type items). Supply type items include
completion items and short answer questions. Selection type test items include True/False,
multiple choice and matching.

Each type of test item has its own characteristics, uses, advantages, limitations, and rules for
construction.

Activity: 1) As students you have taken tests with different types of formats: multiple-choice
test items, true/false test items, short-answer test items, etc. Which of these test
items did you feel most comfortable with? What are your reasons? Write down your
answers and compare them with those of your friends.
2) In groups, discuss the advantages and limitations of the different types of test items.
Present the results of your discussion to the whole class.

True/False Test Items


I am quite sure that you are familiar with true/false test items, so it may not be
necessary to describe what they are. I will therefore focus on the characteristics of this type of
test item and present you with some guidelines that can help in constructing better true/false items.

The chief advantage of true/false items is that they do not require much time to answer. This
allows a teacher to cover a wide range of content by using a large number of
such items. In addition, true/false test items can be scored quickly, reliably, and objectively by
anybody using an answer key. If carefully constructed, true/false test items also have the
advantage of measuring higher mental processes of understanding, application and interpretation.

The major disadvantage of true/false items is that when they are used exclusively, they tend to
promote memorization of factual information: names, dates, definitions, and so on. Some argue
that another weakness of true/false items is that they encourage guessing. This is
because any student who takes such a test has a 50 percent probability of getting
the right answer on each item. In addition, true/false items:
• Can often lead a teacher to write ambiguous statements due to the difficulty of writing
statements which are clearly true or false
• Do not discriminate between students of varying ability as well as other item types do
• Can often include more irrelevant clues than do other item types
• Can often lead a teacher to favour testing of trivial knowledge
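
The 50 percent chance of a lucky guess applies to each true/false item separately. As a rough illustration of how test length affects the pay-off from guessing, the sketch below computes the chance of reaching a given score by guessing alone. It assumes every guess is independent and has a probability of 0.5 of being right; the function name prob_at_least and the test lengths used are illustrative choices, not part of this module.

```python
from math import comb


def prob_at_least(n_items: int, min_correct: int, p: float = 0.5) -> float:
    """Probability of getting at least `min_correct` of `n_items` right by blind guessing.

    Assumes independent guesses, each correct with probability `p` (binomial model).
    """
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


# Chance of scoring 70 percent or more purely by guessing, for two test lengths.
short_test = prob_at_least(10, 7)   # 7 or more right out of 10 items
long_test = prob_at_least(50, 35)   # 35 or more right out of 50 items
print(f"10 items: {short_test:.3f}")
print(f"50 items: {long_test:.4f}")
```

Under these assumptions a blind guesser has roughly a 17 percent chance of getting 7 out of 10 items right, and a much smaller chance of getting 35 out of 50 right. This is one reason why true/false tests are usually built from a fairly large number of items.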

The following suggestions may help teachers to construct good quality true/false test
items.
• Avoid negative statements, and never use double negatives. In Right-Wrong or True-False
items, negatively phrased statements make it needlessly difficult for students to decide
whether that statement is accurate or inaccurate.
• Restrict single-item statements to single concepts. If you double-up two concepts in a
single item statement, how does a student respond if one concept is accurate and the other
isn’t? Take a look at this confusing item:
• Use an approximately equal number of items, reflecting the two categories tested. If you
typically overbook on false items in your True-False tests, students who are totally at sea
about an item will be apt to opt for a false answer and will probably be correct.
• Make statements representing both categories equal in length. Again, to avoid giving
away the correct answers, don’t make all your false statements brief and (in an effort to
include necessary qualifiers) make all your true statements long. Students catch on quickly to
this kind of test-making tendency.

Matching Items
A matching item consists of two lists of words or phrases. The test-taker must match components
in one list (the premises, typically presented on the left) with components in the other list (the
responses, typically presented on the right), according to a particular kind of association
indicated in the item’s directions.

Like True-False items, matching items can cover a good deal of content in an efficient fashion.
They are a good choice if you’re interested in finding out if your students have memorized
factual information. Matching items sometimes can work well if you want your students to cross-
reference and integrate their knowledge regarding the listed premises and responses.

The major advantage of matching items is their compact form, which makes it possible to measure
a large amount of related factual material in a relatively short time. Another advantage is their ease
of construction.

The main limitation of matching test items is that they are restricted to the measurement of
factual information based on rote learning. Another limitation is the difficulty of finding
homogeneous material that is significant from the perspective of the learning outcomes. As a
result, test constructors tend to include in their matching items material which is less significant.

The following suggestions are important guidelines for the construction of good matching items.
• Use fairly brief lists, placing the shorter entries on the right. If the premises and
responses in a matching item are too long, students tend to lose track of what they originally
set out to look for. The words and phrases that make up the premises should be short, and
those that make up the responses should be shorter still.
• Employ homogeneous lists. Both the list of premises and the list of responses must be
composed of similar sorts of things. If not, an alert student will be able to come up with the
correct associations simply by “elimination” because some entries in the premises or responses
may clearly stand out from the others.
• Include more responses than premises. If you use the exact same number of responses as
premises in a matching item, then a student who knows half or more of the correct
associations is in a position to guess the rest of the associations with very good chances.
• List responses in a logical order. This rule is designed to make sure you don’t accidentally
give away hints about which responses connect with which premises. Choose a logical
ordering scheme for your responses (say, alphabetical or chronological) and stick with it.
• Describe the basis for matching and the number of times a response can be used. To
satisfy this rule, you need to make sure your test’s directions clarify the nature of the
associations you want students to use when they identify matches. Regarding the student’s
use of responses, a phrase such as the following is often employed: “Each response in the list
at the right may be used once, more than once, or not at all.”
• Try to place all premises and responses for any matching item on a single page. This
rule’s intent is to free your students from lots of potentially confusing flipping back and forth
in order to accurately link responses to premises.
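
The guideline on including more responses than premises can be illustrated with a small calculation. The sketch below is a simplified model, not a procedure from this module: it assumes that each response may be used at most once, that the student first sets aside the matches he or she knows, and that the remaining responses are then assigned to the remaining premises purely at random. The function name prob_guess_rest and the list sizes are illustrative.

```python
from math import perm


def prob_guess_rest(n_premises: int, n_responses: int, n_known: int) -> float:
    """Chance of matching every remaining premise correctly by random guessing.

    Assumes each response is used at most once and that the `n_known`
    matches the student is sure of are set aside before guessing.
    """
    remaining_premises = n_premises - n_known
    remaining_responses = n_responses - n_known
    # Number of equally likely ways to give each remaining premise a distinct response.
    return 1 / perm(remaining_responses, remaining_premises)


equal_lists = prob_guess_rest(5, 5, 3)      # 5 premises, 5 responses, 3 matches known
extra_responses = prob_guess_rest(5, 7, 3)  # same item with 2 extra responses
print(f"Equal lists:     {equal_lists:.3f}")
print(f"Extra responses: {extra_responses:.3f}")
```

In this simplified model, a student who knows three of five matches has an even chance of guessing the other two when the lists are the same length, but only a 1 in 12 chance once two extra responses are added.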

Short Answer/Completion Test Items

Short-answer items and completion items are essentially the same: both can be answered
by a word, phrase, number or formula. They differ in the way the problem is presented. The short-
answer type uses a direct question, whereas the completion item consists of an incomplete
statement that the student is required to complete. This can be demonstrated by the following
examples:

Short answer item: In which year did the Ethiopians defeat the Italian invaders at Adwa?

Completion item: The Ethiopian forces defeated the Italian invaders at Adwa in the year _____.

Short-answer test items are among the easiest to construct, partly because of the relatively
simple learning outcomes they usually measure. Except for the problem-solving outcomes
measured in Mathematics and Science, they are used almost exclusively to measure the recall of
memorized information.

A more important advantage of the short-answer item is that the students must supply the
answer. This reduces the possibility that students will obtain the correct answer by guessing.
They must either recall the information requested or make the necessary computations to solve
the problem presented to them. Partial knowledge, which might enable them to choose the
correct answer on a selection item, is insufficient for answering a short answer test item
correctly.

There are two limitations cited in the use of short-answer test items. One is that they are
unsuitable for assessing complex learning outcomes. The other is the difficulty of scoring. This is
especially true where the item is not phrased clearly enough to require a single definitely correct
answer, and where the student’s spelling ability affects the response.

The following suggestions will help short-answer test items to function as intended.
• Word the item so that the required answer is both brief and specific.
Example: An animal that eats the flesh of other animals is _____. (Poorly stated)
An animal that eats the flesh of other animals is classified as _____. (Better item)
• Do not take statements directly from textbooks to use as a basis for short-answer items.
When taken out of context, such statements are frequently too general and ambiguous to
serve as good short-answer items.
• A direct question is generally more desirable than an incomplete statement.
• If the answer is to be expressed in numerical units, indicate the type of answer wanted.
For computational problems, it is usually preferable to indicate the units in which the
answer is to be expressed.

Multiple-Choice Items
This is the most popular type of selected-response item. It can effectively measure many of the
simple learning outcomes measured by the short-answer item, the true-false item, and the
matching item types. In addition, it can measure a variety of complex cognitive learning
outcomes.

A multiple-choice item consists of a problem and a list of suggested solutions. A student is first
given either a question or a partially complete statement. This part of the item is referred to as
the item’s stem. Then three or more potential answer-options are presented. These are usually
called alternatives, choices or options.

There are two important variants in a multiple-choice item:


(1) whether the stem consists of a direct question or an incomplete statement, and
(2) whether the student’s choice of alternatives is supposed to be a correct answer or a best
answer.

A key advantage of the multiple-choice item is its widespread applicability to the assessment of
cognitive skills and knowledge, as well as to the measurement of students’ affect. Another
advantage of multiple-choice items is that it’s possible to make them quite varied in the levels of
difficulty they possess. Cleverly constructed multiple-choice items can present very high-level
cognitive challenges to students. And, of course, as with all selected-response items,
multiple-choice items are fairly easy to score.

The key weakness of multiple-choice items is that when students review a set of alternatives for
an item, they may be able to recognize a correct answer that they would never have been able to
generate on their own. In that sense, multiple-choice items can present an exaggerated picture of
a student’s understanding or competence, which might lead teachers to invalid inferences.

Another serious weakness, one shared by all selected-response items, is that multiple-choice
items can never measure a student’s ability to creatively synthesize content of any sort. Finally,
in an effort to come up with the necessary number of plausible alternatives, novice item-writers
sometimes toss in some alternatives that are obviously incorrect.

Well-constructed multiple-choice items, when deployed along with other types of items, can
make a genuine contribution to a teacher’s assessment arsenal. Here are some useful rules for
you to follow.
• The question or problem in the stem must be self-contained. The stem should contain
as much of the item’s content as possible, thereby rendering the alternatives much shorter
than would otherwise be the case.
• Avoid negatively stated stems. Just as with the True/False items, negatively stated stems
can create genuine confusion in students.
• Each alternative must be grammatically consistent with the item’s stem. Well, as you
can see from the next sample item, grammatical inconsistency for three of these
answer-options supplies students with an unintended clue to the correct answer.
• Make all alternatives plausible, but be sure that one of them is indisputably the
correct or best answer. As I indicated when describing the weaknesses of multiple-choice
items, teachers sometimes toss in one or more implausible alternatives, thereby
diminishing the item substantially. Although avoiding that problem is important, it’s even
more important to make certain that you really do have one valid correct answer in any
item’s list of alternatives, rather than two similar answers, either of which could be
arguably correct.
• Randomly use all answer positions in approximately equal numbers. If you use four-
option items, make sure that roughly one-fourth of the correct answers turn out to be A,
one fourth B, and so on.
• Never use “all of the above” as an answer choice, but use “none of the above” to
make items more demanding.

Students often become confused when confronted with items that have more than one correct
answer. Usually, what happens is they’ll see one correct alternative and instantly opt for it
without recognizing that there are other correct options later in the list. In addition, students will
definitely opt for the “all of the above” option if they realize that two of the alternatives are
correct without considering the third option. However, we can increase the difficulty level of a
test item by presenting three or four answer options, none of which is correct, followed by a
correct “none-of-the-above” option.
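
The rule about using all answer positions in approximately equal numbers can be applied by hand, but for teachers who prepare tests on a computer, a short script can draw up the answer key before the alternatives are arranged. The following is only one possible sketch: the function name balanced_key, the four-option default and the 20-item example are assumptions made for illustration.

```python
import random
from collections import Counter


def balanced_key(n_items: int, n_options: int = 4, seed=None) -> list[str]:
    """Return a randomly ordered answer key that uses each position about equally often."""
    rng = random.Random(seed)
    letters = [chr(ord("A") + i) for i in range(n_options)]
    # Shuffle the letters first so that leftover slots do not always favour A, B, ...
    rng.shuffle(letters)
    # Repeat the letters enough times to cover every item, then trim to length.
    key = (letters * (n_items // n_options + 1))[:n_items]
    # Shuffle the whole key so the correct positions follow no visible pattern.
    rng.shuffle(key)
    return key


key = balanced_key(20, seed=1)
print(key)
print(Counter(key))  # how many times each position holds the correct answer
```

Once the key is fixed, the correct alternative for each item is placed in the position the key indicates and the distractors fill the remaining positions. For a 20-item, four-option test, each letter is the correct answer exactly five times; when the number of items is not a multiple of the number of options, the counts differ by at most one.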

Activity: Examine the following faulty multiple choice items and identify their problems.
1. The term "side effect" of a drug refers to:
A. additional benefits from the drug.
B. the chain effect of drug action.
C. the influence of drugs on crime.
D. any action of a drug in the body other than the one the doctor wanted the drug to have.
2. When linking two clauses, one main and one subordinate, one should use a:
A. coordinate conjunction such as and or so
B. subordinate conjunction such as because or although.
C. preposition such as to or from.
D. semicolon.
3. Entomology is:
A. the study of birds.
B. the study of fish.
C. the study of insects.
4. The promiscuous use of sprays, oils, and antiseptics in the nose during acute colds is a pernicious
practice because it may have a deleterious effect on
A. the sinuses.
B. red blood cells.
C. white blood cells.
5. An electric transformer can be used:
A. for storing electricity
B. to increase the voltage of alternating current
C. It converts electric energy in mechanical energy
D. alternating current is changed to direct current

Constructing Performance Assessments

In the previous paragraphs you learned how objective test items should be
constructed. You have learned that well-constructed objective tests can measure a variety of
learning outcomes, from simple to complex. Despite this wide applicability of objective-item
types, there remain significant learning outcomes for which no satisfactory objective
measurements have been developed. These include such outcomes as the ability to recall,
organize, and integrate ideas; the ability to express oneself in writing; and the ability to create
rather than merely identify interpretations and applications of data. Such outcomes require less
structuring of responses than objective test items, and it is in the measurement of these outcomes
that written essays and other performance-based assessments are of great value.

In this section, you will be presented with the most familiar form of performance-based
assessment: the essay question. The distinctive feature of essay questions is that students are free to
construct, relate, and present ideas in their own words. Learning outcomes concerned with the
ability to conceptualize, construct, organize, relate, and evaluate ideas require the freedom of
response and the originality provided by essay questions.

Essay questions can be classified into two types: restricted-response essay questions and
extended-response essay questions. Now let us briefly look at these types of questions.

Restricted-response essay questions: These types of questions usually limit both the content
and the response. The content is usually restricted by the scope of the topic to be discussed.
Limitations on the form of response are generally indicated in the question. This can be
demonstrated in the following example:

In what ways are essay questions preferable to objective test items? Answer in a
brief paragraph.

Extended-response essay questions: These types of questions allow students:

 to select any factual information that they think is relevant,
 to organize the answer in accordance with their best judgment, and
 to integrate and evaluate ideas as they deem appropriate.

This freedom enables them to demonstrate their ability to analyze problems, organize their ideas,
describe in their own words, and/or develop a coherent argument.

In addition to the already described capacity in measuring higher order thinking skills, essay
questions have some more advantages which include the following:
 Extended-response essays focus on the integration and application of thinking and
problem solving skills.
 Essay assessments enable the direct evaluation of writing skills.
 Essay questions, as compared to objective test items, are easier to construct.
 Essay questions have a positive effect on students’ learning.

On the other hand, essay questions also have some limitations which you need to be aware of.
Perhaps the most commonly cited problem of essay questions is the unreliability of their scoring.
Thus, the same paper may be scored differently by different teachers, and even the same teacher
may give different scores for the same paper at different times. Another limitation is the amount
of time required for scoring the responses. Still another problem with essay tests is the limited
sampling of content they provide.

The improvement of the essay question requires attention to two problems:


a. How to construct essay questions that call forth the desired student response, and
b. How to score the answers so that achievement is reliably measured.

There are some guidelines for improving the reliability and validity of essay scores. The
following are suggestions for the construction of good essay questions:
 Restrict the use of essay questions to those learning outcomes that cannot be
measured satisfactorily by objective items. As we have seen earlier, objective measures
have the advantage of efficiency and reliability. When objective items are inadequate for
measuring learning outcomes, however, the use of essay questions becomes necessary
despite their limitations.
 Structure items so that the student’s task is explicitly bounded. Phrase your essay
items so that students will have no doubt about the response you’re seeking. Don’t
hesitate to add details to eliminate ambiguity.
 For each question, specify the point value, an acceptable response-length, and a
recommended time allocation. This rule gives students the information they need to
respond appropriately to an essay item. The less guessing your students are obliged to do
about how they are supposed to respond, the less likely it is that you will get
off-the-wall essays that do not give you the evidence you need.
 Employ more questions requiring shorter answers rather than fewer questions
requiring longer answers. This rule is intended to foster better content sampling in a
test’s essay items. With only one or two items on a test, chances are good that
your items will miss your students’ areas of content mastery or nonmastery.
 Don’t employ optional questions. When students are made to choose their essay items
from several options, you really end up with different tests, unsuitable for comparison.
 Test a question’s quality by creating a trial response to the item. A great way to
determine if your essay items are really going to get at the responses you want is to
actually try writing a response to the item, much as a student might do.

As we have seen earlier, the most serious limitation of essay questions is related to scoring.
Therefore, the following guidelines will help make the scoring of essay items easier
and more reliable.
1. Make sure that you are emotionally and mentally settled before you start scoring.
2. Score all responses to one item before moving on to the next item.
3. Write out a model answer in advance to guide you in grading the students’ answers.
4. Shuffle the exam papers after scoring each question and before moving on to the next.
5. Conceal the names of the test takers while scoring to avoid bias.

2.3.5. Table of Specification and Arrangement of Items


Tests are among the most important and commonly used assessment instruments in
education. If tests are to be valid and reliable, they have to be developed from carefully
designed plans, and their items have to be arranged according to the principles of test construction.

Table of Specification

The development of valid, reliable and usable questions involves proper planning. The plan
entails designing a framework that can guide test developers in the item development
process. This is necessary because the classroom test is a key factor in the evaluation of learning
outcomes. The validity, reliability and usability of such a test depend on the care with which the
test is planned and prepared. Planning helps to ensure that the test covers the pre-specified
instructional objectives and the subject matter (content) under consideration. Hence, planning
a classroom test involves identifying the instructional objectives stated earlier and the subject
matter (content) covered during the teaching-learning process. This leads to the preparation of
a table of specification (the test blueprint) for the test, bearing in mind the type of test that
would be relevant for the purpose of testing.

To plan a classroom test that will be both practical and effective in providing evidence of
mastery of the instructional objectives and content covered requires relevant considerations.
The following steps serve as a guide in planning a classroom test:
i. Determine the purpose of the test;
ii. Describe the instructional objectives and content to be measured;
iii. Determine the relative emphasis to be given to each learning outcome;
iv. Select the most appropriate item formats (essay or objective);
v. Develop the test blueprint to guide the test construction;
vi. Prepare test items that are relevant to the learning outcomes specified in the test plan;
vii. Decide on the pattern of scoring and the interpretation of results;
viii. Decide on the length and duration of the test; and
ix. Assemble the items into a test, prepare directions and administer the test.

The instructional objectives of the course are critically considered while developing the test
items. This is because the instructional objectives are the intended behavioural changes or
intended learning outcomes of instructional programs which students are expected to possess at
the end of the instructional process. The instructional objectives usually stated for the assessment
of behavior in the cognitive domain of educational objectives are classified by Bloom (1956) in
his taxonomy of educational objectives into knowledge, comprehension, application, analysis,
synthesis and evaluation. The objectives are also given relative weight in respect to the level of
importance and emphasis given to them. Educational objectives and the content of a course are
the focus on which test development is based.

A table of specification is a two-way table that matches the objectives and content you have
taught with the level at which you expect your students to perform. It contains an estimate of the
percentage of the test to be allocated to each topic at each level at which it is to be measured. In
effect, we establish how much emphasis to give to each objective or content area. A table of
specification guides the selection of test items, which in turn ensures that the test measures a
representative sample of instructionally relevant tasks.

Developing a table of specification involves:


1. Preparing a list of learning outcomes, i.e. the type of performance students are
expected to demonstrate,
2. Outlining the contents of instruction, i.e. the area in which each type of performance
is to be shown, and
3. Preparing the two-way chart that relates the learning outcomes to the instructional
content.

Now, let us try to understand how a test blueprint is developed using the following table of
specification developed for a Geography test as an example.

                                  Instructional Objectives
Contents       Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation  Total  Percent
Air pressure   2          2              1            1         -          -           6      24%
Wind           1          1              1            1         -          -           4      16%
Temperature    2          2              1            1         -          1           7      28%
Rainfall       1          2              1            -         1          -           5      20%
Clouds         1          1              -            1         -          -           3      12%
Total          7          8              4            4         1          1           25
Percent        28%        32%            16%          16%       4%         4%          100%

As can be observed from the table, the rows show the content areas from which the test is to be
sampled, and the columns indicate the level of thinking students are required to demonstrate in
each of the content areas. Thus, the test items are distributed among the five content
areas according to their representation at the six levels of the cognitive domain. The
percentage row and column show the degree of representation of each content area and each
level of the cognitive domain in this particular test. Objectives you consider more
important should get more representation in the test items. Similarly, content areas on which you
have spent more instructional time should be allotted more test items.
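
The totals and percentages in a table of specification are simple sums and ratios, so they are easy to verify. The sketch below is only an illustration in Python; the variable names are assumptions made for this example. It re-computes the Total and Percent entries of the Geography table from the item counts in each cell.

```python
# Item counts per content area (rows) and cognitive level (columns),
# copied from the Geography table of specification. A dash in the table is 0.
levels = ["Knowledge", "Comprehension", "Application",
          "Analysis", "Synthesis", "Evaluation"]
blueprint = {
    "Air pressure": [2, 2, 1, 1, 0, 0],
    "Wind":         [1, 1, 1, 1, 0, 0],
    "Temperature":  [2, 2, 1, 1, 0, 1],
    "Rainfall":     [1, 2, 1, 0, 1, 0],
    "Clouds":       [1, 1, 0, 1, 0, 0],
}

total_items = sum(sum(row) for row in blueprint.values())

# Row totals: share of the test given to each content area.
content_percent = {topic: 100 * sum(row) / total_items
                   for topic, row in blueprint.items()}

# Column totals: share of the test given to each cognitive level.
level_totals = [sum(column) for column in zip(*blueprint.values())]
level_percent = {level: 100 * count / total_items
                 for level, count in zip(levels, level_totals)}

print(total_items)       # 25
print(content_percent)   # Temperature has the largest share, 28.0
print(level_percent)     # Comprehension has the largest share, 32.0
```

A teacher planning a new test could work in the other direction: decide the percentages first and then derive the number of items for each cell.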

Which of the objectives in the example above were given more emphasis? Which of them
received the least emphasis? Which content area has the highest representation? Which one
has the least representation? What are the implications of these differences?

There are also other ways of developing a test blueprint. One of these shows the
distribution of test items among the content areas and the types of test items to be developed from
each content area. For example, the table of specification that we have seen earlier can be
prepared in the following way.

                              Item Types
Contents       True/False  Matching  Short Answer  Multiple Choice  Total  Percent
Air pressure   1           1         1             3                6      24%
Wind           1           1         1             1                4      16%
Temperature    1           2         1             3                7      28%
Rainfall       1           1         1             2                5      20%
Clouds         1           -         1             1                3      12%
Total          5           5         5             10               25
Percent        20%         20%       20%           40%              100%

Arrangement of test items


There are various methods of grouping items in an achievement test, depending on its purpose.
For most purposes, the items can be arranged by a systematic consideration of:
 The type of items used
 The learning outcomes measured
 The difficulty of the items, and
 The subject matter measured
First, the items should be arranged in sections by item type. That is, all true/false items should be
grouped together, then all matching items, then all short answer or completion items, and then all
multiple choice items. Extended-response essay questions and performance tasks usually take so
much time that they are administered alone. If they are combined with other types of
items and tasks, the extended-response tasks should come last.

Arranging the sections of a test in this order produces a sequence that roughly approximates the
complexity of the outcomes measured, ranging from the simple to the complex. It is then
merely a matter of grouping the items within each item type. For this purpose, items that measure
similar outcomes should be placed together and then arranged in order of ascending difficulty.
For example the items under the multiple choice section might be arranged in the following
order: knowledge of terms, knowledge of specific facts, knowledge of principles, and application
of principles. Keeping together items that measure similar learning outcomes is especially
helpful in determining the type of learning outcomes causing students the greatest difficulty.

If, for any reason, it is not feasible to group the items by the learning outcomes measured, then it
is still desirable to arrange them in order of increasing difficulty. Beginning with the easiest
items and proceeding gradually to the most difficult has a motivating effect on students. Also,
encountering difficult items early in the test often causes students to spend a disproportionate
amount of time on such items. If the test is long, they may be forced to omit later questions that
they could easily have answered. With the items classified by item type, the sections of the test
and the items within each section can be arranged in order of increasing difficulty.

To summarize, the most effective method for organizing items in the typical classroom test is to:
1. Form sections by item type
2. Group the items within each section by the learning outcomes measured, and
3. Arrange both the sections and the items within sections in an ascending order of
difficulty.
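
The three-step summary above amounts to sorting the items on three keys. The following Python sketch shows one possible way to do this; it is not part of the module's procedure. The `Item` fields, the outcome and difficulty ranks, and the sample items are all hypothetical values that a teacher would have to supply from his/her own judgment.

```python
from dataclasses import dataclass

# Section order described in the text: simpler item types come first.
SECTION_ORDER = ["true_false", "matching", "short_answer",
                 "multiple_choice", "essay"]


@dataclass
class Item:
    label: str
    item_type: str    # one of SECTION_ORDER
    outcome: int      # hypothetical rank of the learning outcome, 1 = simplest
    difficulty: int   # hypothetical teacher estimate, 1 = easiest


def arrange_items(items):
    """Sort by section, then by learning outcome, then by ascending difficulty."""
    return sorted(
        items,
        key=lambda it: (SECTION_ORDER.index(it.item_type),
                        it.outcome,
                        it.difficulty),
    )


items = [
    Item("Q-a", "multiple_choice", 2, 3),
    Item("Q-b", "true_false", 1, 2),
    Item("Q-c", "multiple_choice", 1, 2),
    Item("Q-d", "true_false", 1, 1),
    Item("Q-e", "short_answer", 1, 1),
]
ordered = [it.label for it in arrange_items(items)]
print(ordered)  # ['Q-d', 'Q-b', 'Q-e', 'Q-c', 'Q-a']
```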

Project Work: In groups of four, take one exam paper that includes at least three types of test
items from the school where you are placed for your practicum experience. Then evaluate the items
and the test in general based on the guidelines of test construction you have learned in the unit.
Prepare and submit a report of your evaluation to your instructor, and attach the test paper
you have evaluated to your report.

2.3.6. Administration of Tests


Test administration refers to the procedure of actually presenting the learning task that the
examinees are required to perform in order to ascertain the degree of learning that has taken
place during the teaching-learning process. This procedure is as important as the process of
preparing the test, because the validity and reliability of test scores can be greatly reduced
when a test is poorly administered. When a test is administered, all examinees must be given a fair
chance to demonstrate their achievement of the learning outcomes being measured. This requires
the provision of a physical and psychological environment that is conducive to their making
their best efforts, and the control of factors such as malpractice and unnecessary threats from
test administrators that may interfere with valid measurement. Test administration is also
concerned with selecting convenient and accurate procedures for scoring the results.

A number of conditions may create test anxiety in students and therefore should be
avoided during test administration. These include:
 Threatening students with tests if they do not behave
 Warning students to do their best “because the test is important”
 Telling students they must work fast in order to finish on time.
 Threatening dire consequences if they fail.

i) Ensuring Quality in Test Administration


Quality and good control are necessary components of test administration. The following
guidelines and steps are aimed at ensuring quality in test administration.
 Collect the question papers from the custodian in time to start the test at
the stipulated time.
 Ensure compliance with the stipulated sitting arrangements to prevent
collusion between or among the test takers.
 Ensure orderly and proper distribution of question papers to the test takers.
 Do not talk unnecessarily before the test. Test takers’ time should not be wasted at the
beginning of the test with unnecessary remarks, instructions or threats that may create
test anxiety.
 Remind the test takers of the need to avoid malpractice before they
start, and make it clear that cheating will be penalized.
 Stick to the instructions regarding the conduct of the test and avoid giving hints to test
takers who ask about particular items, but make corrections or clarifications to the test
takers whenever necessary.
 Keep interruptions during the test to a minimum.

ii) Credibility and Civility in Test Administration


Credibility and civility are characteristics of assessment which have day-to-day
relevance for developing educational communities. Credibility deals with the value that the eventual
recipients and users of the results of assessment place on the results with respect to the grades
obtained, the certificates issued or the issuing institution. Civility, on the other hand, enquires
whether the persons being assessed are in such conditions as to give their best without
hindrances and burdens in the attributes being assessed, and whether the exercise is seen as
integral to or as external to the learning process.

Hence, in test administration, effort should be made to see that the test takers are given a fair and
unaided chance to demonstrate what they have learnt with respect to:
a) Instructions: A test should contain a set of instructions, which are usually of two types: one
for the test administrator and the other for the test taker. The
instructions to the test administrator should explain how the test is to be administered, the
arrangements to be made for proper administration of the test, and the handling of the scripts
and other materials. These instructions should be clear enough for effective
compliance. For the test takers, the instructions should indicate the amount of work to
be done or the tasks to be accomplished, and should explain how the test should
be performed. Examples may be used to illustrate and clarify what
should be done by the test takers. The language used for the instructions should be
appropriate to the level of the test takers. Where necessary, the administrators should explain the
test takers’ instructions to ensure proper understanding, especially when the ability to understand and
follow instructions is not part of the test.

b) Duration of the Test: The time for accomplishing the test is technically important in test
administration and should be clearly stated for both the test administrators and test takers.
Ample time should be provided for candidates to demonstrate what they know and what
they can do. The duration of test should reflect the age and attention span of the test takers
and the purpose of the test.

c) Venue and Sitting Arrangement: The test environment should be learner friendly, with
adequate physical conditions such as work space, good and comfortable writing desks,
proper lighting, good ventilation, moderate temperature, conveniences within reasonable
distance and the serenity necessary for maximum concentration. It is important to provide
enough comfortable seats, with an adequate sitting arrangement for the test takers’ comfort
and to reduce collaboration between them. Adequate lighting, good ventilation and moderate
temperature reduce test anxiety and loss of concentration, which invariably affect
performance in the test. Noise is another undesirable factor that has to be adequately
controlled both within and outside the immediate test environment, since it affects
concentration and test scores.

d) Other necessary conditions: The questions and the question paper should be reader friendly:
printed in bold characters, neat, decent, clear and appealing, and not such as to
intimidate test takers into mistakes. All relevant materials for carrying out
the demands of the test should be provided in reasonable number and quality, and on time.

All these are necessary to enhance the test administration and to make assessment civil in
manifestation.

For credibility, on the other hand, effort should be made to moderate the test questions before
administration based on laid-down standards. It is also important to ensure that valid questions are
constructed based on the procedures for test construction which you have already learned in the
earlier sections of this unit.

Secure custody should be provided for the questions from the point of drafting to the constitution
of the final version of the test, for the live scripts after the
assessment and during their transmission to the graders, and for the grades
arising from the assessment, to guard against loss, mutilation and alteration. The test administrators and
the graders should be of proven moral integrity and should hold appropriate academic and
professional qualifications. The test scripts are to be graded and marks awarded strictly by using
itemized marking schemes. All these are necessary because an assessment situation in which
credibility is seriously called into question cannot really claim to be valid.

Unit Summary
In this unit you were introduced to different types of assessment approaches, namely formal vs.
informal, criterion referenced vs. norm referenced, formative vs. summative assessments. You
also learned about various assessment strategies. These include: classroom presentations,
exhibitions/demonstrations, conferences, interviews, observations, performance tasks, portfolios,
question and answer, students’ self assessment, checklists, rating scales and rubrics, one-minute
paper, muddiest point, student-generated questions and tests.

You also learned about the challenges in the assessment of large classes and their consequences
and some of the strategies that we can use to minimize those challenges. These strategies
include: front ending, making use of in-class assignments, self and peer assessment, group
assessment, and changing the assessment method, or at least shortening it.

Much of this unit was devoted to the construction of the most widely used assessment technique, that is,
tests. In this regard, tests were classified into two broad categories: objective tests and
performance assessment tasks (essay tests). Objective tests were further divided into supply type
items and selection type items. Supply type items include short answer and completion items,
whereas selection type items include true/false items, matching items and multiple choice
items. Essay items were also classified into restricted-response and extended-response essay items. Here
you have learned about the strengths and limitations of these different test item types. You were
also introduced to the major guidelines you should follow in constructing these test item types.

This unit also covered the planning of tests, particularly the preparation of a table of
specification or test blueprint. You were also familiarized with how test items should be
arranged. Finally, you learned about the techniques and procedures we should follow
during test administration.

Self-Check Exercises
1. State the differences between formative and summative assessment, criterion referenced
and norm referenced assessment, and formal and informal assessment.
2. What conditions do we consider in selecting assessment strategies in our subject?
3. List down the major assessment strategies that you can use in your subject and classify
them as formal and informal strategies.
4. What are the major problems associated with assessing students in large classes? What
strategies can we use to minimize these problems?
5. What is the difference between objective tests and essay tests?
6. What are the advantages of objective tests as compared to essay tests?
7. What are the advantages of essay tests as compared to objective tests?
8. What is a table of specification and what major purposes does it serve?
9. What are the major procedures we need to follow during test administration?

Unit 3: Item Analysis

Introduction
In unit two you learned about various assessment strategies that can be used in the context of secondary
education. You were also introduced to the planning, construction and administration of classroom
tests. In this unit, you are going to learn the techniques of analyzing responses to test items so as to
determine their validity and reliability. You will also learn about the advantages and techniques of test
item banking.

Learning Outcomes

Upon completion of this unit, you should be able to:

 Define item difficulty and discrimination indices


 Analyze items using difficulty and discrimination indices.
 Analyze distracters of multiple choice items
 Improve item qualities through response analysis
 Select items for different purposes
 Bank test items for future use.

3.2. Sections and sub-sections


Once a teacher has corrected and marked his/her students’ test papers, what do you think he/she should do
with them? Should he/she throw them away? Keep them? Or what?

Item analysis is an important phase in the construction of tests. It is the process of examining or
analyzing test takers’ responses to each item on a test with the basic intent of judging the quality of the item. Item
analysis helps to determine the adequacy of the items within a test as well as the adequacy of the test
itself. There are several reasons for analyzing questions and tests that students have completed and that
have already been graded. Some of the reasons that have been cited include the following:

1. Identify content that has not been adequately covered and should be re-taught,
2. Provide feedback to students,
3. Determine if any items need to be revised in the event they are to be used again or become
part of an item file or bank,

4. Identify items that may not have functioned as they were intended,
5. Direct the teacher's attention to individual student weaknesses.

The results of an item analysis provide information about the difficulty of the items and the ability of the
items to discriminate between better and poorer students. If an item is too easy, too difficult, failing to
show a difference between skilled and unskilled examinees, or even scored incorrectly, an item analysis
will reveal it. The two most common statistics reported in an item analysis are the item difficulty and the
item discrimination. An additional analysis that is often reported is the distractor analysis. Once the item
analysis information is available, an item review is often conducted. In the following sections you are
going to learn the statistical techniques used to analyse responses to test items.

3.2.1. Item difficulty level index


How difficult do you think a test should be? How do we determine the difficulty level of test items? Why
is it important to know the difficulty level of test items? Please think over these questions and share your
ideas with your friend.

Item difficulty index is one of the most useful, and most frequently reported, item analysis statistics. It is
a measure of the proportion of examinees who answered the item correctly; for this reason it is frequently
called the p-value. If scores from all students in a group are included the difficulty index is simply the
total percent correct. When there is a sufficient number of scores available (i.e., 100 or more) difficulty
indexes are calculated using scores from the top and bottom 27 percent of the group.

Item analysis procedures

1. Rank the papers in order from the highest to the lowest score
2. Select one-third of the papers with the highest total score and another one-third of the papers with
lowest total scores
3. For each test item, tabulate the number of students in the upper & lower groups who selected each
option
4. Compute the difficulty of each item (the percentage of students who got the item right)
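
Steps 1 to 3 of this procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a standard tool: the function names `split_groups` and `tabulate`, the student identifiers and all the scores and responses below are made up for the example. The default fraction of one-third follows step 2.

```python
def split_groups(total_scores, fraction=1 / 3):
    """Rank students by total score and return the (upper, lower) groups.

    total_scores maps a student identifier to that student's total score.
    """
    ranked = sorted(total_scores, key=total_scores.get, reverse=True)
    size = max(1, round(len(ranked) * fraction))
    return ranked[:size], ranked[-size:]


def tabulate(responses, group, item):
    """Count how many students in `group` selected each option on `item`."""
    counts = {}
    for student in group:
        choice = responses[student][item]
        counts[choice] = counts.get(choice, 0) + 1
    return counts


# Made-up total scores and responses to item 1 for six students.
scores = {"s1": 18, "s2": 9, "s3": 15, "s4": 12, "s5": 20, "s6": 7}
responses = {
    "s1": {1: "D"}, "s2": {1: "C"}, "s3": {1: "D"},
    "s4": {1: "B"}, "s5": {1: "D"}, "s6": {1: "D"},
}

upper, lower = split_groups(scores)
print(upper, lower)                    # ['s5', 's1'] ['s2', 's6']
print(tabulate(responses, upper, 1))   # {'D': 2}
print(tabulate(responses, lower, 1))   # {'C': 1, 'D': 1}
```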

Item difficulty index can be calculated using the following formula:

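P = (HSG + LSG) / N

Where, HSG = the number of students in the High Scoring Group who answered the item correctly

LSG = the number of students in the Low Scoring Group who answered the item correctly

N = the total number of students in the HSG and LSG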

The difficulty indexes can range between 0.0 and 1.0 and are usually expressed as a percentage. A higher
value indicates that a greater proportion of examinees responded to the item correctly, and it was thus an
easier item. The average difficulty of a test is the average of the individual item difficulties. For
maximum discrimination among students, an average difficulty of .60 is ideal. For example: If 243
students answered item no. 1 correctly and 9 students answered incorrectly, the difficulty level of the item
would be 243/252 or .96.

In the example below, five true-false questions were part of a larger test administered to a class of 20
students. For each question, the number of students answering correctly was determined, and then
converted to the percentage of students answering correctly.

Question   Correct responses   Item difficulty
1          15                  75% (15/20)
2          17                  85% (17/20)
3          6                   30% (6/20)
4          13                  65% (13/20)
5          20                  100% (20/20)
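
The same calculation can be written as a short Python sketch. The function name `difficulty_index` is a hypothetical choice for this illustration. It reproduces the figures for the five true-false questions and the earlier example of 243 correct answers out of 252.

```python
def difficulty_index(num_correct, num_examinees):
    """Proportion of examinees who answered the item correctly (the p-value)."""
    if num_examinees <= 0:
        raise ValueError("num_examinees must be positive")
    return num_correct / num_examinees


# Number of correct responses per question in the class of 20 students.
correct_counts = {1: 15, 2: 17, 3: 6, 4: 13, 5: 20}
class_size = 20

p_values = {question: difficulty_index(count, class_size)
            for question, count in correct_counts.items()}

for question, p in p_values.items():
    print(f"Question {question}: p = {p:.2f}")

# The earlier example: 243 of 252 students answered correctly.
print(round(difficulty_index(243, 252), 2))  # 0.96
```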

Activity: Calculate the item difficulty level for the following four-option multiple choice test item. (The
asterisk (*) marks the correct answer.)

                     Response Options
Groups          A     B     C     D*    Total
High Scorers    0     1     1     8     10
Low Scorers     1     1     5     3     10
Total           1     2     6     11    20

Item difficulty interpretation

P-Value                  Percent Range   Interpretation
0.75 or above            75-100          Easy
Between 0.25 and 0.75    26-74           Average
0.25 or below            0-25            Difficult
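
The interpretation table can be expressed as a small decision rule. The Python sketch below is only an illustration; the function name `interpret_difficulty` is hypothetical, and the cut-off points are the ones given in the table.

```python
def interpret_difficulty(p):
    """Label a p-value using the cut-off points in the interpretation table."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p >= 0.75:
        return "Easy"
    if p <= 0.25:
        return "Difficult"
    return "Average"


for p in (0.96, 0.65, 0.30, 0.25):
    print(p, interpret_difficulty(p))
```

With these cut-offs, an item with a p-value of 0.96 is labelled easy, while items with p-values of 0.65 and 0.30 are both labelled average.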

For criterion-referenced tests (CRTs), with their emphasis on mastery-testing, many items on an exam
form will have p-values of .9 or above. Norm-referenced tests (NRTs), on the other hand, are designed to
be harder overall and to spread out the examinees’ scores. Thus, many of the items on an NRT will have
difficulty indexes between .4 and .6.

3.2.2. Item discrimination index


To what extent do you think a test item should discriminate between higher achievers and lower
achievers? Should it be highly discriminating, averagely discriminating, or less discriminating? What are
your reasons?

The index of discrimination is a numerical indicator that enables us to determine whether the question
discriminates appropriately between lower scoring and higher scoring students. When students who earn
high scores are compared with those who earn low scores, we would expect to find more students in the
high scoring group answering a question correctly than students from the low scoring group. In the case
of very difficult items which no one in either group answered correctly or fairly easy questions which
even the students in the low group answered correctly, the numbers of correct answers might be equal for
the two groups. What we would not expect to find is a case in which the low scoring students answered
correctly more frequently than students in the high group.

Item discrimination index can be calculated using the following formula:

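$$D = \frac{HSG - LSG}{\tfrac{1}{2}N}$$

Where, HSG = number of students in the High Scoring Group who answered the item correctly

LSG = number of students in the Low Scoring Group who answered the item correctly

N = total number of students in both groups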

In the example below, there are 8 students in the high scoring group and 8 in the low scoring group (with
12 between the two groups which are not represented). For question 1, all 8 in the high scoring group
answered correctly, while only 4 in the low scoring group did so. Thus success in the HSG – success in
the LSG is (8 - 4) = +4. The last step is to divide the +4 by half of the total number of students in both groups (16 / 2 = 8).

Thus, 4/8 gives us +.5, which is the D-value.

Question Success in the HSG Success in the LSG Difference D value

1 8 4 8–4=4 .5

2 7 2

3 5 6

Activity 2: Calculate the item discrimination index for the questions 2 & 3 on the table above.

The item discrimination index can vary from -1.00 to +1.00. A negative discrimination index (between -
1.00 and zero) results when more students in the low group answered correctly than students in the high
group. A discrimination index of zero means equal numbers of high and low students answered correctly,
so the item did not discriminate between groups. A positive index occurs when more students in the high
group answer correctly than the low group. If the students in the class are fairly homogeneous in ability
and achievement, their test performance is also likely to be similar, resulting in little discrimination
between high and low groups.

Questions that have an item difficulty index (NOT item discrimination) of 1.00 or 0.00 need not be
included when calculating item discrimination indices. An item difficulty of 1.00 indicates that everyone
answered correctly, while 0.00 means no one answered correctly. We already know that neither type of
item discriminates between students.

When computing the discrimination index, the scores are divided into three groups with the top 27% of
the scores in the upper group and the bottom 27% in the lower group. The number of correct responses
for an item by the lower group is subtracted from the number of correct responses for the item in the
upper group. The difference is divided by the number of students in either group. The process is repeated
for each item.
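The procedure just described can be sketched as follows. The function names, the rounding of the 27% group size to a whole number of students, and the list of 28 total scores are assumptions made for illustration; only the counts for question 1 (8 correct in the high group, 4 in the low group, 8 students per group) come from the example above.

```python
def split_groups(total_scores, fraction=0.27):
    """Return (upper, lower) lists of student indices ranked by total score."""
    ranked = sorted(range(len(total_scores)),
                    key=lambda i: total_scores[i], reverse=True)
    # Group size: 27% of the class, rounded to a whole number of students.
    k = max(1, round(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]


def discrimination_index(high_correct, low_correct, group_size):
    """D = (correct in upper group - correct in lower group) / students per group."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return (high_correct - low_correct) / group_size


# Hypothetical total scores for a class of 28 students.
total_scores = list(range(40, 68))
upper, lower = split_groups(total_scores)
print(len(upper), len(lower))  # size of each group

# Question 1 from the worked example: 8 correct in the HSG, 4 in the LSG.
print(discrimination_index(8, 4, len(upper)))
```

With 28 students, 27% rounds to 8 per group, which matches the example of 8 high scorers, 8 low scorers and 12 students in between.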

The value is interpreted in terms of both:

• direction (positive or negative) and


• strength (non-discriminating to strongly-discriminating).
These values can range from -1.00 to +1.00.

Item discrimination interpretation

D-Value Direction Strength

> +.40 positive strong

+.20 to +.40 positive moderate

-.20 to +.20 none ---

< -.20 negative moderate to strong

For a small group of students, an index of discrimination for an item that exceeds .20 is considered
satisfactory. For larger groups, the index should be higher because more difference between groups would
be expected. The guidelines for an acceptable level of discrimination depend upon item difficulty. For
very easy or very difficult items, low discrimination levels would be expected; most students, regardless
of ability, would get the item correct or incorrect as the case may be. For items with a difficulty level of
about 70 percent, the discrimination should be at least .30.

When an item is discriminating negatively, overall the most knowledgeable examinees are getting the
item wrong and the least knowledgeable examinees are getting the item right. A negative discrimination
index may indicate that the item is measuring something other than what the rest of the test is measuring.
More often, it is a sign that the item has been mis-keyed.

3.2.3. Distractor Analysis


One important element in the quality of a multiple choice item is the quality of the item’s distractors.
However, neither the item difficulty nor the item discrimination index considers the performance of the
incorrect response options, or distractors. A distractor analysis evaluates the effectiveness of the
distractors in each item by comparing the number of students in the upper and lower groups who selected
each incorrect alternative (a good distractor will attract more students from the lower group than the upper
group).

Just as the key, or correct response option, must be definitively correct, the distractors must be clearly
incorrect (or clearly not the "best" option). In addition to being clearly incorrect, the distractors must also
be plausible. That is, the distractors should seem likely or reasonable to an examinee who is not
sufficiently knowledgeable in the content area.

If a distractor appears so unlikely that almost no examinee will select it, it is not contributing to the
performance of the item. In fact, the presence of one or more implausible distractors in a multiple choice
item can make the item artificially far easier than it ought to be. Let us try to explain this using the
following table as an example that shows the responses of eight students to five multiple-choice
questions.

A B C D
TEST ITEM NO 1 5** 1 1 1
TEST ITEM NO 2 0 2 6** 0
TEST ITEM NO 3 2** 2 2 2
TEST ITEM NO 4 0 3** 0 5
TEST ITEM NO 5 2 1 0 5**
** Denotes Correct Answer

Over 50% of the students answered question number 1 correctly, and each of the distractors was selected.
The distractors have functioned as they should. The teacher may be less than satisfied with only 5 of 8
students answering correctly, but a class would generally have more than eight students and could well
have a higher percentage of correct answers while still having effective distractors.

It is not desirable to have one of the distractors chosen more often than the correct answer, as occurred
with question 4. This result indicates a potential problem with the question. Distractor D may be too
similar to the correct answer and/or there may be something in either the stem or the alternatives that is
misleading.

If students do not know the correct answer and are purely guessing, their answers would be expected to be
distributed among the distractors as well as the correct answer, much like question 3. If one or more
distractors are not chosen, as occurs in questions 2, 4, and 5, the unselected distractors probably are not
plausible. If the teacher wants to make the test more difficult, those distractors should be replaced in
subsequent tests.

In a simple approach to distractor analysis, the proportion of examinees who selected each of the response
options is examined. The proportion of examinees who select each of the distractors can be very
informative. For example, it can reveal an item mis-key. Whenever the proportion of examinees who
selected a distractor is greater than the proportion of examinees who selected the key, the item should be
examined to determine if it has been mis-keyed or double-keyed. A distractor analysis can also reveal an
implausible distractor. In criterion referenced tests, where the item p-values are typically high, the
proportions of examinees selecting all the distractors are, as a result, low. Nevertheless, if examinees
consistently fail to select a given distractor, this may be evidence that the distractor is implausible or
simply too easy.
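The simple approach described above can be sketched in Python using the eight-student table from earlier in this section. The function name `analyse_distractors` and the two flags it returns are assumptions made for illustration; the option counts and keys are those given in the table.

```python
# Option counts per item, from the eight-student table in this section.
responses = {
    1: {"A": 5, "B": 1, "C": 1, "D": 1},
    2: {"A": 0, "B": 2, "C": 6, "D": 0},
    3: {"A": 2, "B": 2, "C": 2, "D": 2},
    4: {"A": 0, "B": 3, "C": 0, "D": 5},
    5: {"A": 2, "B": 1, "C": 0, "D": 5},
}
keys = {1: "A", 2: "C", 3: "A", 4: "B", 5: "D"}


def analyse_distractors(counts, key):
    """Return option proportions, unselected distractors, and distractors
    chosen more often than the key."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses recorded for this item")
    proportions = {opt: n / total for opt, n in counts.items()}
    unused = [opt for opt, n in counts.items() if opt != key and n == 0]
    stronger_than_key = [opt for opt, n in counts.items()
                         if opt != key and n > counts[key]]
    return proportions, unused, stronger_than_key


for item, counts in responses.items():
    proportions, unused, stronger = analyse_distractors(counts, keys[item])
    print(f"Item {item}: unused distractors {unused}, "
          f"chosen more often than the key {stronger}")
```

For item 4 the sketch flags distractor D as chosen more often than the key, and A and C as unselected, which agrees with the discussion of that question above.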

Project Work

In the school where you are placed for your Practicum activities, take corrected exam papers of 1 section
from the cooperating teacher and by taking 10 multiple choice questions:

1. calculate the difficulty level of each item


2. calculate the discrimination power of each item
3. analyze the plausibility of the distractors
Present your work in the form of a report.

3.2.4 Item Banking
Building a file of effective test items and assessment tasks involves recording the items or tasks, adding
information from analyses of students' responses, and filing the records by both the content area and the
objective that the item or task measures. Thus, items and tasks are recorded on records as they are
constructed; information from analysis of students' responses is added after the items and tasks have been
used, and then the effective items and tasks are deposited in the file. In a few years, it is possible to start
using some of the items and tasks from the file and supplement these with new items and tasks. As the file
grows, it becomes possible to select the majority of the items and tasks from the file for any given test or
assessment without repeating them frequently. Such a file is especially valuable in areas of complex
achievement, when the construction of test items and assessment tasks is difficult and time consuming.
When enough high-quality items and tasks have been assembled, the burden of preparing tests and
assessments is considerably lightened. Computer item banking makes tasks even easier.
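As a rough illustration of computer item banking, the record-keeping described above could be modelled as below. Everything here is hypothetical: the class `ItemRecord`, its field names, the helper `find_items` and the sample entries are assumptions for the sketch, not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class ItemRecord:
    """One entry in a hypothetical item bank (all field names are illustrative)."""
    item_id: str
    content_area: str
    objective: str
    text: str
    analyses: list = field(default_factory=list)  # one dict per administration

    def add_analysis(self, difficulty, discrimination):
        """Store the item statistics obtained after the item has been used."""
        self.analyses.append(
            {"difficulty": difficulty, "discrimination": discrimination}
        )


def find_items(bank, content_area, objective):
    """Return the records filed under the given content area and objective."""
    return [r for r in bank
            if r.content_area == content_area and r.objective == objective]


bank = [
    ItemRecord("Q001", "Item analysis", "Compute item difficulty", "Hypothetical item text 1"),
    ItemRecord("Q002", "Item analysis", "Interpret discrimination", "Hypothetical item text 2"),
]
bank[0].add_analysis(difficulty=0.65, discrimination=0.50)

selected = find_items(bank, "Item analysis", "Compute item difficulty")
print([r.item_id for r in selected])
```

Filing each record by content area and objective, as the text recommends, is what makes the later selection step a simple filter.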

Summary

In this unit you learned how to judge the quality of a classroom test by carrying out item analysis, which is
the process of “testing the item” to ascertain specifically whether the item is functioning properly in
measuring what the entire test is measuring. You also learned about the process of item analysis and how
to compute item difficulty and item discriminating power and how to evaluate the effectiveness of distractors. You
have learned that item difficulty indicates the percentage of testees who get the item right; item
discriminating power is an index which indicates how well an item is able to distinguish between the high
achievers and low achievers given what the test is measuring; and the distraction power of a distractor is
its ability to differentiate between those who do not know and those who know what the item is
measuring.

Finally you learned that after conducting item analysis, items may still be usable, after modest changes
are made to improve their performance on future exams. Thus, good test items should be kept in test item
banks and in this unit you were given highlights on how to build a Test Item File/Item Bank.

Self-check Exercises

1. What is the purpose of test item analysis?


2. What is item difficulty index?
3. What is the power of item discrimination?
4. What is the basic intent of distractor analysis?
5. From the data presented below (where alternative A is the correct answer), compute the
difficulty level and the discrimination power and comment on the effectiveness of the
distractors.

Alternatives

Group A* B C D

Upper 28 14 5 6 3

Lower 28 7 15 1 5

6. What information should be included with the test item we put in our item bank?

Reading Materials

Abay Tekle (1982). Evaluation in Education (part one). Bahir Dar Teachers College
(Unpublished teaching material)

Bigge, J.L. and Colleen Shea Stump (1999). Curriculum, Assessment, and Instruction. Boston:
Wadsworth Publishing Company.

EQUIP (2008). Reader on Student Assessment. Addis Ababa.

Unit 4: Interpretation Of Scores

Introduction
In unit three you learned how to analyse test items in order to determine the quality of each test item and of the
overall test in general. You also learned about how to build your test item banks. In this unit you are
going to be familiarized with the idea of test score interpretation and the major statistical techniques that
can be used to interpret test scores. Particularly, you will learn about the methods of interpreting test
scores, measures of central tendency, measures of dispersion or variability, measures of relative position,
and measures of relationship or association.

Learning Outcomes

Upon completion of this unit, you should be able to:

• List techniques of interpreting scores.
• Apply measures of central tendency, variability, relative position and relationship in interpreting scores.
• Select appropriate technique(s) of interpreting test scores.
• Develop reports on the implications of scores.
• Propose appropriate decisions based on score interpretation.

Sections and Sub-sections

Imagine that you receive a grade of 60 for a midterm exam in one of your university classes.
What does the score mean, and how should we interpret it?

Test interpretation is a process of assigning meaning and usefulness to the scores obtained from
classroom test. This is necessary because the raw score obtained from a test standing on itself rarely has
meaning. For instance, a score of 60% in one Assessment and evaluation of learning test cannot be said to
be better than a score of 50% obtained by the same test taker in another test of the same subject. The test
scores on their own lack a true zero point and equal units. Moreover, they are not based on the same
standard of measurement and as such meaning cannot be read into the scores on the basis of which
academic and psychological decisions may be taken.

4.1 Kinds of scores
Data differ in terms of what properties of the real number series (order, distance, or origin) we can
attribute to the scores. The most common kinds of scores include nominal, ordinal, interval, and ratio
scales.

A nominal scale involves the assignment of different numerals to categories that are qualitatively
different. For example, we may assign the numeral 1 for males and 2 for females. These symbols do not
have any of the three characteristics (order, distance, or origin) we attribute to the real number series. The
2 does not indicate more of something than the 1.

An ordinal scale has the order property of a real number series and gives an indication of rank order. For
example, ranking students based on their performance on a certain athletic event would involve an ordinal
scale. We know who is best, second best, third best, etc. But the ranks do not tell us anything about the
difference between the scores.

With interval data we can interpret the distances between scores. If, on a test with interval data, Almaz
has a score of 60, Abebe a score of 50, and Beshadu a score of 30, we could say that the distance between
Abebe’s and Beshadu’s scores (50 to 30) is twice the distance between Almaz’s and Abebe’s scores (60
to 50).

If one measures with a ratio scale, the ratio of the scores has meaning. Thus, a person whose height is 2
meters is twice as tall as a person whose height is 1 meter. We can make this statement because a
measurement of 0 actually indicates no height. That is, there is a meaningful zero point. However, if a
student scored 0 on a spelling test, we would not interpret the score to mean that the student had no
spelling ability.

4.2 Methods of Interpreting test scores


If a student responds correctly to 65 items on an objective test in which each correct item counts one point,
the raw score will be 65. Thus a raw score is simply the number of points received on a test when the test
has been scored according to the directions. We all are familiar with raw scores from our many years of
taking classroom tests. Although a raw score is a numerical summary of a student’s test performance, it is
not meaningful without further information. In general we can provide meaning to a raw score either by
converting it into a description of the specific tasks the student can perform (criterion referenced
interpretation) or converting it into some type of derived score that indicates the student’s relative
position in a clearly defined referenced group (norm referenced interpretation). In some cases both types
of interpretation may be appropriate and useful.

Criterion referenced interpretation

Criterion - referenced interpretation is the interpretation of test raw score based on the conversion of the
raw score into a description of the specific tasks that the learner can perform. That is, a score is given
meaning by comparing it with the standard of performance that is set before the test is given. It permits
the description of a learner’s test performance without referring to the performance of others. Thus, we
might describe a pupil’s performance in terms of the speed with which a task is performed, the precision
with which a task is performed, or the percentage of items correct on some clearly defined set of learning
tasks. The percentage-correct score is widely used in criterion-referenced test interpretation.

Criterion referenced interpretation of test results is most meaningful when the test has been specifically
designed for this purpose. This typically involves designing a test that measures a set of clearly stated
learning tasks. Enough items are used for each interpretation to make it possible to describe test
performance in terms of students’ mastery or non-mastery of learning tasks.

Norm referenced test interpretation

Norm – referenced interpretation is the interpretation of raw score based on the conversion of the raw
score into some type of derived score that indicates the learner’s relative position in a clearly defined
referenced group. This type of interpretation tells us how an individual compares with other persons who
have taken the same test.

Norm – referenced interpretation is usually used in classroom test interpretation by ranking the test
takers’ raw scores from highest to lowest. It is then interpreted by noting the position of an
individual’s score relative to that of other test takers in the classroom test. An interpretation such as third
position from the highest or about average position in the class provides a meaningful report for the
teacher and the test takers on which to base decisions. In this type of test score interpretation, what is
important is a sufficient spread of test scores to provide reliable ranking. The percentage score or the
relative ease or difficulty of the test is not necessarily important in the interpretation of test scores in
terms of relative performance.

4.2.1 Measures of Central Tendency
It is often important to summarize characteristics of a distribution of test scores. One
characteristic of particular interest is a measure of central tendency. The goal of the measures of
central tendency is to come up with the one single score that best describes a distribution of
scores. They let us know if the distribution of scores tends to be composed of high scores or low
scores.
There are three basic measures of central tendency – the mean, the mode and the median - and
choosing one over another depends on two different things:
1. The scale of measurement used, so that a summary makes sense given the nature of the
scores.
2. The shape of the frequency distribution, so that the measure accurately summarizes the
distribution.

The Mean

The mean, or arithmetic average, is the most widely used measure of central tendency. It is the average of
a set of scores computed simply by adding together all scores and dividing by the number of scores. The
mean takes into account the value of each score, and so one extremely high or low score could have a
considerable effect on it. It is helpful to know the mean because then you can see which numbers are
above and below the mean.

Here is an example of test scores for a Maths class: 82, 93, 86, 97, 82. To find the mean, first you must
add up all of the numbers (82 + 93 + 86 + 97 + 82 = 440). Now, since there are 5 test scores, we next
divide the sum by 5 (440 ÷ 5 = 88). Thus, the mean is 88. The formula used to compute the mean is as
follows:

$$\bar{X} = \frac{\sum X}{N}$$

Where, $\bar{X}$ = Mean

∑ = the sum of
X = any score
N = Number of scores
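The computation of the mean for the five scores above can be checked with a short Python sketch. The function name `mean_score` is an assumption made for the example.

```python
def mean_score(scores):
    """Arithmetic mean: the sum of the scores divided by the number of scores."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)


scores = [82, 93, 86, 97, 82]
print(sum(scores))         # 440
print(mean_score(scores))  # 88.0
```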

The Median
In some circumstances, the mean may not be the best indicator of student performance. If there
are one or a few students who score considerably lower (or higher) than the other students, their
scores tend to pull the mean in their direction. In this case the median is usually considered a
better indicator of student performance. There are also some types of scores that are reported for
standardized tests for which the mean is not appropriate (percentile scores), so the median is
used.
The median is a counting average. It is the number that divides a distribution of scores exactly in
half. It is determined by arranging the scores in order of size and counting up to (or down to) the
midpoint of the set of scores. The median will usually be around where most scores fall. When the
number of scores is odd, the median is the middle score. If the number of scores is even, the
median will be halfway between the two middle most scores. In this case the median is not an
actual score earned by one of the students.

Example 1    Example 2    Example 3    Example 4
Scores       Scores       Scores       Scores

50           50           49           50
48           49           48           49
48           48           48           47
47           46           47           47
45           46           45           45
44           43           44           45
43           43           43           45
42           42           42           44
42           41           42           42
41           41           41           41
                          38           41

In example 1, our line would be between 44 and 45, so the median would be halfway between them at
44.5. In this case the median is not an actual score earned by one of the students. In example 2, the
distance between the two middle scores (43 and 46) is more than one, so we again find the point halfway
between them for our median of 44.5. If the number of students is odd, the median is the one score
that is the middle score in the frequency distribution, having equal numbers of scores above and below it.
Thus, the median is 44 in example 3, and 45 in example 4. It does not matter if more than one student
earns that score, as in example 4.
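The counting procedure for the median can be sketched as follows, using the four example distributions above. The function name `median_score` and the variable names are assumptions made for illustration.

```python
def median_score(scores):
    """Middle score of the ordered list, or the midpoint of the two middle scores."""
    if not scores:
        raise ValueError("at least one score is required")
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


example_1 = [50, 48, 48, 47, 45, 44, 43, 42, 42, 41]
example_2 = [50, 49, 48, 46, 46, 43, 43, 42, 41, 41]
example_3 = [49, 48, 48, 47, 45, 44, 43, 42, 42, 41, 38]
example_4 = [50, 49, 47, 47, 45, 45, 45, 44, 42, 41, 41]

for name, data in [("Example 1", example_1), ("Example 2", example_2),
                   ("Example 3", example_3), ("Example 4", example_4)]:
    print(name, median_score(data))
```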

The Mode

This is the score (or scores) that occur most frequently and is determined by inspection. It is the
least reliable type of statistical average and is frequently used merely as a preliminary estimate of
central tendency. A set of scores may sometimes have two or more modes; such distributions are
called bimodal or multimodal respectively.

If the data is categorical (measured on the nominal scale), then only the mode can be calculated.
The mode can also be calculated with ordinal and higher data, but it often is not appropriate. If
other measures can be calculated, the mode would never be the first choice. For example, the
following test scores, 7, 7, 7, 20, 23, 23, 24, 25, 26 have a mode of 7, but obviously it doesn’t
make much sense. Remember, measures of central tendency look for the one number which best
describes all of the numbers.
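Finding the mode "by inspection" amounts to counting how often each score occurs. A minimal sketch is given below; the function name `modes` and the second, bimodal list of scores are assumptions made for the example, while the first list is the one used above.

```python
from collections import Counter


def modes(scores):
    """Return every score that occurs with the highest frequency."""
    if not scores:
        raise ValueError("at least one score is required")
    counts = Counter(scores)
    highest = max(counts.values())
    return sorted(score for score, count in counts.items() if count == highest)


print(modes([7, 7, 7, 20, 23, 23, 24, 25, 26]))  # [7]
print(modes([3, 3, 5, 5, 8]))                    # hypothetical bimodal set: [3, 5]
```

Returning a list rather than a single number keeps bimodal and multimodal sets visible instead of silently picking one mode.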

Shape of Distributions: Skewness

There is one important situation in which all three measures of central tendency are identical. This occurs
when a distribution is symmetrical, that is, when the right half of the distribution is the mirror image of
the left half of the distribution. In this case the mean will fall exactly at the middle of the distribution (the
median position) and the value at this central point will be the most frequently observed data value, the
mode. Note that the converse does not always hold: the mean, the mode and the median can be
identical in a distribution that is not perfectly symmetrical.

Figure 1: Shape of distribution of scores

To the extent that differences are observed among these three measures, the distribution is asymmetrical
or “skewed”. These include positively skewed distributions and negatively skewed distributions. In a
positively-skewed distribution (see figure 1 above) most of the scores concentrate at the low end of the
distribution. This might occur, for example, if the test was extremely difficult for the students. In a
negatively-skewed distribution, as shown in figure 1 above, the majority of scores are toward the high end
of the distribution. This could occur if we gave a test that was easy for most of the students.

Points to note
• With perfectly bell shaped distributions, the mean, median, and mode are identical.
• With positively skewed data, the mode is lowest, followed by the median and mean.
• With negatively skewed data, the mean is lowest, followed by the median and mode.
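The points above suggest a quick, rough check on the direction of skew: compare the mean with the median. The sketch below is only a rule of thumb under that assumption, not a formal skewness statistic; the function name and both score lists are hypothetical.

```python
from statistics import mean, median


def skew_direction(scores, tolerance=1e-9):
    """Rough label based only on comparing the mean with the median."""
    difference = mean(scores) - median(scores)
    if abs(difference) <= tolerance:
        return "approximately symmetrical"
    return "positively skewed" if difference > 0 else "negatively skewed"


hard_test = [10, 12, 12, 13, 15, 18, 40]  # hypothetical: most scores are low
easy_test = [60, 85, 90, 92, 93, 93, 95]  # hypothetical: most scores are high

print(skew_direction(hard_test))  # mean above median
print(skew_direction(easy_test))  # mean below median
```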

4.2.2 Measures of Variability/Dispersion


The measures of central tendency focus on what is typical, average or in the middle of a distribution. The
information provided by these measures is not sufficient to convey all we need to know about a
distribution. Knowing the mean, the median or the mode (or all of these) of a distribution does not allow
us to differentiate between distributions. We need additional information about the distributions.

A set of scores can be more adequately described if we know how much they spread out above and below
the measure of central tendency. For example, we might have two groups of students with a mean score
of 70, but in one group the span of scores is from 60 to 80 and in the other group the span is from 50 to
100. These represent quite different spreads of performance. We can identify such differences by numbers
that indicate how much scores spread out in a group. These are called measures of variability or
dispersion. The three most commonly used measures of variability are the range, the quartile deviation,
and the standard deviation.

The Range

It is the simplest and crudest measure of variability calculated by subtracting the lowest score from the
highest score. For example, if the score of 10 students in a certain test is: 5, 7, 8, 10, 12, 13, 14, 15, 17,
19, then the range will be 19 -5 = 14. The range provides a quick estimate of variability but is
undependable because it is based on the position of the two extreme scores. The addition or subtraction of
a single score can change the range significantly.

Inter quartile range

Inter quartile range (IQR) is another range measure, but this time it looks at the data in terms of quarters or
percentiles. IQR is the distance between the 25th and 75th percentiles, that is, between the first and third quartiles. The
ordered data are divided into four equal parts, or quarters, each containing 25% of the scores. IQR is the range of the middle 50%
of the data. Therefore, because it uses the middle 50%, it is not affected by outliers or extreme values.
The IQR is often used with skewed data as it is insensitive to the extreme scores.
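Both the range and the IQR can be sketched with Python's standard `statistics` module, using the ten scores from the range example. Note that there are several conventions for locating quartiles in a small data set, so the IQR obtained by hand may differ slightly from the one below; this sketch assumes the default ("exclusive") method of `statistics.quantiles`, and the function names are hypothetical.

```python
from statistics import quantiles


def score_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)


def interquartile_range(scores):
    """Third quartile minus first quartile (default 'exclusive' method)."""
    q1, _, q3 = quantiles(scores, n=4)
    return q3 - q1


scores = [5, 7, 8, 10, 12, 13, 14, 15, 17, 19]
print(score_range(scores))          # 14
print(interquartile_range(scores))  # depends on the quartile convention used
```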

The Standard Deviation

Let us say that two classes took a quiz. There were 10 students in each class, and each class had an
average score of 81.5. Since the averages are the same, can we assume that the students in both classes
have the same performance on the exam?

The answer is… No. The average (mean) does not tell us anything about the distribution or variation in
the grades. So, we need to come up with some way of measuring not just the average, but also the spread
of the distribution of our data.

The most useful measure of variability, or spread of scores, is the standard deviation. It is essentially an
average of the degree to which a set of scores deviates from the mean. If the Standard Deviation is large,
it means the numbers are spread out from their mean.
If the Standard Deviation is small, it means the numbers are close to their mean. Because it takes into
account the amount that each score deviates from the mean, it is a more stable measure of variability than
either the range or quartile deviation.

The procedure for calculating a standard deviation involves the following steps:

1. Compute the mean.
2. Subtract the mean from each individual’s score to obtain the deviation from the mean.
3. Square each of these deviations.
4. Find the sum of the squared deviations, ∑(X − X̄)².
5. Divide the sum obtained in step 4 by N, the number of students, to get the variance.
6. Find the square root of the result of step 5. This number is the standard deviation (SD) of the
scores.

Thus the formula for the standard deviation (SD) is:

$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$

Now let us take the previous scenario of two groups of students who took a Math quiz with a mean score
of 81.5 to calculate and compare their standard deviations. The individual scores of group A are: 72, 76,
80, 80, 81, 83, 84, 85, 85, & 89. The individual scores of group B are: 57, 63, 65, 71, 83, 93, 94, 95, 96, 98.
Let us start with group A. So, the first step to finding the Standard Deviation is to find all the distances
from the mean. This will be followed by squaring each distance, which will give us the following results.

Scores of Group A Distances from the Mean Distances squared

72 - 9.5 90.25
76 - 5.5 30.25
80 - 1.5 2.25
80 - 1.5 2.25
81 - 0.5 0.25
83 1.5 2.25
84 2.5 6.25
85 3.5 12.25
85 3.5 12.25
89 7.5 56.25

Then we add up all of the squared distances, which gives us 214.5. This is divided by the total
number of scores of the group, which results in 214.5 / 10 = 21.45. This is the variance of the data set.

Variance is the average squared deviation from the mean of a set of data. It is used to find the standard
deviation. Finally, we calculate the Square Root of the variance. This will give us 4.63, which is the
standard deviation.

Activity: Using the same procedures calculate the standard deviation for the scores of Group B.

I am sure you have come up with 15.1 as a standard deviation for the distribution of scores of group B.
Now, let’s compare the two groups of students again.

Group A Group B

Average on the Quiz 81.5 81.5

Standard Deviation 4.63 15.1
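The six-step procedure can be checked for both groups with the sketch below. It divides by N, as this unit does, rather than by N - 1; the function name `describe_spread` is an assumption made for the example.

```python
import math


def describe_spread(scores):
    """Return (mean, variance, standard deviation), dividing by N as in the text."""
    n = len(scores)
    if n == 0:
        raise ValueError("at least one score is required")
    mean = sum(scores) / n
    squared_deviations = [(x - mean) ** 2 for x in scores]
    variance = sum(squared_deviations) / n
    return mean, variance, math.sqrt(variance)


group_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
group_b = [57, 63, 65, 71, 83, 93, 94, 95, 96, 98]

for name, scores in [("Group A", group_a), ("Group B", group_b)]:
    mean, variance, sd = describe_spread(scores)
    print(f"{name}: mean = {mean:.1f}, variance = {variance:.2f}, SD = {sd:.2f}")
```

The standard library function `statistics.pstdev` uses the same divide-by-N definition and can serve as a cross-check.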

What is your interpretation of the test scores of the two groups based on their standard deviations?

Activity: The Math test scores of five students are: 92,88,80,68 and 52. Find the variance and standard
deviation.

The standard deviation, like other measures of variability, represents a distance. If we move the distance
equal to one SD above and below the mean, we will find that somewhere between 60% and 75% of the
scores fall in that region of most distributions of scores. In a normal distribution, 68% of the scores are
included between the mean minus one SD and the mean plus one SD.

Which measure of dispersion to use

The quartile deviation is used with the median and is satisfactory for analyzing a small number of scores.
Because these scores are obtained by counting and thus are not affected by the value of each score, they
are especially useful when one or more scores deviate markedly from the others in the set.

The standard deviation is used with the mean. It is the most reliable measure of variability, and is
especially useful in testing. In addition to describing the spread of scores in a group, it serves as a basis
for computing standard scores, the standard error of measurement, and other statistics used in analyzing
and interpreting test scores.

4.2.3. Measures of Relative Position


There are different ways to measure the relative position of scores. Suppose that you have scored 55 on a
test. What do you say about this score?

On the surface it might look bad but what if that was the highest in the class or if that score was better
than 80% of the class? This is what we mean by relative position.

Percentiles

A percentile is a score that indicates the rank of the student compared to others (same age or same grade),
using a hypothetical group of 100 students. It tells you what percentage of people you did better than. A
percentile of 25 (25th percentile), for example, indicates that the student's test performance equals or
exceeds 25 out of 100 students on the same measure. A percentile of 87 indicates that the student equals
or surpasses 87 out of 100 (or 87% of) students. A percentile must always refer to a student’s percentile
rank as relative to a particular norm group. If you scored at the 80th percentile, what does that mean?

Converting Data Value to Percentile

1. Arrange the data in ascending order

2. Count how many items are below your value. If for example your score is 85 and there are multiple
85’s then count how many are under the first 85.

For example, in the students’ scores of 76, 77, 80, 83, 85, 85, 85, 90, 96 ,97 there are 4 items below 85.

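$$\text{Percentile} = \frac{\text{number of items below your value} + 0.5}{\text{total number of values}} \times 100\%$$

So in our data example:

$$\text{Percentile} = \frac{4 + 0.5}{10} \times 100\% = 45$$

That is, a score of 85 is at the 45th percentile.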
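The two steps and the formula above can be sketched in Python as follows. This is only an illustration: the function name `percentile_rank` is a hypothetical choice, and the sketch counts only the scores strictly below the given value, as step 2 describes.

```python
def percentile_rank(scores, value):
    """Percentile of `value`: (items below + 0.5) / total number of values * 100."""
    ordered = sorted(scores)                      # step 1: arrange in ascending order
    below = sum(1 for s in ordered if s < value)  # step 2: count items strictly below the value
    return (below + 0.5) * 100 / len(ordered)


scores = [76, 77, 80, 83, 85, 85, 85, 90, 96, 97]
print(percentile_rank(scores, 85))  # 45.0, matching the worked example
```

Multiplying by 100 before dividing keeps the arithmetic exact for this example.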

Quartiles

Quartile is another term used in connection with percentiles. The total of 100% is broken into four equal
parts, with boundaries at 25%, 50%, 75%, and 100%.

• Lower quartile is the 25th percentile (0.25).
• Middle quartile (the median) is the 50th percentile (0.50).
• Upper quartile is the 75th percentile (0.75).
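As a rough illustration, the three quartiles of a set of scores can be obtained with Python's standard `statistics` module. Textbooks and software use slightly different conventions for interpolating between scores, so the lower and upper quartiles may differ a little from hand calculations; the choice of the "inclusive" method here is an assumption, not something this module prescribes.

```python
import statistics

scores = [76, 77, 80, 83, 85, 85, 85, 90, 96, 97]

# n=4 splits the distribution into four equal parts and returns the three cut points
lower_q, middle_q, upper_q = statistics.quantiles(scores, n=4, method="inclusive")

print("Lower quartile (25th percentile):", lower_q)
print("Middle quartile (50th percentile):", middle_q)
print("Upper quartile (75th percentile):", upper_q)
```

Whatever the convention, the middle quartile always coincides with the median of the scores.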

Standard Scores

Another method of indicating a pupil's relative position in a group is by showing how far the raw score is
above or below average. This is the approach used with standard scores. Basically, standard scores
express test performance in terms of standard deviation units from the mean. Standard scores are scores
that are based on mean and standard deviation.

Types of standard scores

Z Score: For data distributions that are approximately symmetric, a measure of relative position that is
often used is the z-score. z-score gives us an estimate as to how many standard deviations a particular
score lies from the mean.

We define the z-score as

$$z = \frac{X - \bar{X}}{s}$$

Where, X = the data value in question

$\bar{X}$ = the sample mean

s = the sample standard deviation

For instance, if a person scored a 70 on a test with a mean of 50 and a standard deviation of 10, then they
scored 2 standard deviations above the mean. So, a z score of 2 means the original score was 2 standard
deviations above the mean.

If the z-score is 0 then your data value is the mean

If the z-score > 0 (positive) then your data value is above the mean

If the z-score < 0 (negative) then your data value is below the mean.

Example. Almaz scored a 25 on her math test. Suppose the mean for this exam is 21, with a standard
deviation of 4. Dawit scored 60 on an English test which had a mean of 50 with a standard deviation of 5.
Who did relatively better?

Since standardized tests typically have score distributions which are approximately symmetric, we will
find the respective z-scores for Almaz and Dawit.

Almaz's z-score: (25 - 21) / 4 = 1

Dawit's z-score: (60 - 50) / 5 = 2

Since Dawit had a higher z-score, we say Dawit did relatively better.
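The comparison between Almaz and Dawit can be checked with a short Python sketch. The function name `z_score` is a hypothetical choice made for this illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


almaz = z_score(25, mean=21, sd=4)  # math test
dawit = z_score(60, mean=50, sd=5)  # English test

# The student with the higher z-score did relatively better
better = "Dawit" if dawit > almaz else "Almaz"
print(almaz, dawit, better)  # 1.0 2.0 Dawit
```

Because each z-score is expressed in standard deviation units of its own test, the two results can be compared even though the tests have different means and spreads.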

T Scores: This refers to any set of normally distributed standard scores that has a mean score of 50 and a
standard deviation of 10. The T-score is obtained by multiplying the z-score by 10 and adding the
product to 50. That is, T-score = 50 + 10(z). A T-score of 60 is one standard deviation above the mean,
while a T-score of 30 is two standard deviations below the mean.

Example

A test has a mean score of 40 and a standard deviation of 4. What are the T – scores of two test takers
who obtained raw scores of 30 and 45 respectively in the test?

Solution

The first step in finding the T-scores is to obtain the z-scores for the test takers. The z-scores would then
be converted to the T – scores. In the example above, the z – scores are:

For the test taker with raw score of 30, the z-score is:

$$z = \frac{X - M}{SD}$$

where the symbols retain their usual meanings.

X = 30, M = 40, SD = 4.

Thus,

$$z = \frac{30 - 40}{4} = \frac{-10}{4} = -2.5$$

The T-score is then obtained by converting the z-score (-2.5) to a T-score. Thus:

$$T = 50 + 10z = 50 + 10(-2.5) = 50 - 25 = 25$$
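The two-step conversion (raw score to z-score, then z-score to T-score) can be sketched in Python as below. The function name `t_score` is a hypothetical choice for this illustration; only the first test taker from the worked example is reproduced here.

```python
def t_score(raw, mean, sd):
    """Convert a raw score to a T-score: T = 50 + 10 * z."""
    z = (raw - mean) / sd  # step 1: z-score
    return 50 + 10 * z     # step 2: rescale to mean 50, standard deviation 10


print(t_score(30, mean=40, sd=4))  # 25.0, as in the worked example
```

A raw score equal to the mean always converts to a T-score of 50, which gives a quick check on any calculation.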

Activity: Following the same procedure, find the T-score for the second test taker, whose raw score is 45.

4.2.4 Measures of Relationship
If we have two sets of scores from the same group of people, it is often desirable to know the degree to
which the scores are related. For example, we may be interested in the relationship between students' test
scores in English and their overall scores in other subjects. The degree of relationship is
expressed in terms of a coefficient of correlation. The value ranges from -1.00 to +1.00. A perfect positive
correlation is indicated by a coefficient of +1.00 and a perfect negative correlation by a coefficient of
-1.00. A correlation of .00 indicates no relationship between the two sets of scores. The larger the
absolute value of the coefficient (positive or negative), the higher the degree of relationship expressed.

There are several different measures of relationship expressed as correlation coefficients. One of these is
the product-moment correlation coefficient, which is by far the most commonly used and most useful
correlation coefficient. It is indicated by the symbol r.

The formula for obtaining the coefficient of correlation is:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N \, S_x S_y}$$

Where, X = score of person on one variable

Y = score of same person on the other variable

$\bar{X}$ = mean of the X distribution

$\bar{Y}$ = mean of the Y distribution

$S_x$ = standard deviation of the X scores

$S_y$ = standard deviation of the Y scores

N = number of pairs of scores
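As an illustration, the product-moment coefficient can be computed from the quantities listed above: the sum of the products of the paired deviations from the two means, divided by N times the two standard deviations. The Python sketch below assumes that the standard deviations are computed by dividing by N (not N - 1), which is what this form of the formula requires. The function name `pearson_r` and the two sets of scores are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation: sum of (X - mean_x)(Y - mean_y) over N * Sx * Sy."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Standard deviations computed with N in the denominator (an assumption of this sketch)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n * s_x * s_y)


# Hypothetical scores of five students in English and in their other subjects
english = [40, 50, 60, 70, 80]
overall = [45, 55, 60, 75, 85]

r = pearson_r(english, overall)
print(round(r, 2))
```

A value of r close to +1.00 for these made-up scores would indicate that students who scored high in English also tended to score high overall.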

Project work: In groups of five, take the roster of one cooperating teacher of the school where you are
placed for your practicum experience and do the following tasks:
a) Calculate the average marks of the students of the section by taking five subjects
b) Based on the calculated averages, find the mode, the median, the range, the inter-quartile
range, and the standard deviation
c) Find the average scores that lie at the 25th, 50th, and 75th percentiles
d) Take the scores of two subjects and calculate the coefficient of correlation
You have to prepare a report of your work and submit it for correction.

Unit Summary

In this unit you learned that test interpretation is a process of assigning meaning and usefulness to the
scores obtained from classroom tests, and you were introduced to how to interpret test scores. This includes
criterion-referenced and norm-referenced interpretation. Criterion-referenced interpretation is the
interpretation of a raw test score based on the conversion of the raw score into a description of the
specific tasks that the learner can perform. Norm-referenced interpretation is the interpretation of a raw
score based on the conversion of the raw score into some type of derived score that indicates the learner's
relative position in a clearly defined reference group.

This unit also introduced you to different statistical techniques that are useful in interpreting test
scores. These are classified into measures of central tendency, measures of dispersion, measures of
relative position and measures of association or relationship. The measures of central tendency
help us to come up with the one single score that best describes a distribution of scores. The most
commonly used measures of central tendency are the mean, the mode and the median. The measures of
dispersion tell us how much the scores spread out above and below the measure of central tendency as
well as how much they are spread out from one another. These measures include the range, the
inter-quartile range and the standard deviation. The measures of relative position are techniques that show
us the relative standing of individual scores within a certain set of scores. Measures that are used here
include percentile ranks, quartiles, and standard scores such as z-scores and T-scores. The
measures of relationship or association help us to know the degree to which sets of scores are related. The
most commonly used measure of relationship is the product-moment correlation coefficient.

Self-Check Exercises

1. What is test interpretation and why is it necessary to interpret classroom tests?


2. Highlight the major difference between criterion-referenced interpretation and norm-referenced
interpretation of test scores.
3. Suppose a student has taken two quizzes in a statistics course. On the first quiz the mean
score was 32, the standard deviation was 8, and the student received a 44. The student
obtained a 28 on the second quiz, for which the mean was 23 and the standard deviation was
3. If test scores are approximately normal, on which quiz did the student perform better
relative to the rest of the class?

4. You have given an exam to your students. Scores on this exam are normally distributed with
mean = 40 and standard deviation = 6.
a) What score would a student need to be in the top 15%?
b) What score represents the 45th percentile?
c) If 200 students took the exam, how many would you expect to score below 30?

Reading Materials

Bluman, A. G. (1998). Elementary Statistics: Step by Step Approach. Boston: McGraw-Hill.

Cohen, Louis and M. Holliday ( ). Statistics for Education and Physical Education. New York:
Harper and Row Publishers.

Hinkle, D. E. et al. (1994). Applied Statistics for the Behavioral Sciences. Boston: Houghton
Mifflin Company.

McClave, J. T. and Terry Sincich (2003). Statistics (9th ed.). New Jersey: Prentice Hall.

UNIT 5: Ethical Standards of assessment

Introduction

In the previous units you have learned about the different assessment-related concepts, different
strategies and techniques of assessing students' learning, as well as methods of maintaining the
quality of tests. In this unit you will be introduced to ethics as a mechanism of maintaining
quality in our assessment practice. You will be familiarized with some basic standards that
professional teachers are expected to meet in order to be ethical in their assessment practices.
You will also be familiarized with some general considerations in addressing diversity in the
classroom so as to make the assessment procedures accessible and free of bias.

Learning Outcomes

Upon completion of this unit, you should be able to:

• List ethical and professional standards of assessment

• Propose contextualized ethical and professional standards in using assessment

• Explain the consequences of unethical use of assessments

• Adhere to the ethical standards of tests and test uses

5.2. Sections and sub-sections

5.2.1 Ethical and Professional Standards of Assessment and its Use


Ethical standards guide teachers in fulfilling their obligation to provide and use tests that are fair
to all test takers regardless of age, gender, disability, ethnicity, religion, linguistic background, or
other personal characteristics.

Fairness is a primary consideration in all aspects of testing. It:

• helps to ensure that all test takers are given a comparable opportunity to demonstrate
what they know and how they can perform in the area being tested.
• implies that every test taker has the opportunity to prepare for the test and is informed
about the general nature and content of the test.
• also extends to the accurate reporting of individual and group test results.

The following are some ethical standards that teachers may consider in their assessment
practices.

1. Teachers should be skilled in choosing assessment methods appropriate for instructional
decisions. Skills in choosing appropriate, useful, administratively convenient, technically
adequate, and fair assessment methods are prerequisite to good use of information to
support instructional decisions. Teachers need to be well-acquainted with the kinds of
information provided by a broad range of assessment alternatives and their strengths and
weaknesses. In particular, they should be familiar with criteria for evaluating and
selecting assessment methods in light of instructional plans.

2. Teachers should develop tests that meet the intended purpose and that are appropriate for
the intended test takers. This requires teachers to:
• Define the purpose for testing, the content and skills to be tested, and the intended
test takers.
• Develop tests whose content, skills tested, and content coverage are appropriate
for the intended purpose of testing.
• Develop tests that have clear, accurate, and complete information.
• Develop tests with appropriately modified forms or administration procedures for
test takers with disabilities who need special accommodations.

3. Teachers should be skilled in administering, scoring and interpreting the results from
diverse assessment methods. It is not enough that teachers are able to select and develop
good assessment methods; they must also be able to apply them properly. This requires
teachers to:
• Follow established procedures for administering tests in a standardized manner.
• Provide and document appropriate procedures for test takers with disabilities who
need special accommodations or those with diverse linguistic backgrounds.
• Protect the security of test materials, including eliminating opportunities for test
takers to obtain scores by fraudulent means.
• Develop and implement procedures for ensuring the confidentiality of scores.

4. Teachers should be skilled in using assessment results when making decisions about
individual students, planning teaching, developing curriculum, and school improvement.
Assessment results are used to make educational decisions at several levels: in the
classroom about students, in the community about a school and a school district, and in
society, generally, about the purposes and outcomes of the educational enterprise.
Teachers play a vital role when participating in decision-making at each of these levels
and must be able to use assessment results effectively.

5. Teachers should be skilled in developing valid pupil grading procedures which use pupil
assessments. Grading students is an important part of professional practice for teachers.
Grading is defined as indicating both a student's level of performance and a teacher's
valuing of that performance. The principles for using assessments to obtain valid grades
are known and teachers should employ them.

6. Teachers should be skilled in communicating assessment results to students, parents,
other lay audiences, and other educators. Teachers must routinely report assessment
results to students and to parents or guardians. In addition, they are frequently asked to
report or to discuss assessment results with other educators and with diverse lay
audiences. If the results are not communicated effectively, they may be misused or not
used. To communicate effectively with others on matters of student assessment, teachers
must be able to use assessment terminology appropriately and must be able to articulate
the meaning, limitations, and implications of assessment results. Furthermore, teachers
will sometimes be in a position that will require them to defend their own assessment
procedures and their interpretations of them. At other times, teachers may need to help
the public to interpret assessment results appropriately.

7. Teachers should be skilled in recognizing unethical, illegal, and otherwise inappropriate
assessment methods and uses of assessment information. Fairness, the rights of all
concerned, and professional ethical behavior must undergird all student assessment
activities, from the initial planning for and gathering of information to the interpretation,
use, and communication of the results. Teachers must be well-versed in their own ethical
and legal responsibilities in assessment. In addition, they should also attempt to have the
inappropriate assessment practices of others discontinued whenever they are encountered.
Teachers should also participate with the wider educational community in defining the
limits of appropriate professional behavior in assessment.

In addition, the following are principles of grading that can guide the development of a grading
system.
1. The system of grading should be clear and understandable (to parents, other stakeholders,
and most especially students).

2. The system of grading should be communicated to all stakeholders (e.g., students,
parents, administrators).

3. Grading should be fair for all students regardless of gender, socioeconomic status or any
other personal characteristics.

4. Grading should support, enhance, and inform the instructional process.

Project work: In groups of five, prepare interview questions or a questionnaire to assess the
extent to which assessment ethics is respected in the school where you are placed for your Practicum
experience. Using the instrument you have prepared, collect data from the concerned members of
the school community (teachers, students), analyze the data and reach valid conclusions. You
have to prepare a report of your conclusions as well as the procedures you have gone through to
reach them.

5.2.2. Ethnicity and Culture in tests and assessments


In the previous section you have learned that fairness is the fundamental principle that has to be
followed in teachers' assessment practices. It has been said that all students have to be provided
with equal opportunity to demonstrate the skills and knowledge being assessed. Fairness is
fundamentally a socio-cultural, rather than a technical, issue. Thus, in this section we are going
to see how culture and ethnicity may influence teachers' assessment practices and what
precautions we have to take in order to avoid bias and accommodate students from all
cultural groups.

Do you believe that culture and ethnicity have any role in teachers' assessment practices? In your
university experience, have you observed situations where instructors were biased in the
assignment of grades to students based on culture and ethnicity? If so, do you think that was
fair?

Students represent a variety of cultural and linguistic backgrounds. If the cultural and linguistic
backgrounds are ignored, students may become alienated or disengaged from the learning and
assessment process. Teachers need to be aware of how such backgrounds may influence student
performance and the potential impact on learning. Teachers should be ready to provide
accommodations where needed.

Classroom assessment practices should be sensitive to the cultural and linguistic diversity of
students in order to obtain accurate information about their learning. Assessment practices that
attend to issues of cultural diversity include those that
• acknowledge students' cultural backgrounds.
• are sensitive to those aspects of an assessment that may hamper students' ability to
demonstrate their knowledge and understanding.
• use that knowledge to adjust or scaffold assessment practices if necessary.

Assessment practices that attend to issues of linguistic diversity include those that
• acknowledge students' differing linguistic abilities.
• use that knowledge to adjust or scaffold assessment practices if necessary.
• use assessment practices in which the language demands do not unfairly prevent the
students from understanding what is expected of them.
• use assessment practices that allow students to accurately demonstrate their
understanding by responding in ways that accommodate their linguistic abilities, if the
response method is not relevant to the concept being assessed (e.g., allow a student to
respond orally rather than in writing).

Teachers must make every effort to address and minimize the effect of bias in classroom
assessment practices. Bias occurs when irrelevant or arbitrary factors systematically influence
the results of an assessment, or the interpretations made from them, in ways that affect the
performance of an individual student or a subgroup of students. For example, bias may occur when
variables such as cultural and language differences and socioeconomic status are not fairly
accounted for when interpreting results from an assessment.

Assessment should be culturally and linguistically appropriate, fair and bias-free. It may not be
possible to totally eliminate all forms of bias from classroom assessments. However, teachers
and others who assess students' learning should recognize that bias is an ever-present concern in
student assessment and be vigilant and resistant to the sources of bias, including having plans for
identifying and addressing bias. For an assessment task to be fair, its content, context, and
performance expectations should:
• reflect knowledge, values, and experiences that are equally familiar and appropriate to all
students;
• tap knowledge and skills that all students have had adequate time to acquire;
• be as free as possible of cultural and ethnic stereotypes.

5.2.3. Disability and Assessment Practices


It is quite obvious that our education system has been exclusionary, failing to fully accommodate the
educational needs of disabled students. This has been true not only in our country but in the rest
of the world as well, although the magnitude might differ from country to country. It was in
response to this situation that UNESCO has been promoting the principle of inclusive education
to guide the educational policies and practice of all governments. Different world conventions
were held and documents signed towards the implementation of inclusive education. Our
country, Ethiopia, has been a signatory of these documents and therefore has accepted inclusive
education as a basic principle to guide its policy and practice in relation to the education of
disabled students.

Activity: In groups of five, find and discuss the following documents and briefly report the
ideas each document addresses in relation to inclusive education:
1. The Dakar Framework for Action (2000)
2. The Salamanca Statement and Framework for Action in Special Needs Education (1994)
3. The UN Convention on the Rights of Persons with Disabilities (2006)

Each group should work on one document. The documents can be found on the internet.

Inclusive education is based on the idea that all students, including those with disabilities, should
be provided with the best possible education to develop themselves. This implies the
provision of all possible accommodations to address the educational needs of disabled students.
Accommodations should not only refer to the teaching and learning process. They should also
cover the assessment mechanisms and procedures.

Activity: In small groups, discuss what types of accommodations can be made to make
assessment practices accessible to students with different types of disabilities. Each group may
discuss one type of disability and share its ideas with the other groups.

There are different strategies that can be considered to make assessment practices accessible to
students with disabilities depending on the type of disability. In general terms, however, the
following strategies could be considered in summative assessments:

• Modifying assessments: This should enable disabled students to have full access to the
assessment without giving them any unfair advantage.
• Others' support: Disabled students may need the support of others in certain
assessment activities which they cannot do independently. For instance, they may
require readers and scribes in written exams; they may also need others' assistance in
practical activities, such as using equipment, locating materials, drawing and measuring.
• Time allowances: Disabled students should be given additional time to complete their
assessments, the amount of which the individual instructor has to decide based on the purpose
and nature of the assessment.
• Rest breaks: Some students may need rest breaks during the examination. This may be
to relieve pain or to attend to personal needs.
• Flexible schedules: In some cases disabled students may require flexibility in the
scheduling of examinations. For example, some students may find it difficult to manage a
number of examinations in quick succession and need to have examinations scheduled
over a period of days.
• Alternative methods of assessment: In certain situations where formal methods of
assessment may not be appropriate for disabled students, the instructor should assess
them using non-formal methods such as class work, portfolios, oral presentations, etc.
• Assistive technology: Specific equipment may need to be available to the student in an
examination. Such arrangements often include the use of personal computers,
voice-activated software and screen readers.

5.2.4 Gender issues in assessment


Do you feel that gender has any influence in teachers’ assessment practices? Is there any
gender-related stereotype in relation to assessment results? Share your reflections with your
friends.

Teachers’ assessment practices can also be affected by gender stereotypes. The issues of gender
bias and fairness in assessment are concerned with differences in opportunities for boys and
girls. A test is biased if boys and girls with the same ability levels tend to obtain different scores.

Test questions should be checked for:

• material or references that may be offensive to members of one gender,
• references to objects and ideas that are likely to be more familiar to men or to women,
• unequal representation of men and women as actors in test items or representation of
members of each gender only in stereotyped roles.

If the questions involve objects and ideas that are more familiar or less offensive to members of
one gender, then the test may be easier for individuals of that gender. Standards for achievement
on such a test may be unfair to individuals of the gender that is less familiar with or more
offended by the objects and ideas discussed, because it may be more difficult for such
individuals to demonstrate their abilities or their knowledge of the material.

Unit Summary

In this unit you have learned that ethics is a very important consideration in our
assessment practices, and that the most important ethical consideration is fairness. If we are to draw
reasonably good conclusions about what our students have learned, it is imperative that we make
our assessments—and our uses of the results—as fair as possible for as many students as
possible. A fair assessment is one in which students are given equitable opportunities to
demonstrate their abilities and knowledge.

Teachers must make every effort to address and minimize the effect of bias in classroom
assessment practices. Biases in assessment can occur because of differences in culture or
ethnicity, disability as well as gender. To ensure suitability and fairness for all students, teachers
need to check the assessment strategy for its appropriateness and if there are cultural, disability
and gender biases.

Equitable assessment means that students are assessed using methods and procedures most
appropriate to them. Classroom assessment practices should be sensitive and diverse enough to
accommodate all types of diversity in the classroom in order to obtain accurate information about
students' learning.

Self-Check Exercises
1. What is the meaning of fairness in assessment?
2. What are the basic ethical standards that teachers may consider in their assessment
practices?
3. How do culture and ethnicity influence teachers' assessment practices?
4. What strategies can teachers follow to address the special needs of disabled students
during tests?
