EDU 423 Module 1 4 Measurement and Evaluation

EDU 423 MEASUREMENT AND EVALUATION

MODULE 1 AN OVERVIEW OF MEASUREMENT AND EVALUATION

Unit 1 Definitions and Purposes of Measurement and Evaluation
Unit 2 Historical Development of Testing and Evaluation
Unit 3 Importance and Functions of Tests in Education

UNIT 1 DEFINITIONS AND PURPOSES OF MEASUREMENT AND EVALUATION

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 How to Study this Unit
4.0 Word Study
5.0 Main Content
5.1 Meaning of Terms
5.1.1 Test and Testing
5.1.2 Assessment
5.1.3 Measurement
5.1.4 Evaluation
5.2 Types of Evaluation
5.2.1 Placement Evaluation
5.2.2 Formative Evaluation
5.2.3 Diagnostic Evaluation
5.2.4 Summative Evaluation
5.3 The Purposes of Measurement and Evaluation
6.0 Activity
7.0 Summary
8.0 Assignment
9.0 References

1.0 INTRODUCTION

This unit explains the meaning of test and testing, assessment, measurement and
evaluation. It also gives the purposes of carrying out measurement and evaluation,
since the primary purpose of measuring and evaluating the learner is to use the
results to improve teaching and learning.

2.0 OBJECTIVES

By the end of this unit you will be able to:

• explain the terms test and testing;
• define the term assessment;
• clarify the terms measurement and evaluation;
• list the purposes of measurement and evaluation; and
• explain the types of evaluation.

3.0 HOW TO STUDY THIS UNIT

1. Read through this unit carefully.
2. Study the unit step by step as the points are well arranged.

NOTE: All answers to activities and assignments are at the end of this book.
This applies to every other unit in this book.

4.0 WORD STUDY

• Inventory: This is a detailed list of activities, things, statements, etc. from
which a student is required to indicate the ones he likes or the ones which are
characteristic of him. From the student’s choices, his interests are known and
they may be used to offer guidance services to him, e.g. the study habit
inventory by Bakare (1970).

• Questionnaire: This consists of a set of questions constructed on a particular
area (e.g. interest, attitude) in order to generate the desired information.

• Opinionnaire scale: This is also known as an attitude scale. It comprises a series
of statements that show an individual’s like or dislike for an object, person,
subject, career, etc.

• Cognitive traits: These deal mainly with the mental or intellectual processing of
information, that is, the knowledge aspect of learning, e.g. mental power,
reasoning ability, processing speed and learning skills.

• Non-cognitive traits: These are concerned with characteristics other than
intellectual ability, such as feelings, interests, attitudes and ways of responding.

• Instrument: A device for measuring the present value of a quantity under
observation.

• Self-rating: This involves asking the individual what he/she feels, thinks, says
and does about a particular thing.

• Interview: This involves a teacher (interviewer) orally asking questions of a
student or client (interviewee), who responds on a face-to-face basis.

• Observation: This simply means watching carefully and recording the
features or traits of a person during investigation.

• Readiness test: This is a test designed to measure an individual’s readiness or
willingness to embark on a programme or course of study.

• Ability test: This measures the power of an individual to perform a given
task.

• Aptitude test: This measures in-born talent or capacity to do something in the
present or in the future.

• Achievement test: This measures learning that has taken place recently.

5.0 MAIN CONTENT

5.1 Meaning of Terms

5.1.1 Test and Testing

Simply put, a test is a measuring tool or instrument in education. More specifically,
a test is considered to be a kind or class of measurement device typically used to
find out something about a person. Most of the time, when you finish a lesson or
lessons in a week, your teacher gives you a test. This test is an instrument given to
you by the teacher in order to obtain data on which you are judged. It is a common
type of educational device which an individual completes himself or herself. The
intent is to determine changes or gains, using such instruments as an inventory,
questionnaire, opinionnaire scale, etc.

Testing, on the other hand, is the process of administering the test to the pupils. In
other words, the process of making you or letting you take the test in order to obtain
a quantitative representation of the cognitive or non-cognitive traits you possess is
called testing. So, the instrument or tool is the test, and the process of administering
the test is testing.

5.1.2 Assessment

Now that you have learnt the difference between test and testing, let us move on to
the next concept, which is assessment. As a teacher, you will inevitably be involved
in assessing learners; therefore you should have a clear knowledge of the meaning
of assessment.

The term “assess” is derived from the Latin word “assidere”, meaning “to sit by” in
judgment. There are many definitions and explanations of assessment in education.
Let us look at a few of them.

i. According to Freeman and Lewis (1998), to assess is to judge the extent of
students’ learning.
ii. Rowntree (1977) sees assessment in education as something that occurs
whenever one person, in some kind of interaction, direct or indirect, with
another, is conscious of obtaining and interpreting information about the
knowledge and understanding, or abilities and attitudes, of that other person.
To some extent or other, it is an attempt to know the person.
iii. To Erwin, in Brown and Knight (1994), assessment is a systematic basis for
making inferences about the learning and development of students… the
process of defining, selecting, designing, collecting, analyzing, interpreting
and using information to increase students’ learning and development.

You will note from these definitions that:

• Assessment is a human activity.
• Assessment involves interaction, which aims at seeking to understand what the
learners have achieved.
• Assessment can be formal or informal.
• Assessment may be descriptive rather than judgmental in nature.
• Its role is to increase students’ learning and development.
• It helps learners to diagnose their problems and to improve the quality of their
subsequent learning.

5.1.3 Measurement

This is a broad term that refers to the systematic determination of outcomes or
characteristics by means of some sort of assessment device. It is a systematic
process of obtaining the quantified degree to which a trait or an attribute is present
in an individual or object. In other words, it is the systematic assignment of numerical
values or figures to a trait or an attribute in a person or object. For instance: What is
the height of Uche? What is the weight of the meat? What is the length of the
classroom? In education, the numerical value of scholastic ability, aptitude,
achievement, etc. can be measured and obtained using instruments such as
paper-and-pencil tests. It means that the values of the attribute are translated into
numbers by measurement.

5.1.4 Evaluation

According to Tuckman (1975), evaluation is a process wherein the parts, processes,
or outcomes of a programme are examined to see whether they are satisfactory,
particularly with reference to the stated objectives of the programme, our own
expectations, or our own standards of excellence.

According to Cronbach et al. (1980), evaluation means the systematic examination
of events occurring in and consequent on a contemporary programme. It is an
examination conducted to assist in improving this programme and other
programmes having the same general purpose.

For Thorpe (1993), evaluation is the collection, analysis and interpretation of
information about training as part of a recognized process of judging its
effectiveness, its efficiency and any other outcomes it may have.

If you study these definitions very well, you will note that evaluation, as an integral
part of the instructional process, involves three steps. These are:

i. identifying and defining the intended outcomes,
ii. constructing or selecting tests and other evaluation tools relevant to the
specified outcomes, and
iii. using the evaluation results to improve learning and teaching.

You will also note that evaluation is a continuous process. It is essential in all fields
of teaching and learning activity where judgments need to be made.

5.2 Types of Evaluation

The different types of evaluation are: placement, formative, diagnostic and
summative evaluations.

5.2.1 Placement Evaluation

This is a type of evaluation carried out in order to place students in the appropriate
group or class. In some schools, for instance, students are assigned to classes
according to their subject combinations, such as science, technical, arts,
commercial, etc. Before this is done, an examination is carried out in the
form of a pretest or aptitude test. It can also be a type of evaluation made by the
teacher to find out the entry behaviour of his students before he starts teaching. This
may help the teacher to adjust his lesson plan. Tests like readiness tests, ability tests,
aptitude tests and achievement tests can be used.

5.2.2 Formative Evaluation

This is a type of evaluation designed to help both the student and the teacher to pinpoint
areas where the student has failed to learn, so that this failure may be rectified. It
provides feedback to the teacher and the student and thus helps to estimate teaching
success. Examples are weekly tests, terminal examinations, etc.

5.2.3 Diagnostic Evaluation

This type of evaluation is carried out, most of the time, as a follow-up to
formative evaluation. As a teacher, you have used formative evaluation to identify
some weaknesses in your students. You have also applied some corrective measures
which have not shown success. What you will now do is to design a type of
diagnostic test, which is applied during instruction to find out the underlying cause
of students’ persistent learning difficulties. These diagnostic tests can be in the form
of achievement tests, performance tests, self-rating, interviews, observations, etc.

5.2.4 Summative Evaluation

This is the type of evaluation carried out at the end of the course of instruction to
determine the extent to which the objectives have been achieved. It is called a
summarizing evaluation because it looks at the entire course of instruction or
programme and can pass judgment on the teacher and students, the curriculum
and the entire system. It is used for certification. Think of the educational
certificates you have acquired from examination bodies such as WAEC, NECO, etc.
These were awarded to you after you had gone through some types of examination.
This is an example of summative evaluation.

5.3 The Purposes of Measurement and Evaluation

The main purposes of measurement and evaluation are listed below.

i. Placement of students, which involves placing students appropriately in the
learning sequence, and classification or streaming of students according to
ability or subjects.
ii. Selecting students for courses: general, professional, technical,
commercial, etc.
iii. Certification: This helps to certify that a student has achieved a particular
level of performance.
iv. Stimulating learning: This can be motivation of the student or teacher,
providing feedback, suggesting suitable practice, etc.
v. Improving teaching: This helps to review the effectiveness of teaching
arrangements.
vi. For research purposes.
vii. For guidance and counseling services.
viii. For modification of the curriculum.
ix. For selecting students for employment.
x. For modification of teaching methods.
xi. For the promotion of students.
xii. For reporting students’ progress to their parents.
xiii. For the award of scholarships and merit awards.
xiv. For the admission of students into educational institutions.
xv. For the maintenance of students.

6.0 ACTIVITY

1. Differentiate between aptitude and attitude tests.
2. Differentiate between assessment, measurement and evaluation.
3. List ten purposes of educational testing, measurement and evaluation.


7.0 SUMMARY

In general, practitioners in the educational system are, most of the time,
interested in ascertaining the outputs of the educational programme. Output is
counted in terms of test results, which are naturally expressed in quantitative indices
such as scores or marks. A test, which is a device, an instrument or a tool consisting
of a set of tasks or questions, is used to obtain the results. A test can be in the form of
a pen-and-paper examination, an assignment, a practical, etc. The process of
administering this test is called testing. An act of measurement is done when we
award marks to an answer paper or assignment.

So measurement gives the individual’s ability in numerical indices or scores, i.e.
measurement is quantitative. Assessment can be seen as the engine that drives and
shapes learning, rather than simply an end-of-term examination that grades and
reports performance. Evaluation is expressed in qualitative indices such as good,
excellent, pass or fail.

Value judgment is therefore attached to the measurement.

Evaluation can be placement, formative, diagnostic or summative.
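
The distinction drawn in this summary can be sketched in a few lines of Python: the
mark is the measurement, and the label attached to it is the evaluation. This is only
an illustrative sketch. The cut-off marks and the names GRADE_BOUNDARIES and evaluate
are assumptions made for the example, and the marks for Uche, Ada and Musa are
invented; the unit does not prescribe any grading scheme.

```python
# Hypothetical cut-off marks (an assumption for illustration only).
GRADE_BOUNDARIES = [(70, "excellent"), (50, "good"), (40, "pass")]


def evaluate(score):
    """Attach a value judgment (evaluation) to a numerical score (measurement)."""
    for cutoff, label in GRADE_BOUNDARIES:
        if score >= cutoff:
            return label
    return "fail"


# Measurement: marks awarded to answer papers (invented figures).
scores = {"Uche": 72, "Ada": 55, "Musa": 38}

# Evaluation: a qualitative judgment placed on each mark.
judgments = {name: evaluate(mark) for name, mark in scores.items()}
print(judgments)
```

Changing the cut-offs changes the judgments but not the scores, which is why the
same measurement can lead to different evaluations under different standards.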

8.0 ASSIGNMENT

1. List any four types of evaluation.
2. Explain any two of the types listed.
3. List five examples each of physical and psychological measurement.

9.0 REFERENCES

Obimba, F. U. (1989). Fundamental of Measurement and Evaluation in Education
and Psychology. Owerri: Totan Publishers Ltd.

Omorogiuwa, O. K. (2010). An Introduction to Educational Measurement and
Evaluation. Benin City: Perfect Touch.

STRAIDE Handbook (2002). Assessment and Evaluation in Distance Education.
New Delhi: A Publication of Indira Gandhi National Open University
(IGNOU).


UNIT 2 HISTORICAL DEVELOPMENT OF TESTING AND EVALUATION

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 How to Study this Unit
4.0 Word Study
5.0 Main Content
5.1 Early Measurement of Individual Differences
5.2 Test Organizations in Nigeria
6.0 Activity
7.0 Summary
8.0 Assignment
9.0 References

1.0 INTRODUCTION

In the last unit, you read through important definitions in measurement and
evaluation. You also saw the types of evaluation and the purposes of evaluation. In
this unit, we shall move a step further and look at the historical development of
testing and evaluation. This may help you to appreciate the course more, and also to
appreciate the early players in the field.

2.0 OBJECTIVES

After working through this unit you should be able to:

• trace the historical development of testing and evaluation;
• mention some of the early players in testing and evaluation; and
• mention some of the testing organizations in Nigeria.

3.0 HOW TO STUDY THIS UNIT

1. Read through this unit with care.
2. Study the unit step by step as the points are well arranged.

NOTE: All answers to activities and assignments are at the end of this book.
This applies to every other unit in this book.

4.0 WORD STUDY

• Variability: Measures of variability are also called measures of spread, scatter
or dispersion. They are those measures or indices that express quantitatively the
extent to which the scores in a given distribution cluster together or scatter
apart.

• Pearson’s product-moment correlation: This is a measure of the strength
and direction of a linear relationship between two variables at interval or
ratio level.

• Intelligence test: This is a test of intelligence. Wechsler, in Ukwuije (1993),
notes that intelligence is defined as the ability to learn (by the educator); the
ability to adapt to the environment (by the biologist); the ability to adduce
relationships (by the psychologist); and the ability to process information very
quickly (by the computer scientist). However, intelligence involves one’s ability
to excel in any task in which one participates.
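
The two statistical terms in this word study can be illustrated with a small Python
sketch. It computes the range and the standard deviation as measures of variability,
and Pearson's product-moment correlation between two sets of scores. The scores and
the function name pearson_r are made up for illustration, and the standard deviation
shown is the population form; a textbook may use the sample form instead.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length score lists."""
    mean_x, mean_y = mean(x), mean(y)
    # Sum of cross-products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented scores of five students on two tests.
test_a = [40, 50, 60, 70, 80]
test_b = [45, 50, 65, 70, 85]

# Variability: how far the scores scatter apart.
score_range = max(test_a) - min(test_a)
sd = pstdev(test_a)

# Correlation: strength and direction of the linear relationship.
r = pearson_r(test_a, test_b)

print(score_range, round(sd, 2), round(r, 2))
```

A value of r close to +1 means that students who score high on one test tend to score
high on the other; a value close to 0 means there is little linear relationship.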

5.0 MAIN CONTENT

5.1 Early Measurement of Individual Differences

The history of measurement started with the invention of tests to measure individual
differences in skills among adults. In January 1796, the Astronomer Royal at the
Greenwich Observatory in England, Maskelyne, is recorded to have dismissed
his assistant, Kinnebrook, for recording the movement of stars across the telescope
eight-tenths of a second later than he did. According to Tuckman (1975), between
1820 and 1823 a German astronomer, Bessel, improved on the work of Maskelyne
by demonstrating the variability in personal equations and observations. He argued
that fluctuations existed from occasion to occasion and from person to person. This
means that there is variation in simple reaction time, a measure of the time
required to react to a simple stimulus.

In 1863, Sir Francis Galton, a half-cousin of Charles Darwin, worked on individual
differences. In 1883, he published a book entitled Inquiries into Human Faculty
and Its Development. His work was regarded as the beginning of mental tests. Have
you heard of his tests? In 1884, Galton opened an anthropometric laboratory to
collect the characteristic measurements of people. Within the same period, James
McKeen Cattell, an American psychologist, was also studying individual differences,
primarily in physical terms.

These are among the earliest recorded events in the history of testing. But you will
note that early measurement approaches, both written and oral, were informal. The
first written tests were the informal examinations used by the Chinese to recruit
people into the civil service. This was about 2200 BC.

The oral examinations conducted by Socrates in the 5th century BC were informal.
In America, educational achievement was assessed through oral
examinations before 1815. You have read about Galton, Cattell, etc. and their
roles in the history of test development. There are others; let us briefly mention a
few of them in this section. Karl Pearson developed the Pearson product-moment
correlation coefficient, which is useful in checking the reliability and validity of
standardized tests.

Edward L. Thorndike was a former student of Cattell. He made major contributions
to achievement testing.

By 1904, Alfred Binet had established himself as France’s premier psychologist and
an expert in human individual differences. He studied the differences between bright
and dull children. In 1905, with his assistant Théodore Simon, he developed a test
for measuring the intelligence of children. This test is called the Binet-Simon
intelligence test. In 1916, Lewis Terman and his associates at Stanford University
revised the Binet-Simon scale and brought out the Stanford-Binet version.

Group test development started during World War I, when the need arose to measure
the intelligence of soldiers so as to assign them to different tasks and operations.
As a result, a group of psychologists including R. M. Yerkes and A. Otis
developed the Army Alpha, which is a written group intelligence test, and the Army
Beta, which is a non-verbal group intelligence test. Others are: David
Wechsler, who developed a series of individual intelligence scales from 1939 to 1967;
George Fisher, an Englishman who developed the first standardized objective test
of achievement in 1864; and J. M. Rice, an American who developed the standard
spelling objective scale in 1897. The list is by no means comprehensive.

Meanwhile, let us come to the Nigerian situation.

5.2 Test Organizations in Nigeria

Do you know any organization in Nigeria which conducts examinations or develops
examination questions? You must have taken note of WAEC, NECO, JAMB,
NABTEB, etc.

The West African Examinations Council (WAEC) was established first on 31st December
1951 in The Gambia and in 1952 in Nigeria to serve as the examination body for the
Anglophone West African countries. WAEC conducts such
examinations as the School Certificate Examinations (GCE), Royal Society of Arts
(RSA) and City and Guilds Examinations.

The Joint Admissions and Matriculation Board (JAMB), established in 1976, is
charged with the responsibility of conducting common entrance examinations for
the universities, colleges of education and polytechnics in Nigeria. The National
Business and Technical Examinations Board (NABTEB) is charged with the
responsibility of organizing business and technical or vocational examinations
and the related certification. Its headquarters is located in Benin City.

The National Examinations Council (NECO) was established to organise examinations
for school certificates, both senior and junior, and the common entrance
examinations to the unity schools in Nigeria. It is located in Minna, Niger State.

The International Center for Educational Evaluation (ICEE) is concerned with
educational evaluation in Nigeria. It is located at the Institute of Education of the
University of Ibadan. It offers higher degree programmes in:

• evaluation of educational achievement
• evaluation of innovative educational programmes
• evaluation of public examinations
• evaluation of curriculum materials
• evaluation of teaching and learning strategies

Apart from the national bodies, many of the state education boards
and local education boards have their own assessment units. Sometimes, these units
develop joint examinations for secondary school students or primary school
pupils, as the case may be. Does your state education board organize joint
examinations for students? Even the primary school certificate examinations are
organized by the states.

Some states organize their own junior school certificate examinations. Even
companies these days organize some aptitude tests for recruitment or employment
exercises.

Some ministries organize promotion examinations for their staff. These are some
types of tests and measurement.

6.0 ACTIVITY

1. List four Anglophone West African countries in which WAEC conducts examinations.
2. Mention any three examination bodies that existed before the introduction of
WAEC.
3. Give the full meaning of the following acronyms: WAEC, NECO, JAMB,
NABTEB.

7.0 SUMMARY

In this unit, we have traced the history of tests and measurement, starting from 1796
when the Astronomer Royal at Greenwich in England, Maskelyne,
dismissed his assistant for recording observations eight-tenths of a second later than
his own. You noted that the variation in the records was a result of individual
differences in the handling of the instruments of observation. You will recall that it
was between 1820 and 1823 that a German astronomer, Bessel, worked on
variability in individual observations and came out with the simple reaction time. Sir
Francis Galton was associated with the development of the mental test. Galton
worked in his anthropometric laboratory to collect the characteristic measurements
of people. The American psychologist James McKeen Cattell studied individual
differences, primarily in physical terms. These measurements were all informal. Do you
remember that the Chinese, Socrates, Pearson, Thorndike, Binet, Terman, etc. all had
something to do with the early development of tests and measurement? In Nigeria,
organizations like WAEC, NECO, NABTEB, JAMB and ICEE are all involved in the
activities of measurement.


We shall see the importance of tests in education in the next unit. This will help you
place a value on this course.

8.0 ASSIGNMENT

1. State three rationales for the introduction of NECO in the Nigerian education system.
2. List four uses of tests by the class teacher.
3. Critique testing to the best of your knowledge, drawing on the Nigerian experience.

9.0 REFERENCES/FURTHER READINGS

Asuru, V. A. (2006). Measurement and Evaluation in Education and Psychology.
Port Harcourt: Minson Nigeria Limited.

Obimba, F. U. (1989). Fundamental of Measurement and Evaluation in Education
and Psychology. Abuja, Ibadan, Lagos, Owerri: Totan Publishers Ltd.

Monday, T. J. (2005). Fundamentals of Tests and Measurement Education. Calabar:
Ultimate Index Book Publishers, University of Calabar Press.


UNIT 3 IMPORTANCE AND FUNCTIONS OF TESTS IN EDUCATION

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 How to Study this Unit
4.0 Word Study
5.0 Main Content
5.1 Functions of Tests in Education
5.1.1 To Motivate the Pupils to Study
5.1.2 To Determine how much the Pupils have Learned
5.1.3 To Determine the Pupils’ Special Difficulties
5.1.4 To Determine the Pupils’ Special Abilities
5.1.5 To Determine the Strength and Weakness of the Teaching Methods
5.1.6 To Determine the Adequacy or Otherwise of Instructional Materials
5.1.7 To Determine the Extent of Achievement of the Objectives
5.2 Measurement Scales
5.2.1 Nominal Scale
5.2.2 Ordinal Scale
5.2.3 Interval Scale
5.2.4 Ratio Scale
6.0 Activity
7.0 Summary
8.0 Assignment
9.0 References

1.0 INTRODUCTION

Tests, as the instruments of measurement and evaluation in education, are very
important, as their functions constitute the rationale for the study of the course. Tests
serve various purposes in the educational system. They serve both primary and
secondary or supplementary purposes. The primary purpose is to make use of the
results of evaluation for the improvement of teaching and learning. Whatever other
uses the results are put to are regarded as secondary to this primary use. In this unit
we shall be looking at the importance and functions of tests in education and the
scales of measurement.

2.0 OBJECTIVES

By the end of this unit, you should be able to:

• explain the functions of tests in education; and
• explain the measurement scales.

3.0 HOW TO STUDY THIS UNIT

1. Read through this unit carefully.
2. Study the unit step by step as the points are well arranged.

NOTE: All answers to activities and assignments are at the end of this book.
This applies to every other unit in this book.

4.0 WORD STUDY

• Instructional resources: These are the resources used by the teacher in the
teaching/learning process. They include the chalkboard, textbooks, etc.

• Magnitude: This has to do with the extent, degree or amount of an object or
quantity.

• Arithmetic operations: These are the mathematical calculations that can
be done on a given set of data, e.g. division, multiplication, addition and
subtraction (BODMAS).

5.0 MAIN CONTENT

5.1 Functions of Tests in Education

In Section 5.3 of Unit 1, we discussed the purposes of measurement and evaluation,
but you are aware that the test is the instrument used in measurement and evaluation.
Therefore, apart from those purposes, which in one way or the other are applicable
to tests, we shall explain the main functions of tests in the educational system in this
section. As a teacher, when you set a test for your class, what do you do with the
results? How does the test help you in the teaching and learning situation?

According to Nwana (1981), a test can fulfill a variety of functions. These are to:

(a) motivate pupils to study
(b) determine how much the pupils have learned
(c) determine the pupils’ special difficulties
(d) determine the pupils’ special abilities
(e) determine the strength and weakness of the teaching method
(f) determine the adequacy or otherwise of instructional resources
(g) determine the extent of achievement of the objectives, etc.

5.1.1 To Motivate the Pupils to Study

When you go to the church or mosque every Sunday or Friday, you listen to the
preacher as he expounds the Word. You are not subjected to any written or verbal
examination on the substance of the preaching. Therefore, the effect of the
preaching on you cannot be determined. Thus, whether you are sleeping or paying
attention to the preacher is left to you and you alone. This is not the case in a
school situation, where we need to verify our efforts as soon as possible. This is why
tests are regularly used to motivate pupils to learn. They study hard towards their
weekly, terminal or end-of-year promotion examinations.

Without these tests, many of the pupils would be reluctant to make time for
private studies, while some of them would be less likely to be attentive when the
teacher is teaching, no matter how interesting and lively the teaching may be. You
can see that listening to a teacher who is not going to give a test is like listening to
the preacher in the church or mosque. But the risk here is that when the tests are too
many, pupils start working only to pass them.

5.1.2 To Determine How Much the Pupils Have Learned

A test can be used to find out the extent to which the contents have been covered or
mastered by the testees. For instance, suppose you treat a topic in your class and, at
the end, you give a test. If many of your students score high marks, this is an indication
that they have understood the topic very well. But if they score very low marks, it
implies that your efforts have been wasted. You need to do more teaching. It is the
results of the test that will help you decide whether to move to the next topic or
repeat the current topic.

5.1.3 To Determine the Pupils’ Special Difficulties

Tests can be constructed and administered to students in order to determine
their particular problems. This is done in order to determine appropriate
corrective actions. This identification of weaknesses and strengths on the part of the
students is the diagnostic use of tests. It helps in the desirable effort to give pupils,
as individuals or groups, remedial attention. Can you think of any such tests before
you continue?

5.1.4 To Determine the Pupils’ Special Abilities

Tests can be used as a measure to indicate what a person, a group of persons or
students can do. These can be measures of aptitude (capacity or ability to learn)
and measures of achievement or attainment. These can be obtained using aptitude tests
and achievement tests. The major concern of the class teacher is the achievement
test, which he is expected to use to promote learning and bring about purposeful and
desirable changes in the students entrusted to him.

5.1.5 To Determine the Strength and Weakness of the Teaching Methods

The results of classroom tests provide empirical evidence for the teacher to know
how effective his teaching methods are. Test results are used as a self-evaluation
instrument. They can also be used by others to evaluate the teacher. If the
results are not encouraging, the teacher may decide to review his teaching methods
with a view to modifying them or changing to another.

5.1.6 To Determine the Adequacy or Otherwise of Instructional Materials

A good teacher makes use of a variety of teaching aids for illustrations and
demonstrations. Effective use of these instructional resources helps to improve
students’ understanding of the lesson. Topics which look abstract can be brought to
concrete terms by the use of these materials. Therefore, tests can be used to
determine the effectiveness, adequacy or otherwise of these teaching aids.

5.1.7 To Determine the Extent of Achievement of the Objectives

There are goals and objectives set for the schools. Every school is expected to
achieve the goals and objectives through the instructional programmes. The results
of tests given to students are used to evaluate how well the instructional
programmes have helped in the achievement of the goals and objectives.

5.2 Measurement Scales

Measurement exists at several levels, depending on what is to be measured, the
instrument to be employed, the degree of accuracy or precision desired and the
method of measurement. There are four levels of measurement scale: nominal, ordinal,
interval and ratio.

5.2.1 Nominal Scale

Sometimes you are involved in the classification of objects, things, human beings,
animals, etc. For instance, you can classify human beings as males and females; you
can also classify living things as plants and animals.

Have you ever assigned your students to classes or groups? The nominal scale, which
is the simplest of the measurement scales, involves only assignment to classes or
groups and does not imply magnitude. It is used when we are interested in knowing
whether certain objects belong to the same or different classes. We may decide to
assign teachers in the school system to two groups: graduate teachers = 1 and
non-graduate teachers = 0. You will note that these numbers are just codes, as they
do not indicate magnitude. We may group students into classes A, B, C, D, etc.

These are also codes. There is no order involved here; there is nothing like greater
than or less than. Any letters, numbers or numerals used in a nominal scale have no
quantitative significance; they are used for convenience only. The nominal scale
does not lend itself to useful arithmetical and statistical operations such as
addition, subtraction, multiplication and division. Do you think that this is a
shortcoming or an advantage?

5.2.2 Ordinal Scale

In the last section (5.2.1), you learnt about the nominal scale, which involves the
simple classification of objects, things, people or events. The next in simplicity is
the ordinal scale. In this case, there is order: one value may be greater than,
equal to or less than another. For instance, the number 5 is greater than the
number 3. There is classification, as well as an indication of size and rank. In
your class, you rank your students using their test results from first, second and
third to nth position.

This is ranking. It is order. It is the ordinal scale. You will note that what is
important here is the position of the individuals, things or events in a group. The
ranks cannot be compared arithmetically. For instance, you cannot say that a student
who is ranked 2nd in a test is twice as good as the student who is ranked 4th and
half as good as the student who is ranked 1st, in spite of the fact that 2 is half of
4 and 1 is half of 2. Let us illustrate this with an example. Take the scores of four
students ranked 5th, 6th, 7th and 8th as 70%, 68%, 50% and 49% respectively. You
note that the difference between the 5th and 6th positions is only 2%, while between
the 6th and 7th positions it is not 2% but 18%, and the difference between the 7th
and 8th positions is just 1%. It means therefore that equal intervals on the scale
do not represent equal quantities. Therefore, only limited arithmetical operations
can be used on the ordinal scale.
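
The ranking example above can be checked with a short calculation. What follows is
only a minimal sketch in Python, which is not otherwise used in this course; the
function name gaps_between_adjacent_ranks is made up for illustration. It takes the
four scores quoted in the text and shows that neighbouring ranks are separated by
unequal score differences.

```python
# Scores (in %) of the students ranked 5th to 8th, as quoted in the text.
ranked_scores = {5: 70, 6: 68, 7: 50, 8: 49}


def gaps_between_adjacent_ranks(scores_by_rank):
    """Return {(rank, next_rank): score difference} for neighbouring ranks."""
    ranks = sorted(scores_by_rank)
    return {
        (a, b): scores_by_rank[a] - scores_by_rank[b]
        for a, b in zip(ranks, ranks[1:])
    }


gaps = gaps_between_adjacent_ranks(ranked_scores)
for (a, b), gap in gaps.items():
    print(f"{a}th to {b}th: {gap} percentage points")
# 5th to 6th: 2 percentage points
# 6th to 7th: 18 percentage points
# 7th to 8th: 1 percentage points
```

Each step down the ranking is "one position", yet the score gaps are 2, 18 and 1.
This is why ranks on their own should not be added, subtracted or averaged as if
they were scores.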

5.2.3 Interval Scale

This is the next scale after the ordinal scale. You remember that we said that in
the ordinal scale, equal intervals do not represent equal quantities. The reverse is
the case in the interval scale: equal intervals represent equal quantities. The
amount of difference between adjacent points on the scale is equal. Take the
calendar as an example: the days on it represent equal amounts of time. You will
observe that an equal amount of time separates days 2, 4, 6 and 8.

In an examination, a student who scored 80% is 10 percentage points above the
student who scored 70% and 10 percentage points below the one who scored 90%.
Because the data here are continuous, arithmetic operations like addition and
subtraction can be performed. For instance, suppose John’s height is 1.56 m, Olu’s
height is 1.66 m and Ibe’s height is 1.70 m. It follows that Ibe is the tallest: he
is 0.04 m taller than Olu and 0.14 m taller than John. It implies that you can rank
as well as get the differences between the measurements. (Height itself has a true
zero and so strictly belongs to the ratio scale; the example is used here only to
show ranking and subtraction.) But there is no absolute zero in the interval scale.

5.2.4 Ratio Scale

This is the highest level or scale of measurement. It has all the characteristics of
the others. In addition, there is an absolute zero here. The arithmetic operations
of addition, subtraction, multiplication and division can all be used on this
scale. Let us use the metre rule as an example of a ratio scale. It has equal
intervals, as in the interval scale. It also has a zero mark, which implies that at
this point there is a complete absence of what the metre rule measures. If it is
used to measure height, it means that at zero there is no height at all. This is not
the case with the interval scale. If you give a test in your class and a student
scores zero, does it imply that the student does not know anything at all? Take the
case of the calendar again as an example. Look at the calendars you have from
different sources and places. Is there any one with a zero point? Can there be any
zero day or time?

Let us use another example to drive this point home. If you use a weighing balance
to measure the weights of objects, you will discover that at the zero mark there is a
complete lack of weight. There is no weight at all. But if you use a test to measure
the intelligence of students, a score of zero does not imply that the student has no
intelligence at all. The ratio scale is appropriately used in the physical sciences,
while the interval scale is used in education, social science and psychology.
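
The practical effect of an absolute zero can be seen with temperature. The Celsius
scale is not used as an example in the text above; it is chosen here as an
assumption because its zero is only a convention, whereas zero on the Kelvin scale
is a true zero. The sketch below (in Python, with made-up variable names) shows that
a difference between two temperatures is the same on both scales, but a ratio is
not. This is why "twice as much" is a meaningful statement only on a ratio scale.

```python
def celsius_to_kelvin(celsius):
    """Shift the zero point: 0 degrees Celsius corresponds to 273.15 kelvin."""
    return celsius + 273.15


warm_c, cool_c = 20.0, 10.0
warm_k, cool_k = celsius_to_kelvin(warm_c), celsius_to_kelvin(cool_c)

# Differences survive the change of zero point (interval-scale property).
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# Ratios do not: 20 degrees Celsius is not "twice as hot" as 10 degrees Celsius.
ratio_c = warm_c / cool_c
ratio_k = warm_k / cool_k

print(f"difference: {diff_c:.2f} (Celsius) vs {diff_k:.2f} (Kelvin)")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

The same reasoning applies to test scores: a score of 80 is 40 marks above a score
of 40, but it does not follow that the first student knows twice as much.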

The table below illustrates the four scale levels.


S/N  Scale level  Illustration                                  Possible arithmetic operations
1.   Nominal      Numbers are used only as names of groups or   Counting only
                  classes, e.g. Female = 0, Male = 1;
                  Adults = A, Youths = B, Children = C;
                  Doctors = 1, Teachers = 2, Lawyers = 3
2.   Ordinal      Numbers are used as rank order, e.g. 1st,     Counting; ranking (greater than,
                  2nd, 3rd, etc.; Mary = 1st, Joy = 2nd         less than)
3.   Interval     Intervals between any two numbers are equal   Counting, ranking, addition and
                  both in interval and quantities, e.g.         subtraction
                  scores, calendar, etc.
4.   Ratio        Each number is a distance from zero. There    Counting, ranking, addition,
                  is an absolute zero, e.g. ruler, weighing     subtraction, multiplication
                  balance, speedometer, etc.                    and division
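
The table can also be read as a rule: each scale allows everything the scales below
it allow, plus something new. The following Python sketch encodes that rule. The
names Scale, NEW_OPERATIONS, permitted_operations and is_permitted are hypothetical
and exist only for this illustration; the operations listed are those in the table.

```python
from enum import IntEnum


class Scale(IntEnum):
    """The four levels of measurement, from simplest to highest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Operations that first become meaningful at each level (from the table).
NEW_OPERATIONS = {
    Scale.NOMINAL: {"counting"},
    Scale.ORDINAL: {"ranking"},
    Scale.INTERVAL: {"addition", "subtraction"},
    Scale.RATIO: {"multiplication", "division"},
}


def permitted_operations(scale):
    """Collect the operations of this scale and of every scale below it."""
    operations = set()
    for level in Scale:
        if level <= scale:
            operations |= NEW_OPERATIONS[level]
    return operations


def is_permitted(operation, scale):
    """Check whether an operation is meaningful on the given scale."""
    return operation in permitted_operations(scale)


print(sorted(permitted_operations(Scale.ORDINAL)))
print(is_permitted("division", Scale.INTERVAL))
```

For example, is_permitted("subtraction", Scale.INTERVAL) is true, while
is_permitted("division", Scale.INTERVAL) is false, in line with the table.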

6.0 ACTIVITY

1. What is a scale?
2. Explain the concept of discrete and continuous data in line with the four
levels of scale.

3. Match each of the following to its appropriate scale: number of books,
Mathematics scores, Celsius scale, states and capitals in Nigeria, position of
students in a class, cars, houses, telephone numbers, jersey numbers of football
players, weighing scale, temperature gauge, English language test scores, ruler
and tape.

7.0 SUMMARY

In this unit you have looked at the major functions of tests in education. These
include:
• determining how much the students have learned
• determining the students’ special difficulties
• determining the students’ special abilities
• determining the strengths and weaknesses of teaching methods
• determining the adequacy or otherwise of instructional materials
• determining the extent of achievement of the objectives.

You were also shown the four scales of measurement. These are the nominal scale,
which involves simple classification; the ordinal scale, which involves rank order;
the interval scale, which involves equal intervals and equal quantities but has no
absolute zero; and the ratio scale, which has an absolute zero and is especially
used in the physical sciences.

8.0 ASSIGNMENT

1. Differentiate between physical and psychological measurement.
2. Carry out a brief comparative analysis of the different scales.
3. Categorise the four scales of measurement from the highest to the simplest.

9.0 REFERENCES

Asuru, V. A. (2006). Measurement and Evaluation in Education and Psychology.
Port Harcourt: Minson Nigeria Limited.

Monday, T. J. (2005). Fundamental of Tests and Measurement Education. Calabar:
Ultimate Index Book Publishers, University of Calabar Press.

Nwana, O. C. (1981). Educational Measurement for Teachers. Ikeja: Thomas
Nelson Africa.

Ohuche, R. O. and Akeju, S. A. (1977). Testing and Evaluation in Education. Lagos,
Monrovia, Owerri: African Educational Resources (AER).

Omoroginwa, O. K. (2010). An Introduction to Educational Measurement and
Evaluation. Benin City: Perfect Touch.

MODULE 2 EDUCATIONAL OBJECTIVES

Unit 1 Educational Objectives
Unit 2 Taxonomy of Educational Objectives
Unit 3 Affective Domain
Unit 4 Psychomotor Domain
Unit 5 Classroom Tests

UNIT 1 EDUCATIONAL OBJECTIVES


CONTENTS

1.0 Introduction
2.0 Objectives
3.0 How to Study this Unit
4.0 Word Study
5.0 Main Content
5.1 Various Levels of Educational Objectives
5.2 At the National Level
5.3 At the Institutional Level
5.4 At the Instructional Level
5.5 Importance of Instructional Objectives
6.0 Activity
7.0 Summary
8.0 Assignment
9.0 References

1.0 INTRODUCTION

Objectives generally indicate the end points of a journey. They specify where you
want to be or what you intend to achieve at the end of a process. An educational
objective is the achievement which a specific educational instruction is expected to
bring about. It is the outcome of any educational instruction, the purpose for which
any particular educational undertaking is carried out and the goal of any
educational task. In this unit we shall look at the different roles educational
objectives assume in different settings.

2.0 OBJECTIVES
By the end of this unit you should be able to:
• list the different levels of educational objectives;
• give the differences between aims and objectives;
• write specific objectives at the instructional level; and
• explain the importance of feedback in an instructional situation.

3.0 HOW TO STUDY THIS UNIT

1. Read through this unit carefully.
2. Study the unit step by step as the points are well arranged.

NOTE: All answers to activities and assignments are at the end of this book.
This applies to every other unit in this book.

4.0 WORD STUDY

• Objectives: The foundation upon which you can build lessons and assessments
that meet your overall course or lesson goals.

• Values: Important and lasting beliefs or ideals shared by the members of a
culture about what is good or bad and desirable or undesirable.

• Aspirations: Hopes or ambitions of achieving something.

• Goals: Observable and measurable end results, each having one or more
objectives to be achieved within a more or less fixed time frame.

• Monotechnic: An institution, especially a college, providing instruction in a
single technical subject.

5.0 MAIN CONTENT

5.1 Various Levels of Educational Objectives

Educational objectives can be specified at various levels. These levels include the
national level, the institutional level and the instructional level. In this unit, we shall
look at the various levels.

5.2 At the National Level

At this level of educational objectives, we have merely policy statements of what
education should achieve for the nation. They are in broad outlines, reflecting
national interests, values, aspirations and goals. The objectives are general and
somewhat vague, and so are open to interpretation. They can take the form of a
document such as the National Policy on Education. Have you seen the Federal
Republic of Nigeria’s National Policy on Education?

The latest edition is the 2004 edition, and another edition will soon be released.
The national educational aims and objectives are stated in it; try to get a copy for
your own use. The goals of education include:

i. To motivate national consciousness and national unity.
ii. To inculcate the right type of values and attitudes for the survival of the
individual and the Nigerian society.
iii. To train the mind in the understanding of the world around.
iv. To acquire the appropriate skills, abilities and competences, both mental and
physical, as equipment for the individual to live in and contribute to the
development of the society.

Apart from these goals of education in Nigeria, the National Policy on Education
also specifies objectives for early childhood education, primary education,
secondary education, adult, technical, teacher and special education, university
and other forms of education.

These educational goals can also be specified in the National Development Plan.
Let us look at the goals specified in the Nigerian National Development Plan. It
calls for the development of (i) a strong and self-reliant nation, (ii) a great and
dynamic economy, (iii) a just and egalitarian society, (iv) a land of bright and
full opportunity for all citizens and (v) a free and democratic society. You will
recall that we said earlier that these are called goals or aims. They are too broad
and vague to give focused direction to curriculum development.

5.3 At the Institutional Level

This is the intermediate level of objectives. The aims are logically derived from
and related to both the ones at the national level and the ones at the
instructional level. What are the objectives of your university? By the time you
have looked at the educational objectives of three or four institutions, you will
have noticed that each institution’s objectives are narrowed to meet local needs,
such as the kinds of certificate to be awarded by the institution. These
institutional objectives are usually specified by an act or edict of the House of
Assembly if it is a state government institution, otherwise by an act of the
National Assembly.

The National Open University of Nigeria (NOUN) and the National Teachers’ Institute
(NTI) were established by an Act of the Parliament or National Assembly. You can
read about this again in your handbook (Know Your University). These objectives are
often in line with those stipulated for that kind of institution at the national
level. They are not specific; they are also broad aims. For instance, an objective
for a college of education can be to produce intermediate-level teachers to teach in
pre-primary, primary and secondary schools. What do you think would be the
objectives of a monotechnic which awards a diploma in Agriculture?

5.4 At the Instructional Level

You have seen that objectives specified at both the national and institutional
levels are all broad goals and aims. These can be realized in bits and pieces at the
instructional level. Here, educational objectives are stated in the form in which
they are to operate in the classroom.

They are therefore referred to as instructional objectives, behavioural objectives
or learning outcomes. They are specified based on the intended learning outcomes.
These objectives state what teaching is expected to achieve, what the learner is
expected to learn from the instruction, how the learner is expected to behave after
being subjected to the instruction and what he has to do in order to demonstrate
that he has learnt what is expected from the instruction. These instructional
objectives are therefore stated in behavioural terms, with the use of action verbs
to specify the desirable behaviour which the learner will exhibit in order to show
that he has learnt. Now, look at the objectives in section 2.0 of this unit, of any
other unit or of any other course material. How are the objectives specified?

They are examples of instructional objectives. They are learner-centred, not
teacher-centred. They can easily be assessed or observed. They are specified for
each unit, lesson, etc.

5.5 Importance of Instructional Objectives

So far, you can see that instructional objectives are a very important component
of the teaching system. Let us mention some of their uses, especially as a feedback
mechanism in the education system. Learning outcomes as displayed by the learners
serve as feedback on how much the instructional objectives have been achieved. They
also show how appropriate the curriculum of the institution is. The instructional
objectives can in turn be used as feedback on how much the institutional objectives
have been achieved and how appropriate those objectives are.

You remember that objectives run from broad goals at the national level down to the
instructional level. Evaluation runs in the opposite direction: it starts with the
instructional-level objectives and proceeds through the institutional level to the
national level. In other words, the feedback got from the assessment of the
instructional objectives is translated into findings on how much the national
educational objectives have been achieved in respect of the particular type of
institution, and on their appropriateness. In the final analysis, the findings may
lead to the revision of the objectives at any level or at all the levels. They may
lead to curriculum modification at the institutional level. At the instructional
level, they may lead to the adjustment of teaching methods or the provision of
instructional materials. You see, the small things done in the classroom setting
(activities, tests, examinations, projects, assignments, exercises, quizzes,
homework, etc.) can be used to evaluate, in a general process, the national policy
at the national level.

Apart from providing feedback, instructional objectives are also important because
the teacher’s plan of what to teach and how to teach it is based on the objectives
specified to be achieved. The evaluation of pupils’ learning outcomes lets him know
whether the objectives are being achieved or not. It means therefore that the
instructional objectives give meaning and direction to the educational process.

6.0 ACTIVITY

1. What is an instructional objective?
2. State and explain three characteristics of a well-written objective.
3. What are the roles of objectives in teaching and learning?

7.0 SUMMARY

In this unit, you learned the various levels of objectives. These are the national level,
the institutional level and the instructional level. We have said that objectives at the
national level are very broad and vague. They are referred to as goals, aims or policy
such as the National Policy on Education (NPE). You also learned that the
objectives at the institutional level are also broad.

But the objectives at the instructional level or classroom level are specific and stated
in behavioural terms, specifying what the learners are expected to do at the end of
the lesson, using action verbs. Objectives at the national level can only be realized
in bits and pieces through the contributory influence of the many formal and
informal learning experiences at school, home and in the society.

For the nation to achieve her goals, they must be realized after formal education
has been concluded.

Since the schools are institutions consciously created to ensure desirable changes
in human behaviour towards the ultimate realization of the national goals, they have
to make conscious efforts to ensure the attainment of the goals. This can be done
through a systematic translation of these goals into institutional objectives, and
then into instructional objectives. These should be stated in such a way that they
can be observed or measured.

Objectives are very important as they provide the necessary feedback for the
adjustments of curriculum, teaching method and teaching aids among others.

8.0 ASSIGNMENT

1. List four reasons why instructional objectives are important.
2. State two differences between objectives and goals.
3. As a prospective teacher, mention five benefits of incorporating objectives
within your coursework.

9.0 REFERENCES

Anderson, L. W., Krathwohl, D. R., & Bloom, B. S. (2000). Taxonomy for Learning,
Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational
Objectives. White Plains, NY: Longman.

Dick, W., Carey, L., & Carey, J. O. (2001). The Systematic Design of Instruction
(5th edition). Boston, MA: Addison Wesley.

Nenty, H. J. (1985). Fundamental of Measurement and Evaluation in Education.
Calabar: University of Calabar Press.

Obimba, F. U. (1989). Fundamentals of Measurement and Evaluation in Education
and Psychology. Owerri: Totan Publishers Ltd.

Omoroginwa, O. K. (2010). An Introduction to Educational Measurement and
Evaluation. Benin City: Perfect Touch.

UNIT 2 BLOOM’S TAXONOMY OF EDUCATIONAL
OBJECTIVES

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 How to Study this Unit
4.0 Word Study
5.0 Main Content
5.1 Cognitive Domain
5.2 Knowledge or Memory Level
5.3 Comprehension Level
5.4 Application Level
5.5 Analysis Level
5.6 Synthesis Level
5.7 Evaluation Level
6.0 Activity
7.0 Summary
8.0 Assignment
9.0 References

1.0 INTRODUCTION

Benjamin Bloom put forward a taxonomy of educational objectives, which provides a
practical framework within which educational objectives can be organized and
measured. In this taxonomy, Bloom et al. (1956) divided educational objectives into
three domains. These are the cognitive domain, the affective domain and the
psychomotor domain.

In this unit, we shall look at the cognitive domain in some detail; in later units
we shall look at the others.

2.0 OBJECTIVES
By the end of this unit, you should be able to:
• draw the hierarchical diagram of Bloom’s cognitive domain;
• explain the levels of the cognitive domain; and
• specify objectives corresponding to each of the levels.

3.0 HOW TO STUDY THIS UNIT

1. Read through this unit carefully.
2. Study the unit step by step as the points are well arranged.

NOTE: All answers to activities and assignments are at the end of this book.
This applies to every other unit in this book.

4.0 WORD STUDY

• Taxonomy: The classification and naming of organisms in an ordered system that
is intended to indicate natural relationships, especially evolutionary
relationships. In this unit the term is applied to the ordered classification of
educational objectives.
• Hierarchical: Of the nature of a hierarchy; arranged in order of rank.
• Operational difficulties: Difficulties in a process or series of actions for
achieving a result.
• Cues and clues: Things you hear or find that help you to achieve a task.
• Abstractions: The results of taking away or removing characteristics from
something in order to reduce it to a set of essential characteristics.
• Intellectual abilities: General mental capacities, such as learning, reasoning
and problem solving.

5.0 MAIN CONTENT

5.1 Cognitive Domain


The cognitive domain involves those objectives that deal with the development of
intellectual abilities and skills. These have to do with the mental abilities of the
brain. The domain is categorized into six hierarchical levels: knowledge,
comprehension, application, analysis, synthesis and evaluation. These levels are
hierarchical and of increasing operational difficulty, so that achievement of a
higher level of skill assumes the achievement of the previous levels. This implies
that a higher level of skill can be achieved only if a certain amount of the ability
called for by the previous level has been achieved. For instance, you cannot apply
what you do not know or comprehend. Can you now understand what it means to be
hierarchical?

Now let us look at the components of the cognitive domain.

Complex
    EVALUATION
    SYNTHESIS
    ANALYSIS
    APPLICATION
    COMPREHENSION
    KNOWLEDGE (MEMORY)
Simple

Fig. 5.1: Hierarchical levels of Bloom’s taxonomy
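
The idea that each level presupposes the levels below it can be expressed as a
simple ordered list. The sketch below is written in Python purely for illustration;
the names BLOOM_LEVELS and prerequisites are assumptions and do not come from
Bloom's work. Given a level, it returns the levels that must already have been
achieved.

```python
# The six cognitive levels, ordered from simple to complex as in Fig. 5.1.
BLOOM_LEVELS = [
    "knowledge",
    "comprehension",
    "application",
    "analysis",
    "synthesis",
    "evaluation",
]


def prerequisites(level):
    """Return the levels that come before the given level in the hierarchy."""
    position = BLOOM_LEVELS.index(level)
    return BLOOM_LEVELS[:position]


print(prerequisites("application"))
# ['knowledge', 'comprehension']
```

The result for "application" restates the point made above: you cannot apply what
you do not know or comprehend.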

5.2 Knowledge (or Memory)

If you have studied Fig. 5.1 above, you will have noticed that knowledge or memory
is the first and lowest level, and the foundation for the development of
higher-order cognitive skills. It involves the recognition or recall of previously
learned information. There is no demand for understanding or internalization of
information. For measurement purposes, memory or knowledge involves bringing to
mind the appropriate material. This cognitive level emphasizes the psychological
process of remembering. Action verbs which can serve as appropriate signals, cues
and clues to bring out information stored in the mind include: define, name, list,
tell, recall, identify, remember, who, which, where, when, what, recognize, how
much, how many, etc.

You can now use the above verbs to formulate instructional objectives.

5.3 Comprehension Level

You remember that we said that memory is concerned with the accumulation of a
repertoire of facts, specifics, ways and means of dealing with specifics,
universals, abstractions, etc. It implies that memory involves verbalization and
rote learning. Comprehension is all about the internalization of knowledge. It
involves making meaning out of what is stored in the brain. It is on this basis that
what is stored in the brain can be understood and translated, interpreted or
extrapolated. It is only when you have known something that you can understand it.
Again, it is only when you know and understand that you can re-order or re-arrange.
Action verbs here include explain, represent, restate, convert, interpret,
re-arrange, re-order, translate, rephrase, transform, etc.

Comprehension level is made up of the following:

i. Translation: the ability to understand literal messages across communication
forms, changing what is known from one form of communication to another, e.g.
from words to numbers, graphs, maps, charts, cartoons, pictures, formulas,
symbols, models, equations, etc.

ii. Interpretation: this goes beyond mere literal translation to the identification
of inter-relationships among the parts and components of a communication, and
relating these to the main components, e.g. interpreting a chart or graph.

iii. Extrapolation: the ability to draw implications, to identify and continue a
trend, to isolate or detect consequences, to suggest possible meanings and to
estimate possible effects.

5.4 Application Level

In the last section, we noted that you cannot understand what you have not known.
It also means that you cannot apply what you do not understand. The use of
abstractions in a concrete situation is called application. These abstractions can
be in the form of general ideas, rules, procedures, generalized methods, technical
terms, principles and theories which must be remembered, understood and applied.

You must understand what is meant before you can apply it correctly. The ability to
apply what is learned is an indication of a more permanent acquisition of learning.
Application skills are developed when the learner uses what he knows to solve a new
problem or to act in a new situation. Application involves the ability of the
learner to grasp exactly what the problem is all about and what generalizations or
principles are relevant, useful or pertinent for its solution. Some action verbs
here include: apply, build, explain, calculate, classify, solve, specify, state,
transfer, demonstrate, determine, design, employ, predict, present, use, which,
restructure, relate, organize, etc. Application involves the principle of transfer
of learning.

5.5 Analysis Level

This is the breaking down of a communication into its constituent parts or elements
in order to establish the relationships between them or to make the relations
between the ideas expressed clear or explicit. It means breaking learnt material
into parts, ideas and devices for clearer understanding.

It goes beyond application and involves such action verbs as analyse, detect,
determine, establish, compare, why, discriminate, distinguish, check consistency,
categorise, establish evidence, etc.

The components here include:

i. Analysis of elements: the ability to identify the underlying elements, such as
assumptions, hypotheses, conclusions, views, values, arguments, statements, etc.,
and to determine the nature and functions of such elements.
ii. Analysis of relationships: determining how the elements identified are related
to each other. For instance, how does the evidence relate to the conclusion?
iii. Analysis of organizational principles: determining the principles or system of
organization which holds the different elements and parts together. It involves
finding the pattern, the structure, systematic arrangements, point of view, etc.

5.6 Synthesis Level

In sub-section 5.5, you learnt that analysis involves the breaking down of
materials, communications, objects, etc. In synthesis, by contrast, you build up or
put together elements, parts, pieces and components in order to form a unique whole
or to constitute a new form, plan, pattern or structure. In other words, synthesis
is concerned with the ability to put parts of knowledge together to form new
knowledge. It involves the categorizing of items, the composing of poems and songs,
writing, etc. It involves divergent thinking. It calls for imaginative, original
and creative thinking. You will note that the creative thought process results in
the discovery of knowledge that is new or of something that is tangible. It calls
for creative answers to problems and for the development of a questioning mind, a
spirit of inquiry or an inquisitive mind.

It requires fluency of novel ideas and a flexible mind. It allows students great
freedom in looking for solutions, using many possible approaches to problem
solving. Action verbs include: plan, develop, devise, write, tell, make, assemble,
classify, express, illustrate, produce, propose, specify, suggest, document,
formulate, modify, organize, derive, design, create, combine, construct, put
together, constitute, etc. Synthesis can be subdivided into:

(a) Production of a unique communication: the ability to put together, in a unique
organizational form, a piece of written or oral communication to convey a novel
idea, feeling or experience to others.

(b) Production of a plan or proposed set of operations: the ability to develop a
plan or to propose procedures for solving problems or dealing with others.

(c) Derivation of a set of abstract relations: this is based on the result of the
analysis of experimental data, observations or other specifics. It is the ability
to form concepts and generalizations and to deduce propositions, predictions or
relationships based on the classification of experiences or observations.

5.7 Evaluation Level

Your knowledge of the meaning of evaluation is not very different from this level of
the cognitive domain. It is the highest in the hierarchy. It involves making a
quantitative or qualitative judgment about a piece of communication, a procedure, a
method, a proposal, a plan, etc., based on certain internal or external criteria.
Alternatives abound, and choice depends on the result of judgments which we make
consciously or unconsciously based on the values we hold. Every day, we make
judgments such as good or bad, right or wrong, agree or disagree, fast or slow, etc.

These are simple evaluations. They may not be based on logical or rational
judgment. In education, evaluation as a cognitive objective involves the learner’s
ability to organize his thoughts and knowledge to reach a logical and rational
decision which is defensible.

Evaluation is the most complex of human cognitive behaviours. It embodies elements
of the other five categories. What are the other categories? Can you name them?
They are knowledge, comprehension, application, analysis and synthesis. Action
verbs here include: agree, assess, compare, appraise, choose, evaluate, why,
validate, judge, select, conclude, consider, decide, contrast, etc. Evaluation can
be subdivided into (a) judgment in terms of internal criteria and (b) judgment in
terms of external criteria.

6.0 ACTIVITY

1. Give a brief history of Benjamin Bloom's taxonomy.
2. In tabular form, compare and contrast Bloom's original domain and the new
domain.
3. Critically present the new version of Bloom's Taxonomy, with its keywords.

7.0 SUMMARY

In this unit, you have learnt that Bloom classified educational objectives of an
intellectual nature or the cognitive domain into six groups, which form a hierarchy
of mental skills from the lowest and easiest level, knowledge or memory to the
highest and most difficult level, evaluation. Knowledge and comprehension are
regarded as the low cognitive objectives, while application, analysis, synthesis and
evaluation are regarded as the higher cognitive objectives.

You have seen that objectives under knowledge require simple straight recall or
recognition of facts and principles based on the accuracy of the memory.
Comprehension implies a thorough or reasonable familiarity with principles and
facts and their recognition when stated in different words or when they appear in
different circumstances. The learner should be able to give explanations to them.
Application implies transfer of learning. Learners should be able to deduce results,
underlying principles and environmental conditions given the effects on the system.
Analysis involves taking apart, breaking into parts etc. in order to see the
relationship, while synthesis is about building, constructing, assembling, etc. It
involves creativity, divergent thinking, critical thinking etc. Evaluation, which is
the highest in the hierarchy, is the ability to judge the value of subject-matter that is
learnt in relation to a specific purpose. In the next unit, we shall look at the other
aspects of the educational objectives – Affective and Psychomotor.

8.0 ASSIGNMENT

1. Why use Bloom's Cognitive Taxonomy?
2. What is the cognitive domain? Give a practical example.
3. What is cognitive development and memory?

9.0 REFERENCES

Anderson, L. W., Krathwohl, D. R., Airasian, P. W., Cruikshank, K. A., Mayer, R. E.,
Pintrich, P. R., Raths, J., Wittrock, M. C. (2001). Taxonomy for Learning,
Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational
Objectives. New York: Pearson, Allyn & Bacon.

Clark, R., Chopeta, L. (2004). Graphics For Learning: Proven Guidelines For
Planning, Designing, And Evaluating Visuals In Training Materials. San
Francisco: Jossey-Bass/Pfeiffer.

Nenty, H. J. (1985). Fundamentals of Measurement and Evaluation in Education.
Calabar: University of Calabar.

Nwana, O. C. (1981). Educational Measurement for Teacher. Ikeja – Lagos, Hong
Kong, Ontario, Kenya: Thomas Nelson Africa.

Obimba, F. U. (1989). Fundamentals of Measurement and Evaluation in Education
and Psychology. Owerri: Totan Publishers Ltd.

UNIT 3 AFFECTIVE DOMAIN

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 How to Study this Unit
4.0 Word Study
5.0 Main Content
5.1 Characteristic Features of Affective Domain
5.2 Receiving
5.3 Responding
5.4 Valuing
5.5 Organization
5.6 Characterization by a Value or a Value Complex
6.0 Activity
7.0 Summary
8.0 Assignment
9.0 References

1.0 INTRODUCTION

In the past, people were not very happy about emotionalism in education. They
argued that intellectualism had little or nothing to do with the learner's interests,
emotions or impulses.

Today, people have recognized that the learner's feelings and emotions are
important considerations in education. This is why an interest group led by
Tanner and Tanner (1975) insists that the primary goals of learning are affective.
They are of the opinion that learners should not learn what is selected for them by
others, because this amounts to imposing other people's values and purposes on
the learners. This, of course, defies the learners' own feelings and emotions. You
can see that this argument is in contrast to what happens in our schools today, where
most schools hold that the fundamental objectives are cognitive.

As a matter of fact, what we have in our school systems are discipline-centred
curriculum projects which focus on cognitive learning to the neglect of affective
processes. Although the primary goal of a good teacher is to help students learn, not
to make them feel good, it is still an important role of a good teacher to make
students feel good about their efforts to learn and their success in learning. This will
help to create a balance and interdependence between the cognitive and the affective
processes of learning. In this unit, we shall be concerned with the affective
domain and its characteristics. Before we do that, let us look at what you will
achieve by the end of the unit.

2.0 OBJECTIVES

By the end of this unit you should be able to:

 explain the meaning of affective domain;
 describe the levels of affective domain;
 state objectives in the affective domain; and
 mention the characteristic features of affective domain.

3.0 HOW TO STUDY THIS UNIT

1. Read through this unit carefully.
2. Study the unit step by step as the points are well arranged.

NOTE: All answers to activities and assignment are at the end of this book.
This applies to every other unit in this book.

4.0 WORD STUDY

 Aesthetic: This is concerned with beauty or the appreciation of beauty.

 Sensibilities: (a) Capacity for sensation or feeling; responsiveness or
susceptibility to sensory stimuli. (b) Mental susceptibility or responsiveness;
quickness and acuteness of apprehension or feeling.

 Capacity for feeling: The power of perceiving by touch; also the
capacity for emotion, especially compassion, as in "to have great feeling for".

 Attachment: A feeling of affection or an emotional bond towards a person,
object or idea.

 Sympathy: The feelings of pity and sorrow for someone else's misfortune.

 Empathy: The feeling that you understand and share another person's
experiences and emotions.

5.0 MAIN CONTENT

5.1 Characteristic Features of Affective Domain

While you were reading the introduction to this unit, you learnt that the affective domain
has to do with feelings and emotions. These, together with the degree of acceptance
or rejection, are the distinguishing characteristics of this domain. It is concerned
with interests, attitudes, appreciation, emotional biases and values. The function of
the affective domain in the instructional situation pertains to emotions, the passions,
the dispositions, the moral and the aesthetic sensibilities, the capacity for feeling,
concern, attachment or detachment, sympathy, empathy, and appreciation.

Can you now see that your feelings, emotions, appreciation and the value you place on
this course together form your affective disposition towards the course? It shows your
personal-social adjustment in this course and indeed in the programme. Basically,
the extent to which you, as a learner, have internalized and appreciated what you have
been taught or what you have learnt is demonstrated in your attitudes, likes and
dislikes etc. The affective domain is generally covert in behaviour. The educational
objectives here vary from simple attention to complex and internally consistent
qualities of character and conscience.

Examples of learning outcomes in the affective domain are:

 The learner will be able to appreciate the use of drawing instruments in the
construction of objects in technical drawing.
 The learner will be able to show awareness of the rules and regulations in a
technical workshop to prevent accidents.
 The learner should be able to show a liking for neatness and accuracy in
the use of measurement instruments etc.

Affective domain has five hierarchical categories. You remember that the cognitive
domain has six hierarchical levels.

Specifically, the levels in the affective domain are: receiving,
responding, valuing, organization and characterization.

5.2 Receiving

This is the lowest level of the learning outcomes in the affective domain. It means
attending. It is the learner's willingness to attend to a particular stimulus or his being
sensitive to the existence of a given problem, event, condition or situation. It has
three sub-levels.

These are:

i. Awareness: This involves the conscious recognition of the existence of
some problems, conditions, situations, events, phenomena etc. Take, for
instance, a case where you, as a teacher, come into your class while the students
are making noise. You will notice that the atmosphere changes. This is because the
students have become aware of your presence. They are merely aware.

ii. Willingness: This is the next stage, which involves the ability to acknowledge
the object, event or problem instead of ignoring or avoiding it. The students in
your class kept quiet because they noticed and acknowledged your presence.
If they had ignored your presence, they would have continued to make noise in the
class.

iii. Controlled or selected attention: This involves the learner selecting or
choosing to pay attention to the situation, problem, event or phenomenon.
When you teach in the class, the learner is aware of what you are saying or the points
you are making. In that case he will deliberately shut out other messages,
speeches or sounds as noise. Receiving in a classroom situation involves
getting, holding and directing the attention of the learners to whatever the
teacher has to say in the class.

5.3 Responding

In this case the learner responds to the event by participating. He does not only
attend, he also reacts by doing something. If in your class you set a test for your
students, first the students become aware of the test, they are willing to take the test,
they now select to do it and they react by doing it. Responding has three sub-levels
too. These are:

i. Acquiescence in responding: This involves simple obedience or compliance.

ii. Willingness to respond: This involves voluntary responses to a given
situation.

iii. Satisfaction in response: Here the learner is satisfied with his response and
enjoys reacting to that type of situation. If in the school situation you give a project
to your class and they comply by doing the project very well, they are satisfied
with what they have been able to produce. They will be happy and would
wish to have that type of project again and again. That shows that their
interest is now awakened.

5.4 Valuing

This is concerned with the worth or value or benefit which a learner attaches to a
particular object, behaviour or situation.

This ranges in degree, from mere acceptance of value or a desire to improve group
skills to a more complex level of commitment or an assumption of responsibility for
the effective functioning of the group. As usual, there are three sub-levels of
valuing:

i. Acceptance of a value: This is a situation where the learner believes
tentatively in a proposition, doctrine, condition or situation, e.g. a denomination
says that women should be ordained priests in the churches, and the members
accept the doctrine.

ii. Preference for a value: In this case, the learner believes in the desirability or
necessity of the condition, doctrine, proposition etc. and ignores or rejects
other alternatives, and deliberately looks for other people's views where the
issues are controversial, so as to form his own opinion.

iii. Commitment to a value: In this stage, the learner is convinced and fully
committed to the doctrine, principle or cause. In consequence, the learner
internalizes a set of specific values, which consistently manifest themselves
in his overt behaviour, attitudes and appreciation.

Now let us go to the next level of affective domain.

5.5 Organization

In this level the learner starts to bring together different values as an organized
system. He determines the interrelationships and establishes the order of priority by
comparing, relating and synthesizing the values. He then builds a consistent value
system by resolving any possible conflicts between them. If the learner tries to
successfully internalize the value, he may encounter some situations which may
demand more than one value. In this case, he has to organize the values into a
system in order to decide which value to emphasize.

There are two sub-levels of organization. These are:

i. Conceptualization of a Value

This involves the understanding of the relationship of abstract elements of a value to
those already held or to new values which are gaining acceptance. In other words,
you may have to evaluate the works of art which you appreciate, or to examine and
clarify some basic assumptions about codes of ethics. It may be in the area
of music, where you may have to identify the characteristics of two types of music
which you admire or enjoy, such as classical and hip hop music, in relation to
others, such as jazz or highlife, which you do not like. We have used works of art
and music as examples. Can you think of any other examples? Think of different
types of vehicles, songs, colours, designs, prints etc.

ii. Organization of a Value System

This involves the development of a complex value system, which includes those
values that cannot be compared, for the purpose of making choices that
promote public welfare instead of the sheer aggrandizement of special personal
interests. For instance, you may be in a position to weigh your interest in works of
art against other values. You may be in a situation where you compare alternative
social policies and practices against the standards of public welfare. It is this level
that leads individuals to develop vocational plans which can satisfy their needs for
economic security and social welfare. It leads the individual to develop a philosophy
of life which helps him to avoid dependence upon others, especially to avoid a
situation where one becomes a public nuisance. You can see that this level is a very
important one.

5.6 Characterization by a Value or a Value Complex

At this stage the value system is so internalized by individuals or groups that
they act consistently in accordance with the values, beliefs or ideals that comprise
their total philosophy or view of life. A life-style which reflects these beliefs and
philosophy is developed. The behaviour of such individuals or groups can be said
to be controlled by the value system.

This is true, as one can almost predict with accuracy how an individual would
behave or respond. There are two sub-levels here:

i. Generalized set: This involves a situation where the orientation of the
individual enables him to reduce to order a complex environment and to act
consistently and effectively in it. There may be room for the individual to
revise his judgements and to change his behaviour as a result of available
new and valid evidence.

ii. Characterization: In this case, the internalization of a value system is such
that the individual is consistently acting in harmony with it.
The value system regulates the individual's personal and civil life according
to a code of behaviour based on ethical principles. You will notice that at the
level of the value complex the individual develops typical or characteristic behaviour.
Instructional objectives here should be concerned with the learner's general
pattern of personal, social, or emotional adjustment.

6.0 ACTIVITY

1. In tabular form, state and describe the levels of the affective domain.
2. List the levels of the affective domain with their corresponding action verbs
describing learning outcomes.
3. Write short notes on valuing and organization.

7.0 SUMMARY

In this unit you have gone through the affective domain, where you learnt that its
characteristic features have to do with feelings, emotions, degree of
acceptance or rejection, interests, attitudes, appreciation, emotional biases and values
etc.

You also learnt that the affective domain, like the cognitive domain has hierarchical
categories. But while the cognitive domain has six levels, the affective domain has
five levels.

These levels are:

 Receiving: This is the lowest level and has three sub-levels: awareness,
willingness and controlled or selected attention.
 Responding: This also has three sub-levels: acquiescence in responding,
willingness to respond and satisfaction in response.
 Valuing: This also has three sub-levels: acceptance of a value, preference
for a value and commitment to a value.
 Organization: This has two sub-levels: conceptualization of a value and
organization of a value system.
 Characterization by a value or value complex: This has two sub-levels:
generalized set and characterization.

You also learnt that the cognitive and affective processes are interrelated.

8.0 ASSIGNMENT

1. List any five instruments for measuring the affective domain.
2. Explain any two of the instruments mentioned.
3. Differentiate between a rating scale and a checklist.
4. State three characteristics each of a rating scale and a checklist.

9.0 REFERENCES

Fenwick, T. and Parsons, J. (2000). The art of evaluation: A handbook for educators
and trainers. Toronto, Ontario: Thompson Educational Publishing, Inc.

Krathwohl, D. R. (2001). Taxonomy of educational objectives: Handbook II,
affective domain. New York: David McKay Co.

Obimba, F. U. (1989). Fundamentals of Measurement and Evaluation in Education
and Psychology. Owerri: Totan Publishers Ltd.

Onwuka, U. (1990). Curriculum Development for Africa. Onitsha: Africana – FEP
Publishers Ltd.

Omoroginwa, O. K. (2010). An Introduction to Educational Measurement and
Evaluation. Benin City: Perfect Touch.

UNIT 4 PSYCHOMOTOR DOMAIN

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 How to Study this Unit
4.0 Word Study
5.0 Main Content
5.1 Psychomotor Domain
5.2 Reflex Movements
5.3 Basic Fundamental Movements
5.4 Perceptual Abilities
5.5 Physical Abilities
5.6 Skilled Movements
5.7 Non-discursive Communication
6.0 Activity
7.0 Summary
8.0 Assignment
9.0 References

1.0 INTRODUCTION

In the last two units, you have worked through our discussions of the cognitive and
affective domains of the educational objectives. You have to note that emphasis on
either the cognitive or the affective domain will not develop the interdependence
between both. They cannot operate without the third domain – psychomotor. We
can only produce an educated individual when the three domains come into play.
Educational objectives cannot be complete without the psychomotor domain. This is
why we discuss this domain of educational objectives in this unit.

2.0 OBJECTIVES

After working through this unit, you should be able to:

 explain the psychomotor domain of instructional objectives;
 discuss the six levels of psychomotor domain; and
 give examples of the activities needed in each level.

3.0 HOW TO STUDY THIS UNIT

1. Read through this unit carefully.
2. Study the unit step by step as the points are well arranged.

NOTE: All answers to activities and assignment are at the end of this book.
This applies to every other unit in this book.

4.0 WORD STUDY

 Interdependence: The quality or condition of being mutually reliant on each
other.

 Psychomotor: Relating to movement or muscular activity that is directed by
mental processes. A psychomotor test assesses the subject's ability to perceive
instructions and perform motor responses.

 Variety of tasks: A range of activities, each of which needs to be accomplished
within a defined period of time or by a deadline in order to work towards
work-related goals.

 Educators: An educator is a person or thing that educates, especially a teacher,
principal, or other person involved in planning or directing education.

 Complex nature: The quality of being so intricate as to be hard to understand
or deal with, as in "a complex problem".

 Psychomotor behaviours: These are performed actions that are
neuromuscular in nature and demand certain levels of physical dexterity.

5.0 MAIN CONTENT

5.1 Psychomotor Domain

In Unit 2, you learnt that the cognitive domain has to do with mental abilities. You
also noted in the last unit that the affective domain deals with feelings, emotions,
values etc. In the same way, the psychomotor domain has to do with motor skills or
abilities. It means, therefore, that the instructional objectives here will make
performance skills more prominent. The psychomotor domain has to do with
muscular activities. It deals with activities which involve the use of the limbs
(hands) or the whole of the body. These abilities are inherent in human beings and
normally should develop naturally.

Can you think of such abilities or skills? Consider the skills in running, walking,
swimming, jumping, eating, playing, throwing, etc. One would say that these skills
are natural.

Yet, for effective performance or high level performance of a wide variety of tasks,
it is necessary for educators to develop various skills of a more complex nature in
addition to the inherent ones. For instance, more complex skills can be developed
through learning in such areas as driving, drawing, sports, etc. Like the cognitive
and affective domains, the psychomotor domain is sub-divided into hierarchical
levels. From the lowest, we have (i) Reflex movements (ii) Basic fundamental
movements (iii) Perceptual abilities (iv) Physical abilities (v) Skilled movements and
(vi) Non-discursive communication. Now, let us take them one after the other and
discuss them briefly.

5.2 Reflex Movements

At the lowest level of the psychomotor domain is the reflex movement which every
normal human being should be able to make. The movements are all natural, except
where the case is abnormal, in which case it may demand therapy programmes.

Apart from the abnormal situations, educators are not concerned with these
movements. Now let us think of some examples.

Can you mention some of them? Your mind may have gone to the blinking of the
eyes, trying to dodge a blow or something thrown at you, jumping up when there is
danger, swallowing things, urinating or passing stool by a child, etc.

5.3 Basic Fundamental Movements

Like the case of reflex movements, these are basic movements which are natural.
Educators have little or nothing to do with them, except in abnormal cases where
special educators step in to assist. There are three sub-categories at this stage. These
are:
i. Locomotor movements, which involve movement of the body from place to
place, such as crawling, walking, leaping, jumping etc.

ii. Non-locomotor movements, which involve body movements that do not
involve moving from one place to another. These include muscular
movements and wriggling of the trunk, head and any other part of the body.
They also include turning, twisting etc. of the body.

iii. Manipulative movements, which involve the use of the hands or limbs to
move or control things etc.

5.4 Perceptual Abilities

This has to do with the senses and their development. It means, therefore, that
educators do not have much to do here except to direct the use of these senses in
association with certain conditions. Perceptual abilities are concerned with the
ability of individuals to perceive and distinguish things using the senses. Such
individuals recognise and compare things by physically tasting, smelling, seeing,
hearing, and touching. You can identify the sense organs associated with these
activities. With the use of a particular taste, smell, sound, appearance and feeling, you
can associate certain objects or situations with feelings in your mind and understand
them. These senses will then help you to determine conditions and the necessary
course of action.

5.5 Physical Abilities

These abilities fall in the area of health and physical education. You know that in
athletics and games or sports in general, you need physical abilities and that these
abilities can be developed to varying degrees of perfection with the help of
practice. This is why sportsmen and sportswomen always practise in order to improve
their endurance, strength, flexibility and agility. For instance, if you are a
goalkeeper, you will need to improve on these abilities to perform well.

5.6 Skilled Movements

This is a higher ability than the physical abilities. Once you have acquired the
physical abilities, you now apply various types of these physical abilities in making
or creating things. You can combine manipulation, endurance and flexibility
in writing and drawing. You can combine neuromuscular movements
with flexibility to help you in drawing. An individual can combine strength,
endurance, flexibility and manipulative movements in activities such as weight
lifting and combat sports like wrestling, boxing, karate, taekwondo and judo.

For skills like drumming, typing or playing the organ or the keyboard in music, you
will need a combination of manipulative movements, some perceptual abilities
and flexibility.

There are three sub-levels of the skilled movements. These are simple adaptive
skills, compound adaptive skills and complex adaptive skills.

5.7 Non-discursive Communication

This is the highest level which demands a combination of all the lower levels to
reach a high degree of expertise.

For instance, one can use the keyboard to play vibes and sounds, but it requires a
good deal of training, practice and the ability to combine certain movements of the
fingers in order to relay a message or to play a piece of classical music like that of
Handel. At the same time, it will also require a certain level of perceptual abilities
in order to be able to interpret or decode the messages or the music. It means that
both the player and the interpreter must be operating at the same level of skill.

Everybody that is normal can move his arms and legs. But you must have some
level of training, practice and the ability to combine a variety of movements and
some perceptual abilities in order to do diving, swimming, typing, driving, cycling
etc.

At the same time, you will also need these in order to read or interpret different
writings, longhand or shorthand. You need them to be able to manipulate your computer
set accurately to give you what you want. You need training to be able to browse on
the Internet and to derive maximum benefits from the use of modern information
and communication technologies.

There are two sub-levels of the non-discursive communication. They are expressive
movement and interpretive movement.

6.0 ACTIVITY

1. State the levels of the psychomotor domain as initiated by Elizabeth Simpson.
2. Identify the key verbs.
3. Explain any three of the levels stated.

7.0 SUMMARY

In this unit, you read that the psychomotor domain has to do with motor skills,
performance skills or muscular activities. It is divided into hierarchical categories,
from the lowest to the highest: (i) Reflex movements, which are natural. (ii) Basic
fundamental movements, which are subdivided into locomotor, non-locomotor
and manipulative movements. (iii) Perceptual abilities, which have to do
with the senses. (iv) Physical abilities, which have to do with endurance, strength,
flexibility and agility. (v) Skilled movements, which are in three sub-levels: simple
adaptive skills, compound adaptive skills and complex adaptive skills. (vi)
Non-discursive communication, which is the highest and combines all the other
levels. It has two sub-levels, which are the expressive movement and the
interpretive movement.

8.0 ASSIGNMENT

1. Give examples of an activity or demonstration and the evidence used to measure any
three of the following: guided response, mechanism, complex overt
response, adaptation and originality.
2. Identify the key verbs in guided response, mechanism, complex overt
response, adaptation and originality.
3. Explain the concepts of complex overt response, guided response and
mechanism as levels in the psychomotor domain.
4. Explain the contributions of any four proponents in the development of the
psychomotor domain other than Bloom.

9.0 REFERENCES

Obimba, F. U. (1989). Fundamentals of Measurement and Evaluation in Education
and Psychology. Owerri: Totan Publishers Ltd.

Onwuka, U. (1990). Curriculum Development for Africa. Onitsha: Africana – FEP
Publishers Limited.

UNIT 5 CLASSROOM TESTS

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 How to Study this Unit
4.0 Word Study
5.0 Main Content
5.1 Some Pitfalls in Teacher-Made Tests
5.2 Types of Test Forms Used in the Classroom
5.2.1 Essay test
5.2.2 Extended Responses or Free Response Type of Essay Test
5.2.1 Restricted–Response Type
5.2.2 Uses of Essay Test
5.2.3 Limitations of the Essay Test
5.2.4 How to Make Essay Tests Less Subjective
5.3 Objective Tests
5.3.1 Supply Test Items
5.3.2 Selection Test Items
6.0 Activity
7.0 Summary
8.0 Assignment
9.0 References

1.0 INTRODUCTION

The classroom test, which is otherwise called the teacher-made test, is an instrument of
measurement and evaluation. It is a major technique used for the assessment of
students' learning outcomes.

A classroom test can be an achievement or performance test and/or any other type of
test, such as a practical test, prepared by the teacher for his specific class and purpose
based on what he has taught.

The development of good questions or items for the purpose of a classroom
test cannot be taken for granted. An inexperienced teacher may write good items by
chance, but this cannot be relied upon. The development of good questions or items
must follow a number of principles, without which no one can guarantee that the
responses given to the tests will be relevant and consistent. In this unit, we shall
examine the various aspects of the teacher's own test.

As a teacher, you will be faced with several problems when it comes to one of your most
important functions: the evaluation of learning outcomes. You are expected to observe
your students in the class, workshop, laboratory, field of play etc. and rate their
activities under these varied conditions. You are required to correct and grade
assignments and homework. You are required to give weekly tests and end of term
examinations. Most of the time, you are expected to decide on the fitness of your
students for promotion on the basis of continuous assessment exercises, the cumulative
results of end of term examinations and the promotion examination given towards the end
of the school year. Given these conditions, it becomes very important that you
become familiar with the planning, construction and administration of good quality
tests. This is because in the next few years, when you graduate as a teacher, your tests
will play a very important role in the growth and progress of Nigerian
youth.

2.0 OBJECTIVES

After going through this unit you should be able to:

 list the different types of items used in classroom tests;
 describe the different types of objective questions;
 describe the different types of essay questions; and
 compare the characteristics of objective and essay tests.

3.0 HOW TO STUDY THIS UNIT

1. Read through this unit carefully.
2. Study the unit step by step as the points are well arranged.

NOTE: All answers to activities and assignment are at the end of this book.
This applies to every other unit in this book.

4.0 WORD STUDY

 Test: This is a procedure intended to establish the quality, performance, or
reliability of something, especially before it is taken into widespread use.

 Relevant: Closely connected or appropriate to the matter at hand.

 Consistent: Similar or unchanging in achievement or effect over a period of
time.

 Evaluating: Determining the value or worth of something.

 Learning outcomes: These are statements that describe significant and
essential learning that learners have achieved, and can reliably demonstrate
at the end of a course or program.

 Continuous assessment: This is the educational policy in which students are
examined continuously over most of the duration of their education, and the
results are taken into account when they leave school. It is often
proposed or used as an alternative to a final examination system.

5.0 MAIN CONTENT

5.1 Some Pitfalls in Teacher-Made Tests

In unit 4.0, you were told that testable educational objectives are classified by
Bloom et al. (1956) into knowledge (recall or memory), comprehension, application,
analysis, synthesis, and evaluation. It means that you should not only set objectives
along these levels but also test them along the same levels. The following observations
have been made about teacher-made tests. They are listed below to help
you avoid them when you construct questions for your class tests.

i. Most teacher-made tests are not appropriate to the different levels of learning
outcomes. The teachers specify their instructional objectives covering the
whole range from simple recall to evaluation, yet their test items fall within
the recall of specific facts only.

ii. Many of the test exercises fail to measure what they are supposed to measure.
In other words, most teacher-made tests are not valid. You may wonder
what validity is. It is a very important quality of a good test: a test is valid
if it measures what it is supposed to measure. You will read
about it in detail later in this course.

iii. Some classroom tests do not cover comprehensively the topics taught. One of
the qualities of a good test is that it should represent the entire topic taught.
But these tests cannot be said to be a representative sample of the whole
topic taught.

iv. Most tests prepared by teachers lack clarity in their wording. The questions
are ambiguous, imprecise, unclear and, most of the time,
carelessly worded. Most of the questions are general or global questions.

v. Most teacher-made tests do not stand up to item analysis. Their items fail to
discriminate properly and are not designed according to difficulty levels.

These are not the only pitfalls. But you should try to avoid both the ones
mentioned here and those not mentioned here. Now let us look at the various
types of teacher-made test items.
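
The last pitfall above refers to item analysis. As a rough illustration only, the
Python sketch below computes two indices commonly used for a single item: a
difficulty index (the proportion of testees who answered the item correctly) and a
discrimination index (the proportion correct in the highest-scoring group minus the
proportion correct in the lowest-scoring group). The function names, the sample data
and the 27 percent group size are assumptions made for this example; they are not
taken from this course text.

```python
def difficulty_index(item_responses):
    """Proportion of testees who answered the item correctly (1 = correct, 0 = wrong)."""
    return sum(item_responses) / len(item_responses)


def discrimination_index(item_responses, total_scores, fraction=0.27):
    """Upper-group minus lower-group proportion correct on one item.

    Testees are ranked by total test score; the top and bottom `fraction`
    of the class form the upper and lower groups (0.27 is an assumed,
    conventional choice).
    """
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    # Rank testee positions from highest to lowest total score.
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    upper_correct = sum(item_responses[i] for i in upper)
    lower_correct = sum(item_responses[i] for i in lower)
    return (upper_correct - lower_correct) / group_size


# Hypothetical data: ten testees, their result on one item, and their total scores.
item = [1, 1, 0, 1, 1, 0, 1, 0, 1, 0]
totals = [48, 45, 41, 40, 36, 33, 30, 25, 22, 18]

print("Difficulty index:", difficulty_index(item))
print("Discrimination index:", round(discrimination_index(item, totals), 2))
```

A discrimination index near zero, or a negative one, would suggest that the item
does not separate stronger testees from weaker ones and should be reviewed.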

5.2 Types of Test Forms used in the Classroom

There are different types of test forms used in the classroom. These can be the essay
test, objective test, norm-referenced test or criterion-referenced test. But we are
going to concentrate on the essay test and the objective test. These are the most
common tests which you can easily construct for your purpose in the class.

5.2.1 Essay Test


This is the type of test introduced in this country by colonial education. In this
case the testees or the students have the responsibility of thinking out the answers to
the questions asked. They have the freedom to express or state the answers in their
own words. It is a free-answer kind of test.
It is used by teachers to measure achievement, performance, etc. from classroom
instruction. In this case, you will have the opportunity to write what you can. At this
stage, you have a good idea of essay writing, since you have studied it in GST
107, GST 101 and 102, etc.
Before we look at the two types of essay tests, let us see the distinctive features of
essay tests. These are:
i. Students answer a small number of questions. Because of time limits, usually
not more than 2 or 3 hours of examination, students are required to answer in
their own words not more than 5 or 6 questions. The test therefore does not
always cover all the topics taught.
ii. The scripts are written in the student's own style and words, and usually in his own
handwriting. In some cases spelling errors as well as poor language and
handwriting affect the student's results.
iii. The students are considerably free to organize their own answers. This
implies that answers will vary in accuracy and completeness. It
encourages creativity by allowing students to answer in their own unique way. It
discourages guess-work and encourages good study habits in students.
Essay tests are of two variations. These are the extended-response type and the
restricted-response type.

5.2.2 Extended Response or Free Response Type of Essay Test


In this type, questions are asked in such a way that the student is
not limited in the extent to which he may discuss the issues raised or the question
asked. The student has to:
• plan and organize his thoughts in order to give his answer,
• put his ideas across by expressing himself freely, precisely and clearly using
his own words and his own writing, and
• discuss the questions at length, giving various aspects of his knowledge on
the question asked or issue raised.

5.2.3 Restricted-Response Type

In this type, the questions are so structured that the students are limited: the scope of
the response is defined and restricted. The answers given are to some extent
controlled.
Now let us give some examples.
• Give three advantages and two disadvantages of essay tests.
• State four uses of tests in education.
• Explain five factors which influence the choice of a building site.
• Mention five rules for preventing accidents in a workshop.
• State five technical drawing instruments and their uses.
• Describe four sources of energy.
• Define helix and give two applications of the helix.

5.2.4 Uses of Essay Tests


The essay test has some advantages in the business of measurement and
evaluation. Some of these advantages are:
i. The essay test permits freedom of response, which in turn allows the
students to present their ideas in as much detail as they choose, so as to show how
deep their knowledge is in the subject area covered by the question.

ii. The free-response form allows the student to express himself in his own
words, using his proficiency in the language to his
advantage.

iii. Essay tests promote the development of problem-solving skills. This is
because the student has to think out the answer himself and put it down in
organized form.
iv. It helps students to improve their writing skills, such as writing speed and
legibility, because they write in their own handwriting.
v. The essay test is easy to construct. This is why it is very popular.

5.2.5 Limitations of the Essay Test


i. Subjectivity in scoring is the greatest disadvantage here. The scoring is not
reliable because different examiners can grade the same answer differently.
In fact, the same examiner can grade the same answer differently at
different times.
ii. Grading of essay tests is time-consuming.
iii. Essay questions do not cover the course content and the objectives
comprehensively.
iv. A good command of language places some students at an advantage, while a
poor command places others at a disadvantage.
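
The first limitation, that different examiners can grade the same answer
differently, can be checked on a small scale by having two examiners mark the same
scripts and comparing their marks. The Python sketch below is one possible way of
doing this, using the Pearson correlation and the mean absolute difference between
the two sets of marks. The marks and the function names are hypothetical and are
given for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of marks."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_absolute_difference(x, y):
    """Average size of the gap between the two examiners' marks."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical marks (out of 20) given by two examiners to the same six scripts.
examiner_a = [15, 12, 18, 9, 14, 11]
examiner_b = [12, 13, 16, 10, 9, 12]

print("Correlation between examiners:", round(pearson_r(examiner_a, examiner_b), 2))
print("Mean absolute difference:", round(mean_absolute_difference(examiner_a, examiner_b), 2))
```

A low correlation or a large average difference would indicate that the marks
depend heavily on who did the marking, which is the subjectivity described above.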

5.2.6 How to Make Essay Test Less Subjective


In the last section, we said that subjectivity is a major limitation of essay tests.
But you can reduce this to the barest minimum by following these tips:

i. Avoid open-ended questions.

ii. Let the students answer the same questions. Avoid options/choices.

iii. Use students' numbers instead of their names, to conceal their identity.

iv. Score all the students' answers to one question before moving on to the next question.

v. Do not allow the score on one question to influence you while marking the next.
Always rearrange the papers before you mark.

vi. Do not allow your feelings or emotions to influence your marking.

vii. Avoid distractions when marking.
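
Tips iii to v above can be combined into a simple marking schedule: scripts are
identified by number only, every script is marked on one question before the next
question is started, and the scripts are rearranged before each question. The Python
sketch below shows one way such a schedule could be drawn up. The script numbers, the
function name and the seed value are assumptions made for the example.

```python
import random


def marking_schedule(script_numbers, question_numbers, seed=None):
    """Return, for each question, a freshly shuffled order of script numbers."""
    rng = random.Random(seed)
    schedule = {}
    for question in question_numbers:
        order = list(script_numbers)  # copy so the original list is untouched
        rng.shuffle(order)            # rearrange the papers before marking
        schedule[question] = order
    return schedule


# Hypothetical class: scripts identified by number only, three essay questions.
scripts = ["S01", "S02", "S03", "S04", "S05"]
for question, order in marking_schedule(scripts, [1, 2, 3], seed=423).items():
    print(f"Question {question}: mark in this order -> {order}")
```

Because the order changes from question to question, a script marked just after a
very good or very poor answer on one question is unlikely to be in the same position
on the next.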

5.3 Objective Tests


The objective test, otherwise regarded as the new type of test, derives its name from
the fact that the marking is done with a standard key. The key concept is that the
students are provided with a problem for which a limited number of choices are
presented, from which they select the required answer.

It is so highly structured that even where the student is to supply the answer, he is
strictly limited to giving specific and short answers. But students are given the
opportunity to react to a large sample of questions which may cover the entire
content area. The figure below shows the sub-categories of the objective test.

As Fig. 8.1 shows, objective test items are divided first into two groups: supply
test items and selection test items.

Objective test items
   Supply test items: Short answers, Completion
   Selection test items: Arrangements, True-False, Matching, Multiple choice

Fig. 8.1: Types of Objective Test Items

These two are then sub-divided into:

• Short answers
• Completion
• Arrangements
• True-false
• Matching and
• Multiple choice items.

Let us look at them briefly.

5.3.1 Supply Test Items

This is the type of test item which requires the testee to give very brief answers to
the questions. These answers may be a word, a phrase, a number, a symbol or
symbols, etc. To write effective supply test items, you should make use of
these tips: construct the questions so as to be as brief as possible; word the
questions carefully so as to require an exact answer; and make sure that the
questions have only one answer each.

Supply test items can be in the form of:

a. Short-answer test items, which require the testee to provide a short
answer to the question asked. Examples:
• Who is the President of the Nigerian Senate?
• What is the technical drawing instrument used for measuring angles?
• The two instruments used for cutting screw threads are tap and ………..
• Which form of energy is used for motion?

b. Completion test items, which require the testee to provide or complete one
or more missing words in a sentence. Examples:
• A triangle with all sides and all angles equal is called ……………..
• The process of splitting a log of wood into planks is called ………..
• ……………. is a special chord passing through the centre of a circle.

5.3.2 Selection Test Items

This is the type where possible alternatives are provided for the testee to choose the
most appropriate or the correct option.
Can you mention them? Let us take them one by one.

a. True-False: In this type a statement is presented to the testee and he is
required to state whether it is true or false, yes or no, agree or disagree, etc.
There are only two options.
Examples: For each statement below, answer true if the statement is
correct and false if the statement is wrong.
• The tee square is used for drawing horizontal lines.
• Wood can be hardened and tempered to improve its qualities.
• Any material which obstructs the flow of magnetic flux is called a resistor.
• The use of flowing garments in a workshop can cause accidents.

b. Multiple choice: This is very useful in measuring students' achievement in
schools. It is a very popular and widely used type of objective test item. It
has two parts. The first is called the stem or premise, which is the question,
while the second is the set of suggested answers, called alternatives, options or
choices.
The correct option is the answer or the key, while the other options are called
distractors (also spelt distracters). Now develop an objective item of the multiple
choice type and indicate the parts. Before we give examples of multiple
choice items, let us examine the following tips on how to make good multiple
choice items.

• The stems must be simple and clear.
• All alternatives should be possible answers, related to the stem, but with
only one correct or best answer to the question.
• The distractors should be as attractive as the key and very effective in
distracting testees who are not sure of the key.
• To discourage guessing, the correct answers should not be positioned in a
particular way so as to form a pattern. They should be randomly positioned.
• There should be a minimum of four options for each item. Five options
are generally preferred.
• Avoid as much as possible the options 'none of the above', 'all of the
above' or a combination of options such as 'c and d'.

You can now construct good multiple choice items using these hints. Let
us see some examples.

• The two forms of oblique projection are: ………
a. axonometric and planometric
b. cavalier and planometric
c. cavalier and cabinet
d. cabinet and axonometric
e. pictorial and perspective

• At what angle should projection lines be to an inclined surface to obtain the
true shape of the surface?
a. 30° b. 45° c. 60° d. 90° e. 120°

• Which of these is NOT an autonomous community in Ihite?
a. Umuihi b. Ihinna c. Amainyi
d. Avutu e. Amakohia

• Which tribe in Nigeria is associated with 'Amala and Ewedu'?
a. Yorubas b. Ndi Igbo c. Hausa
d. Efiks e. Jukuns
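
The tip about positioning the correct answer at random can be followed
mechanically when a test paper is being assembled. The Python sketch below shuffles
the key together with its distractors and reports the letter under which the key
ends up. The function name is hypothetical, and treating "cavalier and cabinet" as
the key of the first example item is an assumption, since the text does not mark the
keys.

```python
import random


def shuffle_options(key, distractors, rng):
    """Return (labelled options, key label) with the key at a random position."""
    options = [key] + list(distractors)
    rng.shuffle(options)
    labels = "abcde"[:len(options)]
    labelled = list(zip(labels, options))
    key_label = labels[options.index(key)]
    return labelled, key_label


rng = random.Random()  # use random.Random(7), for example, for a repeatable paper
labelled, key_label = shuffle_options(
    key="cavalier and cabinet",  # assumed key for the example item
    distractors=[
        "axonometric and planometric",
        "cavalier and planometric",
        "cabinet and axonometric",
        "pictorial and perspective",
    ],
    rng=rng,
)
for label, text in labelled:
    print(f"{label}. {text}")
print("Key:", key_label)
```

Recording the key label at the moment of shuffling keeps the marking key in step
with the printed paper.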

6.0 ACTIVITY

1. In tabular form, give four differences between criterion-referenced and
norm-referenced tests.
2. State three limitations of objective tests.
3. Differentiate between teacher-made tests and standardized tests.

7.0 SUMMARY

In this unit, we have gone through the teacher-made tests used in the classroom. You
have learnt how to construct them, their advantages and disadvantages, how to
make use of the advantages and how to avoid the disadvantages.

The figure below illustrates the types of teacher-made tests.

TEACHER-MADE TESTS
   Essay type
      Extended response (free response)
      Restricted response
   Structured-response (objective) type
      True or false
      Matching
      Multiple choice
      Arrangements
      Completion
      Short answers
      Others

8.0 ASSIGNMENT

1) A main difference between teacher-made and standardized tests is that
a) standardised tests allow meaningful comparisons between students.
b) standardised tests provide unbiased estimates of students' knowledge.
c) standardised tests allow greater variability in the testing procedures.
d) standardised tests can be developed more quickly.

2) Behaviour rating scales are typically used to
a) assess a student's general or specific abilities.
b) screen children for possible special education placement.
c) assess mastery of material covered.
d) predict occupational skills and success.

3) One use of standardized testing is to make decisions about a school's
curriculum based on testing students. Which purpose of a testing program
does this illustrate?
a) Instructional
b) Guidance
c) Administrative
d) Religious

4) A teacher grades a test, compares the students' answers with one another and
decides that a given student's answers, compared to the others, should receive a C.
What is such a test called?
a) Aptitude test.
b) Achievement test.
c) Criterion-referenced test.
d) Norm-referenced test.

9.0 REFERENCES
Linn, R. (2000). Assessments and Accountability. ER Online, 29(2), 4-14. Retrieved
September, 2002, from http://www.aera.net/pubs/er/arts/29-02/linn01.htm.
Nwana, O. C. (1981). Educational Measurement for Teachers. Lagos: Nelson Africa.
Obimba, F. U. (1989). Fundamental of Measurement and Evaluation in Education
and Psychology. Owerri: Totan Publishers Ltd.
Ohuche, R. O. and Akeju, S. A. (1977). Testing and Evaluation in Education. Lagos:
African Educational Resources (AER).
Sanders, W., & Horn, S. (1995). Educational assessment reassessed: The usefulness of
standardized and alternative measures of student achievement as indicators for the
assessment of educational outcomes. Education Policy Analysis Archives, 3(6).
Retrieved September 2002, from http://olam.ed.asu.edu/epaa/v3n6.html
Omoroginwa, O. K. (2010). An Introduction to Educational Measurement and
Evaluation. Benin City: Perfect Touch.

MODULE 3
Unit 1 Types of Test
Unit 2 Essay Test
Unit 3 Objective Test
Unit 4 Test Development – Planning the Classroom Test
Unit 5 The Administration and Scoring of Classroom Test

UNIT 1 TYPES OF TEST


CONTENTS

1.0 Introduction
2.0 Objectives
3.0 How to Study this Unit
4.0 Word Study
5.0 Main Content
5.1 Concept of Test
5.1.1 Concept of Test as Measuring Instrument
5.1.2 Limitations of Test as Measuring Instrument
5.2 Types of Test
5.2.1 Intelligence Test
5.2.2 Aptitude Test
5.2.3 Achievement Test
5.3 Classification of Achievement Test
6.0 Activity
7.0 Summary
8.0 Assignment
9.0 References

1.0 INTRODUCTION

In this unit, you will be taken through the concept of test, which you are already
familiar with. Here, we shall consider the concept of test as a measuring
instrument and its limitations. You will also learn about types of test such as the
Intelligence Test, Aptitude Test and Achievement Test. While going through
this unit, you will attempt a few Self-Assessment Exercises (SAEs). These are designed
to enable you to monitor your progress in the unit. At the end of the unit, you will be
requested to answer some Tutor Marked Assignments (TMAs).

2.0 OBJECTIVES

By the end of this unit, you should be able to:

• state the concept of test;
• identify types of test;

• classify achievement tests; and
• explain types of achievement test.

3.0 HOW TO STUDY THIS UNIT

1. Read through this unit carefully.
2. Study the unit step by step as the points are well arranged.

NOTE: All answers to activities and assignments are at the end of this book.
This applies to every other unit in this book.

4.0 WORD STUDY

• Concept: A general idea or understanding of something.

• Unit: An individual thing or person regarded as single and complete but
which can also form an individual component of a larger or more complex
whole, for example "the unit of measurement".

• Tutor Marked Assignments: Assignments given at the end of a unit and
marked by the tutor against the standards inherent in the
intended learning outcomes.

• Pre-specified objectives: Instructional objectives stated before instruction
begins; a test measures the degree to which these pre-specified
instructional objectives are attained.

• Intelligence Test: A test designed to measure the ability to think and reason
rather than acquired knowledge.

• Aptitude Test: A test designed to determine a person's ability in a particular
skill or field of knowledge.

• Achievement Test: A test of developed skill or knowledge. The most
common type of achievement test is a standardized test developed to measure
skills and knowledge learned in a given grade level, usually through planned
instruction, such as training or classroom instruction.

5.0 MAIN CONTENT

5.1 Concept of Test

As you already know, a test is an instrument or device used for finding out the
presence or absence of a particular phenomenon or trait possessed by an individual
or group of individuals. For instance, one can use an achievement test in integrated
science to determine how well the testee learned what he was exposed to.

5.1.1 Concept of Test as Measuring Instrument

A test is the major and most commonly used instrument for the assessment of
cognitive behaviours. In this context, a test simply means a set of questions which
students are expected to answer. Their responses to the questions give a measure of
their level of performance or achievement.

Usually, the test is based on the learned content of specific subject area(s) and is
directed at measuring the learner's level of attainment of pre-specified objectives.
You know that to measure an attribute, a standard instrument is needed. Unlike
physical attributes, however, cognitive attributes (constructs) cannot be measured
directly; they are measured by describing the characteristics
associated with such constructs in behavioural terms.

The expected behaviours (aptitude), such as the ability to state, define, manipulate or
perform an experiment, for instance in integrated science, and similar activities are set
down in the form of a test. The test score gives quantitative information about the
existence of the construct (attribute) possessed by the testee. For this reason, the
test items, as a measuring instrument, must be valid, reliable and usable in order to give
dependable results.

5.1.2 Limitations of Test as Measuring Instrument

The limitations of a test as a measuring instrument arise because a test measures
attributes indirectly. Hence, the accuracy of the information obtained from test results
depends on the representativeness and adequacy of the sample of test items with
respect to the behaviour associated with the attribute. In other words, a test as a
measuring instrument is supposed to have a representative sample of items which
measure all that it purports to measure. Moreover, unlike measurements made with
physical instruments, test scores are not absolute. A score of 0 percent does
not mean that the learner has zero aptitude and therefore has not learned anything.

However, we know that a learner who scored 60 percent in a given test has more
aptitude than another learner who scored 30 percent. But we cannot really say by
how much. Therefore, the scores are interpreted with caution.

As you know, no test is accepted universally as a standard measure of a specific
attribute on its own. A sample of items judged to be representative, developed from the
common objectives and content areas of a given locality, cannot represent all versions
of the attributes of interest to all the students outside that locality. Thus the
use of every test is often localized to a specific class, school or area.

5.2 Types of Test

Tests may be classified into two broad categories on the basis of the nature of the
measurement. These are measures of maximum performance and measures of
typical performance. In measures of maximum performance, you have those
procedures used to determine a person's ability. They are concerned with how well
an individual performs when motivated to obtain as high a score as possible, and the
result indicates what individuals can do when they put forth their best effort. Can
you recall any test that should be included in this category? Examples are aptitude
tests, achievement tests and intelligence tests.

On the other hand, measures of typical performance are those designed to reflect a
person's typical behaviour. They fall into the general area of personality appraisal,
such as interests, attitudes and various aspects of personal and social adjustment.
Because testing instruments cannot adequately be used to measure these attributes,
self-report and observational techniques, such as interviews, questionnaires,
anecdotal records and ratings, are sometimes used. These techniques are used in relevant
combinations to provide the desired results on which accurate judgment concerning
the learner's progress and change can be made.

5.2.1 Intelligence Test (or General Mental Ability Test)

You will recall that intelligence is the ability to reason and learn from experience. It
is thought to depend both on inherited ability (nature) and on the surroundings in which
a person is brought up (nurture). The first intelligence test was devised by Alfred
Binet, working with Théodore Simon, in 1905; the results of later intelligence tests
came to be expressed as an Intelligence Quotient (IQ). An intelligence test provides an
indication of an individual's general mental capacity. An intelligence test usually
includes a wide variety of tasks so as to sample several aspects of cognitive function.
Some people believe that intelligence can be expressed only in speech and writing
and therefore cannot be tested.

5.2.2 Aptitude Tests (Separate Ability)

When we talk about aptitude, we refer to a natural talent or ability, especially
of a specified kind. Thus, aptitude tests measure specialized abilities and the potential to learn
or perform new tasks that may be relevant to later learning or performance in a
specific area. Hence, they are future oriented. Can you mention any one of such tests
that is familiar to you? An example is the Common Entrance Examination into
Vocational Schools and even Secondary Schools.

5.2.3 Achievement Test

Achievement tests are designed to measure the effects of a specific programme of
instruction or training, which the learners attain usually by their own effort. Generally,
they represent a terminal evaluation of the learner's status on the completion of a
course of study or training. That is, they are used to determine how much the learner has
learned from specified content via systematic and controlled instruction. End-of-term
examinations and classroom tests are mostly achievement tests.

5.3 Classification of Achievement Test

Achievement tests may be classified in the following ways:

By Mode of Response
• Oral test
• Written test
• Practical test

By Purpose of Testing
• Placement test
• Formative test
• Diagnostic test
• Summative test

By Desired Speed of Response
• Power test
• Speed test

By Degree of Rigour Employed in Preparation and Scope of Applicability
• Teacher-made tests
• Standardized tests

By Mode of Interpreting Results
• Norm-referenced testing
• Criterion-referenced testing
• Self-referenced testing

By Format of Test Items
• Objective test items
• Essay test items

6.0 ACTIVITY

1. Classify tests under the following categories:
a) Method of administration
b) Response demanded
c) Time required for the test
d) Time of scoring
e) Type of learning outcome
f) Method of interpreting the score
g) Type of performance displayed
h) Use in classroom instruction
i) Method of preparation
2. Differentiate between objective and essay tests.
3. Outline any three guidelines for writing essay tests.

7.0 SUMMARY

In this unit, you learned that:

• A test is an instrument or device used for finding out the presence or absence
of a particular phenomenon or trait possessed by an individual or group of
individuals.
• A test is the major and most commonly used instrument for the assessment of
cognitive behaviours.
• The limitations of a test as a measuring instrument arise because a test
measures attributes indirectly.
• Tests are classified into two broad categories, namely measures of maximum
performance and measures of typical performance.
• Examples of measures of maximum performance are aptitude tests,
achievement tests and intelligence tests.

8.0 ASSIGNMENT

1. Construct the following test items on measures of central tendency:
a) two multiple choice questions
b) two alternative-response questions
c) two essay questions (one extended type and one restricted type)
2. What are the guidelines for constructing objective tests in the school system?

3. Tests have been seriously criticized for their inadequacies. State and explain any
three of these inadequacies.

4. Differentiate between norm-referenced and criterion-referenced tests.

9.0 REFERENCES

Iwuji, V. B. C. (1997). Measurement and Evaluation for Effective Teaching and
Learning. Owerri: CRC Publishing.

Onasanya, Kola (1991). Evaluation of Students Achievement. Ijebu-Ode: Pius Debo
(Nigeria).

Ughamadu, K. A. (1991). Understanding and Implementing Continuous
Assessment. Benin City: World of Books Publishers, Nigeria.

UNIT 2 ESSAY TEST

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 How to Study this Unit
4.0 Word Study
5.0 Main Content
5.1 Essay Test
5.1.1 Advantages of Essay Test
5.1.2 Disadvantages of Essay Test
5.1.3 When to Use Essay Test
5.2 Classification of Essay Test Items
5.3 Types of Essay Item Response
5.3.1 Extended Response Questions
5.3.2 Restricted Response Questions
5.4 Constructing the Essay Question.
6.0 Activity
7.0 Summary
8.0 Assignment
9.0 References

1.0 INTRODUCTION

In the last unit, you learned about types of test. There you learned that achievement
tests may be classified by format of test items into essay and objective test items. In
this unit, you will learn about the essay test, its advantages and disadvantages.
Furthermore, you will learn about the classification and types of essay test. Finally, you will
learn how to construct an essay test.

2.0 OBJECTIVES

By the end of this unit, you should be able to:

• explain the meaning of essay test;
• state the advantages and disadvantages of essay test;
• enumerate when to use essay test;
• classify types of essay item response; and
• construct the essay questions.

3.0 HOW TO STUDY THIS UNIT


1. Read through this unit carefully.
2. Study the unit step by step as the points are well arranged.

NOTE: All answers to activities and assignments are at the end of this book.
This applies to every other unit in this book.

4.0 WORD STUDY

• Degree of correctness: The condition or quality of being true, correct, or
exact.

• Deficient in the skill: Not having enough of a specified quality, trait or
attribute required.

• Measurement quality: A planned and systematic means for assuring
management that the defined standards, practices, procedures, and methods
of the process are applied.

• Organizational and divergent thinking: The thought
process or method used to generate creative ideas.

• Synthesize and organize ideas: To make (something) by combining
different things together.

• In-depth knowledge: The complexity or depth of understanding.

• Variation: A change or difference in condition, amount, or level, typically
within certain limits.

• Invalidity: In testing, the condition of a test not measuring what it is
supposed to measure.

• Open and distance learning: The provision of flexible educational
opportunities in terms of access and multiple modes of knowledge
acquisition.

• Eradication of illiteracy: Total removal of lack of knowledge or inability to
read and write in a particular.

5.0 MAIN CONTENT

5.1 Essay Test

Essay tests are tests consisting of questions (items) designed to elicit from the
learners, through freedom of response, the extent to which they have acquired the
behaviour called for in the course objectives. The answers to the questions with which
the learners are confronted vary in quality and degree of correctness. Most times,
these answers are not complete and thorough.

Essay tests also have poor psychometric (measurement) qualities, although they are
popular among classroom teachers, especially those who are deficient in the skill
required for item construction. For this reason we shall examine how to construct
essay items in this unit and subsequently examine how to administer and score such
items in order to improve their reliability and validity.

5.1.1 Advantages of Essay Tests

• It measures complex learning outcomes that cannot be measured by other
means. For instance, it has the ability to measure the learner's communication
skills, that is, the learner's ability to produce an answer, synthesize and
organize ideas and present them readably in a logical and coherent form.
This is the major advantage.
• It also enables the measurement of organizational and divergent thinking
skills by laying emphasis on the integration and application of thinking and
problem-solving skills, creativity and originality.
• It is very applicable for measuring learning outcomes at the higher levels of
educational objectives, such as the application, analysis, synthesis and evaluation
levels of the cognitive domain.
• It is easy and economical to administer. It can be easily and conveniently
written on the chalkboard because of the few items involved. This saves
materials and time for production.
• An essay item is easy to construct and does not take much time. This ease has to
be guarded against seriously, to avoid constructing questions that are
misleading because they do not ask for the specific behaviours emphasized in a particular
set of learning outcomes.
• It can be used to measure in-depth knowledge, especially in a restricted
subject matter area.
• It does not encourage guessing and cheating during testing.

5.1.2 Disadvantages of Essay Tests

Despite the advantages already given for the essay test, it does not satisfy the two
most important qualities of a good measuring instrument, namely validity and
reliability. Its disadvantages include:

 It is inadequate in sampling subject matter content and course objectives
since it provides limited sampling. The provision of few questions results in
narrow, and therefore invalid, coverage of subject matter and instructional
objectives. Also, as Nenty (1985) rightly pointed out, the small number of
questions usually asked encourages permutation of some content areas and
cramming of ideal responses to suspected questions. In this regard, essay
questions discourage the development of good study habits.

 In addition to the invalidity of the measurement, evaluating the answers to
carelessly developed questions tends to be a confusing and time-consuming
task. This results in poor reliability in scoring. Studies have shown that
answers to essay questions are scored differently by different teachers and
that even the same teachers score the answers differently at different times.
The variation ranges from near-perfect scores to those representing dismal
failure. This may be attributed to the inability of scorers to clearly identify
the learning outcomes being measured. When the evaluation of answers is
not guided by clearly defined outcomes, it tends to be based on less stable,
intuitive judgments.

 Sometimes an essay question implies many skills other than that which the
item was intended to measure. Testees therefore perceive and react to the
same question differently. The differences in the perception of the question
encourage bluffing and hide differences in the knowledge of basic factual
material and in the learners' ability to use and organize such facts.

 The essay test item does not readily lend itself to empirical study of item
qualities like difficulty and discrimination based on which improvements on
the item could be made.
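
The scoring inconsistency described in the second disadvantage above can be made concrete with a small calculation. The sketch below is only an illustration: the marks are hypothetical, and the two summary figures (mean gap and largest gap between two scorers) are one simple way of describing disagreement, not a method prescribed in this unit.

```python
from statistics import mean

# Hypothetical marks (out of 20) given by two teachers to the same five essay scripts.
scorer_a = [15, 9, 12, 18, 6]
scorer_b = [11, 13, 12, 10, 9]


def mean_absolute_difference(marks_1, marks_2):
    """Average size of the gap between two scorers' marks for the same scripts."""
    if len(marks_1) != len(marks_2):
        raise ValueError("Both scorers must mark the same number of scripts.")
    return mean(abs(a - b) for a, b in zip(marks_1, marks_2))


def largest_gap(marks_1, marks_2):
    """Biggest difference between the two scorers on any single script."""
    return max(abs(a - b) for a, b in zip(marks_1, marks_2))


average_gap = mean_absolute_difference(scorer_a, scorer_b)
worst_gap = largest_gap(scorer_a, scorer_b)
print("Average gap between scorers:", average_gap)
print("Largest gap on one script:", worst_gap)
```

A large gap on the same script suggests that the scorers are not judging against the same clearly defined learning outcomes.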

5.1.3 When to use Essay Questions

 You should use essay questions in the measurement of complex achievement
when their distinctive feature of freedom of response is required. Learners are
free to select, relate and present ideas in their own words. This freedom
enhances the value of essay questions as a measure of complex achievement,
but it introduces scoring difficulties that make them inefficient as a measure
of factual knowledge.

 Essay questions should also be used to measure those learning outcomes that
cannot be measured by objective test items. The specific features of essay
questions can be utilized most fully when their shortcomings are offset by the
need for such measurement.

 They should be used when learning outcomes concerned with the abilities to
select, organize, integrate, relate, and evaluate ideas require the freedom of
response and the originality provided by essay questions, all the more so when
these outcomes are of such great educational significance that the expenditure
of energy on the difficult and time-consuming task of evaluating the answers
can be easily justified.

5.2 Classification of Essay Test Items

Essay questions are classified into two types, namely, the restricted response type
and the extended response type. The classification is based on the degree of freedom
of response associated with the question. For instance, one essay question may
require just a few short sentences as an answer, as in the short-answer objective
item, where a sentence or two could be all that is required. Another essay
question may give the examinees complete freedom in making their responses, and
their answers may require several pages.

However, there are variations in freedom of response that fall between these
extremes. For convenience, essay questions are presently classified as the
extended response type, in which examinees are given almost complete freedom in
making their responses, and the restricted response type, in which the nature, length
or organization of the response is limited.

5.3 Types of Essay Item Response

As already discussed, there are two types of essay item response. These are the
responses to restricted response questions and responses to extended response questions.

5.3.1 Extended Response Essay Questions

These are responses to essay questions in which the examinee is restricted only by
time, as no bound is placed on the depth, breadth and organization of the
response. An example of a question in this category is: “Open and Distance Learning
is a viable option for the eradication of illiteracy in Nigeria. Discuss.”

In response to such a question the examinee demonstrates his ability to select and
recall the facts which he thinks are pertinent, and to organize and present his ideas
in a logical and coherent form. This freedom to decide which facts are most
pertinent, to select his own method of organization and to write as much as seems
necessary for a comprehensive answer tends to reveal the ability to evaluate ideas,
relate them coherently and express them succinctly. In addition, such responses
expose individual differences in attitudes, values and creative ability.

This type of essay item is most useful in measuring learning outcomes at the
higher cognitive levels of educational objectives, such as the analysis, synthesis and
evaluation levels. However, the extended response essay type is limited by
two weaknesses:

 It is inefficient for measuring knowledge of factual material because
it calls for extensive detail in a selected content area at a time.

 Scoring such responses is usually difficult and unreliable since the examinees
are free to present an array of factual information of varying degrees of
correctness, coherence and expression.

These limitations are minimized in the Restricted Response Type.

5.3.2 Restricted Response Essay Questions

In this type, the examinee is limited in the nature, length or organization of the
response to be made. The items are directional questions aimed at the desired
responses. This limits the examinee's freedom to select, recall and synthesize all that
he knows and to present it logically as he may wish. This type of essay item is
most useful in measuring learning outcomes at the lower cognitive levels of
educational objectives, that is, the knowledge, comprehension and application levels.
An example of a restricted response essay question is: “State two advantages and two
disadvantages of essay questions.”

The restricted nature of the expected response in this type of item makes it more
efficient for measuring knowledge of factual material. It reduces to a reasonable
extent the difficulty of scoring and encourages more reliability in scoring. However,
the restriction makes it less effective as a measure of the ability to select, organize and
integrate ideas and present them in an original and coherent form, which is one of the
major advantages of the essay test.

5.4 Constructing the Essay Questions

You are now aware of the handicaps of essay questions as a measuring instrument.
An essay test is a useful measurement instrument only to the extent that it
is constructed, administered and scored to ensure a high level of objectivity. For this
reason, an essay test should consist of items that will ensure the same
understanding and elicit only the skill or ability one is interested in measuring from
every examinee. Also, the responses should be such that two or more examiners
would assign them the same score and interpret them consistently.

You know that this is difficult to achieve and needs a lot of effort. Hence, the
following points are suggested as a guide for the construction of good essay test
items that call for the desired behaviour.

i. Restriction of the use of essay questions to only those learning outcomes that
cannot be satisfactorily measured by objective items. That is, essay questions
are to be used only when they are desirable and adequate for measuring the
learning outcomes, so that the learner's achievement is fully realized. In other
words, they are to be used for questions that call for complex learning
outcomes pertaining to the organization, integration and expression of ideas,
which could not be measured without the use of essay test items.

ii. Formulation of questions that call forth the behaviour specified in the
learning outcomes. Essay questions should be designed to elicit only the skill
which the item was intended to measure. This can be achieved by expressing
the question clearly and precisely, in line with a clearly defined instructional
objective. In addition, action verbs such as compare, contrast, illustrate,
differentiate, criticize and so on could be used to give the test items more
focus.

iii. Phrasing of each question to indicate clearly the examinee's task. An essay
question has to specify precisely what is required of the examinee. Ensure
that the testee's task is clearly indicated by delimiting the area covered by the
item and using descriptive words to give specific direction towards the desired
response. Indicate the score allotted to each question. This suggestion lends
itself easily to the restricted response type; care should be taken not to narrow
the questions when constructing the extended response type, in order not to
reduce its effectiveness as a measure of the ability to select, organize and
integrate ideas. Also, adapt the length and complexity of the answer to the
testees' level of maturity.

iv. Indication of an approximate time limit for each question. It is necessary to
indicate the time allotted to each question to enable the testees to pace their
writing on each question and to allay any anxiety that might arise. The timing
should allow for the slower testees' writing speed so as not to put them at a
disadvantage in giving a satisfactory response.

v. Avoidance of the use of optional questions. The provision of optional
questions, although generally favoured by testees, means that they
are taking different tests, and therefore the common basis for evaluating their
achievement is lost. Moreover, optional questions might also influence the
validity of test results, since some examinees may be favoured by their
advance preparation of selected areas of study. It is also not easy to
construct essay questions of the same difficulty level. Hence, valid
comparisons of performance among testees, especially in a norm-referenced
setting, will not be possible.

6.0 ACTIVITY

1. State three advantages of using essay tests in school.
2. State the limitations of essay tests.
3. State the two basic types of essay test.

7.0 SUMMARY

In this unit, you learned that:

 essay tests are tests consisting of questions (items) designed to elicit from the
learners, through freedom of response, the extent to which they have acquired
the behaviour called for in the course objectives.

 the main advantage of the essay question is that it measures complex learning
outcomes that cannot be measured by other means. The extended response
question lays emphasis on the integration and application of thinking and
problem-solving skills. These questions have a desirable influence on
learners' study habits.

 the essay test is inadequate for sampling subject matter content and course
objectives because of the limited sample it provides. The scoring is also unreliable.

 essay questions should be used in the measurement of complex achievement
when their distinctive feature of freedom of response is required.

 there are two types of essay questions. These are the extended response type
and the restricted response type.

 an essay test should be constructed, administered and scored to ensure a high
level of objectivity. It should consist of items that will ensure the same
understanding and elicit only the skill or ability one is interested in
measuring.

8.0 ASSIGNMENT

1. Explain the restricted response and extended response types of essay test.
2. Suggest any four ways of improving essay questions.
3. Suggest three ways of scoring essay questions.

9.0 REFERENCES

Gronlund, N. E. (1985). Measurement and Evaluation in Teaching. New York:
Macmillan Publishing Company.

Nenty, H. J. (1985). Fundamentals of Measurement and Evaluation in Education.
Calabar: University of Calabar Press.

Omoroginwa, O. K. (2010). An Introduction to Educational Measurement and
Evaluation. Benin City: Perfect Touch.

UNIT 3 OBJECTIVE TEST


CONTENTS

1.0 Introduction
2.0 Objectives
3.0 How to Study this Unit
4.0 Word Study
5.0 Main Content
5.1 Objective Test
5.1.1 Advantages of Objective Test Item
5.1.2 Disadvantages of Objective Test Item
5.1.3 When to Use Objective Test
5.2 Types of Objective Test
5.2.1 The Free Response Test Item
5.2.2 The Alternative Response Test Item
5.2.3 The Matching Test Item
5.2.4 The Multiple Choice Test Item
5.3 Constructing the Objective Test Items
6.0 Activity
7.0 Summary
8.0 Assignment
9.0 References

1.0 INTRODUCTION

In the last unit, you learned about essay test. In this unit, you will learn about the
objective test. You will learn about the advantages and disadvantages of objective
test. Furthermore, you will learn when to use objective test and types of objective
test. Finally, you will learn how to construct the various types of objective test.

2.0 OBJECTIVES
By the end of the lesson, you should be able to:
 explain the meaning of objective test;
 state the advantages and disadvantages of objective test;
 identify when to use objective test;
 enumerate the various types of objective test and their peculiar advantages
and disadvantages; and
 construct the various types of objective test.

3.0 HOW TO STUDY THIS UNIT


1. Read through this unit carefully.
2. Study the unit step by step as the points are well arranged.
NOTE: All answers to activities and assignment are at the end of this book.
This applies to every other unit in this book.

4.0 WORD STUDY

 Essay Test: An essay test requires you to see the significance and meaning
of what you know. It tests your knowledge and understanding of the subject
matter.

 Specific skills: The ability to communicate effectively with people in a
friendly manner.

 Personal opinion: A personal view, attitude, or appraisal.

 Bias: Prejudice in favour of or against one thing, person, or group compared
with another, usually in a way considered to be unfair.

 Mood: A temporary state of mind or feeling.

 Subjectivity: Subjectivity refers to how someone's judgment is shaped by
personal opinions and feelings instead of outside influences.

 Pre-test: A preliminary test administered to determine a student's baseline
knowledge or preparedness for an educational experience or course of study.

 Valid ability: Refers to how well a test measures what it is purported to
measure.

 Synthesize ideas: This is defined as combining a number of different parts or
ideas to come up with a new idea or theory.

5.0 MAIN CONTENT

5.1 Objective Test

Objective tests are those whose items are set in such a way that one and only one
correct answer is available for a given item. In this case every scorer would arrive at
the same score for each item for each examinee, even on repeated scoring
occasions. This type of item sometimes calls on examinees to recall and write
down or supply a word or phrase as an answer (free-response type). It could also
require the examinees to recognize and select, from a given set of possible answers
or options, the one that is correct or most correct (fixed-response type). This implies
that the objective test consists of items measuring specific skills, with a specific
correct response to each of the items irrespective of the scorer's personal opinion,
bias, mood or health at the time of scoring.
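
The idea that every scorer arrives at the same score can be illustrated with a short sketch of scoring against a fixed answer key. The key, the responses and the function name below are hypothetical and are used only for illustration.

```python
def score_objective_test(answer_key, responses):
    """Count the items on which the examinee's response matches the key.

    An unanswered item can be recorded as None; it simply earns no mark.
    """
    if len(answer_key) != len(responses):
        raise ValueError("There must be one response slot for every item.")
    return sum(1 for key, given in zip(answer_key, responses) if given == key)


# Hypothetical five-item test: the key and one examinee's responses.
answer_key = ["B", "D", "A", "C", "A"]
responses = ["B", "D", "C", "C", "A"]

score = score_objective_test(answer_key, responses)
print("Score:", score, "out of", len(answer_key))
```

Because the score depends only on the key and the responses, repeating the scoring, or giving the scripts to another scorer, produces the same result.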

5.1.1 Advantages of Objective Test

 The objective test enhances the assessment of learners' responses to test items
because the scoring is influenced not by the scorer's bias or disposition at the
time of scoring but by the correctness of the answer. Because it overcomes the
subjectivity of the essay test, its reliability as a measuring
instrument is enhanced.
 Scoring of an objective test is easy and takes little time. It can also be scored
by machine, which facilitates high efficiency in testing a large number of
examinees.
 The results of an objective test, especially of multiple choice items, can be used
for diagnostic purposes since they provide clues to factual errors and
misunderstandings that need remediation.
 It is adequate for sampling the subject matter and instructional objectives of
the course because the relatively large number of items set enhances effective
coverage of the content areas on which the test is based. The result provides
a more valid and reliable measure of the examinees' performance.
 It is efficient for measuring knowledge of facts. It can also be designed to
measure understanding, thinking skills and other complex outcomes.
 Objective test items can be pre-tested, refined through item analysis,
standardized and reused a number of times if properly handled.
 It is fair to all examinees since it does not call on other skills outside the skill
it is intended to measure. That is, its validity is not affected by good
handwriting, bluffing or verbiage.
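
The item analysis mentioned above usually looks at item qualities such as difficulty and discrimination. The sketch below assumes two commonly used indices that this unit does not define: difficulty as the proportion of examinees answering the item correctly, and discrimination as the difference between that proportion in the top-scoring and bottom-scoring groups (here the top and bottom 27 per cent, which is an assumed convention). All data are hypothetical.

```python
def item_difficulty(item_scores):
    """Proportion of examinees who answered the item correctly (1 = right, 0 = wrong)."""
    return sum(item_scores) / len(item_scores)


def item_discrimination(item_scores, total_scores, fraction=0.27):
    """Proportion correct in the upper group minus proportion correct in the lower group."""
    group_size = max(1, round(len(total_scores) * fraction))
    # Rank examinees from highest to lowest total test score.
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
    upper, lower = ranked[:group_size], ranked[-group_size:]
    p_upper = sum(item_scores[i] for i in upper) / group_size
    p_lower = sum(item_scores[i] for i in lower) / group_size
    return p_upper - p_lower


# Hypothetical data for ten examinees: total test scores and results on one item.
total_scores = [18, 17, 15, 14, 12, 11, 9, 8, 6, 4]
item_scores = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]

difficulty = item_difficulty(item_scores)
discrimination = item_discrimination(item_scores, total_scores)
print("Difficulty index:", difficulty)
print("Discrimination index:", round(discrimination, 2))
```

An item that nearly everyone passes or fails, or one that weaker examinees pass more often than stronger ones, would be a candidate for revision before the item is reused.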

5.1.2 Disadvantages of Objective Test

 It does not encourage the development of examinees' originality in desirable
skills such as the ability to select, organize or synthesize ideas and to present
them correctly in a logical and coherent form. The complete structuring of
the task is not suitable for assessing abilities of this kind.
 It tends to measure only factual knowledge. This disadvantage can be
overcome by developing the objective items rigorously, following
the steps involved in the item development process.
 Development of good objective test items requires training of test developers
in the skills necessary for constructing effective, valid and reliable items.
 It needs time, commitment and adequate planning.
 Objective test items lend themselves to guessing, especially when the test
items are not skillfully developed. An examinee can guess correctly on a few
items and earn some undeserved points even in a well-constructed objective
test. It is also easier to cheat in an objective test than in an essay test if the test
is poorly administered.

5.1.3 When to Use Objective Test

 It is used when highly structured tasks are needed to limit the type of response
the examinees can make and to obtain correct answers from learners who
demonstrate the specific knowledge or skill called for in the item.
 It is used to appraise effectively the achievement of educational objectives,
from simple learning outcomes to complex outcomes at the knowledge,
understanding and application levels, and even at higher levels, covering
large content areas if skillfully constructed. It is possible to set as many as
120 objective test items spread over many lesson units and several cognitive
levels of educational objectives for a test lasting one or two hours.
 It is used when objective, quick, easy and accurate scoring is desired
especially when the number of examinees is large.
 It is used to measure understanding, thinking skills and other complex
learning outcomes of the learners.
 It can also be used for diagnosis of learning deficiency and the result used for
remediation process.

5.2 Types of Objective Test

The objective test can be classified into those that require the examinee to supply
the answer to the test items (free-response type) and those that require the examinee
to select the answer from a given number of alternatives (fixed-response type). The
free-response type consists of the short-answer and completion items, while the
fixed-response type is commonly further divided into true-false (alternative
response) items, matching items and multiple-choice items.

5.2.1 The Free Response Test Items

The free response type of objective test tends to represent a compromise between
the essay and the objective items. The two free response types, the short-answer item
and the completion item, are both supply-type test items. The short-answer type
consists of a direct question which requires a short answer, while the completion
type consists of an incomplete statement or question to which a response must be
supplied by the examinee. The answers to such questions could be a word, phrase,
number or symbol. Such items are easy to develop and, if well developed, the answers
are definite and specific and can be scored quickly and accurately. Examples of
questions in this class are:
Short Answer: (i) Who is the first Vice Chancellor of the National Open University
of Nigeria? (Professor Olugbemiro Jegede).
(ii) Who is the current Director General of the National Teachers' Institute (NTI)?
(Dr. Ladan Sharehu).

Completion: (i) The name of the first Vice Chancellor of the National Open
University of Nigeria is ________ (Professor Olugbemiro Jegede).
(ii) The name of the current Director General of the National Teachers' Institute is
________ (Dr. Ladan Sharehu).

The free-response type is very adaptable for item construction in mathematics,
physical sciences and other areas where questions are computational problems
requiring examinees to supply the solutions.
Uses
 It is suitable for measuring a wide variety of relatively simple learning
outcomes such as recall of memorized information and problem solving
outcomes measured in mathematics and sciences.
 It can be used to measure the ability to interpret diagrams, charts, graphs
and pictorial data.
 It is used when it is most effective for measuring a specific learning outcome
such as computational learning outcomes in mathematics and sciences.

Advantages
 It measures simple learning outcomes, which makes it easier to construct.
 It minimizes guessing because the examinees must supply the answer by
either thinking and recalling the information requested or making the necessary
computations to solve the problem presented. It is unlike the selection item,
where partial knowledge might enable the examinee to choose the correct
answer.

Disadvantages
 It is not suitable for measuring complex learning outcomes. It tends to
measure only factual knowledge and not the ability to apply such knowledge
and it encourages memorization if excessively used.
 It cannot be scored by a machine because the test item can, if not properly
worded, elicit more than one correct answer. Hence the scorer must make
decisions about the correctness of various responses. For example, a question
such as “Where was Dr. Nnamdi Azikiwe born?” could be answered with the
name of the town, state, country or even continent. Apart from the multiple
correct answers to this question, there is also the possibility of spelling
mistakes associated with free-response questions that the scorer has to
contend with.

5.2.2 The Alternative Response Test Item

The alternative response test item, commonly called the true-false test item because
the true-false option is the one most commonly used, consists of a declarative
statement to which the examinee responds by choosing one of two options. The
two options could be true or false, right or wrong, correct or incorrect, yes or no,
fact or opinion, agree or disagree and so on.

Most times, the alternative response item includes opinion statements and the
examinee is required to respond to them as merely true or false. The opinion
item is not desirable from the standpoint of testing, teaching and learning. If an
opinion statement is to be used, it has to be attributed to some source, thereby
making it possible to assign the option of true or false to the statement based on
knowledge concerning the belief held by an individual or the values supported by an
organization or institution. An example of an alternative response item is as follows:

Read the following statement. If the statement is true, circle the T; if it is false,
circle the F.

T or F Solar Energy is the energy radiated from the sun.

The correct answer to the example above is true and is always true.

Uses

 It is commonly used to measure the ability to identify the correctness of
statements of fact, definitions of terms, statements of principles and other
relatively simple learning outcomes to which a declarative statement might
be used with any of the several methods of responding.
 It is also used to measure the examinee's ability to distinguish fact from
opinion and superstition from scientific belief.
 It is used to measure the ability to recognize cause-and-effect
relationships.
 It is best used in situations in which there are only two possible alternatives
such as right or wrong, more or less, and so on.

Advantages
 It is easy to construct alternative response item but the validity and reliability
of such item depend on the skill of the item constructor. To construct
unambiguous alternative response item, which measures significant learning
outcomes, requires much skill.
 A large number of alternative response items covering a wide area of
sampled course material can be obtained and the examinees can respond to
them in a short period of time.

Disadvantages
 It requires course material that can be phrased so that the statement is true or
false without qualification or exception as in the Social Sciences.
 It is limited to learning outcomes in the knowledge area except for
distinguishing between facts and opinion or identifying cause-and-effect
relationships.
 It is susceptible to guessing, with a fifty-fifty chance of the examinee
selecting the correct answer on chance alone. The chance selection of correct
answers has the following effects:

i. It reduces the reliability of each item, thereby making it necessary to
include many items in order to obtain a reliable measure of
achievement.
ii. The diagnostic value of answers to guessed test items is practically nil
because analysis based on such responses is meaningless.
iii. The validity of examinees' responses is also questionable because of
response set.

Response set is a consistent tendency to follow a certain pattern in responding to test
items. For instance, some examinees will consistently mark “true” those items they
do not know while others will consistently mark them “false”. Any given test will
therefore favour one response set over another, thereby introducing an element into
the test score that is irrelevant to the purpose of the test.
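
The effect of the fifty-fifty chance described above, and the reason many items are needed, can be shown with a short calculation. The sketch below assumes an examinee who guesses blindly and independently on every item; the test lengths and the 70 per cent cut-off are hypothetical figures chosen for illustration.

```python
from math import comb


def expected_chance_score(n_items, n_options=2):
    """Average number of items a blind guesser gets right."""
    return n_items / n_options


def probability_of_at_least(n_items, cutoff, n_options=2):
    """Chance that blind guessing alone yields `cutoff` or more correct answers."""
    p = 1 / n_options
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(cutoff, n_items + 1)
    )


# Chance of reaching 70% by guessing on a short and on a longer true-false test.
short_test = probability_of_at_least(10, 7)
long_test = probability_of_at_least(50, 35)

print("Expected chance score on 10 items:", expected_chance_score(10))
print("P(at least 7 of 10 by guessing):", round(short_test, 4))
print("P(at least 35 of 50 by guessing):", round(long_test, 4))
```

On the short test a blind guesser reaches the cut-off fairly often, while on the longer test this becomes rare, which is why a true-false test needs many items to give a reliable measure of achievement.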

5.2.3 The Matching Test Items

The matching test item usually consists of two parallel columns. One column
contains a list of words, numbers, symbols or other stimuli (premises) to be matched
to a word, sentence, phrase or other possible answer from the list in the other
column (responses). The examinee is directed to match the responses to the
appropriate premises. Usually, the two lists have some sort of relationship. The basis
for matching responses to premises is sometimes self-evident, but more often it must
be explained in the directions.

The examinee's task then is to identify the pairs of items that are to be associated on
the basis indicated. Sometimes the premise and response lists are an imperfect match,
with more entries in one of the two columns and the directions indicating what is to
be done. For instance, the examinee may be required to use an item once, more than
once or not at all. This deliberate procedure is used to prevent examinees from
matching the final pair of items on the basis of elimination. An example of a
matching item is given below.

Choose the most appropriate approach to validity from the list in Column B that
matches each type of validity evidence on the list in Column A.

Column A: Validity Evidence
1. Content-Related Evidence
2. Criterion-Related Evidence
3. Construct-Related Evidence

Column B: Approaches to Test Validation
A. Compare test scores with another measure of performance obtained at a later
date.
B. Establish the meaning of the scores on the test by controlling the development
of the test and experimentally determining the factors that influence test
performance.
C. Establish how well the sample of test tasks represents the domain of tasks to
be measured.
D. Compare the test tasks to the test specifications describing the test domain
under consideration.


Uses

 It is used whenever learning outcomes emphasize the ability to identify the
relationship between things and a sufficient number of homogeneous premises
and responses can be obtained.
 It is essentially used to relate two things that have some logical basis for
association.
 It is adequate for measuring factual knowledge, such as knowledge of
terms, definitions, dates, events, and references to maps and diagrams.

Advantages

 The major advantage of matching exercise is that one matching item consists
of many problems. This compact form makes it possible to measure a large
amount of related factual material in a relatively short time.
 It enables the sampling of larger content, which results in relatively higher
content validity.
 The guess factor can be controlled by skillfully constructing the items such
that the correct response for each premise must also serve as a plausible
response for the other premises.
 The scoring is simple and objective and can be done by machine.

Disadvantages

 It is restricted to the measurement of factual information based on rote
learning because the materials tested lend themselves to the listing of a
number of important and related concepts.
 Many topics are unique and cannot be conveniently grouped in homogeneous
matching clusters, and it is sometimes difficult to get homogeneous
clusters of premises and responses that match sufficiently, even for
content that is adaptable to clustering.
 It requires extreme care during construction in order to avoid encouraging
serial memorization rather than association and to avoid irrelevant clues to
the correct answer.

5.2.4 The Multiple Choice Test Items

The multiple choice item consists of two parts: a problem and a list of suggested
solutions. The problem, generally referred to as the stem, may be stated as a direct
question or an incomplete statement, while the suggested solutions, generally
referred to as the alternatives, choices or options, may include words, numbers,
symbols or phrases. In its standard form, one of the options of the multiple choice
item is the correct or best answer and the others are intended to mislead, foil or
distract examinees from the correct option and are therefore called distracters, foils
or decoys. These incorrect alternatives receive their name from their intended
function, which is to distract the examinees who are in doubt about the correct
answer. An example of a multiple-choice item is given below.

Which one of the following factors contributed most to the selection of Abuja as the
Federal Capital Territory of Nigeria?

(A) Central location
(B) Good climate
(C) Good highways
(D) Low population
(E) High population.

The best-answer form of Multiple Choice Item is usually more difficult than the
correct answer form. This is because such items are used to measure more complex
learning outcomes. It is especially useful for measuring learning outcomes that
require the understanding, application or interpretation of factual information. An
example is given below.

Which of these best describes the property of speed?

(A) It has magnitude.
(B) It has direction.
(C) It is a scalar quantity.
(D) It is a vector quantity.
(E) It has magnitude and direction.

Uses

 The multiple-choice item is the most widely used of the types of test
available. It can be used to measure a variety of learning outcomes from
simple to complex.
 It is adaptable to any subject matter content and educational objective at the
knowledge and understanding levels.
 It can be used to measure knowledge outcomes concerned with vocabulary,
facts, principles, methods and procedures and also aspects of understanding
relating to the application and interpretation of facts, principles and methods.
 Most commercially developed and standardized achievement and aptitude
tests make use of multiple-choice items.

Advantages

 The main advantage of the multiple-choice test is its wide applicability in the
measurement of various phases of achievement.
 It is the most desirable of all the test formats, being free of many of the
disadvantages of other forms of objective items. For instance, it presents a
more well-defined problem than the short-answer item, avoids the need for
homogeneous material necessary for the matching item, reduces the clues and
susceptibility to guessing characteristic of the true-false item and is
relatively free from response sets.
 It is useful in diagnosis and it enables fine discrimination among the
examinees on the basis of how much of the attribute being measured they
possess.
 It can be scored with a machine.

Disadvantages

 It measures problem-solving behaviour at the verbal level only.
 It is inappropriate for measuring learning outcomes requiring the ability to
recall, organize or present ideas because it requires only the selection of the
correct answer.
 It is very difficult and time consuming to construct.
 It requires more response time than any other type of objective item and may
favour test-wise examinees if not adequately and skillfully constructed.

5.3 Constructing the Objective Test Items

You have seen that, simply put, a test item is a statement, sometimes in question form,
that tries to elicit a testee's level of knowledge, ability or understanding of a specific
subject matter. Therefore, writing a good test item is an art that requires some skill,
time, perseverance, and creativity. The following are some general guidelines for
the construction of any type of objective test item.

 The wording of the item should be clear and as explicit as possible.
 Avoid setting interrelated items.
 Items should be designed to test important and not trivial facts or knowledge.
 Write each item to elicit, discriminately, the extent of examinees' possession of
only the desired behaviour as stipulated in the course instructional objectives.
 Ensure that there is one and only one correct or best answer to each item.
 Avoid unintentionally giving away the answer through providing irrelevant
clues.
 Use language appropriate to the level of the examinees.
 Items in an achievement test should be constructed to elicit specific course
content and not measure general intelligence.
 Have an independent reviewer vet your test items.
(Nenty, 1985: 204)

6.0 ACTIVITY

1. Differentiate between objective and subjective tests.
2. What is an objective test?
3. State any six general suggestions for constructing objective tests.


7.0 SUMMARY

 Objective tests are those whose items are set in such a way that one and only
one correct answer is available for a given item.
 The various types of objective test items discussed in this unit are:

- The Free Response Test Item
- The Alternative Response Test Item
- The Matching Test Item
- The Multiple Choice Item.

 The Multiple Choice Item is the best and most widely used of all the
objective test items. It consists of two parts: a problem and a list of suggested
solutions. The problem is referred to as the stem while the suggested solutions
are referred to as options; the incorrect options are called distracters.
 General guidelines for the construction of valid and reliable objective test
items were also listed.

8.0 ASSIGNMENT

Differentiate the following terms in line with multiple choice objective test:
1. Stem and option.
2. Key and distractors.
3. Good and bad distractors.

9.0 REFERENCE/FURTHER READINGS

Gronlund, N. E. (1985). Measurement and Evaluation in Teaching. New York:
Macmillan Publishing Company.

Nenty, H. J. (1985). Fundamentals of Measurement and Evaluation in Education.
Calabar: University of Calabar.

Omoroginwa, O. K. (2010). An Introduction to Educational Measurement and
Evaluation. Benin City: Perfect Touch.


UNIT 4 TEST DEVELOPMENT – PLANNING THE CLASSROOM TEST

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 How to Study this Unit
4.0 Word Study
5.0 Main Content
5.1 Test Development – Planning the Classroom Test
5.1.1 Considerations in Planning the Classroom Test
5.1.2 Scrutiny of Instructional Objectives
5.1.3 Content Survey
5.1.4 Planning the Table of Specification/Test Blue Print.
5.2 Item Writing
5.3 Moderation of Test Items
5.4 Assembling the Test Items
5.4.1 Guide for Preparing the Objective Test for use
5.4.2 Guide for Preparing the Essay Test for use
6.0 Activity
7.0 Summary
8.0 Assignment
9.0 References

1.0 INTRODUCTION

In this unit, you will learn how to plan a classroom test. You will learn what to
consider in the planning stage, and how to carry out a content survey and scrutinize the
instructional objectives as relevant factors in the development of the table of
specification/test blue print. Thereafter you will learn how to develop the test blue
print, moderate the items generated and prepare the items for use.

2.0 OBJECTIVES

By the time you finish this unit, you will be able to:

 identify the sequence of planning a classroom test;
 prepare a table of specification for a classroom test in a given subject;
 carry out item moderation processes; and
 assemble moderated test items for use.

3.0 HOW TO STUDY THIS UNIT


1. Read through this unit carefully.
2. Study the unit step by step as the points are well arranged.
NOTE: All answers to activities and assignment are at the end of this book.
This applies to every other unit in this book.

4.0 WORD STUDY

 Survey: A general view, examination, or description of someone or
something.

 Scrutinize: To examine in detail with careful or critical attention.

 Instructional objectives: Instructional objectives may also be called
performance objectives, behavioral objectives, or simply objectives. All of
these terms are used interchangeably. Objectives are specific, outcome based
and measurable, and they describe the learner's behavior after instruction.

 Relevant factors: Factors that are pertinent when making a decision or
reaching a conclusion; appropriate to the current time, period, or circumstances.

 Mastery: Comprehensive knowledge or skill in a subject or accomplishment.

 Weighting: The relative importance or emphasis assigned to something
(literally, a measure of the heaviness of an object).

 Proportion: A part, share, or number considered in comparative relation to a
whole.

5.0 MAIN CONTENT

5.1 Test Development – Planning the Classroom Test

The development of valid, reliable and usable questions involves proper planning.
The plan entails designing a framework that can guide the test developers in the
item development process. This is necessary because the classroom test is a key factor
in the evaluation of learning outcomes. The validity, reliability and usability of such
tests depend on the care with which the tests are planned and prepared. Planning helps
to ensure that the test covers the pre-specified instructional objectives and the
subject matter (content) under consideration. Hence, planning a classroom test entails
identifying the instructional objectives stated earlier and the subject matter (content)
covered during the teaching/learning process. This leads to the preparation of the table
of specification (the test blue print) for the test while bearing in mind the type of test
that would be relevant for the purpose of testing. An outline of the framework for
planning the classroom test is presented below.

5.1.1 Considerations in Planning a Classroom Test

Planning a classroom test that will be both practical and effective in providing
evidence of mastery of the instructional objectives and content covered requires
relevant considerations. Hence, the following steps serve as a guide in planning a
classroom test:


 determine the purpose of the test;
 describe the instructional objectives and content to be measured;
 determine the relative emphasis to be given to each learning outcome;
 select the most appropriate item formats (essay or objective);
 develop the test blue print to guide the test construction;
 prepare test items that are relevant to the learning outcomes specified in the
test plan;
 decide on the pattern of scoring and the interpretation of results;
 decide on the length and duration of the test; and
 assemble the items into a test, prepare directions and administer the test.

5.1.2 Scrutiny of the Instructional Objectives

The instructional objectives of the course are critically considered while developing
the test items. This is because the instructional objectives are the intended
behavioural changes or intended learning outcomes of instructional programmes
which students are expected to possess at the end of the course or programme of
study. The instructional objectives usually stated for the assessment of behaviour in
the cognitive domain of educational objectives are classified by Bloom (1956) in his
taxonomy of educational objectives into knowledge, comprehension, application,
analysis, synthesis and evaluation. The objectives are also given relative weights
according to the level of importance and emphasis given to them. Educational
objectives and the content of a course form the nucleus around which test development
revolves.

5.1.3 Content Survey

This is an outline of the content (subject matter or topics) of a course or programme
to be covered in the test. The test developer assigns relative weights to the outlined
content, that is, the topics and subtopics to be covered in the test. This weighting
depends on the importance and emphasis given to each content area. A content survey is
necessary since the content is the means by which the objectives are to be achieved and
the level of mastery determined.

5.1.4 Planning the table of specification/test blue print

The table of specification is a two-dimensional table that specifies the level of
objectives in relation to the content of the course. A well-planned table of
specification enhances the content validity of the test for which it is planned. The two
dimensions (content and objectives) are put together in a table by listing the
objectives across the top of the table (horizontally) and the content down the table
(vertically) to provide the complete framework for the development of the test
items. The table of specification is planned to take care of the coverage of content
and objectives in the right proportion according to the degree of relevance and
emphasis (weight) attached to them in the teaching-learning process. A hypothetical
table of specification is illustrated in Table 3.1 below:

Table 3.1 A Hypothetical Test Blue Print/Table of Specification.

                 Objectives
Content  Weight  Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation  Total
area             (10%)      (15%)          (15%)        (30%)     (10%)      (20%)       (100%)
Set A    15%     -          1              -            2         -          -           3
Set B    15%     -          1              -            2         -          -           3
Set C    25%     1          -              1            1         1          1           5
Set D    25%     1          -              1            1         1          1           5
Set E    20%     -          1              1            -         -          2           4
Total    100%    2          3              3            6         2          4           20

i. The first consideration in the development of the test blue print is the weight to
be assigned to higher order questions and lower order questions (that is,
to educational objectives at higher and at lower cognitive levels). This is
utilized in the allocation of the number of questions to be developed in each cell
under the content and objective dimensions. In the hypothetical case under
consideration, the weight for lower order questions (range:
knowledge to application) is 40% while that for higher order questions (range:
analysis to evaluation) is 60%. This means that 40% of the total questions
should be lower order questions while 60% of the questions should be higher order
questions. The learners in this case are assumed to be at the Senior Secondary
Level of Education. Also, an attempt should be made, as in the above, to
ensure that the questions are spread across all the levels of Bloom's (1956)
Taxonomy of Educational Objectives.

ii. The blue print is prepared by drawing a two-dimensional framework with the list
of contents vertically (left column) and objectives horizontally (top row) as
shown in Table 3.1 above.

iii. Weights are assigned in percentages to both the content and objectives
dimensions as desired and as stated earlier.

iv. The decision on the total number of items to be set and used is the basis for
determining the number of items for each content area. For instance, in Table 3.1,
Set A is weighted 15% and 20 items are to be generated in all. Therefore, the
number of items for each set is obtained thus:

- Set A, weight: 15% of 20 items = 3 items
- Set B, weight: 15% of 20 items = 3 items
- Set C, weight: 25% of 20 items = 5 items
- Set D, weight: 25% of 20 items = 5 items
- Set E, weight: 20% of 20 items = 4 items.

The worked out values are then listed against each content area at the
extreme right (Total column) to correspond with its particular content.

v. The same procedure is repeated for the objective dimension, just as in the
above.

- Knowledge: weight 10% of 20 items = 2 items
- Comprehension: weight 15% of 20 items = 3 items
- Application: weight 15% of 20 items = 3 items
- Analysis: weight 30% of 20 items = 6 items
- Synthesis: weight 10% of 20 items = 2 items
- Evaluation: weight 20% of 20 items = 4 items.

Here also, the worked out values are listed against each objective in the last
horizontal row, alongside the provision for total.

vi. Finally, the items for each content area are distributed to the relevant objectives in
the appropriate cells. This has also been indicated in Table 3.1 above.
The table of specification, now completed, serves as a guide for constructing
the test items. It should be noted that in the table the knowledge, comprehension
and application levels have 2, 3, and 3 items respectively, that is, 2+3+3 = 8
items out of 20 items, representing 40% of the total test items, while
analysis, synthesis and evaluation have 6, 2 and 4 items respectively, that is,
6+2+4 = 12 items out of 20 items, representing 60% of the total items.

vii. The development of the table of specification is followed by item writing. Once
the table of specification is adhered to in the item writing, the items would
have appropriate content validity at the required level of difficulty. The table
of specification is applicable both for writing essay items (subjective
questions) and for writing objective items (multiple choice questions,
matching sets items, completion items, true/false items).
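
The arithmetic in steps iv to vi can also be checked mechanically. The following is only a minimal sketch in Python, not part of the course text: the weights, the cell entries and the total of 20 items are copied from Table 3.1, while the names allocate_items, CELLS and the other identifiers are illustrative assumptions.

```python
from fractions import Fraction

TOTAL_ITEMS = 20  # total number of items planned in Table 3.1

# Percentage weights copied from Table 3.1.
CONTENT_WEIGHTS = {"Set A": 15, "Set B": 15, "Set C": 25, "Set D": 25, "Set E": 20}
OBJECTIVE_WEIGHTS = {
    "Knowledge": 10, "Comprehension": 15, "Application": 15,
    "Analysis": 30, "Synthesis": 10, "Evaluation": 20,
}

# Cell entries of Table 3.1, one list per content area,
# in the same order as the keys of OBJECTIVE_WEIGHTS.
CELLS = {
    "Set A": [0, 1, 0, 2, 0, 0],
    "Set B": [0, 1, 0, 2, 0, 0],
    "Set C": [1, 0, 1, 1, 1, 1],
    "Set D": [1, 0, 1, 1, 1, 1],
    "Set E": [0, 1, 1, 0, 0, 2],
}


def allocate_items(weights, total_items):
    """Return the number of items per category (weight% of total_items)."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must add up to 100%")
    allocation = {}
    for name, percent in weights.items():
        share = Fraction(percent * total_items, 100)
        if share.denominator != 1:
            # The sketch refuses to round; the planner must adjust the weights.
            raise ValueError(f"{name}: {percent}% of {total_items} is not a whole number")
        allocation[name] = int(share)
    return allocation


content_items = allocate_items(CONTENT_WEIGHTS, TOTAL_ITEMS)      # step iv
objective_items = allocate_items(OBJECTIVE_WEIGHTS, TOTAL_ITEMS)  # step v

# Step vi: the filled-in cells must add up to the row and column totals.
row_totals = {area: sum(cells) for area, cells in CELLS.items()}
column_totals = dict(zip(OBJECTIVE_WEIGHTS, (sum(col) for col in zip(*CELLS.values()))))

# Lower order = knowledge to application; higher order = analysis to evaluation.
lower_order = sum(objective_items[k] for k in ("Knowledge", "Comprehension", "Application"))
higher_order = TOTAL_ITEMS - lower_order

print(content_items)
print(objective_items)
print(row_totals == content_items, column_totals == objective_items)
print(lower_order, higher_order)
```

With the figures of Table 3.1 the sketch reproduces the 8 lower order and 12 higher order items worked out above. It deliberately raises an error instead of rounding when a weight does not yield a whole number of items, since how to round is a judgment the test developer has to make.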

5.2 Item Writing

The next task in planning the classroom test is to prepare the actual test items. The
following is a guide for item writing:

i. Keep the test blueprint in mind and in view as you are writing the test items.
The blueprint represents the master plan and should readily guide you in item
writing and review.

ii. Generate more items than specified in the table of specification. This is to
give room for items that would not survive the item analysis hurdles.

iii. Use unambiguous language so that the demands of the item would be clearly
understood.

iv. Endeavour to generate the items at the appropriate levels of difficulty as
specified in the table of specification. You may refer to Bloom's (1956)
taxonomy of educational objectives for the appropriate action verbs required for
each level of objective.

v. Give enough time to allow an average student to complete the task.

vi. Build in a good scoring guide at the point of writing the test items.

vii. Have the test exercises examined and critiqued by one or more colleagues.
Then subject the items to scrutiny by relevant experts. The experts should
include experts in measurement and evaluation and the specific subject
specialist. Incorporate the critical comments of the experts in the
modification of the items.

viii. Review the items and select the best according to the laid down table of
specification/test blue print.

Also associated with test development is the statistical analysis known as item analysis.
This is used to appraise the effectiveness of the individual items.

Another important factor is reliability analysis. Both item analysis and reliability
analysis will be treated in subsequent units. The item analysis results and validity are
determined by trial testing the developed items using a sample from the population
for which the test is developed.

5.3 Moderation of Test Items

As mentioned earlier, one of the important stages of item development is to have the
test exercises examined and critiqued by one or more colleagues and experts. This
process is known as “moderation” of items. After the item development phase, the
test items are moderated by an expert or panel of experts before they are used,
especially for school-wide or class-wide tests such as end of term tests. Before sending
the items to external moderators (assessors), the items are to be given first to the
subject head, who should read through the items, make intelligent input and make
modifications in the areas of need identified. The subject expert may also deem it
necessary to engage others in the department who are knowledgeable in that
discipline to carry out a similar exercise (subject experts' validation) before selecting
the most appropriate items for external assessors (subject specialists and evaluation
experts) to make their final input before use.

The marking scheme and the marks allocated to the various sections of the content
covered should be sent along with the test items to the external assessors. When this
process is effectively carried out, the resulting items (items that survived the hurdles
of the moderation exercise) would have face, construct and content validity as a
measuring instrument.

5.4 Assembling the Test items

You are to assemble the test items for use after the moderation process. The
following are guides to enable you prepare and assemble both essay and objective
test items for use.

5.4.1 Guide for Preparing the Objective Test Items for Use:

 Arrange the items on the test so that they are easy to read.
 Plan the layout of the test in such a way as to be convenient for recording
answers and also scoring of the test items on separate answer sheets.
 Group items of the same format (true-false, multiple choice, matching items,
completion items) together with their relevant directions on what needs to be
done by the testees.
 Group items dealing with the same content together within item types.
 Arrange the test items in progressive order of difficulty starting from simple
to complex questions.
 Ensure that one item does not provide clues to the answer of another item or
items in the same or another section of the test.
 Ensure that the correct responses form an essentially random pattern and appear in
each of the possible response positions about the same percentage of the time
for multiple choice items.
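
The last guideline above, on the position of correct responses, lends itself to a small illustration. The following is only a sketch in Python under stated assumptions: the function name balanced_key, the five option letters and the seed are hypothetical choices, not something prescribed in this unit. It draws an answer key in which every response position is used about equally often and the order is shuffled.

```python
import random
from collections import Counter


def balanced_key(num_items, options="ABCDE", seed=None):
    """Return a shuffled answer key using each option position about equally often."""
    rng = random.Random(seed)
    # Repeat the option letters often enough to cover every item, then trim.
    repeats = -(-num_items // len(options))  # ceiling division
    pool = (list(options) * repeats)[:num_items]
    rng.shuffle(pool)  # randomize which item gets which key position
    return pool


key = balanced_key(20, seed=1)   # 20 items with options A to E
position_counts = Counter(key)   # how often each position holds the answer
print("".join(key))
print(dict(position_counts))
```

For 20 items and five options, each position carries the correct answer exactly four times; when the number of items is not a multiple of the number of options, the counts differ by at most one. The test constructor then places each item's correct option in the position the key assigns to it.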

5.4.2 Guide for Preparing Essay Test for Use

 The test items should not be too many or too lengthy for the testees to
answer in the time available.
 Ensure that a range of complexity and difficulty is represented in the test items,
especially when several essay items are given.
 It is preferable to give all testees the same type of essay questions to answer in a
classroom test.
 Write a set of general directions for the test.
 Specify the point value for each question on the test.

Once you are through with the preparation of the test items, you now assemble the
test items in the desired format for the testing exercise. The test items are now ready
in the usable format for test administration.

6.0 ACTIVITY

1. What is item analysis?
2. What are the three indices involved in item analysis?
3. Explain the three indices stated in question 2 above.


7.0 SUMMARY

 Planning a classroom test entails designing a framework that can guide the test
developers in the item development process.
 The outline of such a framework takes into account the content, the
instructional objectives and the test blue print.
 The content and instructional objectives are given relevant weights in
planning the table of specification. This weighting depends on the importance
and emphasis given to each content area and instructional objective.

8.0 ASSIGNMENT

1. What is a test blue print (table of specification)?
2. State any 3 functions of the test blue print or table of specification.
3. Enumerate the three major steps in test construction.

9.0 REFERENCES

Gronlund, N. E. (1985). Measurement and Evaluation in Teaching. New York:
Macmillan Publishing Company.

Nenty, H. J. (1985). Fundamentals of Measurement and Evaluation in
Education. Calabar: University of Calabar Press.

Omoroginwa, O. K. (2010). An Introduction to Educational Measurement and
Evaluation. Benin City: Perfect Touch.


UNIT 5 THE ADMINISTRATION AND SCORING OF
CLASSROOM TEST

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 How to Study this Unit
4.0 Word Study
5.0 Main Content
5.1 The Administration of Classroom Test
5.1.1 Ensuring Quality in Test Administration
5.1.2 Credibility and Civility in Test Administration
5.2 Scoring of Classroom Test
5.2.1 Scoring Essay Test
5.2.2 Scoring Objective Test
6.0 Activity
7.0 Summary
8.0 Assignment
9.0 References

1.0 INTRODUCTION

In the last two units, you learned about tests. Here you will learn about test
administration and the scoring of classroom tests. You will learn how to ensure quality
in test administration as well as credibility and civility in test administration.
Furthermore, you will learn how to score essay and objective test items using various
methods.

2.0 OBJECTIVES

By the time you are done with this unit you will be able to:

 explain the meaning of test administration;
 state the steps involved in test administration;
 identify the need for civility and credibility in test administration;
 state the factors to be considered for credible and civil test administration;
and
 score both essay and objective tests using various methods.

3.0 HOW TO STUDY THIS UNIT

1. Read through this unit carefully.
2. Study the unit step by step as the points are well arranged.

NOTE: All answers to activities and assignment are at the end of this book.
This applies to every other unit in this book.

4.0 WORD STUDY

 Procedure: An established or official way of doing something.

 Process: A series of actions or steps taken in order to achieve a particular
end.

 Physical environment: The part of the human environment that includes
purely physical factors (as soil, climate, water supply).

 Psychological environment: This is the interplay between individuals and
their surroundings.

 Credibility: The quality of being convincing or believable.

 Encumbrances: A burden, obstruction, or impediment.

 Anxiety: A feeling of worry, nervousness, or unease, typically about an
imminent event or something with an uncertain outcome.

 Assessment: Refers to the wide variety of methods or tools that educators
use to evaluate or measure a particular construct.

 Evaluation: Value judgment or worth of a programme.

 Outcome: Something that follows as a result or consequence.

 Viabilities: The quality of being able to happen or having a reasonable
chance of success.

 Uniformity: The quality or state of being homogeneous.

 Standards: An idea or thing used as a measure.

5.0 MAIN CONTENT

5.1 Test Administration

Test administration, as you know, refers to the procedure of actually presenting the
learning task that the examinees are required to perform in order to ascertain the
degree of learning that has taken place during the teaching-learning process. This
procedure is as important as the process of preparing the test. This is because the
validity and reliability of test scores can be greatly reduced when a test is poorly
administered. While administering a test, all examinees must be given a fair chance to
demonstrate their achievement of the learning outcomes being measured. This
requires the provision of a physical and psychological environment which is
conducive to their making their best efforts and the control of factors such as
malpractice and unnecessary threats from test administrators that may interfere with
valid measurement.

5.1.1 Ensuring Quality in Test Administration

Quality and good control are necessary components of test administration. The
following guidelines and steps are aimed at ensuring quality in test administration.

 Collect the question papers in time from the custodian to be able to start the test
at the stipulated time.
 Ensure compliance with the stipulated seating arrangements in the test to
prevent collusion between or among the testees.
 Ensure orderly and proper distribution of question papers to the testees.
 Do not talk unnecessarily before the test. Testees' time should not be wasted
at the beginning of the test with unnecessary remarks, instructions or threats
that may induce test anxiety.
 It is necessary to remind the testees of the need to avoid malpractice before
they start and make it clear that cheating will be penalized.
 Stick to the instructions regarding the conduct of the test and avoid giving
hints to testees who ask about particular items. But make corrections or
clarifications to the testees whenever necessary.
 Keep interruptions during the test to a minimum.

5.1.2 Credibility and Civility in Test Administration


Credibility and civility are characteristics of assessment which have day
to day relevance for developing educational communities. Credibility deals with the
value the eventual recipients and users of the results of assessment place on the
results with respect to the grades obtained, certificates issued or the issuing
institution. Civility, on the other hand, enquires whether the persons being
assessed are in such conditions as to give their best without hindrances and
encumbrances in the attributes being assessed and whether the exercise is seen as
integral to or as external to the learning process. Hence, in test administration, effort
should be made to see that the testees are given a fair and unaided chance to
demonstrate what they have learnt with respect to:

i. Instructions: A test should contain a set of instructions, which are usually of
two types. One is the instruction to the test administrator while the other
is to the testee. The instruction to the test administrator should explain how
the test is to be administered, the arrangements to be made for proper
administration of the test and the handling of the scripts and other materials.
The instructions to the administrator should be clear for effective compliance.
For the testees, the instruction should direct them on the amount of work to
be done or the tasks to be accomplished. The instruction should explain how
the test should be performed. Examples may be used for illustration and to
clarify the instruction on what should be done by the testees. The language
used for the instruction should be appropriate to the level of the testees. Where
necessary, the administrators should explain the testees' instructions for proper
understanding, especially when the ability to understand and follow
instructions is not part of the test.

ii. Duration of the Test: The time for accomplishing the test is technically
important in test administration and should be clearly stated for both the test
administrators and testees. Ample time should be provided for candidates to
demonstrate what they know and what they can do. The duration of test
should reflect the age and attention span of the testees and the purpose of the
test.

iii. Venue and Seating Arrangement: The test environment should be learner
friendly with adequate physical conditions such as work space, good and
comfortable writing desks, proper lighting, good ventilation, moderate
temperature, conveniences within reasonable distance and the serenity necessary
for maximum concentration. It is important to provide enough
comfortable seats with an adequate seating arrangement for the testees' comfort
and to reduce collaboration between them. Adequate lighting, good
ventilation and moderate temperature reduce test anxiety and loss of
concentration, which invariably affect performance in the test. Noise is
another undesirable factor that has to be adequately controlled both within
and outside the immediate test environment since it affects concentration and
test scores.

iv. Other necessary conditions: The questions and the question paper should be
friendly, with bold characters, and should be neat, decent, clear and appealing,
not such that they intimidate the testee into mistakes. All relevant materials for
carrying out the demands of the test should be provided in reasonable number
and quality, and on time.

All these are necessary to enhance the test administration and to make assessment
civil in manifestation.

On the other hand, for credibility, effort should be made to moderate the test
questions before administration based on laid down standards. It is also important to
ensure that valid questions are constructed based on the procedures for test construction
which you already know, as we discussed them in units 2 and 3 of this
module. Secure custody should be provided for the questions from the point of
drafting to the constitution of the final version of the test. Security and safe
custody should also be provided for the live scripts after the assessment and while
transmitting them to the graders, and the grades arising from the assessment should
be kept in secure custody against loss, mutilation and alteration. The test
administrators and the graders should be of proven moral integrity and should hold
appropriate academic and professional qualifications. The test scripts are to be graded
and marks awarded strictly by using itemized marking schemes. All these are necessary
because an assessment situation in which credibility is seriously called to question
cannot really claim to be valid.

5.2 Scoring the Test

In the evaluation of classroom learning outcomes, marking schemes are prepared
alongside the construction of the test items in order to score the test objectively.
The marking scheme describes how marks are to be distributed amongst the
questions and between the various parts of each question. This distribution is
dependent on the objectives stated for the learning outcome during teaching and the
weight assigned to the questions during test preparation and construction of the test
items. The marking scheme takes into consideration the facts required to answer the
questions and the extent to which the language used meets the requirement of the
subject. The actual marking is done following the procedures for scoring essay
questions (for essay questions) and for scoring objective items (for objective items).

5.2.1 Scoring Essay Test

As you are already aware, the construction and scoring of essay questions are
interrelated processes that require attention if a valid and reliable measure of
achievement is to be obtained. In the essay test the examiner is an active part of the
measurement instrument. Therefore, the variability within and between examiners
affects the resulting score of the examinee. This variability is a source of error, which
affects the reliability of the essay test if not adequately controlled. Hence, for the essay
test result to serve a useful purpose as a valid measurement instrument, a conscious effort
is made to score the test objectively by using appropriate methods to minimize the
effect of personal biases and idiosyncrasies on the resulting scores, and by applying
standards to ensure that only relevant factors indicated in the course objectives and
called for during the test construction are considered during the scoring. There are
two common methods of scoring essay questions. These are:

The Point or Analytic Method

In this method each answer is compared with an ideal marking scheme (scoring key)
prepared in advance, and marks are assigned according to the adequacy of the
answer. When used conscientiously, the analytic method provides a means of
maintaining uniformity in scoring between scorers and between scripts, thus
improving the reliability of the scoring. This method is generally used satisfactorily
to score restricted response questions. This is made possible by the limited
number of characteristics elicited by a single answer, which define the degrees
of quality precisely enough for point values to be assigned to them. It is also possible to
identify the particular weaknesses or strengths of each examinee with analytic scoring.
Nevertheless, it is desirable to rate each aspect of the item separately. This has the
advantage of providing greater objectivity, which increases the diagnostic value of
the result.

The Global/Holistic or Rating Method

In this method the examiner first sorts the responses into categories of varying
quality based on his general or global impression on reading each response. The
standard of quality helps to establish a relative scale, which forms the basis for
ranking responses from those of the poorest quality to those of the highest
quality. Usually between five and ten categories are used with
the rating method, with each pile representing a degree of quality that
determines the credit to be assigned. For example, where five categories are used
and the responses are awarded the five letter grades A, B, C, D and E, the responses
are sorted into five piles: A-quality, B-quality, C-quality, D-quality and E-quality
responses. There is usually a need to re-read the responses
and to re-classify the misclassified ones. This method is ideal for extended
response questions, where relative judgments (not exact numerical scores) are made
concerning the relevance of ideas, organization of the material and similar qualities
evaluated in the answers. Using this method requires a
lot of skill and time in determining the standard response for each quality category.
It is desirable to rate each characteristic separately. This provides for greater
objectivity and increases the diagnostic value of the results. The following are
procedures for scoring essay questions objectively to enhance reliability.

i. Prepare the marking scheme, ideal answer or outline of the expected answer
immediately after constructing the test items and indicate how marks are to
be awarded for each section of the expected response.

ii. Use the scoring method that is most appropriate for the test item. That is, use
either the analytic or global method as appropriate to the requirements of the
test item.

iii. Decide how to handle factors that are irrelevant to the learning outcomes
being measured. These factors may include legibility of handwriting,
spelling, sentence structure, punctuation and neatness. These factors should
be controlled when judging the content of the answers. Also, decide in
advance how to handle the inclusion of irrelevant materials (uncalled for
responses).

iv. Score only one item in all the scripts at a time. This helps to control the
“halo” effect in scoring.

v. Evaluate the responses anonymously, without knowledge of the
examinee whose script you are scoring. This helps in controlling bias in
scoring the essay questions.

vi. Evaluate the marking scheme (scoring key) before actual scoring by scoring a
random sample of examinees' actual responses. This provides a general idea
of the quality of the responses to be expected and might call for a revision of
the scoring key before commencing actual scoring.

vii. Make comments during the scoring of each essay item. These comments act
as feedback to examinees and a source of remediation to both examinees and
examiners.

viii. Obtain two or more independent ratings if important decisions are to be
based on the results. The results of the different scorers should be compared
and the ratings moderated to reconcile any discrepancies for more reliable results.

5.2.2 Scoring Objective Test

Objective tests, unlike essay tests, can be scored by various methods with ease.
Various techniques are used to speed up the scoring, and the technique to use
sometimes depends on the type of objective test. Some of these techniques are as
follows:

i. Manual Scoring

In this method the answers to test items are scored by direct comparison of
the examinee's answers with the marking key. If the answers are recorded on the test
paper, for instance, a scoring key can be made by marking the correct answers on a
blank copy of the test. Scoring is then done by simply comparing the columns of
answers on the master copy with the columns of answers on each examinee's test
paper. Alternatively, the correct answers are recorded on strips of paper, and this
strip key, on which the columns of answers are recorded, is used as the master for
scoring the examinees' test papers.

ii. Stencil Scoring

On the other hand, when separate answer sheets are used by examinees for
recording their answers, it is most convenient to prepare and use a scoring stencil. A
scoring stencil is prepared by punching holes in a blank answer sheet where the
correct answers are supposed to appear. Scoring is then done by laying the stencil
over each answer sheet and counting the number of answer marks appearing through the
holes. At the end of this scoring procedure, each test paper is scanned to
eliminate possible errors due to examinees supplying more than one answer or an
item having more than one correct answer.

iii. Machine Scoring

Usually, for a large number of examinees, specially prepared answer sheets are
used to answer the questions. The answers are normally shaded at the appropriate
places assigned to the various items. These special answer sheets are then machine
scored with computers and other scoring devices using a certified answer key
prepared for the test items.

In scoring objective tests, it is usually preferable to count each correct answer as one
point. An examinee's score is simply the number of items answered correctly.

Sometimes examiners may prefer to correct for guessing. To do this the following
formula may be used.

Correction for Guessing

The most common correction-for-guessing formula, although rarely used, is:

Score = Right - Wrong / (n - 1)

where Right and Wrong are the numbers of items answered correctly and
incorrectly, and n is the number of alternatives for an item.
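
A minimal Python sketch of the correction for guessing may help here. It assumes the standard form of the formula, Score = Right - Wrong / (n - 1), and assumes that omitted items count as neither right nor wrong. The function name and the example figures are hypothetical and serve only as illustration.

```python
def corrected_score(right: int, wrong: int, n_alternatives: int) -> float:
    """Correction for guessing: Right - Wrong / (n - 1).

    Omitted items are assumed to count as neither right nor wrong.
    """
    if n_alternatives < 2:
        raise ValueError("an item needs at least two alternatives")
    return right - wrong / (n_alternatives - 1)


# Hypothetical examinee: 40 right, 8 wrong and 2 omitted on a 50-item test
# in which every item has five alternatives.
print(corrected_score(40, 8, 5))  # 38.0
```

With five alternatives, every four wrong answers cost the examinee one mark, so the eight wrong answers reduce the raw score of 40 to 38.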

6.0 ACTIVITY

What is the role of the test administrator?

1. Before the test
2. During the test
3. After the test

7.0 SUMMARY

• Test administration refers to the procedure of presenting the learning tasks
that the examinees are required to perform in order to ascertain the degree of
learning that has taken place during the teaching-learning process.
• Credibility and civility in test administration are characteristics
of assessment which have day-to-day relevance for developing educational
communities.
• Credibility deals with the value the eventual recipients and users of the result
of assessment place on the result with respect to grades obtained.
• Civility enquires whether the persons being assessed are in such conditions as
to give their best without hindrances and encumbrances in the attributes
being assessed.
• Scoring a test involves the preparation of a marking scheme which describes
how marks are to be distributed amongst the questions and between the
various parts of each question.
• There are two methods of scoring essay questions:

Analytic method, in which each answer is compared with an already prepared ideal
marking scheme (scoring key) and marks are assigned according to the adequacy of
the answer. When used conscientiously, it provides a means of maintaining
uniformity in scoring between scorers and between scripts, thereby improving
the reliability of the scoring.

Rating method, in which the examiner first sorts the responses into categories of
varying quality based on his general or global impression on reading the responses.
The standard of quality helps to establish a relative scale which forms the basis for
ranking responses from those of the poorest quality to those of the highest
quality. Using this method requires a lot of skill and time in determining
the standard response for each quality category.

• The methods of scoring objective tests are manual scoring, stencil scoring
and machine scoring.
• The correction formula for guessing in objective tests is given by:

Score = Right - Wrong / (n - 1)

where n is the number of alternatives for an item.

8.0 ASSIGNMENT

1. State any three reasons why correction for guessing is important.
2. What is the rationale for correction for guessing?
3. It is expected that professional and ethical behaviour must be demonstrated
regarding all aspects of the test administration. Any help with answering
questions that gives a student an advantage in any way will be considered
cheating. What ethical help can a test administrator offer to students
in an examination?

9.0 REFERENCES

Cottrell, S. (2001). Teaching Study Skills and Supporting Learning (1st ed.). New
York: Palgrave.

Farrant, J. S. (2000). Principles and Practice of Education (New ed.). London:
Longman.

Gronlund, N. E. (1985). Measurement and Evaluation in Teaching. New York:
Macmillan Publishing Company.

Nenty, H. J. (1985). Fundamentals of Measurement and Evaluation in Education.
Calabar: University of Calabar.

Omoroginwa, O. K. (2010). An Introduction to Educational Measurement and
Evaluation. Benin City: Perfect Touch.

MODULE 4

Unit 1 Judging the Quality of a Classroom Test
Unit 2 Interpreting Classroom Test Scores
Unit 3 Reliability of a Test
Unit 4 Validity of Classroom Test
Unit 5 Problem of Marking Test and Quality Control in Marking System

UNIT 1 JUDGING THE QUALITY OF A CLASSROOM TEST

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 How to Study this Unit
4.0 Word Study
5.0 Main Content
5.1 Judging the Quality of a Classroom Test
5.1.1 Item Analysis
5.1.2 Purpose and Uses of Item Analysis
5.2 The Process of Item Analysis for Norm-Referenced Classroom Test
5.2.1 Computing Item Difficulty
5.2.2 Computing Item Discriminating Power
5.2.3 Evaluating the Effectiveness of Distracters
5.3 Item Analysis and Criterion – Referenced Mastery Tests.
5.3.1 Item Difficulty
5.3.2 Item Discriminating Power
5.3.3 Analysis of Criterion- Referenced Mastery Items
5.3.4 Effectiveness of Distracters
5.4 Building a Test Item File (Item Bank)
6.0 Activity
7.0 Summary
8.0 Assignment
9.0 References

1.0 INTRODUCTION

In this unit you will learn how to judge the quality of a classroom test.
Specifically, you will learn about item analysis – purpose and uses. Furthermore you
will learn the process of item analysis for Norm-referenced classroom test and the
computations involved. In addition you will learn item analysis of Criterion-
referenced mastery items. Finally, you will learn about building a test item file.

2.0 OBJECTIVES

By the end of this unit you will be able to:

• define and differentiate between item difficulty, item
discrimination and the distraction power of an option;
• recognize the need for item analysis, its place and importance in test
development;
• conduct item analysis of a classroom test;
• calculate the value of each item parameter for different types of items; and
• appraise an item based on the results of item analysis.

3.0 HOW TO STUDY THIS UNIT

1. Read through this unit carefully.
2. Study the unit step by step as the points are well arranged.

NOTE: All answers to activities and assignments are at the end of this book.
This applies to every other unit in this book.

4.0 WORD STUDY

• Appraisal: The act of examining someone or something in order to judge
their qualities or success.

• Remediation: Proffering a solution to, or correcting, something bad or
defective.

• Learning deficiencies: These are neurologically based processing problems.
These processing problems can interfere with learning basic skills such as
reading and writing. They can also interfere with higher level skills such as
organization, time planning, abstract reasoning, long or short term memory
and attention.

• Modification: Changing something from one phase to another.

• Psychometrics: The field of study concerned with the theory and
technique of psychological measurement.

5.0 MAIN CONTENT

5.1 Judging the Quality of a Classroom Test

The administration and scoring of a classroom test is closely followed by the
appraisal of the results of the test. This is done to obtain evidence concerning the
quality of the test that was used, for example by identifying defective items.
It also helps the teacher to better appreciate the careful planning and hard work that
went into the preparation of the test. Moreover, the items identified as effective are
used to build up a file of high quality items (usually called a question bank) for future use.

5.1.1 Item Analysis

Item analysis is the process of “testing the item” to ascertain specifically whether
the item is functioning properly in measuring what the entire test is measuring. As
already mentioned, item analysis begins after the test has been administered and
scored. It involves a detailed and systematic examination of the testees' responses to
each item to determine the difficulty level and discriminating power of the item.
This also includes determining the effectiveness of each option. The decision on the
quality of an item depends on the purpose for which the test is designed. However,
for an item to effectively measure what the entire test is measuring and provide
valid and useful information, it should not be too easy or too difficult. Moreover, its
options should discriminate validly between high and low performing learners in
the class.

5.1.2 Purpose and Uses of Item Analysis

Item analysis is usually designed to help determine whether an item functions as
intended: discriminating between high and low achievers in a norm-referenced
test, or measuring the effects of instruction in a criterion-referenced
test. It is also a means of identifying items having the desirable qualities of a
measuring instrument and those that need revision for future use, and even of
identifying deficiencies in the teaching/learning process. In addition, item analysis
has other useful benefits, amongst which are providing data on which to base
discussion of the test results, remediation of learning deficiencies and subsequent
improvement of classroom instruction. Moreover, item analysis procedures
provide a basis for increased skill in test construction.

5.2 The Process of Item Analysis for Norm-Referenced Classroom Test

The method for analyzing the effectiveness of test items differs for Norm-referenced
and Criterion–referenced test items. This is because they serve different functions.
In Norm-referenced test, special emphasis is placed on item difficulty and item
discriminating power. The process of item analysis begins after the test has been
administered (or trial tested), scored and recorded. For most Norm – referenced
classroom tests, a simplified form of item analysis is used.

The process of item analysis is carried out by using two contrasting test groups
composed of the upper and lower 25% or 27% of the testees to whom the items
are administered or trial tested. The upper and lower 25% is the optimum point at
which a balance is obtained between the sensitivity of the groups in making adequate
differentiation and the reliability of the results for a normal distribution. On the other
hand, the upper and lower 27%, when used, give a better estimate of the actual
discrimination value. The two extreme groups are significantly different, while the
middle scores do not discriminate sufficiently. In order to get the groups, the graded
test papers are arranged from the highest score to the lowest score in descending
order. The best 25% or 27% are picked from the top and the poorest 25% or 27% from
the bottom, while the middle test papers are discarded.

To illustrate the method of item analysis, consider a class of 40
learners who have taken a 10-item test that has been administered and scored, and use
25% test groups. The item analysis procedure might follow these basic steps.

i. Arrange the 40 test papers by ranking them in order, from the highest to the
lowest score.
ii. Select the best 10 papers (upper 25% of 40 testees) with the highest total
scores and the least 10 papers (lower 25% of 40 testees) with the lowest total
scores.
iii. Drop the middle 20 papers (the remaining 50% of the 40 testees) because
they will no longer be needed in the analysis.
iv. Draw a table as shown in table 3.1 in readiness for the tallying of responses
for item analysis.
v. For each of the 10 test items, tabulate the number of testees in the upper and
lower groups who got the answer right or who selected each alternative (for
multiple choice items).
vi. Compute the difficulty of each item (percentage of testees who got the item
right).
vii. Compute the discriminating power of each item (difference between the
number of testees in the upper and lower groups who got the item right).
viii. Evaluate the effectiveness of the distracters in each item (attractiveness of the
incorrect alternatives) for multiple choice test items.
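
Steps i to iii above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the testee identifiers and the scores are hypothetical, and ties at the group boundary are left in whatever order the sort produces.

```python
def contrasting_groups(
    scores: dict[str, int], fraction: float = 0.25
) -> tuple[list[str], list[str]]:
    """Rank testees by total score and return the (upper, lower) groups."""
    # Step i: rank the papers from the highest to the lowest total score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    group_size = round(len(ranked) * fraction)
    if group_size < 1:
        raise ValueError("too few testees for the chosen fraction")
    # Step ii: take the best and poorest papers; step iii: the middle
    # papers are simply not returned.
    return ranked[:group_size], ranked[-group_size:]


# Hypothetical class of 40 testees with made-up total scores.
scores = {f"T{i:02d}": 100 - i for i in range(1, 41)}
upper, lower = contrasting_groups(scores)
print(len(upper), len(lower))  # 10 10
```

Passing `fraction=0.27` instead gives groups of 11 testees for the same class of 40.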

Item  Testees     Alternatives (* = correct option)   Total  P-     D-     Option Distracter Index
No.               A     B     C     D     E     Omit         Value  Value  A     B     C     D     E
1     Upper 25%   0     10*   0     0     0     0     10     0.70   0.60   0.20  *     0.10  0.30  0.00
      Lower 25%   2     4*    1     3     0     0     10
2     Upper 25%   1     1     0     7*    1     0     10
      Lower 25%   1     2     1     4*    1     1     10
3     Upper 25%   3     0     1     2     4*    0     10
      Lower 25%   1     0     1     3     5*    0     10
4     Upper 25%   0     0     10*   0     0     0     10
      Lower 25%   0     0     10*   0     0     0     10
5     Upper 25%   2     3     3     1     1*    0     10
      Lower 25%   3     3     1     2     1*    0     10
...
10    Upper 25%   6*    1     1     1     0     1     10
      Lower 25%   3*    2     2     2     1     0     10

Table 3.1 Format for Tallying Responses for Item Analysis

5.2.1 Computing Item Difficulty

The difficulty index P for each of the items is obtained by using the formula:

Item Difficulty (P) = Number of testees who got the item right (T) / Total number of testees responding to the item (N)

i.e. P = T / N

Thus for item 1 in Table 3.1, 14 of the 20 testees in the two groups (10 in the
upper group and 4 in the lower group) got the item right, so

P = 14 / 20 = 0.70

The item difficulty indicates the proportion of testees in the two groups used for
the analysis who got the item right. Expressed as a percentage, that is
0.70 x 100% = 70%.
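
The calculation of P = T / N can be sketched in Python using the tallies for item 1 of Table 3.1. The function name is hypothetical, and the sketch assumes, as in the worked example, that N is the combined size of the upper and lower groups.

```python
def item_difficulty(upper_right: int, lower_right: int, group_size: int) -> float:
    """P = T / N, with T = upper_right + lower_right and N = 2 * group_size."""
    return (upper_right + lower_right) / (2 * group_size)


# Item 1 of Table 3.1: 10 correct in the upper group, 4 in the lower group.
p = item_difficulty(10, 4, 10)
print(p, f"{p:.0%}")  # 0.7 70%
```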

5.2.2 Computing Item Discriminating Power (D)

Item discriminating power is an index which indicates how well an item is able to
distinguish between the high achievers and low achievers on what the test is
measuring. That is, it refers to the degree to which the item discriminates between
testees with high and low achievement. It is obtained from this formula:

Item Discriminating Power (D) = [Number of high scorers who got the item right (H) - Number of low scorers who got the item right (L)] / Total number in each group (n)

That is,

D = (H - L) / n

Hence for item 1 in Table 3.1, the item discriminating power D is obtained thus:

D = (H - L) / n = (10 - 4) / 10 = 6 / 10 = 0.60

Item discrimination values range from -1.00 to +1.00. The higher the
discriminating index, the better an item is at differentiating between high and low
achievers.

Usually, the item discriminating power is:

• a positive value when a larger proportion of those in the high scoring group get
the item right compared to those in the low scoring group;
• a negative value when more testees in the lower group than in the upper group
get the item right;
• zero when an equal number of testees in both groups get the item right;
and
• 1.00 when all testees in the upper group get the item right and all the testees in
the lower group get the item wrong.
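
As an illustration, D = (H - L) / n can be computed in Python for the items tallied in Table 3.1. The function name and the layout of the `tallies` dictionary are assumptions made for this sketch; the counts themselves are the starred entries of the table.

```python
def discrimination_power(upper_right: int, lower_right: int, group_size: int) -> float:
    """D = (H - L) / n for one item."""
    return (upper_right - lower_right) / group_size


# item number -> (H, L): correct responses in the upper and lower groups,
# read from the starred options in Table 3.1 (n = 10 in each group).
tallies = {1: (10, 4), 2: (7, 4), 3: (4, 5), 4: (10, 10), 5: (1, 1), 10: (6, 3)}

for item, (high, low) in tallies.items():
    print(item, discrimination_power(high, low, 10))
```

Item 3 comes out negative because more testees in the lower group got it right, while items 4 and 5 come out as zero because both groups performed equally on them.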

5.2.3 Evaluating the Effectiveness of Distracters

The distraction power of a distracter is its ability to differentiate between those who
do not know and those who know what the item is measuring. That is, a good
distracter attracts more testees from the lower group than from the upper group. The
distraction power, or effectiveness, of each distracter (incorrect option) for each
item can be obtained using the formula:

Option Distracter Power (Do) = [Number of low scorers who marked the option (L) - Number of high scorers who marked the option (H)] / Total number in each group (n)

That is,

Do = (L - H) / n

For item 1 of Table 3.1, the effectiveness of the distracters is:

Option A: Do = (L - H) / n = (2 - 0) / 10 = 0.20
Option B: the correct option, starred (*)
Option C: Do = (L - H) / n = (1 - 0) / 10 = 0.10
Option D: Do = (L - H) / n = (3 - 0) / 10 = 0.30
Option E: Do = (L - H) / n = (0 - 0) / 10 = 0.00

Incorrect options with positive distraction power are good distracters. One with
negative distraction power must be changed or revised, and those with zero should
be improved because they failed to distract the low achievers.
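
The same computation for every distracter of item 1 in Table 3.1 can be sketched in Python. The function and variable names are hypothetical; the option counts are those tallied in the table, with option B as the key.

```python
def distracter_power(lower_count: int, upper_count: int, group_size: int) -> float:
    """Do = (L - H) / n for one incorrect option."""
    return (lower_count - upper_count) / group_size


# Item 1 of Table 3.1: number of testees in each group choosing each option.
upper = {"A": 0, "B": 10, "C": 0, "D": 0, "E": 0}
lower = {"A": 2, "B": 4, "C": 1, "D": 3, "E": 0}
key = "B"  # the correct option is skipped; it is not a distracter

indices = {
    option: distracter_power(lower[option], upper[option], 10)
    for option in upper
    if option != key
}
print(indices)  # {'A': 0.2, 'C': 0.1, 'D': 0.3, 'E': 0.0}
```

Option E, with an index of zero, attracted nobody from either group and is the one that would need improvement.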

5.3 Item Analysis and Criterion – Referenced Mastery Tests

The item analysis procedures we used earlier for norm – referenced tests are not
directly applicable to criterion – referenced mastery tests. In this case, indexes of
item difficulty and item discriminating power are less meaningful because criterion
referenced tests are designed to describe learners in terms of the types of learning
tasks they can perform unlike in the norm-referenced test where reliable ranking of
testees is desired.

5.3.1 Item Difficulty

In criterion-referenced mastery tests, the desired level of difficulty of
each test item is determined by the learning outcome it is designed to measure and
not, as stated earlier for norm-referenced tests, by the item's ability to discriminate
between high and low achievers. The standard formula for determining item difficulty
can still be applied here, but the results are not usually used to select test items or
to manipulate item difficulty. Rather, the results are used for diagnostic purposes.
Also, most items will have a large difficulty index when the instruction is effective,
with a large percentage of the testees passing the test.

5.3.2 Item Discriminating Power

As you know, the ability of test items to discriminate between high and low
achievers is not crucial to evaluating the effectiveness of criterion-referenced tests.
This is because some of the best items might have low or zero indexes of
discrimination. This usually occurs when all testees answer a test item correctly at
the end of the teaching/learning process, implying that both the teaching/learning
process and the item are effective. Moreover, such items provide useful information
concerning the mastery of items by the testees, unlike in the norm-referenced test
where they would be eliminated for failing to discriminate between the high and the
low achievers. Therefore, the traditional indexes of discriminating power are of little
value for judging the quality of the test items, since the purpose and emphasis of a
criterion-referenced test is to describe what learners can do rather than to
discriminate among them.

5.3.3 Analysis of Criterion - Referenced Mastery Items

Ideally, a criterion-referenced mastery test is analyzed to determine the extent to
which the test items measure the effects of the instruction. In order to provide such
evidence, the same test items are given before instruction (pretest) and after
instruction (posttest), and the results of the two administrations are
compared. The analysis is done by the use of an item response chart. The item response
chart is prepared by listing the item numbers across the top of the chart and the
testees' names or identification numbers down the side, and then recording
correct (+) and incorrect (-) responses for each testee on the pretest (B) and the
posttest (A). This is illustrated in Table 3.2 for an arbitrary 10 testees.

Table 3.2: An item-response chart showing correct (+) and incorrect (-) responses
for the pretest and posttest given before (B) and after (A) instruction (teaching/
learning process) respectively.

Item                Testee Identification Number              Remark
                    001   002   003   004   005   ...   010
1     Pretest (B)   -     -     -     -     -     ...   -     Ideal
      Posttest (A)  +     +     +     +     +     ...   +
2     Pretest (B)   +     +     +     +     +     ...   +     Too easy
      Posttest (A)  +     +     +     +     +     ...   +
3     Pretest (B)   -     -     -     -     -     ...   -     Too difficult
      Posttest (A)  -     -     -     -     -     ...   -
4     Pretest (B)   +     +     +     +     +     ...   +     Defective
      Posttest (A)  -     -     -     -     -     ...   -
5     Pretest (B)   -     +     -     -     +     ...   -     Effective
      Posttest (A)  +     +     +     +     +     ...   -

An index of item effectiveness for each item is obtained by using the formula for a
measure of Sensitivity to Instructional Effects (S), given by

S = (RA - RB) / T

where
RA = number of testees who got the item right after the teaching/learning process,
RB = number of testees who got the item right before the teaching/learning process, and
T = total number of testees who tried the item both times.

For example, for item 1 of Table 3.2, the index of sensitivity to instructional
effects (S) is

S = (RA - RB) / T = (10 - 0) / 10 = 1.00

Usually, for a criterion-referenced mastery test, with respect to the index of
sensitivity to instructional effects:
• an ideal item yields a value of 1.00;
• effective items fall between 0.00 and 1.00, and the higher the positive value, the
more sensitive the item is to instructional effects; and
• items with zero or negative values do not reflect the intended effects of
instruction.
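
A short Python sketch of S = (RA - RB) / T follows. It assumes that each row of the item response chart is held as a string of "+" and "-" marks, one per testee, which is a representation chosen only for this illustration. The patterns used are the fully specified ones from Table 3.2 (ideal, too easy, too difficult and defective) for 10 testees.

```python
def sensitivity_index(pretest: str, posttest: str) -> float:
    """S = (RA - RB) / T for one item; '+' is correct, '-' is incorrect."""
    if not pretest or len(pretest) != len(posttest):
        raise ValueError("pretest and posttest must cover the same testees")
    right_after = posttest.count("+")   # RA
    right_before = pretest.count("+")   # RB
    return (right_after - right_before) / len(pretest)


print(sensitivity_index("-" * 10, "+" * 10))  # ideal item: 1.0
print(sensitivity_index("+" * 10, "+" * 10))  # too easy: 0.0
print(sensitivity_index("-" * 10, "-" * 10))  # too difficult: 0.0
print(sensitivity_index("+" * 10, "-" * 10))  # defective: -1.0
```

Both the too-easy and the too-difficult item give S = 0, which is why the item response chart itself is still needed to tell them apart.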

5.3.4 Effectiveness of Distracters

In a criterion-referenced test, it is important to note how well each alternative
functions in a multiple-choice item. Ideally, testees should choose one of the
incorrect alternatives if they have not achieved the objective that the test item
measures. This is checked by noting the frequency with which those failing an item
select each distracter. This type of analysis is best done on the pretest, in which a
relatively large proportion of pupils can be expected to fail the items. Items
containing distracters that are never or rarely selected need to be revised.

5.4 Building a Test Item File (Item Bank)

This entails a gradual collection and compilation of items that have been
administered, analyzed and selected based on their effectiveness and the
psychometric characteristics identified through item analysis over time. This file of
effective items can be built and maintained easily by recording each item on an item
card, together with its item analysis information and an indication of both the
objective and the content area the item measures. The file can then be maintained by
both content and objective categories. This makes it possible to
select items in accordance with any table of specifications in the particular area
covered by the file. Building an item file is a gradual process that progresses over
time. At first it seems to be additional work without immediate usefulness. But
with time its usefulness becomes obvious, when it becomes possible to start using
some of the items in the file and supplementing them with newly constructed
ones. As the file grows into an item bank, most of the items can be selected from
the bank without frequent repetition. Some of the advantages of an item bank are that:

• Parallel tests can be generated from the bank, which allows learners who
were ill or unavoidably absent for some other reason to
take the test later;
• It is cost effective, since new questions do not have to be generated at the
same rate from year to year;
• The quality of items gradually improves as existing
ones are modified over time; and
• The burden of test preparation is considerably lightened when enough high
quality items have been assembled in the item bank.
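
One possible way to picture an item card and the selection of items by content area and objective is the Python sketch below. Everything in it is hypothetical: the field names, the sample items and the rule of preferring items with higher discriminating power are assumptions made for illustration, not a prescribed format for an item bank.

```python
from dataclasses import dataclass


@dataclass
class ItemCard:
    """One item card: the item plus its item analysis information."""
    stem: str
    content_area: str
    objective: str
    difficulty: float      # P-value from item analysis
    discrimination: float  # D-value from item analysis


def select_items(bank: list[ItemCard], content_area: str,
                 objective: str, count: int) -> list[ItemCard]:
    """Pick up to `count` items for one cell of a table of specifications."""
    matches = [card for card in bank
               if card.content_area == content_area and card.objective == objective]
    # Assumed rule: prefer the items that discriminated best in the past.
    matches.sort(key=lambda card: card.discrimination, reverse=True)
    return matches[:count]


# Hypothetical bank with made-up items and statistics.
bank = [
    ItemCard("Define item difficulty.", "Item analysis", "Knowledge", 0.70, 0.60),
    ItemCard("Compute D for a given item.", "Item analysis", "Application", 0.55, 0.40),
    ItemCard("State the formula for P.", "Item analysis", "Knowledge", 0.85, 0.20),
]
chosen = select_items(bank, "Item analysis", "Knowledge", 1)
print(chosen[0].stem)  # Define item difficulty.
```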

6.0 ACTIVITY

1. What is item analysis?
2. What are the three indices involved in item analysis?
3. Explain the three indices stated in question 2 above.

7.0 SUMMARY

• Item analysis is the process of “testing the item” to ascertain specifically
whether the item is functioning properly in measuring what the entire test is
measuring.
• The method for analyzing the effectiveness of test items differs for
norm-referenced and criterion-referenced test items because they serve different
functions.
• Item difficulty: The item difficulty (P) indicates the percentage of testees
who get the item right.
• Item discriminating power (D): This is an index which indicates how well an
item is able to distinguish between the high achievers and low achievers
on what the test is measuring.
• Effectiveness of distracters: The distraction power of a distracter is its
ability to differentiate between those who do not know and those who know
what the item is measuring.
• In the criterion-referenced mastery test, the desired level of difficulty
of each test item is determined by the learning outcome it is designed to
measure and not by the item's ability to discriminate between high and low
achievers.
• A criterion-referenced mastery test is analyzed to determine the extent to
which the test items measure the effects of the instruction. In doing this the
same test is given before (pretest) and after instruction (posttest) and the
results are compared using the item response chart.

8.0 ASSIGNMENT

1. What are the basic questions required in judging the quality of a test?
2. How can norm-referenced and criterion-referenced tests be interpreted?
3. Explain the following in relation to item analysis:
a) Floor effect b) Ceiling effect

9.0 REFERENCES

Gronlund, N. E. (1985). Measurement and Evaluation in Teaching. New York:
Macmillan Publishing Company.

Nenty, H. J. (1985). Fundamentals of Measurement and Evaluation in Education.
Calabar: University of Calabar.

Omoroginwa, O. K. (2010). An Introduction to Educational Measurement and
Evaluation. Benin City: Perfect Touch.

UNIT 2 INTERPRETING CLASSROOM TEST SCORES

CONTENTS
1.0 Introduction
2.0 Objectives
3.0 How to Study this Unit
4.0 Word Study
5.0 Main Content
5.1 Methods of Interpreting Test Scores
5.1.1 Criterion – Referenced Interpretation
5.1.2 Norm – Referenced Interpretation
5.2 Norms – Most Common Types of Test Norm
5.2.1 Grade Norms
5.2.2 Age Norms
5.2.3 Percentile Norms
5.3 Standard Score Norms
5.3.1 The Normal Curve and the Standard Deviation Units
5.3.2 The Z – Scores
5.3.3 The T – Scores
5.3.4 Stanine Norms
5.3.5 Assigning Stanine to Raw Scores
5.4 Comparison of the Score System
6.0 Activity
7.0 Summary
8.0 Assignment
9.0 References

1.0 INTRODUCTION
In the previous unit, you learned how to judge the quality of a classroom test. In
this unit, you will learn how to interpret test scores. You will start with the two
methods of interpreting test scores: criterion-referenced interpretation and
norm-referenced interpretation. You will then learn about norms and the most
common types of test norms, namely grade norms, age norms and percentile norms.
In addition, you will learn about standard scores, which draw on the standard
deviation and the normal curve: the Z-score, the T-score and the stanine. Finally,
you will learn how to assign stanines to raw scores and how to compare the
different score systems.
2.0 OBJECTIVES
By the end of this unit, you will be able to:

• interpret classroom test scores by criterion-referenced or norm-referenced
interpretation;
• use the common types of test norms such as grade norms, age norms and
percentile norms in interpreting classroom test scores;
• convert raw scores to Z-scores, T-scores and stanine scores;
• convert from one standard score to another; and
• use standard scores in interpreting test scores.

3.0 HOW TO STUDY THIS UNIT


1. Read through this unit carefully.
2. Study the unit step by step as the points are well arranged.
NOTE: All answers to activities and assignment are at the end of this book.
This applies to every other unit in this book.
4.0 WORD STUDY
• Testee: A person who is tested, as by a scholastic examination.
• True zero point: If a scale of measurement has a true zero point, then a
value of 0 means a complete absence of the property being measured.
• Psychological decisions: The thought process of selecting a logical choice
from the available options.
• Missing properties: The qualities that raw test scores lack, here a true zero
point and equal units.
• Specific tasks: Pieces of work, usually assigned, that are often to be finished
within a certain time frame.
• Percentage-correct score: The percentage of the items in a clearly defined set
of tasks that a testee answers correctly.
• Standardized: Evaluated by comparison with a standard, such as the
performance of a representative group of testees.

5.0 MAIN CONTENT


5.1 Methods of Interpreting Test Scores
Test interpretation is the process of assigning meaning and usefulness to the scores
obtained from a classroom test. This is necessary because a raw score standing on
its own rarely has meaning. For instance, a score of 50% in one mathematics test
cannot be said to be better than a score of 40% obtained by the same testee in
another mathematics test. Test scores on their own lack a true zero point and equal
units. Moreover, they are not based on the same standard of measurement, and so
meaning cannot be read into them as a basis for academic and psychological
decisions. To compensate for these missing properties and to make test scores more
readily interpretable, various methods of expressing test scores have been devised
to give meaning to a raw score. Generally, a score is given meaning either by
converting it into a description of the specific tasks that the learner can perform or
by converting it into some type of derived score that indicates the learner's relative
position in a clearly defined reference group. The former method is referred to as
criterion-referenced interpretation, while the latter is referred to as norm-referenced
interpretation.

5.1.1 Criterion – Referenced Interpretation

Criterion-referenced interpretation is the interpretation of a raw test score based on
the conversion of the raw score into a description of the specific tasks that the
learner can perform. That is, a score is given meaning by comparing it with a
standard of performance that is set before the test is given. It permits the description
of a learner's test performance without referring to the performance of others. This
is essentially done in terms of some universally understood measure of proficiency
like speed, precision or the percentage-correct score in some clearly defined domain
of learning tasks. Examples of criterion-referenced interpretation are:
• Types 60 words per minute without error (speed).
• Measures the room temperature within ±0.1 degree of accuracy (precision).
• Defines 75% of the elementary concepts of electricity items correctly
(percentage-correct score).
Such interpretation is appropriate for tests that are focused on a single objective and
for which standards of performance can be either empirically or logically derived.
The percentage-correct score is widely used in criterion-referenced interpretation.
This type of interpretation is primarily useful in mastery testing, where a clearly
defined and delimited domain of learning tasks can be most readily obtained.
For criterion-referenced interpretation to be meaningful, the test has to be
specifically designed to measure a set of clearly stated learning tasks. Therefore, to
describe test performance in terms of a learner's mastery or non-mastery of the
predefined, delimited and clearly specified tasks, enough items are used for each
interpretation to permit dependable and informed decisions about the types of
tasks a learner can perform.
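
The percentage-correct score and a mastery decision can be sketched in a few lines of Python. This is only an illustration: the function names and the 75% cutoff are assumptions made for the example, not values prescribed by this unit. In practice the standard is whatever was set before the test was given.

```python
def percentage_correct(num_correct, num_items):
    """Return the percentage of items answered correctly."""
    if num_items <= 0:
        raise ValueError("num_items must be positive")
    return 100.0 * num_correct / num_items


def has_mastered(num_correct, num_items, cutoff_percent=75.0):
    """Compare the percentage-correct score with a preset standard.

    cutoff_percent is a hypothetical standard chosen for illustration.
    """
    return percentage_correct(num_correct, num_items) >= cutoff_percent


print(percentage_correct(15, 20))  # 75.0
print(has_mastered(15, 20))        # True
print(has_mastered(12, 20))        # False
```

Note that the decision for one learner does not depend on how any other learner performed, which is the defining feature of criterion-referenced interpretation.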

5.1.2 Norm-Referenced Interpretation


Norm-referenced interpretation is the interpretation of a raw score based on the
conversion of the raw score into some type of derived score that indicates the
learner's relative position in a clearly defined reference group. This type of
interpretation reveals how a learner compares with other learners who have taken
the same test.
In the classroom, norm-referenced interpretation is usually carried out by ranking
the testees' raw scores from highest to lowest. A score is then interpreted by noting
the position of an individual's score relative to the scores of the other testees in the
classroom test. An interpretation such as "third position from the top" or "about
average position in the class" provides a meaningful report on which the teacher
and the testees can base decisions. In this type of test score interpretation, what is
important is a sufficient spread of test scores to provide reliable ranking. The
percentage score, or how easy or difficult the test is, is not necessarily important
when test scores are interpreted in terms of relative performance.
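
As a minimal sketch of the ranking step described above, the following Python function assigns class positions from raw scores. The testee names and scores are invented for illustration, and giving tied scores the same position is an assumption of this sketch.

```python
def rank_scores(scores):
    """Map each testee to a class position, where 1 is the highest raw score.

    Testees with the same raw score share the same position.
    """
    ordered = sorted(scores.values(), reverse=True)
    return {name: ordered.index(score) + 1 for name, score in scores.items()}


# Hypothetical class results
class_scores = {"Ada": 62, "Bola": 75, "Chidi": 62, "Dayo": 48, "Efe": 81}
print(rank_scores(class_scores))
# {'Ada': 3, 'Bola': 2, 'Chidi': 3, 'Dayo': 5, 'Efe': 1}
```

The positions say nothing about what each testee can actually do; they only locate each score relative to the others.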


5.2 Norms – Most Common Types of Test Norm

Norms are reference frames on which the interpretation of test scores is based. They
represent the typical performance of the testees in the reference group on which the
test raw scores were standardized, that is, the representative sample of the testees
for whom the test was designed and to whom it was administered. The resulting
test norm merely represents the typical performance of the participants and as such
is not to be seen as a desired goal or standard. Nevertheless, comparing test scores
with these reference frames makes it possible to predict a learner's probable success
in various areas, to diagnose strengths and weaknesses, to measure educational
growth and to use the test results for other instructional and guidance purposes.
These functions of test scores derived from norm-referenced interpretation would
be limited without test norms.
The most common types of these reference frames, referred to as norms, are grade
norms, age norms, percentile norms and standard score norms. These are informed
by the general ways in which we may relate a person's score to a more general
framework. One of these ways is to compare the individual learner with a graded
series of groups and see which one he matches. Usually each group in this series
represents a particular school grade or a particular chronological age. The other
way is to determine where the individual learner falls in terms of the percentage of
the group he surpasses or in terms of the group's mean and standard deviation.
These two ways form the main patterns for interpreting the score of an individual.
A summary of the most common types of test norms is presented in Table 3.1.

Table 3.1: Most Common Types of Test Norms for Educational and Psychological
Tests.

Type of Test Norm | Name of Derived Score | Type of Comparison | Type of Group | Meaning in Terms of Test Performance
Grade Norms | Grade equivalents | Individual matched to group whose performance he equals | Successive grade groups | Grade group in which the testee's raw score is average
Age Norms | Age equivalents | Individual matched to group whose performance he equals | Successive age groups | Age group in which the testee's raw score is average
Percentile Norms | Percentile ranks (or percentile scores) | Percentage of group surpassed by individual | Single age or grade group to which individual belongs | Percentage of testees in the reference group who fall below the testee's raw score
Standard Score Norms | Standard scores | Number of standard deviations individual falls above or below average of group | Single age or grade group to which individual belongs | Distance of testee's raw score above or below the mean of the reference group in terms of standard deviation units


5.2.1 Grade Norms


Grade norms are a reference framework for interpreting the academic achievement
of learners in elementary schools. They represent the typical (average)
performance of specific groups. Grades are like classes in our school system.
Grade norms are prepared for traits that show a progressive and relatively uniform
increase from one school grade (class) to the next higher grade. The norm for any
grade is then the average score obtained by individuals in that grade. The process of
establishing grade norms involves giving the test to a representative sample of
pupils in each of a number of consecutive grades to determine the mean
performance of individuals in their respective grades (classes) of the school
system. This is achieved by limiting the content and objectives of the test to the
content and objectives relevant to the class. The mean score obtained becomes the
typical score of the grade (class) against which the performance of the members of
the grade can be compared, and scores are interpreted in terms of grade equivalents.

Grade equivalents are expressed as two numbers: the first indicates the year (grade)
and the second the month. Because the school year is treated as ten months, the
second number ranges from 0 to 9. For example, if a test is given in October to
grade (class) 2 pupils in a school whose year runs from September to June, the
average (mean) of the raw scores obtained by the representative sample of this
grade will be assigned a grade equivalent of 2.1. This is because the class is 2 and
the scores are obtained in October, the month after September (2.0) in the
ten-month school year. The grade equivalents for this grade range from 2.0
(September) to 2.9 (June).

If the same test is given to grades (classes) 3, 4, 5 and 6 in the same period, the
grade equivalents will be 3.1, 4.1, 5.1 and 6.1 respectively. Assume that the
average scores of grades 2 to 6 are 20, 23, 26, 28 and 30 respectively. These mean
scores then become the typical scores of the grades (classes) against which the
performance of members of the grades (classes) can be compared and interpreted.
The table of grade equivalents in this case will look like this:

Grade     Typical Score     Grade Equivalent
2         20                2.1
3         23                3.1
4         26                4.1
5         28                5.1
6         30                6.1

The grade equivalents and the typical scores indicate the average performance of
pupils at the various grade levels. The in-between scores are obtained by
interpolation. If the test is given to learners in the same grade at any other time,
their scores are compared with the table, and the grade equivalent read from the
table is the basis on which the result is interpreted.
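
Finding an in-between score can be sketched as a straight-line estimate between two neighbouring entries of a norm table. The sketch below is an illustration only: the function name is an assumption, and the table is a hypothetical one in which the test was given in the first month of the school year, so that each typical score maps to a grade equivalent ending in .0.

```python
def interpolate_grade_equivalent(raw_score, norm_table):
    """Estimate a grade equivalent by linear interpolation.

    norm_table is a list of (typical_score, grade_equivalent) pairs.
    """
    rows = sorted(norm_table)
    for (low_score, low_ge), (high_score, high_ge) in zip(rows, rows[1:]):
        if low_score <= raw_score <= high_score:
            fraction = (raw_score - low_score) / (high_score - low_score)
            return round(low_ge + fraction * (high_ge - low_ge), 1)
    raise ValueError("raw score lies outside the range covered by the table")


# Hypothetical table: test given in the first month of the school year
norms = [(20, 2.0), (23, 3.0), (26, 4.0), (28, 5.0), (30, 6.0)]
print(interpolate_grade_equivalent(24.5, norms))  # 3.5
print(interpolate_grade_equivalent(29, norms))    # 5.5
```

A raw score of 24.5 lies halfway between the typical scores for grades 3 and 4, so the estimate is halfway between their grade equivalents.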

Note, however, that for any grade equivalent about 50% of the pupils in the sample
score above the typical score and about 50% score below it, since the typical score
is an average of their performances. In addition, grade norms have these
limitations:

• The units are not equal on different parts of the scale or from one test to
another.
• The norm depends on
- the ability of the pupils used in the preparation;
- the extent to which the learning outcomes measured by the test reflect the
curriculum emphasis (content and objectives); and
- the educational facilities at the users' disposal.

Because of these limitations, grade norms are not to be considered a standard of
excellence to be achieved by others. However, they are widely used at the
elementary school level, largely because of the apparent ease with which they can
be interpreted.

5.2.2 Age Norms

Age norms, like grade norms, are based on the average scores earned by pupils at
different ages and are interpreted in terms of age equivalents. A test may be
prepared with a certain age or range of ages of testees in mind. The performance
typical for specific ages in the test is determined, and the performance of an
individual of that age group is compared against the typical performance. Here
also, the typical performance is determined by giving the test to a very large sample
of the population for whom the test is meant and then, after scoring, finding the
mean score of the sample. This mean score then becomes the typical performance
for the age. The performance of a pupil of this age in the test can then be interpreted
as higher than, lower than or the same as the typical performance for the age. A
table of age norms presents parallel columns of typical scores and their
corresponding age equivalents.
Age norms have essentially the same characteristics and limitations as grade
norms. The major differences between them are that:

• test performance is expressed in terms of age level rather than grade level;
and
• age equivalents divide the calendar year into twelve parts rather than ten.

For example, the age equivalents for 12-year-olds range from 12.0 to 12.11,
whereas the grade equivalents for grade 12 range from 12.0 to 12.9.

Like grade norms, age norms present test performance in units that are
characteristically unequal. Although apparently easy to understand, they are as
such subject to misinterpretation, especially at the high school level. Age norms are
mostly used in elementary schools for mental ability tests, personality tests, reading
tests and interest inventories, where growth patterns tend to be consistent.


5.2.3 Percentile Norms

Percentile norms are test norms that deal with percentile ranks or scores. They
compare an individual with the single age or grade group to which the individual
belongs, in terms of the percentage of that group the individual surpasses.
Percentile norms are very widely adaptable and applicable and can be used
whenever an appropriate normative group can be obtained to serve as a yardstick.
They are appropriate for young learners in educational situations.

A percentile rank (or percentile score) describes a learner's performance in terms of
the percentage of learners in some clearly defined group that earn a lower score.
This might be a grade or age group or any other group that provides a meaningful
comparison. That is, a percentile rank indicates a learner's relative position in a
group in terms of the percentage of learners scoring lower. For instance, if a table of
norms shows that a raw score of 29 equals a percentile rank of 70, then 70 percent
of the learners in the reference group obtained a raw score lower than 29. In other
words, this learner's performance surpasses that of 70 percent of the group. To
surpass 90 percent of a reference comparison group signifies a comparable degree
of excellence no matter the function being measured.
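
A percentile rank, as defined above, can be computed directly as the percentage of the reference group scoring lower. The sketch below uses an invented reference group of ten scores, arranged so that a raw score of 29 has a percentile rank of 70; the function name is an assumption.

```python
def percentile_rank(raw_score, group_scores):
    """Percentage of scores in the reference group lower than raw_score."""
    if not group_scores:
        raise ValueError("reference group must not be empty")
    lower = sum(1 for score in group_scores if score < raw_score)
    return 100.0 * lower / len(group_scores)


# Hypothetical reference group of ten learners
group = [12, 15, 18, 20, 22, 25, 27, 29, 31, 34]
print(percentile_rank(29, group))  # 70.0
```

Seven of the ten learners scored below 29, so the learner with 29 surpasses 70 percent of the group.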

The limitation of percentile ranks is that the units are typically and systematically
unequal. This means that equal percentile differences do not in general represent
equal differences in amount. Therefore, any interpretation of percentile ranks must
take into account the fact that such a scale has been pulled out at both ends and
squeezed in the middle. In addition, percentile ranks must be interpreted in terms of
the norm group on which they are based. The limitation of unequal units can be
offset by careful interpretation, since the inequality of units follows a predictable
pattern.

5.3 Standard Score

A standard score is a method of indicating a testee's relative position in a group by
showing how far the raw score is above or below average. Standard scores express
test performance in terms of standard deviation units from the mean. The mean (M),
as you know, is the arithmetic average obtained by adding all the scores and
dividing by the number of scores, while the standard deviation (SD) is a measure of
the spread of scores in a group. The meaning of the standard deviation and the
standard scores based on it is explained in terms of the normal curve.

5.3.1 The Normal Curve and the Standard Deviation Unit

The normal curve is a symmetrical bell-shaped curve that has many useful
mathematical properties. When it is divided into standard deviation units, each
portion under the curve contains a fixed percentage of the cases under consideration.
This division is very useful and is utilized in test interpretation. Approximately 34
percent of the cases under consideration fall between the mean and +1SD, 14
percent between +1SD and +2SD and 2 percent between +2SD and +3SD. The same
proportions apply to the standard deviation intervals below the mean. Only about
0.13 percent of the cases fall below -3SD and about 0.13 percent above +3SD, and
these are usually neglected in practice. That is, for all practical purposes, a normal
distribution of scores falls between -3 and +3 standard deviations from the mean.
This is illustrated in Figure 1 below.

Interval             Approximate percentage of cases
Below -3SD           0.13%
-3SD to -2SD         2%
-2SD to -1SD         14%
-1SD to mean         34%
Mean to +1SD         34%
+1SD to +2SD         14%
+2SD to +3SD         2%
Above +3SD           0.13%

Figure 1: Normal curve indicating the approximate percentage of cases falling
within each standard deviation interval.

The standard deviation enables the conversion of raw scores to a common scale that
has equal units and that can be interpreted in terms of the normal curve.

The following characteristics of the normal curve make it useful in test
interpretation:

• It provides a handy benchmark for interpreting scores and the standard error
of measurement, as both are based on standard deviation units.

• The fixed percentage in each interval makes it possible to convert standard
deviation units to percentile ranks. For instance, -2SD equals a percentile
rank of 2. This means that 2 percent of the cases under consideration fall
below that point. Likewise, each point on the base line of the curve can be
equated to a percentile rank, starting from the left of the figure as:

-2SD = 2%
-1SD = 16% (2+14)
0 (M) = 50% (16+34)
+1SD = 84% (50 +34)
+2SD = 98% (84+14)

The relationship for a normal curve between standard deviation units and percentile
ranks makes it possible to interpret standard scores in simple and familiar terms.
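
The conversion from standard deviation units to percentile ranks can be checked with Python's standard library, which provides the normal curve as `statistics.NormalDist`. This is a sketch for verification only; the function name is an assumption, and the results are rounded to whole numbers in the same way as the approximate percentages used in this unit.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def sd_units_to_percentile_rank(sd_units):
    """Percentage of cases in a normal distribution falling below a point."""
    return round(100 * standard_normal.cdf(sd_units))


for sd_units in (-2, -1, 0, 1, 2):
    print(sd_units, sd_units_to_percentile_rank(sd_units))
# -2 2
# -1 16
# 0 50
# 1 84
# 2 98
```

The exact areas differ slightly from the rounded figures (for example, about 2.3 percent of cases fall below -2SD), which is why the percentages in the figure are described as approximate.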

5.3.2 The Z – Scores

The Z-score is the simplest standard score. It expresses test performance simply
and directly as the number of standard deviation units a raw score is above or below
the mean. The Z-score is computed by using the formula:

Z-score = (X - M) / SD

where
X = any raw score
M = arithmetic mean of the raw scores
SD = standard deviation of the raw scores

When the raw score is smaller than the mean, the Z-score is a negative (-) value,
which can cause serious errors in test interpretation if the sign is overlooked. Hence
Z-scores are usually transformed into a standard score system that utilizes only
positive values.
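
The Z-score formula can be sketched in Python as follows. The set of raw scores is invented for illustration, and the use of the population standard deviation (`statistics.pstdev`) is an assumption of this sketch, since the unit does not say which form of the standard deviation is intended.

```python
from statistics import mean, pstdev


def z_score(raw_score, group_mean, group_sd):
    """Number of standard deviation units a raw score lies from the mean."""
    if group_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw_score - group_mean) / group_sd


# Hypothetical raw scores with mean 5 and standard deviation 2
scores = [2, 4, 4, 4, 5, 5, 7, 9]
m = mean(scores)
sd = pstdev(scores)

print(z_score(9, m, sd))  # 2.0
print(z_score(2, m, sd))  # -1.5
```

The second result is negative because the raw score of 2 lies below the mean of 5.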

5.3.3 The T - Score

This refers to any set of normally distributed standard scores that has a mean score
of 50 and a standard deviation of 10. The T – score is obtained by multiplying the Z-
score by 10 and adding the product to 50. That is, T – Score = 50 + 10(z).

Example

A test has a mean score of 40 and a standard deviation of 4. What are the T – scores
of two testees who obtained raw scores of 30 and 45 respectively in the test?

Solution

The first step in finding the T-scores is to obtain the Z-scores for the testees. The
Z-scores are then converted to T-scores.

i. For the testee with a raw score of 30, the Z-score is:
Z-score = (X - M) / SD, where the symbols retain their usual meanings.
X = 30, M = 40, SD = 4. Substituting gives
Z-score = (30 - 40) / 4 = -10 / 4 = -2.5
The T-score is then obtained by converting the Z-score (-2.5) thus:
T-score = 50 + 10(z)
= 50 + 10(-2.5)
= 50 - 25
= 25

ii. For the testee with a raw score of 45, the Z-score is:
Z-score = (X - M) / SD, where the symbols retain their usual meanings.
X = 45, M = 40, SD = 4. Substituting gives
Z-score = (45 - 40) / 4 = 5 / 4 = 1.25
The T-score conversion is:
T-score = 50 + 10(z)
= 50 + 10(1.25)
= 50 + 12.5
= 62.5

The resulting T-scores are easily interpreted since T-scores always have a mean of
50 and a standard deviation of 10. A T-score of 25, for instance, lies 2.5 standard
deviations below the mean.
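
The two formulas can be applied together in a short Python sketch to the raw scores 30 and 45 from the example, with M = 40 and SD = 4. The function names are assumptions made for illustration.

```python
def z_score(raw_score, mean, sd):
    """Z-score = (X - M) / SD."""
    return (raw_score - mean) / sd


def t_score(z):
    """T-score = 50 + 10(z)."""
    return 50 + 10 * z


for raw in (30, 45):
    z = z_score(raw, mean=40, sd=4)
    print(raw, z, t_score(z))
# 30 -2.5 25.0
# 45 1.25 62.5
```

A raw score above the mean always gives a positive Z-score and therefore a T-score above 50.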

5.3.4 Stanine Norms

The stanine is a single-digit standard score. Its use assumes that most psychological
characteristics are normally distributed among human beings. The system is so
named because the distribution of raw scores is divided into nine parts (standard
nine). The stanine scale divides a population according to fixed proportions into
nine parts numbered 1 to 9, and it has a mean of 5 and a standard deviation of 2.
Each stanine corresponds to a score or a range of scores. Each individual's score
falls within a stanine and can be described by reference to the stanine within which
it falls. Stanines are widely used for local norms because of the ease with which
they can be computed and interpreted. The strengths of stanine norms include:

i. The use of a nine-point scale in which 9 is high, 1 is low and 5 is average, or
in which 1 is high, 9 is low and 5 is average. The latter usage is employed in
the Ordinary Level School Certificate Examination results. This is
illustrated below:

Stanine Letter Grade Remark


1 A1 Excellent
2 A2 Very Good
3 A3 Good
4 C4 Credit
5 C5 Credit
6 C6 Credit
7 P7 Pass
8 P8 Pass
9 F9 Fail

ii. Stanines are normalized standard scores. It is therefore possible to compare
an individual's performance on different tests, especially when the tests are
based on a common group. An example is the comparison of West African
Senior School Certificate results with the National Examination Council
Senior School Certificate results for a group of examinees. A difference of 2
stanines represents a significant difference in performance between tests.
iii. The system makes it possible to readily combine diverse types of data.
iv. The single-digit system makes scores easy to record.
v. Stanines also take up less space than other scores.


The limitation of stanines is that they play down small differences in scores and
express performance in broad categories, so that attention tends to be focused on
differences that are big enough to matter.

5.3.5 Assigning Stanines to Raw Scores

To assign stanines to raw scores, the scores are ranked from high to low, and a
frequency distribution table with a cumulative frequency column is used in
constructing the stanine table. The following guide is then employed to assign
stanines to the raw scores.

Guide for Assigning Stanines to Raw Scores

• The top 4% of the raw scores are assigned a stanine score of 9.
• The next 7% of the raw scores are assigned a stanine score of 8.
• The next 12% of the raw scores are assigned a stanine score of 7.
• The next 17% of the raw scores are assigned a stanine score of 6.
• The next 20% of the raw scores are assigned a stanine score of 5.
• The next 17% of the raw scores are assigned a stanine score of 4.
• The next 12% of the raw scores are assigned a stanine score of 3.
• The next 7% of the raw scores are assigned a stanine score of 2.
• The bottom 4% of the raw scores are assigned a stanine score of 1.

The number of examinees who should receive each stanine score is determined by
multiplying the total number of cases by the percentage for that stanine level and
rounding off the result. Usually, the distribution of test scores contains a number of
examinees with the same raw score. Consequently, there are ties in rank that
prevent a perfect match with the theoretical distribution, because all examinees
with the same raw score must be assigned the same stanine score. Hence, the actual
grouping is approximated as closely as possible to the theoretical grouping. The
relationship of stanine units to percentiles is illustrated below.
Stanine     Percentile Range
9           96 - 99
8           89 - 95
7           77 - 88
6           60 - 76
5           41 - 59
4           24 - 40
3           12 - 23
2           5 - 11
1           1 - 4
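
The guide and the percentile table above can be expressed as a short Python sketch. The function and constant names are assumptions made for illustration. The first function gives the theoretical number of examinees in each stanine for a group of a given size, and the second looks up the stanine for a percentile rank using the upper limits from the table.

```python
from bisect import bisect_left

# Percentage of cases in stanines 1 to 9, taken from the guide above
STANINE_PERCENTS = [4, 7, 12, 17, 20, 17, 12, 7, 4]
# Highest percentile rank in stanines 1 to 8, taken from the table above
UPPER_PERCENTILES = [4, 11, 23, 40, 59, 76, 88, 95]


def expected_counts(num_examinees):
    """Theoretical number of examinees in each stanine, keyed 1 to 9."""
    return {
        stanine: round(num_examinees * percent / 100)
        for stanine, percent in enumerate(STANINE_PERCENTS, start=1)
    }


def stanine_from_percentile(percentile_rank):
    """Look up the stanine whose percentile range contains percentile_rank."""
    return bisect_left(UPPER_PERCENTILES, percentile_rank) + 1


print(expected_counts(200))
# {1: 8, 2: 14, 3: 24, 4: 34, 5: 40, 6: 34, 7: 24, 8: 14, 9: 8}
print([stanine_from_percentile(p) for p in (3, 11, 50, 77, 97)])
# [1, 2, 5, 7, 9]
```

Because examinees with the same raw score have the same percentile rank, the lookup automatically gives tied examinees the same stanine. The actual counts in each stanine will then only approximate the theoretical counts.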

5.4 Comparison of the Score Systems

A normal distribution of scores makes it possible to convert back and forth between
standard scores and percentiles, thereby utilizing the special advantages of each.
Standard scores can be used to draw on the benefits of equal units, and it is also
possible to convert to percentile equivalents when interpreting test performance to
the testees, their parents and others who may need the information from the test
results. Figure 2 illustrates the equivalence of scores in various standard score
systems and their relation to percentiles and the normal curve.

Standard deviations     -3      -2      -1      0       +1      +2      +3
Z-scores                -3.0    -2.0    -1.0    0       +1.0    +2.0    +3.0
Percentiles             0.1     2       16      50      84      98      99.9
T-scores                20      30      40      50      60      70      80

Stanines                1     2     3     4     5     6     7     8     9
Percent in stanines     4%    7%    12%   17%   20%   17%   12%   7%    4%

Figure 2: A normal distribution curve showing corresponding standard scores and
percentiles.
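
Converting from one score system to another can be sketched in Python by going through the Z-score. The function names are assumptions made for illustration, and the percentile conversions assume that the scores are normally distributed, as this section does.

```python
from statistics import NormalDist


def t_to_z(t):
    """Reverse of T-score = 50 + 10(z)."""
    return (t - 50) / 10


def z_to_t(z):
    return 50 + 10 * z


def z_to_percentile(z):
    """Percentile equivalent of a Z-score in a normal distribution."""
    return round(100 * NormalDist().cdf(z), 1)


def percentile_to_z(percentile):
    """Z-score below which the given percentage of cases fall."""
    return NormalDist().inv_cdf(percentile / 100)


print(t_to_z(70))                    # 2.0
print(z_to_percentile(1.0))          # 84.1
print(z_to_t(percentile_to_z(50)))   # 50.0
```

A T-score of 70 is therefore the same position as a Z-score of +2.0, and a Z-score of +1.0 corresponds to a percentile of about 84.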

6.0 ACTIVITY

1. How are norm-referenced and criterion-referenced tests interpreted?
2. Explain objective-referenced and content-referenced interpretation.
3. Explain self-referenced interpretation.

7.0 SUMMARY
• Test interpretation is the process of assigning meaning and usefulness to the
scores obtained from a classroom test. This is necessary because test scores
on their own lack a true zero point and equal units.
• Criterion-referenced interpretation is the interpretation of a raw test score
based on the conversion of the raw score into a description of the specific
tasks that the learner can perform.
• Norm-referenced interpretation is the interpretation of a raw score based on
the conversion of the raw score into some type of derived score that indicates
the learner's relative position in a clearly defined reference group.
• Norms are reference frames on which the interpretation of test scores is
based. They represent the typical performance of the testees in the reference
group on which the test raw scores were standardized.
• Grade norms are a reference framework for interpreting the academic
achievement of learners in elementary schools. They represent the typical
performance of specific groups such as a class.
• Age norms, like grade norms, are based on the average scores earned by
pupils of different ages and are interpreted in terms of age equivalents.
• Percentile norms are test norms that deal with percentile ranks or scores.
They are used to compare an individual with the group to which the
individual belongs, in terms of the percentage of the group surpassed.
• A percentile rank (or score) describes a learner's performance in terms of the
percentage of learners in some clearly defined group that earn lower scores.
• A standard score is a method of indicating a testee's relative position in a
group by showing how far the raw score is above or below average. Standard
scores express test performance in terms of standard deviation units from the
mean.

• The normal curve is a symmetrical bell-shaped curve that has many useful
mathematical properties utilized in test interpretation using standard scores.
• The Z-score is a simple standard score which expresses test performance
simply and directly as the number of standard deviation units a raw score is
above or below the mean.
• The T-score refers to any set of normally distributed standard scores that
has a mean score of 50 and a standard deviation of 10.
• The stanine is a kind of standard score which divides a population according
to fixed proportions into nine parts numbered 1 to 9. It has a mean score of 5
and a standard deviation of 2. Each stanine corresponds to a score or a range
of scores. Each individual's score falls within a stanine and can be described
by reference to the stanine within which it falls. Stanines are widely used for
local norms because of the ease with which they can be computed and
interpreted.

8.0 ASSIGNMENT

1. State three characteristics of the normal curve.


2. What does it mean for a distribution to be skewed?
3. State the general formula for calculating skewness.


9.0 REFERENCES
Gronlund, N. E. (1985). Measurement and Evaluation in Teaching. New York:
Macmillan Publishing Company.

Kerlinger, F. N. (1973). Foundations of Behavioural Research. New York: Holt,
Rinehart and Winston Inc.

Omoroginwa, O. K. (2010). An Introduction to Educational Measurement and
Evaluation. Benin City: Perfect Touch.


UNIT 3 RELIABILITY OF A TEST

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 How to Study this Unit
4.0 Word Study
5.0 Main Content
5.1 Test Reliability
5.2 Types of Reliability Measurement
5.2.1 Test Retest Method – Measure of Stability
5.2.2 Equivalent Forms Method – Measure of Equivalence
5.2.3 Split Half Method – Measure of Internal Consistency
5.2.4 Kuder – Richardson Method – Measure of Internal Consistency
5.3 Factors that Influence Reliability Measures
5.3.1 Length of Test
5.3.2 Spread of Scores
5.3.3 Difficulty of Test
5.3.4 Objectivity
5.4 Methods of Estimating Reliability
6.0 Activity
7.0 Summary
8.0 Assignments
9.0 References

1.0 INTRODUCTION

In this unit, you will learn about test reliability and the methods of estimating
reliability. Specifically, you will learn about test retest method, equivalent form
method, split half method and Kuder Richardson Method.

Furthermore, you will learn about the factors influencing reliability measures such
as length of test, spread of scores, difficulty of test and objectivity. Finally you will
learn the methods of estimating reliability.

2.0 OBJECTIVES

By the end of this unit you will be able to:

• define reliability of a test;
• state the various forms of reliability;
• identify and explain the factors that influence reliability measures; and
• compare and contrast the different forms of estimating reliability.

3.0 HOW TO STUDY THIS UNIT

1. Read through this unit with care.


2. Study the unit step by step as the points are well arranged.

NOTE: All answers to activities and assignment are at the end of this book.
This applies to every other unit in this book.

4.0 WORD STUDY

• Consistency: Conformity in the application of something, typically that
which is necessary for the sake of logic, accuracy, or fairness.

• Variation: A change or difference in condition, amount, or level, typically
within certain limits (deviation from normality).

• Extraneous factors: Undesirable variables that influence the relationship
between the variables that an experimenter is examining.

• Interval: The totality of points on a line between two designated points or
endpoints that may or may not be included.

5.0 MAIN CONTENT


5.1 Test Reliability

Reliability of a test may be defined as the degree to which a test is consistent, stable,
dependable or trustworthy in measuring what it is measuring. This definition
implies that the reliability of a test tries to answer questions like: How far can we rely
on the results from a test? How dependable are scores from the test? How
consistently do the items in the test measure whatever they are measuring? In general,
reliability asks the following question: if the ability of a set of testees is determined
by testing them at two different times using the same test, or using two parallel forms
of the same test, or using scores on the same test marked by two different
examiners, will the relative standing of the testees on each of the pair of scores
remain the same? Or, to what extent will their relative standing remain the same?
Therefore, reliability refers to the accuracy, consistency and stability with which a
test measures whatever it is measuring. The more the pair of scores observed for the
same testee differ from each other, the less reliable the measure is. The variation
between the pair of scores is caused by numerous factors, other than the
characteristic being measured, that influence test scores.

Such extraneous factors introduce a certain amount of error into all test scores.
Thus, methods of determining reliability are essentially means of determining how
much error is present under different conditions. Hence, the more consistent our test
results are from one measurement to another, the less error there will be and,
consequently, the greater the reliability.

5.2 Types of Reliability Measures

There are different types of reliability measures. These measures are estimated by
different methods. The chief methods of estimating reliability measures are
illustrated in table 1 below.

Table 1: Methods of Estimating Reliability

Method                      Type of Reliability Measure      Procedure
Test-retest method          Measure of stability             Give the same test twice to the same group with
                                                             any time interval between tests.
Equivalent-forms method     Measure of equivalence           Give two forms of the test to the same group in
                                                             close succession.
Split-half method           Measure of internal consistency  Give the test once. Score two equivalent halves
                                                             (say, odd- and even-numbered items), then correct
                                                             the reliability coefficient to fit the whole test
                                                             by the Spearman-Brown formula.
Kuder-Richardson methods    Measure of internal consistency  Give the test once. Score the total test and apply
                                                             the Kuder-Richardson formula.

5.2.1 Test Retest Method – Measure of Stability

Estimating reliability by means of the test-retest method requires the same test to be
administered twice to the same group of learners with a given time interval between
the two administrations. The resulting test scores are correlated and the correlation
coefficient provides a measure of stability. How long the time interval should be
between tests is determined largely by the use to be made of the results. If the
results of both administrations of the test are highly stable, the testees whose scores
are high on one administration will tend to score high on the other
administration, while the other testees will tend to stay in the same relative
positions on both administrations. Such stability would be indicated by a
large correlation coefficient.

An important factor in interpreting measures of stability is the time interval between
tests. A short time interval, such as a day or two, inflates the consistency of the result
since the testees will remember some of their answers from the first test when they
take the second. On the other hand, if the time interval between tests is long, say
about a year, the results will be influenced by the instability of the testing procedure
and by actual changes in the learners over that period of time. Generally, the longer
the time interval between test and retest, the more the results will be influenced by
changes in the learners’ characteristics being measured and the smaller the reliability
coefficient will be.


5.2.2 Equivalent Forms Method - Measure of Equivalence

Estimating reliability by means of the equivalent or parallel forms method involves the
use of two different but equivalent forms of the test. The two forms of the test are
administered to the same group of learners in close succession and the resulting test
scores are correlated. The resulting correlation coefficient provides a measure of
equivalence. That is, the correlation coefficient indicates the degree to which both
forms of the test are measuring the same aspects of behaviour. This method reflects
the extent to which the test represents an adequate sample of the characteristics
being measured rather than the stability of the testee. It eliminates the problem of
selecting a proper time interval between tests, as in the test-retest method, but it
requires two equivalent forms of the test. The need for equivalent forms
restricts its use almost entirely to standardized testing, where it is widely used.

5.2.3 Split-Half Method – Measure of Internal Consistency

This is a method of estimating the reliability of test scores by means of a single
administration of a single form of a test. The test is administered to a group of
testees and then, for scoring purposes, is divided into two equivalent halves, usually
the odd-numbered and the even-numbered items. The two halves produce two scores for
each testee which, when correlated, provide a measure of internal consistency. The
coefficient indicates the degree to which equivalent results are obtained from the
two halves of the test. The reliability of the full test is usually obtained by applying
the Spearman-Brown formula.

That is,

$$ \text{Reliability on full test} = \frac{2 \times \text{Reliability on } \tfrac{1}{2} \text{ test}}{1 + \text{Reliability on } \tfrac{1}{2} \text{ test}} $$

The split-half method, like the equivalent-forms method, indicates the extent to
which the sample of test items is a dependable sample of the content being
measured. In this case, a high correlation between scores on the two halves of a test
denotes the equivalence of the two halves and consequently the adequacy of the
sampling. Also like the equivalent-forms method, it tells nothing about changes in
the individual from one time to another.
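
The split-half procedure can be sketched in Python as follows. This is only an illustrative sketch: the function names and the response data are hypothetical, and the items are assumed to be scored 1 (correct) or 0 (wrong).

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_reliability(item_scores):
    """item_scores holds one list per testee: 1 = correct, 0 = wrong, in item order."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown correction to full length
    return r_half, r_full


# Hypothetical responses of six testees to an eight-item test.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 1],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

r_half, r_full = split_half_reliability(responses)
print(f"half-test r = {r_half:.2f}, full-test reliability = {r_full:.2f}")
```

Whenever the half-test correlation is positive and below 1, the corrected full-test value is larger than it, which is why the correction is applied before the coefficient is reported.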

5.2.4 Kuder-Richardson Method – Measure of Internal Consistency

This is a method of estimating the reliability of test scores from a single
administration of a single form of a test by means of formulas such as those
developed by Kuder and Richardson. Like the split-half method, these formulas
provide a measure of internal consistency. However, they do not require splitting the
test in half for scoring purposes. Kuder-Richardson formula 20, which is one of the
formulas for estimating internal consistency, is based on the proportion of persons
passing each item and the standard deviation of the total scores. The result of the
analysis using this formula is equal to the average of all possible split-half
coefficients for the group tested. However, it is rarely used because the computation
is rather cumbersome unless information is already available concerning the
proportion of persons passing each item. Kuder-Richardson formula 21, a less accurate
but simpler formula to compute, can be applied to the results of any test that has been
scored on the basis of the number of correct answers. A modified version of the
formula is:
$$ \text{Reliability estimate (KR}_{21}\text{)} = \frac{K}{K - 1}\left(1 - \frac{M(K - M)}{K S^2}\right) $$

Where
K = the number of items in the test
M = the mean of the test scores
S = the standard deviation of the test scores.

The result of this formula approximates that of Kuder-Richardson formula 20, although
in most cases it gives a somewhat smaller reliability estimate.
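
The two formulas can be compared with a short Python sketch. The KR-21 function follows the formula given above. The KR-20 function uses the standard textbook form, K/(K - 1) x (1 - Σpq / S²), which is not written out in this unit. The function names and the response data are hypothetical, and the variance is assumed to be computed by dividing by the number of testees.

```python
def kr20(item_scores):
    """KR-20 from a testee-by-item matrix of 1 (correct) / 0 (wrong) scores."""
    n = len(item_scores)        # number of testees
    k = len(item_scores[0])     # number of items (K)
    totals = [sum(row) for row in item_scores]
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n  # S squared (assumed divisor: n)
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / variance)


def kr21(total_scores, k):
    """KR-21 from total scores only; k is the number of items (K)."""
    n = len(total_scores)
    mean = sum(total_scores) / n                                  # M
    variance = sum((t - mean) ** 2 for t in total_scores) / n     # S squared
    return k / (k - 1) * (1 - mean * (k - mean) / (k * variance))


# Hypothetical responses of six testees to an eight-item test.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 1],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

kr20_value = kr20(responses)
kr21_value = kr21([sum(row) for row in responses], k=8)
print(f"KR-20 = {kr20_value:.2f}, KR-21 = {kr21_value:.2f}")
```

Note that KR-21 needs only the total scores, which is why it is simpler to apply to a test that has already been marked.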

This method of estimating reliability tests whether the items in the test are
homogeneous. In other words, it seeks to know whether each test item measures the
same quality or characteristic as every other item. If this is established, then the
reliability estimate will be similar to that provided by the split-half method. On the
other hand, if the test lacks homogeneity, an estimate smaller than the split-half
reliability will result. The Kuder-Richardson method and the split-half method are
widely used in determining reliability because they are simple to apply.
Nevertheless, the following limitations restrict their value:

 They are not appropriate for speed tests, for which the test-retest or
equivalent-forms methods give better estimates.

 Like the equivalent-forms method, they do not indicate the constancy of a
testee’s responses from day to day. Only the test-retest procedure
indicates the extent to which test results are generalizable over different
periods of time.

They are adequate for teacher-made tests because these are usually power tests.

5.3 Factors Influencing Reliability Measures

The reliability of classroom tests is affected by a number of factors. Classroom
teachers need to know these factors so that they can control them through adequate
care during test construction and so build more reliability into norm-referenced
classroom tests.

5.3.1 Length of Test

The reliability of a test is affected by its length. The longer a test is,
the higher its reliability will be. This is because a longer test will provide a more
adequate sample of the behaviour being measured, and the scores are apt to be less
distorted by chance factors such as guessing. If the quality of the test items and the
nature of the testees can be assumed to remain the same, then the relationship of
reliability to length can be expressed by the simple formula:

$$ r_{nn} = \frac{n\, r_{11}}{1 + (n - 1)\, r_{11}} $$

Where

$r_{nn}$ is the reliability of a test n times as long as the original test;
$r_{11}$ is the reliability of the original test; and
$n$ is, as indicated, the factor by which the length of the test is increased.

Increasing the length of a test makes the test scores depend more closely upon the
characteristics of the person being measured, so that a more accurate appraisal of the
person is obtained. However, lengthening a test is limited by a number
of practical considerations. These are the amount of time available for
testing, fatigue and boredom on the part of the testees, and the inability of classroom
teachers to construct more test items of equal quality. Nevertheless, reliability can be
increased as needed by lengthening the test within these limits.
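
The length formula can be tried out with a short Python sketch. The function names and the example values (an original reliability of 0.60 and a target of 0.90) are hypothetical. The second function simply rearranges the same formula to solve for n.

```python
def lengthened_reliability(r11, n):
    """Reliability of a test n times as long as a test with reliability r11."""
    return n * r11 / (1 + (n - 1) * r11)


def length_factor_needed(r11, target):
    """Factor n by which a test must be lengthened to reach the target reliability.

    Obtained by solving r_nn = n*r11 / (1 + (n - 1)*r11) for n.
    """
    return target * (1 - r11) / (r11 * (1 - target))


# Hypothetical example: a test with reliability 0.60.
doubled = lengthened_reliability(0.60, 2)      # reliability if the test is doubled
factor = length_factor_needed(0.60, 0.90)      # lengthening needed to reach 0.90
print(f"doubled test: {doubled:.2f}; factor needed for 0.90: {factor:.1f}")
```

In this example, doubling the test raises the reliability from 0.60 to 0.75, but reaching 0.90 would need a test six times as long. This illustrates why the practical limits described above matter.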

5.3.2 Spread of Scores

The reliability coefficient of a test is directly influenced by the spread of scores in
the group tested. The larger the spread of scores, the higher the estimate of
reliability will be, if all other factors are kept constant. Larger reliability coefficients
result when individuals tend to stay in the same relative position in a group from one
testing to another. It therefore follows that anything that reduces the possibility of
shifting positions in the group also contributes to a larger reliability coefficient.
Greater differences between the scores of individuals reduce the
possibility of shifting positions. Hence, errors of measurement have less influence
on the relative position of individuals when the differences among group members
are large, that is, when there is a wide spread of scores.

5.3.3 Difficulty of Test

When a norm-referenced test is too easy or too difficult for the group members
taking it, it tends to produce scores of low reliability. This is because both easy and
difficult tests result in a restricted spread of scores. In the case of an easy test, the
scores are clustered together at the top of the scale, while for a difficult test the
scores are grouped together at the bottom end of the scale. Thus for both easy and
difficult tests, the differences among individuals are small and tend to be unreliable.
Therefore, a norm-referenced test of ideal difficulty is desired to enable the scores to
spread out over the full range of the scale. This implies that classroom achievement
tests are to be designed to measure differences among testees. This can be achieved
by constructing tests with an average score of about 50 percent and with
scores ranging from zero to near perfect. Constructing tests that match this
level of difficulty permits the full range of possible scores to be used in measuring
differences among individuals. This is because the bigger the spread of scores, the
greater the likelihood that the measured differences are reliable.

5.3.4 Objectivity

This refers to the degree to which equally competent scorers obtain the same results
in scoring a test. Objective tests easily lend themselves to objectivity because they
are usually constructed so that they can be accurately scored by trained individuals
or by machines. For tests constructed using such highly objective
procedures, the reliability of the test results is not affected by the scoring
procedures. The teacher-made classroom test therefore calls for objectivity, which is
necessary for obtaining a reliable measure of achievement. The need is most obvious in
essay testing and in various observational procedures, where the results of testing
depend to a large extent on the person doing the scoring. Sometimes even the same
scorer may get different results at different times. This inconsistency in scoring has
an adverse effect on the reliability of the measures obtained, because the resulting
test scores reflect the opinions and biases of the scorer as well as the differences
among testees in the characteristics being measured.

Objectivity can be controlled by ensuring that the evaluation procedures selected for
the behaviour required in a test are both appropriate and as objective as
possible. In the case of essay tests, objectivity can be increased by careful framing of
the questions and by establishing a standard set of rules for scoring. Objectivity
increased in this manner will increase reliability without undermining validity.

5.4 Methods of Estimating Reliability

Different methods of estimating the reliability of a test yield different values of
reliability estimates even for the same test. This is because of the differences in the
way each of the procedures defines measurement error. In other words, the size of
the reliability coefficient is related to the method of estimating reliability. This is
clarified below:
 Test-retest method: The reliability coefficient based on the test-retest method is
generally lower than that obtained from the split-half method but higher than
that obtained through the equivalent-forms method. This is because the test-retest
method is affected by time-to-time fluctuation.

 Equivalent-forms method: This method gives the lowest value that the
real reliability could possibly take. The equivalent-forms method is
affected by both time-to-time and form-to-form fluctuations.

 Split-half method: Estimating reliability by the split-half method gives
the largest value that the real reliability could possibly take. This is because
any reliability estimate that is based on the result of a single testing will
overestimate the reliability index.


 Kuder-Richardson methods: These methods, like the split-half method, are based
on a single testing, so the reliability index is an overestimate. Its value is,
however, lower than the value obtained by the split-half method.
It is clear from the above that the size of the reliability coefficient
resulting from a method of estimating reliability is directly attributable to the type
of consistency included in that method. Thus, the more rigorous methods of
estimating reliability yield smaller reliability coefficients than the less rigorous
methods. It is therefore essential, when estimating the reliability of a
measurement instrument, to note the method used, the time lapse between repeated
administrations and the intervening experience, as well as the
assumptions and limitations of the method used, for a clear understanding of the
resulting reliability estimate.

6.0 ACTIVITY
1. Explain the concept of Cronbach alpha with a practical example.
2. State one condition for applying Kuder-Richardson and Cronbach alpha.
3. Differentiate between the test-retest and split-half methods.

7.0 SUMMARY

 Reliability of a test is defined as the degree to which a test is consistent,
stable, dependable or trustworthy in measuring what it is measuring.
 Measure of stability is estimated by the test-retest method.
 Measure of equivalence is estimated by the equivalent-forms method.
 Measure of internal consistency is estimated by the split-half method and the
Kuder-Richardson method.
 Factors affecting reliability measures are length of test, spread of scores,
difficulty of test and objectivity.

8.0 ASSIGNMENTS
1. Why is reliability important in psychological measurement?
2. What is inter-rater reliability?
3. Which of the coefficients of internal consistency is best to use?

9.0 REFERENCES

Gronlund, N. E. (1985). Measurement and Evaluation in Teaching. New York:
Macmillan Publishing Company.

Nenty, H. J. (1985). Fundamentals of Measurement and Evaluation in Education.
Calabar: University of Calabar.

Thorndike, R. L. and Hagen, E. P. (1977). Measurement and Evaluation in Psychology
and Education (4th ed.). New York: John Wiley and Sons.


UNIT 4 VALIDITY OF CLASSROOM TEST

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 How to Study this Unit
4.0 Word Study
5.0 Main Content
5.1 Validity
5.2 Types of Validity
5.3 Content Validation
5.3.1 Content Validation and Test Development
5.3.2 Content Validation and Face Validity
5.4 Criterion Validation
5.4.1 Methods of Computing Correlation Coefficient
5.4.2 Interpretation of Correlation Coefficient (r) values
5.5 Construct Validation
5.5.1 Methods Used in Construct Validation
5.6 Validity of Criterion Referenced Mastery Tests
5.7 Factors Influencing Validity
5.8 Self-Assessment Exercise
6.0 Activity
7.0 Summary
8.0 Assignment
9.0 References

1.0 INTRODUCTION

In this unit, you will learn about validity, which is the single most important
criterion for judging the adequacy of a measurement instrument. You will learn the
types of validity, namely content, criterion and construct validity. Furthermore, you
will learn about content validation, criterion validation and construct validation.
Finally, you will learn about the validity of criterion-referenced mastery tests and
the factors influencing validity.

2.0 OBJECTIVES

By the end of this unit, you will be able to:

 define validity as well as content, criterion and construct validity;
 differentiate between content, criterion and construct validity;
 describe how each of the three types of validity is determined;
 interpret different validity estimates;
 identify the different factors that affect validity; and
 assess any test based on its validity.

3.0 HOW TO STUDY THIS UNIT

1. Read through this unit with absolute concentration.


2. Study the unit step by step as the points are well arranged.

NOTE: All answers to activities and assignment are at the end of this book.
This applies to every other unit in this book.

4.0 WORD STUDY

 Meaningfulness: Having a meaning or purpose.

 Appropriateness: Suitable for a particular person, condition, occasion, or
place.

 Validation: To validate is to prove that something is based on truth or fact,
or is acceptable.

 Specifications: An act of describing or identifying something precisely or of
stating a precise requirement.

 Instructionally relevant tasks: Process of analyzing and articulating the
kind of learning that you expect.

 Construct: An idea or theory containing various conceptual elements,
typically one considered to be subjective and not based on empirical
evidence.

 Representative: Typical of a class, group, or body of opinion.

5.0 MAIN CONTENT

5.1 Validity

Validity is the most important quality you have to consider when constructing or
selecting a test. It refers to the meaningfulness or appropriateness of the
interpretations to be made from test scores and other evaluation results. Validity is
therefore a measure of the degree to which a test measures what it is intended to
measure. For example, a classroom essay test designed to measure students’
understanding of the role of FIFA in the World Cup measures not only that
understanding but also the ability to read and to express oneself in writing,
including the use of appropriate technical terms, grammar and spelling.

Although these aspects were not originally intended, they are consciously or
unconsciously considered when scoring the responses to the question. But the
question was not originally designed to measure these skills. If students’ skills in
these other areas were of interest, appropriate tests should be designed to measure
each of them.

With the present test, their influence impairs the validity of the resulting scores as
accurate indicators of the students’ understanding of the role of FIFA in the World Cup.
In general, validity refers to the appropriateness of the interpretations made from test
scores and other evaluation results, with regard to a particular use. Thus validity is
always concerned with the specific use of the results and the soundness of our
proposed interpretations. Hence, to the extent that a test score is decided by factors
or abilities other than that which the test was designed or used to measure, its
validity is impaired.

5.2 Types of Validity

Validity has three basic concerns:

 determining the extent to which performance on a test represents the level of
knowledge of the subject matter content which the test was designed to
measure (the content validity of the test);
 determining the extent to which performance on a test represents the amount
of the characteristic being measured that the examinee possesses (the construct
validity of the test); and
 determining the extent to which performance on a test represents an
examinee’s probable performance on some other task (the criterion validity of the
test, whether concurrent or predictive).

These concerns of validity are related, and in each case the determination is based on
a knowledge of the interrelationship between scores on the test and the performance
on another task or test that accurately represents the actual behaviour. Validity may
therefore be viewed as a unitary concept based on various kinds of evidence. The
three approaches to test validation are summarized in Figure 1 below.

Type of Evidence            Procedure                                     Meaning
Content-Related Evidence    Compare the test tasks to the test            How well the sample of test tasks
                            specifications describing the task domain     represents the domain of tasks to
                            under consideration.                          be measured.
Criterion-Related Evidence  Compare test scores with another measure      How well test performance predicts
                            of performance obtained at a later date       future performance or estimates
                            (for prediction) or with another measure of   current performance on some valued
                            performance obtained concurrently (for        measure other than the test itself
                            estimating present status).                   (a criterion).
Construct-Related Evidence  Establish the meaning of the scores on the    How well test performance can be
                            test by controlling (or examining) the        interpreted as a meaningful measure
                            development of the test and experimentally    of some characteristic or quality.
                            determining what factors influence test
                            performance.

Figure 1: Approaches to Test Validation

5.3 Content Validation

This is the process of determining the extent to which a set of test tasks provides a
relevant and representative sample of the domain of tasks under consideration.
The content validation procedure is especially important in achievement testing and is
of primary concern in constructing classroom tests. The procedure involves writing test
specifications that define a domain of instructionally relevant tasks and then
constructing or selecting test items that provide a representative and relevant
measure of the tasks. In classroom testing, the domains of achievement tasks are
determined by the instruction, and test development involves:

 clearly specifying the domain of instructionally relevant tasks to be
measured; and
 constructing or selecting a representative set of test tasks.

Therefore, to obtain a valid measure of learning outcomes, we move from what has
been taught, to what is to be measured, and finally to a representative sample of the
relevant tasks. That is, for a valid measure, you have to consider the instruction
and the achievement domain before the test itself.

5.3.1 Content Validation and Test Development

You know that the most relevant type of validity associated with teacher-made
classroom or achievement tests is content validity. You also know that content
validation typically takes place during test development. Test development is
primarily a matter of preparing detailed test specifications and then constructing a
test that meets these specifications. In the table of specification, the content of a
course may be defined to include both subject matter content and instructional
objectives. While the subject matter content is concerned with the topics to be
learned, the instructional objectives are concerned with the type of performance the
learners are expected to demonstrate. This is done so that the test constructed from
the specification will produce results that represent both the content areas and the
objectives we wish to measure.

You will recall that in Module 3 Unit 4 you learned about the use of table of
specification (table 3.1) in test development. The primary function of that table is
for content validation. The percentages in the table indicate the relative degree of
emphasis that each content area and each instructional objective is to be given in the
test. Thus, to ensure content validity using the table, the specifications describing
the achievement domain to be measured should reflect what was taught and the test
items must function as intended if valid results are to be obtained.

5.3.2 Content Validation and Face Validity

Face validation is a quick method of establishing the content validity of a test after
its preparation. This is done by presenting the test to subject experts in the field for
their expert opinion on how well the test “looks like” it measures what it was
supposed to measure. The result of this process is referred to as face validity. It is a
subjective evaluation, based on a superficial examination of the items, of the extent
to which a test measures what it was intended to measure.

Face validity also applies, to some extent, to construct validity. Here it goes beyond
the superficial examination of the items and considers how the test looks to the test
taker, that is, the extent to which appropriate phrasing is used in constructing an
item so that it appears relevant to the test taker. The validity of the test would then
be determined by how well it samples the domain of tasks using phrasing relevant to
the test taker’s level and/or field of study, that is, the appropriate subject language.

5.4 Criterion Validation

Criterion validation is the process of determining the extent to which test
performance is related to some other valued measure of performance. It tries to
determine how confidently one can tell, from an individual’s score
on a given test, his performance on a current or future related task. The current or
future related task is called the criterion. This may involve studies of how well test
scores predict future performance (predictive validation study) or estimate some
current performance (concurrent validation study). The criterion can be, for example,
a test on the job which requires skills and knowledge similar to those called for by
the test whose criterion validity is being estimated. The results of these validity
studies are typically reported by means of a correlation coefficient called the
validity coefficient.

Predictive validity tells us the relationship between measures over an extended
period of time, while concurrent validity is concerned with estimating present status
and tells us the relationship between measures obtained concurrently. In
other words, while concurrent validity indicates the extent to which the test can be
used to estimate an examinee’s present standing on a criterion, predictive validity
involves the extent to which scores from the test can be used to estimate or predict
the level of future performance on a criterion. That is, in both cases the intention is
to predict present or future performance on a criterion based on observed performance
on a given test. For example, how well trainees’ performance on a written
test relates to their performance on an on-the-job practical test which calls on the
same skills and knowledge involved in the written test is an indication of concurrent
validity. But how well students’ performance on the Mock SSCE relates to their
performance in the actual SSCE is an indication of the predictive validity of the
Mock SSCE.

The major difference between concurrent validity and predictive validity is that,
for concurrent validity, both the test information and the criterion data are
gathered at the same time, while for predictive validity the test information is
obtained first and performance on the criterion is obtained at some later time.
In both cases the usual procedure is to correlate statistically the two sets of scores
and to report the degree of relationship between them by means of a correlation
coefficient, which enables validity to be presented in precise and universally
understood terms.
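
As an illustration, the correlation step can be sketched in Python. This is a minimal sketch under stated assumptions: the function name, the variable names and the two sets of scores (a mock examination and the later actual examination) are hypothetical and are not taken from any real study.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    mean_xy = sum(a * b for a, b in zip(x, y)) / n
    mean_x2 = sum(a * a for a in x) / n
    mean_y2 = sum(b * b for b in y) / n
    spread_x = sqrt(mean_x2 - mean_x ** 2)   # standard deviation of x
    spread_y = sqrt(mean_y2 - mean_y ** 2)   # standard deviation of y
    return (mean_xy - mean_x * mean_y) / (spread_x * spread_y)


# Hypothetical scores for eight students (illustrative values only).
mock_scores = [45, 52, 60, 63, 70, 74, 81, 88]     # test given first
actual_scores = [50, 49, 58, 66, 65, 78, 80, 85]   # criterion obtained later

validity_coefficient = pearson_r(mock_scores, actual_scores)
print(f"validity coefficient r = {validity_coefficient:.2f}")
```

For a concurrent validation study, the same function would be used with the two sets of scores gathered at the same time.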

5.4.1 Methods of Computing Correlation Coefficient

A correlation coefficient expresses the degree of relationship between two sets of
scores by a number ranging from -1.00 to +1.00. A perfect
positive correlation is indicated by a coefficient of +1.00 and a perfect negative
correlation by a coefficient of -1.00. A correlation coefficient of .00 lies midway
between these extremes and indicates no relationship between the two sets of scores.
The larger the absolute value of the coefficient (whether positive or negative), the
higher the degree of relationship expressed. There are two common methods of
computing a correlation coefficient. These are:

 Spearman Rank-Difference Correlation


 Pearson Product-Moment Correlation.

You will note that correlation indicates the degree of relationship between two
scores but not the causation of the relationship. Usually, further study is needed to
determine the cause of any particular relationship. The following are illustrations of
the computation of correlation coefficients using both methods.

Spearman Rank-Difference Correlation: This method is satisfactory when the
number of scores to be correlated is small (less than 30). It is easier to compute with
a small number of cases than the Pearson Product-Moment Correlation. It is a
simple practical technique for most classroom purposes. To use the Spearman
Rank-Difference method, take the steps listed in Table 5.1.

Table 5.1: Computing Guide for the Spearman Rank-Difference Correlation

Step  Procedure                                               Result in Table 5.2
1     Arrange the pairs of scores for each examinee in        Columns 1 and 2
      columns.
2     Rank the examinees from 1 to N (the number in the       Columns 3 and 4
      group) on each set of scores.
3     Find the difference (D) in ranks by subtracting the     Column 5
      rank in the right-hand column (column 4) from the
      rank in the left-hand column (column 3).
4     Square each difference in rank (column 5) to obtain     Column 6
      the difference squared (D²).
5     Sum the squared differences in column 6 to obtain       Bottom of column 6
      ΣD².
6     Apply the formula below.                                ρ = 0.61

$$ \rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} $$

Where
Σ = sum of
D = difference in rank
N = number in group

For the data in Table 5.2:

$$ \rho = 1 - \frac{6 \times 514}{20(20^2 - 1)} = 1 - \frac{3084}{7980} = 1 - 0.39 = 0.61 $$


Table 5.2: Computing Spearman Rank-Difference Correlation for a Pair of Hypothetical Data

(1)       (2)           (3)       (4)           (5)       (6)     (7)
Student   Mathematics   Physics   Mathematics   Physics   D       D²
Number    Score         Score     Rank          Rank
1         98            76        1             2         -1      1
2         97            75        2             3         -1      1
3         95            72        3             4         -1      1
4         94            70        4             5         -1      1
5         93            68        5             6         -1      1
6         91            66        6             7         -1      1
7         90            64        7             8         -1      1
8         89            60        8             10        -2      4
9         88            58        9             11        -2      4
10        87            57        10            12        -2      4
11        86            56        11            13        -2      4
12        84            54        12            14        -2      4
13        83            52        13            15        -2      4
14        81            50        14            16        -2      4
15        80            48        15            17        -2      4
16        79            46        16            18        -2      4
17        77            44        17            20        -3      9
18        76            45        18            19        -1      1
19        75            62        19            9         10      100
20        74            77        20            1         19      361
                                                          ΣD² =   514

Note: The column references in Table 5.1 (columns 1 to 6) count from the Mathematics
Score column, so that the Student Number column is not counted.
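
The hand computation for the data in Table 5.2 can be checked with a short Python sketch. The function names are hypothetical, and the ranking helper assumes that no two examinees have the same score on a test, which holds for this data.

```python
# Scores of the 20 students in Table 5.2, in student-number order.
mathematics = [98, 97, 95, 94, 93, 91, 90, 89, 88, 87,
               86, 84, 83, 81, 80, 79, 77, 76, 75, 74]
physics = [76, 75, 72, 70, 68, 66, 64, 60, 58, 57,
           56, 54, 52, 50, 48, 46, 44, 45, 62, 77]


def ranks_descending(scores):
    """Rank 1 goes to the highest score. Assumes there are no tied scores."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(score) + 1 for score in scores]


def spearman_rho(x, y):
    """Return the sum of squared rank differences and the rank-difference coefficient."""
    rank_x = ranks_descending(x)
    rank_y = ranks_descending(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    return sum_d2, rho


sum_d2, rho = spearman_rho(mathematics, physics)
print(sum_d2, round(rho, 2))  # prints: 514 0.61
```

If some scores were tied, the tied examinees would normally share the average of the ranks they occupy, and the helper above would need to be changed accordingly.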

Pearson Product-Moment Correlation: This is the most widely used method and
the coefficient is denoted by the symbol r. This method is favoured when the
number of scores is large, since it is easier to apply to a large group. The
computation is easier with ungrouped test scores, and that is the case illustrated
here. The computation with grouped data is more complicated and can be found in a
standard statistics textbook. The steps listed in Table 5.3 will serve as a
guide for computing a product-moment correlation coefficient (r) from ungrouped
data.
Table 5.3: Computing Guide for the Pearson Product-Moment
Correlation Coefficient (r) from Ungrouped Data

Step  Procedure                                                   Results in Table 5.4
1     Begin by writing the pairs of scores to be studied in two   Columns 1 and 2
      columns. Make certain that the pair of scores for each
      examinee is in the same row. Call one column X and the
      other Y.
2     Square each of the entries in the X column and enter the    Column 3
      result in the X² column.
3     Square each of the entries in the Y column and enter the    Column 4
      result in the Y² column.
4     In each row, multiply the entry in the X column by the      Column 5
      entry in the Y column, and enter the result in the XY
      column.
5     Add the entries in each column to find the sum (Σ) of each column.
      Note the number (N) of pairs of scores. From Table 5.4:
      ΣX = 1717      ΣX² = 148487
      ΣY = 1200      ΣY² = 74184
      ΣXY = 103984   N = 20

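6     Substitute the obtained values in the formula:

      $$r = \frac{\frac{\sum XY}{N} - \left(\frac{\sum X}{N}\right)\left(\frac{\sum Y}{N}\right)}{\sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2}\;\sqrt{\frac{\sum Y^2}{N} - \left(\frac{\sum Y}{N}\right)^2}}$$
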
7     Alternatively, divide the sum of each column by N before putting the data into
      the formula. Thus, for the data from Table 5.4:

      ΣX/N = 1717/20 = 85.85         ΣX²/N = 148487/20 = 7424.35
      ΣY/N = 1200/20 = 60.00         ΣY²/N = 74184/20 = 3709.20
      ΣXY/N = 103984/20 = 5199.20

8     Then substitute these values in the formula (step 6):

      $$r = \frac{5199.20 - (85.85)(60.00)}{\sqrt{7424.35 - (85.85)^2}\;\sqrt{3709.20 - (60.00)^2}}$$

      $$r = \frac{5199.20 - 5151.00}{\sqrt{7424.35 - 7370.22}\;\sqrt{3709.20 - 3600.00}}$$

      $$r = \frac{48.20}{\sqrt{54.13}\;\sqrt{109.20}} = \frac{48.20}{7.36 \times 10.45} = \frac{48.20}{76.91} = 0.63$$

9     Note that the computations involve finding the mean and standard deviation of
      each set of scores (X and Y). Hence the formula can also be written thus:

      $$r = \frac{\frac{\sum XY}{N} - (M_x)(M_y)}{(SD_x)(SD_y)}$$

      where
      M_x = mean of scores in X column
      M_y = mean of scores in Y column
      SD_x = standard deviation of scores in X column
      SD_y = standard deviation of scores in Y column

      Thus, for the same data:

      $$r = \frac{5199.20 - (85.85)(60.00)}{7.36 \times 10.45} = 0.63$$


Table 5.4: Computing Pearson Product-Moment Correlation for a Pair of
Hypothetical Ungrouped Data

         1            2            3      4      5
Student  Mathematics  Physics      X²     Y²     XY
Number   Score (X)    Score (Y)
1        98           76           9604   5776   7448
2        97           75           9409   5625   7275
3        95           72           9025   5184   6840
4        94           70           8836   4900   6580
5        93           68           8649   4624   6324
6        91           66           8281   4356   6006
7        90           64           8100   4096   5760
8        89           60           7921   3600   5340
9        88           58           7744   3364   5104
10       87           57           7569   3249   4959
11       86           56           7396   3136   4816
12       84           54           7056   2916   4536
13       83           52           6889   2704   4316
14       81           50           6561   2500   4050
15       80           48           6400   2304   3840
16       79           46           6241   2116   3634
17       77           44           5929   1936   3388
18       76           45           5776   2025   3420
19       75           62           5625   3844   4650
20       74           77           5476   5929   5698
N = 20   ΣX = 1717    ΣY = 1200    ΣX² = 148487   ΣY² = 74184   ΣXY = 103984
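The product-moment computation can be reproduced from the raw scores in the same way. The sketch below is an illustration in Python rather than part of the original guide; the function name is hypothetical, and it uses the mean and standard deviation form of the formula, with the standard deviation taken over N as in the worked steps.

```python
import math

# Mathematics (X) and Physics (Y) scores for the 20 students in the example.
maths = [98, 97, 95, 94, 93, 91, 90, 89, 88, 87,
         86, 84, 83, 81, 80, 79, 77, 76, 75, 74]
physics = [76, 75, 72, 70, 68, 66, 64, 60, 58, 57,
           56, 54, 52, 50, 48, 46, 44, 45, 62, 77]


def pearson_r(x, y):
    """r = (sum(XY)/N - Mx*My) / (SDx*SDy), with SD computed over N."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    mean_xy = sum(a * b for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum(a * a for a in x) / n - mean_x ** 2)
    sd_y = math.sqrt(sum(b * b for b in y) / n - mean_y ** 2)
    return (mean_xy - mean_x * mean_y) / (sd_x * sd_y)


r = pearson_r(maths, physics)
print(sum(maths), sum(physics), round(r, 2))  # 1717 1200 0.63
```

The column sums and the rounded coefficient match the values obtained by hand.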

5.5 Interpretation of Correlation Coefficient (r) Values

You know that a correlation coefficient indicates the degree of relationship between
two sets of scores by numbers ranging from +1.00 to -1.00. You also know that a
perfect positive correlation is indicated by a coefficient of +1.00 and a perfect
negative correlation by a coefficient of -1.00. A correlation coefficient of .00 lies
midway between these extremes and indicates no relationship between the two sets
of scores. In addition to the direction of the relationship, which is indicated by a
positive sign (+) for a direct relation or a negative sign (-) for an inverse relation,
a correlation also has a size: a number that indicates the level or degree of
relationship. The larger this number, the more closely the two sets of scores are
related. The size of the relationship between two sets of scores is never more than
+1.00 and never less than -1.00. These two values indicate the same degree of
relationship, but the first indicates a direct relation and the second an inverse
relation. A guide for interpreting correlation coefficient (r) values obtained by
correlating any two sets of test scores is presented in Figure 2 below:


r value          Interpretation
+1.0             Perfect positive relationship
+.8 to +1.0      Very high positive relationship
+.6 to +.8       High positive relationship
+.4 to +.6       Moderate positive relationship
+.2 to +.4       Low positive relationship
.0 to +.2        Negligible relationship
.0               No relationship at all
.0 to -.2        Negligible relationship
-.2 to -.4       Low negative relationship
-.4 to -.6       Moderate negative relationship
-.6 to -.8       High negative relationship
-.8 to -1.0      Very high negative relationship
-1.0             Perfect negative relationship

Figure 2: Interpretation of r-values
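The guide in Figure 2 can be expressed as a small lookup. The following Python sketch is only an illustration: the function name is hypothetical, it assumes the bands run in steps of .2 as the scale in the figure suggests, and it assumes that a value falling exactly on a boundary (for example .60) belongs to the higher band, which the figure does not specify.

```python
def describe_r(r):
    """Return a verbal interpretation of a correlation coefficient.

    Assumption: bands are .2 wide and a boundary value goes to the higher band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    size = abs(r)
    if size == 0:
        return "no relationship at all"
    direction = "positive" if r > 0 else "negative"
    if size == 1.0:
        return f"perfect {direction} relationship"
    if size < 0.2:
        return "negligible relationship"
    if size < 0.4:
        label = "low"
    elif size < 0.6:
        label = "moderate"
    elif size < 0.8:
        label = "high"
    else:
        label = "very high"
    return f"{label} {direction} relationship"


print(describe_r(0.63))   # high positive relationship
print(describe_r(-0.30))  # low negative relationship
```

Under these assumptions, the coefficients of .61 and .63 obtained for the Mathematics and Physics scores would both be read as a high positive relationship.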

The value of the criterion validity index is affected by:

• the validity of the criterion variable itself (the validity of the instrument used
  in obtaining the criterion scores);
• the time interval between the administration of the two tests (predictor and
  criterion measures); and
• the selection of the criterion group based on performance on the predictor test,
  which results in a homogeneous group that is no longer a good representative of
  the original population to which inference based on the test validity is to be
  made.

5.6 Construct Validation

A construct is a psychological quality that we assume exists in order to explain
some aspect of behaviour. Examples of constructs include reasoning ability,
intelligence, creativity, honesty and anxiety. They are called constructs because
they are theoretical constructions that are used to explain behaviour. When we
interpret test scores as a measure of a particular construct, we are implying that
there is such a construct, that it differs from other constructs, and that the test
scores provide a measure of the construct that is little influenced by extraneous
factors. Instruments are developed in an attempt to define or give meaning to the
construct, and construct validity is the degree to which such an instrument
accurately defines or gives meaning to the construct.

The advantage of being able to interpret test performance in terms of the
psychological construct is that each construct has an underlying theory that can be
brought to bear in describing and predicting a person's behaviour. For instance,
when we say that a person is highly intelligent, has good reading comprehension or
is sociable, it gives an idea of what type of behaviour might be expected in various
specific situations. Hence, construct validation is defined as the process of
determining the extent to which test performance can be interpreted in terms of one
or more psychological constructs. Construct validation has implications for the
practical use of test results. Usually, when a test is to be interpreted as a measure
of a particular construct, the various types of evidence useful for construct
validation should be considered during its development and selection. This includes
evidence from content and criterion-related validation. The process of construct
validation usually involves:

• identifying and describing, by means of a theoretical framework, the meaning
  of the construct to be measured;
• deriving hypotheses regarding test performance from the theory underlying the
  construct; and
• verifying the hypotheses by logical and empirical means.

Primarily, Construct Validation takes place during the development and try-out of a
test and is based on an accumulation of evidence from many different sources. Thus,
the development and selection of test items on the basis of construct validation
focuses on the types of evidence that it seems reasonable to obtain. Moreover,
special attention is given to interpretation to be made.

5.6.1 Methods Used in Construct Validation

You have seen that a construct is a complex variable, so simpler variables or
factors are often postulated to explain it. This involves both logical analysis and
various comparative and correlational studies, the accumulation of which is
endless. Nevertheless, in practical situations we focus on the types of evidence that
it seems reasonable to obtain and that are most relevant to the types of
interpretations to be made.

The following methods are commonly used:

i. Determining the domain of tasks to be measured: State the test blueprint
clearly to bring out the meaning of the construct and the extent to which the
test provides a relevant and representative measure of the task domain.

ii. Analyzing the mental process required by the test items: Determine the
mental process called forth by the test by examining the test items themselves
and by administering the test to individual learners and having them "think
aloud" as they answer.

iii. Comparing the scores of known groups: This entails predicting how groups
that are known to differ will differ on a particular test and then checking the
prediction. The results are used as partial support for construct validation.

iv. Comparing results before and after some particular treatment: This
involves verification of test results for the theory underlying the trait being
measured. This is done by administering the test before and after treatment
to determine the effect of the treatment.

v. Correlating the scores with other tests: The scores of any particular test are
expected to correlate substantially with the scores of other tests that
presumably measure the same ability or trait, and vice versa. Therefore, the
construct validity of a new test can be estimated by correlating scores from
the test with scores from each of two or more existing valid tests that
measure the same construct. If the resulting correlation coefficients are high,
the new test is said to measure the same construct as the older tests. A test
that is a valid measure of one construct should also correlate with measures of
other constructs that are known to relate to the construct it is measuring. The
construct validity of a given test can therefore also be estimated by
correlating scores from it with scores from two or more valid tests that
measure each of the related constructs. In this case, the resulting coefficients
might not be as high as those in the first case, but they should be moderately
high. The resulting correlation coefficients are indications of the construct
validity of the new test. Each coefficient must always be qualified by
indicating the criterion variable, the population used, and the type of
inference or decision that could be made on the basis of the test scores.

The construct validity of a test based on the correlation method can be determined
with the use of the multitrait-multimethod approach and factor analysis.

5.6.2 Validity of Criterion-Referenced Mastery Tests

You know that Criterion-Referenced Mastery tests are not designed to discriminate
among individuals; hence, statistical validation procedures play a less prominent
role here. However, all three types of validity evidence are still important in
constructing and selecting this type of test. Content-related evidence is of primary
concern when criterion-referenced tests are used for instructional purposes. This
process also involves specifying the performance domain to be measured and
constructing or selecting a set of tasks that is both relevant and representative. The
procedure depends on logical analysis and judgment; hence score variability is not
crucial. For other uses of this test, both criterion-related and construct-related
evidence are likely to receive increased emphasis.

5.7 Factors Influencing Validity

Many factors tend to influence the validity of test interpretation. These include
factors in the test itself. The following factors in the test itself can prevent the
test items from functioning as intended and thereby lower the validity of the
interpretations from the test scores.

They are:

• unclear directions on how examinees should respond to test items;
• too difficult reading vocabulary and sentence structure;
• inappropriate level of difficulty of the test items;
• poorly structured test items;
• ambiguity leading to misinterpretation of test items;
• inappropriate test items for the outcomes being measured;
• test too short to provide adequate sample;
• improper arrangement of items in the test; and
• identifiable pattern of answers that leads to guessing.

Other factors that influence validity include:

• functioning content and teaching procedures;
• factors in the test administration and scoring;
• factors in examinees' response; and
• nature of the group and the criterion.

Thus, in order to ensure validity, conscious effort should be made during the
construction, selection and use of tests and other evaluation instruments to control
those factors that have an adverse effect on validity and interpretation of results.

6.0 ACTIVITY

1. Explain the concept of validity of a research instrument.
2. What do you understand by threats to internal and external validity of a
research instrument?
3. State any three external threats to the validity of a research instrument.

7.0 SUMMARY

Validity is the degree to which a test measures what it is intended to measure.

The three concerns of validity are:

• determining the extent to which performance on a test represents the level of
  knowledge of the subject matter content which the test was designed to
  measure (content validity of the test);
• determining the extent to which performance on a test represents the amount of
  what was being measured that is possessed by the examinee (construct validity
  of the test); and
• determining the extent to which performance on a test represents an
  examinee's probable performance on some other related test or task (criterion
  validity of a test).

• Content validation is the process of determining the extent to which a set of
  test tasks provides a relevant and representative sample of the domain of tasks
  under consideration.
• Criterion validation is the process of determining the extent to which test
  performance is related to some other valued measure of performance.
• A correlation coefficient expresses the degree of relationship between two sets
  of scores by numbers ranging from +1.00 to -1.00.
• Two common methods of computing correlation coefficients are:
    • Spearman Rank-Difference Correlation
    • Pearson Product-Moment Correlation
• Construct validation is the process of determining the extent to which test
  performance can be interpreted in terms of one or more psychological
  constructs.
• A construct is a psychological quality that is assumed to exist in order to
  explain some aspect of behaviour.
• Criterion-referenced mastery tests are not designed to discriminate among
  individuals. Therefore statistical validation procedures play a less prominent
  role.
• Many factors influence the validity of a test. These include factors in the test
  itself, factors in the test administration, factors in the examinee's response,
  functioning content and teaching procedures, as well as the nature of the
  group and the criterion.

8.0 ASSIGNMENT

1. Why is validity important in measurement and evaluation?
2. State two assumptions for the application of Pearson's Product-Moment
Correlation and Spearman's Rank-Order Correlation.
3. Explain briefly, in your own words, the following types of validity:
face validity, concurrent validity, predictive validity and construct
validity.




UNIT 5 PROBLEM OF MARKING TEST AND QUALITY CONTROL IN MARKING SYSTEM

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 How to Study this Unit
4.0 Word Study
5.0 Main Content
5.1 Marks and Marking
5.2 Types of Marking Systems
5.2.1 Traditional Marking Systems
5.2.2 Pass-Fail System
5.2.3 Checklists of Objectives
5.3 Quality Control: Current Marking Systems
5.3.1 Multiple Marking System
5.3.2 Guidelines for Developing a Multiple Marking System
5.4 Assigning Letter Grades
5.4.1 Determining What to Include in a Grade
5.4.2 Combining Data in Assigning Grades
5.4.3 Selecting the Proper Frame of Reference for Grading
5.4.4 Determining the Distribution of Grades
6.0 Activity
7.0 Summary
8.0 Assignments
9.0 References

1.0 INTRODUCTION

Since Module 3 Unit 1 of this course, we have discussed tests and test-related
issues. In this unit we shall round up our discussion with the problem of marking
tests and quality control in marking systems. Specifically, you will learn about
marks and marking and about types of marking systems, namely the traditional
marking system, the pass-fail system and checklists of objectives. Furthermore, you
will learn about quality control in current marking systems: multiple marking
systems and guidelines for developing a multiple marking system. Finally, you will
learn about assigning letter grades, which involves determining what to include in
a grade, combining data in assigning grades, selecting the proper frame of
reference for grading and determining the distribution of grades.

2.0 OBJECTIVES

By the end of this unit, you will be able to:

• explain the meaning of mark and marking systems;
• identify the various types of marking systems and their peculiar
  characteristics;
• state the importance of the multiple marking system;
• enumerate the guidelines for developing a multiple marking system;
• assign letter grades to marks using the range and the stanine weighting
  systems; and
• select the proper frame of reference for grading.

3.0 HOW TO STUDY THIS UNIT

1. Read through this unit very carefully.
2. Study the unit step by step as the points are well arranged.

NOTE: All answers to activities and assignments are at the end of this book.
This applies to every other unit in this book.

4.0 WORD STUDY

• Supplementing: Adding something to something else in order to improve it.
• Rigorous processes: Processes characterized by or adhering to strict standards.
• Low-achieving group: Those students who aren't yet reading at their potential.
• Absolute: Viewed or existing independently and not in relation to other
  things; not relative or comparative.
• Grade: A particular level of rank, quality, proficiency, intensity, or value.

5.0 MAIN CONTENT


5.1 Marks and Marking

Marks and marking are very deeply rooted in our educational system, to the extent
that they have become the basis, in whole or in part, for a wide range of actions
and decisions within a given educational institution, between levels in the
educational structure, and in the relations of the educational system with the
outside world. For instance, eligibility for admission to certain programmes, for
scholarships and for continuing in school is determined by academic standing. For
these reasons, marking systems have remained so desirable, so durable and so
resistant to change.

Marks, with all their technical limitations, remain one of the best predictors of
later achievement and so are important in conveying information about the
likelihood of success in college generally or in specific institutions or
programmes. Marks have a significant role to play in informing the individual, as
well as the institution to which he may later apply, of his prospects for academic
success. Marking practices, on the other hand, are expressions of individual and
group value systems as much as they are impartial reports of student behaviour.
Nevertheless, marking practices face challenges such as:

• Should students be penalized for handing in papers late or get extra credit for
  doing optional work?
• Should students with the same level of scholastic aptitude get, on average,
  the same grades irrespective of the subject or department involved?

These are questions of value that are rarely examined in detail and upon which
members of a faculty seldom come to genuine agreement. As long as disagreement
exists within the local educational culture on questions of value such as these, the
technical efforts of the psychometrician to introduce a consistent rationale into
grading practices will be ineffectual.

5.2 Types of Marking Systems

Any effective system of marking has to satisfy two principles, which are:

• providing the type of information needed; and
• presenting it in an understandable form.

5.2.1 Traditional Marking Systems

In this system, a single letter grade such as A, B, C, D, F or a single number such
as 5, 4, 3, 2, 1 is assigned to represent a learner's achievement in each subject.
Although this system is brief, comprehensive in expression and convenient, it has
several defects:

• the meaning of such marks is often not easy to understand because they are a
  conglomerate of diverse factors like achievement, effort and good behaviour;
• the interpretation of such marks is difficult even when it is possible to limit
  the mark to achievement only; and
• letter grades as typically used have resulted in an undesirable emphasis on
  marks as ends in themselves.

Moreover, the lack of information to clarify the single letter grade is likely to
contribute to this misuse. As a result, letter grades are viewed as goals to be
achieved rather than as a means for understanding and improving student learning.

5.2.2 Pass-Fail System


The pass-fail system is an alternative to the traditional letter grade. In this case,
students are permitted to take some elective courses under a pass-fail option, and
the pass-fail grade is not included in their grade point average. Thus, the system
encourages students to explore new areas of study without the fear of lowering their
grade point average. The pass-fail system offers less information than the
traditional system. It can also encourage students to reduce their study efforts in
these courses by shifting study time from these courses to those in which letter
grades are to be assigned. However, these limitations can be minimized if the
number of courses taken under the pass-fail option is restricted and made
compulsory. The pass-fail marking system is typical of courses taught under a
pure mastery learning approach. In this case learners are expected to demonstrate
mastery of all course objectives before receiving credit for a course, and a simple
pass is all that is needed to indicate mastery. When this pass/no-grade system is
used, nothing is recorded on a learner's school record until mastery of the course
is demonstrated.

5.2.3 Checklists of Objectives

Some schools supplement the traditional marking system with a list of objectives to
be checked or rated, which provides a more informative progress report. These
reports typically include ratings of progress towards the major objectives in each
subject matter area. The symbols used for the ratings vary from school to school.
While some schools retain the traditional A, B, C, D, F lettering system, others
more commonly shift to fewer symbols such as O (Outstanding), S (Satisfactory) and
N (Needs improvement). This form of reporting has the advantage of presenting a
detailed analysis of learners' strengths and weaknesses. Moreover, it highlights the
objectives. However, keeping the list of objectives down to a workable number and
stating them in terms that are understood by all users is a problem.

5.3 Quality Control: Current Marking Systems

You know that quality issues and quality assurance cannot be overlooked in any
measurement procedure; hence the rigorous attention to validity and reliability in
the process of test development and administration. These quality issues are
necessarily embedded in every step and every segment of the current marking
systems in order to overcome the problems of the traditional marking system. The
traditional marking system has always played a major role in marking and reporting
learners' progress, and letter grades are favoured more than any other system. This
is evidenced by their frequent use by teachers; the ease with which such marks can
be assigned may have contributed to their widespread use. Despite this widespread
use, and for quality control, some teachers use more than one method. This is
because a multiple marking and reporting system tends to combine the advantages of
each marking and reporting system and to overcome some of the inherent limitations
of using any single marking and reporting method.


5.3.1 Multiple Marking System

The traditional letter grades (A, B, C, D, F) have been in use for several decades,
despite efforts to replace them with a more meaningful system. Their continued
patronage indicates that they are serving some useful functions in the school.
Moreover, they are a simple and convenient means of maintaining permanent school
records. Hence, the emphasis has moved from replacing them to trying to improve
on them by supplementing them with more detailed and meaningful reports of learners'
progress. This has led to the use of multiple marking systems. The multiple marking
systems retain the use of traditional marking (letter and number grades) and
supplement the marks with checklists of objectives. In some cases, two marks are
assigned to each subject, one for achievement and the other for effort, improvement
or growth.

5.3.2 Guidelines for Developing a Multiple Marking System

No marking system is equally satisfactory in all schools. Therefore each school has
to develop methods that fit its own particular needs and peculiarities. In general,
the following principles serve as guidelines for devising a multiple marking and
reporting system. The development of the marking and reporting system should be:

i. Guided by the functions to be served: This typically requires a study of the
functions for which the resulting marks are to be used. A satisfactory
compromise is then made to accommodate them as much as possible. The
letter grade should be retained as a pure measure of achievement, while a
separate, supplementary report could cover course objectives and other
desired outcomes.
ii. Developed cooperatively by parents, pupils and school personnel: This is done
by a committee consisting of representatives of these various groups. This
creates an avenue for the groups to contribute towards evolving an adequate
marking system and to understand the marking system while it is in use.
iii. Based on a clear statement of educational objectives: The same objectives
that have guided instruction and evaluation should serve as a base for
marking.
iv. Based on adequate evaluation: The need for objective evaluation of
performance makes it necessary to take into account, at the planning stage of
a marking system, the types of evaluation data needed. The data to be
collected and used should contain valid and reliable information.
v. Detailed enough to be diagnostic and yet compact enough to be practical:
This can be achieved by supplementing the letter grade system with a more
detailed report on other aspects of pupil development. The detailed report
will bring out the picture of pupils' strengths and weaknesses as much as
possible.
vi. Supported by provision for parent-teacher conferences: These serve as a
supplement to the marking system in use and are typically arranged as needed.

5.4 Assigning Letter Grades

Effective use of the A, B, C, D, F marking system calls for answers to questions
such as:

• What should be included in a letter grade?
• How should achievement data be combined in assigning letter grades?
• How should the distribution of letter grades be determined?
• What frame of reference should be used in grading?

Attempts are made below to discuss answers to the questions.

5.4.1 Determining What to Include in a Grade

Letter grades are most meaningful and useful when they represent achievement
alone. When they are combined with various other aspects of a learner's
development, they lose their meaningfulness as a measure of achievement and, in
addition, suppress information about those aspects of development. The description
of student learning and development can be enhanced if the letter grades are made
as pure a measure of achievement as possible, while other aspects of learning are
reported separately. Moreover, if letter grades are to serve as valid indicators of
achievement, they must be based on valid measures of achievement.

This involves the process of defining the course objectives as intended learning
outcomes and developing or selecting tests and other evaluation devices that
measure these outcomes most directly. How much emphasis should be given to tests,
ratings and other measures of achievement in the letter grades is determined by the
nature of the course and the objectives being stressed.

Also, the type of evaluation data to be included in a course grade and the relative
emphasis to be given to each type of evidence are determined primarily by
examining the instructional objectives. The more important the objective is, the
greater the weight it should receive in the course grade. Thus, letter grades should
reflect the extent to which learners have achieved the learning outcomes specified in
the course objectives, and these should be weighted according to their relative
importance.

5.4.2 Combining Data in Assigning Grades

After determining what to include in a letter grade and the relative emphasis to be
given to each aspect, the next step is to combine the various elements so that each
element receives its intended weight. For instance, if we decide that the final
examination should count 40 percent, the midterm 30 percent, laboratory
performance 20 percent and written reports 10 percent, then our course grades will
be designed to reflect this emphasis. A typical procedure is to combine the elements
into a composite score by assigning appropriate weights to each element and then
to use these composite scores as a basis for grading.

To weight the components in a composite score, the variability of the scores must
be taken into account. Three weighting systems are described below.

i. The range weighting system: This is illustrated by the simple example that
follows. In this example we assume that we are to combine scores on a final
examination and a term report and we want them to be given equal weight. Suppose
the ranges of scores on the two measures are as follows:

                     Range of Scores
Final Examination    60 to 100
Term Report          20 to 40

The range of scores provides a measure of score variability or spread, and this is
used to equate the two sets of scores. The final examination and the term report are
given equal weight in the composite score by using a multiplier that makes the two
ranges equal. In the above example, the final examination scores have a range of 40
(100 - 60) and the term report scores a range of 20 (40 - 20). Thus, we need to
multiply each term report score by 2 to obtain the desired equal weight. On the other
hand, if we wanted the final examination to count twice as much as the term report,
we would not need to multiply the term report scores at all, because the final
examination range is already twice that of the term report. Also, if we wanted the
term report to count twice as much as the final examination, we would have to
multiply each term report score by 4. The range system is satisfactory for most
classroom purposes.
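
The multiplier arithmetic described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function names
range_multipliers and composite_score, the choice of the final examination as the
baseline component (multiplier 1), and the sample learner's raw scores are all
hypothetical and do not come from the course text.

```python
def range_multipliers(score_ranges, desired_weights, baseline):
    """Return one multiplier per component so that, after multiplication,
    the score ranges stand in the same ratio as the desired weights.

    score_ranges: {component: (lowest score, highest score)}
    desired_weights: {component: relative weight}
    baseline: the component that keeps a multiplier of 1 (an assumption here)
    """
    base_low, base_high = score_ranges[baseline]
    base_spread = base_high - base_low
    multipliers = {}
    for name, (low, high) in score_ranges.items():
        spread = high - low
        multipliers[name] = (desired_weights[name] * base_spread) / (
            desired_weights[baseline] * spread
        )
    return multipliers


def composite_score(raw_scores, multipliers):
    """Add up the raw scores after applying each component's multiplier."""
    return sum(raw_scores[name] * multipliers[name] for name in raw_scores)


# Ranges taken from the example in the text.
score_ranges = {"final_exam": (60, 100), "term_report": (20, 40)}

equal = range_multipliers(
    score_ranges, {"final_exam": 1, "term_report": 1}, "final_exam"
)
exam_double = range_multipliers(
    score_ranges, {"final_exam": 2, "term_report": 1}, "final_exam"
)
report_double = range_multipliers(
    score_ranges, {"final_exam": 1, "term_report": 2}, "final_exam"
)

# Hypothetical learner: 80 on the final examination, 30 on the term report.
learner_composite = composite_score({"final_exam": 80, "term_report": 30}, equal)

print(equal["term_report"])          # multiplier for equal weight
print(exam_double["term_report"])    # examination counts twice as much
print(report_double["term_report"])  # term report counts twice as much
print(learner_composite)
```

The three calls reproduce the three cases in the paragraph above: multipliers of 2, 1
and 4 for the term report. Keeping one component as a fixed baseline is simply a
convenience; any common rescaling of all multipliers would leave the ranking of
learners unchanged.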

ii. A more refined weighting system can be obtained by using the standard
deviation as the measure of variability.

iii. The stanine weighting system: The components in a composite score can also
be weighted properly by converting all sets of scores to stanines (standard
scores, 1 through 9) as we did in Unit 2 of this module. When all scores have
been converted to the same stanine system, the scores in each set will have
the same variability. They are then weighted by simply multiplying each
stanine score by the desired weight. Using this system, a learner's composite
score is determined thus:

                    Desired Weight    Learner's Stanine    Weighted Score
Examination               2                   8                  16
Laboratory Work           2                   9                  18
Written Reports           1                   6                   6
Composite Score                                                  40

These composite scores can be used to rank learners according to an overall
weighted measure of achievement in order to assign letter grades.
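
The stanine calculation and the ranking step can be sketched as follows. The weights
and the first learner's stanines are those of the example above; the function name
weighted_composite and the two additional learners are hypothetical and are included
only to show how the composites produce a rank order.

```python
def weighted_composite(stanines, weights):
    """Multiply each stanine by its desired weight and add the products."""
    return sum(weights[name] * stanines[name] for name in weights)


# Desired weights and the learner's stanines from the example in the text.
weights = {"examination": 2, "laboratory_work": 2, "written_reports": 1}
learner = {"examination": 8, "laboratory_work": 9, "written_reports": 6}
composite = weighted_composite(learner, weights)  # 16 + 18 + 6

# Hypothetical class used only to illustrate ranking by composite score.
class_stanines = {
    "Learner A": learner,
    "Learner B": {"examination": 5, "laboratory_work": 6, "written_reports": 9},
    "Learner C": {"examination": 9, "laboratory_work": 7, "written_reports": 4},
}
ranking = sorted(
    class_stanines,
    key=lambda name: weighted_composite(class_stanines[name], weights),
    reverse=True,  # highest composite first
)

print(composite)
print(ranking)
```

Because every component is already on the same 1 to 9 scale, no range or spread
adjustment is needed before the weights are applied.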

5.4.3 Selecting the Proper Frame of Reference for Grading


Letter grades are typically assigned on the basis of a norm-referenced frame of
reference or of both norm-referenced and criterion-referenced frames of reference.
You know that assigning grades on a norm-referenced basis involves comparing a
learner's performance with that of a reference group, typically one's classmates.
Therefore, in this system, the grade is determined by the learner's relative ranking
in the total group rather than by some absolute standard of achievement. Because
this grading is based on relative performance, the grade is influenced by both the
learner's performance and the performance of the group. Thus, a learner will fare
much better grade-wise in a low-achieving group than in a high-achieving group. The
disadvantage of norm-referenced grading is that it has a shifting frame of reference
(grades depend on the group's ability). Despite this disadvantage, it is widely
used in schools because much classroom testing is norm-referenced: the tests are
typically designed to rank learners in order of achievement rather than to describe
achievement in absolute terms.

You also know that assigning grades on a criterion-referenced basis involves
comparing a learner's performance to pre-specified standards set by the teacher.
These standards are usually concerned with the degree of mastery to be achieved by
the learners and may be specified as:

 tasks to be performed (e.g. type 60 words per minute without error); or
 the percentage of correct answers to be obtained on a test designed to
measure a clearly defined set of learning tasks.

Hence, in this system letter grades are assigned on the basis of an absolute standard
of performance rather than on a relative standard. If all learners demonstrate a high
level of mastery, all will receive high grades. The use of an absolute level of
achievement as a basis for grading, as implied in a criterion-referenced system,
requires that:

 the domain of learning tasks be clearly defined;
 the standard of performance be clearly specified and justified; and
 the measures of learners' achievement be criterion-referenced.

These conditions are difficult to meet except in a mastery-learning situation.
Usually, when mastery learning is the goal, the learning tasks tend to be more
limited and easily defined. Moreover, percentage-correct scores, which are widely
used in setting absolute standards, are most meaningful in mastery learning since
they indicate how far a learner is from complete mastery.

5.4.4 Determining the Distribution of Grades

There are two ways of assigning letter grades to measure the level of learner
achievement. These are the norm-referenced system, based on relative level of
achievement, and the criterion-referenced system, based on absolute level of
achievement.

Norm-Referenced Grading: Essentially, assigning norm-referenced grades is
a matter of ranking the learners in order of overall achievement and assigning letter
grades on the basis of each learner's rank in the group. The ranking might be
limited to a single classroom group or be based on the combined distribution of
several classroom groups taking the course. The proportions of As, Bs, Cs, Ds and Fs
to be used must be determined before letter grades can be assigned. Grades can be
assigned on the basis of the normal curve. Grading on the normal curve results in an
equal percentage of As and Fs, and of Bs and Ds, regardless of the group's level of
ability. That is, the proportion of high grades is balanced by an equal proportion of
low grades. The limitations of assigning grades on the basis of the normal curve are
that:

 the groups are usually too small to yield a normal distribution;
 classroom evaluation instruments are usually not designed to yield normally
distributed scores; and
 the learner population becomes more select as it moves through the grades and
the less-able learners fail or drop out of school.

Thus, grading on the normal curve is seldom defensible. It can be defended only
when a course or combined courses have a large and unselected group of learners.
Even then, basing the decision concerning the distribution of grades on a statistical
model (the normal curve) rather than on a more rational basis remains
questionable.

A more credible approach is to have the school staff set general guidelines for the
approximate distribution of marks for the letter grades. This might involve
separate distributions for introductory and advanced courses, for gifted and
slow-learning classes, and so on. That is, the distributions should be flexible enough
to allow for variation in the caliber of learners from one course to another and from
one time to another in the same course. This entails indicating ranges rather than
fixed percentages of learners to receive each letter grade. There is no simple or
scientific means of determining the ranges for a given situation. The decision is to
be made by the school staff, taking into account the school's philosophy, the
learner population and the purposes of the grades. A hypothetical distribution is
presented below to illustrate how such ranges might be stated.

A = 10 to 20 percent of learners
B = 20 to 30 percent of learners
C = 30 to 50 percent of learners
D = 10 to 20 percent of learners
F = 0 to 10 percent of learners

You should note that the distribution should provide for the possibility of no
failing grades. This is because whether learners pass or fail a course should be
based on their absolute level of learning rather than their relative position in some
group. That is, even when grading is done on a relative basis, the pass-fail decision
must be based on an absolute standard of achievement if it is to be educationally
sound.
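
One way to read the two paragraphs above is sketched below: letter grades A to D are
assigned by rank, and an F is given only when a learner falls below an absolute pass
mark. Everything specific in the sketch is an assumption made for illustration: the
function name, the fixed percentages (chosen from within the hypothetical ranges
above), the pass mark of 40 and the ten composite scores. Ties between learners are
not treated specially.

```python
def norm_referenced_grades(composites, percent_by_letter, pass_mark):
    """Assign passing letters by rank, then apply an absolute pass standard.

    composites: {learner: composite score}
    percent_by_letter: [(letter, whole-number percent of learners)], best
        grade first; the percents are assumed to add up to 100
    pass_mark: absolute score below which a learner fails (an assumption)
    """
    ranked = sorted(composites, key=composites.get, reverse=True)
    n = len(ranked)
    grades = {}
    start = 0
    cumulative_percent = 0
    for letter, percent in percent_by_letter:
        cumulative_percent += percent
        # Integer arithmetic, rounding half up, avoids floating-point drift.
        end = (cumulative_percent * n + 50) // 100
        for learner in ranked[start:end]:
            grades[learner] = letter
        start = max(start, end)
    # The pass-fail decision rests on the absolute standard, not on rank.
    for learner, score in composites.items():
        if score < pass_mark:
            grades[learner] = "F"
    return grades


# Hypothetical percentages chosen from within the suggested ranges.
percent_by_letter = [("A", 20), ("B", 30), ("C", 40), ("D", 10)]

# Hypothetical composite scores for a class of ten learners.
composites = {
    "L01": 92, "L02": 88, "L03": 85, "L04": 81, "L05": 78,
    "L06": 74, "L07": 70, "L08": 66, "L09": 61, "L10": 38,
}

grades = norm_referenced_grades(composites, percent_by_letter, pass_mark=40)
print(grades)
```

With these figures the lowest-ranked learner receives an F because the score of 38 is
below the pass mark, not because someone has to come last; with a pass mark of 30 the
same learner would receive a D.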

Criterion-Referenced Grading: This system of grading is most useful when a
mastery learning approach is used because mastery learning provides the necessary
conditions for grading on an absolute basis. The process includes:

 delimiting the domain of learning tasks to be achieved;
 defining the instructional objectives in performance terms;
 specifying the standards of performance to be attained; and
 measuring the intended outcomes with criterion-referenced instruments.

In the criterion-referenced grading system, if the course's objectives have been
clearly specified and the standards for mastery appropriately set, the letter grades
may be defined as the degree to which the objectives have been attained, as indicated
below:

A = Outstanding: Learner has mastered all of the course's major and minor
instructional objectives.
B = Very Good: Learner has mastered all of the course's major instructional
objectives and most of the minor objectives.
C = Satisfactory: Learner has mastered all of the course's major instructional
objectives but just a few of the minor objectives.
D = Very Weak: Learner has mastered just a few of the course's major and minor
instructional objectives and barely has the essentials needed for the next
highest level of instruction. Remedial work would be desirable.
F = Unsatisfactory: Learner has not mastered any of the course's major
instructional objectives and lacks the essentials needed for the next highest
level of instruction. Remedial work is needed.

Furthermore, if the tests and other evaluation instruments have been designed to
yield scores in terms of the percentage of correct answers, criterion-referenced
grades might then be defined as follows:

A = 95 to 100 percent correct
B = 85 to 94 percent correct
C = 75 to 84 percent correct
D = 65 to 74 percent correct

Here also, you should note that defining letter grades in this manner is defensible
only if the necessary conditions of a criterion-referenced system have been met. In
general, the distribution of grades in criterion-referenced grading systems is not
predetermined. If all learners demonstrate a high level of mastery, all will receive
high grades. Also, if some learners demonstrate a low level of performance, they
will receive low grades. Hence, the distribution of grades is determined by each
learner's absolute level of performance, and not by the learner's relative position in
the group.
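
The percentage cut-offs listed above can be expressed as a simple lookup, sketched
here in Python. The function name is hypothetical, and treating any score below 65
percent as an F is an assumption, since the list above stops at D.

```python
def criterion_referenced_grade(percent_correct):
    """Return the letter grade for a percentage-correct score."""
    cutoffs = [(95, "A"), (85, "B"), (75, "C"), (65, "D")]  # from the list above
    for minimum, letter in cutoffs:
        if percent_correct >= minimum:
            return letter
    return "F"  # assumption: scores below 65 percent are unsatisfactory


# Hypothetical class in which every learner shows a high level of mastery.
high_mastery_class = [96, 98, 95, 100]
high_mastery_grades = [criterion_referenced_grade(p) for p in high_mastery_class]

# Hypothetical class with a wider spread of performance.
mixed_class = [97, 86, 79, 70, 52]
mixed_grades = [criterion_referenced_grade(p) for p in mixed_class]

print(high_mastery_grades)
print(mixed_grades)
```

Each learner's grade depends only on that learner's own score, so the first class
receives nothing but As while the second receives one of each letter. No proportion
of grades is fixed in advance.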

The two procedures for determining letter grades in a criterion-referenced system
are:

i. The One-Shot System: This provides a single opportunity to achieve the
pre-specified standards. In this system the learner is assigned whatever grade
is earned on the first attempt. This results in some failing grades.

ii. The Repeated-Attempts System: This procedure permits the learner to
make repeated attempts to achieve the pre-specified standards. The learner is
given corrective help and enough additional learning time as he or she progresses
in the learning process to achieve a satisfactory level of mastery. This system
systematically eliminates failure. Typically, only the letter grades A, B and C are
used, and learners are permitted to repeat examinations until a satisfactory
level of performance is achieved.
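
The difference between the two procedures can be illustrated with a short sketch. It
assumes, purely for illustration, that the A, B and C cut-offs are the percentage
ranges given earlier and that C is the lowest satisfactory grade; the function names
and the learner's attempt scores are hypothetical.

```python
# Assumed cut-offs for the satisfactory grades, borrowed from the
# percentage-correct list given earlier in this section.
SATISFACTORY_CUTOFFS = ((95, "A"), (85, "B"), (75, "C"))


def letter_for(percent_correct):
    """Return A, B or C, or None if the standard has not yet been met."""
    for minimum, letter in SATISFACTORY_CUTOFFS:
        if percent_correct >= minimum:
            return letter
    return None


def one_shot_grade(attempt_scores):
    """Only the first attempt counts; None here means the standard was missed."""
    return letter_for(attempt_scores[0])


def repeated_attempts_grade(attempt_scores):
    """Return (letter, attempt number) for the first satisfactory attempt.

    If no recorded attempt is satisfactory yet, the letter is None and the
    learner would be expected to continue with corrective help.
    """
    for attempt_number, percent in enumerate(attempt_scores, start=1):
        letter = letter_for(percent)
        if letter is not None:
            return letter, attempt_number
    return None, len(attempt_scores)


# Hypothetical learner who improves over three attempts.
attempts = [62, 71, 88]
first_try = one_shot_grade(attempts)
eventual = repeated_attempts_grade(attempts)

print(first_try)
print(eventual)
```

Under the one-shot procedure this learner misses the standard on the only attempt
allowed; under repeated attempts the same learner reaches a B on the third attempt.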

In the criterion-referenced system, the letter grades are supplemented with a
comprehensive report consisting of a checklist of objectives, to inform learners
and parents of the progress made by the end of the marking period.

6.0 ACTIVITY

1. What is quality assurance?
2. Which areas does quality assurance measure in marking?
3. What are the challenges of test marking in Nigeria?
4. Suggest any four ways in which the government of your country can ensure
quality marking in examinations.

7.0 SUMMARY

 Marks, despite their technical limitations, remain one of the best predictors of
later achievement and so are important in conveying information about
learners' progress.
 Marking practices are expressions of individual and group value systems.
 Types of marking systems include:
- the traditional marking system;
- the pass-fail system; and
- the checklist of objectives.
 The multiple marking system retains the traditional marking system
and supplements the marks with checklists of objectives.
 To assign letter grades, the following issues have to be addressed:
- What should be included in a letter grade?
- How should achievement data be combined in assigning letter grades?
- What frame of reference should be used in grading?
- How should the distribution of letter grades be determined?

8.0 ASSIGNMENT

1. What are the qualities of a good test?
2. State any three principles for developing a good marking guide.
3. Explain any three disadvantages of the pass-fail system of marking.

9.0 REFERENCES

Ajayi, T. and Adegbesan, S. O. (2007). Quality Assurance in the Teaching
Profession. Paper presented at a forum on emerging issues in teaching
professionalism in Nigeria (14-16 March), Akure, Ondo State.

Arikewuyo, M. O. (2004). Effective Funding and Quality Assurance in the Nigerian
Education System. Paper presented at the 1st National Conference of the
Institute of Education, Olabisi Onabanjo University, Ago-Iwoye, Jan., pp. 12-15.

Federal Ministry of Education (FME, 2009). Roadmap for the Nigerian Education
Sector. Abuja: Federal Ministry of Education.

Federal Republic of Nigeria (FRN, 2004). National Policy on Education. 4th Edition,
Yaba, Lagos: NERDC Press.

WAEC (2011). Diary of the Wes

Gronlund, N. E. (1985). Measurement and Evaluation in Teaching. New York:
Macmillan Publishing Company.

Thorndike, R. L. and Hagen, E. P. (1977). Measurement and Evaluation in
Psychology (4th ed.). New York: John Wiley and Sons.

Omoroginwa, O. K. (2010). An Introduction to Educational Measurement and
Evaluation. Benin City: Perfect Touch.
