Tps 201 Test & Measurements
CREDIT HOURS: 3
PURPOSE OF THE UNIT: To help the students gain proper skills in constructing and
interpreting test results for quality teaching.
COURSE CONTENT:
Measurement, assessment and evaluation; purpose of evaluation; types of evaluation; factors
to consider for successful evaluation; purpose of assessment; continuous assessment;
characteristics of continuous assessment; taxonomy of educational objectives; tests in the
classroom; purpose of tests; objectives of classroom tests; types of tests; discrete point tests;
integrative tests; characteristics of a good test; construction of classroom tests; different
kinds of tests to be constructed; basic principles of constructing multiple choice questions;
principles of constructing essay tests; test contingencies; test validity; types of validity;
factors affecting validity; test reliability; factors affecting reliability; test scoring; correction
formula; using test results; measures of central tendency; measures of variability; derived
scores; standard scores (Z-score, T-score); credit units; grade points.
This unit introduces you to some important concepts associated with ascertaining
whether objectives have been achieved or not. Basically, the unit takes you through the
meanings of test, measurement, assessment and evaluation in education.
Their functions are also discussed. You should understand the fine distinctions between
these concepts and the purpose of each as you will have recourse to them later in this
course and as a professional teacher.
Objectives:
Test, measurement, assessment and evaluation are often used interchangeably by
practitioners as if they have the same meaning. This is not so. As a teacher, you should be
able to distinguish one from the other and use each at the appropriate time to discuss
issues in the classroom.
Measurement
Measurement stops at ascribing a quantity; it does not make a value judgment on the
child’s performance.
Assessment
Assessment is a fact-finding activity that describes conditions that exist at a particular
time. Assessment often involves measurement to gather data. However, it is the domain
of assessment to organize the measurement data into interpretable forms on a number of
variables.
Assessment in an educational setting may describe the progress students have made
towards a given educational goal at a point in time. However, it is not concerned with
explaining the underlying reasons and does not proffer recommendations for action,
although there may be some implied judgment as to the satisfactoriness or otherwise of
the situation.
In the classroom, assessment refers to all the processes and products which are used to describe
the nature and the extent of pupils’ learning. This also takes cognizance of the degree of
correspondence of such learning with the objectives of instruction.
Some educationists, in contrasting assessment with evaluation, have opined that evaluation is
generally used when the subject is not a person or group of persons but the effectiveness or
otherwise of a course, a programme of teaching or a method of teaching, while assessment is
generally used for measuring or determining personal attributes (the totality of the student, the
environment of learning and the student’s accomplishments).
A number of instruments are often used to get measurement data from various sources. These
include tests, aptitude tests, inventories, questionnaires, observation schedules, etc. All these
sources give data which are organized to show evidence of change and the direction of that
change. A test is thus one of the assessment instruments. It is used in getting quantitative data.
Evaluation
Evaluation adds the ingredient of value judgment to assessment. It is concerned with the
application of its findings and implies some judgment of the effectiveness, social utility or
desirability of a product, process or progress in terms of carefully defined and agreed upon
objectives or values. Evaluation often includes recommendations for constructive action. Thus,
evaluation is a qualitative measure of the prevailing situation. It calls for evidence of
effectiveness, suitability, or goodness of the programme.
The Purposes of Evaluation
According to Ogunniyi (1984), educational evaluation is carried out from time to time for the
following purposes:
(i) To determine the relative effectiveness of the programme in terms of students’
behavioral output;
(ii) To predict the general trend in the development of the teaching-learning process;
(iii) To provide an objective basis for determining the promotion of students from one class to
another as well as the award of certificates;
(iv) To provide a just basis for determining at what level of education the possessor of a
certificate should enter a career.
TASK
Distinguish clearly between Test, Assessment, Measurement and Evaluation.
Evaluation has also been defined as the process of delineating, obtaining and providing
useful information for judging decision alternatives (Stufflebeam et al., 1971).
In line with this fine distinction between assessment and evaluation, we shall briefly deliberate
a little more here on evaluation and leave the discussion on assessment to the latter units.
TASK
Discuss the importance of evaluation to the classroom teacher.
Types of Evaluation
There are two main levels of evaluation, viz: programme level and student level. Each of the two
levels can involve either of the two main types of evaluation, formative and summative, at
various stages. Programme evaluation has to do with determining whether a programme has
been successfully implemented or not. Student evaluation determines how well a student is
performing in a programme of study.
Formative Evaluation
The purpose of formative evaluation is to find out whether after a learning experience, students
are able to do what they were previously unable to do. Its ultimate goal is usually to help
students perform well at the end of a programme. Formative evaluation enables the teacher to:
1. Draw more reliable inferences about his/her students’ performance than an external
assessor, although he may not be as objective as the latter;
In other words, formative evaluation provides the evaluator with useful information about the
strength or weakness of the student within an instructional context.
Summative Evaluation
Summative evaluation often attempts to determine the extent to which the broad objectives of a
programme have been achieved. It is concerned with the purposes, progress and outcomes of the
teaching-learning process.
Summative evaluation is judgmental in nature and often carries threats with it, in that the student
may have no knowledge of the evaluator, and failure has a far-reaching effect on the students.
However, it is more objective than formative evaluation. Some of the underlying assumptions of
summative evaluation are that:
- the teaching techniques, learning materials and audio-visual aids are adequate and have
been judiciously dispensed.
Task
With suitable examples, distinguish between formative and summative evaluation.
- Treatment
- Conducive atmosphere
- Intended and unintended outcomes and their implications considered.
Summary
In this section, we have distinguished clearly between measurement, assessment and evaluation.
Measurement is seen as a process of assigning numbers to objects, quantities or events in
order to give quantitative meanings to such qualities.
Assessment is the process of organizing measurement data into interpretable forms. It
gives evidence of change and the direction of change without value judgment.
Evaluation is the estimation of the worth of a thing, process or programme in order to
reach meaningful decisions about that thing, process or programme. It calls for evidence
of effectiveness, suitability or goodness of the programme or process.
Evaluation serves a number of purposes in education.
Evaluation could be formative or summative. The two serve different purposes in the
classroom.
A number of factors such as sampling techniques, organization, objectivity, etc. must be
considered for successful evaluation.
Task
Distinguish clearly between measurement, test and student evaluation.
References
Ogunniyi, M. B. (1984). Educational Measurement and Evaluation. Ibadan: Longman Nigeria.
In this section, we shall discuss the purpose of assessment and tests in the classroom. You should
pay particular attention here as you may have to construct special types of tests in later
sections.
Objectives
At the end of this unit, you should be able to:
i. Give the purpose of assessment;
ii. Explain Bloom’s taxonomy of educational objectives;
iii. Give stages in assessment practice;
iv. Compare old and modern assessment practices;
v. Explain what a test is; and
vi. State the aims and objectives of classroom tests.
Purpose of Assessment
Assessment involves deciding how well students have learnt a given content or how far the
objectives we set out earlier have been achieved quantitatively. The data so obtained can serve
various educational functions in the school, viz:
(a) Classroom function
This includes
(i) Determination of the level of achievement;
(ii) Determination of the effectiveness of the teacher, teaching method, learning
situation and instructional materials;
(iii) Motivating the child by showing him his progress, i.e. success breeds success;
(iv) Predicting a student’s performance in novel situations.
“A mechanism whereby the final grading of a student in the cognitive, affective and
psychomotor domains of behavior takes account, in a systematic way, of all his
performances during a given period of schooling. Such an assessment involves the
use of a great variety of modes of evaluation for the purpose of finding and
improving the learning and performance of the students.”
It provides useful information about the academic progress of the learner.
(b) Motivation
The effectiveness of efforts to help people learn depends on the learner’s activities and
the achievement that results. Feedback regarding one’s effectiveness is positively
associated with perceived locus of causality, proficiency and intrinsic motivation (Deci,
1980).
When assessment is carried out systematically and in a purposive manner, and feedback
is given immediately, it can go a long way in correcting any anomaly in the
teaching-learning continuum. In the past, students often made last-minute preparations
towards final examinations. This neither helps them to have a thorough grasp of the
learning experiences nor allows the teacher to apply remedial measures to the areas of
deficiency or improve on his teaching methods.
However, when Continuous Assessment is used appropriately, students study more
frequently and retain what they study for a longer period of time. This generally
improves their learning, which goes a long way in motivating them to study further.
Using Continuous Assessment, the teacher will be able to identify these differences and
apply, at the appropriate time, the necessary measures to improve not only his teaching
but also the learning of the students and hence their performance.
(c) Record-Keeping
Continuous Assessment affords the teacher the opportunity to compile and accumulate
students’ records and performances over a given period of time. Such records are often
essential not only in guidance and counseling but also in diagnosing any problem that
may arise in future.
Task
Which type of test do you think the Kenyan education system supports most: continuous
assessment tests, or a single examination of fixed duration (e.g. a 3-hour examination)
that decides everything?
i. Continuous assessment tests are often based on what has been learnt within a particular
period. Thus, they should be a series of tests.
ii. In the Nigerian educational system, continuous assessment tests are part of the scores used
to compute the overall performance of students. In most cases, they carry 40% of the final
score. The final examination often carries 60%.
iii. Invariably, continuous assessment tests are designed and produced by the classroom
teacher. Some continuous assessment tests are centrally organized for a collection of
schools or for a particular university.
iv. All continuous assessment tests should meet the criteria stated in Units three and five for
a good test: validity, reliability, variety of test items and procedures, etc.
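The 40%/60% weighting described above amounts to a simple weighted average. The sketch below is illustrative only; the weights and the sample marks are assumptions, not prescribed values.

```python
def final_score(ca_score, exam_score, ca_weight=40, exam_weight=60):
    """Combine a continuous assessment mark and a final examination mark
    (both out of 100) using percentage weights that sum to 100."""
    return (ca_weight * ca_score + exam_weight * exam_score) / 100

# A hypothetical student with 70/100 in continuous assessment and
# 55/100 in the final examination:
print(final_score(70, 55))  # 0.4*70 + 0.6*55 = 28 + 33 = 61.0
```

Using integer percentage weights and dividing by 100 at the end keeps the arithmetic exact for whole-number marks.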
Task
What are the disadvantages of continuous assessment tests designed and organized by a
classroom teacher?
If you have done the task above very well, you might have put down the following disadvantages of
continuous assessment tests organized by a classroom teacher. As often reported, continuous
assessment tests have been abused by some dishonest teachers. This is done by:
- Making the tests extremely cheap so that undeserving students in their school can pass;
- Inflating the marks of the continuous assessment tests so that undeserving students can
pass the final examinations and be given certificates they have not worked for;
- Conducting few (less than appropriate) continuous assessment tests and thus making the
process not a continuous or progressive one;
- Reducing the quality of the tests simply because the classes are too large for a teacher to
examine thoroughly;
- Exposing such tests to massive examination malpractices, e.g. giving the test to favored
students beforehand, inflating marks, recording marks for continuous assessment not
conducted, or splitting one continuous assessment test score into four or five to
represent separate continuous assessment tests; etc.
Indeed, all these wrong applications of continuous assessment tests make some public
examination bodies reject scores submitted for candidates in respect of such assessment. For
continuous assessment tests to be credible, the teachers must be:
- Honest and firm;
- Fair and just in their assessments.
Task
Write three questions out on a piece of paper or in your exercise book to reflect a credible
continuous assessment test in your field or subject area.
1. Inadequacy of qualified teachers in the respective fields to cope with the large number of
students in our classrooms. Some time ago, a Minister of Education lamented the
population of students in classrooms in some parts of the country.
2. The pressure to cover a large part of the curricula, probably owing to the demands of
external examinations, often makes teachers concentrate more on teaching than on
Continuous Assessment. There is no doubt that such teaching is not likely to be very
effective without any form of formative evaluation.
3. The differences in the quality of tests and scoring procedures used by different teachers
may render the results of Continuous Assessment incomparable.
Benjamin Bloom et al classified all educational objectives into three, namely: cognitive,
affective and psychomotor domains.
1.0.0 Knowledge
2.0.0 Comprehension
2.1.0 Translation
2.2.0 Interpretation
2.3.0 Extrapolation
3.0.0 Application
4.0.0 Analysis
4.1.0 Analysis of Elements
4.2.0 Analysis of Relationships
4.3.0 Analysis of Organizational Principles
5.0.0 Synthesis
5.1.0 Production of a unique communication
5.2.0 Production of a plan or proposed set of operations
5.3.0 Derivation of a set of abstract relations
6.0.0 Evaluation
6.1.0 Judgment in terms of internal evidence
6.2.0 Judgment in terms of external criteria
B. Practice:
i. Give quality instruction
ii. Engage pupils in activities designed to achieve objectives or give them tasks to
perform.
iii. Measure their performance and assess them in relation to set objectives.
C. Use of Outcome:
i. Take note of how effective the teaching has been; feedback to teacher and pupils.
ii. Record the result
iii. Cancel if necessary
iv. Result could lead to guidance and counseling and/or re-teaching.
- Planting of crops/Experiments
However, elements of cognitive behavior are also present in these activities because, for one to
do something well, one must know how, since the activities are based on knowledge.
These learning outcomes include feelings, beliefs, attitudes, interests, social relationships etc.
which, at times are referred to as personality traits. Some of these that can be assessed indirectly
include:
ii. Respect – tolerance, respect for parents, elders, teachers, constituted authority and other
people’s feelings.
The most appropriate instrument for assessment here is observation. Others, like self-reporting
inventories, questionnaires, interviews, rating scales, projective techniques and socio-metric
techniques, may as well be used as the occasion demands. In assessing students’ personality
traits, it is necessary to assume that every student possesses good personality characteristics
until the contrary is proved.
Note that the purpose of assessing students’ personality traits in the school is to give feedback to
the students to help them adjust in the right direction rather than the assignment of grades.
i. It was unable to cover all that was taught within the period the examination
covered.
ii. Schools depended much on the result to determine the fate of pupils.
iii. Learning and teaching were regarded as separate processes in which only
learning could be assessed.
iv. It did not reveal students’ weaknesses early enough to enable the teacher to help
students overcome them.
B. Modern Practice
i. It forms the basis for guidance and counseling in the school.
ii. The attainment of the objectives of teaching and learning can be perceived and
confirmed through continuous assessment.
Task
1. What is a test?
2. What is assessment?
3. What is evaluation?
Purpose of Tests
Why do we have to test you? At the end of a course, why do examiners conduct tests? Some of
the reasons are:
i. We conduct tests to find out whether the objectives we set for a particular course, lesson
or topic have been achieved or not. Tests measure the performance of a candidate in a
course, lesson or topic and thus tell the teacher or course developer whether or not the
objectives of the course or lesson have been achieved. If those taught performed badly,
we may have to take a second look at the objectives of the course or lesson.
ii. We test students in the class to determine the progress made by the students. We want to
know whether or not the students are improving in the course, lesson, or topic. If
progress is made, we reinforce the progress so that the students can learn more. If no
progress is made, we intensify teaching to achieve progress. If progress is slow, we slow
down the speed of our teaching.
iii. We use tests to determine what students have learnt or not learnt in the class. Tests show
the aspects of the course or lesson that the students have learnt. They also show areas
where learning has not taken place. Thus, the teacher can re-teach for more effective
learning.
iv. Tests are used to place students/candidates into a particular class, school, level, or
employment. Such tests are called placement tests. The assumption here is that an
individual who performs creditably well at a level can be moved to another level after
testing. Thus, we use tests to place a pupil into primary two after he/she has passed the
test set for primary one, and so on.
v. Tests can reveal the problems or difficulty areas of a learner. Thus, we say we use tests
to diagnose or find out the problems or difficulty areas of a student or pupil.
Tests may reveal whether or not a learner, for example, has a problem with pronouncing
a sound, solving a problem involving decimals, or constructing a basic shape, e.g. a
triangle.
vi. Tests are used to predict outcomes. We use tests to predict whether or not a learner will
be able to do a certain job or task, use language to study in a university, or perform
well in a particular school, college, or university. We assume that if Ali can pass this test
or examination, he will be able to go to level 100 of a university and study engineering.
This may not always be the case, though. There are factors other than high performance
in a test that can make a student do well.
In this section, we will discuss the aims and objectives of classroom tests. But before we do this,
what do we mean by classroom tests? These are tests designed by the teacher to determine
or monitor the progress of the students or pupils in the classroom. The term may also be
extended to all examinations conducted in a classroom situation. Whichever interpretation is
given, classroom tests have the following aims and objectives:
i. Show the progress that the learners are making in the class.
ii. Compare the performance of one learner with another, so as to classify them either
as weak learners who need more attention, average learners, or strong or high
achievers who can be used to assist the weak learners.
iii. Certification – we test in order to certify that a learner has completed the course and
can leave. After such tests or examinations, certificates are issued.
iv. Research – sometimes we conduct class tests for research purposes. We want to test
whether a particular method, technique or approach is effective or not. In this case,
we test the students before using the technique (pre-test). We then teach one group
(the experimental group) using the technique, and teach another group of a
comparable level (the control group) using a different technique. Later on, we
compare the outcomes (results) of the experimental and control groups to find out
the effect of the technique on the performance of the experimental group.
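The pre-test/post-test comparison just described can be sketched as follows. All the marks below are made-up figures for illustration; in practice you would use your own class scores, and usually a proper statistical test rather than a bare difference of means.

```python
def mean_gain(pre_scores, post_scores):
    """Average improvement from pre-test to post-test for one group."""
    gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
    return sum(gains) / len(gains)

# Hypothetical marks out of 20 for two groups of comparable level.
experimental_pre  = [8, 10, 9, 11]    # taught with the new technique
experimental_post = [14, 15, 13, 16]
control_pre  = [9, 10, 8, 11]         # taught with the usual technique
control_post = [11, 12, 10, 12]

exp_gain = mean_gain(experimental_pre, experimental_post)  # 5.0
ctl_gain = mean_gain(control_pre, control_post)            # 1.75
print(f"Experimental gain: {exp_gain}, control gain: {ctl_gain}")
```

A larger mean gain for the experimental group than for the control group would suggest, though not prove, that the technique helped.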
Summary
In this unit, we have explained what assessment is and its purpose in education.
Bloom’s cognitive domain was briefly summarized and stages in assessment practice
were discussed. We also compared old and modern assessment practices. An attempt was
also made to define tests and to show the purpose of testing and the aims and objectives
of classroom tests. You will agree with me that tests and examinations are “necessary
evils” that cannot be done without. They must remain in our educational system if we
want to know the progress made by learners, what has been learnt, what has not been
learnt, and how to improve learning and teaching.
References
Bloom, B. S. (1966). Taxonomy of Educational Objectives, Handbook 1: Cognitive Domain.
David McKay Co. Inc.
Macintosh, H. G. et al. (1976). Assessment and the Secondary School Teacher. Routledge &
Kegan Paul.
Haberman, M. (1968). “Behavioural Objectives: Bandwagon or Breakthrough”. The Journal of
Teacher Education 19, No. 1: 91–92.
Davies, A. (1984). Validating Three Tests of English Language Proficiency. Language Testing
1(1), 50–69.
Deci, E. L. (1975). Intrinsic Motivation. New York: Plenum Press.
Mitchell, R. J. (1972). Measurement in the Classroom: A Work Text. Kendall/Hunt Publishing Co.
Introduction
The unit is based on the premise that there are different kinds of tests that a teacher can use.
There are also various reasons why tests are conducted. The purpose of testing determines the
kind of test. Each test also has its own peculiar characteristics.
Objectives
By the end of this unit, you should be able to:
a. List different kinds of tests;
Types of Tests
Types of tests can be determined from different perspectives. You can look at types of tests in
terms of whether they are discrete or integrative. Discrete point tests are expected to test one
item or skill at a time, while integrative tests combine various items, structures and skills into
one single test.
Task
1. What is a discrete point test?
2. What is an integrative test?
From the words lettered a–d, choose the word that has the same vowel sound as the one
represented by the underlined letter.
Milk
a. quarry
b. exhibit
c. excellent
d. oblique
Of course, the answer is (d), because only the sound /i/ in the word oblique is the same as the
sound represented by the “i” in milk.
As you can see in this test, only one item or sound is tested at a time. Such a test is a discrete
point test.
Let’s have another example in English. Fill in the gap with the correct verb.
John………………..to the market yesterday.
Indeed, only one item can fill the gap at a time. This may be went, hurried, strolled, etc. The gap
can only be filled with one item.
In mathematics, when a teacher asks the pupil to fill in the blank space with the correct answer,
the teacher is testing a discrete item. For example:
Fill the box with the correct answer to the multiplication stated below:
2 × 7 = □
Only one item, and that is 14, can fill the box. This is a discrete point test. All tests involving
fill-in-the-blanks, matching, completion, etc. are often discrete point tests.
Integrative Tests
As you have learnt earlier on, tests can be integrative, that is, testing many items together in an
integrative manner. In integrative tests, various items, structures, discourse types, pragmatic
forms, construction types, skills and so on are tested simultaneously. Popular examples of
integrative tests are essay tests, cloze tests, reading comprehension tests, the working of a
mathematical problem that requires the application of many skills, or construction types that
require different skills and competencies.
A popular integrative test is the cloze test, which deletes every nth word. By nth word we
mean the fourth (4th) word of a passage, the fifth (5th) word, or any word deleted in a
regular or systematic fashion. For example, I may require you to fill in the words deleted in
this passage.
Firstly, he has to understand the ______ as the speaker says ______. He must not
stop the ______ in order to look up a ______ or an unfamiliar sentence.
The test requires many skills of the candidate to be able to fill in the gaps. The candidate needs
to be able to read the passage, comprehend it, think of the appropriate vocabulary items that
will fill in the blanks, and know the grammatical forms, tense and aspect in which the passage
is written. When you test these many skills at once, you are testing integratively.
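The nth-word deletion used in cloze tests is mechanical enough to automate. The sketch below is a minimal illustration; the sample sentence is invented, not taken from the passage above.

```python
def make_cloze(passage, n):
    """Blank out every nth word of the passage and return the gapped
    text together with the list of deleted words (the answer key)."""
    words = passage.split()
    gapped, answers = [], []
    for i, word in enumerate(words, start=1):
        if i % n == 0:          # every nth word is deleted
            answers.append(word)
            gapped.append("______")
        else:
            gapped.append(word)
    return " ".join(gapped), answers

text = ("He must listen carefully to the speaker and must not "
        "stop to look up every unfamiliar word")
cloze, key = make_cloze(text, 5)
print(cloze)
print(key)  # ['to', 'not', 'every']
```

Deleting at a fixed interval, rather than hand-picking words, is what makes the resulting test systematic.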
Task
Fill in the gaps in the passage given as an example of a cloze (integrative) test. Which
nth word was deleted in each case throughout the passage?
Task
Which type of test, out of the ones described in this unit, is each of the following:
i. An end-of-term examination;
ii. A test before the beginning of a course;
iii. A test at the end of a programme;
iv. A school certificate examination.
Task
Think of what can make a good student fail a test that a poor student passes with flying
colours.
iv. A good test should combine both discrete point and integrative test procedures for a
fuller representation of teaching-learning points. The test should focus on both the
discrete points of the subject area and its integrative aspects. A good test should
integrate all the various learners’ needs, the range of teaching-learning situations, and
objective and subjective items.
v. A good test must represent teaching-learning objectives and goals: the test should
be conscious of the objectives of learning and the objectives of testing. For example, if
the objective of learning is to master a particular skill and apply the skill, testing should
be directed towards the mastery and application of the skill.
Task
List three objectives of testing school certificate students in mathematics. Are these
objectives always followed in the ‘O’ level mathematics examinations?
vi. Test materials must be properly and systematically selected: the test materials must
be selected in such a way that they cover the syllabus, teaching course outlines or the
subject area. The materials should be of mixed difficulty levels (not too easy or too
difficult) and represent the specific targeted learners’ needs that were identified at the
beginning of the course.
vii. Variety is also a characteristic of a good test. This includes a variety of test types:
multiple choice tests, subjective tests and so on. It also includes a variety of tasks
within each test: writing, reading, speaking, listening, rewriting, transcoding, solving,
organizing and presenting extended information, interpreting, backfilling, matching,
extracting points, distinguishing, identifying, constructing, producing, designing, etc.
In most cases, both the tasks and the materials to be used in the tests should be true to
the life situation for which the learner is being trained.
Task
Why do you think variety should be a major characteristic of a good test?
Cross check your answer with the following. Do not read my own reasons until you have
attempted the activity. Variety in testing is crucial because:
It allows tests to cover a large area;
It makes tests authentic;
Variety brings out the total knowledge of the learner; and
With a variety of tasks, the performance of the learner can be better assessed.
Summary
In this unit, you have studied:
Discrete point tests and integrative tests. Discrete point tests focus on just one item,
skill or concept, while integrative tests focus on many items, skills and tasks.
Different types of tests that are determined by the purpose or aim of the test
construction. Some of the tests studied are placement, achievement, diagnostic,
aptitude, predictive, standardized and continuous assessment tests.
Task
a. You are requested to construct a good test in your field. Your test must be reliable, valid
and full of a variety of test procedures and test types; or
b. Assess a particular test available to you in terms of how good and effective the test is.
In which areas of the test that you have assessed do you think improvements are most
needed? Supply the necessary improvements; or
c. “Test types are determined by the purpose and aim which the test hopes to achieve.”
Discuss this statement in the light of the tests that are taught in this unit.
Teachers need to know how to construct different kinds of tests. Indeed, tests are not just
designed casually or in a haphazard manner. There are rules and regulations guiding this
activity.
Objectives
By the end of this unit, you should be able to:
a. Recognize how different types of tests are constructed;
b. Determine the basic principles to follow in constructing tests; and
c. Apply these principles in the practical construction of tests.
Defining Objectives
As a competent teacher, you should be able to develop instructional objectives that are
behavioral, precise and realistic, and at an appropriate level of generality, so that they serve as
a useful guide to teaching and evaluation.
When you write your behavioral objectives, use action verbs such as define, compare,
contrast, draw, explain, describe, classify, summarize, apply, solve, express, state, list and
give. You should avoid vague and global statements involving the use of verbs such as
appreciate, understand, feel, grasp, think, etc.
It is important that we state objectives in behavioral terms so as to determine the terminal
behavior of a student after he has completed a learning task. Martin Haberman (1968) says the
teacher receives the following benefits by using behavioral objectives:
1. Teacher and students get clear purposes.
2. Broad content is broken down to manageable and meaningful pieces.
3. Organizing content into sequences and hierarchies is facilitated.
4. Evaluation is simplified and becomes self-evident.
5. Selection of materials is clarified (knowing precisely what youngsters are to do leads
to control in the selection of materials, equipment and the management of resources
generally).
Examples:
Behavioral objective: To determine whether students are able to define technical terms
by giving their properties, relations or attributes.
Question:
A joule is a unit of
(a) weight (b) force (c) distance (d) work (e) volume
You can also use picture tests to test knowledge of classification and matching tests to
test knowledge of relationships.
(iii) Application
Here you want to test the ability of the students to use principles, rules and
generalizations in solving problems in novel situations, e.g. how would you recover
table salt from water?
(iv) Analysis
This is to analyze or break an idea into its parts and show that the student understands
their relationships.
(v) Synthesis
The student is expected to synthesize or put elements together to form a new whole and
produce a unique communication, plan or set of abstract relations.
(vi) Evaluation
The student is expected to make judgments based upon evidence.
The proportion of test items on each topic depends on the emphasis placed on it during teaching
and the amount of time spent. Also, the proportion of items on each process objective depends
on how important you consider the particular process skill for the level of students to be tested.
However, it is important that you make the test a balanced one in terms of the content and the
process objectives you have been trying to achieve through your series of lessons.
Percentages are usually assigned to the topics of the content and the process objectives such that
each dimension will add up to 100%. (see the table below).
After this, you should decide on the type of test you want to use and this will depend on the
process objective to be measured, the content and your own skill in constructing the different
types of tests.
At this stage, you consider the time available for the test, types of test items to be used (essay or
objective) and other factors like the age, ability level of the students and the type of process
objectives to be measured.
When this decision is made, you then proceed to determine the total number of items for each
topic and process objectives as follows:
(i) To obtain the number of items per topic, you multiply the percentage of each by the total
number of items to be constructed and divide by 100. This you will record in the column
in front of each topic in the extreme right corner of the blueprint. In the table below,
25% was assigned to Soil. The total number of items is 50, hence 12 items for the topic
(25% of 50 items = 12.5, rounded to 12 items).
(ii) To obtain the number of items per process objective, we also multiply the percentage of
each by the total number of items for test and divide by 100. These will be recorded in
the bottom row of the blue print under each process objective. In the table below:
(a) The percentage assigned to comprehension is 30% of the total number of items
which is 50. Hence, there will be 15 items for this objective (30% of 50 items).
(2) Choosing what will be covered under each combination of content and process
objectives.
(3) Assigning percentage of the total test by content area and by process objectives
and getting an estimate of the total number of items.
(4) Choosing the type of item format to be used and an estimate of the number of
such items per cell of the test blue print.
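The percentage-to-item-count arithmetic in the steps above can be sketched in a few lines of Python. This is a minimal sketch: only the Soil/25%/50-items figures come from the worked example in the text; the other topic names and percentages are illustrative assumptions.

```python
# Sketch of allocating test items from a blueprint's percentages.
# Only "Soil" at 25% of 50 items comes from the text; the rest is illustrative.

def items_for(percentage, total_items):
    """Number of items = percentage x total items / 100."""
    return percentage * total_items / 100

total = 50
topic_percentages = {"Soil": 25, "Water": 35, "Air": 40}  # must sum to 100

for topic, pct in topic_percentages.items():
    raw = items_for(pct, total)
    # Soil gives 12.5; the text rounds this to 12 items.
    print(topic, raw, round(raw))
```

The same function applies to the process-objective row of the blueprint, e.g. 30% for comprehension gives 15 of the 50 items.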
Task
Fill in the gaps
Test can be grouped into------------------major groups. These are------------------and---------
The-------------test has alternatives called-----, which usually follow the--------------------of the
question.
Basic Principles for Constructing Multiple-Choice Questions
Multiple-choice questions are said to be objective in two ways. First, each student has an
equal chance: he/she merely chooses the correct option from the list of alternatives. The
candidates have no opportunity to express a different attitude or special opinion. Secondly, the
judgment and personality of the marker cannot influence the correction in any way. Indeed,
many objective tests are scored by machines. This kind of test may be graded more quickly and
objectively than the subjective or essay type.
In constructing objective tests, the following basic principles must be borne in mind.
1. The instructions on what the candidate should do must be clear, unambiguous and precise.
Do not confuse the candidates. Let them know whether they are to choose by
ticking (✓), circling (O) or shading the box on the answer sheet.

ANSWER SHEET

        A    B    C    D
   1   [ ]  [ ]  [ ]  [ ]
   2   [ ]  [ ]  [ ]  [ ]
   3   [ ]  [ ]  [ ]  [ ]
   4   [ ]  [ ]  [ ]  [ ]

(Indicate the correct answer by shading the box corresponding to the correct option.)
An example of fairly unambiguous instructions is stated below: read the instructions
carefully.
i. Candidates are advised to spend only 45 minutes on each subject and attempt
all questions.
ii. A multiple-choice answer sheet for the four subjects has been provided. Use the
appropriate section of the answer sheet for each subject.
iii. Check that the number of each question you answer tallies with the number
shaded on your answer sheet.
iv. Use an HB pencil throughout.
Task
Study the instruction above and put on a piece of paper or in your exercise book the
characteristics of the instructions that were presented for the examination.
Cross check your answers with the ones below after you have attempted the activity.
As could be seen in the example just presented, the instructions of a test must be:
unambiguous
clear
written in short sentences
numbered in sequence
underlined or bold-faced to show the most important part of the instruction or call
attention to areas of the instruction that must not be overlooked or forgotten.
i. The options (or alternatives) must be discriminating: some may be obviously wrong but
there must be options that closely compete with the correct option in terms of
characteristics, related concepts or component parts.
If you have done Activity III very well, you will agree that option C is
the correct answer and that options A and B are competing: A attracts the candidate
who remembers that the answer should be the opposite, and B the candidate who has
forgotten this fact.
ii. The correct option should not be longer or shorter than the rest, i.e. the incorrect
options. Differences in the length of options may call the attention of the candidate.
iii. The stem of an item must clearly state the problem. The options should be brief.
iv. Only one option must be correct. Do not set objective tests where two or more
options are correct. You confuse a brilliant student and cause undeserved failure.
v. Objective tests should be based on the syllabus: what is taught, or expected
to be taught. They should provoke deep reasoning, critical thinking, and value
judgments.
vi. Avoid the use of negative statements in the stem of an item. When used, you
should underline the negative word.
vii. Every item should be independent of other items.
viii. Avoid the use of phrases like "all of the above", "all of these", "none of these" or
"none of the above".
ix. The reading difficulty and vocabulary level must be as simple as possible.
Task
Answer the questions in this short answer test and bring out the characteristics of the test. Fill
in the gaps with the appropriate words or expressions.
1. In multiple choice tests each student has an----------------.Candidates have no
opportunity to--------a different---------or special--------. But in short answer test,
candidates are allowed to write-----by filling------or writing----------sentences.,
ii. Your essay questions should be in layers. The first layer tests the concept or fact, its
definition and characteristics. The second layer tests the interpretation of, and
inferences from, the concept, fact or topic, and their application to real-life
situations. In the third layer, the candidate may be required to construct, consolidate,
design, or produce his/her own structure, concept, fact, scenario or issue.
iii. Essays should not merely require regurgitation of facts learnt in class. Nor should
they be satisfied with only the examples given in class.
iv. Some of the words that can be used in an essay type of test are: compare and contrast,
criticize, critically examine, discuss, describe, outline, enumerate, define, state, relate,
illustrate, explain, summarize, construct, produce, design, etc. Remember, some of the
words are mere words that require regurgitation of facts, while others require application
of facts.
Summary
In this unit, you have been exposed to the basic principles for constructing multiple-
choice, short answer and essay types of tests. In all tests, instructions must be clear,
unambiguous, precise, and goal-oriented. All tests must be relevant to what is learnt or
expected to be learnt. They must meet the learning needs and demands of the candidates.
Tests should not be too easy or too difficult.
Task
Construct three multiple-choice, short-answer and essay tests each. Use each test constructed
to analyze the basic principles of testing.
Test Contingencies
There are some factors that affect tests, which are referred to as test contingencies.
In this unit, you will learn what is meant by validity and reliability of tests. As you already know
through your study of unit 3, validity and reliability are essential components of a good test.
Objectives
By the end of this unit, you are expected to be able to:
a. Define and illustrate validity and reliability as test and measurement terms;
b. Describe validity and reliability of tests;
and
c. Construct valid and reliable tests.
Test Contingencies
A number of factors can affect the outcome of a test in the classroom. These factors may be
student-, teacher-, environment- or learning-materials-related.
Test Validity
Validity of tests means that a test measures what it is supposed to measure or a test is suitable
for the purposes for which it is intended. There are different kinds of validity that you can look
for in a test. Some of these are: content validity, face validity, criterion-referenced validity and
predictive validity.
Content Validity
This validity suggests the degree to which a test adequately and sufficiently measures
the particular skills, subject components, item functions or behaviors it sets out to
measure. To ensure content validity of a test, the content of what the test is to cover must
be placed side by side with the test itself to see the correlation or relationship. The test
should reflect aspects that are to be covered in the appropriate order of importance and
in the right quantity.
Task
Take a unit of a course in your subject area. List all the things that are covered in the unit.
Construct a test to cover the unit. In constructing a test, you should list the items covered in the
particular course and make sure the test covers the items in the right quantity.
1. Face Validity
This is a validity that depends on the judgment of the external observer of the test. It is
the degree to which a test appears to measure the knowledge and ability based on the
judgment of the external observer. Usually, face validity entails how clear the
instructions are, how well-structured the items of the test are, and how consistent the
numbering, sections, sub-sections, etc. are.
2. Criterion-Referenced Validity
This validity involves specifying the ability domain of the learner and defining the end
points so as to provide absolute scale. In order to achieve this goal, the test that is
constructed is compared or correlated with an outside criterion, measure or judgment. If
the comparison takes place at the same time, we call this concurrent validity. For example,
an English test may be compared with the JAMB English test. If the correlation is high,
i.e. r = 0.5 and above, we say the English test meets criterion-referenced validity against
the criterion (the JAMB test). For criterion-referenced validity to satisfy the requirement of
comparability, the two measures must share a common scale or characteristics.
3. Predictive Validity
Predictive validity suggests the degree to which a test accurately predicts future
performance. For example, if we assume that a student who does well in a particular
mathematics aptitude test should be able to undergo a physics course successfully;
predictive validity is achieved if the student does well in the course.
Construct Validity
This refers to how accurately a given test actually describes an individual in terms of a stated
psychological trait.
A test designed to measure femininity should show women performing better than men in tasks
usually associated with women. If this is not so, then the assumptions on which the test was
constructed are not valid.
Factors Affecting Validity
Cultural beliefs
Attitudes of candidates
- Values–students often relax when much emphasis is not placed on education
- Maturity–students perform poorly when given tasks above their mental age.
Reliability of Tests
If candidates get similar scores on parallel forms of tests, this suggests that the test is reliable. This
kind of reliability is called parallel form of reliability or alternate form of reliability. Split-half is
an estimate of reliability based on coefficient of correlation between two halves of a test. It
may be between odd and even items or between the first and second halves of the items of the test. In
order to estimate the reliability of a full test rather than the separate halves, the Spearman-Brown
Formula is applied. Test and re-test scores are correlated. If the correlation, referred to as r, is
equal to 0.5 and above, the test is said to be of moderate or high reliability, depending on the
value of r along the scale (i.e. 0.5 to 0.9); 1 is a perfect correlation, which is rare.
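The Spearman-Brown step-up mentioned above can be sketched as follows. This is a minimal sketch of the standard full-test form of the formula; the 0.6 split-half value used in the example is an assumption for illustration.

```python
# Stepping up a split-half correlation to full-test reliability with the
# Spearman-Brown formula: r_full = 2 * r_half / (1 + r_half).

def spearman_brown(r_half):
    """Estimated reliability of the full test from the correlation
    between its two halves."""
    return 2 * r_half / (1 + r_half)

# An assumed split-half correlation of 0.6 steps up to 0.75 for the full test.
print(spearman_brown(0.6))
```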
Task
How can reliability of a test be obtained? Describe two possible ways.
Rater reliability is a measure of the degree to which different examiners or test
raters agree in their evaluation of the candidates' ability. Inter-rater (two or more different raters
of a test) reliability is said to be high when the degree of agreement between the raters is high or
very close. Intra-rater (one rater rating scripts at different points in time or at different intervals
is the degree to which a marker making a subjective rating of, say, an essay or a procedure or
construction gives the same evaluation on two or more different occasions.
Positive correlations are between 0.00 and +1.00, while negative correlations are
between 0.00 and −1.00. A correlation at or close to zero shows no reliability; a
correlation between 0.00 and +1.00, some reliability; and a correlation at +1.00, perfect
reliability.
Some of the procedures for computing the correlation coefficient include:

Product-moment (deviation) method, which uses the deviations of students' scores from the means:

R = Σxy / √((Σx²)(Σy²))

Pearson product-moment correlation coefficient (raw-score method):

R = [NΣXY − (ΣX)(ΣY)] / √([NΣX² − (ΣX)²][NΣY² − (ΣY)²])

Where, for the two equations:
Σ = sum of
X = a raw score in test A; X̄ = the mean score in test A
Y = a raw score in test B; Ȳ = the mean score in test B
x = X − X̄ and y = Y − Ȳ, the deviations from the mean
N = total number of scores.
Cases   X    Y    x = X−X̄   y = Y−Ȳ   x²      y²    xy
1       13   11   +5.5      +3        30.25   9     +16.5
2       12   14   +4.5      +6        20.25   36    +27.0
3       10   11   +2.5      +3        6.25    9     +7.5
4       10   7    +2.5      -1        6.25    1     -2.5
5       8    9    +0.5      +1        0.25    1     +0.5
6       6    11   -1.5      +3        2.25    9     -4.5
7       6    3    -1.5      -5        2.25    25    +7.5
8       5    7    -2.5      -1        6.25    1     +2.5
9       3    6    -4.5      -2        20.25   4     +9.0
10      2    1    -5.5      -7        30.25   49    +38.5
Sum     75   80   0         0         124.50  144   102.0
Task
(i) Calculate the means for the X and Y cases: X̄ = ΣX/N and Ȳ = ΣY/N.
Do you get 7.5 and 8.0 respectively? If not check your calculations again.
(ii) Calculate the deviation products (xy) for each of the cases and sum them up:
Σxy = ? Use the last column of the table to check your answer. You should
get 102.0.
(iii) Calculate the sum of squares: Σx² = Σ(X − X̄)²
(iv) Calculate the sum of squares: Σy² = Σ(Y − Ȳ)²
Your answers in (iii) and (iv) should be 124.50 and 144 respectively. Check.
(v) Find the square root of their product: √((124.50)(144))
(vi) Then divide your answer in question (ii) by your answer in question (v).
Thus, you have applied the formula for calculating (R), the product-moment correlation between
the two sets of scores (x) and (y).
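The steps above can be checked with a short Python sketch that applies the deviation method to the ten cases in the table:

```python
from math import sqrt

# Deviation (product-moment) method on the ten cases from the table above:
# R = sum(xy) / sqrt(sum(x^2) * sum(y^2)), where x = X - mean(X), y = Y - mean(Y).

X = [13, 12, 10, 10, 8, 6, 6, 5, 3, 2]
Y = [11, 14, 11, 7, 9, 11, 3, 7, 6, 1]

mx, my = sum(X) / len(X), sum(Y) / len(Y)      # 7.5 and 8.0
x = [xi - mx for xi in X]                      # deviations from the mean of X
y = [yi - my for yi in Y]                      # deviations from the mean of Y

sum_xy = sum(a * b for a, b in zip(x, y))      # 102.0
sum_x2 = sum(a * a for a in x)                 # 124.5
sum_y2 = sum(b * b for b in y)                 # 144.0

R = sum_xy / sqrt(sum_x2 * sum_y2)
print(round(R, 2))  # 0.76
```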
Task
Try the following formula for the same problem.
R = [NΣXY − (ΣX)(ΣY)] / √([NΣX² − (ΣX)²][NΣY² − (ΣY)²])
In this formula, you do not need to calculate the mean and find deviations. You just work with
pairs of the scores following the steps:-
Task
Try your hand on the ungrouped data below using your calculator.
Cases X Y X2 Y2 XY
1 13 7 169 49 91
2 12 11 144 121 132
3 10 3 100 9 30
4 8 7 64 49 56
5 7 2 49 4 14
6 6 12 36 144 72
7 6 6 36 36 36
8 4 2 16 4 8
9 3 9 9 81 27
10 1 6 1 36 6
Sum
70 65 624 533 472
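As a check on the task, the raw-score Pearson formula can be applied to the ungrouped data in the table with a short Python sketch; no means or deviations are needed, only the column sums:

```python
from math import sqrt

# Raw-score Pearson formula on the ungrouped data in the table above.

X = [13, 12, 10, 8, 7, 6, 6, 4, 3, 1]
Y = [7, 11, 3, 7, 2, 12, 6, 2, 9, 6]
N = len(X)

sX, sY = sum(X), sum(Y)                         # 70, 65
sXY = sum(a * b for a, b in zip(X, Y))          # 472
sX2 = sum(a * a for a in X)                     # 624
sY2 = sum(b * b for b in Y)                     # 533

R = (N * sXY - sX * sY) / sqrt((N * sX2 - sX**2) * (N * sY2 - sY**2))
print(round(R, 2))  # 0.14
```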
SUMMARY
In this unit, you have been exposed to the concept of validity and reliability of tests.
Any good test must achieve these two characteristics. A test is said to be valid if it
Measures what it is supposed to measure. A test is reliable if it measures what it is
supposed to measure consistently.
Task
i. Take any test designed either by you or by somebody else and assess the face and
content validity of the test.
ii. Construct a test of three items. Assess the reliability of the test by administering it to
three persons at different points or intervals. Compute the coefficient of correlation of
the test.
INTRODUCTION
The unit will explain how tests are scored and interpreted.
OBJECTIVES
By the end of this unit, you should be able to:
a. Score and interpret tests in general and continuous assessment in particular;
b. Analyze test
items;
c. Compute some measures of general tendency and variability; and
d. Compute Z–score and the Percentile.
SCORING OF TESTS
This section introduces to you the pattern of scoring of tests, be they continuous assessment tests
or other forms of tests. The following guidelines are suggested for scoring of tests:
i. You must remember that multiple choice tests are difficult to design, difficult to
administer, especially in a large class, but easy to score. In some cases, they are scored
by machines. Multiple-choice tests are easy to score because they usually have one
correct answer, which must be accepted across the board.
ii. Essay or subjective types of tests are relatively easy to set and administer, especially in a
large class. They are, however, difficult to mark or assess, because essay
questions require a lot of writing of sentences and paragraphs, and the examiner must read
all of it.
iii. Whether objective or subjective, all tests must have marking schemes.
Marking schemes are the guide for marking any test. They consist of the points,
demands and issues that must be raised before the candidate can be said to have
responded satisfactorily to the test. Marking schemes should be drawn before testing not
after the test has been taken. All marking schemes should carry marks allocation. They
should also indicate scoring points and how the scores are totaled up to represent the
total score for the question or the test.
iv. Scoring or marking on impression is dangerous. Some students are very good at
impressing examiners with flowery language without real academic substance. If you
mark on impression, you may be carried away by the language and not the relevant
facts. Again, mood may change impression; your impression can be changed by joy,
sadness, tiredness, time of the day and so on. That is why you must always insist on a
comprehensive marking scheme.
v. Scoring can be done question by question or all questions at a time. The best way is to
score or mark one question across the board for all students. Sometimes this may be
feasible but tedious, especially in a large class.
vi. Scores can be interpreted into grades: A, B, C, D, E and F. They may be interpreted in
terms of percentages: 10%, 20%, 50%, etc. Scores may be presented in a comparative
way in terms of 1st position, 2nd position, 3rd position and so on to the last. Scores can
also be coded in what is called a BAND. In a band system, certain criteria are used to
determine those who will be in the Excellent, Very Good categories, etc. Examples of
band systems are those used by the International English Language Testing System
(IELTS) and the Test of English as a Foreign Language (TOEFL).
Find the corrected scores of two candidates, A and B, who both scored 35 in an objective test of
50 items, if A attempted 38 questions while B attempted all the questions.
(SA = 35 − 3/4 ≈ 34 and SB = 35 − 15/4 ≈ 31)
Note that under "rights only" scoring, each of the students gets 35 out of 50.
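The corrected scores in the example are consistent with the usual correction-for-guessing formula S = R − W/(k − 1) on a five-option test; note that the five-option assumption is an inference from the worked figures, not stated explicitly in the text. A minimal sketch:

```python
# Correction for guessing: S = R - W/(k - 1), where R = rights,
# W = wrongs among attempted items, k = options per item.
# k = 5 is an assumption inferred from the worked example.

def corrected_score(rights, attempted, k=5):
    wrongs = attempted - rights
    return rights - wrongs / (k - 1)

# Candidate A: 35 right of 38 attempted -> 35 - 3/4  = 34.25 (~34)
# Candidate B: 35 right of 50 attempted -> 35 - 15/4 = 31.25 (~31)
print(corrected_score(35, 38), corrected_score(35, 50))
```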
Item Analysis
Item analysis helps to decide whether a test is good or poor in two ways:
i. It gives information about the difficulty level of a question.
ii. It indicates how well each question shows the difference (discriminate) between the
bright and dull students. In essence, item analysis is used for reviewing and refining a
test.
Difficulty Level
By difficulty level we mean the proportion of candidates that got a particular item right in any
given test. For example, if in a class of 45 students, 30 of the students answered a question
correctly, then the difficulty level is 67% or 0.67. The proportion usually ranges from 0 to 1 or 0 to 100%.
An item with an index of 0 is too difficult hence everybody missed it while that of 1 is too easy
as everybody got it right. Items with index of 0.5 are usually suitable for inclusion in a test.
Though the items with indices of 0 and 1 may not really contribute to an achievement test, they
are good for the teacher in determining how well the students are doing in that particular area of
the content being tested. Hence, such items could be included. However, the mean difficulty level
of the whole test should be 0.5 or 50%.
Usually, the formula for item difficulty is:
P = (n × 100) / N
where
P = item difficulty
n = the number of students who got the item correct
N = the number of students involved in the test.
However, in the classroom setting, it is better to use the upper 1/3 of the students that got the
item right (U) and the lower 1/3 of the students that got it right (L).
Hence, the difficulty level is given by:
P = (U + L) / N
where N is the number of students actually involved in the item analysis (upper 1/3 + lower
1/3 of the testees).
Consider a class of two arms with a population of 60 each. If 36 candidates of the upper 1/3 of
the population and 20 of the lower 1/3 got question number 2 correct, what is the index of
difficulty (difficulty level) of the question?
Index of difficulty: P = (U + L) / N
N = 40 + 40 = 80 (upper 1/3 + lower 1/3)
U = 36
L = 20
P = (36 + 20) / 80 = 56 / 80
= 0.7
i.e. P = 70%
If P = 0 (0%) or 1 (100%), the test item is said to be either too difficult or too easy
respectively. As much as possible, teachers should avoid administering test items with
difficulty levels of 0 or 1.
Item Discrimination
The discrimination index shows how a test item discriminates between the bright and the dull
students. A test with many poor questions will give a false impression of the learning situation.
Usually, a discrimination index of 0.4 and above is acceptable. Items which discriminate
negatively are bad. This may be because of wrong keys, vagueness or extreme
difficulty. The formula for the discrimination index is:
D = (U − L) / (½N), i.e. (U − L) / (0.5N)
Where
U = the number of students that got it right in upper group.
L = the number of students that got it right in the lower group.
N = the number of students usually involved in the item analysis.
In summary, to carry out item analysis:
i) Arrange the scored papers in order of merit, from highest to lowest.
ii) Select the upper 33%.
iii) Select the lower 33%.
Note that the number of students in the lower and upper groups must be equal.
iv) Item by item, calculate the number of students that got each item correct in each group.
v) Estimate:
(a) item difficulty: P = (U + L) / N
(b) item discrimination index: D = (U − L) / (½N)
Task
In the table below, determine the P and D for items 2,3 & 4. Item 1 has been calculated as an
example. Total population of students is 60.
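The two indices can be sketched in Python and checked against the worked example above (U = 36, L = 20, N = 80):

```python
# Difficulty (P) and discrimination (D) indices from the upper- and
# lower-third counts, as in the worked example in the text.

def difficulty(upper, lower, n):
    """P = (U + L) / N, where N = size of upper third + lower third."""
    return (upper + lower) / n

def discrimination(upper, lower, n):
    """D = (U - L) / (N / 2)."""
    return (upper - lower) / (n / 2)

P = difficulty(36, 20, 80)       # 0.7 -> acceptable difficulty
D = discrimination(36, 20, 80)   # 0.4 -> acceptable discrimination
print(P, D)
```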
Mode
The mode is the most frequent or popular score in the population. It is not as frequently used as
the median and mean in the classroom because it can fall anywhere along the distribution of
scores (top, middle or bottom), and a distribution may have more than one mode (bimodal).
Median
This is the middle score after all the scores have been arranged in order of magnitude, i.e. 50%
of the scores lie on either side of it. The median is very good where there are deviant or extreme
scores in a distribution; however, it does not take the relative size of all the scores into
consideration. Also, it cannot be used for further statistical computations.
The Mean
This is the average of all the scores and it is obtained by adding the scores together and dividing
the sum by the number of scores.
M or X̄ = Sum of all scores / Number of scores = ΣX / N
Though, the mean is influenced by deviant scores, it is very important in that it takes into
cognizance the relative size of each score in the distribution.
Task
The mean score is the same as the average score i.e. Sum of all scores/the number of scores.
This is the most common statistical instrument used in our classroom
If in a class of 9, the scores are 29, 85, 78, 73, 40, 35, 20, 10 and 5, find the mean.
Measures of Variability
Measure of variability indicates the spread of the scores. The usual measures of variability are
Range, Quartile Deviation and Standard Deviation.
Range
The range is usually taken as the difference between the highest and the lowest scores in a
distribution. It is completely dependent on the extreme scores and may give a wrong picture of
the variability in the distribution. It is the simplest measure of variability.
Example: 7, 2, 5, 4, 6, 3, 1, 2, 4, 7, 9, 8, 10. Lowest score = 1, highest = 10.
Range = 10 − 1 = 9
Quartile Deviation
Quartiles are points on the distribution which divide it into quarters; thus, we have the 1st, 2nd
and 3rd quartiles.
The inter-quartile range is the difference between Q3 and Q1, i.e. Q3 − Q1. This is more often
used than the range as it cuts off the extreme scores. The semi inter-quartile range is thus half
of the inter-quartile range.
This is also known as the semi-interquartile range. It is half the difference between the upper
quartile (Q3) and the lower quartile (Q1) of the set of scores:
QD = (Q3 − Q1) / 2
where Q3 = P75 = the point in the distribution below which lie 75% of the scores, and
Q1 = P25 = the point in the distribution below which lie 25% of the scores.
In cases where there are many deviant scores, the quartile deviation is the best measure of
variability.
Standard Deviation
This is the square root of the mean of the squared deviations. The mean of the squared
deviations is called the variance (S2).The deviation is the difference between each score and the
mean.
SD (σ) = √(Σx² / N)
where
x = X − X̄, the deviation of each score from the mean
N = number of scores.
The SD is the most reliable of all measures of variability and lends itself for use in other
statistical calculations.
Deviation is the difference between each score (X) and the mean (M). To calculate the standard
deviation:
(i) Find the mean (m)
(ii) Find the deviation (x-m) and square each.
(iii) Sum up the squares and divide by the number of the population (N)
(iv) Find the positive square root.
Take m = 54.

Students   Marks obtained (X)   Deviations (X − m)   Squared deviations (X − m)² = x²
A          68                   14                   196
B          58                   4                    16
C          47                   -7                   49
D          45                   -9                   81
E          54                   0                    0
F          50                   -4                   16
G          62                   8                    64
H          59                   5                    25
I          48                   -6                   36
J          52                   -2                   4
                                              Σx² = 487
N = 10
SD (σ) = √(Σx² / N) = √(487 / 10) ≈ 6.98
(Using the exact mean, 54.3, gives SD ≈ 6.97.)
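The same computation can be sketched in a few lines of Python, here using the exact mean (54.3) rather than the rounded m = 54:

```python
from math import sqrt

# Mean and (population) standard deviation for the ten marks in the table above.

marks = [68, 58, 47, 45, 54, 50, 62, 59, 48, 52]
N = len(marks)

mean = sum(marks) / N                                # 54.3
variance = sum((x - mean) ** 2 for x in marks) / N   # mean of squared deviations
sd = sqrt(variance)
print(round(mean, 1), round(sd, 2))  # 54.3 6.97
```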
Task
Find the mean and standard deviation for the following marks.
20, 45, 39, 40, 42, 48, 30, 46 and 41.
Derived Scores
In practice, we report on our students after examinations by adding together their scores in the
various subjects and thereafter calculate the average or percentage as the case may be. This
does not give a fair and reliable assessment. Instead of using raw scores, it is better to use
derived scores. A derived score usually expresses every raw score in terms of the other raw
scores on the test. The ones commonly used in the classroom are the Z-score, T-score and
Percentiles.
T-Score
This is another derived score often used in conjunction with the Z- score. It is defined by the
equation.
T = 50+10Z
Where z is the standard score.
It is also used in the same way as the Z- score except that the negative signs are eliminated in T-
Scores.
Student Score in English Score in Math Total Rank
A 68 20 88 (8)
B 58 45 103 (1)
C 47 39 85 (9)
D 45 40 85 (10)
E 54 42 96 (3)
F 50 48 98 (2)
G 62 30 92 (7)
H 59 36 95 (4)
I 48 46 94 (5)
J 52 41 93 (6)
Consider the scores obtained in English and Mathematics in the table above. We
cannot easily guarantee which of the subjects was more tasking and in which the examiner was
more generous. Hence, for justice and fair play, it is advisable to convert the scores in the two
subjects into common (standard) scores before they are ranked. Z- and T-scores are often
used.
The Z-score is given by:
Z = (Raw score − Mean) / Standard deviation = (X − X̄) / SD
Task
Calculate the Z- and T-scores for students A, B, C and D in the table above.
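As a worked sketch for the task, the Z- and T-scores for the English column of the table can be computed as follows (the Mathematics column is handled the same way with its own mean and SD):

```python
from math import sqrt

# Z- and T-scores for the English marks in the table above, using the
# population mean and standard deviation of the ten scores.

english = [68, 58, 47, 45, 54, 50, 62, 59, 48, 52]
N = len(english)
mean = sum(english) / N
sd = sqrt(sum((x - mean) ** 2 for x in english) / N)

def z_score(raw):
    return (raw - mean) / sd

def t_score(raw):
    # T = 50 + 10Z, which removes the negative signs of Z-scores.
    return 50 + 10 * z_score(raw)

# Student A scored 68 in English:
print(round(z_score(68), 2), round(t_score(68), 1))
```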
Percentile
This expresses a given score in terms of the percentage of scores below it. For example, in a
class of 30, Ibrahim scored 60 and there are 24 pupils scoring below him. The percentage of
scores below 60 is therefore:
(24 / 30) × 100 = 80%
Ibrahim therefore has a percentile of 80 written P80. This means Ibrahim surpassed 80% of his
colleagues while only 20% were better than him. The formula for the percentile rank is given
by:
PR = (100 / N) × (b + F/2), where
PR = Percentile rank of a given score
b = Number of scores below the score
F = Frequency of the score
N = Number of all scores in the test.
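The formula can be sketched in Python. The score list below is a constructed illustration matching Ibrahim's figures (24 scores below 60 in a class of 30); note that the PR formula, which adds half the score's frequency, gives 81.7 here, while the simpler percentage-below calculation in the text gives 80.

```python
# Percentile rank: PR = (100/N) * (b + F/2), where b = number of scores
# below the given score and F = frequency of the score itself.

def percentile_rank(scores, score):
    b = sum(1 for s in scores if s < score)   # scores below
    F = scores.count(score)                   # frequency of the score
    return (100 / len(scores)) * (b + F / 2)

# Constructed class of 30: 24 scores below 60, Ibrahim's 60, five above.
scores = [50] * 24 + [60] + [70] * 5
print(round(percentile_rank(scores, 60), 1))  # 81.7
```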
Credit Units
Courses are often weighted according to their credit units in the course credit system. Credit
units of courses often range from 1 to 4. This is calculated according to the number of contact
hours as follows:
1 credit unit = 15 hours of teaching
2 credit units = 15 × 2 or 30 hours
3 credit units = 15 × 3 or 45 hours
4 credit units = 15 × 4 or 60 hours
The number of hours spent on practicals is usually taken into consideration in calculating credit
loads.
GPA = Total WGP / Total Credit Units registered
(The scores and their letter grading may vary from programme to programme or Institution to
Institution)
For example, a score of 65 marks has a GP of 4 and a Weighted Grade Point of 4 × 3 if the mark
was scored in a 3-unit course. The WGP is therefore 12. If there are five such courses with
course units 4, 3, 2, 2 and 1 respectively, the Grade Point Average is the sum of the five
Weighted Grade Points divided by the total number of credit units, i.e. (4+3+2+2+1) = 12.
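The WGP/GPA arithmetic can be sketched as follows. The five course units come from the example above; the grade points assigned to each course are illustrative assumptions, apart from the GP of 4 in a 3-unit course giving WGP = 12.

```python
# GPA = sum of weighted grade points (GP x units) / total credit units.

def gpa(grade_points, units):
    wgps = [gp * u for gp, u in zip(grade_points, units)]  # weighted grade points
    return sum(wgps) / sum(units)

units = [4, 3, 2, 2, 1]   # the five course units from the worked example
gps = [5, 4, 3, 4, 2]     # assumed grade points per course (illustrative)

# The second course (GP 4, 3 units) reproduces the text's WGP of 12.
print(round(gpa(gps, units), 2))
```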
Task
Below is a sample of an examination transcript for a student
a. Determine for each course the
(i) GP and
(ii) WGP.
b. Find the GPA.
NOTE:
GPA = Total WGP / Total Credits taken
Summary
In this unit, we have discussed the basic principles guiding scoring of tests and test
interpretations.
The use of frequency distribution, mean, mode and median in interpreting test scores was
also explained.
The methods by which test results can be interpreted to be meaningful for classroom
practices were also vividly illustrated.
Task
1. State the various types of tests and explain what each measures.
2. Pick a topic of your choice and prepare a blue-print table for 25 objective items.
3. Explain why:
(a) we use the percentile to describe a student's performance; and
(b) Z-scores are used to describe a student's relative position in a distribution.
4. Give four factors each that can affect the reliability and validity of a test.
5. Use the criteria and basic principles for constructing continuous assessment tests
discussed in this unit to develop a 1-hour continuous assessment test in your subject area.
By citing specific examples from the test you have constructed, show how you have used
the testing concepts learnt to construct the test. You should bring out from your test at
least ten testing concepts used in the construction of the test.
References
Bloom, B. S. (1966) Taxonomy of Educational Objectives, Handbook 1: Cognitive Domain. David McKay Co. Inc.
Federal Ministry of Education, Science and Technology (1985) A Handbook on Continuous Assessment.
Macintosh, H. G. et al. (1976) Assessment and the Secondary School Teacher. Routledge & Kegan Paul.
Haberman, M. (1968) "Behavioural Objectives: Bandwagon or Breakthrough?" The Journal of Teacher Education 19, No. 1: 91–92.